
Sample Questions for the 人工知能検定 (Artificial Intelligence Certification), Level 3 — Part 1

Bias, Variance and Regularization
Choose the incorrect statement about bias/variance below.

1: Bias and variance are in a trade-off relationship: as one becomes smaller, the other becomes larger.
2: Variance is the scatter of the model's predictions; if this value is large, the model is overfitting.
3: Bias is the noise contained in the data itself, and it cannot be suppressed by machine learning methods.
4: Regularization suppresses variance and prevents overfitting by applying constraints to the parameters (weights) or the model during training.
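
For reference, the following is a minimal sketch of the variance-suppressing effect described in statement 4. It assumes NumPy and scikit-learn are available; the synthetic sine data, polynomial degree, and penalty strength are illustrative choices, not part of the question.

```python
# Compare an unregularized polynomial fit with an L2-regularized (ridge) fit:
# the penalty on the weights reduces how much the predictions scatter when the
# model is retrained on resampled noisy data (i.e. it suppresses variance).
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x_grid = np.linspace(0, 1, 50).reshape(-1, 1)

def prediction_variance(model, n_trials=30):
    """Refit the model on resampled noisy data; return the mean variance of its predictions."""
    preds = []
    for _ in range(n_trials):
        x = rng.uniform(0, 1, 20).reshape(-1, 1)
        y = np.sin(2 * np.pi * x).ravel() + rng.normal(0, 0.3, 20)  # noisy target
        preds.append(model.fit(x, y).predict(x_grid))
    return np.array(preds).var(axis=0).mean()

plain = make_pipeline(PolynomialFeatures(degree=9), LinearRegression())
ridge = make_pipeline(PolynomialFeatures(degree=9), Ridge(alpha=1.0))

print("prediction variance without regularization:", prediction_variance(plain))
print("prediction variance with ridge (alpha=1.0):", prediction_variance(ridge))
```

The ridge model's prediction variance should come out clearly smaller, at the cost of some additional bias, which is the trade-off described in statement 1.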

Model Validation Methods and Data Leakage
The accuracy of a predictive model must be evaluated correctly. To that end, care must be taken in how the data is partitioned during the data preparation phase. Which of the following examples describes a data-partitioning approach that is inconsistent with its stated objective?

1: We want to use the master data of gym customers to predict member churn. The objective variable is "quit within 6 months" / "did not quit", and the explanatory variable is "the proportion of each objective-variable class within each age group". In holdout validation, this new feature was created using only the objective variable of the training data, excluding the validation data.
2: Performing simple cross-validation on time-series data leaks future information into the prediction of past objective variables. Since future information is never available when the model is deployed in practice, we split the data along the time axis and performed cross-validation while preserving the temporal order between the training data and the validation data.
3: We want to predict winners and losers using data from a table tennis world championship. Since the names of the two players on each match card are important explanatory variables that characterize the match, we fed them into the model as-is, without any preprocessing such as masking. Because the accuracy on the training data was 100%, we concluded that cross-validation was unnecessary.
4: In cross-validation for a multi-class classification task, stratified sampling was used to prevent the class proportions from becoming uneven across folds. Here, stratified sampling is a method that equalizes the proportion of each class in each fold, based on the assumption that the class proportions in the test data are roughly the same as those in the training data.
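
For reference, the following is a minimal sketch of the leakage-aware splitting ideas in examples 1, 2, and 4. It assumes NumPy and scikit-learn; the random arrays and three-class target are made up purely for illustration.

```python
# Leakage-aware data partitioning: target-derived features computed only from
# the training part of each fold, stratified folds with even class proportions,
# and time-ordered splits that never let future rows into the training data.
import numpy as np
from sklearn.model_selection import StratifiedKFold, TimeSeriesSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))      # hypothetical explanatory variables
y = rng.integers(0, 3, size=100)   # hypothetical 3-class objective variable

# Example 4: stratified sampling keeps the class ratio roughly even in each fold.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, valid_idx in skf.split(X, y):
    # Example 1: any feature derived from the objective variable (such as a
    # class ratio per age group) must be computed from y[train_idx] only,
    # never from the validation part y[valid_idx].
    class_ratio = np.bincount(y[train_idx], minlength=3) / len(train_idx)
    print("class ratio in the training part of this fold:", np.round(class_ratio, 2))

# Example 2: for time-series data, split along the time axis so that the
# validation rows always come after the training rows.
tss = TimeSeriesSplit(n_splits=5)
for train_idx, valid_idx in tss.split(X):
    assert train_idx.max() < valid_idx.min()  # no future rows leak into training
```

StratifiedKFold and TimeSeriesSplit are simply the scikit-learn conveniences for these splitting schemes; the same ideas apply if the folds are built by hand.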
