What motivates the proportion of the data split?

April 26, 2017

Tags: Data, motivates, proportion, Split

1 Answer

The training/validation/test proportions are 10/1/100. The validation set is purposely small, so using validation set performance as your performance prediction is probably not a good idea. The training set is ten times larger than the validation set, to encourage participants to devise cross-validation strategies or other ways of using the training data to make performance predictions. The test set is 100 times larger than the validation set. Because the error bar of an error-rate estimate shrinks roughly as the inverse square root of the number of examples, our estimate of your "generalization performance" based on test data predictions will have an error bar approximately an order of magnitude smaller (a factor of about √100 = 10) than the validation error bar.
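To make the "order of magnitude" claim concrete, here is a minimal sketch that assumes the error bar is the usual binomial standard error of an error rate. The set size and the error rate below are hypothetical values picked for illustration; they are not the actual figures for this data, and the organizers may compute their error bars differently.

```python
import math


def error_bar(error_rate: float, n_examples: int) -> float:
    """Binomial standard error of an error rate measured on n_examples."""
    return math.sqrt(error_rate * (1.0 - error_rate) / n_examples)


# Hypothetical numbers, for illustration only.
n_valid = 1_000              # assumed validation set size
n_test = 100 * n_valid       # test set is 100 times larger (from the answer)
assumed_error_rate = 0.10    # assumed true error rate of some model

sigma_valid = error_bar(assumed_error_rate, n_valid)
sigma_test = error_bar(assumed_error_rate, n_test)
ratio = sigma_valid / sigma_test

print(f"validation error bar: {sigma_valid:.4f}")
print(f"test error bar:       {sigma_test:.4f}")
print(f"ratio:                {ratio:.1f}")
```

Under this assumption the ratio depends only on the size ratio of the two sets (the square root of 100), not on the error rate you plug in.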