What are the population, sample, training set, design set, validation set, and test set?
There seems to be no term in the NN literature for the set of all cases that you want to be able to generalize to. Statisticians call this set the “population”. Nor is there a consistent term in the NN literature for the set of cases that are available for training and evaluating an NN. Statisticians call this set the “sample”. The sample is usually a subset of the population.

(Neurobiologists mean something entirely different by “population”, apparently some collection of neurons, but I have never found out the exact meaning. I am going to continue to use “population” in the statistical sense until NN researchers reach a consensus on other terms for “population” and “sample”; I suspect this will never happen.)

In NN methodology, the sample is often subdivided into “training”, “validation”, and “test” sets. The distinctions among these subsets are crucial, but the terms “validation” and “test” sets are often confused. There is no book in the NN literature more authoritative th
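As a rough illustration of subdividing the sample, the following Python sketch randomly partitions a list of available cases into three disjoint subsets. The function name `split_sample`, the 60/20/20 fractions, and the use of simple random shuffling are all illustrative assumptions, not a procedure prescribed here; the essential point is only that no case appears in more than one subset.

```python
import random


def split_sample(cases, train_frac=0.6, val_frac=0.2, seed=0):
    """Randomly partition a sample into disjoint training, validation, and test sets.

    Hypothetical helper: the default fractions are illustrative assumptions only.
    """
    if not (0 < train_frac and 0 <= val_frac and train_frac + val_frac < 1):
        raise ValueError("need 0 < train_frac, 0 <= val_frac, train_frac + val_frac < 1")
    shuffled = list(cases)                 # copy so the caller's sample is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    training = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # the remainder is held out as the test set
    return training, validation, test


sample = list(range(100))  # stand-in for 100 available cases drawn from the population
train, val, test = split_sample(sample)
print(len(train), len(val), len(test))  # 60 20 20
```

Shuffling before slicing is one simple way to keep the three subsets from reflecting any ordering in how the cases were collected; other schemes (stratified or time-based splits, for instance) may suit particular data better.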