What factors should be used to divide data into different groups?
This is often the most difficult question in any statistical analysis. One can always argue that certain data are somehow “different” and ought to be treated separately from other data. One of the major issues debated on this website is whether the counties of north Florida are “different” from the rest of Florida. Certain counties do appear to differ in the gap between how people register and how they vote. In crossover, however, there is no persuasive evidence that they are in any way unusual. If data are obtained from a controlled experiment, one can propose as many factors as desired, increasing the number of data points as needed. When the experiment is uncontrolled, post hoc assignment of factors can be deceptive. Each test carries a small but nonzero chance of appearing statistically significant purely by chance, so if one invents enough factors, one is all but guaranteed to find statistical significance where there is none.
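The last point can be made concrete with a rough calculation. The sketch below is an illustration only, under two assumptions that are not part of the analysis on this website: each invented factor is tested independently, and each test uses a significance level of 0.05. Under those assumptions, the chance of at least one spurious "significant" result among k factors is 1 - (1 - alpha)^k. The function name `familywise_error_rate` is hypothetical.

```python
def familywise_error_rate(num_factors: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive among num_factors tests.

    Assumes the tests are independent and that each is run at significance
    level alpha (0.05 is an illustrative default, not a value from the text).
    """
    if num_factors < 0:
        raise ValueError("num_factors must be non-negative")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    # Probability that every test avoids a false positive is (1 - alpha)^k,
    # so the complement is the chance that at least one test misfires.
    return 1.0 - (1.0 - alpha) ** num_factors


if __name__ == "__main__":
    # Show how quickly the chance of a spurious finding grows with the
    # number of factors invented after the fact.
    for k in (1, 5, 10, 20, 50, 100):
        print(f"{k:>3} factors: {familywise_error_rate(k):.3f}")
```

With these assumed values, ten invented factors already give roughly a 40% chance of at least one spurious result, and one hundred factors push it above 99%. Real factors are rarely independent, so the exact figures would differ, but the trend is the one described above.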