What are some of the problems with stepwise regression?
All of this material is quoted from various e-mails that appeared on STAT-L/SCI.STAT.CONSULT in 1996. Thanks go to Ira Bernstein, Ronan Conroy, and Frank Harrell for their detailed explanations and to Richard Ulrich, who originally compiled these comments. I have done some very minor editing (mostly adding and changing line breaks) but have tried to avoid any substantive changes to these well-written explanations.

Frank Harrell’s comments:

Here are SOME of the problems with stepwise variable selection.

1. It yields R-squared values that are badly biased high.
2. The F and chi-squared tests quoted next to each variable on the printout do not have the claimed distribution.
3. The method yields confidence intervals for effects and predicted values that are falsely narrow (See Altman and Anderson Stat in Med).
4. It yields P-values that do not have the proper meaning and the proper correction for them is a very difficult problem.
5. It gives biased regression coefficients that need shrinkage (