What is the difference between models that predict and models that explain?
Statistical models used for prediction can contain many more proxy variables than models used for explanation. In other words, measuring the underlying causal variables matters less in a predictive model, because its primary goal is to predict a response rather than to explain the relationships that produce it. Proxy variables are usually chosen for practical reasons, such as the financial, personnel, and equipment costs of measuring the true causal variable directly. Because a predictive model built on proxy variables can still serve its purpose well, its results must be interpreted with extra care, and statements implying causality should be avoided. In addition, the external validity of a predictive model is more at risk when proxy variables are used: a proxy may track the causal variable well in one region but poorly in another, and the way a proxy is defined or measured may not stay consistent over time.
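To make the proxy-variable risks concrete, here is a small simulation sketch. It is purely illustrative and assumes a made-up data-generating process: a hypothetical causal variable drives the response, and a hypothetical proxy tracks that cause with some noise plus a region-specific offset (`proxy_shift`). The helpers `simulate_region`, `fit_ols`, and `mse` are names invented for this example. A line fitted on the proxy in region A predicts reasonably well there. Its slope is not the causal effect, though: the response depends only on the cause, so changing the proxy itself would not move the response at all. When the same fitted line is applied to region B, where the proxy relates to the cause differently, prediction error grows sharply.

```python
import random


def fit_ols(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def mse(a, b, xs, ys):
    """Mean squared prediction error of the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def simulate_region(n, proxy_shift, rng):
    """Hypothetical data: the response depends only on the cause;
    the proxy is the cause plus noise plus a region-specific offset."""
    cause = [rng.gauss(0, 1) for _ in range(n)]
    proxy = [c + proxy_shift + rng.gauss(0, 0.3) for c in cause]
    response = [2.0 * c + rng.gauss(0, 0.5) for c in cause]
    return proxy, response


rng = random.Random(0)  # fixed seed so the run is reproducible
proxy_a, y_a = simulate_region(5000, proxy_shift=0.0, rng=rng)  # region A
proxy_b, y_b = simulate_region(5000, proxy_shift=1.5, rng=rng)  # region B

# Fit on region A only, then evaluate in both regions.
a, b = fit_ols(proxy_a, y_a)
mse_a = mse(a, b, proxy_a, y_a)
mse_b = mse(a, b, proxy_b, y_b)

print(f"slope on proxy (cause's true effect is 2.0): {b:.2f}")
print(f"MSE in region A: {mse_a:.2f}, MSE in region B: {mse_b:.2f}")
```

Under these assumptions the fitted slope comes out somewhat below 2.0, because noise in the proxy attenuates it. The region B error is several times the region A error, because the proxy's offset shifts every prediction even though the link between cause and response is identical in both regions.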