Why does the tree change when non-splitting variables are dropped?
A28. A variable that never enters the tree as a primary node splitter may still play an important role as a surrogate splitter. If you have turned off the display of surrogate splitters, you will not see how these variables affect the tree, but CART still uses them internally when applying the tree to data. The Variable Importance Table produced by CART ranks the variables in the tree by importance, a statistic that measures how strongly a variable acts as a primary or surrogate splitter. Suppose a variable enters the tree as the top surrogate splitter in many nodes but never as the primary splitter. If this variable is removed from the list of potential predictor variables and the tree is rebuilt, the new tree will probably be very different. It certainly will be if the primary node-splitting variables have missing values in the data, because cases that lack a value for the primary splitter are sent down the tree by the surrogate.
Steinberg, Dan and Colla, Phillip. CART–Classification and Regression Trees. San Diego, CA: Salford Systems, 1997.
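The role of a surrogate described in this answer can be illustrated with a minimal Python sketch. It is not CART's implementation: the variable names (INCOME, AGE), the cut points, the data, and the helper functions are all hypothetical, and plain agreement between the two splits stands in for whatever association measure CART actually uses to rank surrogates. The sketch only shows how a case with a missing primary value is routed by a surrogate, and what changes for that case when the surrogate variable is no longer available.

```python
from typing import Optional, Sequence

Value = Optional[float]  # None marks a missing value


def goes_left(value: float, threshold: float) -> bool:
    """A case goes left when its value is at or below the threshold."""
    return value <= threshold


def agreement(primary: Sequence[Value], surrogate: Sequence[Value],
              primary_cut: float, surrogate_cut: float) -> float:
    """Share of cases, among those with both values present, that the
    surrogate split sends the same way as the primary split."""
    both = [(p, s) for p, s in zip(primary, surrogate)
            if p is not None and s is not None]
    if not both:
        return 0.0
    same = sum(goes_left(p, primary_cut) == goes_left(s, surrogate_cut)
               for p, s in both)
    return same / len(both)


def route_left(p: Value, s: Value, primary_cut: float, surrogate_cut: float,
               default_left: bool = True) -> bool:
    """Use the primary split if possible, then the surrogate, then a default."""
    if p is not None:
        return goes_left(p, primary_cut)
    if s is not None:
        return goes_left(s, surrogate_cut)
    return default_left


# Hypothetical node: INCOME is the primary splitter, AGE the top surrogate.
income = [20.0, 35.0, None, 80.0, 90.0, None, 60.0]
age = [22.0, 30.0, 25.0, 50.0, 61.0, 58.0, 35.0]
INCOME_CUT, AGE_CUT = 50.0, 40.0  # assumed cut points, for illustration only

score = agreement(income, age, INCOME_CUT, AGE_CUT)
with_surrogate = [route_left(p, s, INCOME_CUT, AGE_CUT)
                  for p, s in zip(income, age)]
# Dropping AGE from the predictors: cases missing INCOME fall back to the default.
without_surrogate = [route_left(p, None, INCOME_CUT, AGE_CUT) for p in income]

print(f"agreement = {score:.2f}")
print("with surrogate:   ", with_surrogate)
print("without surrogate:", without_surrogate)
```

In this toy data the sixth case (INCOME missing, AGE 58) goes right when AGE is available as a surrogate and left when it is not, even though AGE never appears as the primary splitter. A rebuilt tree would also choose its splits again from scratch, which this sketch does not attempt to model.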