Is GCV the same as the Bayesian approach to optimizing alpha?
Cross-validation is undeniably a sensible pragmatic procedure for model comparison and for setting regularisation constants. But it is not the same as evaluating the evidence. Here is my argument for why the evidence and cross-validation are fundamentally different.

The evidence can be decomposed as follows if we want (subsuming alpha and beta into H, and using my notation, in which the data points are t_1…t_N):

  P( t_1…t_N | H ) = P( t_1 | H ) P( t_2 | t_1, H ) P( t_3 | t_1, t_2, H ) … P( t_N | t_1…t_{N-1}, H )

i.e. the product of the predictive probabilities for each data point, using the model 'trained' on the previous data points. Now, by arbitrarily reordering the points, I can expand the evidence in N! different such ways. The individual terms in the expansion will vary in value as I do this, but the overall product stays the same. One way of representing this is to plot the log predictive probability, − log P( t_i | t_1…t_{i-1}, H ), against the data point index i.
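As a minimal numerical sketch of this decomposition, the snippet below uses a toy conjugate model (Gaussian data with a Gaussian prior on the mean, both precisions fixed); the particular data values, precisions, and function names are assumptions made purely for illustration, not part of the argument above. It checks that the individual predictive terms depend on the ordering of the points while their product, the evidence, does not.

```python
# Numerical check of the chain-rule decomposition of the evidence,
# using an illustrative conjugate model: t_i ~ N(mu, 1/beta) with a
# prior mu ~ N(0, 1/alpha); alpha, beta and the data are made up.
import numpy as np
from scipy.stats import norm

alpha, beta = 1.0, 2.0                         # prior and noise precisions (fixed)
t = np.array([0.3, -1.2, 0.7, 2.1, -0.4])      # hypothetical data points t_1..t_N

def log_predictive(t_new, t_seen):
    """log P(t_new | t_seen, H): posterior predictive of the conjugate model."""
    n = len(t_seen)
    post_prec = alpha + n * beta               # posterior precision of mu
    post_mean = beta * np.sum(t_seen) / post_prec
    pred_var = 1.0 / post_prec + 1.0 / beta    # predictive variance for t_new
    return norm.logpdf(t_new, loc=post_mean, scale=np.sqrt(pred_var))

def log_evidence(order):
    """Sum of log predictive terms, taking the data points in the given order."""
    ts = t[order]
    return sum(log_predictive(ts[i], ts[:i]) for i in range(len(ts)))

# Two arbitrary orderings: the individual terms differ,
# but the total log evidence agrees up to rounding error.
rng = np.random.default_rng(0)
print(log_evidence(np.arange(len(t))))
print(log_evidence(rng.permutation(len(t))))
```

The invariance under reordering holds because the model treats the data points exchangeably; each ordering simply regroups the same joint probability P( t_1…t_N | H ) into a different sequence of predictive factors.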