What is the relation between Bayesian inference for models and the notion of `risk' in probability theory (e.g. `Structural Risk Minimization')?
The relationship to the `Effective VC dimension' is discussed in the last two pages of:

@INPROCEEDINGS{MacKay.nips4,
  KEY = "MacKay",
  AUTHOR = "D. J. C. MacKay",
  TITLE = "Bayesian Model Comparison and Backprop Nets",
  BOOKTITLE = "Advances in Neural Information Processing Systems 4",
  EDITOR = "J. E. Moody and S. J. Hanson and R. P. Lippmann",
  PUBLISHER = "Morgan Kaufmann",
  ADDRESS = "San Mateo, California",
  YEAR = "1992",
  PAGES = "839-846"}

Criticisms of the evidence

The standard Bayesian philosophy seems always to assume that there is a `true' parameter out there and that the prior is non-zero at that value. What bothers me is that in cases where this is not necessarily so, we may expect the evidence and the generalization error to lead to very different choices of model. This can be seen, for example, in recent results by Marion and Saad at Edinburgh University, who solved a linear model exactly and found a mismatch between prediction error and evidence. My conclusion woul