Why do models sometimes disagree with “obvious” exons from ESTs or homologous rice genes?
Two reasons. First, while gene prediction programs do take homology information into account, they also adhere to an internal statistical model of what coding sequences in maize and related grasses “should” look like. Homology evidence may therefore be “overridden” if it is inconsistent with expected codon usage or similar sequence features. Second, and relatedly, ESTs are imperfect and sometimes grossly wrong: they may include unspliced (retained) introns or derive from genomic contamination of the cDNA library. By relying on a statistical model, gene predictors can reject such false evidence in some cases.
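To make the codon-usage idea concrete, here is a minimal sketch of how a statistical coding model might score a candidate exon. It is only an illustration, not the method used by any particular gene predictor: the function names, the add-one smoothing, the uniform background, and the toy sequences are all assumptions made for this example. Real predictors use far richer models trained on large gene sets.

```python
import math
from collections import Counter

def codons(seq):
    """Split a sequence into in-frame codons, dropping any trailing partial codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]

def train_codon_model(coding_seqs, pseudocount=1.0):
    """Estimate smoothed codon probabilities from known coding sequences."""
    counts = Counter(c for s in coding_seqs for c in codons(s))
    total = sum(counts.values()) + 64 * pseudocount
    return lambda codon: (counts[codon] + pseudocount) / total

def coding_score(seq, model):
    """Mean per-codon log-odds of the coding model versus a uniform 1/64 background."""
    cs = codons(seq)
    if not cs:
        return 0.0
    return sum(math.log(model(c) * 64) for c in cs) / len(cs)

# Toy data (hypothetical, for illustration only): a tiny GC-rich "training set",
# one candidate that matches its codon usage, and one AT-rich candidate of the
# kind a retained intron might contribute.
training = ["GCCGCGCTGGACGAGCTCGCC"]
model = train_codon_model(training)
exon_like = "GCCCTGGACGCG"
intron_like = "TTTATAAATTAA"

exon_score = coding_score(exon_like, model)
intron_score = coding_score(intron_like, model)
```

In this sketch, a positive score means the candidate looks more like the training coding sequences than like the background, and a negative score means the opposite. A predictor that weighs such a score against alignment evidence could down-weight an EST-supported "exon" whose sequence scores poorly.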