How does a gene prediction program work?
The programs that try to predict genes look for what we call “splicing sites,” the sites that are used to splice one exon onto another exon in an RNA. We’ve got a little information about the nature of splicing sites. There [are] some consensus sequences, but it’s not an absolutely perfect thing. So, they also look in between those splicing sites to make sure that the sequence looks like it could encode a protein, that it doesn’t have any stop codons, [and also] that it has about the right distribution of codons. It’s a delicate matter, because what the program is really doing is collecting statistical evidence about what tends to occur in genes (the exons of genes, the introns, the splicing sites) and running a statistical model over the DNA to say, “Aha, this looks like a plausible gene model.” Of course, life doesn’t do it that way, [and] cells don’t run any statistical model. The cell is smart enough to know just where the gene is and where it starts and where it stops and how to splice it. So to find