How do the polya and promoter algorithms work?
The polyadenylation site recognizer looks for the sequence AATAAA. It then examines a 72-base region around that sequence using a simple Markov model and reports a score for the site. A polya site is retained only if it lies within 5,000 bases of the stop codon of a gene and does not fall in an intron. At most one polya site (the highest scoring, if there are several) is assigned to each gene model. The promoter recognition system looks for a TATA or ATA and examines the region around these bases. In particular, a neural net is fed information on GC content, CAAT position and the surrounding area, GGGCGG position and the surrounding area, and ATG sequences and the surrounding areas. The neural net evaluates these inputs and assigns a total score to the promoter. As with polya sites, a promoter is retained only if it lies within 5,000 bases of the start codon of a predicted GrailEXP gene model and does not fall within an intron. At most one promoter element is assigned to each gene model.
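The polya retention rules can be sketched in a few lines of Python. This is only an illustration of the steps described above, not GrailEXP's actual code: the names (Gene, find_signals, window_around, best_polya, toy_score) are hypothetical, the 72-base window is assumed to be centered on the hexamer, only the forward strand is considered, and toy_score is a placeholder standing in for the Markov model, whose parameters are not given here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

SIGNAL = "AATAAA"
WINDOW = 72          # size of the region examined around the signal
MAX_DISTANCE = 5000  # maximum allowed distance from the stop codon


@dataclass
class Gene:
    name: str
    stop_codon: int                 # 0-based position of the stop codon
    introns: List[Tuple[int, int]]  # half-open [start, end) intervals


def find_signals(seq: str) -> List[int]:
    """Return the 0-based start of every AATAAA occurrence."""
    hits = []
    i = seq.find(SIGNAL)
    while i != -1:
        hits.append(i)
        i = seq.find(SIGNAL, i + 1)
    return hits


def window_around(seq: str, pos: int) -> str:
    """Assumption: a 72-base window centered on the hexamer, clipped at the ends."""
    start = max(0, pos + len(SIGNAL) // 2 - WINDOW // 2)
    return seq[start:start + WINDOW]


def in_intron(pos: int, gene: Gene) -> bool:
    return any(start <= pos < end for start, end in gene.introns)


def toy_score(window: str) -> float:
    """Placeholder scorer (fraction of T bases); NOT the real Markov model."""
    return window.count("T") / len(window) if window else 0.0


def best_polya(
    seq: str, gene: Gene, score: Callable[[str], float] = toy_score
) -> Optional[Tuple[int, float]]:
    """Return (position, score) of the single retained site, or None."""
    best = None
    for pos in find_signals(seq):
        # Discard sites too far from the stop codon or inside an intron.
        if abs(pos - gene.stop_codon) > MAX_DISTANCE or in_intron(pos, gene):
            continue
        s = score(window_around(seq, pos))
        # Keep only the highest-scoring site (first one wins on ties).
        if best is None or s > best[1]:
            best = (pos, s)
    return best
```

Calling best_polya(sequence, gene) returns at most one site per gene model. The promoter filter follows the same pattern, with the start codon in place of the stop codon and the neural net score in place of the Markov model score.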