How are Grail Exon Candidates predicted?
All potential splice sites within the sequence (start, AAG acceptor, YAG acceptor, or GT donor) are examined by neural networks and assigned scores. All possible exon candidates within the sequence are then examined, and coding scores are calculated for each candidate. These scores, along with the splice site scores and GC content information, are fed to the final neural net, which produces a final score for that exon. The entire process is “thresholded”: the splice site scores must be sufficiently high for a candidate to continue, then the coding scores, and finally the overall score. All candidates with a final score above a preset threshold are retained. The raw list of exons is then organized into clusters. Each cluster is filtered for repetitive elements, and candidates flagged as repetitive are eliminated. Next, a strand resolution process is applied: overlapping exons on opposite strands are examined, and the lower-scoring cluster (containing what we call “shadow exons”) is eliminated. The final list of exon
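
The thresholding, clustering, and strand resolution steps described above can be illustrated with a minimal Python sketch. It is not the Grail implementation: the Candidate fields, the cutoff values, the rule that a cluster's score is the best final score among its members, and the tie-breaking are all assumptions made for illustration. The neural network scoring and the repetitive element filter are not modeled; the scores are simply given as inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical exon candidate with precomputed scores."""
    start: int
    end: int      # inclusive end coordinate
    strand: str   # "+" or "-"
    splice: float
    coding: float
    final: float


# Hypothetical cutoffs; the real preset thresholds are not given in the text.
SPLICE_MIN, CODING_MIN, FINAL_MIN = 0.5, 0.5, 0.6


def passes_thresholds(c):
    # Checked in order: splice sites, then coding, then the overall score.
    return c.splice >= SPLICE_MIN and c.coding >= CODING_MIN and c.final >= FINAL_MIN


def cluster(cands):
    """Group overlapping candidates on the same strand into clusters."""
    clusters = []
    for c in sorted(cands, key=lambda c: (c.strand, c.start)):
        last = clusters[-1] if clusters else None
        if last and last[-1].strand == c.strand and c.start <= max(m.end for m in last):
            last.append(c)
        else:
            clusters.append([c])
    return clusters


def span(cl):
    return min(m.start for m in cl), max(m.end for m in cl)


def score(cl):
    # Assumption: a cluster is scored by its best member.
    return max(m.final for m in cl)


def resolve_strands(clusters):
    """Drop clusters that overlap a higher-scoring cluster on the opposite strand."""
    kept = []
    for cl in sorted(clusters, key=score, reverse=True):
        s0, s1 = span(cl)
        shadowed = any(
            k[0].strand != cl[0].strand and span(k)[0] <= s1 and s0 <= span(k)[1]
            for k in kept
        )
        if not shadowed:
            kept.append(cl)
    return sorted(kept, key=lambda cl: span(cl)[0])


candidates = [
    Candidate(100, 200, "+", 0.9, 0.8, 0.85),
    Candidate(150, 260, "+", 0.7, 0.9, 0.75),
    Candidate(120, 210, "-", 0.6, 0.6, 0.65),  # overlaps the "+" cluster: shadow exon
    Candidate(400, 480, "-", 0.8, 0.7, 0.70),
    Candidate(500, 560, "+", 0.3, 0.9, 0.90),  # fails the splice site threshold
]

survivors = [c for c in candidates if passes_thresholds(c)]
final_clusters = resolve_strands(cluster(survivors))
summary = [(span(cl), cl[0].strand) for cl in final_clusters]
print(summary)
```

In this toy input, the last candidate is rejected at the splice site stage despite its high overall score, and the minus-strand candidate at 120-210 is removed because it overlaps a higher-scoring plus-strand cluster.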