Where did the gene set come from?
Consensus gene predictions were built from several sources of evidence. TIGR transcript assemblies were mapped onto repeat-masked genome sequences using GenomeThreader with a maize splice site model. Assemblies and ESTs of the following species were mapped: Allium cepa, Ananas comosus, Avena sativa, Brachypodium distachyon, Curcuma longa, Hordeum vulgare, Oryza sativa, Saccharum officinarum, Secale cereale, Sorghum bicolor, Sorghum halepense, Sorghum propinquum, Triticum aestivum, Zea mays and Zingiber officinale. We also generated optimal spliced alignments (OSAs) as well as blastX alignments for a reference set of proteins consisting of the SWISSPROT database and the Arabidopsis (TAIR6), Saccharomyces cerevisiae and rice (RAP2) proteomes. For each OSA, possible reading frames of ≥50 amino acids were collected as candidate gene models. In addition, we identified gene models on repeat-masked genomic sequences using ab initio methods (Fgenesh++, GeneID, GenomeScan/PASA). Next, w