How do you select which sequences are of sufficient quality and length for reporting to GenBank?
Vector sequences are automatically trimmed off the “raw sequence” after base calling is performed. Trimmed sequences shorter than 100 bases are rejected, as are sequences with a phred score below 15; in either case the well is listed as a failure. Our typical phred scores are ~35-43. A phred score of 20 means 1 base-calling error is likely in every 100 bases; a score of 30 means 1 error is likely in every 1,000 bases; and a score of 40 means 1 error is likely in every 10,000 bases. We are performing single-pass, single-strand sequencing. To ensure that the "quality scores" are realistic, we hand-check some of the ESTs that match existing maize genes by performing a BLAST search against maize genes at GenBank. For ESTs of 100 to 600+ bases, identities are 97-100%. Some of the mismatches are likely to be true polymorphisms, and some are sequencing errors.
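The error rates quoted above follow from the standard phred definition, Q = -10 log10(P), where P is the probability that a base call is wrong. The short Python sketch below illustrates that conversion and the two rejection rules. It is not the pipeline's actual code: the function names are hypothetical, and it assumes the cutoff of 15 is applied to the average per-base score of the trimmed read, which the answer above does not specify.

```python
import math

MIN_LENGTH = 100  # from the text: trimmed reads under 100 bases are rejected
MIN_PHRED = 15    # from the text: reads with a phred score below 15 are rejected


def phred_to_error_prob(q):
    """Convert a phred score Q to a base-calling error probability P."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability P back to a phred score Q."""
    return -10 * math.log10(p)


def passes_filter(quals):
    """Return True if a trimmed read passes both rules.

    quals holds the per-base phred scores of the trimmed read.
    Assumption: the score cutoff is compared against the mean of these values.
    """
    if len(quals) < MIN_LENGTH:
        return False
    mean_q = sum(quals) / len(quals)
    return mean_q >= MIN_PHRED


if __name__ == "__main__":
    for q in (20, 30, 40):
        print(f"Q{q}: about 1 error in {round(1 / phred_to_error_prob(q))} bases")
```

Under these assumptions, a 99-base read is rejected regardless of its quality, and a 200-base read with a mean score of 14 is rejected regardless of its length.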