How valid is it to estimate base quality by looking at alignments?
Depends on: a) what you are aligning against and b) what you are doing the alignment with.

When estimating error rates from alignments, the key thing to bear in mind is that what you want is the probability that a base is wrong. However, what you actually get is the probability that a base is wrong given that the read it is in is uniquely alignable to your reference. This in turn depends on several things.

i. How sensitive your alignment program is. The ELAND program only detects alignments with at most two errors per fragment, so the noisier reads with three or more errors are ignored. Error rate estimates obtained from ELAND alignments therefore underestimate the true error rate. On the other hand, ignoring this issue completely does enable you to make spurious claims about your platform’s error rate and still get published in PNAS.

ii. The uniqueness of reads in your target. This in turn depends on your read length, the length of your reference sequence and how repetitive you