I found homologs with BLAST (or PSI-BLAST or FASTA) that are not reported by a SAM-T05 database search. Are BLAST, PSI-BLAST, and FASTA more sensitive than SAM-T05?
You mentioned that FASTA, BLAST, and PSI-BLAST found a high-scoring similar sequence that SAM-T05 did not find. This happens fairly often; the most common causes are composition bias and long helices (particularly coiled coils). FASTA, BLAST, and PSI-BLAST can all be fooled into reporting very strong scores for sequences whose only similarity is that they both have long amphipathic helices. SAM-T05's reverse-sequence null model cancels this signal (as well as composition-bias and length signals), resulting in a method with far fewer false positives. A few true positives are lost, but not too many. As an example, the leucine zipper 1ce0A gets only 25 sequences in the 1ce0A.t02.a2m alignment. The 19 PDB sequences in the alignment are all homologs (or at least have similar structure and somewhat similar sequence). Other methods are likely to report almost any coiled coil as a strong hit. This is an example of the reverse-sequence null model removing a lot of trash (and possibly some g
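To make the reverse-sequence null idea concrete, here is a minimal Python sketch. It is not SAM-T05's implementation: SAM scores sequences against a hidden Markov model, while this toy uses a hypothetical four-column log-odds profile (`PROFILE`, with made-up values) and ungapped window scoring. The only point it illustrates is that subtracting the score of the reversed sequence removes any signal that does not depend on residue order, such as composition and length.

```python
# Hypothetical toy profile: per-position log-odds scores for a 4-residue motif.
# The values are illustrative only and are not taken from SAM.
PROFILE = [
    {"L": 2.0, "A": 0.5},
    {"K": 2.0, "E": 0.5},
    {"E": 2.0, "A": 0.5},
    {"L": 2.0, "K": 0.5},
]
DEFAULT = -1.0  # score for any residue not listed at a position


def best_window_score(seq, profile=PROFILE):
    """Return the best ungapped score of the profile over all windows of seq."""
    width = len(profile)
    if len(seq) < width:
        return float("-inf")
    return max(
        sum(profile[i].get(seq[start + i], DEFAULT) for i in range(width))
        for start in range(len(seq) - width + 1)
    )


def reverse_null_score(seq, profile=PROFILE):
    """Raw score minus the score of the reversed sequence against the same model.

    The reversed sequence has the same length and composition as the original,
    so whatever score those properties alone produce is subtracted out.
    """
    return best_window_score(seq, profile) - best_window_score(seq[::-1], profile)


if __name__ == "__main__":
    for s in ("AALKELAA", "LLLLLLLL"):
        print(s, best_window_score(s), reverse_null_score(s))
```

In this toy, a leucine-only sequence such as `LLLLLLLL` still gets a positive raw score from the profile, but its reverse-null score is exactly zero because the sequence reads the same in both directions. A sequence that contains the motif in the right order (`AALKELAA`) keeps a positive corrected score. Real coiled coils are not exact palindromes, so in practice the cancellation is approximate rather than exact.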