How do I do motif searches with Bioperl? Can I do “find all sequences that are 75% identical” to a given motif?
There are a number of approaches. Within Bioperl, take a look at Bio::Tools::SeqPattern. Outside the core distribution, there is the TFBS (Transcription Factor Binding Site) package at http://forkhead.cgb.ki.se/TFBS, a Bioperl-compliant package that specializes in searching nucleotide sequences for patterns using matrices. The combination of Bioperl and Perl's regular expressions may also do the trick. For the percent-match query specifically, consider the CPAN module String::Approx, which addresses approximate matching directly. Be aware, though, that experienced users question whether its distance estimates are correct, and the Unix agrep command is thought to be faster and more accurate. Finally, you could use EMBOSS, as discussed in the previous question (or use Pise to run EMBOSS applications). The relevant programs are fuzzpro for protein sequences and fuzznuc for nucleotide sequences.
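To make the "75% identical" query concrete, here is a minimal sketch of the underlying idea: slide a window the length of the motif along the sequence and keep each window whose fraction of matching positions meets the threshold. It is written in plain Python rather than Bioperl, and it is not how any of the tools named above work internally. The function name find_approximate_matches and the parameter min_identity are made up for illustration. The sketch assumes ungapped comparison (substitutions only, no insertions or deletions), which is a narrower notion of similarity than the edit distance used by String::Approx or agrep.

```python
def find_approximate_matches(motif, sequence, min_identity=0.75):
    """Return (start, window, identity) for each ungapped window of the
    sequence whose identity to the motif is at least min_identity.

    Hypothetical helper for illustration; positions are 0-based.
    """
    motif = motif.upper()
    sequence = sequence.upper()
    width = len(motif)
    if width == 0:
        raise ValueError("motif must not be empty")

    hits = []
    # Examine every window of the motif's length along the sequence.
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Count positions where motif and window carry the same symbol.
        matches = sum(1 for a, b in zip(motif, window) if a == b)
        identity = matches / width
        if identity >= min_identity:
            hits.append((start, window, identity))
    return hits


if __name__ == "__main__":
    for start, window, identity in find_approximate_matches("ACGT", "ACGTACGA"):
        print(start, window, identity)
```

The same loop translates directly to Perl using substr over a sequence string obtained from a Bioperl sequence object. Because every window is compared in full, the cost grows with sequence length times motif length, so for large databases a dedicated tool such as fuzznuc or agrep is likely the better choice.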