What do the sequence IDs in a SAM-T06 a2m file mean?
Most of the sequence IDs in a SAM-T06 a2m file come from the IDs in the NR database. SAM may modify a sequence ID to indicate the first and last sequence positions that matched the SAM-T06 HMM. For example, consider the following sequence ID taken from a SAM-T06 alignment:

>gi|16080670|ref|NP_391498.1|_1:234 (NC_000964) similar to hypothetical proteins [Bacillus subtilis] gi|7450240|pir||G70067 conserved hypothetical protein ywqL – Bacillus subtilis gi|1894750|emb|CAB07450.1| (Z92952) product similar to E.coli YjaF protein [Bacillus subtilis] gi|2636142|emb|CAB15634.1| (Z99122) similar to hypothetical proteins [Bacillus subtilis]

Here the original sequence name gi|16080670|ref|NP_391498.1| has had _1:234 appended to indicate that the SAM-T06 HMM for the alignment matched the sequence starting at sequence position 1 and ending at sequence position 234.

• I found homologs with BLAST (or PSI-BLAST or FASTA) that are not reported by a SAM-T06 database search. Are they BLAST (or PSI-BLAST, FA