What is a scoring matrix?
The following explanation was edited from a contribution by Amelie Stein.

The aim of a sequence alignment is to match “the most similar elements” of two sequences. This similarity must be evaluated somehow. For example, consider the following two alignments:

(a) AIWQH
    AL-QH

(b) AIWQH
    A-LQH

They seem quite similar: both contain one “indel” and one substitution, just at different positions. However, if we think of the letters as amino acid residues rather than elements of strings, alignment (a) is the better one, because isoleucine (I) and leucine (L) have similar sidechains, while tryptophan (W) has a very different structure. This is a physico-chemical measure; these days we might prefer to say simply that leucine substitutes for isoleucine more frequently, without giving an underlying “reason” for this observation. However we explain it, it is much more likely that a mutation changed I into L and that W was lost, as in (a), than that W changed into L and I was lost, as in (b). We would expect t