How does PAUP* deal with missing characters under the likelihood criterion?
The likelihood is computed by summing the likelihoods over each possible assignment of A, C, G, or T to the taxon with the missing datum. Generally, if all of the nearby taxa have the same state, this sum will be dominated by the term in which that same state is assigned to the “missing” value, but each of the other states will still contribute some small, nonzero value to the likelihood. On the other hand, if there is considerable ambiguity, in the sense that the surrounding taxa have different states or the branch leading to the missing-data taxon is very long, each of the possible assignments makes a larger contribution to the total likelihood. This is in the same spirit as likelihood in the absence of missing data: there are many ways that the pattern of nucleotides at the tips of the tree could have been generated, and all of them contribute something to the total likelihood (generally some much more than others). With missing data, there are several states that a taxon might have taken if