How is the Poisson p-value (ppv) computed?
Submitted: 2004/05/11; Answered: 2004/05/11 by Jonathan Schug

TESS computes the Poisson p-value (Ppv) as follows. Consider a single matrix that occurs m times in a single sequence of length L with a minimum La score of s. We have precomputed the per-trial probability q of seeing a hit from this matrix with a score at least as good as s in random sequence with a uniform base composition, i.e., p(A) = p(C) = p(G) = p(T) = 0.25. We assume that the number of hits follows the Poisson distribution:

P(n) = r^n e^{-r} / n!

where r = 2 L q is the expected number of hits in a search of both strands of your sequence. This is a reasonable assumption, but we note some possible sources of error below. The Poisson p-value is the probability of seeing at least m hits, which is the sum of P(n) for n ≥ m (equivalently, 1 minus the sum of P(n) for n < m).

What can go wrong?

The main source of error occurs with sites that are self-similar, e.g., ACGTTAC. Note that this sequence can overlap with itself, i.e., ACGTTACGTTAC. In this