Fisher's left-sided test seems to rank a lot of bigrams first! Why?
As sample sizes get larger, the left-sided p-value for a bigram tends to approach 1. That p-value is the sum of the hypergeometric probabilities of every possible 2×2 table (given fixed marginal totals) whose joint count is no greater than the observed one. With the default precision (4 digits) many of these values round to 1.0000, so the bigrams tie for first place. Consider setting the precision of the test relatively high (10 or 15 digits) to see the differences between them. The paper “Fishing for Exactness” shows how these hypergeometric probabilities are computed. It is listed with the 1996 entries at http://www.d.umn.edu/~tpederse/pubs.html and is also available at: http://xxx.lanl.
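
The rounding effect can be reproduced outside the package. What follows is only a minimal Python sketch, not the package's own code: the function name left_sided_p and all of the counts are made up for illustration, and the names n11, n1p, np1 and npp are assumed here to mean the joint count, the two marginal totals and the sample size. It sums the hypergeometric probabilities for every table with a joint count up to the observed one, then prints the result at 4 and at 15 digits.

```python
from math import comb


def left_sided_p(n11, n1p, np1, npp):
    """Left-sided Fisher p-value: P(joint count <= n11) with fixed marginals.

    n11 = observed joint count, n1p and np1 = marginal totals,
    npp = sample size (names chosen for this sketch).
    """
    lowest = max(0, n1p + np1 - npp)  # smallest joint count the marginals allow
    # Sum the hypergeometric numerators as exact integers, divide once.
    numerator = sum(
        comb(np1, k) * comb(npp - np1, n1p - k)
        for k in range(lowest, n11 + 1)
    )
    return numerator / comb(npp, n1p)


# Two hypothetical bigrams with the same marginal totals but
# different joint counts (expected joint count is 200 * 150 / 10000 = 3).
p_a = left_sided_p(15, 200, 150, 10000)
p_b = left_sided_p(20, 200, 150, 10000)

for label, p in (("bigram A", p_a), ("bigram B", p_b)):
    # 4 digits hides the difference, 15 digits shows it
    print(label, format(p, ".4f"), format(p, ".15f"))
```

At 4 digits both hypothetical bigrams print as 1.0000 and would tie. At 15 digits the two values differ, and the bigram with the larger joint count gets the larger value.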