What is the word frequency distribution in the NY Times?
This doesn’t directly answer your question, but you might be interested in this word frequency list from the Brown Corpus, a collection of text from books, newspapers, and magazine articles. As you might expect, short, common words dominate the top of the list: the, of, by, that, is, for, and so on are by far the most frequent. The frequency distribution of words is said to follow Zipf’s law, under which a word’s frequency is roughly inversely proportional to its rank in the frequency list. If you want to dig further into word frequency, you might also want to check out this searchable database of word frequency built from Time magazine data.
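If you can get hold of some article text yourself, a rough way to check for a Zipf-like pattern is to count the words and compare each word's rank with its count. The sketch below is only an illustration: the function names `word_frequencies` and `rank_frequency` and the `sample` string are made up for this example, and the tokenizer is deliberately crude (lowercase ASCII letters only). Under Zipf's law, the product of rank and count should stay roughly constant for the top words of a large corpus; a one-sentence sample like this is far too small to show that reliably.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count lowercase word tokens (runs of the letters a-z) in text."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words)


def rank_frequency(counts):
    """Return (rank, word, count, rank * count) rows, most frequent first."""
    rows = []
    for rank, (word, count) in enumerate(counts.most_common(), start=1):
        rows.append((rank, word, count, rank * count))
    return rows


# Hypothetical sample text; replace it with real article text to experiment.
sample = "the cat and the dog and the bird saw the fish"
counts = word_frequencies(sample)

# Show the three most frequent words with their rank and rank * count.
for row in rank_frequency(counts)[:3]:
    print(row)
```

To try it on real data, read a text file into `sample` and look at the first 20 or so rows. Plotting log(count) against log(rank) should then give something close to a straight line if the text follows Zipf's law.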