Could spammers fool Bayesian filters by filling their spams with random words?
They would have to get rid of the bad words as well as add neutral ones. Only the fifteen most interesting words contribute to the probability, and neutral words like “onion”, no matter how many of them there are, can’t compete with the incriminating “viagra” for statistical significance. To outweigh incriminating words, the spammers would need to dilute their emails with especially innocent words, i.e. those that are not merely neutral but occur disproportionately often in the user’s legitimate email. But these words (e.g. the nicknames of one’s friends, terms one uses in one’s work) are different for each recipient, and the spammers have no way of figuring out what they are.
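To make the argument concrete, here is a minimal Python sketch of a filter that scores only the fifteen most interesting words. It makes three assumptions. First, a word counts as "interesting" in proportion to how far its spam probability is from 0.5. Second, the selected probabilities are combined with the usual naive Bayes rule. Third, words the filter has never seen are treated as neutral (0.5). The token table, including the hypothetical nickname "dave" and work term "lisp", is made up for illustration.

```python
from math import prod  # Python 3.8+

# Hypothetical per-recipient token probabilities, P(spam | token).
# A real filter would learn these from this user's own spam and legitimate mail.
TOKEN_PROBS = {
    "viagra": 0.99,  # incriminating
    "cheap": 0.90,   # incriminating
    "onion": 0.50,   # neutral
    "dave": 0.01,    # hypothetical friend's nickname: very innocent for this user
    "lisp": 0.02,    # hypothetical work term: very innocent for this user
}


def spam_probability(tokens, probs, n=15, unknown=0.5):
    """Combine the n most interesting distinct tokens into one spam probability.

    Unseen tokens default to `unknown` (0.5, i.e. neutral), a simplifying assumption.
    """
    ps = [probs.get(t, unknown) for t in set(tokens)]
    # "Interesting" = far from 0.5 in either direction; keep only the top n.
    ps.sort(key=lambda p: abs(p - 0.5), reverse=True)
    top = ps[:n]
    p_spam = prod(top)
    p_ham = prod(1 - p for p in top)
    return p_spam / (p_spam + p_ham)


spam_msg = ["viagra", "cheap"]
# Padding with a thousand neutral or never-seen words.
padded_msg = spam_msg + ["onion"] * 500 + [f"filler{i}" for i in range(500)]
# Diluting with words that are innocent *for this particular recipient*.
diluted_msg = spam_msg + ["dave", "lisp"]

for name, msg in [("plain", spam_msg), ("padded", padded_msg), ("diluted", diluted_msg)]:
    print(f"{name:8} {spam_probability(msg, TOKEN_PROBS):.4f}")
```

A token at 0.5 multiplies the spam and non-spam products by the same factor, so the padded message scores the same as the plain one. Only the recipient-specific innocent words pull the score well below a typical cutoff such as 0.9, and a spammer writing to someone else would not know to use them.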