Are there plans to develop a server-side SpamBayes solution?
The problem with a server-side solution is that everyone has a different idea of what is spam; adapting to that is the whole strength of the Bayesian-style filtering concept. If you are certain that all of your users would agree on what is spam and what is not, a single shared database might work for you; otherwise you really need an individual database for each user. Either way, you should be able to modify SpamBayes easily enough to fit your setup. Some people have in fact done this and have been kind enough to donate notes on how they went about it. If you do this in some other way, please let us know so that we can add to the information.
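To make the per-user point concrete, here is a minimal sketch in plain Python. It does not use the SpamBayes API: `user_db_path`, `PerUserTraining`, and the one-file-per-user layout are hypothetical names chosen for illustration, and the in-memory counters only stand in for real training databases.

```python
import os
import re
from collections import Counter, defaultdict


def user_db_path(base_dir, username):
    """Map a username to its own database file (hypothetical layout)."""
    # Replace anything outside a conservative character set so a
    # username cannot point outside base_dir.
    safe_name = re.sub(r"[^A-Za-z0-9_-]", "_", username)
    return os.path.join(base_dir, safe_name + ".db")


class PerUserTraining:
    """Toy in-memory stand-in for one training database per user."""

    def __init__(self):
        # username -> {"spam": Counter, "ham": Counter}
        self._counts = defaultdict(
            lambda: {"spam": Counter(), "ham": Counter()}
        )

    def train(self, username, text, is_spam):
        # Record whitespace-separated tokens under this user's own label.
        label = "spam" if is_spam else "ham"
        self._counts[username][label].update(text.lower().split())

    def count(self, username, token, label):
        # How often this user has seen the token under the given label.
        return self._counts[username][label][token]


store = PerUserTraining()
message = "cheap mortgage rates newsletter"
store.train("alice", message, is_spam=True)   # alice never asked for this
store.train("bob", message, is_spam=False)    # bob subscribed to it
```

The same message ends up as spam evidence for one user and ham evidence for the other, which is why a shared database only works when all users agree on the labels.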

Forget tokenizing words – you should use character n-grams!
This was quite carefully tested. Character 3-grams gave five times as many false positives and twice as many false negatives as splitting on whitespace (words). Character 5-grams came fairly close to words on false positives, but the number of false negatives was worse than with