
Are there plans to develop a server-side SpamBayes solution?

April 26, 2017


3 Answers


The problem with a server-side solution is that everyone has a different idea of what is spam, and adapting to each person's idea is the whole strength of the Bayesian-style filtering concept. If you are certain that all of your users would agree on what is spam and what is not, then a shared setup might work for you; otherwise you really need an individual database for each user. Either way, you should be able to modify SpamBayes easily enough to fit into your setup. Some people have in fact done this and have been kind enough to donate notes about how they went about it. If you do this in some other way, please let us know so that we can add to the information.
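To make the per-user point concrete, here is a minimal sketch in plain Python. It does not use the SpamBayes API; `UserFilter`, the user names, and the training messages are all made up for illustration, and the scoring is a simple smoothed naive Bayes, not the combining scheme SpamBayes itself uses.

```python
import math
from collections import Counter, defaultdict


class UserFilter:
    """Toy per-user word-count store (hypothetical; not the SpamBayes classifier)."""

    def __init__(self):
        # Word counts and message totals, keyed by is_spam (True/False).
        self.counts = {True: Counter(), False: Counter()}
        self.messages = {True: 0, False: 0}

    def learn(self, text, is_spam):
        self.messages[is_spam] += 1
        # Count each distinct whitespace-separated word once per message.
        self.counts[is_spam].update(set(text.lower().split()))

    def spam_score(self, text):
        """Return a value between 0 and 1; higher means more spam-like."""
        log_odds = 0.0
        for word in set(text.lower().split()):
            # Add-one smoothing so unseen words do not zero out the estimate.
            p_spam = (self.counts[True][word] + 1) / (self.messages[True] + 2)
            p_ham = (self.counts[False][word] + 1) / (self.messages[False] + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


# One independent database per user, created on first use.
filters = defaultdict(UserFilter)

# Made-up training data: the same message is spam to one user, wanted by another.
filters["alice"].learn("cheap mortgage offer", is_spam=True)
filters["alice"].learn("lunch on friday", is_spam=False)
filters["bob"].learn("cheap mortgage offer", is_spam=False)
filters["bob"].learn("win a free cruise", is_spam=True)

message = "new mortgage offer"
alice_score = filters["alice"].spam_score(message)
bob_score = filters["bob"].spam_score(message)
print(f"alice: {alice_score:.2f}, bob: {bob_score:.2f}")
```

With this toy data the same message scores as likely spam for one user and likely ham for the other, which is the disagreement a single shared database cannot represent.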


The problem with a server-side solution is that everyone has a different idea of what is spam, and adapting to each person's idea is the whole strength of the Bayesian-style filtering concept. If you are certain that all of your users would agree on what is spam and what is not, then a shared setup might work for you; otherwise you really need an individual database for each user. Either way, you should be able to modify SpamBayes easily enough to fit into your setup. Please let the list know if you have success in this area, and we'll update this answer.


Forget tokenizing words – you should use character n-grams!

This was quite carefully tested. Character 3-grams gave five times as many false positives and twice as many false negatives as splitting on whitespace (words). Character 5-grams came fairly close to words on false positives, but the number of false negatives was worse than with