Why is the Probabilistic Model superior to traditional free text systems?
Traditional free text systems are based on simple keywords and Boolean logic (primarily the AND, OR and NOT operators). Whilst this technique is very precise, it falls down when the number of documents retrieved is too large to examine exhaustively. In that case the ability to rank documents, with the most relevant ones at the top of the list, is of paramount importance. Over time traditional systems have introduced various ways to rank results, but these are not based on a sophisticated model of how terms are distributed across the collection of indexed documents, and they tend to rely too heavily on within-document frequency (wdf), that is, how often a term occurs in a single document. A statistical model of term frequency across the whole document collection is unique to the Probabilistic Model. This model not only makes the initial relevance ranking more accurate but also provides a mechanism for iterative searching based on relevance feedback.
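The contrast between wdf-only ranking and collection-wide term statistics can be illustrated with a small sketch. It assumes BM25 with Robertson/Sparck Jones term weights as a representative probabilistic weighting scheme. The system described here may use a different formula. The toy collection, the query and the helper names (`boolean_or`, `wdf_rank`, `rsj_weight`, `bm25_rank`) are all hypothetical and exist only for illustration.

```python
import math
from collections import Counter

# Hypothetical toy collection; every document has exactly 5 terms.
DOCS = {
    "d1": "model model model model ranking".split(),
    "d2": "relevance feedback for the model".split(),
    "d3": "boolean operators and or not".split(),
    "d4": "keyword search with boolean logic".split(),
    "d5": "term statistics across the collection".split(),
    "d6": "documents retrieved are too many".split(),
}

def boolean_or(docs, query):
    """Boolean OR: an unordered set of matching documents, with no ranking."""
    return {d for d, terms in docs.items() if any(t in terms for t in query)}

def wdf_rank(docs, query):
    """Naive ranking by summed within-document frequency (wdf) only."""
    scores = {d: sum(Counter(terms)[t] for t in query) for d, terms in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)

def rsj_weight(N, n, R=0, r=0):
    """Robertson/Sparck Jones term weight with the usual 0.5 smoothing.

    N: documents in the collection, n: documents containing the term,
    R: documents judged relevant, r: relevant documents containing the term.
    With no feedback (R = r = 0) it depends only on collection statistics.
    """
    return math.log((r + 0.5) * (N - n - R + r + 0.5)
                    / ((n - r + 0.5) * (R - r + 0.5)))

def bm25_rank(docs, query, relevant=(), k1=1.2, b=0.75):
    """Rank documents by BM25, weighting each term by collection statistics."""
    N = len(docs)
    avg_len = sum(len(terms) for terms in docs.values()) / N
    R = len(relevant)
    scores = {}
    for d, terms in docs.items():
        wdf = Counter(terms)
        score = 0.0
        for t in query:
            if wdf[t] == 0:
                continue
            n = sum(1 for other in docs.values() if t in other)
            r = sum(1 for rel in relevant if t in docs[rel])
            # wdf still counts, but with diminishing returns and length normalisation
            norm = k1 * (1 - b + b * len(terms) / avg_len)
            score += rsj_weight(N, n, R, r) * wdf[t] * (k1 + 1) / (wdf[t] + norm)
        scores[d] = score
    return sorted(scores, key=scores.get, reverse=True)

query = ["model", "feedback"]
print(boolean_or(DOCS, query))     # unordered set containing d1 and d2
print(wdf_rank(DOCS, query)[0])    # d1: four occurrences of "model"
print(bm25_rank(DOCS, query)[0])   # d2: also contains the rarer term "feedback"
# Relevance feedback: marking d2 relevant raises the weight of "feedback"
print(round(rsj_weight(6, 1), 2), round(rsj_weight(6, 1, R=1, r=1), 2))  # 1.3 3.5
```

Ranking by wdf alone puts d1 first because it repeats "model". The probabilistic weights favour d2 because "feedback" is rare across the collection. Passing documents judged relevant through `relevant` re-estimates each term weight, which is the basis for iterative searching.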