What is conceptSearching and how does it compare to simple keyword searching?
A probabilistic implementation that assumes words appear in documents independently of one another provides a reasonable level of accuracy. However, if the implementation recognizes that the co-location of words is relevant and includes it in the weighting process, relevance ranking improves significantly.

For example, consider the following query:

“dangerous dog attacks baby”

A human would interpret this phrase as being about a dog attacking an infant. However, a simple information retrieval (IR) system that assumes words appear independently of each other would treat any document containing the phrase

“dangerous virus attacks baby dog”

as 100% relevant to the query, on the basis that it contains all of the query's words. Most humans would disagree.

conceptSearching uses Shannon’s Information Theory to compute the incremental value of compound terms, based on an analysis of the probability of their joint occurrence.
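The difference between the two approaches can be illustrated with a minimal Python sketch. This is not conceptSearching's actual algorithm, which is not described here. It only contrasts a word-independence score with a score that takes adjacent word pairs into account, and uses pointwise mutual information (PMI) as one common information-theoretic measure of joint occurrence. The function names, the whitespace tokenizer, and the four-sentence toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy data, invented for this illustration only.
QUERY = "dangerous dog attacks baby"
DOCUMENT = "dangerous virus attacks baby dog"
CORPUS = [
    "dangerous dog attacks baby",
    "dangerous dog bites postman",
    "dangerous virus attacks baby dog",
    "the baby sleeps",
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def adjacent_pairs(tokens):
    """Return the ordered pairs of neighbouring tokens."""
    return list(zip(tokens, tokens[1:]))


def bag_of_words_score(query, document):
    """Fraction of query words found in the document, ignoring word order."""
    query_words = set(tokenize(query))
    document_words = set(tokenize(document))
    return len(query_words & document_words) / len(query_words)


def pair_overlap_score(query, document):
    """Fraction of the query's adjacent word pairs that also occur in the document."""
    query_pairs = set(adjacent_pairs(tokenize(query)))
    document_pairs = set(adjacent_pairs(tokenize(document)))
    return len(query_pairs & document_pairs) / len(query_pairs)


def pmi(pair, corpus):
    """Pointwise mutual information, in bits, of an adjacent word pair.

    PMI = log2(P(x, y) / (P(x) * P(y))). A positive value means the two
    words occur together more often than independence would predict.
    Returns -inf for a pair that never occurs in the corpus.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in corpus:
        tokens = tokenize(sentence)
        word_counts.update(tokens)
        pair_counts.update(adjacent_pairs(tokens))

    if pair_counts[pair] == 0:
        return float("-inf")

    p_xy = pair_counts[pair] / sum(pair_counts.values())
    p_x = word_counts[pair[0]] / sum(word_counts.values())
    p_y = word_counts[pair[1]] / sum(word_counts.values())
    return math.log2(p_xy / (p_x * p_y))


if __name__ == "__main__":
    print("bag of words:", bag_of_words_score(QUERY, DOCUMENT))
    print("pair overlap:", pair_overlap_score(QUERY, DOCUMENT))
    print("PMI(dangerous, dog):", pmi(("dangerous", "dog"), CORPUS))
```

With these inputs, the word-independence score is 1.0 because every query word appears in the document, while the pair-based score is 1/3 because only “attacks baby” survives as an adjacent pair. In the toy corpus the PMI of (“dangerous”, “dog”) is positive, which is the kind of signal a system could use to give the compound term more weight than its two words taken separately.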