How does Collexis deal with documents or queries that have a low concept density?
One standard option is to index the document without a thesaurus. This process still applies most of the indexing steps (stop-word removal, normalization, etc.), but it generates a fingerprint with word-based entries instead of concept entries. These word-based entries can span any number of consecutive words (single words, bigrams, trigrams, etc.). Because Collexis can work with multiple thesauri simultaneously, such a “free text” fingerprint can be used alongside a thesaurus-based fingerprint, so that terms not present in the thesaurus are still taken into account. Naturally, a free text fingerprint does not offer the advantages of a thesaurus-based fingerprint, such as multilingualism and synonymy.
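To illustrate the idea of a word-based fingerprint, here is a minimal Python sketch. It is not Collexis code: the stop-word list, the function names `normalize` and `free_text_fingerprint`, the `max_n` parameter, the rule of skipping n-grams that begin or end with a stop word, and the use of raw counts as entry weights are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; not the list Collexis uses.
STOP_WORDS = {"a", "an", "the", "of", "and", "or", "in", "to", "is", "for", "with"}


def normalize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def free_text_fingerprint(text, max_n=3):
    """Count word n-grams of length 1..max_n built from consecutive tokens.

    Assumption: an n-gram is dropped if it starts or ends with a stop word,
    which also removes stop words as single-word entries.
    """
    tokens = normalize(text)
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                continue
            counts[" ".join(gram)] += 1
    return counts


fingerprint = free_text_fingerprint("The indexing of free text")
for term, count in sorted(fingerprint.items()):
    print(term, count)
```

Because the entries are plain word sequences, two documents only match where they share the same normalized wording. A synonym or a translation of a term produces a different entry, which is the limitation noted above.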