Do the parsers add all words into the index?
Not necessarily. After a document is initially parsed into terms, several decisions may still be made before a term is added to the index: whether the word is important enough to index, whether to index it as is or in its stem form, and whether to recognize certain words as acronyms. Lemur supports an acronym list, stopword removal (ignoring very common words such as “the”, “and”, and “it”), and stemming (so that “stem”, “stemming”, and “stems” all become the same term). All of these features are available in the provided application, BuildIndex.
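To make the three steps concrete, here is a minimal Python sketch of the decisions described above. It is not Lemur's API or BuildIndex's implementation: the names `STOPWORDS`, `ACRONYMS`, `naive_stem`, and `index_terms` are hypothetical, the stemmer is a toy suffix stripper rather than a real algorithm such as Porter's, and the choice to keep acronyms uppercase and unstemmed is an assumption made for illustration.

```python
# Hypothetical word lists for illustration; real lists would be loaded from files.
STOPWORDS = {"the", "and", "it"}
ACRONYMS = {"IT", "USA"}


def naive_stem(word):
    """Toy stemmer: strips a trailing 'ing' or 's'. Not the Porter algorithm."""
    if word.endswith("ing") and len(word) - 3 >= 3:
        word = word[:-3]
        # Collapse a doubled final consonant left behind ("stemm" -> "stem").
        if len(word) >= 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
            word = word[:-1]
    elif word.endswith("s") and len(word) - 1 >= 3:
        word = word[:-1]
    return word


def index_terms(tokens):
    """Return the terms that would be added to the index for a list of tokens."""
    terms = []
    for token in tokens:
        # Assumption: acronyms are matched case-sensitively and kept as is,
        # so "IT" is indexed even though "it" is a stopword.
        if token in ACRONYMS:
            terms.append(token)
            continue
        word = token.lower()
        if word in STOPWORDS:
            continue  # too common to be worth indexing
        terms.append(naive_stem(word))
    return terms


tokens = ["The", "IT", "team", "stems", "and", "stemming", "it"]
print(index_terms(tokens))
```

In this sketch the stopwords “The”, “and”, and “it” are dropped, “IT” survives because it is on the acronym list, and “stems” and “stemming” are both indexed as “stem”.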