How can I combine key phrases that were extracted from many different documents?
For some applications, you may want a single list of key phrases that covers a whole collection of documents, where Extractor has processed each document individually. If there are no constraints on the size of the combined list, you can simply take the union of all of the phrases. To shrink the list slightly, you can merge phrases that share the same stem (e.g., “automobile” and “automobiles”), keeping only one phrase from each group. To shrink the list substantially, assign each key phrase a normalized score, so that scores from different documents are comparable, and keep only the key phrases with the highest normalized scores.
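The three options can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Extractor's own method. The input is assumed to be one ranked list of phrases per document, best phrase first. `crude_stem` is a hypothetical, deliberately simple stemmer that only lowercases and strips a plural "s"; a real system would use a proper stemmer such as Porter's. The normalization in `top_by_normalized_score` is also an assumed scheme: in a document whose list has n phrases, the phrase at 0-based rank r scores 1 - r/n, and each stem's best score per document is summed across documents. All function names are hypothetical.

```python
from collections import defaultdict


def crude_stem(phrase: str) -> str:
    """Hypothetical, very crude stemmer (an assumption, not Extractor's):
    lowercase each word and strip one trailing plural 's'."""
    words = []
    for w in phrase.lower().split():
        if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
            w = w[:-1]
        words.append(w)
    return " ".join(words)


def union_of_phrases(doc_phrases):
    """Option 1: union of all phrases, in first-seen order."""
    seen = {}
    for phrases in doc_phrases:
        for p in phrases:
            seen.setdefault(p, None)
    return list(seen)


def merge_by_stem(phrases):
    """Option 2: keep only the first phrase seen for each stem."""
    kept = {}
    for p in phrases:
        kept.setdefault(crude_stem(p), p)
    return list(kept.values())


def top_by_normalized_score(doc_phrases, k):
    """Option 3: keep the k phrases with the highest normalized scores.

    Assumed normalization: rank r (0-based) in a list of n phrases scores
    1 - r/n, so every document's scores lie in (0, 1] whatever its length.
    Each stem keeps its best score per document; scores are summed across
    documents.
    """
    scores = defaultdict(float)
    surface = {}  # stem -> first phrase seen with that stem
    for phrases in doc_phrases:
        n = len(phrases)
        best = {}
        for r, p in enumerate(phrases):
            s = crude_stem(p)
            surface.setdefault(s, p)
            best[s] = max(best.get(s, 0.0), 1.0 - r / n)
        for s, v in best.items():
            scores[s] += v
    # Highest score first; ties broken alphabetically by stem.
    ranked = sorted(scores, key=lambda s: (-scores[s], s))
    return [surface[s] for s in ranked[:k]]


if __name__ == "__main__":
    docs = [
        ["automobile", "engine", "fuel economy"],
        ["automobiles", "safety", "engine"],
        ["emissions", "engine"],
    ]
    print(union_of_phrases(docs))
    print(merge_by_stem(union_of_phrases(docs)))
    print(top_by_normalized_score(docs, 3))
```

With the sample documents, the union has six phrases, stem merging folds “automobiles” into “automobile” to leave five, and the top three by normalized score are “automobile”, “engine”, and “emissions”. Normalizing by list length keeps a document with many extracted phrases from dominating the combined scores; other normalizations are equally plausible.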