How many words are unique to each Shakespeare work?
Does Shakespeare use mostly the same vocabulary in each of his works, or does the vocabulary differ from work to work? We can answer this question by finding out how many words appear in only a single work of Shakespeare. The percentage of such single-work words, relative to the total number of distinct words in Shakespeare, tells us how varied Shakespeare’s vocabulary is across all his works.

The following script automates the search for work-specific unique words. The steps are:

• Compile a list of all the unique words (as either spellings or lemmata) and their counts for each individual work in Shakespeare. Start by retrieving the Shakespeare corpus using “getCorpus”, and then the list of works in the Shakespeare corpus using “getWorks”. Given a work, the “getWordCounts” method retrieves the distinct words and counts for that work as a Java TreeMap object.
• Compile the combined list of words and counts across all of Shakespeare’s works by merging the individual lists of