Why am I running out of memory tagging a lot of data (using the 2008-09-28 version)?

April 26, 2017

1 Answer

You’re probably using the tagString() method. Unfortunately, its memory use keeps growing as you tag more data in this version. That method may well not be what you want anyway: it assumes that the input is already tokenized according to the conventions of the tagger’s training corpus. For the English models, which are derived from the Penn Treebank, this means things like splitting off contracted forms of “be” and “n’t” as separate tokens and rendering parentheses as -LRB- and -RRB-. If you don’t do this correctly, accuracy will suffer.

For no very good reason, in the 2008-09-28 distribution the tagString() method does its tagging with a beam search, whereas the main method of MaxentTagger and the tagSentence(Sentence) method called in TaggerDemo.java use a different, Viterbi search routine. There seem to be problems with the former, so you should use tagSentence() instead.
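To make the tokenization conventions mentioned above more concrete, here is a rough, self-contained sketch of the kind of normalization the English models expect. It is only an illustration: the class name PtbNormalizer and its normalize method are hypothetical, are not part of the tagger distribution, and cover only a few cases (parentheses, “n’t”, and the clitics ’s, ’re, ’m). In practice the tokenizer shipped with the tagger is the safer choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not part of the tagger distribution): a simplified
// approximation of a few Penn Treebank tokenization conventions.
final class PtbNormalizer {

    private static final String[] CLITICS = {"'s", "'re", "'m"};

    static List<String> normalize(String text) {
        List<String> tokens = new ArrayList<>();
        // Treat curly apostrophes like straight ones, and pad parentheses
        // with spaces so they become separate tokens.
        String spaced = text.replace('\u2019', '\'')
                            .replace("(", " ( ")
                            .replace(")", " ) ");
        for (String raw : spaced.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue;
            }
            if (raw.equals("(")) { tokens.add("-LRB-"); continue; }
            if (raw.equals(")")) { tokens.add("-RRB-"); continue; }

            // Split one trailing punctuation mark off the word.
            String trailing = null;
            char last = raw.charAt(raw.length() - 1);
            if (raw.length() > 1 && ".,!?;:".indexOf(last) >= 0) {
                trailing = String.valueOf(last);
                raw = raw.substring(0, raw.length() - 1);
            }

            addWord(raw, tokens);
            if (trailing != null) {
                tokens.add(trailing);
            }
        }
        return tokens;
    }

    // Splits "don't" into "do" + "n't" and "it's" into "it" + "'s";
    // any other word is added unchanged.
    private static void addWord(String word, List<String> tokens) {
        String lower = word.toLowerCase(Locale.ROOT);
        if (lower.endsWith("n't") && word.length() > 3) {
            tokens.add(word.substring(0, word.length() - 3));
            tokens.add(word.substring(word.length() - 3));
            return;
        }
        for (String clitic : CLITICS) {
            if (lower.endsWith(clitic) && word.length() > clitic.length()) {
                int cut = word.length() - clitic.length();
                tokens.add(word.substring(0, cut));
                tokens.add(word.substring(cut));
                return;
            }
        }
        tokens.add(word);
    }
}
```

Calling PtbNormalizer.normalize on a sentence returns a list of tokens in which contractions are split and parentheses are replaced by -LRB- and -RRB-. Real Penn Treebank tokenization handles many more cases (quotes, hyphens, abbreviations, other bracket types), so treat this as a reading aid rather than a replacement for a proper tokenizer.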