Why do I get an OutOfMemoryException ten minutes after starting a broad scoped crawl?
If you are using a 64-bit JVM, see Gordon's note to the list on 12/19/2005, "Re: Large crawl experience (like, 500M links)". See also the note in [ 896772 ] "Site-first"/"frontline" prioritization, and the Release Notes section 5.1.1, Crawl Size Upper Bounds. Kris's note to the list, 1027, describes how to mitigate memory use when running the HostQueuesFrontier; that advice is less applicable to a post-1.2.0 Heritrix using the BdbFrontier. See the section 'Crawl Size Upper Bounds Update' in the Release Notes.
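Before digging into frontier tuning, the simplest first step is usually to raise the JVM heap ceiling, since the default maximum is often far too small for a broad-scoped crawl. A minimal sketch follows; the `HERITRIX_OPTS` variable name and the 2 GB figure are assumptions here, so check which environment variable your launch script actually honors and size the heap to your machine.

```shell
# Raise the JVM maximum heap before launching Heritrix.
# -Xmx2048m caps the heap at 2 GB (an illustrative figure, not a
# recommendation); the launch script passes these flags to java.
export HERITRIX_OPTS="-Xmx2048m"

# Then start Heritrix as usual, e.g.:
#   $HERITRIX_HOME/bin/heritrix --admin=admin:password
echo "$HERITRIX_OPTS"
```

On a 64-bit JVM you can go well beyond 2 GB, which is exactly the scenario Gordon's large-crawl note discusses; a bigger heap only postpones the problem if the frontier itself holds all queue state in memory, which is why the HostQueuesFrontier advice below still matters on pre-1.2.0 releases.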