
How does a HarvestMan project finish?

April 26, 2017 | Tags: finish, harvestman, project

1 Answer

(Make sure that you have read items 3.1 – 3.6 before reading this.) As mentioned before, HarvestMan works through the co-operation of the crawler and fetcher families of tracker threads, each feeding on the data provided by the other. A project nears its end when there are no more web pages to crawl according to the project's configuration. This means that the fetchers have less web-page data to fetch, which in turn dries up the data source for the crawlers. The crawlers in turn go idle and post less data to the URL queue, which again dries up the data source for the fetchers. The same synergy that drives the project while it is active and all tracker threads are running also winds it down. After some time, all the tracker threads go idle, as there is no more data to feed on from the queues. The main thread of the HarvestMan program runs a loop that continuously checks for this event. Once all threads go idle, the loop detects it and exits; the project (and the program) comes to a h
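
The shutdown behaviour described above can be illustrated with a small, self-contained sketch. This is not HarvestMan's actual code: the `Tracker` class, the `run_project` function, the `LINKS` table (a stand-in for the web) and the `busy` flag are all hypothetical names chosen for illustration. The sketch assumes two queues, one feeding fetchers with URLs and one feeding crawlers with page data, and a main loop that exits once every thread is idle and both queues are empty.

```python
import queue
import threading
import time

# Hypothetical link graph standing in for real web pages (assumption).
LINKS = {"a": ["b", "c"], "b": ["c", "d"], "c": [], "d": ["a"]}


class Tracker(threading.Thread):
    """Illustrative worker: takes items from inbox, posts results to outbox."""

    def __init__(self, inbox, outbox, work, state_lock, stop_event):
        super().__init__(daemon=True)
        self.inbox, self.outbox, self.work = inbox, outbox, work
        self.state_lock, self.stop_event = state_lock, stop_event
        self.busy = False

    def run(self):
        while not self.stop_event.is_set():
            # Take an item and mark ourselves busy in one atomic step, so the
            # main loop never sees "queue empty" while work is still in flight.
            with self.state_lock:
                try:
                    item = self.inbox.get_nowait()
                    self.busy = True
                except queue.Empty:
                    item = None
            if item is None:
                time.sleep(0.01)  # idle: nothing to feed on
                continue
            results = self.work(item)
            # Post results and go idle in one atomic step as well.
            with self.state_lock:
                for result in results:
                    self.outbox.put(result)
                self.busy = False


def run_project(start_url, n_fetchers=2, n_crawlers=2):
    url_queue, data_queue = queue.Queue(), queue.Queue()
    state_lock, seen_lock = threading.Lock(), threading.Lock()
    stop_event = threading.Event()
    seen, fetched = {start_url}, []

    def fetch(url):
        # "Download" the page: here just a lookup in the fake link table.
        fetched.append(url)
        return [(url, LINKS.get(url, []))]

    def crawl(page):
        # Extract links and keep only URLs not seen before.
        _url, links = page
        new_urls = []
        with seen_lock:
            for link in links:
                if link not in seen:
                    seen.add(link)
                    new_urls.append(link)
        return new_urls

    trackers = [Tracker(url_queue, data_queue, fetch, state_lock, stop_event)
                for _ in range(n_fetchers)]
    trackers += [Tracker(data_queue, url_queue, crawl, state_lock, stop_event)
                 for _ in range(n_crawlers)]
    url_queue.put(start_url)
    for tracker in trackers:
        tracker.start()

    # Main-thread loop: spin until all trackers are idle and queues are empty.
    while True:
        with state_lock:
            all_idle = not any(t.busy for t in trackers)
            drained = url_queue.empty() and data_queue.empty()
        if all_idle and drained:
            break
        time.sleep(0.01)

    stop_event.set()
    for tracker in trackers:
        tracker.join()
    return sorted(fetched)
```

Calling `run_project("a")` crawls the four fake pages and returns once the threads have run out of data. The shared lock is a design choice of this sketch: it keeps "take an item and become busy" and "post results and become idle" atomic, so the idle check cannot fire while a thread still holds unprocessed work.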