How do the crawlers and fetchers co-operate?
The design of HarvestMan makes the crawlers and fetchers mutually dependent. The crawlers obtain their data (web-page data) from the data queue and post their results (extracted urls) to the url queue. The fetchers, in turn, obtain their data (urls) from the url queue and post their results (downloaded page data) to the data queue. The program starts by spawning the first thread, which is a fetcher. It downloads the web-page data for the starting page and posts it to the data queue. The first crawler in line picks up this data, parses it, extracts the links, and posts them to the url queue. The next fetcher thread waiting on the url queue picks up a url, and the cycle repeats until the program runs out of urls to parse, at which point the project ends.
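The two-queue hand-off described above can be sketched in Python roughly as follows. This is a minimal illustration, not HarvestMan's actual code: the queue names, the toy link graph in PAGES, and the sentinel-based shutdown are all assumptions made for the example.

```python
import queue
import threading

# Hypothetical sketch of the two-queue design: fetchers consume urls from
# url_q and post page data to data_q; crawlers consume page data from
# data_q and post extracted urls back to url_q.

url_q = queue.Queue()
data_q = queue.Queue()
results = []

# Toy link graph standing in for the web (an assumption for this sketch).
PAGES = {
    "http://start": ["http://a", "http://b"],
    "http://a": [],
    "http://b": [],
}

def fetcher():
    while True:
        url = url_q.get()
        if url is None:                       # sentinel: shut down
            url_q.task_done()
            break
        data_q.put((url, PAGES.get(url, [])))  # "download" the page
        url_q.task_done()

def crawler():
    while True:
        item = data_q.get()
        if item is None:                      # sentinel: shut down
            data_q.task_done()
            break
        url, links = item
        results.append(url)                   # record the crawled page
        for link in links:                    # parse page, post its links
            url_q.put(link)
        data_q.task_done()

f = threading.Thread(target=fetcher)
c = threading.Thread(target=crawler)
f.start()
c.start()

url_q.put("http://start")                     # the first fetch seeds the cycle

# The project ends when both queues drain and no work is in flight.
while True:
    url_q.join()
    data_q.join()
    if url_q.empty() and data_q.empty():
        break

url_q.put(None)                               # stop both threads
data_q.put(None)
f.join()
c.join()

print(sorted(results))  # → ['http://a', 'http://b', 'http://start']
```

Note how neither thread ever talks to the other directly: each only gets from one queue and puts to the other, which is what keeps the fetchers and crawlers decoupled while still driving each other's work.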