How do the crawlers and fetchers co-operate?
The design of HarvestMan makes the crawlers and fetchers mutually dependent. The crawlers obtain their data (web-page data) from the data queue and post their results (extracted urls) to the url queue. The fetchers, in turn, obtain their data (urls) from the url queue and post their results (downloaded page data) to the data queue. The program starts by spawning the first thread, which is a fetcher. It downloads the web-page data for the starting page and posts it to the data queue. The first crawler in line picks up this data, parses it, extracts the links, and posts them to the url queue. The next fetcher thread waiting on the url queue picks up a url, and the cycle repeats until the program runs out of urls to parse, at which point the project ends.
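The two-queue hand-off described above can be sketched in Python roughly as follows. This is a minimal illustration, not HarvestMan's actual code: the queue names, the toy link graph in PAGES, and the sentinel-based shutdown are all assumptions made for the example.

```python
import queue
import threading

# Hypothetical sketch of the two-queue design: fetchers consume urls from
# url_q and post page data to data_q; crawlers consume page data from
# data_q and post extracted urls back to url_q.

url_q = queue.Queue()
data_q = queue.Queue()
results = []

# Toy link graph standing in for the web (an assumption for this sketch).
PAGES = {
    "http://start": ["http://a", "http://b"],
    "http://a": [],
    "http://b": [],
}

def fetcher():
    while True:
        url = url_q.get()
        if url is None:                       # sentinel: shut down
            url_q.task_done()
            break
        data_q.put((url, PAGES.get(url, [])))  # "download" the page
        url_q.task_done()

def crawler():
    while True:
        item = data_q.get()
        if item is None:                      # sentinel: shut down
            data_q.task_done()
            break
        url, links = item
        results.append(url)                   # record the crawled page
        for link in links:                    # parse page, post its links
            url_q.put(link)
        data_q.task_done()

f = threading.Thread(target=fetcher)
c = threading.Thread(target=crawler)
f.start()
c.start()

url_q.put("http://start")                     # the first fetch seeds the cycle

# The project ends when both queues drain and no work is in flight.
while True:
    url_q.join()
    data_q.join()
    if url_q.empty() and data_q.empty():
        break

url_q.put(None)                               # stop both threads
data_q.put(None)
f.join()
c.join()

print(sorted(results))  # → ['http://a', 'http://b', 'http://start']
```

Note how neither thread ever talks to the other directly: each only gets from one queue and puts to the other, which is what keeps the fetchers and crawlers decoupled while still driving each other's work.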