Does HarvestMan obey the Robots Exclusion Protocol?

Yes. HarvestMan respects the rules that website managers lay down in the robots.txt file on the web server. These rules restrict crawling of certain areas of the site depending on the user agent of the client. (Some site owners block entire sections for all clients.) HarvestMan obeys the Robots Exclusion Protocol by default. It is possible to bypass the protocol by disabling this feature. However, it is a good idea to keep it enabled, both to follow Internet etiquette and to avoid being fined or sued by website owners for ignoring their robots.txt rules. Support for robots.txt rules is available in Python's standard library (the robotparser module, which is urllib.robotparser in Python 3). HarvestMan uses a customised form of this module.
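
To illustrate the kind of check described above, here is a minimal sketch using the standard library's urllib.robotparser (Python 3). It is not HarvestMan's customised module, and the robots.txt content, the user agent names and the example.com URLs are all hypothetical, chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real crawler would download this
# from the root of the site it is about to crawl.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser, user_agent, url):
    """Return True if the rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)

# A section blocked for all clients.
blocked_for_all = is_allowed(
    parser, "ExampleCrawler", "http://example.com/private/data.html")

# A page with no matching rule is allowed.
open_page = is_allowed(
    parser, "ExampleCrawler", "http://example.com/index.html")

# A rule that depends on the user agent: BadBot is blocked everywhere.
blocked_by_agent = is_allowed(
    parser, "BadBot", "http://example.com/index.html")

print(blocked_for_all, open_page, blocked_by_agent)
```

A crawler that follows the protocol would run a check like is_allowed before each request and skip any URL for which it returns False.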
