How can I get htdig to ignore the robots.txt file or meta robots tags?

You can’t, and you shouldn’t. The Standard for Robot Exclusion exists for a very good reason, and any well-behaved indexing engine or spider should conform to it.

If you have a problem with a robots.txt file, you really should take it up with the site’s webmaster. If they don’t have a problem with you indexing their site, they shouldn’t mind setting up a User-agent entry in their robots.txt file with a name you both agree on. The user agent name that htdig uses for matching entries in robots.txt can be changed via the robotstxt_name attribute in your config file.

If you have a problem with a robots meta tag in a document (see question 4.15), you should take it up with the author or maintainer of that page. These tags are an all-or-nothing deal, as they can’t be set up to allow some engines and disallow others. If htdig encounters them, it has to give the page’s creator the benefit of the doubt and honour them. If exceptions to the rule are wanted, this should be done with a robots.txt file.
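
To illustrate how an agreed-on User-agent entry works, here is a small sketch using Python's standard urllib.robotparser module. It is not htdig's own matching code, only a stand-in that follows the same exclusion standard. The agent name myindexer, the name otherbot, and the example.com URL are hypothetical; the sketch assumes robotstxt_name in the htdig config file has been set to that same agreed-on name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a webmaster might publish: the agreed-on
# agent name "myindexer" may fetch everything, all other robots nothing.
ROBOTS_TXT = """\
User-agent: myindexer
Disallow:

User-agent: *
Disallow: /
"""


def allowed(agent: str, url: str) -> bool:
    """Return True if robots.txt permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "http://www.example.com/docs/"
    print("myindexer:", allowed("myindexer", url))
    print("otherbot:", allowed("otherbot", url))
```

An empty Disallow line means nothing is off limits for that agent, so only the robot that identifies itself with the agreed-on name gets through. A robots meta tag has no equivalent per-agent mechanism, which is why exceptions belong in robots.txt.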
