How can I get htdig to ignore the robots.txt file or meta robots tags?
You can’t, and you shouldn’t. The Standard for Robot Exclusion exists for a very good reason, and any well-behaved indexing engine or spider should conform to it. If you have a problem with a robots.txt file, take it up with the site’s webmaster. If they have no objection to you indexing their site, they shouldn’t mind adding a User-agent entry to their robots.txt file with a name you both agree on. The user agent name that htdig uses for matching entries in robots.txt can be changed via the robotstxt_name attribute in your config file.

If you have a problem with a robots meta tag in a document (see question 4.15), take it up with the author or maintainer of that page. These tags are an all-or-nothing deal, as they can’t be set up to allow some engines and disallow others. If htdig encounters them, it has to give the page’s creator the benefit of the doubt and honour them. If exceptions to the rule are wanted, they should be made with a robots.txt file.
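
As a rough sketch of the User-agent arrangement described above, suppose you and the webmaster agree on the name myindexer (a made-up name, used here only for illustration). Your htdig config file might then contain:

```
# "myindexer" is a hypothetical name agreed on with the site's webmaster
robotstxt_name: myindexer
```

The site's robots.txt could then carry a matching entry along these lines. The paths and the catch-all rule are assumptions for the example, not something any particular site is known to use:

```
# Let the agreed-upon indexer in (an empty Disallow excludes nothing)
User-agent: myindexer
Disallow:

# Keep all other robots out
User-agent: *
Disallow: /
```

The decision stays with the webmaster this way: htdig still obeys robots.txt, and the exception is granted explicitly by the site rather than taken by the indexer.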