
Is htdig actually finding links to the PDF, Word, etc. documents you want to index?

April 26, 2017


5.25 and 5.18), and then find out how htdig is treating the links in your HTML files, to see whether it is ignoring or rejecting links to your externally parsed documents (questions 4.1 and 5.27).
• If it is finding and accepting the links to these documents, is it correctly fetching them and passing them to the appropriate external converter so that they can be indexed? Look at the htdig -vvv output around the point where it tries to fetch one of these documents, and see what it does next. Does the file size look right? Are there any error messages nearby? If the external converter isn't being called at all, take a close look at your external_parsers attribute setting to make sure it's correct (see question 5.31).
• If it is attempting to convert them, is the external converter doing what it should, i.e. feeding some indexable text back into htdig's parser? You can also try htdig -vvvv (four -v options) to see whether it is actually parsing individual words from any of these documents. If this is too much output to wade through,
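To check the first point independently of htdig, a rough sketch like the following can list the links to PDF, Word and similar files that appear in one of your HTML pages. It uses only Python's standard library. It is not htdig's own HTML parser, so it does not apply htdig's rules for accepting or rejecting URLs; the `DOC_EXTENSIONS` list and the `find_doc_links` helper are hypothetical names chosen for illustration, and you should adjust the extensions to match your own site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical list of "externally parsed" document extensions; adjust it to
# match the content types you have configured in external_parsers.
DOC_EXTENSIONS = (".pdf", ".doc", ".docx", ".xls", ".ppt", ".rtf")


class DocLinkFinder(HTMLParser):
    """Collect absolute URLs from <a href> and <area href> that end in a document extension.

    This is only an approximation of what a crawler sees, not htdig's parser.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.doc_links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so "HREF" arrives as "href".
        if tag not in ("a", "area"):
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(self.base_url, href)
        if urlparse(absolute).path.lower().endswith(DOC_EXTENSIONS):
            self.doc_links.append(absolute)


def find_doc_links(html_text, base_url):
    """Return the document links found in html_text, resolved against base_url."""
    finder = DocLinkFinder(base_url)
    finder.feed(html_text)
    finder.close()
    return finder.doc_links


if __name__ == "__main__":
    sample = ('<p><a href="reports/q1.pdf">Q1</a> <a href="/index.html">Home</a> '
              '<a HREF="Memo.DOC">Memo</a></p>')
    for link in find_doc_links(sample, "http://www.example.com/docs/"):
        print(link)
```

If a link you expect to see is missing from this list, the problem is in the HTML itself; if it is present here but htdig never fetches it, htdig is probably rejecting it.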
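When reading htdig -vvv output, it can help to pull out only the lines that mention a document URL, together with the lines that follow them. The sketch below makes no assumption about htdig's exact message wording: it matches any whitespace-delimited token ending in one of a few document extensions (a hypothetical list) and prints a few lines of context after each match. The script name `doc_context.py` in the usage comment is likewise just an example.

```python
import re
import sys

# Hypothetical pattern: any whitespace-delimited token ending in one of these
# document extensions. No particular htdig message format is assumed.
DOC_URL = re.compile(r"\S+\.(?:pdf|docx?|xls|ppt|rtf)\b", re.IGNORECASE)


def doc_contexts(lines, after=5):
    """Yield (line_number, block) for every line that mentions a document URL.

    block holds the matching line plus up to `after` following lines, which is
    where a fetch size, a converter call or an error message might appear.
    """
    lines = [line.rstrip("\n") for line in lines]
    for i, line in enumerate(lines):
        if DOC_URL.search(line):
            yield i + 1, lines[i : i + 1 + after]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage:
    #   htdig -vvv > dig.log 2>&1
    #   python doc_context.py dig.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for lineno, block in doc_contexts(log):
            print(f"--- line {lineno} ---")
            print("\n".join(block))
```

Increase `after` if the interesting messages appear further from the line naming the URL.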
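For the last point, you can run the converter by hand on one sample document and see whether it prints any text at all. The sketch below assumes that the converter accepts the same arguments htdig passes to external parsers (input file, content type, URL, configuration file). Check the external_parsers documentation for your htdig version before relying on that order. The `try_converter` helper, the example paths and the word-counting rule (runs of three or more letters) are all hypothetical and serve only as a quick sanity check.

```python
import re
import subprocess
import sys


def try_converter(command, infile, content_type, url, config_file, timeout=60):
    """Run a converter roughly the way htdig might call it and summarize the result.

    The argument order (infile, content-type, URL, config file) is an assumption.
    Returns (exit status, number of word-like tokens on stdout, stripped stderr).
    """
    result = subprocess.run(
        list(command) + [infile, content_type, url, config_file],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    words = re.findall(r"[A-Za-z]{3,}", result.stdout)
    return result.returncode, len(words), result.stderr.strip()


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage:
    #   python try_converter.py /path/to/converter sample.pdf application/pdf
    status, nwords, err = try_converter(
        [sys.argv[1]], sys.argv[2], sys.argv[3],
        "http://www.example.com/" + sys.argv[2], "htdig.conf",
    )
    print(f"exit status {status}, {nwords} word-like tokens on stdout")
    if err:
        print("stderr:", err)
```

A non-zero exit status, an empty stdout or anything on stderr suggests the converter itself is failing, independently of htdig.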