Is htdig actually finding links to the PDF, Word, etc. documents you want to index?
5.25 and 5.18), and then find out how htdig is treating the links in your HTML files, to see whether it’s ignoring or rejecting links to your externally parsed documents (questions 4.1 and 5.27).
• If it is finding and accepting the links to these documents, is it correctly fetching them and passing them on to the appropriate external converter so that they can be indexed? Look at the htdig -vvv output around the point where it tries to fetch one of these documents, and see what it does next. Does the file size look right? Are there any error messages nearby? If the external converter isn’t being called at all, take a close look at your external_parsers attribute setting to make sure it’s correct (see question 5.31).
• If it is attempting to convert them, is the external converter doing what it should, i.e. feeding some indexable text back into htdig’s parser? You can also try htdig -vvvv (four -v options) to see whether it’s actually parsing individual words from any of these documents. If this is too much output to wade through,