How do I index Word, Excel, PowerPoint or PostScript documents?


This must be done with an external parser or converter. One example of an external converter is the contrib/doc2html/doc2html.pl Perl script. It handles Word, PostScript, PDF and other documents when paired with the appropriate document-to-text converters: it uses catdoc for Word documents and ps2ascii for PostScript files. The comments in the Perl script and the accompanying documentation indicate where you can obtain these converters. Versions of htdig before 3.1.4 don’t support external converters, so with those versions you have to use an external parser script such as contrib/parse_doc.pl (or better yet, upgrade htdig if you can). External converter scripts are simpler to write and maintain than a full external parser, because they only convert the input document to text/plain or text/html and pass the result back to htdig to be parsed. Parsing is also more consistent across document types with external converters, because the final work is done by htdig’s internal parsers.
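To illustrate how little an external converter has to do, here is a minimal sketch in Python. It is not the doc2html.pl script and not part of htdig. The script name convert.py, the MIME-type table and the argument order (input file, content type, then URL and configuration file) are assumptions made for this example, so check them against the documentation for your htdig version. It only picks catdoc or ps2ascii based on the content type and lets the tool write plain text to standard output.

```python
"""Sketch of an external converter: emit a document as plain text on stdout."""
import subprocess
import sys

# Assumed mapping from MIME type to the converter named in the answer above.
CONVERTERS = {
    "application/msword": ["catdoc"],
    "application/postscript": ["ps2ascii"],
}


def build_command(content_type, path):
    """Return the argv list for the tool that handles content_type."""
    # Drop any parameters such as "; charset=..." and normalise case.
    base = content_type.split(";")[0].strip().lower()
    if base not in CONVERTERS:
        raise ValueError(f"no converter configured for {base!r}")
    return CONVERTERS[base] + [path]


def main(argv):
    # Assumed argument order: FILE, CONTENT-TYPE, then URL and CONFIG (unused here).
    if len(argv) < 3:
        print("usage: convert.py FILE CONTENT-TYPE [URL [CONFIG]]", file=sys.stderr)
        return 2
    try:
        command = build_command(argv[2], argv[1])
    except ValueError as err:
        print(err, file=sys.stderr)
        return 1
    # The child inherits stdout, so the converted text goes straight to the caller.
    return subprocess.run(command).returncode

# When saved as a script, end the file with: sys.exit(main(sys.argv))
```

All real parsing is left to whatever reads the output, which is the design point made above: the converter stays a thin wrapper around catdoc and ps2ascii, and both tools must be installed and on the PATH for it to work.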

