What is Harvest?
Harvest is a system to collect information and make them searchable using a web interface. Harvest can collect information on inter- and intranet using http, ftp, nntp as well as local files like data on harddisk, CDROM and file servers. Current list of supported formats in addition to HTML include TeX, DVI, PS, full text, mail, man pages, news, troff, WordPerfect, RTF, Microsoft Word/Excel, SGML, C sources and many more. Stubs for PDF support is included in Harvest and will use Xpdf or Acroread to process PDF files. Adding support for new format is easy due to Harvest’s modular design.