Important Notice: Our web hosting provider recently started charging us for additional visits, which was unexpected. In response, we're seeking donations. Depending on the situation, we may explore different monetization options for our Community and Expert Contributors. It's crucial to provide more returns for their expertise and offer more Expert Validated Answers or AI Validated Answers. Learn more about our hosting issue here.

Can I use Lucene to crawl my site or other sites on the Internet?

Crawl internet lucene site sites
0
Posted

Can I use Lucene to crawl my site or other sites on the Internet?

0

No. Lucene does not know how to access external document, nor does it know how to extract the content and links of HTML and other document format. Lucene focuses on the indexing and searching and does it great. However, several crawlers are available which you could use: list of Open Source Crawlers in Java. regain is an Open Source tool that crawls web sites, stores them in a Lucene index and offers a search web interface. Also see Nutch for a powerful Lucene-based search engine.

Related Questions

What is your question?

*Sadly, we had to bring back ads too. Hopefully more targeted.

Experts123