
I have a big fetchlist in my segments folder. How can I fetch only some sites at a time?

You have to decide how many pages you want to crawl before generating segments, and use the options of bin/nutch generate:

• Use -topN to limit the total number of pages.
• Use -numFetchers to generate multiple small segments.
• Then you can either generate new segments. You may want to use -adddays so that bin/nutch generate puts all the URLs into the new fetchlist again; add more than 7 days if you have not run an updatedb.
• Or send the fetch process a Unix STOP signal. You should then be able to index the already-fetched part of the segment, and later send a CONT signal to resume the process. Do not turn off your computer in between!
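As a rough sketch of the two approaches above (the crawldb/segments paths are illustrative, and the generate commands assume a Nutch installation, so they are shown as comments; the STOP/CONT mechanism is demonstrated on a stand-in long-running process):

```shell
# Approach 1: generate a small fetchlist up front (assumes bin/nutch on PATH;
# paths below are placeholders for your own crawl directory):
#
#   bin/nutch generate crawl/crawldb crawl/segments -topN 1000 -numFetchers 4
#
# Approach 2: pause a running fetch with job-control signals.
# Demonstrated here with `sleep` standing in for a running bin/nutch fetch:
sleep 60 &                        # stand-in for the long-running fetch process
pid=$!
kill -STOP "$pid"                 # pause the process; it stops consuming CPU
state=$(ps -o stat= -p "$pid" | cut -c1)
echo "state after STOP: $state"   # "T" indicates a stopped process
kill -CONT "$pid"                 # resume it later
kill "$pid"                       # clean up the demo process
```

While the process is stopped you can work with the part of the segment that has already been fetched; sending CONT lets the fetch continue where it left off.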


Experts123