I have a big fetchlist in my segments folder. How can I fetch only some sites at a time?
• Decide how many pages you want to crawl before generating segments, then limit the fetchlist with the options of bin/nutch generate.
• Use -topN to limit the total number of pages.
• Use -numFetchers to generate multiple small segments.
• You can then either generate new segments as you need them. You may want to use -adddays so that bin/nutch generate puts all the URLs into the new fetchlist again. Add more than 7 days if you have not run updatedb.
• Or send the fetch process a Unix STOP signal. You should then be able to index the part of the segment that has already been fetched. Later, send the process a CONT signal to resume it. Do not turn off your computer in between!
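
The two approaches can be sketched as small shell helpers. This is only an illustration: the function names are made up for this example, and the position of the db and segments directory arguments to bin/nutch generate is an assumption that may differ between Nutch versions, so check the usage message of your installation first.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Nutch.

# Generate a limited fetchlist.
# Assumption: the db and segments directories are the first two arguments.
# $1 = db dir, $2 = segments dir, $3 = maximum number of pages (-topN)
generate_limited() {
  bin/nutch generate "$1" "$2" -topN "$3"
}

# Pause a running fetch process with the STOP signal.
# $1 = process ID of the fetcher
pause_fetch() {
  kill -STOP "$1"
}

# Resume a paused fetch process with the CONT signal.
# $1 = process ID of the fetcher
resume_fetch() {
  kill -CONT "$1"
}
```

For example, pause_fetch 12345 would suspend the fetcher with process ID 12345, and resume_fetch 12345 would let it continue later. A stopped process lives only in memory, which is why the machine has to stay on between the two calls.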