Can I use WebSPHINX to crawl the entire Web, like search engines do?
WebSPHINX isn’t designed for crawls of that size. Search engines typically use distributed crawlers running on farms of PCs, with a high-bandwidth network connection and a distributed filesystem or database for managing the crawl frontier and storing page data. WebSPHINX is intended more for personal use, for crawls of perhaps a hundred to a thousand web pages. If you do want to use WebSPHINX for larger crawls, you should definitely read the next question about memory usage.
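
To make the idea of a crawl frontier and a personal-scale page budget concrete, here is a minimal sketch in plain Java. It does not use the WebSPHINX API: the class name BoundedCrawl, the crawl method, the maxPages parameter, and the in-memory link map standing in for the Web are all hypothetical, chosen only for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical illustration only; not part of WebSPHINX.
public class BoundedCrawl {

    // Breadth-first crawl over an in-memory link graph, stopping after maxPages pages.
    static List<String> crawl(Map<String, List<String>> links, String root, int maxPages) {
        List<String> visited = new ArrayList<>();
        if (maxPages <= 0) {
            return visited;
        }
        Set<String> seen = new HashSet<>();          // every URL ever queued
        Queue<String> frontier = new ArrayDeque<>(); // URLs waiting to be visited
        frontier.add(root);
        seen.add(root);
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.remove();
            visited.add(url);
            // Queue each outgoing link the first time it is seen.
            for (String next : links.getOrDefault(url, List.of())) {
                if (seen.add(next)) {
                    frontier.add(next);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // A tiny made-up link graph standing in for real pages.
        Map<String, List<String>> links = Map.of(
            "a", List.of("b", "c"),
            "b", List.of("c", "d"),
            "c", List.of("a"),
            "d", List.of());
        System.out.println(crawl(links, "a", 3)); // prints [a, b, c]
    }
}
```

In this sketch both the frontier and the set of seen URLs live in the memory of a single process, and both grow with the number of links discovered rather than the number of pages visited. That is fine for a budget of a few hundred pages, and it hints at why much larger crawls tend to move this state into a database or distributed store.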