Can I restart a project to download only links that failed in a previous run (web-page caching)?
Yes. HarvestMan has a built-in caching mechanism for documents downloaded from the network. It is available from version 1.2 and is enabled by default. HarvestMan uses an MD5 checksum as the signature of each document and keeps the cache information in hm-cache in the HarvestMan project directory. When you restart a project, HarvestMan loads the cache information for that project, if it exists. When it encounters a URL, it compares the signature of the URL's data with the signature stored in the cache for that URL. If the two match, the document has not changed, so HarvestMan skips the URL; otherwise it downloads the document. The cache is regenerated at the end of every project. HarvestMan also catches keyboard interrupts, so the cache is still written if you end the program that way, and network bandwidth is not wasted on the next run. You can disable web-page caching by turning off the corresponding variable in the HarvestMan configuration file.
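The checksum comparison described above can be illustrated with a small sketch. This is not HarvestMan's actual code. The function names (`signature`, `load_cache`, `save_cache`, `is_unchanged`, `run_project`), the JSON cache file, and the in-memory `pages` mapping are all assumptions made for illustration. It only shows the general idea: an MD5 signature per URL, a skip when the signature matches, and a cache that is written even when the run is interrupted.

```python
import hashlib
import json
import os


def signature(data: bytes) -> str:
    """Return the MD5 hex digest used here as the document signature."""
    return hashlib.md5(data).hexdigest()


def load_cache(path: str) -> dict:
    """Load the url -> signature mapping, or start empty if no cache exists."""
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    return {}


def save_cache(path: str, cache: dict) -> None:
    """Write the url -> signature mapping to disk (hypothetical JSON format)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cache, f)


def is_unchanged(cache: dict, url: str, data: bytes) -> bool:
    """True if the cached signature for url matches the signature of data."""
    return cache.get(url) == signature(data)


def run_project(pages: dict, cache_path: str) -> list:
    """Process pages (url -> bytes) and return the urls that were saved.

    'pages' stands in for data fetched from the network (an assumption
    that keeps the sketch self-contained).
    """
    cache = load_cache(cache_path)
    saved = []
    try:
        for url, data in pages.items():
            if is_unchanged(cache, url, data):
                continue  # same signature: skip this url
            saved.append(url)
            cache[url] = signature(data)
    finally:
        # Runs on normal completion and on KeyboardInterrupt alike,
        # so work done before an interrupt is not lost.
        save_cache(cache_path, cache)
    return saved
```

Calling `run_project` twice with the same `pages` and the same `cache_path` returns an empty list the second time, because every signature already matches the cache. Changing the bytes for one URL causes only that URL to be returned on the next call.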