
HTTrack is taking too much time parsing; it is very slow. What's wrong?

April 26, 2017


1 Answer


Earlier releases of HTTrack (before 3.04) had problems with parsing. It was really slow, and performance was poor, especially with huge HTML files. The engine is now optimized and should parse all HTML files very quickly. For example, a 10MB HTML file should be scanned in less than 3 or 4 seconds. Longer times therefore mean that the engine had to wait while testing several links.

• Sometimes, links are malformed in pages. Writing a href="/foo" instead of a href="/foo/", for example, is a common mistake. It forces the engine to make a supplemental request to find the real /foo/ location.

• Dynamic pages. Links whose names end in .php3, .asp, or another type different from the regular .html or .htm will require a supplemental request, too. HTTrack has to "know" the type (called the "MIME type") of a file before forming the destination filename. Files like foo.gif are "known" to be images, and ".html" files are obviously HTML pages, but ".php3" pages may be either d
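The rule described above (a link costs a supplemental request whenever its type cannot be read from its name) can be sketched in a few lines of Python. This is only an illustration, not HTTrack's actual code: the `KNOWN_TYPES` table and the `needs_extra_request` function are hypothetical names, and treating a trailing slash as an already resolved directory is an assumption made for the example.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical table of extensions whose MIME type is assumed known up front.
KNOWN_TYPES = {
    ".html": "text/html",
    ".htm": "text/html",
    ".gif": "image/gif",
    ".jpg": "image/jpeg",
    ".png": "image/png",
}


def needs_extra_request(url: str) -> bool:
    """Return True when the link's type cannot be guessed from its name alone."""
    path = urlsplit(url).path  # drops any query string or fragment
    if path.endswith("/"):
        # Assumption: a trailing slash already names a directory location.
        return False
    ext = posixpath.splitext(path)[1].lower()
    # No extension ("/foo") or a dynamic one (".php3", ".asp") is unknown.
    return ext not in KNOWN_TYPES


if __name__ == "__main__":
    for link in ("/foo", "/foo/", "/img/foo.gif", "/page.php3", "/index.html"):
        print(link, needs_extra_request(link))
```

Under these assumptions, a page full of links such as /foo or /page.php3 would trigger one extra request per link, which is where the waiting time comes from.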