Q: FTP links are not caught! What's happening?
A: FTP files might be seen as external links, especially if they are located in an outside domain. You either have to accept all external links (see the link options, -n option) or only specific files (see the filters section).

Example: you are downloading http://www.someweb.com/foo/ and cannot get files from ftp://ftp.someweb.com. Then add the filter rule +ftp.someweb.com/* to accept all files from this (FTP) location. A full command line is sketched at the end of this section.

Q: I got some weird messages telling me that robots.txt does not allow several files to be captured. What's going on?

A: These rules, stored in a file called robots.txt, are given by the website to specify which links or folders should not be caught by robots and spiders – for example, /cgi-bin or large image files. They are followed by default by HTTrack, as is advised. Therefore, you may miss some files that would have been downloaded without these rules – check your logs to see if this is the case:

Info: Note: due to www.foobar.com remote robots.txt rules, links beginning with these paths
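If you really need the files blocked this way, HTTrack's robots handling can be relaxed. The sketch below assumes the sN option controls this behaviour (s0 = never follow robots.txt rules, s2 = always, the usual default) and that /mirror/foobar is just an illustrative output directory; check httrack --help on your version, and remember that the site published these rules for a reason:

    httrack "http://www.foobar.com/" -O /mirror/foobar -s0

With s0, the links previously listed as forbidden in the log should be crawled like any others.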
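Returning to the FTP question above: filters can be appended directly to the command line. The following is only a sketch, where /mirror/someweb is an illustrative output directory and the URLs are the placeholders from the example:

    httrack "http://www.someweb.com/foo/" -O /mirror/someweb "+ftp.someweb.com/*"

This mirrors the HTTP site as before, and the +ftp.someweb.com/* filter additionally accepts every file from the FTP host instead of treating it as an external link.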