My web crawler needs to use a web proxy, user authentication, cookies, a special user-agent, etc. What do I do?
WebSPHINX uses the built-in Java classes URL and URLConnection to fetch web pages. If you're running the Crawler Workbench inside a browser, your crawler uses the browser's proxy, authentication, cookies, and user-agent, so if you can visit a site manually, you can crawl it. If you're running your crawler from the command line, however, you'll have to configure Java yourself to set up your proxy, authentication, user-agent, and so forth, as in the sketch below.
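One way to do this is to configure the standard Java networking settings at the JVM level before the crawler starts, since anything fetched through URL/URLConnection picks them up. The sketch below is only an illustration, not part of WebSPHINX itself: the class name CrawlerSetup, the proxy host and port, the credentials, and the user-agent string are all placeholders, and CookieManager requires Java 6 or later.

```java
import java.net.Authenticator;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.PasswordAuthentication;

public class CrawlerSetup {

    public static void configureNetworking() {
        // Route HTTP connections through a web proxy (host and port are placeholders).
        System.setProperty("http.proxyHost", "proxy.example.com");
        System.setProperty("http.proxyPort", "8080");

        // Answer proxy or site authentication challenges with fixed credentials.
        Authenticator.setDefault(new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                return new PasswordAuthentication("username", "password".toCharArray());
            }
        });

        // Accept and resend cookies on all URLConnection requests (Java 6+).
        CookieHandler.setDefault(new CookieManager());

        // Set the default User-Agent string that URLConnection sends.
        System.setProperty("http.agent", "MyCrawler/1.0");
    }

    public static void main(String[] args) {
        configureNetworking();
        // ... start your WebSPHINX crawler here ...
    }
}
```

The proxy host, port, and user-agent can also be passed on the command line with -D flags (for example -Dhttp.proxyHost=... -Dhttp.proxyPort=...) instead of being set in code.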