How does a robot decide where to visit?
This depends on the robot; each one uses a different strategy. In general, robots start from a historical list of URLs, especially documents with many links elsewhere, such as server lists, “What’s New” pages, and the most popular sites on the Web. Most indexing services also allow you to submit URLs manually; these are then queued and visited by the robot. Other sources of URLs are sometimes used as well, such as scans of USENET postings, published mailing list archives, etc. Given those starting points, a robot can select URLs to visit and index, and to parse as a source of new URLs.
How does an indexing robot decide what to index?
If an indexing robot knows about a document, it may decide to parse it and insert it into its database. How this is done depends on the robot: some robots index the HTML titles, or the first few paragraphs, or parse the entire HTML and index all words, with weightings depending on HTML constructs, etc. Some parse the META tag, or other special