Web Crawlers

A Web Crawler (also known as a web spider) is a program or automated script that browses the Web, creating copies of visited pages to be processed and indexed by a search engine. Crawlers can also be used for site maintenance, such as checking links and validating HTML code.

Crawlers "crawl" through a site a page at a time, following all links to all other pages until everything has been read.

Web crawlers can also be used to gather large amounts of raw data for data mining, an important tool in this era of Big Data.

Some methods of web crawling have led to questions about the legality of the practice. Issues arise when crawlers access websites without authorization, scrape or store personal user data, or contribute to data breaches.
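One widely used (though voluntary) mechanism for respecting a site's wishes is its robots.txt file. As a hedged example, Python's standard-library robotparser can check whether a given user agent is permitted to fetch a URL; the user-agent string and URLs here are placeholders.

    from urllib.robotparser import RobotFileParser

    # Fetch and parse the site's robots.txt, then test a specific URL.
    robots = RobotFileParser("https://example.com/robots.txt")
    robots.read()

    if robots.can_fetch("MyCrawler/1.0", "https://example.com/some/page"):
        print("robots.txt allows crawling this page")
    else:
        print("robots.txt disallows crawling this page")

Honoring robots.txt does not by itself settle the legal questions above, but it is a common baseline for well-behaved crawlers.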