1. Web Scraping:
Web scraping is a technique for extracting large amounts of data from websites and saving it to the local machine in a structured form such as XML, an Excel spreadsheet or a SQL database. The tools used for web scraping are known as web scrapers. Given a set of requirements, they can extract data from a website in a fraction of the time that manual collection would take. This automation is very helpful for building datasets for machine learning and other purposes. A web scraper works in four steps:
- Sending a request to the target page.
- Getting the response from the target page.
- Parsing the response and extracting the required data.
- Downloading (saving) the data.
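The four steps above can be sketched in Python using only the standard library. This is a minimal illustration, not how any particular scraping tool works: the names `LinkParser`, `fetch`, `parse_links` and `to_csv`, the `example-scraper/0.1` User-Agent string, and the choice of extracting link text and `href` values are all assumptions made for the example.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class LinkParser(HTMLParser):
    """Collects (link text, href) pairs from every <a href=...> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def fetch(url):
    """Steps 1 and 2: send the request and read the response body."""
    request = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def parse_links(html):
    """Step 3: parse the HTML and extract the data of interest."""
    parser = LinkParser()
    parser.feed(html)
    return parser.links


def to_csv(rows):
    """Step 4: serialize the extracted rows so they can be saved locally."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["text", "href"])
    writer.writerows(rows)
    return buffer.getvalue()
```

Chaining the functions as `to_csv(parse_links(fetch(url)))` would run all four steps for a single page. The CSV text can then be written to a file, or the rows inserted into a database instead.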
Popular web scraping tools include ProWebScraper and Webscraper.io.
2. Web Crawling:
Web crawling is analogous to a spider crawling, except that the place being crawled is the web. A crawler visits a website and reads its pages in order to build entries for a search engine index. The tools used for web crawling are known as web crawlers or spiders. A crawler analyzes a series of web pages and then follows the links found on them to reach still more pages, so it searches a site in depth to extract information. Well-known search engines such as Google, Yahoo and Bing crawl the web and use the collected information to index web pages. Examples of crawling tools are Scrapy and Apache Nutch.
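The link-following behaviour described above can be sketched as a breadth-first traversal. This is a simplified illustration under stated assumptions, not how Scrapy, Apache Nutch or any search engine is implemented: the names `LinkExtractor` and `crawl`, the `max_pages` limit, and the restriction to a single host are choices made for the example. The `fetch` argument is a hypothetical callable that returns a page's HTML, or `None` if the page is unavailable, so the sketch can be tried without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl limited to the host of start_url.

    Returns the URLs in the order they were visited.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}          # de-duplication: never queue a URL twice
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.hrefs:
            # Resolve relative links and drop "#fragment" suffixes.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# A tiny in-memory "website" standing in for real HTTP responses.
SITE = {
    "https://example.com/": '<a href="/a">A</a><a href="/b">B</a>'
                            '<a href="https://other.example/x">external</a>',
    "https://example.com/a": '<a href="/b">B</a><a href="/c#top">C</a>',
    "https://example.com/b": '<a href="/">home</a>',
    "https://example.com/c": "no links here",
}

if __name__ == "__main__":
    for page in crawl("https://example.com/", SITE.get):
        print(page)
```

The `seen` set is what keeps the crawler from revisiting pages that are linked from several places. A real crawler would replace `SITE.get` with a function that performs HTTP requests.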
Difference between Web Scraping and Web Crawling:
|S.No.||Web Scraping||Web Crawling|
|1.||The tool used is a web scraper.||The tool used is a web crawler or spider.|
|2.||It is used for downloading information.||It is used for indexing web pages.|
|3.||It need not visit all the pages of a website for information.||It visits each and every page, down to the last line, for information.|
|4.||A web scraper does not obey robots.txt in most cases.||Search engine crawlers generally obey robots.txt.|
|5.||It is done on both small and large scales.||It is mostly employed on a large scale.|
|6.||Application areas include retail marketing, equity research and machine learning.||It is used in search engines to give search results to the user.|
|7.||Data de-duplication is not necessarily a part of web scraping.||Data de-duplication is an integral part of web crawling.|
|8.||It needs a crawl agent and a parser for parsing the response.||It needs only a crawl agent.|
|9.||ProWebScraper and Webscraper.io are examples.||Google, Yahoo and Bing do web crawling.|
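Row 4 of the table refers to robots.txt, the file in which a site states which paths automated clients may visit. As a hedged sketch, Python's standard `urllib.robotparser` module can check a URL against such rules before it is fetched. The rules, the `example-bot` user agent name and the URLs below are made up for illustration. The rules are parsed from a list of lines so that the example runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an imaginary site.
ROBOTS_TXT = [
    "User-agent: *",
    "Disallow: /private/",
]

robots = RobotFileParser()
robots.parse(ROBOTS_TXT)


def allowed(url, user_agent="example-bot"):
    """Return True if the parsed rules permit user_agent to fetch url."""
    return robots.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed("https://example.com/articles/1"))
    print(allowed("https://example.com/private/report"))
```

Against a live site, the rules would typically be loaded with `set_url()` followed by `read()` instead of `parse()`. Compliance is voluntary: nothing in the protocol prevents a client from ignoring the file.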