
What is Web Scraping in Node.js?

Last Updated : 30 Mar, 2023

Web scraping means automatically collecting data such as images, text, or video from websites. It is especially useful when you need to collect a large amount of data, because automating the process saves a great deal of time.

Puppeteer: Node.js has many modules for web scraping, and one of the most popular and easiest to use is Puppeteer, a library that controls a Chrome or Chromium browser from Node.js code. Puppeteer provides many methods that make web scraping and web automation much easier. We can install it in our project directory by running the following command:

npm install puppeteer

Approach: 

Step 1: Require Puppeteer Module

const puppeteer = require('puppeteer');

Step 2: Make an async function

async function webScraper() {
    // The scraping steps (Steps 3 and 4) go inside this function.
}

webScraper();
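Because an async function returns a promise, calling webScraper() on its own leaves any error, such as a failed launch or navigation, as an unhandled promise rejection. The following is a minimal sketch of one way to handle this. runMain is a hypothetical helper name, and setting process.exitCode to 1 is one common convention for signaling failure without forcing an immediate exit.

```javascript
// Hypothetical helper: run an async entry point (such as webScraper) and
// report failures instead of leaving the rejection unhandled.
function runMain(main) {
  return main().catch((err) => {
    console.error('Scraping failed:', err.message);
    // Mark the process as failed; Node.js exits with this code once idle.
    process.exitCode = 1;
    return undefined;
  });
}

// Usage with the function from Step 2:
// runMain(webScraper);
```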

Step 3: Inside the function, create two constants: browser, which holds the browser instance launched by Puppeteer, and page, which holds a new page (tab) opened in that browser for scraping.

async function webScraper() {
    const browser = await puppeteer.launch({});
    const page = await browser.newPage();
}
webScraper();
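The empty object passed to puppeteer.launch({}) means Puppeteer's default launch options are used, which includes running the browser headless (without a visible window). As a minimal sketch, the helper below (buildLaunchOptions and its debug flag are hypothetical names) builds an options object from two real Puppeteer launch options: headless, where false shows a visible browser window, and slowMo, which delays each Puppeteer operation by the given number of milliseconds. Watching the browser this way can help when a selector does not match what you expect.

```javascript
// Hypothetical helper: choose launch options for normal runs vs. debugging.
// headless and slowMo are real Puppeteer launch options.
function buildLaunchOptions({ debug = false } = {}) {
  return {
    headless: !debug,        // false opens a visible browser window
    slowMo: debug ? 100 : 0, // slow each Puppeteer operation by this many ms
  };
}

// Usage inside webScraper() (requires Puppeteer):
// const browser = await puppeteer.launch(buildLaunchOptions({ debug: true }));
```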

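Step 4: Use the goto method to open the website we want to scrape, wait for the element whose text we want, extract the text from that element, and log it to the console. Older versions of this code used page.waitFor, which is deprecated and no longer available in current Puppeteer releases; page.waitForSelector waits for an element matching a CSS selector and returns a handle to it.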

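await page.goto(
    'https://www.geeksforgeeks.org/explain-the-mechanism-of-event-loop-in-node-js/');
// Wait for the first <h1> element to appear and get a handle to it.
const element = await page.waitForSelector('h1');
// Read the element's text inside the page context.
const text = await page.evaluate(element => element.textContent, element);
console.log(text);
await browser.close();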
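In the Step 4 code, the browser is closed only if every earlier await succeeds, so if navigation fails or the selector never appears, the close call is skipped; in a program that catches the error and keeps running, the resources stay open. The following minimal sketch wraps the same calls in try/finally so the page is always closed. The helper name scrapeText and its parameters are hypothetical, and the browser argument is assumed to be a Puppeteer Browser returned by puppeteer.launch(). Passing the browser in also lets the scraping logic be exercised with a stand-in object.

```javascript
// Hypothetical helper: open a page, read one element's text, and always
// close the page, even when goto() or waitForSelector() throws.
async function scrapeText(browser, url, selector) {
  const page = await browser.newPage();
  try {
    await page.goto(url);
    // Wait until an element matching the selector exists.
    const element = await page.waitForSelector(selector);
    // Read its text inside the page context.
    return await page.evaluate((el) => el.textContent, element);
  } finally {
    await page.close();
  }
}

// Usage (requires Puppeteer; url is the page to scrape, e.g. the one in Step 4):
// const browser = await puppeteer.launch();
// try {
//   console.log(await scrapeText(browser, url, 'h1'));
// } finally {
//   await browser.close();
// }
```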

Example:

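const puppeteer = require('puppeteer');

async function webScraper() {
    // Launch a browser with default options.
    const browser = await puppeteer.launch({});
    try {
        const page = await browser.newPage();
        await page.goto(
            'https://www.geeksforgeeks.org/explain-the-mechanism-of-event-loop-in-node-js/');
        // Wait for the first <h1> element, then read its text in the page.
        const element = await page.waitForSelector('h1');
        const text = await page.evaluate(
            element => element.textContent, element);
        console.log(text);
    } finally {
        // Close the browser even if navigation or the selector wait fails.
        await browser.close();
    }
}

webScraper().catch(console.error);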
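The example reads a single h1 element. When a page has several elements of interest, Puppeteer's page.$$eval runs a function in the browser with an array of every element matching a selector. The minimal sketch below builds on that; scrapeAllText is a hypothetical helper name, page is assumed to be the Puppeteer Page from the example, and the h2 selector in the usage comment is only an illustration.

```javascript
// Hypothetical helper: collect the text of every element matching a selector.
async function scrapeAllText(page, selector) {
  // The callback runs in the browser and receives all matching elements.
  const texts = await page.$$eval(selector, (elements) =>
    elements.map((el) => el.textContent)
  );
  // Back in Node.js: trim whitespace and drop entries that end up empty.
  return texts.map((t) => t.trim()).filter((t) => t.length > 0);
}

// Usage inside webScraper(), after page.goto(...):
// const headings = await scrapeAllText(page, 'h2');
// console.log(headings);
```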


Steps to run the application: Save the code above in a file named app.js, open the terminal in the project directory, and run the following command:

node app.js

Output: The script prints the text of the page's first h1 element to the console.

