What is Web Scraping in Node.js?

Last Updated : 30 Mar, 2023

Web scraping means automatically collecting data such as text, images, or video from websites. It is most useful when a large amount of data has to be collected, because automating the process saves a great deal of time.

Puppeteer: Node.js has many modules for web scraping, but one of the most popular and easiest to use is Puppeteer. Puppeteer drives a Chrome or Chromium browser (headless by default) and provides many methods that make web scraping and web automation much easier. Install the module in the project directory with the following command:

npm install puppeteer

Approach:

Step 1: Require Puppeteer Module

const puppeteer = require('puppeteer');

Step 2: Make an async function

async function webScraper() {
    ...
};

webScraper();

Step 3: Inside the function, create two constants. The first, browser, holds the browser instance returned by puppeteer.launch(). The second, page, holds a new page (tab) opened in that browser, which is used for scraping.

async function webScraper() {
    const browser = await puppeteer.launch({});
    const page = await browser.newPage();
}

webScraper();

Step 4: Using the goto method, open the website to scrape. Then wait for the element whose text is needed, extract the text from that element, and log it to the console. Finally, close the browser. These lines go inside the webScraper function, after the page constant.

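await page.goto(
    'https://www.geeksforgeeks.org/explain-the-mechanism-of-event-loop-in-node-js/');
// Wait for the <h1> element and get a handle to it
const element = await page.waitForSelector('h1');
// Read the element's text inside the page context
const text = await page.evaluate(element => element.textContent, element);
console.log(text);
await browser.close();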
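
In Step 4, the call to browser.close() is only reached if every earlier line succeeds. If goto or the wait for the selector throws, the browser process may be left running. A minimal sketch of one way to guard against that is shown here. The helper name withBrowser and its launcher and task parameters are hypothetical and not part of Puppeteer. The launcher is assumed to be any object with a Puppeteer-style launch() method, such as the module returned by require('puppeteer').

```javascript
// Hypothetical helper (not part of Puppeteer): opens a browser and a page,
// runs `task(page)`, and always closes the browser afterwards.
// `launcher` is assumed to expose a Puppeteer-style launch() method.
async function withBrowser(launcher, task) {
    const browser = await launcher.launch();
    try {
        const page = await browser.newPage();
        // `return await` keeps the task inside the try block,
        // so the finally clause runs only after it settles.
        return await task(page);
    } finally {
        // Runs whether task() resolved or threw
        await browser.close();
    }
}

// Possible usage, assuming Puppeteer is installed:
// const puppeteer = require('puppeteer');
// withBrowser(puppeteer, async (page) => {
//     await page.goto('https://www.geeksforgeeks.org/explain-the-mechanism-of-event-loop-in-node-js/');
//     return page.$eval('h1', (el) => el.textContent);
// }).then(console.log);
```

Passing the launcher in as an argument is a design choice made for this sketch: it keeps the helper free of a hard dependency on the module, so it can be exercised with a stand-in object.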

Example:

const puppeteer = require('puppeteer');

async function webScraper() {
    // Launch the browser and open a new page
    const browser = await puppeteer.launch({});
    const page = await browser.newPage();

    // Open the page to scrape
    await page.goto(
        'https://www.geeksforgeeks.org/explain-the-mechanism-of-event-loop-in-node-js/');

    // Wait for the <h1> element, then read its text
    const element = await page.waitForSelector('h1');
    const text = await page.evaluate(
        element => element.textContent, element);
    console.log(text);

    await browser.close();
}

webScraper();
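
The example above reads a single element. When several elements on a page match the same selector, their text can be collected in one call. The following is a minimal sketch under stated assumptions: the helper name collectText is hypothetical, and page is assumed to be a Puppeteer Page object (it relies on the Page methods waitForSelector and $$eval).

```javascript
// Hypothetical helper: returns the trimmed text of every element that
// matches `selector`. `page` is assumed to be a Puppeteer Page object.
async function collectText(page, selector) {
    // Wait until at least one matching element is present
    await page.waitForSelector(selector);
    // $$eval passes all matching elements to the callback,
    // which runs inside the page and returns a serializable result
    return page.$$eval(selector, (elements) =>
        elements.map((el) => el.textContent.trim())
    );
}
```

Inside webScraper, it could be called as const headings = await collectText(page, 'h2'); before the browser is closed. The selector 'h2' is only an illustration.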