
What is Web Scraping in Node.js?

Web scraping means automatically collecting data such as text, images, or video from websites. It is especially useful when a large amount of data has to be collected, because automating the process saves a great deal of time.

Puppeteer: Node.js has many modules for web scraping, and one of the most popular and easiest to use is Puppeteer. Puppeteer is a library that controls a Chrome or Chromium browser from Node.js, and its methods make both web scraping and web automation much easier. We can install the module in our project directory by running the following command:



npm install puppeteer

Approach: 

Step 1: Require the Puppeteer module



const puppeteer = require('puppeteer');

Step 2: Create an async function to hold the scraping logic, then call it

async function webScraper() {
    // The scraping logic from the next steps goes here
}

webScraper();
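
The function is async because Puppeteer's methods return promises, and await lets the code read top to bottom while each browser operation completes. The sketch below illustrates only that mechanism and does not use Puppeteer at all: fakeBrowserCall is a hypothetical stand-in for a real browser call, and the returned string is made-up sample data.

```javascript
// Hypothetical stand-in for a Puppeteer call: resolves with `value` after `delayMs`.
function fakeBrowserCall(value, delayMs) {
    return new Promise(resolve => setTimeout(() => resolve(value), delayMs));
}

async function webScraper() {
    // await suspends this function (not the whole program) until the promise settles
    const heading = await fakeBrowserCall('Example heading', 10);
    return heading;
}

// An async function always returns a promise, so errors can be handled with .catch()
const result = webScraper();
result
    .then(text => console.log(text))
    .catch(err => console.error('Scraping failed:', err.message));
```

Attaching a .catch() to the call is one way to make sure a failed navigation or a missing element is reported instead of surfacing as an unhandled promise rejection.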

Step 3: Inside the function, create two constants. The first, browser, holds the browser instance that Puppeteer launches. The second, page, holds a new page (tab) opened in that browser, which we will use for scraping.

async function webScraper() {
    // Launch the browser that Puppeteer controls
    const browser = await puppeteer.launch({});
    // Open a new page (tab) to use for scraping
    const page = await browser.newPage();
}

webScraper();
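
A launched browser keeps running until it is closed, so if a later step throws, the Node.js process may not exit. One possible way to guard against that is a try/finally wrapper, sketched below. The withPage helper is hypothetical and not part of Puppeteer; it only assumes that the launcher passed in exposes launch(), and that the browser exposes newPage() and close(), as the puppeteer module does.

```javascript
// Hypothetical helper (not a Puppeteer API): opens a browser and a page,
// runs `task(page)`, and closes the browser even if the task throws.
// `launcher` is assumed to expose launch() like the puppeteer module.
async function withPage(launcher, task) {
    const browser = await launcher.launch();
    try {
        const page = await browser.newPage();
        return await task(page);
    } finally {
        // Runs on success and on failure alike
        await browser.close();
    }
}

// Possible usage once Puppeteer is installed (`url` is a placeholder):
//   const puppeteer = require('puppeteer');
//   withPage(puppeteer, page => page.goto(url));
```

Passing the launcher in as an argument also makes the helper easy to exercise with a stub object instead of a real browser.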

Step 4: Use the goto method to open the website we want to scrape, select the element whose text we want, extract the text from that element, and log it to the console. Finally, close the browser.

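// Open the page to scrape
await page.goto(
    'https://www.geeksforgeeks.org/explain-the-mechanism-of-event-loop-in-node-js/amp/');
// Wait until the <h1> element exists, then read its text inside the page
const element = await page.waitForSelector('h1');
const text = await page.evaluate(element => element.textContent, element);
console.log(text);
// Close the browser so the Node.js process can exit
await browser.close();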
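
The same pattern extends to several elements at once. As a minimal sketch, the hypothetical scrapeText helper below uses Puppeteer's page.$$eval method, which runs a callback inside the page with every element matching a selector. The helper name, its parameters, and the choice of the 'domcontentloaded' wait condition are assumptions made for illustration.

```javascript
// Hypothetical helper: returns the trimmed text of every element matching `selector`.
// `page` is assumed to be a Puppeteer Page (or any object with the same three methods).
async function scrapeText(page, url, selector) {
    // Assumption: waiting for the DOM to be parsed is enough for this page
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    // Make sure at least one matching element exists before reading
    await page.waitForSelector(selector);
    // $$eval runs the callback in the browser with all matching elements
    return page.$$eval(selector, els => els.map(el => el.textContent.trim()));
}

// Possible usage inside webScraper(), with `url` as a placeholder:
//   const headings = await scrapeText(page, url, 'h2');
//   console.log(headings);
```

The result is a plain array of strings, since only serializable values can be returned from the page to Node.js.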

Example:




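const puppeteer = require('puppeteer');

async function webScraper() {
    // Launch a browser and open a new page (tab)
    const browser = await puppeteer.launch({});
    const page = await browser.newPage();

    // Open the page to scrape
    await page.goto(
        'https://www.geeksforgeeks.org/explain-the-mechanism-of-event-loop-in-node-js/amp/');

    // Wait for the <h1> element, then read its text inside the page
    const element = await page.waitForSelector('h1');
    const text = await page.evaluate(
        element => element.textContent, element);
    console.log(text);

    // Close the browser so the Node.js process can exit
    await browser.close();
}

webScraper();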

Steps to run the application: Save the code in a file named app.js, then open the terminal and run the following command.

node app.js

Output: The script prints the text content of the page's h1 element to the console.
