Create a Newsletter by Sourcing Data Using MongoDB
Many news websites, such as ndtv.com, are available. In this article, let us see a useful and interesting feature: how to get data from ndtv.com by scraping, i.e. extracting the contents of ndtv.com and storing them in MongoDB. MongoDB is a NoSQL document-model database.
Module Installation: Install the required modules using the following commands.
npm install body-parser
npm install cheerio
npm install express
npm install express-handlebars
npm install mongoose
npm install request
Project Structure: It will look like this.
Filename: server.js: This is the main file required to start the app. It calls the ndtv.com site, scrapes the data, and stores it in the MongoDB database.
Steps to run the application: Run the server.js file using the following command.
node server.js
Output: We will see the following output on the terminal screen.
App is running
Now open any browser and go to http://localhost:3000/. A page similar to the one below is displayed.
To get the news from ndtv.com, click on Get New Articles. This internally calls our /scrape path. Once this call is done, the articles collection under the ndtvnews database in MongoDB is filled with the data as shown below:
Here, the saved attribute is initially false. The _id attribute is created automatically by MongoDB and is the unique identification of a document in a collection. It is this attribute that the app uses to view a document, save a document, etc.
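To make the shape of a stored document concrete, here is a minimal sketch of how scraped items could be prepared before they are inserted into the articles collection. It is not the original server.js code: the helper name toArticleDocs, the BASE_URL constant, and the sample headlines are assumptions made for illustration. The sketch uses only built-in Node.js features and leaves _id out, since MongoDB generates it on insert.

```javascript
// Hypothetical helper, not taken from the original server.js.
// BASE_URL is an assumption used to resolve relative links.
const BASE_URL = 'https://www.ndtv.com';

// Turn raw scraped { title, link } pairs into documents ready for insertion.
// Each document starts with saved: false; _id is left for MongoDB to create.
function toArticleDocs(rawItems) {
  const seenLinks = new Set();
  const docs = [];
  for (const item of rawItems) {
    const title = (item.title || '').trim();
    const href = (item.link || '').trim();
    if (!title || !href) continue; // skip incomplete entries

    let link;
    try {
      link = new URL(href, BASE_URL).href; // make relative links absolute
    } catch (err) {
      continue; // skip links that cannot be parsed
    }

    if (seenLinks.has(link)) continue; // keep one document per link
    seenLinks.add(link);
    docs.push({ title, link, saved: false });
  }
  return docs;
}

// Sample input with made-up headlines: a duplicate and an entry without a title.
const docs = toArticleDocs([
  { title: '  First headline ', link: '/india/first-headline' },
  { title: 'First headline', link: 'https://www.ndtv.com/india/first-headline' },
  { title: '', link: '/no-title' },
]);
console.log(docs);
```

Dropping duplicates by link before inserting is one possible design choice. It keeps the collection from growing with repeated entries each time Get New Articles is clicked.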
Clicking on View article on NDTV navigates to the respective article. This is possible because of the _id attribute present in the articles collection. As View article on NDTV is a hyperlink, the _id value of that document is picked up internally and the link is displayed. When Save article is clicked, the _id value again identifies the article to be saved.
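The role of _id in the save flow can be illustrated with a small sketch. It uses an in-memory Map as a stand-in for the articles collection so that it runs without a database. The function names, the ids, and the sample links are placeholders invented for this example. In the real app the same step would presumably be a Mongoose update by id (for example findByIdAndUpdate), but since server.js is not shown here, that is an assumption.

```javascript
// In-memory stand-in for the articles collection, keyed by _id.
// The ids and links are made-up placeholders, not real ObjectIds or NDTV URLs.
const articles = new Map([
  ['id-1', { _id: 'id-1', title: 'First headline', link: 'https://www.ndtv.com/example-1', saved: false }],
  ['id-2', { _id: 'id-2', title: 'Second headline', link: 'https://www.ndtv.com/example-2', saved: false }],
]);

// Mark the article with the given _id as saved.
// Returns the updated document, or null when no document has that _id.
function saveArticle(collection, id) {
  const doc = collection.get(id);
  if (!doc) return null;
  doc.saved = true;
  return doc;
}

// Return only the documents whose saved flag is true.
function savedArticles(collection) {
  return [...collection.values()].filter((doc) => doc.saved);
}

// Clicking Save article on the second entry would pass its _id along.
saveArticle(articles, 'id-2');
const savedTitles = savedArticles(articles).map((doc) => doc.title);
console.log(savedTitles);
```

Because every lookup goes through _id, two articles with the same title can still be saved independently.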
Working: The entire working model of the project is explained in the video:
Conclusion: With this approach, it is simple to scrape a news website and display only the article titles, each with a link to the full article. We can also save an article and check the saved articles easily.