Web scraping is a powerful technique for gathering information from websites. To scrape multiple URLs, we can use a Python library called Newspaper3k. Newspaper3k is a Python package for scraping and extracting articles; it is built on top of requests for downloading pages and lxml for parsing them. It is a modified and improved version of the Newspaper module, which serves the same purpose.
To install this module, type the command below in the terminal.
pip install newspaper3k
- First we will define a list containing the URLs or assign a single URL.
- We will create an Article object, passing in the URL and optional parameters such as language='en' for English.
- We will then download and parse the article.
- Finally, display the data extracted.
Below are some examples based on the above approach:
Below is a program to scrape data from a given URL.
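A minimal sketch of the single-URL case, following the steps listed above. The function name `scrape_article`, the returned dictionary layout, and the `article_cls` parameter are illustrative assumptions rather than part of Newspaper3k; `article_cls` exists only so the function can be exercised without network access. The URL in the usage comment is a placeholder.

```python
def scrape_article(url, language="en", article_cls=None):
    """Download and parse one article, returning its main fields as a dict.

    article_cls is a hypothetical hook for substituting a stand-in class;
    by default the real newspaper.Article is used.
    """
    if article_cls is None:
        # Imported here so the function can be defined even where
        # newspaper3k is not installed.
        from newspaper import Article
        article_cls = Article

    # Create the Article object from the URL and the optional language code.
    article = article_cls(url, language=language)

    article.download()  # fetch the raw HTML of the page
    article.parse()     # extract the title, body text, and other fields

    return {
        "url": url,
        "title": article.title,
        "text": article.text,
    }


# Example usage (needs network access and newspaper3k installed):
# data = scrape_article("https://example.com/some-article")  # placeholder URL
# print("Title:", data["title"])
# print("Text:", data["text"])
```

Returning a dictionary instead of printing inside the function keeps the download step separate from the display step, so the same function can be reused when looping over several URLs.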
Here, we scrape data from multiple URLs and then display it.
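The multi-URL case can be sketched as a loop over a list, applying the same create, download, and parse steps to each entry. The name `scrape_many`, the `article_cls` parameter, and the choice to skip URLs that fail are assumptions made for illustration; the URLs in the usage comment are placeholders.

```python
def scrape_many(urls, language="en", article_cls=None):
    """Scrape each URL in turn and return a list of result dicts.

    article_cls is a hypothetical hook for substituting a stand-in class;
    by default the real newspaper.Article is used.
    """
    if article_cls is None:
        # Imported here so the function can be defined even where
        # newspaper3k is not installed.
        from newspaper import Article
        article_cls = Article

    results = []
    for url in urls:
        article = article_cls(url, language=language)
        try:
            article.download()  # fetch the raw HTML of the page
            article.parse()     # extract the title and body text
        except Exception as exc:
            # Catching every exception is a simplification: one bad URL
            # is reported and skipped so the rest are still processed.
            print(f"Skipping {url}: {exc}")
            continue
        results.append({"url": url, "title": article.title, "text": article.text})
    return results


# Example usage (needs network access and newspaper3k installed):
# urls = [
#     "https://example.com/first-article",   # placeholder URL
#     "https://example.com/second-article",  # placeholder URL
# ]
# for item in scrape_many(urls):
#     print("URL:", item["url"])
#     print("Title:", item["title"])
#     print("Text:", item["text"])
#     print("-" * 40)
```

Skipping failed URLs is one possible policy; collecting the errors alongside the results would work equally well if the caller needs to know which pages could not be read.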