Scraping websites with Newspaper3k in Python
  • Difficulty Level : Easy
  • Last Updated : 29 Dec, 2020

Web scraping is a powerful technique for gathering information from a website. To scrape one or more article URLs, we can use a Python library called Newspaper3k. Newspaper3k is a package for scraping and extracting the content of articles. It is built on top of requests for fetching pages and lxml for parsing them. It is the Python 3 version of the original Newspaper module, which serves the same purpose.

Installation:

To install this module, type the command below in the terminal.

pip install newspaper3k

Step-by-step Approach:

  1. First we will define a list containing the URLs or assign a single URL. 
  2. We will create an Article object, passing in the URL and optional parameters such as language='en' for English.
  3. We will then download and parse the file.
  4. Finally, display the data extracted.

Below are some examples based on the above approach:

Example 1

Below is a program to scrape data from a given URL.

Python3

# Import required module
import newspaper

# Assign url (placeholder; replace with the address of the article to scrape)
url = 'https://example.com/some-article'

# Extract web data
url_i = newspaper.Article(url=url, language='en')
url_i.download()
url_i.parse()

# Display scraped data
print(url_i.text)
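
Besides the body text, a parsed Article object also exposes fields such as title, authors and publish_date. The sketch below shows one way to collect them into a dictionary. The helper names summarize_article and fetch_article are hypothetical (they are not part of Newspaper3k), and the 200-character preview length is an arbitrary choice.

```python
def summarize_article(article, preview_chars=200):
    """Collect a few fields from an Article that has already been parsed.

    `summarize_article` is a hypothetical helper, not a Newspaper3k function.
    """
    return {
        "title": article.title,
        "authors": article.authors,
        "publish_date": article.publish_date,
        # Keep only the beginning of the text so the summary stays short
        "text_preview": article.text[:preview_chars],
    }


def fetch_article(url, language="en"):
    """Download and parse one article (hypothetical wrapper around Article)."""
    # Imported here so summarize_article can be used without the library
    import newspaper

    article = newspaper.Article(url=url, language=language)
    article.download()
    article.parse()
    return article
```

Calling summarize_article(fetch_article(url)) with the address of a real article would then return the dictionary. Depending on the page, some fields may be empty, for example when no author or publication date can be detected.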

Example 2

Here, we scrape data from multiple URLs and then display it.

Python3

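# Import required modules
import newspaper

# Define list of urls (placeholders; replace with the articles to scrape)
list_of_urls = ['https://example.com/article-1',
                'https://example.com/article-2']

# Parse through each url and display its content
for url in list_of_urls:
    url_i = newspaper.Article(url=url, language='en')
    url_i.download()
    url_i.parse()
    print(url_i.text)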
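
When scraping several URLs, a single unreachable page would otherwise stop the whole loop with an exception. The sketch below is one possible way to keep going and record which URLs failed. The names scrape_many and newspaper_text are hypothetical helpers, not part of Newspaper3k, and catching every Exception is a deliberately broad assumption made to keep the example short.

```python
def scrape_many(urls, fetch_text):
    """Fetch the text of each URL, collecting failures instead of stopping.

    Returns (texts, errors): texts maps URL -> article text,
    errors maps URL -> error message. Hypothetical helper.
    """
    texts, errors = {}, {}
    for url in urls:
        try:
            texts[url] = fetch_text(url)
        except Exception as exc:  # broad on purpose: one bad URL should not end the loop
            errors[url] = str(exc)
    return texts, errors


def newspaper_text(url, language="en"):
    """Download and parse one article with Newspaper3k and return its text."""
    # Imported here so scrape_many can be used with any fetch function
    import newspaper

    article = newspaper.Article(url=url, language=language)
    article.download()
    article.parse()
    return article.text
```

With this in place, texts, errors = scrape_many(list_of_urls, newspaper_text) processes every URL in the list, and the errors dictionary can be inspected afterwards to retry or report the pages that could not be scraped.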
