Scraping websites with Newspaper3k in Python

  • Difficulty Level : Easy
  • Last Updated : 08 Sep, 2021

Web scraping is a powerful way to gather information from websites. To scrape articles from one or more URLs, we can use a Python library called Newspaper3k. It is built on top of requests for fetching pages and uses lxml for parsing them. Newspaper3k is an updated version of the original Newspaper module, ported to Python 3, and serves the same purpose.

Installation:

To install this module, run the following command in the terminal:

pip install newspaper3k

Step-by-step Approach:

  1. First, define a list containing the URLs, or assign a single URL to a variable.
  2. Create an Article object, passing the URL and any optional parameters, such as language='en' for English.
  3. Download and parse the page.
  4. Finally, display the extracted data.

Below are some examples based on the above approach:



Example 1

Below is a program to scrape data from a given URL.

Python3




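# Import required module
import newspaper

# Assign the URL (placeholder; substitute the article you want to scrape)
url = "https://example.com/some-article"

# Create the Article object, then download and parse the page
article = newspaper.Article(url=url, language='en')
article.download()
article.parse()

# Display the scraped text
print(article.text)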
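
Besides text, a parsed Article object also exposes fields such as title, authors, and publish_date. The following is a minimal sketch of collecting a few of them into a dictionary. The helper name article_summary and the word_count key are hypothetical and not part of Newspaper3k; the function only assumes it receives an object that has already been downloaded and parsed.

```python
def article_summary(article):
    """Collect a few fields from a parsed Article-like object.

    `article_summary` is a hypothetical helper, not part of Newspaper3k.
    It only reads the attributes title, authors, publish_date and text.
    """
    return {
        "title": article.title,
        "authors": list(article.authors),
        "publish_date": article.publish_date,  # may be None if not detected
        "word_count": len(article.text.split()),  # whitespace-separated words
    }


# Usage, assuming `article` was downloaded and parsed as in Example 1:
#   print(article_summary(article))
```

Returning a plain dictionary keeps the result easy to print or serialize, and the helper stays independent of how the page was fetched.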

Example 2

Here, we scrape data from multiple URLs and then display it.

Python3




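# Import required module
import newspaper

# Define the list of URLs (placeholders; substitute real article URLs)
list_of_urls = [
    "https://example.com/first-article",
    "https://example.com/second-article",
]

# Download and parse each URL, then display its content
for url in list_of_urls:
    article = newspaper.Article(url=url, language='en')
    article.download()
    article.parse()
    print(article.text)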
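
When scraping many URLs, one unreachable or unparsable page would typically raise an exception and stop the loop. The following is a rough sketch of one way to keep going, under the assumption that skipping failed URLs is acceptable. The names fetch_article_text, scrape_many, and the fetch parameter are hypothetical and not part of Newspaper3k; only Article, download(), parse(), and text come from the library.

```python
def fetch_article_text(url, language="en"):
    """Download and parse one URL with Newspaper3k and return its text."""
    import newspaper  # imported here so scrape_many can be used without it

    article = newspaper.Article(url=url, language=language)
    article.download()
    article.parse()
    return article.text


def scrape_many(urls, fetch=fetch_article_text):
    """Return a dict mapping each URL to its text, or None if it failed.

    `scrape_many` and `fetch` are hypothetical names for this sketch.
    """
    results = {}
    for url in urls:
        try:
            results[url] = fetch(url)
        except Exception as exc:  # broad on purpose: network or parse errors
            print(f"Skipping {url}: {exc}")
            results[url] = None
    return results


# Usage, assuming `list_of_urls` is defined as in Example 2:
#   texts = scrape_many(list_of_urls)
```

Passing the fetch function as a parameter is a design choice that makes the loop easy to try out with a stand-in function before pointing it at real pages.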
