Scraping websites with Newspaper3k in Python
  • Last Updated : 29 Dec, 2020

Web scraping is a powerful way to gather information from websites. To scrape articles from one or more URLs, we can use Newspaper3k, a Python library for downloading and extracting article content. It is built on top of requests for fetching pages and lxml for parsing them. Newspaper3k is the Python 3 version of the earlier Newspaper module, which served the same purpose.

Installation:

To install the module, run the following command in the terminal:

pip install newspaper3k

Step-by-step Approach:

  1. First, define a list containing the URLs, or assign a single URL to a variable.
  2. Create an Article object, passing the URL and any optional parameters, such as language='en' for English.
  3. Download and parse the article.
  4. Finally, display the extracted data.
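
The four steps above can be sketched as one small helper. This is a minimal illustration, not the library's prescribed usage: the function name `scrape_article` and its `article_cls` parameter are assumptions introduced here (the latter only so the helper can be tried without network access), while `Article`, `download()`, `parse()`, `title`, and `text` come from Newspaper3k.

```python
def scrape_article(url, language="en", article_cls=None):
    """Download and parse one article, returning its title and text.

    `scrape_article` and `article_cls` are hypothetical names used for
    illustration; by default the real newspaper.Article class is used.
    """
    if article_cls is None:
        # Imported lazily so newspaper is only required for real downloads.
        from newspaper import Article
        article_cls = Article

    # Step 2: create the Article object for the URL and language.
    article = article_cls(url=url, language=language)
    # Step 3: fetch the HTML, then extract the article content from it.
    article.download()
    article.parse()
    # Step 4: hand back the extracted data.
    return {"url": url, "title": article.title, "text": article.text}
```

Calling `scrape_article(url)` with the URL of a real article returns a dictionary whose `"text"` entry holds the extracted body, which can then be printed or stored.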

Below are some examples based on the above approach:

Example 1

Below is a program to scrape data from a given URL.



Python3

# Import required module
import newspaper

# Assign url (placeholder; replace with the URL of the article to scrape)
url = "https://example.com/article"

# Extract web data
article = newspaper.Article(url=url, language='en')
article.download()
article.parse()

# Display scraped data
print(article.text)



Output: the program prints the extracted text of the article.

Example 2

Here, we scrape data from multiple URLs and then display it.

Python3

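# Import required module
import newspaper

# Define list of urls (placeholders; replace with the article URLs to scrape)
list_of_urls = [
    "https://example.com/article-1",
    "https://example.com/article-2",
]

# Parse through each url and display its content
for url in list_of_urls:
    article = newspaper.Article(url=url, language='en')
    article.download()
    article.parse()
    print(article.text)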



Output: the program prints the extracted text of each article in turn.
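
When looping over many URLs, one unreachable or unparsable page would otherwise stop the whole run. The sketch below is one possible way to harden the loop, not something Newspaper3k requires: it records failures and carries on. The names `fetch_with_newspaper`, `scrape_many`, and `fetch_text` are hypothetical and introduced only for this illustration.

```python
def fetch_with_newspaper(url, language="en"):
    """Return the article text for one URL using Newspaper3k."""
    import newspaper  # imported here so scrape_many works with any fetcher

    article = newspaper.Article(url=url, language=language)
    article.download()
    article.parse()
    return article.text


def scrape_many(urls, fetch_text=fetch_with_newspaper):
    """Fetch every URL, separating successes from failures.

    Returns (texts, errors): two dictionaries keyed by URL.
    """
    texts, errors = {}, {}
    for url in urls:
        try:
            texts[url] = fetch_text(url)
        except Exception as exc:  # broad on purpose: keep going on any failure
            errors[url] = str(exc)
    return texts, errors
```

Passing the fetcher as a parameter keeps the loop independent of the download step, so the same function can be exercised with a stand-in fetcher before it is pointed at real URLs.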

