Create GUI to Web Scrape articles in Python
Last Updated: 29 Dec, 2020

Prerequisite: GUI Application using Tkinter

In this article, we will write scripts that extract information from the article at a given URL, such as its title, meta description, and main text, and then display that information in a Tkinter GUI.

We will use the Goose module, which is installed as the goose3 package.

Goose can extract the following information from a page:

  • The main text of an article.
  • Main image of the article.
  • Any YouTube/Vimeo movies embedded in the article.
  • Meta Description.
  • Meta tags.
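
To make the last two items more concrete: the title and meta description typically correspond to tags in the head of the page's HTML. The following is a minimal standard-library sketch that reads those two tags from an HTML string. It is not how Goose is implemented; the class name `HeadInfoParser` and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class HeadInfoParser(HTMLParser):
    """Collects the <title> text and the description <meta> tag (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta_description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.meta_description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text that appears between <title> and </title> is kept
        if self._in_title:
            self.title += data


# Made-up sample page, so the sketch runs without network access
sample_html = """
<html><head>
<title>Sample article</title>
<meta name="description" content="A short summary of the page.">
</head><body><p>Body text.</p></body></html>
"""

parser = HeadInfoParser()
parser.feed(sample_html)
print(parser.title)
print(parser.meta_description)
```

Goose does considerably more than this, for example locating the main body text, which is why the rest of the article relies on it rather than on hand-written parsing.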

To start, install the required module with the following command:



pip install goose3

Approach

  • Import the module.
  • Create an article object, obj, with the Goose().extract(URL) function.
  • Get the title with the obj.title attribute.
  • Get the meta description with the obj.meta_description attribute.
  • Get the article text with the obj.cleaned_text attribute.

Implementation

Step 1: Initializing the requirements.

Python3

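# import module
from goose3 import Goose
  
# URL of the article to scrape
# (placeholder; replace with a real article URL)
url = "https://example.com/article"
  
# create a Goose object and extract the article
article = Goose().extract(url)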

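Since `extract()` is handed whatever string `url` holds, it can help to check that string first. Below is a minimal sketch, assuming only http and https pages are of interest. The helper name `looks_like_web_url` is hypothetical and not part of goose3.

```python
from urllib.parse import urlparse


def looks_like_web_url(text):
    """Return True if text parses as an http(s) URL with a host name."""
    try:
        parts = urlparse(text.strip())
    except ValueError:
        # urlparse rejects some malformed inputs, e.g. a broken IPv6 host
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(looks_like_web_url("https://example.com/post"))
print(looks_like_web_url("example.com/post"))
```

A check like this only looks at the shape of the string. It does not confirm that the page exists or that it contains an article.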


Step 2: Extracting the title.

Python3

print("Title of the article :\n",article.title)


Output:

[Screenshot: title extracted from the article]

Step 3: Extracting meta information

Python3



print("Meta information :\n",article.meta_description)


Output:

[Screenshot: meta description extracted from the article]

Step 4: Extracting the article text

Python3

print("Article Text :\n",article.cleaned_text[:300])

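Slicing with `[:300]` can cut the text in the middle of a word. One alternative, sketched here with the standard library's `textwrap.shorten`, collapses whitespace and truncates at a word boundary. The helper name `preview` and the sample string are assumptions made for illustration.

```python
import textwrap


def preview(text, limit=300):
    """Collapse whitespace and shorten text to at most `limit` characters."""
    return textwrap.shorten(text, width=limit, placeholder="...")


# Made-up stand-in for article.cleaned_text
sample = "Goose extracts the main body of an article.\n\nIt also reads metadata."
print(preview(sample, limit=40))
```

With a real article, the call would be `preview(article.cleaned_text)` in place of the slice.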


Output:

[Screenshot: article text extracted from the article]

Step 5: Visualizing using Tkinter

Python3

# import modules
from tkinter import *
from goose3 import Goose
  
# fetch the article and update the labels
def info():
    article = Goose().extract(e1.get())
    title.set(article.title)
    meta.set(article.meta_description)
    # show the first 150 characters as a single line of text
    string = article.cleaned_text[:150]
    art_dec.set(" ".join(string.split()))
      
# object of tkinter
# and background set to grey
master = Tk()
master.configure(bg='light grey')
  
# Variable Classes in tkinter
title = StringVar()
meta = StringVar()
art_dec = StringVar()
  
# Creating a label that names each field
# using the Label widget
Label(master, text="Website URL : ",
      bg = "light grey").grid(row=0, sticky=W)
Label(master, text="Title :",
      bg = "light grey").grid(row=3, sticky=W)
Label(master, text="Meta information :",
      bg = "light grey").grid(row=4, sticky=W)
Label(master, text="Article description :",
      bg = "light grey").grid(row=5, sticky=W)
  
# Creating labels bound to the variable classes
# using the Label widget; they show the extracted values
Label(master, text="", textvariable=title,
      bg = "light grey").grid(row=3,column=1, sticky=W)
Label(master, text="", textvariable=meta,
      bg = "light grey").grid(row=4,column=1, sticky=W)
Label(master, text="", textvariable=art_dec,
      bg = "light grey").grid(row=5,column=1, sticky=W)
  
e1 = Entry(master, width = 100)
e1.grid(row=0, column=1)
  
# creating a button using the Button widget
# to call the info function
b = Button(master, text="Show", command=info, bg="Blue")
b.grid(row=0, column=2, columnspan=2, rowspan=2, padx=5, pady=5)
  
mainloop()

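The `info()` callback above mixes extraction with widget updates. One possible refactoring pulls the string handling into a plain function that can be checked without opening a window. This sketch assumes the article object exposes `title`, `meta_description` and `cleaned_text` as used above. The function name `fields_for_display` and the `SimpleNamespace` stand-in are illustrative and not part of goose3 or Tkinter.

```python
from types import SimpleNamespace


def fields_for_display(article, max_chars=150):
    """Return the three strings the GUI shows, with missing values as ''."""
    text = getattr(article, "cleaned_text", "") or ""
    return {
        "title": getattr(article, "title", "") or "",
        "meta": getattr(article, "meta_description", "") or "",
        # join lines with spaces so the label shows one plain string
        "description": " ".join(text[:max_chars].split()),
    }


# Stand-in for a goose3 article so the sketch runs offline
fake_article = SimpleNamespace(
    title="Sample title",
    meta_description=None,
    cleaned_text="First line.\n\nSecond line.",
)
print(fields_for_display(fake_article))
```

Inside `info()`, the returned values could then be passed to `title.set()`, `meta.set()` and `art_dec.set()`.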


Output:

[Screenshot: Tkinter window showing the information extracted from the article]
