In this article, we will write a Python script to extract author information from a GeeksforGeeks article.
Modules needed
- bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. This module does not come built in with Python. To install it, type the command below in the terminal.
pip install bs4
- requests: Requests lets you send HTTP/1.1 requests with very little code. This module also does not come built in with Python. To install it, type the command below in the terminal.
pip install requests
Approach:
- Import the modules
- Create a function that requests a URL and returns the response text
- Initialize the article title
- Pass the URL into getdata()
- Scrape the data with the help of requests and Beautiful Soup
- Find the required details and filter them.
Stepwise execution of the script:
Step 1: Import all dependencies
Python
import requests
from bs4 import BeautifulSoup
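Step 2: Create a function that fetches a URL and returns the page content as text
Python3
def getdata(url):
    # Send a GET request and return the response body as text
    r = requests.get(url)
    return r.text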
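The getdata() function above returns whatever the server sends back, including error pages. A slightly more defensive variant is sketched below, assuming a 10-second timeout is acceptable; the `timeout` and `session` parameters are hypothetical additions that are not part of the original script.

```python
import requests


def getdata(url, timeout=10, session=None):
    """Return the body of `url` as text, raising on HTTP error codes."""
    # `session` is a hypothetical injection point: pass a requests.Session
    # (or a stand-in object with a .get method) to reuse connections or to
    # exercise the function without network access.
    http = session if session is not None else requests
    response = http.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of
    # silently returning the HTML of an error page.
    response.raise_for_status()
    return response.text
```

With this version, a missing article surfaces as an exception at the fetch step rather than as a confusing parsing failure later on.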
Step 3: Append the article name to the site URL, pass the URL to the getdata() function, and parse the returned HTML with Beautiful Soup.
Python3
article = "optparse-module-in-python"
# Build the full article URL from the site root and the article name
url = "https://www.geeksforgeeks.org/" + article
htmldata = getdata(url)
soup = BeautifulSoup(htmldata, 'html.parser')
print(soup)
Output: the parsed HTML of the article page is printed.

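Building the URL by string concatenation works for a clean article name. As a minimal sketch, the same step can be done with the standard library so that stray slashes or unusual characters in the name are handled; `build_article_url` and `BASE_URL` are hypothetical names introduced here for illustration.

```python
from urllib.parse import quote, urljoin

# Site root assumed from the article being scraped.
BASE_URL = "https://www.geeksforgeeks.org/"


def build_article_url(slug, base=BASE_URL):
    """Join the site root and an article name into a full URL."""
    # Strip leading/trailing slashes and percent-encode unsafe characters.
    return urljoin(base, quote(slug.strip("/")))


print(build_article_url("optparse-module-in-python"))
# https://www.geeksforgeeks.org/optparse-module-in-python
```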
Step 4: Extract the author’s handle from the HTML document.
Python
# Iterating over the matched <div> walks through its child nodes
for i in soup.find('div', class_="author_handle"):
    Author = i.get_text()
    print(Author)
Output:
kumar_satyam
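If the page has no element with the `author_handle` class, `soup.find()` returns `None` and the loop above fails. The sketch below shows one way to guard against that; it runs on a small hand-written HTML snippet (an assumption about the page structure, not the real page), and `extract_author_handle` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup


def extract_author_handle(html):
    """Return the text of the first div.author_handle, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("div", class_="author_handle")
    if tag is None:
        # The class was not found, e.g. because the page layout changed.
        return None
    # strip=True removes surrounding whitespace and newlines.
    return tag.get_text(strip=True)


# Hand-written sample markup, used only for illustration.
sample = '<div class="author_handle"><a href="#">kumar_satyam</a></div>'
print(extract_author_handle(sample))  # kumar_satyam
```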
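Step 5: Now build the profile URL from the author name and fetch its HTML.
Python3
# Assumed profile URL pattern (the original listing omits this line);
# adjust it if the site uses a different layout
profile = "https://auth.geeksforgeeks.org/user/" + Author + "/profile"
htmldata = getdata(profile)
soup = BeautifulSoup(htmldata, 'html.parser')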
Step 6: Extract the author’s information.
Python3
# The div that also carries the medText class holds the author's display name
name = soup.find(
    'div', class_='mdl-cell mdl-cell--9-col mdl-cell--12-col-phone textBold medText').get_text()
# The remaining textBold divs hold the other profile details
author_info = []
for item in soup.find_all('div', class_='mdl-cell mdl-cell--9-col mdl-cell--12-col-phone textBold'):
    author_info.append(item.get_text())
print("Author name :", name)
print("Author information :")
print(author_info)
Output:
Author name : Satyam Kumar
Author information :
['LNMI patna', '\nhttps://www.linkedin.com/in/satyam-kumar-174273101/']
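The scraped values can carry leading newlines, as the LinkedIn entry in the output shows. A small clean-up pass, sketched below with a hypothetical `clean_info` helper, strips the whitespace and drops empty entries.

```python
def clean_info(items):
    """Strip surrounding whitespace from each entry and drop empty ones."""
    cleaned = (item.strip() for item in items)
    return [item for item in cleaned if item]


# Values copied from the output shown above.
raw = ["LNMI patna", "\nhttps://www.linkedin.com/in/satyam-kumar-174273101/"]
print(clean_info(raw))
# ['LNMI patna', 'https://www.linkedin.com/in/satyam-kumar-174273101/']
```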
Complete code:
Python3
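import requests
from bs4 import BeautifulSoup


def getdata(url):
    # Send a GET request and return the response body as text
    r = requests.get(url)
    return r.text


# Fetch and parse the article page
article = "optparse-module-in-python"
url = "https://www.geeksforgeeks.org/" + article
htmldata = getdata(url)
soup = BeautifulSoup(htmldata, 'html.parser')

# Extract the author's handle
for i in soup.find('div', class_="author_handle"):
    Author = i.get_text()

# Fetch and parse the author's profile page
# (assumed URL pattern; the original listing omits this line)
profile = "https://auth.geeksforgeeks.org/user/" + Author + "/profile"
htmldata = getdata(profile)
soup = BeautifulSoup(htmldata, 'html.parser')

# Extract the display name and the remaining profile details
name = soup.find(
    'div', class_='mdl-cell mdl-cell--9-col mdl-cell--12-col-phone textBold medText').get_text()
author_info = []
for item in soup.find_all('div', class_='mdl-cell mdl-cell--9-col mdl-cell--12-col-phone textBold'):
    author_info.append(item.get_text())
print("Author name :", name)
print("Author information :")
print(author_info)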
Output:
Author name : Satyam Kumar
Author information :
['LNMI patna', '\nhttps://www.linkedin.com/in/satyam-kumar-174273101/']