Extract Author’s Information from a GeeksforGeeks Article using Python

  • Last Updated : 25 Aug, 2021

In this article, we write a Python script that extracts an author's information from a GeeksforGeeks article.

Modules needed

  • bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. It is not part of the Python standard library. To install it, run the command below in the terminal.
pip install bs4
  • requests: Requests lets you send HTTP/1.1 requests with very little code. It is also not part of the Python standard library. To install it, run the command below in the terminal.
pip install requests

Approach:

  • Import the modules
  • Write a getdata() function that makes a GET request with requests
  • Set the article title
  • Build the article URL and pass it to getdata()
  • Parse the returned HTML with Beautiful Soup
  • Find the required details and filter them

Stepwise implementation:



Step 1: Import all dependencies

Python




# import module
import requests
from bs4 import BeautifulSoup

 
Step 2: Create a function that fetches a URL

Python3




# fetch the HTML of a page
# by making a GET request
def getdata(url):
    r = requests.get(url)
    return r.text
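
The function above returns whatever body the server sends, including the body of an error page such as a 404. A slightly more defensive variant might look like the sketch below. The name getdata_safe and the 10-second default timeout are assumptions made for illustration; they are not part of the original script.

```python
import requests


def getdata_safe(url, timeout=10):
    """Hypothetical variant of getdata() that fails loudly on HTTP errors."""
    # timeout keeps the call from hanging forever on an unresponsive server
    response = requests.get(url, timeout=timeout)
    # raise requests.HTTPError for 4xx/5xx responses
    # instead of returning the HTML of an error page
    response.raise_for_status()
    return response.text
```

With this variant, a mistyped article name surfaces as an exception at the request step rather than as a confusing parsing failure later on.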

Step 3: Now merge the article name into the URL, pass the URL to the getdata() function, and parse the returned data as HTML.

Python3




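# input article by geek
article = "optparse-module-in-python"

# url (assumed pattern: site root followed by the article slug)
url = "https://www.geeksforgeeks.org/" + article

# pass the url
# into getdata function
htmldata = getdata(url)
soup = BeautifulSoup(htmldata, 'html.parser')

# display html code
print(soup)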

Output: the complete HTML of the article page is printed.



Step 4: Extract the author’s handle from the HTML document.

Python




# find the div holding the author handle
# and read its text without surrounding whitespace
Author = soup.find('div', class_="author_handle").get_text(strip=True)
print(Author)

Output: 

kumar_satyam
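
If the page layout changes, soup.find() returns None and the lookup fails with an AttributeError. The sketch below shows one way to guard against that, run against a small inline HTML fragment so it works offline. The fragment and the helper name extract_author_handle are hypothetical; the real page markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment standing in for a real article page
sample_html = """
<div class="author_handle">
    <a href="#">kumar_satyam</a>
</div>
"""


def extract_author_handle(html):
    """Return the author handle, or None if the expected div is absent."""
    soup = BeautifulSoup(html, "html.parser")
    handle_div = soup.find("div", class_="author_handle")
    if handle_div is None:
        return None
    # strip=True removes the surrounding whitespace and newlines
    return handle_div.get_text(strip=True)


print(extract_author_handle(sample_html))
print(extract_author_handle("<p>no author here</p>"))
```

The first call prints the handle and the second prints None, which lets the caller decide how to report a missing author.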

Step 5: Now build the profile URL from the author's handle and fetch its HTML.

Python3




# now get author information
# with author name
profile = 'https://auth.geeksforgeeks.org/user/' + Author + '/profile'

# pass the url
# into getdata function
htmldata = getdata(profile)
soup = BeautifulSoup(htmldata, 'html.parser')
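
Plain string concatenation works when the handle is clean, but scraped text can carry stray whitespace or characters that are not valid in a URL path. As a minimal sketch, the profile URL could be built with urllib.parse.quote from the standard library. The helper name build_profile_url is an assumption for illustration; the base URL is the one used in the step above.

```python
from urllib.parse import quote

PROFILE_BASE = "https://auth.geeksforgeeks.org/user/"


def build_profile_url(handle):
    """Build the profile URL for a scraped author handle (hypothetical helper)."""
    handle = handle.strip()
    if not handle:
        raise ValueError("author handle is empty")
    # safe="" percent-encodes every reserved character, including "/"
    return PROFILE_BASE + quote(handle, safe="") + "/profile"


print(build_profile_url("kumar_satyam"))
```

For an ordinary handle such as kumar_satyam the result is identical to the concatenated string, so the helper only changes behaviour for malformed input.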

Step 6: Extract the author’s information.

Python3




# traverse information of author
name = soup.find(
    'div', class_='mdl-cell mdl-cell--9-col mdl-cell--12-col-phone textBold medText').get_text()
 
 
author_info = []
for item in soup.find_all('div', class_='mdl-cell mdl-cell--9-col mdl-cell--12-col-phone textBold'):
    author_info.append(item.get_text())
 
print("Author name :", name)
print("Author information  :")
print(author_info)

Output:

Author name : Satyam Kumar
Author information  :
['LNMI patna', '\nhttps://www.linkedin.com/in/satyam-kumar-174273101/']
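
When class_ receives a string containing several class names, Beautiful Soup compares it against the exact value of the class attribute, so the lookup breaks if the site reorders or adds a class. CSS selectors match classes regardless of order. The sketch below illustrates this on a hypothetical inline fragment; the markup and the helper name parse_profile are assumptions, and the real profile page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical profile fragment; note the differing class order
sample_profile = """
<div class="mdl-cell textBold medText">Satyam Kumar</div>
<div class="textBold mdl-cell">LNMI patna</div>
<div class="mdl-cell textBold">
https://www.linkedin.com/in/satyam-kumar-174273101/</div>
"""


def parse_profile(html):
    """Return the author's name and remaining details as a dict."""
    soup = BeautifulSoup(html, "html.parser")
    # the name is the only textBold div that also carries medText
    name_tag = soup.select_one("div.textBold.medText")
    name = name_tag.get_text(strip=True) if name_tag else None
    # every other textBold div is treated as a detail field
    details = [
        tag.get_text(strip=True)
        for tag in soup.select("div.textBold:not(.medText)")
    ]
    return {"name": name, "details": details}


print(parse_profile(sample_profile))
```

Because parse_profile takes an HTML string, it can be tried on saved pages without making a network request.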
 

Complete code:

Python3




# import module
import requests
from bs4 import BeautifulSoup
 
# link for extract html data
# Making a GET request
 
 
def getdata(url):
    r = requests.get(url)
    return r.text
 
 
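# input article by geek
article = "optparse-module-in-python"

# url (assumed pattern: site root followed by the article slug)
url = "https://www.geeksforgeeks.org/" + article

# pass the url
# into getdata function
htmldata = getdata(url)
soup = BeautifulSoup(htmldata, 'html.parser')

# find the div holding the author handle
# and read its text without surrounding whitespace
Author = soup.find('div', class_="author_handle").get_text(strip=True)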
 
# now get author information
# with author name
profile = 'https://auth.geeksforgeeks.org/user/'+Author+'/profile'
 
# pass the url
# into getdata function
htmldata = getdata(profile)
soup = BeautifulSoup(htmldata, 'html.parser')
 
# traverse information of author
name = soup.find(
    'div', class_='mdl-cell mdl-cell--9-col mdl-cell--12-col-phone textBold medText').get_text()
 
 
author_info = []
for item in soup.find_all('div', class_='mdl-cell mdl-cell--9-col mdl-cell--12-col-phone textBold'):
    author_info.append(item.get_text())
 
print("Author name :", name)
print("Author information  :")
print(author_info)

Output:

Author name : Satyam Kumar
Author information  :
['LNMI patna', '\nhttps://www.linkedin.com/in/satyam-kumar-174273101/']
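
The second entry in the output still carries a leading newline because get_text() returns the text exactly as it appears in the HTML. A small post-processing step can tidy the list. The helper name clean_fields is hypothetical, and the third element of the sample list is added here only to show that blank entries are dropped.

```python
def clean_fields(fields):
    """Strip whitespace from each entry and drop entries that end up empty."""
    cleaned = [field.strip() for field in fields]
    return [field for field in cleaned if field]


raw = ["LNMI patna", "\nhttps://www.linkedin.com/in/satyam-kumar-174273101/", "  "]
print(clean_fields(raw))
```

Calling clean_fields(author_info) before printing would apply the same cleanup to the scraped list.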
 



