Extract Author’s information from a GeeksforGeeks article using Python

Last Updated : 25 Aug, 2021

In this article, we are going to write a Python script that extracts author information from a GeeksforGeeks article.

Modules needed

bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. It does not come built in with Python. To install it, type the command below in the terminal.

pip install bs4

requests: Requests allows you to send HTTP/1.1 requests extremely easily. This module also does not come built in with Python. To install it, type the command below in the terminal.

pip install requests

Approach:

Import the modules.
Define a getdata() function that sends a GET request to a URL and returns the HTML.
Set the article name (the last part of the article URL).
Build the article URL and pass it to getdata().
Parse the returned HTML with Beautiful Soup.
Find the required details and filter them.

Stepwise implementation:

Step 1: Import all dependencies

Python

# import module
import requests
from bs4 import BeautifulSoup

Step 2: Create a URL get function

Python3

# make a GET request to the given URL
# and return the HTML of the page as text
def getdata(url):
    r = requests.get(url)
    return r.text
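
The getdata() function above returns whatever the server sends back, even an error page. A slightly more defensive version could look like the sketch below. The timeout value and the optional session parameter are assumptions added for illustration; they are not part of the original script.

```python
import requests


def getdata(url, session=None, timeout=10):
    """Fetch a page and return its HTML text, raising on HTTP errors."""
    # A caller may pass its own session (an assumption made here so that
    # connections can be reused); otherwise a new session is created.
    http = session if session is not None else requests.Session()

    # timeout stops the script from waiting forever on a slow server
    response = http.get(url, timeout=timeout)

    # raise_for_status() turns 4xx/5xx responses into requests.HTTPError
    response.raise_for_status()
    return response.text
```

With this version, a missing article raises an exception right away instead of passing an error page on to Beautiful Soup.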

Step 3: Append the article name to the base URL, pass the URL to the getdata() function, and parse the returned HTML with Beautiful Soup

Python3

# article name (the last part of the article URL)
article = "optparse-module-in-python"

# url
url = "https://www.geeksforgeeks.org/" + article

# pass the url
# into getdata function
htmldata = getdata(url)
soup = BeautifulSoup(htmldata, 'html.parser')

# display html code
print(soup)

Output: the script prints the full HTML of the article page, which is too long to reproduce here.
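
Step 3 builds the URL with plain string concatenation, which breaks if the article name is pasted with extra slashes or spaces. The sketch below shows one way to handle that using the standard library. The helper name article_url and the BASE_URL constant are hypothetical and only used for illustration.

```python
from urllib.parse import quote, urljoin

BASE_URL = "https://www.geeksforgeeks.org/"


def article_url(slug):
    """Return the article URL for a slug, tolerating stray slashes and spaces."""
    cleaned = slug.strip().strip("/")
    if not cleaned:
        raise ValueError("article slug must not be empty")
    # quote() percent-encodes characters that are not valid in a URL path
    return urljoin(BASE_URL, quote(cleaned))


print(article_url("/optparse-module-in-python/"))
```

The returned URL can then be passed to getdata() exactly as in Step 3.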

Step 4: Extract the author’s handle from the HTML document.

Python

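# find the div that holds the author's handle
author_div = soup.find('div', class_="author_handle")
if author_div is None:
    raise SystemExit("author_handle div not found; the page layout may have changed")
Author = author_div.get_text(strip=True)
print(Author)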

Output:

kumar_satyam
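
To see how this lookup behaves without making a network request, it can be tried on a small HTML string. The markup below is a hypothetical, reduced sample, not the real page source, and extract_author_handle is a helper name introduced only for this sketch.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, reduced to the one element the lookup needs.
SAMPLE_HTML = '<div class="author_handle"><a href="#">kumar_satyam</a></div>'


def extract_author_handle(html):
    """Return the author's handle, or None if the expected div is absent."""
    soup = BeautifulSoup(html, "html.parser")
    handle_div = soup.find("div", class_="author_handle")
    if handle_div is None:
        return None
    # strip=True removes surrounding whitespace and newlines from the text
    return handle_div.get_text(strip=True)


print(extract_author_handle(SAMPLE_HTML))
print(extract_author_handle("<p>no author here</p>"))
```

Returning None when the div is missing lets the caller decide what to do if the page layout changes.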

Step 5: Build the profile URL from the author’s handle and fetch its HTML.

Python3

# build the profile url
# from the author's handle
profile = 'https://auth.geeksforgeeks.org/user/' + Author + '/profile'

# pass the url
# into getdata function
htmldata = getdata(profile)
soup = BeautifulSoup(htmldata, 'html.parser')
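
Text scraped from a page can carry stray whitespace or newlines, and these would end up inside the profile URL. A minimal sketch of a safer way to build it is shown below. The helper name profile_url and the PROFILE_TEMPLATE constant are hypothetical; the URL pattern itself is the one used in Step 5.

```python
from urllib.parse import quote

PROFILE_TEMPLATE = "https://auth.geeksforgeeks.org/user/{handle}/profile"


def profile_url(handle):
    """Build the profile URL for an author handle scraped from a page."""
    cleaned = handle.strip()
    if not cleaned:
        raise ValueError("author handle must not be empty")
    # safe="" also encodes "/", so the handle cannot add extra path segments
    return PROFILE_TEMPLATE.format(handle=quote(cleaned, safe=""))


print(profile_url("  kumar_satyam\n"))
```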

Step 6: Extract the author’s information from the profile page.

Python3

# extract the author's details from the profile page
name = soup.find(
    'div', class_='mdl-cell mdl-cell--9-col mdl-cell--12-col-phone textBold medText').get_text()

author_info = []
for item in soup.find_all('div', class_='mdl-cell mdl-cell--9-col mdl-cell--12-col-phone textBold'):
    author_info.append(item.get_text())

print("Author name :", name)
print("Author information :")
print(author_info)

Output:

Author name : Satyam Kumar
Author information :
['LNMI patna', '\nhttps://www.linkedin.com/in/satyam-kumar-174273101/']
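
The second entry in the output above starts with a newline character, because get_text() keeps the whitespace found in the HTML. One possible way to tidy the list is sketched below. The helper name clean_fields is hypothetical and not part of the original script.

```python
def clean_fields(fields):
    """Strip surrounding whitespace from each field and drop empty ones."""
    stripped = (field.strip() for field in fields)
    return [field for field in stripped if field]


author_info = ['LNMI patna', '\nhttps://www.linkedin.com/in/satyam-kumar-174273101/']
print(clean_fields(author_info))
```

In the script, the same function could be applied to author_info just before it is printed.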

Complete code:

Python3

# import module
import requests
from bs4 import BeautifulSoup
 
# make a GET request to the given URL
# and return the HTML of the page as text
 
 
def getdata(url):
    r = requests.get(url)
    return r.text
 
 
# input article by geek
article = "optparse-module-in-python"
 
# url
url = "https://www.geeksforgeeks.org/"+article
 
 
# pass the url
# into getdata function
htmldata = getdata(url)
soup = BeautifulSoup(htmldata, 'html.parser')
 
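# find the div that holds the author's handle
author_div = soup.find('div', class_="author_handle")
if author_div is None:
    raise SystemExit("author_handle div not found; the page layout may have changed")
Author = author_div.get_text(strip=True)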
 
# now get author information
# with author name
profile = 'https://auth.geeksforgeeks.org/user/'+Author+'/profile'
 
# pass the url
# into getdata function
htmldata = getdata(profile)
soup = BeautifulSoup(htmldata, 'html.parser')
 
# traverse information of author
name = soup.find(
    'div', class_='mdl-cell mdl-cell--9-col mdl-cell--12-col-phone textBold medText').get_text()
 
 
author_info = []
for item in soup.find_all('div', class_='mdl-cell mdl-cell--9-col mdl-cell--12-col-phone textBold'):
    author_info.append(item.get_text())
 
print("Author name :", name)
print("Author information :")
print(author_info)

Output:

Author name : Satyam Kumar
Author information :
['LNMI patna', '\nhttps://www.linkedin.com/in/satyam-kumar-174273101/']
