Extract Author’s information from a GeeksforGeeks article using Python

In this article, we will write a Python script that extracts author information from a GeeksforGeeks article.

Modules needed

  • bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. It is not part of the Python standard library. To install it, run the following command in the terminal.
pip install bs4
  • requests: Requests lets you send HTTP/1.1 requests with very little code. It is also not part of the standard library. To install it, run the following command in the terminal.
pip install requests

Approach:

  • Import the modules
  • Define a getdata() function that sends a GET request to a URL and returns the HTML
  • Set the article title (the last part of the article URL)
  • Build the article URL and pass it to getdata()
  • Parse the returned HTML with Beautiful Soup
  • Find the required details and filter them

Step-by-step implementation:

Step 1: Import all dependencies

Python




# import module
import requests
from bs4 import BeautifulSoup



Step 2: Create a URL get function

Python3


# send a GET request to the given URL
# and return the response body as text
def getdata(url):
    r = requests.get(url)
    return r.text

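The getdata() function above waits indefinitely if the server never answers, and it returns error pages as if they were articles. A slightly more defensive variant might look like the sketch below. The name getdata_safe, the 10-second timeout, and the User-Agent string are assumptions chosen for illustration, not part of the original script.

```python
import requests


def getdata_safe(url, timeout=10):
    """Return the body of `url` as text.

    Raises requests.RequestException on network errors, timeouts,
    or HTTP error status codes (4xx / 5xx).
    """
    # Hypothetical User-Agent value; some sites reject requests without one.
    headers = {"User-Agent": "Mozilla/5.0 (compatible; author-info-script)"}
    response = requests.get(url, headers=headers, timeout=timeout)
    # Turn HTTP error codes into exceptions instead of returning an error page.
    response.raise_for_status()
    return response.text
```

A caller can wrap getdata_safe(url) in a try/except requests.RequestException block and stop early when the page cannot be fetched.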


Step 3: Append the article name to the site URL, pass the resulting URL to the getdata() function, and parse the returned HTML with Beautiful Soup.

Python3


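# article name as it appears in the article URL
article = "optparse-module-in-python"

# url (assumption: the article lives directly under the main site address)
url = "https://www.geeksforgeeks.org/" + article

# pass the url
# into getdata function
htmldata = getdata(url)
soup = BeautifulSoup(htmldata, 'html.parser')

# display html code
print(soup)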



Output: the complete HTML of the article page.
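Plain string concatenation works for a simple article name, but a helper keeps the URL well-formed if the name carries stray slashes or characters that need escaping. The sketch below uses only the standard library. BASE_URL and build_article_url are hypothetical names, and the trailing slash on the result is an assumption about how the site lays out its article URLs.

```python
from urllib.parse import quote, urljoin

# Assumed base address of the site.
BASE_URL = "https://www.geeksforgeeks.org/"


def build_article_url(article):
    """Join the base address and an article name into one URL."""
    # Remove surrounding slashes, escape unsafe characters, add one trailing slash.
    path = quote(article.strip("/")) + "/"
    return urljoin(BASE_URL, path)


print(build_article_url("optparse-module-in-python"))
# → https://www.geeksforgeeks.org/optparse-module-in-python/
```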

Step 4: Extract the author’s name from the HTML document.

Python




# extract the author name: loop over the children of the
# author_handle div and keep the text of the last one
for i in soup.find('div', class_="author_handle"):
    Author = i.get_text()
print(Author)



Output:

kumar_satyam
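soup.find() returns None when no matching element exists, so the loop above raises a TypeError if the page layout changes. A minimal sketch of a guarded lookup follows. It runs on a small hand-written HTML sample, so it can be tried without a network connection. SAMPLE_ARTICLE_HTML and extract_author_handle are hypothetical names, and the sample markup is only an assumption about what the real page contains.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the article page (assumed structure).
SAMPLE_ARTICLE_HTML = """
<html><body>
<div class="author_handle"><a href="#">kumar_satyam</a></div>
</body></html>
"""


def extract_author_handle(html):
    """Return the text of the author_handle div, or None if it is missing."""
    soup = BeautifulSoup(html, "html.parser")
    handle_div = soup.find("div", class_="author_handle")
    if handle_div is None:
        return None
    # strip=True removes the whitespace around the handle.
    return handle_div.get_text(strip=True)


print(extract_author_handle(SAMPLE_ARTICLE_HTML))      # → kumar_satyam
print(extract_author_handle("<p>no author here</p>"))  # → None
```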

Step 5: Build the profile URL from the author name and fetch its HTML.

Python3


# now get the author information
# using the author name
profile = 'https://auth.geeksforgeeks.org/user/' + Author + '/profile'

# pass the url
# into getdata function
htmldata = getdata(profile)
soup = BeautifulSoup(htmldata, 'html.parser')

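Text scraped from a page often carries leading or trailing whitespace, which would end up inside the profile URL. The sketch below cleans the author name before building the URL. build_profile_url is a hypothetical helper; the URL pattern is the one used in the step above.

```python
from urllib.parse import quote


def build_profile_url(author):
    """Build the profile URL for an author name scraped from an article."""
    author = author.strip()
    if not author:
        raise ValueError("empty author name")
    # safe="" escapes every reserved character, including "/".
    return "https://auth.geeksforgeeks.org/user/" + quote(author, safe="") + "/profile"


print(build_profile_url(" kumar_satyam\n"))
# → https://auth.geeksforgeeks.org/user/kumar_satyam/profile
```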


Step 6: Extract the author’s information.

Python3


# extract the author's information
name = soup.find(
    'div', class_='mdl-cell mdl-cell--9-col mdl-cell--12-col-phone textBold medText').get_text()

author_info = []
for item in soup.find_all('div', class_='mdl-cell mdl-cell--9-col mdl-cell--12-col-phone textBold'):
    author_info.append(item.get_text())

print("Author name :", name)
print("Author information  :")
print(author_info)



Output:

Author name : Satyam Kumar
Author information  :
['LNMI patna', '\nhttps://www.linkedin.com/in/satyam-kumar-174273101/']
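The output above still contains a leading newline, and the lookups depend on the full class string. When class_ is given a string with several class names, Beautiful Soup compares it against the exact value of the class attribute. As an alternative sketch, CSS selectors match on individual classes and get_text(strip=True) removes the surrounding whitespace. SAMPLE_PROFILE_HTML and parse_profile are hypothetical names, and the sample markup is an assumption modelled on the class names used in this step.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the profile page (assumed structure).
SAMPLE_PROFILE_HTML = """
<div class="mdl-cell mdl-cell--9-col mdl-cell--12-col-phone textBold medText">Satyam Kumar</div>
<div class="mdl-cell mdl-cell--9-col mdl-cell--12-col-phone textBold">LNMI patna</div>
<div class="mdl-cell mdl-cell--9-col mdl-cell--12-col-phone textBold">
https://www.linkedin.com/in/satyam-kumar-174273101/</div>
"""


def parse_profile(html):
    """Return (name, info_list) extracted from a profile page."""
    soup = BeautifulSoup(html, "html.parser")
    # The name is the div that carries both textBold and medText.
    name_tag = soup.select_one("div.textBold.medText")
    name = name_tag.get_text(strip=True) if name_tag else None
    # Every other textBold div holds one piece of author information.
    info = [tag.get_text(strip=True)
            for tag in soup.select("div.textBold:not(.medText)")]
    return name, info


name, author_info = parse_profile(SAMPLE_PROFILE_HTML)
print("Author name :", name)
print("Author information  :")
print(author_info)
```

On the sample markup, this prints the name Satyam Kumar and a two-item list without the leading newline.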

Complete code:

Python3


# import modules
import requests
from bs4 import BeautifulSoup


# send a GET request to the given URL
# and return the response body as text
def getdata(url):
    r = requests.get(url)
    return r.text


# article name as it appears in the article URL
article = "optparse-module-in-python"

# url (assumption: the article lives directly under the main site address)
url = "https://www.geeksforgeeks.org/" + article

# pass the url
# into getdata function
htmldata = getdata(url)
soup = BeautifulSoup(htmldata, 'html.parser')

# extract the author name: loop over the children of the
# author_handle div and keep the text of the last one
for i in soup.find('div', class_="author_handle"):
    Author = i.get_text()

# now get the author information
# using the author name
profile = 'https://auth.geeksforgeeks.org/user/' + Author + '/profile'

# pass the url
# into getdata function
htmldata = getdata(profile)
soup = BeautifulSoup(htmldata, 'html.parser')

# extract the author's information
name = soup.find(
    'div', class_='mdl-cell mdl-cell--9-col mdl-cell--12-col-phone textBold medText').get_text()

author_info = []
for item in soup.find_all('div', class_='mdl-cell mdl-cell--9-col mdl-cell--12-col-phone textBold'):
    author_info.append(item.get_text())

print("Author name :", name)
print("Author information  :")
print(author_info)



Output:

Author name : Satyam Kumar
Author information  :
['LNMI patna', '\nhttps://www.linkedin.com/in/satyam-kumar-174273101/']
