Extracting Code From GeeksForGeeks Article

Prerequisite:

Modules Needed

  • requests: Requests lets you send HTTP/1.1 requests with very little code. This module does not come built-in with Python. To install it, type the following command in the terminal.
pip install requests
  • bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. This module also does not come built-in with Python. To install it, type the following command in the terminal.
pip install bs4

Approach:

  • Import modules
  • Get the article name as input
  • Send a GET request to the article URL
  • Scrape the code and the name of the language it is written in using bs4

This concept can be extended in several ways. For example, you can save each code block to a separate file with the matching extension, or you can scrape the complete article and extract other information such as the author's details.
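As one illustration of the first idea, here is a minimal sketch of saving code blocks to separate files. It assumes the scraper has already produced a list of (language, code) pairs. The function name save_snippets, the EXTENSIONS mapping, and the snippet_N file naming are hypothetical choices made for this example and are not part of the article's implementation.

```python
import tempfile
from pathlib import Path

# Hypothetical mapping from a language tab title to a file extension.
# Extend it with whatever languages the scraped articles use.
EXTENSIONS = {
    'python3': '.py',
    'python': '.py',
    'c++': '.cpp',
    'c': '.c',
    'java': '.java',
    'javascript': '.js',
}


def save_snippets(snippets, out_dir):
    """Write each (language, code) pair to its own file and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, (language, code) in enumerate(snippets, start=1):
        # Unknown languages fall back to a plain-text extension
        ext = EXTENSIONS.get(language.strip().lower(), '.txt')
        path = out_dir / f'snippet_{number}{ext}'
        path.write_text(code, encoding='utf-8')
        paths.append(path)
    return paths


if __name__ == '__main__':
    # Stand-in data; in practice these pairs would come from the scraper
    sample = [('Python3', 'print("hi")\n'), ('C++', 'int main() { return 0; }\n')]
    with tempfile.TemporaryDirectory() as tmp:
        for path in save_snippets(sample, tmp):
            print(path.name)
```

Keeping the file-writing step separate from the scraping step means it can be tried on stand-in data without any network access.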

Below is the implementation.

Python3




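import requests
from bs4 import BeautifulSoup

# Slug of the GeeksforGeeks article to scrape
article = 'extract-authors-information-from-geeksforgeeks-article-using-python'

# 1-based index of the code block to print; use None to print all of them
index_Code = 3

# URL of the article (assumed here to be the site root followed by the slug)
url = 'https://www.geeksforgeeks.org/' + article


# Make a GET request to fetch the article
# from the GeeksforGeeks servers
def getdata(url):
    r = requests.get(url)
    r.raise_for_status()  # stop early on an HTTP error
    return r.text


def codescrapper(soup, index=None):
    # Language tab titles and code containers, as marked up on the page
    codes_languages = soup.find_all('h2', class_='tabtitle')
    codes = soup.find_all('div', class_='code-container')

    # Use the shorter list so an index can never run past either one
    count_codes = min(len(codes_languages), len(codes))
    print(url)

    if index and 1 <= index <= count_codes:
        # Print only the requested code block, preceded by its language.
        # This assumes tabs and containers appear in the same order.
        print(codes_languages[index - 1].get_text())
        print(codes[index - 1].get_text())
    else:
        # No valid index given: print every code block with its language
        for x in range(count_codes):
            print(codes_languages[x].get_text())
            print(codes[x].get_text())


if __name__ == '__main__':
    complete_article_html = getdata(url)
    soup = BeautifulSoup(complete_article_html, 'html.parser')
    codescrapper(soup, index_Code)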


Output:



Last Updated : 29 Dec, 2020