Create GitHub API to fetch user profile image and number of repositories using Python and Flask
GitHub is where developers build software together, contribute to the open-source community, and manage Git repositories. It is one of the most widely used developer tools, and a GitHub profile is often shared to showcase projects or invite others to contribute to them. Web scraping with Python is also a convenient way to pull data from public pages like these.
In this article, we will create an API that fetches a user’s profile image and number of repositories. The blog walks through the following steps to create the API:
- Setting up the App Directory
- Scraping the data from GitHub, using Beautiful Soup
- Creating the API, using Flask
Setting up the App Directory
Step 1: Create a folder (e.g., GitHubGFG).
Step 2: Set up a virtual environment. Here we create an environment named .env:
python -m venv .env
Step 3: Activate the environment. On Windows:
.env\Scripts\activate
On macOS/Linux, use source .env/bin/activate instead.
Scraping the Data
Step 1: In Python, we have Beautiful Soup, a library for pulling data out of HTML files. To install Beautiful Soup, run a simple command:
pip install beautifulsoup4
Step 2: Install the Requests module for Python. Requests lets you send HTTP requests with very little code.
pip install requests
Create a Python file (e.g., github.py).
Step 3: Following are the steps for scraping data from the web page. To get the HTML text of the profile page:
github_html = requests.get(f'https://github.com/{username}').text
Here {username} is the GitHub username of the required user. To represent the parsed document as a whole, we use a BeautifulSoup object:
soup = BeautifulSoup(github_html, "html.parser")
Example:
Python3
from bs4 import BeautifulSoup
import requests

username = "kothawleprem"
github_html = requests.get(f'https://github.com/{username}').text
soup = BeautifulSoup(github_html, "html.parser")
print(soup)
Output:
Now find the avatar class in the HTML document as it has the required URL for the profile image.
find_all(): The find_all() method looks through a tag’s descendants and retrieves all descendants that match the filters. Here our filter is an img tag with the class as avatar.
Python3
avatar_block = soup.find_all('img', class_='avatar')
print(avatar_block)
Following is the output of avatar_block:
The image URL is in the src attribute; to read it, use .get():
Python3
img_url = avatar_block[4].get('src')
print(img_url)
Following is the output of img_url:
Find the first Counter class in the HTML document as it has the required data for the number of repositories.
find(): The find() method looks through a tag’s descendants and retrieves the first descendant that matches the filters. Here our filter is a span tag with the class Counter.
repos = soup.find('span', class_="Counter").text
The entire code would be as follows:
Python3
from bs4 import BeautifulSoup
import requests

username = "kothawleprem"
github_html = requests.get(f'https://github.com/{username}').text
soup = BeautifulSoup(github_html, "html.parser")
avatar_block = soup.find_all('img', class_='avatar')
img_url = avatar_block[4].get('src')
repos = soup.find('span', class_="Counter").text
print(img_url)
print(repos)
Output:
https://avatars.githubusercontent.com/u/59017652?v=4
33
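The parsing steps above can also be factored into a small helper that works on any HTML string, which makes the logic easy to check without hitting github.com. This is a sketch: `parse_profile` and the sample markup below are illustrative names, not part of the original code.

```python
from bs4 import BeautifulSoup

def parse_profile(html):
    """Extract (avatar URL, repo count) from a GitHub profile HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    avatar_block = soup.find_all('img', class_='avatar')
    # The real profile page holds several avatar images (index 4 is used
    # above); fall back to the last match when there are fewer.
    img = avatar_block[4] if len(avatar_block) > 4 else avatar_block[-1]
    repos = soup.find('span', class_="Counter").text
    return img.get('src'), repos

# A minimal stand-in for the profile page markup:
sample = """
<img class="avatar" src="https://example.com/a.png">
<span class="Counter">33</span>
"""
print(parse_profile(sample))  # ('https://example.com/a.png', '33')
```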
Creating the API
We will use Flask, a micro web framework written in Python. Install it with:
pip install Flask
Following is the starter code for our Flask application.
Python3
# We import the Flask class; an instance of
# this class will be our WSGI application.
from flask import Flask

# We create an instance of this class. The first argument is the name
# of the application's module or package. __name__ is a convenient
# shortcut that is appropriate for most cases. This is needed so that
# Flask knows where to look for resources such as templates and
# static files.
app = Flask(__name__)

# We use the route() decorator to tell Flask what URL
# should trigger our function.
@app.route('/')
def github():
    return "Welcome to GitHubGFG!"

# main driver function
if __name__ == "__main__":
    # run() method of Flask class runs the
    # application on the local development server.
    app.run(debug=True)
Open http://localhost:5000 (Flask's default development address) in your browser:
Getting the GitHub username from the URL:
Python3
from flask import Flask

app = Flask(__name__)

@app.route('/<username>')
def github(username):
    return f"Username: {username}"

if __name__ == "__main__":
    app.run(debug=True)
Output:
We now add our web-scraping code, along with some helpers provided by Flask to properly return JSON data. jsonify is a Flask function that serializes data to JavaScript Object Notation (JSON) format; returning a dictionary from a view function does this serialization automatically. Consider the following code:
Python3
import requests
from bs4 import BeautifulSoup
from flask import Flask

app = Flask(__name__)

@app.route('/<username>')
def github(username):
    github_html = requests.get(f'https://github.com/{username}').text
    soup = BeautifulSoup(github_html, "html.parser")
    avatar_block = soup.find_all('img', class_='avatar')
    img_url = avatar_block[4].get('src')
    repos = soup.find('span', class_="Counter").text

    # Creating a dictionary for our data
    result = {
        'imgUrl': img_url,
        'numRepos': repos,
    }
    return result

if __name__ == "__main__":
    app.run(debug=True)
Output:
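To see jsonify explicitly, and to check a route without starting the development server, Flask's built-in test client can be used. This is a minimal sketch: the /demo route name and the placeholder values below are assumptions for illustration only.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical route returning fixed placeholder values via jsonify.
@app.route('/demo')
def demo():
    return jsonify({'imgUrl': 'https://example.com/a.png', 'numRepos': '33'})

# The test client issues requests in-process, no server needed.
resp = app.test_client().get('/demo')
print(resp.json)  # {'imgUrl': 'https://example.com/a.png', 'numRepos': '33'}
```

Note that returning a plain dictionary, as in the route above this example, produces the same JSON response, because Flask passes dictionaries through jsonify behind the scenes.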
If the username is not correct, or the request or parsing fails for any other reason, we need to wrap our code in a try/except block to handle the exception and return an error response. The final code would be as follows:
Python3
import requests
from bs4 import BeautifulSoup
from flask import Flask

app = Flask(__name__)

@app.route('/<username>')
def github(username):
    try:
        github_html = requests.get(f'https://github.com/{username}').text
        soup = BeautifulSoup(github_html, "html.parser")
        avatar_block = soup.find_all('img', class_='avatar')
        img_url = avatar_block[4].get('src')
        repos = soup.find('span', class_="Counter").text

        # Creating a dictionary for our data
        result = {
            'imgUrl': img_url,
            'numRepos': repos,
        }
    except Exception:
        result = {"message": "Invalid Username!"}, 400
    return result

if __name__ == "__main__":
    app.run(debug=True)
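Catching every exception also hides unrelated programming errors. A variant of the final route that catches only the failures expected from the request and parsing steps is sketched below; the `timeout` value and the made-up username used to exercise the error branch are assumptions, not part of the original code.

```python
import requests
from bs4 import BeautifulSoup
from flask import Flask

app = Flask(__name__)

@app.route('/<username>')
def github(username):
    try:
        github_html = requests.get(
            f'https://github.com/{username}', timeout=10).text
        soup = BeautifulSoup(github_html, "html.parser")
        avatar_block = soup.find_all('img', class_='avatar')
        img_url = avatar_block[4].get('src')
        repos = soup.find('span', class_="Counter").text
        result = {'imgUrl': img_url, 'numRepos': repos}
    except (requests.RequestException, AttributeError, IndexError):
        # Network failure, a missing tag, or too few avatar images all
        # mean the profile could not be scraped.
        result = {"message": "Invalid Username!"}, 400
    return result

# Exercise the route in-process with the test client; a made-up
# username should normally reach the 400 branch.
resp = app.test_client().get('/no-such-user-xyz-123456')
print(resp.status_code)
```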