Currently, it’s not possible to filter GSoC participating organizations by the programming languages they use. As a result, students spend a lot of time opening each organization’s page and sorting through them by hand.
This article shows how students can write their own Python script with the BeautifulSoup4 library to find the organizations that use the language they want to contribute in.
You will learn the following through this article:
- How to use the Requests library to send HTTPS requests to web pages
- How to use the BeautifulSoup4 library in Python to parse HTML
- How to output data as a spreadsheet (e.g. an MS Excel file) using OpenPyXL
These modules do not come pre-installed with Python. To install them, run the following commands in the terminal:
pip install requests
pip install beautifulsoup4
pip install openpyxl
Note: Only beginner-level knowledge of Python 3 is required to follow this article. For more information, refer to Python Programming Language.
Step 1: Import the required libraries
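The article’s own import statements are not shown. A minimal version, assuming the three packages installed above, would be:

```python
# Step 1 (sketch): third-party imports used throughout this article.
# Install them first with pip, as shown above.
import requests                  # sends HTTP(S) requests
from bs4 import BeautifulSoup    # parses HTML into a searchable tree
from openpyxl import Workbook    # creates and writes .xlsx workbooks
```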
Step 2: Create a response object using Requests. We will use the GSoC Archive page as our source.
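The request code for this step is also missing. The sketch below shows one way to do it with Requests. ARCHIVE_URL is an illustrative placeholder rather than the article’s exact URL, and make_session, fetch_page and the User-Agent string are hypothetical names chosen here. Nothing is sent until fetch_page is called.

```python
import requests

# Assumption: illustrative placeholder URL; point it at the GSoC archive
# page you actually want to scrape.
ARCHIVE_URL = "https://summerofcode.withgoogle.com/archive/"


def make_session():
    """Create a Session so headers and connections are reused across requests."""
    session = requests.Session()
    # Hypothetical User-Agent value: identifies the script to the server.
    session.headers.update({"User-Agent": "gsoc-org-finder/0.1"})
    return session


def fetch_page(session, url, timeout=10):
    """Send a GET request and return the Response object."""
    response = session.get(url, timeout=timeout)
    # Stop early on 4xx/5xx instead of parsing an error page.
    response.raise_for_status()
    return response

# Usage (performs a real network request):
# response = fetch_page(make_session(), ARCHIVE_URL)
# html = response.text
```

Using a Session rather than bare requests.get calls matters later, when the script requests one page per organization and can reuse the same connection and headers.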
Step 3: Create a BeautifulSoup object
From the Archive page’s source code, we can see that each organization’s name is in an h4 tag with the class name “organization-card__name font-black-54”.
Using BS4, we can search for this particular tag in the HTML code and store the text in a list.
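Here is a sketch of that search. It runs on made-up sample markup that copies the h4 structure described above, so it works without a network connection. With the live page you would pass in the response text from Step 2 instead. The function and variable names are hypothetical.

```python
from bs4 import BeautifulSoup

# Each organization name sits in an <h4> whose class attribute contains
# "organization-card__name" (alongside "font-black-54").
NAME_CLASS = "organization-card__name"


def extract_org_names(html):
    """Return the text of every organization-name heading in the page."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ matches a tag if NAME_CLASS is any one of its classes,
    # so the order of the two class names does not matter.
    headings = soup.find_all("h4", class_=NAME_CLASS)
    return [h.get_text(strip=True) for h in headings]


# Made-up sample markup that mimics the structure described above.
SAMPLE_ARCHIVE_HTML = """
<div>
  <h4 class="organization-card__name font-black-54">Example Org A</h4>
  <h4 class="organization-card__name font-black-54">Example Org B</h4>
  <h4 class="something-else">Not an organization</h4>
</div>
"""

names = extract_org_names(SAMPLE_ARCHIVE_HTML)
print(names)  # ['Example Org A', 'Example Org B']
```

Matching on the single class organization-card__name is slightly more robust than matching the full string “organization-card__name font-black-54”. An exact-string match breaks if the page lists the classes in a different order.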
Step 4: Open each organization’s GSoC page and find the languages it uses
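The code for this step is not shown in the article. The sketch below rests on two assumptions that you should check against the real page source. First, it assumes each name heading sits inside a link to that organization’s page. Second, it assumes each language appears as an li tag with the hypothetical class organization__tag--technology. All function names are hypothetical.

```python
import time
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumptions (hypothetical; verify against the real page source):
# 1. each organization-name <h4> sits inside an <a> linking to the org's page;
# 2. each language/technology on an org's page is an <li> with TECH_CLASS.
NAME_CLASS = "organization-card__name"
TECH_CLASS = "organization__tag--technology"


def extract_org_links(archive_html, base_url):
    """Map each organization name to the absolute URL of its page."""
    soup = BeautifulSoup(archive_html, "html.parser")
    links = {}
    for heading in soup.find_all("h4", class_=NAME_CLASS):
        anchor = heading.find_parent("a")
        if anchor is not None and anchor.get("href"):
            # Resolve relative hrefs such as "/archive/..." against the site.
            links[heading.get_text(strip=True)] = urljoin(base_url, anchor["href"])
    return links


def extract_languages(org_html):
    """Return the lower-cased technology tags listed on one organization's page."""
    soup = BeautifulSoup(org_html, "html.parser")
    tags = soup.find_all("li", class_=TECH_CLASS)
    return [tag.get_text(strip=True).lower() for tag in tags]


def collect_languages(session, org_links, delay=1.0):
    """Fetch every organization page, pausing between requests."""
    result = {}
    for name, url in org_links.items():
        response = session.get(url, timeout=10)
        response.raise_for_status()
        result[name] = extract_languages(response.text)
        time.sleep(delay)  # avoid hammering the server with back-to-back requests
    return result


def orgs_using(org_languages, language):
    """Keep only the organizations whose tags include the wanted language."""
    wanted = language.lower()
    return [name for name, langs in org_languages.items() if wanted in langs]
```

Here is one possible run, where archive_html is the archive page text fetched in Step 2. First, collect_languages(requests.Session(), extract_org_links(archive_html, base_url)) fetches every organization page. Then orgs_using(result, "python") keeps the organizations that list Python.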
Step 5: Write the list to a spreadsheet
Using the openpyxl library, we first create a workbook. We then select its default sheet with wb['Sheet'], which is where we will write the data. We write each value directly into a cell by assigning to the value attribute of the cell returned by cell(), for example ws.cell(row=1, column=1).value. Finally, we save the workbook with its save() method.
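Here is a sketch of that process with openpyxl. It assumes the data from Step 4 is a dictionary that maps each organization name to its list of languages. build_workbook is a hypothetical name, and the sample data is made up.

```python
from openpyxl import Workbook


def build_workbook(org_languages):
    """Write one row per organization: its name and a comma-separated language list."""
    wb = Workbook()
    ws = wb["Sheet"]  # a new Workbook starts with one sheet named "Sheet"
    ws.cell(row=1, column=1).value = "Organization"
    ws.cell(row=1, column=2).value = "Languages"
    # Data rows start at row 2, below the header.
    for row, (name, languages) in enumerate(org_languages.items(), start=2):
        ws.cell(row=row, column=1).value = name
        ws.cell(row=row, column=2).value = ", ".join(languages)
    return wb


# Made-up data standing in for the output of Step 4.
wb = build_workbook({"Example Org A": ["python", "c++"], "Example Org B": ["go"]})
# wb.save("gsoc_orgs.xlsx")  # uncomment to write the file to disk
```

Keeping the save() call outside build_workbook makes the function easy to check without writing a file.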
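Note: When saved with a relative file name, the spreadsheet is stored in the current working directory. This is the same directory as the Python file only if you run the script from that directory.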
Because the script sends many requests in quick succession, the server may block your IP address. Using a VPN may get around such a block.
If the problem still persists, slow the script down so that it sends requests less often.
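The snippet the original article suggested at this point is not reproduced here. As an alternative sketch, and not the author’s code, the helper below pauses before every request and uses exponential backoff. In other words, it waits longer after each HTTP 429 (Too Many Requests) or 5xx response before trying again. polite_get and its parameters are hypothetical names. The session argument can be any object with a Requests-style get method, such as requests.Session().

```python
import time


def polite_get(session, url, delay=2.0, retries=3, timeout=10):
    """GET a URL with a pause before each attempt and exponential backoff.

    Retries only on HTTP 429 (Too Many Requests) and 5xx server errors;
    any other error status is raised immediately.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        # Waits 2 s, 4 s, 8 s with the default arguments.
        time.sleep(delay * (2 ** attempt))
        response = session.get(url, timeout=timeout)
        if response.status_code == 429 or response.status_code >= 500:
            continue  # likely rate-limited or overloaded: wait longer, retry
        response.raise_for_status()  # raises for other 4xx codes
        return response
    # All attempts were rate-limited or failed: surface the last error.
    response.raise_for_status()
    return response
```

To use it, replace direct session.get(url) calls in the scraping loop with polite_get(session, url). A longer delay makes the run slower but reduces the chance of being blocked.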