Scrape LinkedIn Using Selenium And Beautiful Soup in Python
In this article, we are going to scrape LinkedIn using the Selenium and Beautiful Soup libraries in Python.
First of all, we need to install some libraries. Execute the following commands in the terminal.
pip install selenium
pip install beautifulsoup4
In order to use Selenium, we also need a web driver. You can download the web driver for Internet Explorer, Firefox, or Chrome. In this article, we will use the Chrome web driver.
Note: While following along with this article, if you get an error, there are most likely two possible reasons for it.
- The webpage took too long to load (probably because of a slow internet connection). In this case, use the time.sleep() function to give the webpage extra time to load, specifying the number of seconds to sleep as needed.
- The HTML of the webpage has changed since this article was written. If so, you will have to select the required webpage elements manually instead of copying the element names written below; how to find the element names is explained below. Additionally, don't shrink the browser window below its default height and width, as that also changes the HTML of the webpage.
Logging in to LinkedIn
Here we will write the code to log in to LinkedIn. First, we need to initiate the web driver using Selenium and send a GET request to the URL. Then, in the HTML document, we identify the input tags that accept the username/email and password, and the sign-in button.
After executing the above code, you will be logged in to your LinkedIn profile. Here is what it looks like.
Extracting Data From a LinkedIn Profile
Here is the video of the execution of the complete code.
Opening a Profile and Scrolling to the Bottom
Let us say that you want to extract data from Kunal Shah's LinkedIn profile. First, we need to open his profile using its URL. Then we have to scroll to the bottom of the web page so that all the data gets loaded.
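Opening the profile is a single GET request through the driver. A small sketch (the profile URL in the usage comment is given only as an example and may differ from the profile's current address):

```python
def open_profile(driver, profile_url):
    # Navigate the Selenium-controlled browser to the given profile URL.
    driver.get(profile_url)

# Usage (example URL, assumed from the profile's public address):
# open_profile(driver, "https://www.linkedin.com/in/kunalshah1/")
```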
Now, we need to scroll to the bottom. Here is the code to do that:
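One common sketch is to keep scrolling until the page height stops growing, which signals that all lazily loaded content has appeared:

```python
import time

def scroll_to_bottom(driver, pause=2.0):
    # Repeatedly scroll to the bottom until the document height
    # stops increasing, i.e. no more content is being loaded.
    last_height = driver.execute_script("return document.body.scrollHeight")
    while True:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give newly loaded content time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height
```

Increase `pause` on a slow connection, as noted at the start of the article.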
The page is now scrolled to the bottom. Now that it is completely loaded, we can scrape the data we want.
Extracting Data from the Profile
To extract data, firstly, store the source code of the web page in a variable. Then, use this source code to create a Beautiful Soup object.
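In code, that is just two lines. Here a small stand-in string replaces `driver.page_source` so the snippet runs on its own; in the live script you would use the driver's attribute instead:

```python
from bs4 import BeautifulSoup

# In the live script this would be: page_source = driver.page_source
page_source = "<html><body><h1>Kunal Shah</h1></body></html>"

# Build a Beautiful Soup object from the page source.
soup = BeautifulSoup(page_source, "html.parser")
```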
Extracting Profile Introduction:
To extract the profile introduction, i.e., the name, the company name, and the location, we need to find the source code of each element. First, we will find the source code of the div tag that contains the profile introduction.
Now, we will use Beautiful Soup to parse this div tag in Python.
We now have the required HTML to extract the name, company name, and location. Let’s extract the information now:
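A sketch of the extraction is below. The stand-in HTML and the class names (`pv-top-card--list`, `text-body-medium`, `text-body-small`) are assumptions modeled on the profile page at the time of writing; check the live page source and substitute the classes you find there:

```python
from bs4 import BeautifulSoup

# Stand-in for driver.page_source; tag structure and class names
# are assumptions based on the profile page at the time of writing.
page_source = """
<div class="pv-top-card--list">
  <h1 class="text-heading-xlarge">Kunal Shah</h1>
  <div class="text-body-medium">Founder : CRED</div>
  <span class="text-body-small">Bengaluru, Karnataka, India</span>
</div>
"""
soup = BeautifulSoup(page_source, "html.parser")

# Locate the profile-introduction div, then pull out each field.
intro = soup.find("div", {"class": "pv-top-card--list"})
name = intro.find("h1").get_text(strip=True)
works_at = intro.find("div", {"class": "text-body-medium"}).get_text(strip=True)
location = intro.find("span", {"class": "text-body-small"}).get_text(strip=True)

print("Name -->", name)
print("Works At -->", works_at)
print("Location -->", location)
```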
Name --> Kunal Shah
Works At --> Founder : CRED
Location --> Bengaluru, Karnataka, India
Extracting Data from the Experience Section
Next, we will extract the Experience from the profile.
We have to go deeper into the HTML tags until we find the desired information. In the above image, we can see the HTML needed to extract the current job title and the name of the company. We now need to go inside each tag to extract the data.
Scrape the job title, company name, and experience:
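A hedged sketch, again against stand-in HTML: the section id `experience-section` and the classes `pv-entity__secondary-title` and `pv-entity__date-range` are assumptions based on the page at the time of writing, so verify them against the live source:

```python
from bs4 import BeautifulSoup

# Stand-in for driver.page_source; ids and class names are
# assumptions modeled on the profile page at the time of writing.
page_source = """
<section id="experience-section">
  <h3 class="t-16 t-black t-bold">Founder</h3>
  <p class="pv-entity__secondary-title">CRED</p>
  <h4 class="pv-entity__date-range">Apr 2018 - Present, 3 yrs 6 mos</h4>
</section>
"""
soup = BeautifulSoup(page_source, "html.parser")

# Drill into the experience section tag by tag.
experience = soup.find("section", {"id": "experience-section"})
job_title = experience.find("h3").get_text(strip=True)
company = experience.find("p", {"class": "pv-entity__secondary-title"}).get_text(strip=True)
date_range = experience.find("h4", {"class": "pv-entity__date-range"}).get_text(strip=True)

print(job_title)
print(company)
print(date_range)
```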
'Founder'
'CRED'
Apr 2018 – Present, 3 yrs 6 mos
Extracting Job Search Data
We will use selenium to open the jobs page.
Now that the jobs page is open, we will create a BeautifulSoup object to scrape the data.
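The two steps can be sketched as a single helper that navigates to the jobs page and returns a soup of its source (the URL is the public jobs address at the time of writing):

```python
from bs4 import BeautifulSoup

def open_jobs_page(driver):
    # Navigate to the LinkedIn jobs page, then parse its source
    # into a Beautiful Soup object for scraping.
    driver.get("https://www.linkedin.com/jobs/")
    return BeautifulSoup(driver.page_source, "html.parser")
```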
Scrape Job Title:
First of all, we will scrape the Job Titles.
On skimming through the HTML of this page, we will find that each Job Title has the class name “job-card-list__title”. We will use this class name to extract the job titles.
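A sketch of the extraction, run here against stand-in HTML (the class name "job-card-list__title" is from the page at the time of writing and may change):

```python
from bs4 import BeautifulSoup

# Stand-in for the jobs-page source; in the live script,
# build the soup from driver.page_source instead.
page_source = """
<a class="job-card-list__title">Data Scientist</a>
<a class="job-card-list__title">Python Developer</a>
"""
soup = BeautifulSoup(page_source, "html.parser")

# Collect the text of every tag carrying the job-title class.
job_titles = [tag.get_text(strip=True)
              for tag in soup.find_all("a", {"class": "job-card-list__title"})]
print(job_titles)
```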
Scrape Company Name:
Next, we will extract the Company Name.
We will use the class name to extract the names of the companies:
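The pattern is the same as for the titles. The class name "job-card-container__company-name" below is an assumption for illustration; take the actual class from the live page source:

```python
from bs4 import BeautifulSoup

# Stand-in for the jobs-page source; the class name is an assumption.
page_source = """
<a class="job-card-container__company-name">CRED</a>
<a class="job-card-container__company-name">GeeksforGeeks</a>
"""
soup = BeautifulSoup(page_source, "html.parser")

# Collect the text of every tag carrying the company-name class.
company_names = [tag.get_text(strip=True)
                 for tag in soup.find_all("a", {"class": "job-card-container__company-name"})]
print(company_names)
```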
Scrape Job Location:
Finally, we will extract the Job Location.
Once again, we will use the class name to extract the location.
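And once more for the location. The class name "job-card-container__metadata-item" is an assumption for illustration; substitute the class you find on the live page:

```python
from bs4 import BeautifulSoup

# Stand-in for the jobs-page source; the class name is an assumption.
page_source = """
<li class="job-card-container__metadata-item">Bengaluru, Karnataka, India</li>
<li class="job-card-container__metadata-item">Mumbai, Maharashtra, India</li>
"""
soup = BeautifulSoup(page_source, "html.parser")

# Collect the text of every tag carrying the location class.
job_locations = [tag.get_text(strip=True)
                 for tag in soup.find_all("li", {"class": "job-card-container__metadata-item"})]
print(job_locations)
```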