Automatically Get Top 10 Jobs from LinkedIn Using Python

Last Updated : 21 Mar, 2024

In this article, we use Clicknium to scrape the top 10 job listings from LinkedIn. First, we log in to LinkedIn and search for jobs by keyword (a title, a skill, or a company) and location, then take the first 10 jobs in the search results. For each job, we collect the title, the company name, the company size, the post date, the job type, and the link URL. Finally, we save the results to a CSV file.

The steps are as follows:

  • Login to LinkedIn
  • Search jobs with the keyword and location
  • Scrape the information of the top 10 jobs
  • Save search results into csv file

Installation

1.1 Python modules

The Clicknium Python module provides methods to automate various types of applications on Windows, such as web browsers, Windows desktop applications, Java applications, and SAP GUI for Windows. In this sample, we also use the pywin32 Python module, which provides access to many of the Windows APIs from Python, to read clipboard data.

Install the Python libraries with the following commands:

pip install clicknium
pip install pywin32
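
After installing, a quick check can confirm that both packages are importable before running the sample. The following is a minimal sketch, not part of Clicknium: `find_missing_modules` is a hypothetical helper name, and `win32clipboard` is the pywin32 module used later in this article for clipboard access.

```python
from importlib.util import find_spec


def find_missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Modules this article relies on
    missing = find_missing_modules(["clicknium", "win32clipboard"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```

If anything is reported as missing, rerun the matching `pip install` command in the same Python environment that runs the script.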

1.2 Clicknium Visual Studio Code Extension

The Clicknium VS Code extension can install the Clicknium browser extension into the chosen browser; Clicknium uses that browser extension to interact with the browser. It also makes it easier to capture, edit, and validate elements.

Login to LinkedIn

2.1 Capturing Steps using clicknium VS Code extension

Besides writing the Python code that automates the login, the job search, and the storing of the data, we also need to capture the web elements in the Chrome browser using the Clicknium VS Code extension. To launch the capture tool, press Ctrl+Shift+P to open the command palette, then type and select “clicknium capture”. This opens a capture dialog in which you record web elements using Ctrl+Click. After capturing the elements described in the steps below, click Complete and run the Python code.

[Image: Launch Clicknium Capture Dialog]

2.2 In this section, we capture the relevant elements of the login page

[Image: Clicknium capture of the LinkedIn login page]

2.3 Open the LinkedIn website in the browser, enter the account username and password, and then click the Sign in button

Python3




from clicknium import clicknium as cc, locator

# Settings loaded from setting.json (see setting.py in section 6.4)
from setting import Setting

# Create a browser instance with "cc.chrome" (use "cc.edge" for Edge),
# open the given URL, and get the browser tab. By default, it waits
# until the page has loaded completely, so no extra time.sleep()
# is needed.
_tab = cc.chrome.open("https://www.linkedin.com/", is_wait_complete=True)

# Find the username input box and fill it with the value of
# 'linkedin_login_name' in setting.json
_tab.find_element(locator.chrome.linkedin.login.login_email).set_text(
    Setting.login_name)

# Find the password input box and fill it with the value of
# 'linkedin_login_password' in setting.json
_tab.find_element(locator.chrome.linkedin.login.login_password).set_text(
    Setting.login_password)

# Find the submit button and click it to log in
_tab.find_element(locator.chrome.linkedin.login.signin).click()

# Wait up to 5 seconds for the 'skip add phone' button;
# click it only if it appeared
skip_btn = _tab.wait_appear(locator.chrome.linkedin.login.skip_add_phone,
                            wait_timeout=5)
if skip_btn:
    skip_btn.click()
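
Optional elements such as the 'skip add phone' button may never show up. Later steps in this article test the result of `wait_appear` with `if ele:`, so the sketch below assumes that a missing element comes back as `None`. `click_if_present` is a hypothetical helper name, and `FakeButton` is only a stand-in so the example runs without a browser.

```python
def click_if_present(element):
    """Click `element` if it is not None; return True when a click happened."""
    if element is None:
        return False
    element.click()
    return True


# Stand-in for a web element, used only to demonstrate the helper
class FakeButton:
    def __init__(self):
        self.clicked = False

    def click(self):
        self.clicked = True


button = FakeButton()
print(click_if_present(button))  # True
print(click_if_present(None))    # False
```

With a real tab, the same helper could wrap the call, for example `click_if_present(_tab.wait_appear(locator.chrome.linkedin.login.skip_add_phone, wait_timeout=5))`.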


Search jobs with the keyword and location

3.1 In this section, we capture the relevant elements of the job search page

[Image: Clicknium capture of the LinkedIn job search page]

3.2 Switch to the Jobs tab, fill in the job keyword and location, and then click the Search button

Python3




# After login, wait up to 5 seconds for the Jobs channel
# to appear, then click it to switch to the Jobs page
_tab.wait_appear(locator.chrome.linkedin.job.jobs_channel,
                 wait_timeout=5).click()

# Wait up to 10 seconds for the job search keyword input box,
# then fill it with the value of 'linkedin_search_job_key'
# in setting.json
_tab.wait_appear(locator.chrome.linkedin.job.job_search_key,
                 wait_timeout=10).set_text(Setting.search_job_key)

# Find the job search location input box and fill it with the
# value of 'linkedin_search_job_location' in setting.json
_tab.find_element(locator.chrome.linkedin.job.job_search_location).set_text(
    Setting.search_job_location)

# Find the search button and click it to start the search
_tab.find_element(locator.chrome.linkedin.job.job_search).click()


Scrape the information of the top 10 jobs

4.1 In this section, we capture the elements shown below:

[Image: Clicknium capture of the job detail elements]

4.2 Get the job item from the search result list using the index parameter

Python3




# range(1, 11) yields indexes 1 to 10 for the top 10 jobs;
# change the upper bound to collect more or fewer jobs
for i in range(1, 11):

    # Wait up to 5 seconds for the job item at this index to appear
    ele = _tab.wait_appear(locator.chrome.linkedin.jobitem.job_listitem, {
                           "index": i}, wait_timeout=5)


4.3 Get the title, the company name, the company size, the post date, and the job type for each job item

Python3




# Dict that collects the details of one job item
details = {}

# Click the job item to open its detail pane
ele.click()

# Wait up to 5 seconds for the job title to appear
job_title_ele = _tab.wait_appear(
    locator.chrome.linkedin.jobitem.job_title, wait_timeout=5)

# If the title exists, save its text into 'details'
if job_title_ele:
    details["Job Title"] = job_title_ele.get_text().strip()

# Wait up to 2 seconds for the company name to appear
job_company_ele = _tab.wait_appear(
    locator.chrome.linkedin.jobitem.job_company, wait_timeout=2)

# If the company name exists, save its text into 'details'
if job_company_ele:
    details["Company Name"] = job_company_ele.get_text().strip()

# Wait up to 2 seconds for the company size to appear
company_size_ele = _tab.wait_appear(
    locator.chrome.linkedin.jobitem.company_size, wait_timeout=2)

# If the company size exists, save its text into 'details';
# keep it only when it mentions "employees", otherwise save ""
if company_size_ele:
    scale = company_size_ele.get_text().strip(
    ) if "employees" in company_size_ele.get_text() else ""
    details["Company Size"] = scale

# Wait up to 2 seconds for the post date to appear
job_post_date_ele = _tab.wait_appear(
    locator.chrome.linkedin.jobitem.job_post_date, wait_timeout=2)

# If the post date exists, save its text into 'details';
# keep it only when it contains "ago", otherwise save ""
if job_post_date_ele:
    post_date = job_post_date_ele.get_text().strip(
    ) if "ago" in job_post_date_ele.get_text() else ""
    details["Post Date"] = post_date

# Wait up to 2 seconds for the job type to appear
job_type_ele = _tab.wait_appear(
    locator.chrome.linkedin.jobitem.job_type, wait_timeout=2)

# If the job type exists, save its text into 'details'
if job_type_ele:
    details["Job Type"] = job_type_ele.get_text().strip()
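
The company size and post date fields apply the same rule: keep the text only if it contains an expected keyword, otherwise store an empty string. As a sketch, that rule can be factored into one small function. `text_if_contains` is a hypothetical name introduced here, not a Clicknium API, and the sample strings are made up.

```python
def text_if_contains(text, keyword=None):
    """Return stripped `text`, or "" if `text` is None or lacks `keyword`."""
    if text is None:
        return ""
    if keyword is not None and keyword not in text:
        return ""
    return text.strip()


print(text_if_contains("  51-200 employees \n", "employees"))  # 51-200 employees
print(text_if_contains("Full-time", "employees"))              # prints an empty line
print(text_if_contains("  Software Engineer  "))               # Software Engineer
```

In the scraping code this would be called with the element text, for example `details["Company Size"] = text_if_contains(company_size_ele.get_text(), "employees")`.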


4.4 Get job link 

4.4.1 Getting clipboard data with pywin32

Python3




# Library for the win32 clipboard API
import win32clipboard

# Get the text currently stored in the clipboard
def get_clipboard_data():
    try:
        # Open the clipboard
        win32clipboard.OpenClipboard()
    except Exception:
        # The clipboard could not be opened, so there is
        # nothing to read and nothing to close
        return ""

    try:
        # Read the clipboard content as Unicode text
        return win32clipboard.GetClipboardData(
            win32clipboard.CF_UNICODETEXT)
    except Exception:
        # No text is available in the clipboard
        return ""
    finally:
        # Always close the clipboard once it has been opened
        win32clipboard.CloseClipboard()


4.4.2 Click the Share button, then the Copy link button, and read the link from the clipboard

Python3




# Library for the delay function
from time import sleep

# Wait up to 2 seconds for the job item's share button to appear
job_share_btn_ele = _tab.wait_appear(
    locator.chrome.linkedin.jobitem.share_button, wait_timeout=2)

# If the share button exists, click it
if job_share_btn_ele:
    job_share_btn_ele.click()

    # Wait up to 2 seconds for the copy link button to appear
    copy_link = _tab.wait_appear(
        locator.chrome.linkedin.jobitem.copy_link, wait_timeout=2)

    # If the copy link button exists, click it
    # to put the job link on the clipboard
    if copy_link:
        copy_link.click()

        # Sleep 0.2 seconds to give the clipboard
        # time to receive the link
        sleep(0.2)

        # Read the job link from the clipboard
        # and save it into 'details'
        details["Job Link"] = get_clipboard_data()
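
A fixed 0.2 second sleep may be too short on a slow machine. One alternative, sketched below, is to poll until the clipboard returns something or a timeout passes. `poll_until` is a hypothetical helper; it takes any zero-argument callable, so it is shown here with a stand-in getter rather than the real clipboard. It only helps if the clipboard was emptied before the copy, otherwise an old value would be returned at once.

```python
import time


def poll_until(getter, timeout=2.0, interval=0.1):
    """Call `getter` until it returns a truthy value or `timeout` seconds pass.

    Returns the first truthy value, or "" if none arrived in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = getter()
        if value:
            return value
        if time.monotonic() >= deadline:
            return ""
        time.sleep(interval)


# Stand-in getter that returns "" twice before the "link" is available
responses = iter(["", "", "https://example.com/job/1"])
print(poll_until(lambda: next(responses), timeout=1.0, interval=0.01))
```

In the scraping code, the call would look like `details["Job Link"] = poll_until(get_clipboard_data)`.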


Save search results into csv file

5.1 Here is the content of the resulting CSV file:

[Image: CSV file of saved records]

5.2 Use Python's built-in csv module to save the data to a CSV file

Python3




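# Library for CSV operations
import csv

# Save a list of dicts into a CSV file
def list_dict_to_csv(dicts, filename="test.csv"):

    # Build the header from the keys of all dicts, in first-seen
    # order. Jobs can have different fields (for example, a missing
    # "Job Link"), so the first dict alone may not list every column.
    keys = []
    for row in dicts:
        for key in row:
            if key not in keys:
                keys.append(key)

    # Open the CSV file for writing
    with open(filename, 'w', newline='') as output_file:

        # Create the DictWriter; fields missing from a row
        # are written as empty strings
        dict_writer = csv.DictWriter(output_file, keys, restval="")

        # Write the header row
        dict_writer.writeheader()

        # Write the data rows
        dict_writer.writerows(dicts)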
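
To check what was saved, the file can be read back with `csv.DictReader`. The sketch below is self-contained: it writes a small sample file with made-up data to a temporary folder and reads it back. `read_csv_rows` is a hypothetical helper name.

```python
import csv
import os
import tempfile


def read_csv_rows(filename):
    """Read a CSV file with a header row and return its rows as a list of dicts."""
    with open(filename, newline="") as input_file:
        return list(csv.DictReader(input_file))


# Write a small sample file (made-up data) and read it back
sample = [
    {"Job Title": "Python Developer", "Company Name": "Example Corp"},
    {"Job Title": "Data Engineer", "Company Name": "Sample Ltd"},
]
path = os.path.join(tempfile.mkdtemp(), "jobs.csv")
with open(path, "w", newline="") as output_file:
    writer = csv.DictWriter(output_file, fieldnames=["Job Title", "Company Name"])
    writer.writeheader()
    writer.writerows(sample)

rows = read_csv_rows(path)
print(len(rows))             # 2
print(rows[0]["Job Title"])  # Python Developer
```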


Below is the complete implementation

6.1 sample.py

Python3




# Library for web automation apis
# Locator used for selector reference
from clicknium import clicknium as cc, locator
  
# Library for delay function
from time import sleep
  
# Library for save dict list data into csv file
from csvutils import list_dict_to_csv
  
# Library for clear clipboard and get clipboard data
from clipboard import get_clipboard_data, clear_clipboard_data
  
# Library for get setting in 'setting.json' file
from setting import Setting
  
# Login to LinkedIn page
# Find input box for username and password,
# and fill in with the value in setting.json
# Find submit button, and click it to login
# Wait 'skip add phone' button if it needs,
# and click the 'skip' button
def login():
    
    # Find input box for username
    # Fill in with the key value
    # 'linkedin_login_name' in setting.json
    _tab.find_element(locator.chrome.linkedin.login.login_email).set_text(
        Setting.login_name)
  
    # Find input box for password
    # Fill in with the key value
    # 'linkedin_login_password' in setting.json
    _tab.find_element(locator.chrome.linkedin.login.login_password).set_text(
        Setting.login_password)
  
    # Find submit button, and click it to login
    _tab.find_element(locator.chrome.linkedin.login.signin).click()
  
    # Wait up to 5 seconds for the 'skip add phone' button;
    # click it only if it appeared
    skip_btn = _tab.wait_appear(
        locator.chrome.linkedin.login.skip_add_phone, wait_timeout=5)
    if skip_btn:
        skip_btn.click()
  
  
def search_jobs():
    
    # Wait the page load completely after 
    # submitting login information
    # Find job channel and click it to
    # switch to job channel
    _tab.wait_appear(locator.chrome.linkedin.job.jobs_channel,
                     wait_timeout=5).click()
  
    # Wait job search keyword input box exists
    # in 10 seconds If exists fill in with
    # the key value 'linkedin_search_job_key' 
    # in setting.json
    _tab.wait_appear(locator.chrome.linkedin.job.job_search_key,
                     wait_timeout=10).set_text(Setting.search_job_key)
  
    # Find job search location input box
    # Fill in with the key value
    # 'linkedin_search_job_location' in setting.json
    _tab.find_element(locator.chrome.linkedin.job.job_search_location).set_text(
        Setting.search_job_location)
  
    # Find the search button, and click it to search
    _tab.find_element(locator.chrome.linkedin.job.job_search).click()
  
# Scrape the information of the top 10 jobs
# For each job item, get the title,
# the company name, the size of the company,
# the post date, the job type
# Save search results into csv file
def get_job_top10_list():
    # Initial search result list
    job_list = []
  
    # Clear clipboard data first
    clear_clipboard_data()
  
    # Here we set range(1,11) to get top 10 jobs,
    # it can be set with any value
    for i in range(1, 11):
  
        # Wait the job item appears in 5 second,
        # and get the element with index value
        ele = _tab.wait_appear(locator.chrome.linkedin.jobitem.job_listitem, {
                               "index": i}, wait_timeout=5)
  
        # If job item exists, click the job
        # item to get detail information
        if ele:
            # Initial job item search dict
            details = {}
  
            # Click job item
            ele.click()
  
            # Wait job item's title appears in 5 seconds
            job_title_ele = _tab.wait_appear(
                locator.chrome.linkedin.jobitem.job_title, wait_timeout=5)
              
            # If job item's title exists, get
            # the title string and save into 
            # result object 'details'
            if job_title_ele:
                details["Job Title"] = job_title_ele.get_text().strip()
  
            # Wait up to 2 seconds for the job item's company name to appear
            job_company_ele = _tab.wait_appear(
                locator.chrome.linkedin.jobitem.job_company, wait_timeout=2)

            # If the company name exists, get its text
            # and save it into result object 'details'
            if job_company_ele:
                details["Company Name"] = job_company_ele.get_text().strip()
  
            # Wait up to 2 seconds for the job item's company size to appear
            company_size_ele = _tab.wait_appear(
                locator.chrome.linkedin.jobitem.company_size, wait_timeout=2)
              
            # If job item's company scale exists,
            # get the company scale string and
            # save into result object 'details'
            if company_size_ele:
                scale = company_size_ele.get_text().strip(
                ) if "employees" in company_size_ele.get_text() else ""
                details["Company Size"] = scale
  
            # Wait up to 2 seconds for the job item's post date to appear
            job_post_date_ele = _tab.wait_appear(
                locator.chrome.linkedin.jobitem.job_post_date, wait_timeout=2)
              
            # If job item's post date exists,
            # get the post date string and save
            # into result object 'details'
            if job_post_date_ele:
                post_date = job_post_date_ele.get_text().strip(
                ) if "ago" in job_post_date_ele.get_text() else ""
                details["Post Date"] = post_date
  
            # Wait up to 2 seconds for the job item's type to appear
            job_type_ele = _tab.wait_appear(
                locator.chrome.linkedin.jobitem.job_type, wait_timeout=2)
              
            # If job item's type exists, get the
            # type string and save into result
            # object 'details'
            if job_type_ele:
                details["Job Type"] = job_type_ele.get_text().strip()
  
            # Wait up to 2 seconds for the job item's share button to appear
            job_share_btn_ele = _tab.wait_appear(
                locator.chrome.linkedin.jobitem.share_button, wait_timeout=2)
              
            # If job item's share button exists,
            # click the share button
            if job_share_btn_ele:
                job_share_btn_ele.click()
  
                # Wait up to 2 seconds for the copy link button to appear
                copy_link = _tab.wait_appear(
                    locator.chrome.linkedin.jobitem.copy_link, wait_timeout=2)
                  
                # If the copy link exists, click the copy
                # link to set clipboard data
                if copy_link:
                    copy_link.click()
  
                    # Sleep 0.2 second to wait the clipboard in ready state
                    sleep(0.2)
  
                    # Get the job link string and save
                    # into result object 'details'
                    details["Job Link"] = get_clipboard_data()
  
            # Save job item's result to list object
            job_list.append(details)
  
    # If it has any results, save into the csv file,
    # set the file path with the key
    # value 'result_csv_file' in setting.json
    if job_list:
        list_dict_to_csv(job_list, Setting.result_csv_file)
  
  
if __name__ == "__main__":
    
    # Create a browser instance with "cc.chrome",
    # for edge browser using "cc.edge"
    # Open browser with specified url and get browser tab
    # For default, it will wait the page load
    # completely. You do not need to add extra time.sleep()
    _tab = cc.chrome.open("https://www.linkedin.com/", is_wait_complete=True)
  
    # Check whether it needs to login in with username and password
    # True: means it needs to login in with username and password
    # False: means the website has remember authentication information
    if _tab.is_existing(locator.chrome.linkedin.login.login_email):
        # Login to LinkedIn
        login()
  
    # Search jobs with the keyword and location
    search_jobs()
  
    # Get top 10 jobs information from search
    # results and save into csv file
    get_job_top10_list()


6.2 csvutils.py

Python3




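# Library for CSV operations
import csv

# Save a list of dicts into a CSV file
def list_dict_to_csv(dicts, filename="test.csv"):

    # Build the header from the keys of all dicts, in first-seen
    # order. Jobs can have different fields (for example, a missing
    # "Job Link"), so the first dict alone may not list every column.
    keys = []
    for row in dicts:
        for key in row:
            if key not in keys:
                keys.append(key)

    # Open the CSV file for writing
    with open(filename, 'w', newline='') as output_file:

        # Create the DictWriter; fields missing from a row
        # are written as empty strings
        dict_writer = csv.DictWriter(output_file, keys, restval="")

        # Write the header row
        dict_writer.writeheader()

        # Write the data rows
        dict_writer.writerows(dicts)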


6.3 clipboard.py

Python3




# Library for win32 clipboard api
import win32clipboard
  
# Clear clipboard data
def clear_clipboard_data():
    try:
        # Call open clipboard api
        win32clipboard.OpenClipboard()
  
        # Call empty clipboard api
        win32clipboard.EmptyClipboard()
    finally:
        # Call close clipboard api
        win32clipboard.CloseClipboard()
  
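# Get the text currently stored in the clipboard
def get_clipboard_data():
    try:
        # Open the clipboard
        win32clipboard.OpenClipboard()
    except Exception:
        # The clipboard could not be opened, so there is
        # nothing to read and nothing to close
        return ""

    try:
        # Read the clipboard content as Unicode text
        return win32clipboard.GetClipboardData(
            win32clipboard.CF_UNICODETEXT)
    except Exception:
        # No text is available in the clipboard
        return ""
    finally:
        # Always close the clipboard once it has been opened
        win32clipboard.CloseClipboard()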


6.4 setting.py

Python3




# Library for json operations api
import json
  
class Setting(object):
           
    # Open json file and get file object
    # Load json data
    with open("setting.json") as f:
        data = json.load(f)
  
    # Value set for LinkedIn login username
    login_name = data['linkedin_login_name']
  
    # Value set for LinkedIn login password
    login_password = data['linkedin_login_password']
  
    # Value set for LinkedIn job search keyword
    search_job_key = data['linkedin_search_job_key']
  
    # Value set for LinkedIn job search location
    search_job_location = data['linkedin_search_job_location']
  
    # Value set for csv file path to save search results
    result_csv_file = data['result_csv_file']


6.5 setting.json

JSON




{
    "linkedin_login_name": "your account username",
    "linkedin_login_password": "your account password",
    "linkedin_search_job_key": "your desired job title",
    "linkedin_search_job_location": "your desired job location",
    "result_csv_file": "C:\\test\\test.csv"
}
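
Because `setting.py` reads each key directly, a missing key only surfaces as a `KeyError` when the module is imported. A small pre-check can report all missing or empty values at once. This is a sketch under the assumption that every key listed above is required; `REQUIRED_KEYS` and `missing_setting_keys` are hypothetical names, and the sample data is made up.

```python
import json

# Keys that setting.py reads from setting.json
REQUIRED_KEYS = [
    "linkedin_login_name",
    "linkedin_login_password",
    "linkedin_search_job_key",
    "linkedin_search_job_location",
    "result_csv_file",
]


def missing_setting_keys(data):
    """Return the required keys that are absent or empty in `data`."""
    return [key for key in REQUIRED_KEYS if not data.get(key)]


# Example with an incomplete settings document
sample = json.loads(
    '{"linkedin_login_name": "user@example.com", "result_csv_file": ""}')
print(missing_setting_keys(sample))
```

For the real file, the same function could be applied to the result of `json.load` on `setting.json` before starting the browser.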


6.6 Output

Here is a recording of the complete execution:

[Animation: complete execution]


