In this article, we will see how to control a web browser with Python using Selenium. Selenium is an open-source tool that automates web browsers. It provides a single interface that lets you write test scripts in programming languages such as Ruby, Java, NodeJS, PHP, Perl, Python, and C#.
To install this module, run this command in your terminal:
pip install selenium
For automation, install the latest Google Chrome and download the chromedriver build that matches it.
Here we will automate signing in at "https://auth.geeksforgeeks.org" and extract the name, email, and institute name from the logged-in profile.
Initialization and Authorization
First, initiate the web driver using Selenium and send a GET request to the URL. Then inspect the HTML document and find the input tags that accept the username/email and password, as well as the sign-in button tag.
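One way to identify these tags is to read the page source. The sketch below is not part of the Selenium workflow itself: it uses Python's standard html.parser module on a hypothetical, simplified login form (the SAMPLE_HTML markup and the FormFieldFinder class are illustrative assumptions, not the real page) to show how name attributes and button classes map to the locators used in the following steps.

```python
from html.parser import HTMLParser


class FormFieldFinder(HTMLParser):
    """Collect the name of every <input> and a CSS selector for every <button>."""

    def __init__(self):
        super().__init__()
        self.input_names = []
        self.button_selectors = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "input" and attributes.get("name"):
            self.input_names.append(attributes["name"])
        elif tag == "button":
            classes = (attributes.get("class") or "").split()
            # "btn btn-green" becomes the selector "button.btn.btn-green".
            self.button_selectors.append(".".join(["button"] + classes))


# Hypothetical, simplified markup; the real login page is larger.
SAMPLE_HTML = """
<form>
  <input type="text" name="user">
  <input type="password" name="pass">
  <button class="btn btn-green signin-button">Sign In</button>
</form>
"""

finder = FormFieldFinder()
finder.feed(SAMPLE_HTML)
print(finder.input_names)       # ['user', 'pass']
print(finder.button_selectors)  # ['button.btn.btn-green.signin-button']
```

Each class on the button is joined with a dot, which is why a tag carrying several classes ends up with a chained CSS selector.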
To send the user-supplied email and password to the respective input tags (By is imported from selenium.webdriver.common.by):
driver.find_element(By.NAME, 'user').send_keys(email)
driver.find_element(By.NAME, 'pass').send_keys(password)
Identify the button tag and click on it using the CSS selector via selenium webdriver:
driver.find_element(By.CSS_SELECTOR, 'button.btn.btn-green.signin-button').click()
Scraping Data
Scraping Basic Information from GFG Profile
After clicking on Sign in, a new page should be loaded containing the Name, Institute Name, and Email id. Identify the tags containing the above data and select them.
container = driver.find_elements(By.CSS_SELECTOR, 'div.mdl-cell.mdl-cell--9-col.mdl-cell--12-col-phone.textBold')
Get the text from each element in the returned list. The institution may be rendered as a link, so fall back to the plain text when no <a> tag is found (NoSuchElementException is imported from selenium.common.exceptions):
name = container[0].text
try:
    institution = container[1].find_element(By.CSS_SELECTOR, 'a').text
except NoSuchElementException:
    institution = container[1].text
email_id = container[2].text
Finally, print the output:
print({"Name": name, "Institution": institution, "Email ID": email_id})
Scraping Information from Practice tab
Click on the Practice tab and wait a few seconds for the page to load.
driver.find_elements(By.CSS_SELECTOR, 'a.mdl-navigation__link')[1].click()
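A fixed pause after the click either wastes time or is too short on a slow connection. As a rough sketch of the alternative, the helper below polls a condition until it holds or a timeout expires. The wait_until function and the page_ready stand-in are hypothetical names written for this illustration; Selenium ships its own explicit-wait support (WebDriverWait in selenium.webdriver.support.ui), which is the better choice in real scripts.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Poll `predicate` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Stand-in for a page that becomes ready on the third check.
attempts = []


def page_ready():
    attempts.append(1)
    return len(attempts) >= 3


wait_until(page_ready, timeout=5, interval=0.1)
print(len(attempts))  # 3
```

With a real driver, the predicate could be a lambda that looks up the elements the next step needs and returns the (possibly empty) list, so the script continues as soon as they appear.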
Find the container that holds all the information, then select the grids inside it using a CSS selector.
container = driver.find_element(By.CSS_SELECTOR, 'div.mdl-cell.mdl-cell--7-col.mdl-cell--12-col-phone.whiteBgColor.mdl-shadow--2dp.userMainDiv')
grids = container.find_elements(By.CSS_SELECTOR, 'div.mdl-grid')
Iterate over the selected grids, extract the text from each one, and add it to a set/list for output.
res = set()
for grid in grids:
    res.add(grid.text.replace('\n', ':'))
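To make the effect of the replace call concrete, here is a small sketch that runs the same transformation on plain strings. The sample texts and the flatten_grid_text helper are hypothetical and only stand in for the grid.text values that the browser would return.

```python
def flatten_grid_text(texts):
    """Join the lines of each grid's text with ':' and drop duplicates."""
    return {text.replace('\n', ':') for text in texts}


# Hypothetical grid texts; each grid renders a label and a value on separate lines.
sample = [
    "Overall Coding Score\n120",
    "Problems Solved\n45",
    "Problems Solved\n45",  # a repeated grid collapses into one entry
]

result = flatten_grid_text(sample)
print(sorted(result))  # ['Overall Coding Score:120', 'Problems Solved:45']
```

A set removes duplicates but does not preserve page order; collecting into a list instead keeps the grids in the order they appear.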
Below is the full implementation:
# Import the required modules
import time

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Main function
if __name__ == '__main__':
    # Provide the email and password
    email = 'example@example.com'
    password = 'password'

    options = webdriver.ChromeOptions()
    options.add_argument('--start-maximized')
    options.add_argument('--log-level=3')

    # Provide the path of chromedriver present on your system.
    service = Service(executable_path='C:/chromedriver/chromedriver.exe')
    driver = webdriver.Chrome(service=service, options=options)
    driver.set_window_size(1920, 1080)

    # Send a get request to the url
    driver.get('https://auth.geeksforgeeks.org')
    time.sleep(5)

    # Find the input boxes by name in the DOM tree and
    # type the provided email and password into them.
    driver.find_element(By.NAME, 'user').send_keys(email)
    driver.find_element(By.NAME, 'pass').send_keys(password)

    # Find the sign-in button and click on it.
    driver.find_element(
        By.CSS_SELECTOR, 'button.btn.btn-green.signin-button').click()
    time.sleep(5)

    # Return the list of elements matching the CSS selector.
    container = driver.find_elements(
        By.CSS_SELECTOR,
        'div.mdl-cell.mdl-cell--9-col.mdl-cell--12-col-phone.textBold')

    # Extract the text for name, institution, and email id.
    # The institution may be a link; fall back to plain text otherwise.
    name = container[0].text
    try:
        institution = container[1].find_element(By.CSS_SELECTOR, 'a').text
    except NoSuchElementException:
        institution = container[1].text
    email_id = container[2].text

    # Output Example 1
    print('Basic Info')
    print({'Name': name,
           'Institution': institution,
           'Email ID': email_id})

    # Click on the Practice tab
    driver.find_elements(
        By.CSS_SELECTOR, 'a.mdl-navigation__link')[1].click()
    time.sleep(5)

    # Select the container holding the information
    container = driver.find_element(
        By.CSS_SELECTOR,
        'div.mdl-cell.mdl-cell--7-col.mdl-cell--12-col-phone.'
        'whiteBgColor.mdl-shadow--2dp.userMainDiv')

    # Select the grids inside the container
    grids = container.find_elements(By.CSS_SELECTOR, 'div.mdl-grid')

    # Iterate over each grid and collect the text extracted from it.
    res = set()
    for grid in grids:
        res.add(grid.text.replace('\n', ':'))

    # Output Example 2
    print('Practice Info')
    print(res)

    # Quit the driver (closes every window and ends the session)
    driver.quit()
Output: