Suppose you want to travel using Indian Railways and have booked a train, but you are not sure whether the train is running on time. Checking this manually can be tedious, so in this article we will write a Python script that gets the live status of a train from its name or code.
Modules needed
- bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. It does not come built-in with Python. To install it, run the following command in the terminal:
pip install bs4
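- requests: Requests lets you send HTTP/1.1 requests easily. It also does not come built-in with Python. To install it, run the following command in the terminal:
pip install requests
- pandas: pandas is used in Step 4 to load the extracted JSON data. It does not come built-in with Python either. To install it, run the following command in the terminal:
pip install pandas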
Let’s go through the script step by step.
Step 1: Import all dependencies
```python
# import modules
import pandas as pd
import requests
from bs4 import BeautifulSoup
```
Step 2: Create a function that fetches the page at a URL
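```python
# user-defined function:
# download the page at `url` and return its HTML as text
def getdata(url):
    r = requests.get(url, timeout=10)
    r.raise_for_status()  # raise an error on HTTP failures such as 404
    return r.text
```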
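If you would rather not install requests, a comparable fetch can be sketched with only the standard library's urllib.request. This is an alternative to the article's getdata(), not a replacement for it; the name getdata_stdlib is hypothetical, and sending a browser-like User-Agent is an assumption (some sites reject the default Python one). The check at the end uses a data: URL so it runs without network access.

```python
import urllib.request


def getdata_stdlib(url, timeout=10):
    """Fetch a URL using only the standard library and return the body as text."""
    # Browser-like User-Agent: an assumption, adjust or drop as needed.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # decode with the charset the response declares, defaulting to UTF-8
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)


# offline check with a data: URL (no network needed)
print(getdata_stdlib("data:text/html,<p>hello</p>"))
# → <p>hello</p>
```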
Step 3: Append the train name to the page URL, pass that URL to the getdata() function, and parse the returned HTML with BeautifulSoup.
Note: It is strongly recommended that you copy the train name and code from the website itself, so that they match the format used in its URLs.
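```python
# train identifier, copied from the website's URL
train_name = "03391-rajgir-new-delhi-clone-special-rgd-to-ndls"

# URL of this train's live-status page: the site's base URL followed
# by the train name (placeholder below; fill in the real prefix)
url = "<live-status-base-url>/" + train_name

# pass the URL into getdata() and parse the returned HTML
htmldata = getdata(url)
soup = BeautifulSoup(htmldata, 'html.parser')

# display the HTML code
print(soup)
```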
Output: the full HTML source of the live-status page is printed.
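Plain string concatenation works for this slug, but a name with stray spaces or a slash would produce a broken URL. A small helper can trim and percent-encode the name first. This is a sketch: build_status_url is a hypothetical name, and https://example.com/live-train-status/ is a stand-in base URL, not the real site.

```python
from urllib.parse import quote


def build_status_url(base_url, train_name):
    """Join a base URL and a train identifier into a single page URL."""
    # trim whitespace, then percent-encode anything unsafe in a path
    # segment (safe="" means "/" is encoded too)
    slug = quote(train_name.strip(), safe="")
    return base_url.rstrip("/") + "/" + slug


url = build_status_url("https://example.com/live-train-status/",
                       " 03391-rajgir-new-delhi-clone-special-rgd-to-ndls ")
print(url)
# → https://example.com/live-train-status/03391-rajgir-new-delhi-clone-special-rgd-to-ndls
```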
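Step 4: Extract the live status from the HTML document. The page embeds it in JSON-LD blocks (<script type="application/ld+json">), so collect all of them and load the third one into a DataFrame.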
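```python
from io import StringIO

# collect the text of every JSON-LD <script> block on the page
data = []
for item in soup.find_all('script', type="application/ld+json"):
    data.append(item.get_text())

# convert the third block into a DataFrame; wrap the string in
# StringIO because newer pandas versions deprecate passing
# literal JSON text to read_json
df = pd.read_json(StringIO(data[2]))

# display the first entry of the "mainEntity" column
print(df["mainEntity"][0])
```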
Output:
{'@type': 'Question', 'name': 'Q) Where is my train (03391) RGD NDLS HUMSFR ?',
'acceptedAnswer': {'@type': 'Answer', 'text': 'A: 03391 RGD NDLS HUMSFR is 10 kms to VARANASI JN (312 kms Covered so far). It is expected to reach New Delhi by 02:30.'}}
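Indexing data[2] breaks as soon as the page adds or removes a JSON-LD block. A sturdier option is to pick the block by its content. The sketch below uses the standard json module instead of pandas and assumes the status block's top-level "@type" is "FAQPage"; the Question/acceptedAnswer shape in the output above matches schema.org's FAQPage, but the page's actual top-level type is not shown here. find_faq is a hypothetical helper, and the sample blocks are made up.

```python
import json


def find_faq(json_blocks):
    """Return the first JSON-LD object whose @type is "FAQPage", else None.

    json_blocks is a list of JSON strings, like the `data` list built
    from the page's <script type="application/ld+json"> tags.
    """
    for block in json_blocks:
        try:
            obj = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip blocks that are not valid JSON
        # a JSON-LD block may hold one object or a list of objects
        for cand in (obj if isinstance(obj, list) else [obj]):
            if isinstance(cand, dict) and cand.get("@type") == "FAQPage":
                return cand
    return None


# made-up sample blocks shaped like the output above
blocks = [
    '{"@type": "Organization", "name": "Example"}',
    'not json',
    json.dumps({
        "@type": "FAQPage",
        "mainEntity": [{
            "@type": "Question",
            "name": "Q) Where is my train (03391) RGD NDLS HUMSFR ?",
            "acceptedAnswer": {
                "@type": "Answer",
                "text": "A: 03391 RGD NDLS HUMSFR is 10 kms to VARANASI JN "
                        "(312 kms Covered so far).",
            },
        }],
    }),
]
faq = find_faq(blocks)
print(faq["mainEntity"][0]["name"])
# → Q) Where is my train (03391) RGD NDLS HUMSFR ?
```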
Step 5: Now get the question and the answer from this dictionary.
```python
# the question
print(df["mainEntity"][0]['name'])
# the answer, which contains the live status
print(df["mainEntity"][0]['acceptedAnswer']['text'])
```
Output:
Q) Where is my train (03391) RGD NDLS HUMSFR ?
A: 03391 RGD NDLS HUMSFR is 10 kms to VARANASI JN (312 kms Covered so far). It is expected to reach New Delhi by 02:30.
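The answer is a single sentence, so to use the distance or arrival time in a program you need to split it into fields. The sketch below does this with a regular expression built from the sample output above. The pattern is an assumption about the site's wording and stops matching if that wording changes; STATUS_RE and parse_status are hypothetical names.

```python
import re

# Pattern derived from the sample answer text; an assumption about
# the site's wording, not a documented format.
STATUS_RE = re.compile(
    r"is (?P<km_to_station>\d+) kms to (?P<station>.+?) "
    r"\((?P<km_covered>\d+) kms Covered so far\)\. "
    r"It is expected to reach (?P<destination>.+?) by (?P<eta>\d{1,2}:\d{2})"
)


def parse_status(text):
    """Split a live-status answer into named fields, or return None."""
    m = STATUS_RE.search(text)
    if m is None:
        return None
    fields = m.groupdict()
    # convert the distance fields from strings to integers
    fields["km_to_station"] = int(fields["km_to_station"])
    fields["km_covered"] = int(fields["km_covered"])
    return fields


answer = ("A: 03391 RGD NDLS HUMSFR is 10 kms to VARANASI JN "
          "(312 kms Covered so far). It is expected to reach New Delhi by 02:30.")
print(parse_status(answer))
# → {'km_to_station': 10, 'station': 'VARANASI JN', 'km_covered': 312, 'destination': 'New Delhi', 'eta': '02:30'}
```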
Full implementation:
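```python
# import modules
from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup


# user-defined function: download a page and return its HTML
def getdata(url):
    r = requests.get(url, timeout=10)
    r.raise_for_status()  # raise an error on HTTP failures
    return r.text


# train identifier, copied from the website's URL
train_name = "03391-rajgir-new-delhi-clone-special-rgd-to-ndls"

# URL of this train's live-status page: the site's base URL followed
# by the train name (placeholder below; fill in the real prefix)
url = "<live-status-base-url>/" + train_name

# pass the URL into getdata() and parse the returned HTML
htmldata = getdata(url)
soup = BeautifulSoup(htmldata, 'html.parser')

# collect the text of every JSON-LD <script> block on the page
data = []
for item in soup.find_all('script', type="application/ld+json"):
    data.append(item.get_text())

# convert the third block (the live-status FAQ) into a DataFrame
df = pd.read_json(StringIO(data[2]))

# display the question and the answer
print(df["mainEntity"][0]['name'])
print(df["mainEntity"][0]['acceptedAnswer']['text'])
```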
Output:
Q) Where is my train (03391) RGD NDLS HUMSFR ?
A: 03391 RGD NDLS HUMSFR is 6 kms to VARANASI JN (316 kms Covered so far). It is expected to reach New Delhi by 02:30.
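The parsing half of the full implementation, from HTML text to question and answer, can also be sketched with only the standard library, which removes the bs4 and pandas dependencies. This swaps in html.parser.HTMLParser and json for BeautifulSoup and pandas, so it is an alternative technique rather than the article's method. LdJsonCollector and live_status are hypothetical names, the sketch assumes the status lives in the first JSON-LD block that has a "mainEntity" entry, and the sample HTML is made up to mirror the output above.

```python
import json
from html.parser import HTMLParser


class LdJsonCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_ld_json = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ld_json = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_ld_json = False

    def handle_data(self, data):
        # script content can arrive in several chunks, so concatenate
        if self._in_ld_json:
            self.blocks[-1] += data


def live_status(html):
    """Return (question, answer) from the first JSON-LD block with a mainEntity, else None."""
    collector = LdJsonCollector()
    collector.feed(html)
    collector.close()
    for block in collector.blocks:
        try:
            obj = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks
        if not isinstance(obj, dict):
            continue
        entities = obj.get("mainEntity")
        if isinstance(entities, dict):
            entities = [entities]  # a single entity may not be wrapped in a list
        if entities:
            first = entities[0]
            return first["name"], first["acceptedAnswer"]["text"]
    return None


sample_html = """<html><head>
<script type="application/ld+json">{"@type": "Organization"}</script>
<script type="application/ld+json">
{"@type": "FAQPage", "mainEntity": [{"@type": "Question",
 "name": "Q) Where is my train (03391) RGD NDLS HUMSFR ?",
 "acceptedAnswer": {"@type": "Answer", "text": "A: 03391 RGD NDLS HUMSFR is 6 kms to VARANASI JN (316 kms Covered so far)."}}]}
</script></head><body></body></html>"""

question, answer = live_status(sample_html)
print(question)
print(answer)
```

To use it on a real page, pass it the string returned by getdata().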