Scraping Covid-19 statistics using BeautifulSoup

Coronavirus, one of the biggest pandemics, has put the whole world in danger, and it is also one of the top news stories these days. In this article, we will scrape Covid-19 statistics from the web and print them in a human-readable form. The data will be scraped from this website.

Prerequisites:

  • The libraries 'requests', 'bs4' (BeautifulSoup, published on PyPI as beautifulsoup4), and 'texttable' have to be installed:
    pip install beautifulsoup4
    pip install requests
    pip install texttable
Project: let's head over to the code. Create a file called run.py.
# importing modules
import requests
from bs4 import BeautifulSoup

# URL for scraping data (the address of the page with the statistics table goes here)
url = '...'

# get the URL's HTML and raise an error if the request failed
page = requests.get(url)
page.raise_for_status()
soup = BeautifulSoup(page.text, 'html.parser')

data = []

# soup.find_all('td') returns every table cell on the page, in document order
data_iterator = iter(soup.find_all('td'))
# data_iterator is the iterator over the table's cells

# Each table row has four cells: country, confirmed cases, deaths, continent.
# This loop keeps reading four cells at a time until the iterator runs out
while True:
    try:
        country = next(data_iterator).text
        confirmed = next(data_iterator).text
        deaths = next(data_iterator).text
        continent = next(data_iterator).text

        # For 'confirmed' and 'deaths', remove the thousands separators
        # (e.g. '644,348' -> '644348') and convert to int
        data.append((
            country,
            int(confirmed.replace(',', '')),
            int(deaths.replace(',', '')),
            continent
        ))

    # StopIteration is raised when there are no more elements left to iterate through
    except StopIteration:
        break

# Sort the data by the number of confirmed cases, highest first
data.sort(key=lambda row: row[1], reverse=True)
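
The row-by-row loop above relies on the table cells arriving in a fixed order, four per row. The sketch below replays that logic on a hard-coded, made-up list of cell texts so it runs without a network connection. The names cells, parse_number and group_rows are hypothetical, and zip(*[it] * 4) is a standard Python idiom swapped in for the while True / next() loop. It also shows why the separator passed to replace() must be ',': a value such as '45,000' contains no ', ', so '45,000'.replace(', ', '') leaves the comma in place and int() raises a ValueError.

```python
# Hypothetical cell texts, in the order find_all('td') would return them:
# four cells per table row (country, cases, deaths, continent). Values are made up.
cells = [
    "Country A", "1,200", "30", "Europe",
    "Country B", "45,000", "1,100", "Asia",
    "Country C", "980", "12", "Africa",
]


def parse_number(text: str) -> int:
    """Convert a cell such as '45,000' to 45000 by dropping the thousands separators."""
    return int(text.replace(",", "").strip())


def group_rows(items, width=4):
    """Return an iterator of `width`-tuples; an incomplete trailing group is dropped,
    just as the while/next() loop stops at the first StopIteration."""
    it = iter(items)
    # zip() over the *same* iterator repeated `width` times takes `width` items per tuple
    return zip(*[it] * width)


rows = [
    (country, parse_number(cases), parse_number(deaths), continent)
    for country, cases, deaths, continent in group_rows(cells)
]
# Sort by confirmed cases, highest first (same key as data.sort above)
rows.sort(key=lambda row: row[1], reverse=True)

for row in rows:
    print(row)
```

Running it prints the rows as tuples, with Country B (45000 cases) first and Country C (980 cases) last.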

To print the data in a human-readable format, we will use the texttable library.

import texttable as tt

# create texttable object
table = tt.Texttable()
table.header((' Country ', ' Number of cases ', ' Deaths ', ' Continent '))
# header=False: treat every tuple in data as a body row, since the header is already set
table.add_rows(data, header=False)
table.set_cols_align(('c', 'c', 'c', 'c'))  # 'l' denotes left, 'c' denotes center, and 'r' denotes right

print(table.draw())
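
texttable does all of the layout work for us. To illustrate what that involves, here is a minimal standard-library sketch (not texttable's own implementation) that sizes each column to its longest entry, centers every value, and draws the same kind of +---+ borders, with '=' under the header as in the output below. The rows are made-up sample values and draw_table is a hypothetical helper.

```python
# Hypothetical rows shaped like `data`: (country, cases, deaths, continent).
rows = [
    ("Country B", 45000, 1100, "Asia"),
    ("Country A", 1200, 30, "Europe"),
]
headers = ("Country", "Number of cases", "Deaths", "Continent")


def draw_table(headers, rows, padding=2):
    """Return a boxed, center-aligned text table as one string."""
    cells = [[str(value) for value in row] for row in rows]
    # Each column is as wide as its longest entry (header or data) plus padding on both sides
    widths = [
        max([len(headers[i])] + [len(row[i]) for row in cells]) + 2 * padding
        for i in range(len(headers))
    ]

    def border(char):
        # e.g. +-----+-----+ ; '=' is used for the line under the header
        return "+" + "+".join(char * w for w in widths) + "+"

    def render(values):
        # str.center pads each value with spaces to its column width
        return "|" + "|".join(v.center(w) for v, w in zip(values, widths)) + "|"

    lines = [border("-"), render(headers), border("=")]
    for row in cells:
        lines += [render(row), border("-")]
    return "\n".join(lines)


print(draw_table(headers, rows))
```

Unlike texttable, this sketch never wraps long cells, so very wide data produces very wide lines.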

Output:

+---------------------------+-------------------+----------+-------------------+
|          Country          |  Number of cases  |  Deaths  |     Continent     |
+===========================+===================+==========+===================+
|       United States       |      644348       |  28554   |   North America   |
+---------------------------+-------------------+----------+-------------------+
|           Spain           |      180659       |  18812   |      Europe       |
+---------------------------+-------------------+----------+-------------------+
|           Italy           |      165155       |  21645   |      Europe       |
+---------------------------+-------------------+----------+-------------------+
...

Note: the output depends on the statistics available when the script is run.

Stay home, stay safe!