Parsing tables and XML with BeautifulSoup

  • Last Updated : 08 Apr, 2021

Prerequisites: Web scraping using Beautiful Soup, XML parsing

Scraping is an essential skill: it lets a programmer extract data from a website or a file and reuse it in another form. In this article, we will learn how to extract a table from a website and parse XML from a file.
Here, we will scrape data using the Beautiful Soup Python module.
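As a quick taste of what follows, here is a minimal, self-contained sketch that parses a table from an inline HTML string (the snippet and its class name are made up for illustration; the real article fetches a live page instead):

```python
from bs4 import BeautifulSoup

# hypothetical HTML standing in for a downloaded page
html = """
<table class="numpy-table">
  <tr><th>Function</th><th>Description</th></tr>
  <tr><td>np.zeros</td><td>array of zeros</td></tr>
  <tr><td>np.ones</td><td>array of ones</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="numpy-table")

# collect the <td> text of every row (the header row has only <th> cells)
rows = [[cell.text for cell in row.find_all("td")]
        for row in table.find_all("tr")]
print(rows)
```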

Modules Required:

  • bs4: Beautiful Soup is a Python library for pulling data out of HTML and XML files. It can be installed using the below command:
pip install bs4
  • lxml: It is a Python library that allows us to handle XML and HTML files. It can be installed using the below command:
pip install lxml
  • requests: Requests allows you to send HTTP/1.1 requests extremely easily. It can be installed using the below command:
pip install requests
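After installing, a quick sanity check that all three packages import cleanly (the `__version__` attributes are standard on bs4 and requests):

```python
# confirm the scraping stack is installed and importable
import bs4
import lxml
import requests

print(bs4.__version__, requests.__version__)
```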

Step-by-step Approach to parse Tables:

Step 1: Firstly, we need to import modules and then assign the URL.

Python3

# import required modules
import bs4 as bs
import requests
  
# assign URL (placeholder; replace with the page you want to scrape)
URL = "https://www.example.com/page-with-table"
Step 2: Create a BeautifulSoup object for parsing.

Python3
# parsing
url_link = requests.get(URL)
file = bs.BeautifulSoup(url_link.text, "lxml")

Step 3: Then find the table and its rows. 

Python3
# find the table and its rows
find_table = file.find('table', class_='numpy-table')
rows = find_table.find_all('tr')
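Note that find() returns None when no matching table exists on the page, and calling find_all on None raises AttributeError. A small defensive sketch, using a made-up snippet that contains no table:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>no tables here</p>", "html.parser")

table = soup.find("table", class_="numpy-table")
# guard against a missing table instead of crashing with AttributeError
rows = table.find_all("tr") if table is not None else []
print(rows)
```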

Step 4: Now loop over the rows, find all the td tags in each row, and print their text.

Python3
# display table data
for row in rows:
    table_data = row.find_all('td')
    data = [cell.text for cell in table_data]
    print(data)

Below is the complete program based on the above approach:

Python3

# import required modules
import bs4 as bs
import requests
  
# assign URL (placeholder; replace with the page you want to scrape)
URL = "https://www.example.com/page-with-table"
  
# parsing
url_link = requests.get(URL)
file = bs.BeautifulSoup(url_link.text, "lxml")
  
# find the table and its rows
find_table = file.find('table', class_='numpy-table')
rows = find_table.find_all('tr')
  
# display table data
for row in rows:
    table_data = row.find_all('td')
    data = [cell.text for cell in table_data]
    print(data)
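Because header rows use th rather than td cells, the loop above prints an empty list for them. A variant that falls back to th cells so headers are kept, shown against an inline HTML stand-in for the page:

```python
from bs4 import BeautifulSoup

html = """<table class="numpy-table">
  <tr><th>Name</th><th>Value</th></tr>
  <tr><td>alpha</td><td>1</td></tr>
</table>"""

table = BeautifulSoup(html, "html.parser").find("table", class_="numpy-table")

parsed = []
for row in table.find_all("tr"):
    # use <th> cells when a row has no <td> cells (i.e. the header row)
    cells = row.find_all("td") or row.find_all("th")
    parsed.append([cell.get_text(strip=True) for cell in cells])
print(parsed)
```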

Output:

Step-by-step Approach to parse XML files:

Step 1: Before moving on, you can create your own XML file, or just copy the code below and save it as test1.xml on your system.

<?xml version="1.0" ?>
<books>
  <book>
    <title>Introduction of Geeksforgeeks V1</title>
    <author>Gfg</author>
    <price>6.99</price>
  </book>
  <book>
    <title>Introduction of Geeksforgeeks V2</title>
    <author>Gfg</author>
    <price>8.99</price>
  </book>
  <book>
    <title>Introduction of Geeksforgeeks V2</title>
    <author>Gfg</author>
    <price>9.35</price>
  </book>
</books>

Step 2: Create a Python file and import the required module.

Python3
# import required modules
from bs4 import BeautifulSoup

Step 3: Read the content of the XML.

Python3
# reading content (the with-statement closes the file automatically)
with open("test1.xml", "r") as file:
    contents = file.read()

Step 4: Parse the content of the XML.

Python3
# parsing
soup = BeautifulSoup(contents, 'xml')
titles = soup.find_all('title')

Step 5: Display the content of the XML file.

Python3

# display content
for data in titles:
    print(data.get_text())

Below is the complete program based on the above approach:

Python3

# import required modules
from bs4 import BeautifulSoup
  
# reading content (the with-statement closes the file automatically)
with open("test1.xml", "r") as file:
    contents = file.read()
  
# parsing
soup = BeautifulSoup(contents, 'xml')
titles = soup.find_all('title')
  
# display content
for data in titles:
    print(data.get_text())
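Beyond titles, each book element can be turned into a structured record. A sketch against an inline copy of the sample XML (the 'xml' parser feature relies on lxml being installed):

```python
from bs4 import BeautifulSoup

xml = """<?xml version="1.0" ?>
<books>
  <book><title>Introduction of Geeksforgeeks V1</title><author>Gfg</author><price>6.99</price></book>
  <book><title>Introduction of Geeksforgeeks V2</title><author>Gfg</author><price>8.99</price></book>
</books>"""

soup = BeautifulSoup(xml, "xml")

# build one dict per <book>, converting the price text to a float
books = [
    {
        "title": book.find("title").get_text(),
        "author": book.find("author").get_text(),
        "price": float(book.find("price").get_text()),
    }
    for book in soup.find_all("book")
]
total = sum(b["price"] for b in books)
print(books, total)
```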

Output:
