Parsing tables and XML with BeautifulSoup
  • Last Updated : 08 Apr, 2021

Prerequisites: Web scraping using Beautiful Soup, XML parsing

Scraping is an essential skill that lets us extract data from a website or a file and reuse it programmatically. In this article, we will learn how to extract a table from a website and parse XML from a file.
Here, we will scrape the data using the Beautiful Soup Python module.

Modules Required:

  • bs4: Beautiful Soup is a Python library for pulling data out of HTML and XML files. It can be installed using the below command:
pip install bs4
  • lxml: It is a Python library that allows us to handle XML and HTML files. It can be installed using the below command:
pip install lxml
  • requests: Requests allows you to send HTTP/1.1 requests extremely easily. It can be installed using the below command:
pip install requests

Step-by-step Approach to parse Tables:

Step 1: First, import the required modules and assign the URL.

Python3
# import required modules
import bs4 as bs
import requests
  
# assign URL (placeholder below; replace with the page you want to scrape)
URL = "https://example.com/page-with-a-table"

Step 2: Create a BeautifulSoup object for parsing.

Python3
# parsing
url_link = requests.get(URL)
file = bs.BeautifulSoup(url_link.text, "lxml")

Step 3: Then find the table and its rows. 

Python3
# find the table and its rows
find_table = file.find('table', class_='numpy-table')
rows = find_table.find_all('tr')

Step 4: Now loop over the rows, find all the td tags in each, and print their text.

Python3
# display tables
for i in rows:
    table_data = i.find_all('td')
    data = [j.text for j in table_data]
    print(data)
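Since the steps above depend on a live URL, the same find/find_all flow can be tried offline on an inline HTML snippet. The table markup below is made up for illustration, and Python's built-in html.parser is used here so no network request or extra parser is needed:

```python
import bs4 as bs

# a tiny HTML fragment standing in for a downloaded page (illustrative only)
html = """
<table class="numpy-table">
  <tr><th>Function</th><th>Description</th></tr>
  <tr><td>np.array</td><td>Create an array</td></tr>
  <tr><td>np.zeros</td><td>Array of zeros</td></tr>
</table>
"""

file = bs.BeautifulSoup(html, "html.parser")
find_table = file.find('table', class_='numpy-table')
rows = find_table.find_all('tr')

# the header row contains th tags only, so it prints as an empty list
for i in rows:
    table_data = i.find_all('td')
    data = [j.text for j in table_data]
    print(data)
```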

Below is the complete program based on the above approach:

Python3
# import required modules
import bs4 as bs
import requests
  
# assign URL (placeholder below; replace with the page you want to scrape)
URL = "https://example.com/page-with-a-table"
  
# parsing
url_link = requests.get(URL)
file = bs.BeautifulSoup(url_link.text, "lxml")
  
# find the table and its rows
find_table = file.find('table', class_='numpy-table')
rows = find_table.find_all('tr')
  
# display tables
for i in rows:
    table_data = i.find_all('td')
    data = [j.text for j in table_data]
    print(data)
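As a follow-up not covered in the article's steps, scraped rows are often written to CSV. Below is a minimal sketch using the standard csv module, again with a made-up HTML string standing in for a live page:

```python
import csv
import io

import bs4 as bs

# stand-in HTML (illustrative); in practice this would be url_link.text
html = """<table class="numpy-table">
<tr><td>np.ones</td><td>Array of ones</td></tr>
<tr><td>np.eye</td><td>Identity matrix</td></tr>
</table>"""

soup = bs.BeautifulSoup(html, "html.parser")
rows = soup.find('table', class_='numpy-table').find_all('tr')

# write each non-empty row as one CSV line (an in-memory buffer here;
# open('out.csv', 'w', newline='') would write a real file instead)
buffer = io.StringIO()
writer = csv.writer(buffer)
for row in rows:
    cells = [td.text for td in row.find_all('td')]
    if cells:
        writer.writerow(cells)

csv_text = buffer.getvalue()
print(csv_text)
```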

Step-by-step Approach to parse XML files:

Step 1: Before moving on, you can create your own XML file, or just copy the code below and save it as test1.xml on your system.

<?xml version="1.0" ?>
<books>
  <book>
    <title>Introduction of Geeksforgeeks V1</title>
    <author>Gfg</author>
    <price>6.99</price>
  </book>
  <book>
    <title>Introduction of Geeksforgeeks V2</title>
    <author>Gfg</author>
    <price>8.99</price>
  </book>
  <book>
    <title>Introduction of Geeksforgeeks V2</title>
    <author>Gfg</author>
    <price>9.35</price>
  </book>
</books>
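Instead of copying the listing by hand, the file can also be written from Python. This snippet just reproduces the sample content above as test1.xml in the current directory:

```python
# the sample document from the listing above
xml_content = """<?xml version="1.0" ?>
<books>
  <book>
    <title>Introduction of Geeksforgeeks V1</title>
    <author>Gfg</author>
    <price>6.99</price>
  </book>
  <book>
    <title>Introduction of Geeksforgeeks V2</title>
    <author>Gfg</author>
    <price>8.99</price>
  </book>
  <book>
    <title>Introduction of Geeksforgeeks V2</title>
    <author>Gfg</author>
    <price>9.35</price>
  </book>
</books>
"""

# write it out so the later steps can read it back
with open("test1.xml", "w") as f:
    f.write(xml_content)
```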

Step 2: Create a Python file and import the required module.

Python3
# import required modules
from bs4 import BeautifulSoup

Step 3: Read the content of the XML.

Python3
# reading content (a with-block closes the file automatically)
with open("test1.xml", "r") as file:
    contents = file.read()

Step 4: Parse the content of the XML.

Python3
# parsing
soup = BeautifulSoup(contents, 'xml')
titles = soup.find_all('title')

Step 5: Display the content of the XML file.



Python3
# display content
for data in titles:
    print(data.get_text())

Below is the complete program based on the above approach:

Python3
# import required modules
from bs4 import BeautifulSoup
  
# reading content (a with-block closes the file automatically)
with open("test1.xml", "r") as file:
    contents = file.read()
  
# parsing
soup = BeautifulSoup(contents, 'xml')
titles = soup.find_all('title')
  
# display content
for data in titles:
    print(data.get_text())
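Other tags can be extracted the same way. As an extension of the article's program, the sketch below pulls the price tags out of the sample document (parsed from a string here for self-containment) and totals them; html.parser is used in place of the article's 'xml' parser purely to avoid the lxml dependency, which works for this simple lowercase-tag document:

```python
from bs4 import BeautifulSoup

# sample document as a string (same content as the test1.xml listing)
xml_doc = """<books>
  <book><title>Introduction of Geeksforgeeks V1</title><author>Gfg</author><price>6.99</price></book>
  <book><title>Introduction of Geeksforgeeks V2</title><author>Gfg</author><price>8.99</price></book>
  <book><title>Introduction of Geeksforgeeks V2</title><author>Gfg</author><price>9.35</price></book>
</books>"""

soup = BeautifulSoup(xml_doc, "html.parser")

# collect every price value and total it, rounding away float error
prices = [float(p.get_text()) for p in soup.find_all('price')]
total = round(sum(prices), 2)
print(total)
```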

Output:

Introduction of Geeksforgeeks V1
Introduction of Geeksforgeeks V2
Introduction of Geeksforgeeks V2
