Extract JSON from HTML using BeautifulSoup in Python
In this article, we are going to extract JSON from HTML using BeautifulSoup in Python.
- bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. It does not come built-in with Python. It is published on PyPI as beautifulsoup4 and imported as bs4. To install it, type the command below in the terminal.
pip install beautifulsoup4
- requests: Requests lets you send HTTP/1.1 requests easily. This module also does not come built-in with Python. To install it, type the command below in the terminal.
pip install requests
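To confirm that both installs succeeded, a small check like the following can be run. It is a sketch that assumes Python 3.8+ (for importlib.metadata) and queries the distribution names beautifulsoup4 and requests; the helper name check_installed is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def check_installed(packages):
    """Hypothetical helper: map each distribution name to its version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # bs4 is distributed as "beautifulsoup4", so that is the name to look up.
    for name, ver in check_installed(["beautifulsoup4", "requests"]).items():
        status = ver if ver else f"not installed (try: pip install {name})"
        print(f"{name}: {status}")
```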
- Import all the required modules.
- Pass the URL to a user-defined function (UDF) that sends a GET request to that URL with requests.get() and returns the response.
Syntax: requests.get(url, params=None, **kwargs)
- Now parse the HTML content using bs4.
Syntax: BeautifulSoup(page.text, 'html.parser')
- page.text: the raw HTML content of the response.
- html.parser: the parser to use; 'html.parser' is the HTML parser built into Python's standard library.
- Now extract the required data with the find() method (or find_all() to get every match).
Locate the customer list by finding the li, a, or p tags that carry a unique class or id. To identify them, open the webpage in your browser, right-click the relevant element, and choose Inspect, as shown in the figure.
- Create a JSON file and use the json.dump() method to write the Python objects (such as lists and dictionaries) to it as JSON.
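The parsing and find() steps can be tried without a network connection by feeding BeautifulSoup an HTML string in place of page.text. This is a minimal sketch: the sample markup, the id "customers", the class "customer" and the function name extract_customers are all hypothetical, chosen only to illustrate matching on a unique id and class.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for page.text; the tag names, id and
# class are illustrative assumptions, not taken from a real site.
SAMPLE_HTML = """
<ul id="customers">
  <li class="customer"><a href="/c/1">Alice</a><p>Berlin</p></li>
  <li class="customer"><a href="/c/2">Bob</a><p>Madrid</p></li>
</ul>
"""


def extract_customers(html):
    """Parse HTML and return one dict per customer <li> (hypothetical structure)."""
    soup = BeautifulSoup(html, "html.parser")
    # find() returns the first matching tag, or None; here it matches on a unique id.
    container = soup.find("ul", id="customers")
    if container is None:
        return []
    customers = []
    # find_all() returns every match; class_ avoids clashing with the keyword "class".
    for item in container.find_all("li", class_="customer"):
        link = item.find("a")
        city = item.find("p")
        customers.append({
            "name": link.get_text(strip=True) if link else None,
            "profile": link.get("href") if link else None,
            "city": city.get_text(strip=True) if city else None,
        })
    return customers


if __name__ == "__main__":
    print(extract_customers(SAMPLE_HTML))
```

Returning an empty list when the container is missing keeps the later json.dump() step from failing on None if the page layout changes.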
Below is the full implementation:
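As a hedged, minimal end-to-end sketch of the same approach (not tied to any particular site's markup), the script below fetches a page, collects the text and link of every a tag inside an li, and writes the result with json.dump(). The URL is a placeholder, and the helper names getdata, parse_items and save_json are our own choices.

```python
import json

import requests
from bs4 import BeautifulSoup

# Placeholder URL (assumption): replace with the page you want to scrape.
URL = "https://example.com/customers"


def getdata(url):
    """Hypothetical UDF: send a GET request and return the raw HTML."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.text


def parse_items(html):
    """Collect the text and href of every <a> inside an <li> (illustrative selector)."""
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for li in soup.find_all("li"):
        link = li.find("a")
        if link is not None:
            items.append({"text": link.get_text(strip=True), "href": link.get("href")})
    return items


def save_json(data, path):
    """Write Python lists/dicts to a JSON file with json.dump()."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=4, ensure_ascii=False)


if __name__ == "__main__":
    html = getdata(URL)
    save_json(parse_items(html), "data.json")
```

Passing indent=4 makes the file human-readable, and ensure_ascii=False keeps non-ASCII text as-is rather than escaping it. The timeout and raise_for_status() make a failed request surface as an error instead of quietly writing data scraped from an error page.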
Figure: Created JSON file
Figure: Our JSON file output