Open In App

How to write the output to HTML file with Python BeautifulSoup?

In this article, we are going to write the output to an HTML file with Python BeautifulSoup.  BeautifulSoup is a python library majorly used for web scraping but in this article, we will discuss how to write the output to an HTML file.

Modules needed and installation:



pip install bs4

Approach:

Steps to be followed:



Step 1: Import the required libraries.




# Import libraries
from bs4 import BeautifulSoup
import requests

Step 2: We will perform a get request to the Google search engine home page and extract its page content and make a soup object out of it by passing it to beautiful soup, and we will set the markup as html.parser.

Note: if you are extracting a xml page set the markup as xml.parser




# set the url to perform the get request
page = requests.get(URL)
  
# load the page content
text = page.content
  
# make a soup object by using beautiful
# soup and set the markup as html parser
soup = BeautifulSoup(text, "html.parser")

Step 3: We use the file data type of python and write the soup object in the output file. We will set the encoding to UTF-8. We will use .prettify() function on soup object that will make it easier to read. We will convert the soup object to a string before writing it.

We will store the output file in the same directory with the name output.html




# open the file in w mode
# set encoding to UTF-8
with open("output.html", "w", encoding = 'utf-8') as file:
    
    # prettify the soup object and convert it into a string  
    file.write(str(soup.prettify()))

Below is the full implementation:




# Import libraries
from bs4 import BeautifulSoup
import requests
  
# set the url to perform the get request
page = requests.get(URL)
  
# load the page content
text = page.content
  
# make a soup object by using
# beautiful soup and set the markup as html parser
soup = BeautifulSoup(text, "html.parser")
  
# open the file in w mode
# set encoding to UTF-8
with open("output.html", "w", encoding = 'utf-8') as file:
    
    # prettify the soup object and convert it into a string
    file.write(str(soup.prettify()))

Output:


Article Tags :