
How to write the output to HTML file with Python BeautifulSoup?

Last Updated : 08 Apr, 2021

In this article, we are going to write the output to an HTML file with Python BeautifulSoup. BeautifulSoup is a Python library used mainly for web scraping, but here we focus on how to write a parsed page back out to an HTML file.

Modules needed and installation:

pip install beautifulsoup4 requests

Approach:

  • Import the required libraries.
  • Make a GET request to the desired URL and extract its page content.
  • Open a new file with Python's built-in open() function and write the output to it.

Steps to be followed:

Step 1: Import the required libraries.

Python3




# Import libraries
from bs4 import BeautifulSoup
import requests


Step 2: Send a GET request to the Google search engine home page and extract its page content. Then pass that content to BeautifulSoup to build a soup object, with html.parser as the parser.

Note: if you are extracting an XML page, set the parser to "xml" instead (this requires the lxml package). There is no parser named xml.parser.

Python3




# set the url to perform the get request
# (the Google home page, as described in Step 2)
URL = "https://www.google.com/"
page = requests.get(URL)

# load the page content (raw bytes of the response body)
text = page.content

# make a soup object by using beautiful
# soup and set the markup as html parser
soup = BeautifulSoup(text, "html.parser")
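
For XML input, a minimal sketch might look like the following. It assumes the lxml package is installed (pip install lxml), since Beautiful Soup's "xml" parser depends on it. The sample document and the variable names are made up for illustration.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical XML snippet, used only for illustration
xml_text = "<items><item>1</item><item>2</item></items>"

try:
    # "xml" selects Beautiful Soup's XML parser, which is backed by lxml
    xml_soup = BeautifulSoup(xml_text, "xml")
    values = [item.get_text() for item in xml_soup.find_all("item")]
except FeatureNotFound:
    # Raised when no installed parser provides the requested feature
    xml_soup = None
    values = []
    print("lxml is not installed; run: pip install lxml")

print(values)
```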


Step 3: Open the output file with Python's built-in open() function in write mode, with the encoding set to UTF-8, and write the soup object to it. Calling .prettify() on the soup object returns the markup as an indented string that is easier to read, so no further conversion is needed before writing.

The output file is saved as output.html in the current working directory.

Python3




# open the file in w mode
# set encoding to UTF-8
with open("output.html", "w", encoding="utf-8") as file:

    # prettify() already returns a string, so write it directly
    file.write(soup.prettify())
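
The effect of prettify() can be checked without a network request. The sketch below parses a small made-up HTML string, writes it to a temporary directory, and reads it back. The markup, the temporary path, and the variable names are assumptions for illustration. Because prettify() inserts newlines and indentation, str(soup) is the safer choice when the whitespace of the original page needs to be preserved.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup

# Made-up markup standing in for a downloaded page
markup = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"
soup = BeautifulSoup(markup, "html.parser")

pretty = soup.prettify()   # indented, each tag and string on its own line; already a str
compact = str(soup)        # markup as parsed, without the added whitespace

# Write to a temporary directory so nothing in the project is overwritten
out_path = Path(tempfile.mkdtemp()) / "output.html"
out_path.write_text(pretty, encoding="utf-8")

# Read the file back to confirm what was written
read_back = out_path.read_text(encoding="utf-8")
print(read_back)
```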


Below is the full implementation:

Python3




# Import libraries
from bs4 import BeautifulSoup
import requests

# set the url to perform the get request
# (the Google home page, as described in Step 2)
URL = "https://www.google.com/"
page = requests.get(URL)

# load the page content (raw bytes of the response body)
text = page.content

# make a soup object by using
# beautiful soup and set the markup as html parser
soup = BeautifulSoup(text, "html.parser")

# open the file in w mode
# set encoding to UTF-8
with open("output.html", "w", encoding="utf-8") as file:

    # prettify() already returns a string, so write it directly
    file.write(soup.prettify())
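
The same steps can also be wrapped in a reusable function. The sketch below is one possible arrangement, not part of the original script: the function name save_page, its parameters, and the 10-second timeout are assumptions. It adds a timeout and a status check so that a failed request raises an exception instead of writing an error page to disk.

```python
from bs4 import BeautifulSoup
import requests


def save_page(url, path="output.html", timeout=10):
    """Fetch url, parse it, and write the prettified HTML to path."""
    # A timeout keeps the call from waiting indefinitely on a slow server
    response = requests.get(url, timeout=timeout)

    # Raise requests.HTTPError for 4xx/5xx responses
    response.raise_for_status()

    soup = BeautifulSoup(response.content, "html.parser")

    with open(path, "w", encoding="utf-8") as file:
        file.write(soup.prettify())

    return path
```

Calling save_page("https://www.google.com/") would then write output.html to the current working directory.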


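Output:

Running the script creates output.html in the current working directory, containing the indented HTML of the fetched page.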


