How to remove empty tags using BeautifulSoup in Python?

Last Updated : 26 Nov, 2020

Prerequisite: Requests, BeautifulSoup, strip

The task is to write a program that removes empty tags from HTML code. Beautiful Soup has no built-in method for removing tags that have no content, so the tags must be checked and removed one by one.

Module Needed:

bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. This module does not come built-in with Python. To install it, type the below command in the terminal.

pip install bs4

requests: Requests allows you to send HTTP/1.1 requests extremely easily. This module also does not come built-in with Python. To install it, type the below command in the terminal.

pip install requests

lxml: The examples below pass "lxml" to BeautifulSoup as the parser. This parser is a separate package and also does not come built-in with Python. To install it, type the below command in the terminal.

pip install lxml

Approach:

Get HTML Code
Iterate through each tag
- Fetch the text from the tag and remove whitespace using strip.
- After removing whitespace, check whether the length of the text is zero. If it is, remove the tag from the HTML code.
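
The steps above can be sketched as a small reusable function. This is a minimal sketch, not the only way to do it: the function name remove_empty_tags is hypothetical, and it assumes Python's built-in "html.parser" so that no extra parser package is required.

```python
from bs4 import BeautifulSoup


def remove_empty_tags(markup: str) -> str:
    """Return the markup with every tag that holds no text removed.

    Hypothetical helper for illustration; not part of Beautiful Soup.
    """
    # Assumption: "html.parser" ships with Python, so nothing else is needed.
    soup = BeautifulSoup(markup, "html.parser")

    # find_all() returns a list, so removing tags while looping is safe.
    for tag in soup.find_all():
        # strip=True strips each text fragment and drops the empty ones.
        if len(tag.get_text(strip=True)) == 0:
            tag.extract()

    return str(soup)


print(remove_empty_tags("<div><p>  </p><span>kept</span></div>"))
```

Here the whitespace-only p tag is dropped, while the div and span stay because they still contain the text "kept".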

Example 1: Remove empty tag.

Python3

# Import Module 
from bs4 import BeautifulSoup 
  
# HTML Object 
html_object = """ 
  
<p> 
<p></p> 
<strong>some<br>text<br>here</strong></p> 
  
"""
  
# Parse the HTML code
soup = BeautifulSoup(html_object, "lxml")

# Iterate over every tag in the document
for x in soup.find_all():
  
    # fetching text from tag and remove whitespaces 
    if len(x.get_text(strip=True)) == 0: 
          
        # Remove empty tag 
        x.extract() 
  
# Print HTML Code with removed empty tags 
print(soup)

Output:

<html><body><strong>sometexthere</strong>
</body></html>
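
Note that the loop in Example 1 also removes the br tags, because they contain no text. That is why the output reads "sometexthere". If such tags should survive, one possible variant is to skip a chosen set of tag names. The sketch below assumes a hypothetical list called KEEP_WHEN_EMPTY, which is illustrative rather than exhaustive, and uses the built-in "html.parser".

```python
from bs4 import BeautifulSoup

# Assumption: tags that are meaningful even without text.
# Illustrative only; adjust the list for your own markup.
KEEP_WHEN_EMPTY = ["br", "hr", "img", "input"]

html_object = "<p></p><strong>some<br>text<br>here</strong>"

soup = BeautifulSoup(html_object, "html.parser")

for tag in soup.find_all():
    has_text = len(tag.get_text(strip=True)) > 0
    is_kept = tag.name in KEEP_WHEN_EMPTY
    # Also keep a wrapper whose only content is a protected tag,
    # e.g. a div that holds nothing but an img.
    wraps_kept = tag.find(KEEP_WHEN_EMPTY) is not None

    if not (has_text or is_kept or wraps_kept):
        tag.extract()

result = str(soup)
print(result)
```

With this variant the empty p tag is still removed, but both br tags inside strong are preserved.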

Example 2: Remove empty tag from a given URL.

Python3

# Import Module 
from bs4 import BeautifulSoup 
import requests 
  
# Page URL 
URL = "https://www.geeksforgeeks.org/"
  
# Page content from website URL
page = requests.get(URL)

# Parse the HTML code
soup = BeautifulSoup(page.content, "lxml")

# Iterate over every tag in the document
for x in soup.find_all():

    # Fetch text from the tag with whitespace stripped
    if len(x.get_text(strip=True)) == 0:
  
        # Remove empty tag 
        x.extract() 
  
# Print HTML Code with removed empty tags 
print(soup)