How to remove empty tags using BeautifulSoup in Python?
  • Last Updated : 26 Nov, 2020

Prerequisite: Requests, BeautifulSoup, strip

The task is to write a program that removes empty tags from HTML code. Beautiful Soup has no built-in method for removing tags that have no content, so the tags must be found and removed manually.

Modules Needed:

  • bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. It is not part of the Python standard library. To install it, run the command below in the terminal.
pip install beautifulsoup4
  • lxml: The examples below pass "lxml" to BeautifulSoup as the parser, so the lxml package must also be installed.
pip install lxml
  • requests: Requests lets you send HTTP/1.1 requests easily. It is also not part of the standard library. To install it, run the command below in the terminal.
pip install requests

Approach:

  • Get the HTML code.
  • Iterate through each tag:
    • Fetch the text from the tag and remove surrounding whitespace using strip.
    • If the length of the remaining text is zero, remove the tag from the HTML code.
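The steps above can be sketched as a small reusable function. This is a minimal sketch, not a definitive implementation: the function name `remove_empty_tags` is hypothetical, and it uses Python's built-in "html.parser" instead of "lxml" so that it runs without an extra parser package.

```python
from bs4 import BeautifulSoup


def remove_empty_tags(html: str) -> str:
    """Return html with every tag that has no text removed.

    Hypothetical helper for illustration; not part of Beautiful Soup.
    """
    # Assumption: "html.parser" (bundled with Python) is used here
    # instead of "lxml" so the sketch has no extra dependency.
    soup = BeautifulSoup(html, "html.parser")

    # find_all() with no arguments returns every tag in the document.
    for tag in soup.find_all():
        # get_text(strip=True) strips whitespace from each piece of text,
        # so a tag holding only spaces or newlines counts as empty.
        if len(tag.get_text(strip=True)) == 0:
            tag.extract()

    return str(soup)


cleaned = remove_empty_tags("<div><p>   </p><span>kept</span><b></b></div>")
print(cleaned)
```

Here the whitespace-only `<p>` and the empty `<b>` are dropped, while the `<div>` and `<span>` stay because they contain text.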

Example 1: Remove empty tag.

Python3




# Import Module
from bs4 import BeautifulSoup
  
# HTML Object
html_object = """
  
<p>
<p></p>
<strong>some<br>text<br>here</strong></p>
  
"""
  
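# Parse the HTML (the "lxml" parser requires the lxml package)
soup = BeautifulSoup(html_object, "lxml")

# Iterate over every tag in the document
for x in soup.find_all():

    # Get the tag's text with whitespace stripped and check whether it is empty
    if len(x.get_text(strip=True)) == 0:

        # Remove the empty tag (this also removes <br>, which never holds text)
        x.extract()

# Print the HTML with empty tags removed
print(soup)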

Output:



<html><body><strong>sometexthere</strong>
</body></html>
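Note that the output above runs the words together: `<br>` never contains text, so the check treats it as empty and removes it. If such tags should survive, one option is to skip them explicitly. The sketch below is a variation on the example rather than part of the original approach; the `VOID_TAGS` set is an assumption, and it uses "html.parser" so it runs without lxml. A parent that contains only such tags and no text is still removed.

```python
from bs4 import BeautifulSoup

# Assumption: tags we choose to keep even though they never contain text.
VOID_TAGS = {"br", "hr", "img", "input"}

html = "<p></p><strong>some<br>text<br>here</strong>"
soup = BeautifulSoup(html, "html.parser")

for tag in soup.find_all():
    # Skip tags that are empty by design.
    if tag.name in VOID_TAGS:
        continue

    # Remove any other tag whose stripped text is empty.
    if len(tag.get_text(strip=True)) == 0:
        tag.extract()

result = str(soup)
print(result)
```

With this change the empty `<p>` is still removed, but both line breaks inside `<strong>` are kept.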

Example 2: Remove empty tag from a given URL.

Python3




# Import modules
from bs4 import BeautifulSoup
import requests

# Page URL (placeholder: the URL is not given here, so substitute your own)
URL = "https://www.example.com/"

# Fetch the page content from the URL
page = requests.get(URL)

# Parse the HTML (the "lxml" parser requires the lxml package)
soup = BeautifulSoup(page.content, "lxml")

# Iterate over every tag in the document
for x in soup.find_all():

    # Get the tag's text with whitespace stripped and check whether it is empty
    if len(x.get_text(strip=True)) == 0:

        # Remove the empty tag
        x.extract()

# Print the HTML with empty tags removed
print(soup)

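Output: The script prints the HTML of the fetched page with every empty tag removed. The exact result depends on the URL used.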
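When fetching a live page, it may help to separate the download step from the cleaning step and to fail early on HTTP errors. The following is a hedged sketch under stated assumptions: the names `strip_empty_tags` and `fetch_and_clean` and the 10-second timeout are illustrative choices, not part of the original example, and "html.parser" is used so that lxml is not required.

```python
import requests
from bs4 import BeautifulSoup


def strip_empty_tags(markup) -> str:
    """Hypothetical helper: drop every tag whose stripped text is empty."""
    soup = BeautifulSoup(markup, "html.parser")
    for tag in soup.find_all():
        if len(tag.get_text(strip=True)) == 0:
            tag.extract()
    return str(soup)


def fetch_and_clean(url: str, timeout: float = 10.0) -> str:
    """Hypothetical helper: download a page and clean its HTML."""
    # timeout stops the call from waiting indefinitely on a slow server.
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises an exception for 4xx and 5xx responses.
    response.raise_for_status()
    return strip_empty_tags(response.content)


# Offline check of the cleaning step; no network access is needed.
sample = strip_empty_tags("<ul><li>one</li><li> </li></ul>")
print(sample)
```

Calling `fetch_and_clean` with a URL of your choice returns the cleaned HTML as a string, and keeping the cleaning step separate makes it possible to test it without network access.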
