Find the length of the text of the first given tag using BeautifulSoup

  • Last Updated : 17 Jun, 2021

In this article, we are going to find the length of the text of the first given tag using BeautifulSoup.

Let us look at a simple example. In the code below, soup = BeautifulSoup(html_doc, 'html.parser') parses the entire HTML document with Python's built-in html.parser. The expression soup.find('h2') accepts the name of any valid HTML tag and returns the first matching tag in the document, and .text returns all the text inside that tag. If the specified tag is not present, find() returns None, so accessing .text on the result raises an AttributeError.

Since we are calculating a length here, we use the len() function. len() returns the number of items in an object; for a string, it returns the number of characters in the string.
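Because find() returns None when the tag is missing, a small check avoids the AttributeError. The following is a minimal sketch. The helper name first_tag_text_length is our own and is not part of BeautifulSoup:

```python
from bs4 import BeautifulSoup


def first_tag_text_length(html, tag_name):
    """Return the length of the text of the first <tag_name> tag, or None if absent.

    Hypothetical helper for illustration; not part of the BeautifulSoup API.
    """
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find(tag_name)  # first matching tag, or None if there is no match
    if tag is None:
        return None
    return len(tag.text)


html = "<div><h2>Hello</h2><h2>Longer heading</h2></div>"
print(first_tag_text_length(html, "h2"))  # 5: only the first <h2> is measured
print(first_tag_text_length(html, "h3"))  # None: there is no <h3> in the document
```

Returning None instead of 0 keeps "tag missing" distinguishable from "tag present but empty".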

Example 1:

In this example, we get the text inside the first <h2> tag, so len() simply counts the characters in that string.



Python3




# import module
from bs4 import BeautifulSoup
 
# assign HTML document
html_doc = """
<html>
 
<head>
<meta http-equiv="Content-Type" content="text/html;
charset=iso-8859-1">
<title>An example of HTML page to find the length of
the first tag</title>
</head>
 
<body>
<h2>An example of HTML page to find the length of the
first tag</h2>
 
 
 
<p>
Beautiful Soup is a library which is essential to scrape
information from web pages.
It helps to iterate, search and modifying the parse tree.</p>
 
 
</body>
</html>
"""
 
# create a BeautifulSoup object
soup = BeautifulSoup(html_doc, 'html.parser')
 
# get length
print("Length of the text of the first <h2> tag:")
print(len(soup.find('h2').text))

Output:

Length of the text of the first <h2> tag:
59

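The soup.find('h2').text expression retrieves the text enclosed by the first <h2> tag, and len() returns the number of characters in that text. The count of 59 includes the newline that splits the heading across two lines in the HTML source.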
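Whitespace therefore affects the result. The sketch below uses a shortened, made-up heading to compare .text, which returns the text exactly as written, with get_text(strip=True), which strips leading and trailing whitespace from each string but keeps whitespace inside it:

```python
from bs4 import BeautifulSoup

# Made-up heading with surrounding spaces and an inner newline
html_doc = "<h2>  first\ntag  </h2>"
soup = BeautifulSoup(html_doc, "html.parser")
h2 = soup.find("h2")

raw_length = len(h2.text)                        # counts every space and the newline
stripped_length = len(h2.get_text(strip=True))  # outer spaces removed, newline kept

print(raw_length)       # 13
print(stripped_length)  # 9
```

Which count is "right" depends on whether you want the characters as stored in the HTML or the visible words.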

Example 2:

In this example, we get the length of the text of every tag present in the given HTML document.

Python3




# import module
from bs4 import BeautifulSoup
 
# assign html document
html_doc = """
<html>
 
<head>
<meta http-equiv="Content-Type" content="text/html;
charset=iso-8859-1">
<title>An example of HTML page to find the length of
the first tag</title>
</head>
 
<body>
<h2>An example of HTML page to find the length of the
first tag</h2>
 
 
<p>
Beautiful Soup is a library which is essential to scrape
information from web pages.
It helps to iterate, search and modifying the parse tree.</p>
 
 
</body>
</html>
"""
 
# create a BeautifulSoup object
soup = BeautifulSoup(html_doc, 'html.parser')
 
# Get all the tags present in the html and
# getting their length
for tag in soup.findAll(True):
    print(tag.name, " : ", len(soup.find(tag.name).text))

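Output:

[Screenshot: one line per tag (html, head, meta, title, body, h2, p), each showing the tag name and the length of its text]
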
Passing True to findAll() matches every tag, so soup.findAll(True) returns all the tags in the document in the order they appear. The for tag in soup.findAll(True): loop iterates over these tags, and print(tag.name, " : ", len(soup.find(tag.name).text)) prints each tag's name followed by the length of its text.
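Note that soup.find(tag.name) always looks up the first tag with that name. If a document contains, say, two <p> tags, both iterations report the length of the first one. This does not matter for the sample document, where every tag name occurs only once. To measure each tag individually, one option is to use the loop variable directly. The sketch below uses a made-up document and find_all(), the current name for findAll():

```python
from bs4 import BeautifulSoup

# Made-up document with two <p> tags of different lengths
html_doc = "<body><p>short</p><p>a longer paragraph</p></body>"
soup = BeautifulSoup(html_doc, "html.parser")

results = []
for tag in soup.find_all(True):
    first_same_name = len(soup.find(tag.name).text)  # always the first tag with this name
    this_tag = len(tag.text)                         # the tag currently being visited
    results.append((tag.name, first_same_name, this_tag))
    print(tag.name, first_same_name, this_tag)

# Printed lines:
# body 23 23
# p 5 5
# p 5 18
```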

If we only want the first tag, we can add a break statement after the print statement in the Example 2 loop:

Python3




# get length of first tag only
for tag in soup.findAll(True):
    print(tag.name, " : ", len(soup.find(tag.name).text))
    break

Output: 

html  :  270
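Instead of breaking out of the loop, we can also ask for the first tag directly. find(True) returns the first tag in the document in the same way that findAll(True) returns all of them. A minimal sketch with a made-up document:

```python
from bs4 import BeautifulSoup

html_doc = "<html><head><title>Hi</title></head><body><p>Text</p></body></html>"
soup = BeautifulSoup(html_doc, "html.parser")

first_tag = soup.find(True)  # first tag in document order, here <html>
print(first_tag.name, " : ", len(first_tag.text))  # html  :  6
```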

Example 3:

In this example, we will find the text length of a particular given tag from an HTML document. 

Python3




# import module
from bs4 import BeautifulSoup
 
# assign HTML document
html_doc = """
<html>
 
<head>
<meta http-equiv="Content-Type" content="text/html;
charset=iso-8859-1">
<title>An example of HTML page to find the length of
the first tag</title>
</head>
 
<body>
<h2>An example of HTML page to find the length of the
first tag</h2>
 
 
<p>
Beautiful Soup is a library which is essential to scrape
information from web pages.
It helps to iterate, search and modifying the parse tree.</p>
 
 
</body>
</html>
"""
 
# create a BeautifulSoup object
soup = BeautifulSoup(html_doc, 'html.parser')
 
# assign tag
tag = "html"
 
# get length
print("Length of the text of", tag, "tag is:",
      len(soup.find(tag).text))

Output:

Length of the text of html tag is: 5062

Example 4:

Now let us see how to find a tag and its text length on a live web page, such as a page from Monster. Since the page has to be downloaded first, we also import the requests module.

Python3




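# import modules
from bs4 import BeautifulSoup
import requests

# assign URL (placeholder: replace with the Monster page you want to inspect)
monsterPageURL = "https://www.example.com/"

# download the page; the timeout and status check guard against hangs and HTTP errors
monsterPage = requests.get(monsterPageURL, timeout=10)
monsterPage.raise_for_status()

# create Beautiful Soup object
soupResults = BeautifulSoup(monsterPage.content, 'html.parser')

# assign tag
tag = "title"

# get length of the text of the first <title> tag
print("Length of the text of", tag, "tag is:",
      len(soupResults.find(tag).text))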

Output:

Length of the text of title tag is: 57
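The result depends on whichever page is downloaded. It can therefore help to keep the parsing step separate from the network request, so that it can be checked against fixed HTML. In the sketch below, the function name title_text_length is our own. It accepts the raw bytes that monsterPage.content would hold:

```python
from bs4 import BeautifulSoup


def title_text_length(page_content):
    """Return the length of the <title> text, or None if the page has no title.

    Hypothetical helper: page_content may be bytes (e.g. response.content) or str.
    """
    soup = BeautifulSoup(page_content, "html.parser")
    title = soup.find("title")
    return None if title is None else len(title.text)


# Offline check with made-up HTML instead of a live page
sample = b"<html><head><title>Jobs near you</title></head><body></body></html>"
print(title_text_length(sample))  # 13
```

With the live request, the same function would be called as title_text_length(monsterPage.content).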
