Beautifulsoup – Kinds of objects

Last Updated : 25 Nov, 2022

In this article, we will discuss the different kinds of objects in Beautiful Soup. When a string or HTML document is passed to the BeautifulSoup constructor, the constructor parses it into a tree of Python objects.

The four main kinds of objects are:

BeautifulSoup
Tag
NavigableString
Comment
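To see all four kinds side by side, the sketch below parses a small made-up document and lists the class name of every node below the root. It assumes bs4 with the built-in "html.parser". Other string types (such as Doctype or CData) can also appear in real documents, but this markup produces only the four described here.

```python
from bs4 import BeautifulSoup

# Hypothetical markup containing a tag, text and an HTML comment
markup = "<p>Hello <b>Geeks</b><!-- note --></p>"
soup = BeautifulSoup(markup, "html.parser")

# The object returned by the constructor is the BeautifulSoup object
print(type(soup).__name__)  # BeautifulSoup

# Walk the tree depth-first and record the class of each node
kinds = [type(node).__name__ for node in soup.descendants]
print(kinds)  # ['Tag', 'NavigableString', 'Tag', 'NavigableString', 'Comment']
```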

1. BeautifulSoup Object: The BeautifulSoup object represents the parsed document as a whole, that is, the complete document we are trying to scrape. For most purposes, you can treat it as a Tag object.

# importing the module
from bs4 import BeautifulSoup
 
# parsing the document
soup = BeautifulSoup('''<h1>Geeks for Geeks</h1>''',
                     "html.parser")
 
print(type(soup))

Output:

<class 'bs4.BeautifulSoup'>
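The claim that the BeautifulSoup object can be treated like a Tag can be checked directly: the class subclasses Tag, and the usual tag navigation and search methods work on it. A minimal sketch, assuming bs4 with "html.parser"; the root's name is the special value "[document]" rather than a real tag name.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

soup = BeautifulSoup("<h1>Geeks for Geeks</h1>", "html.parser")

# BeautifulSoup is a subclass of Tag
print(isinstance(soup, Tag))  # True

# The root object has a special name instead of a real tag name
print(soup.name)  # [document]

# Tag-style navigation and searching work on the soup itself
print(soup.h1)                     # <h1>Geeks for Geeks</h1>
print(soup.find("h1").get_text())  # Geeks for Geeks
```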

2. Tag Object: A Tag object corresponds to an XML or HTML tag in the original document, and it is usually how you extract a particular element from the whole document. Accessing a tag by name, as in soup.b, returns only the first matching tag if the document contains several tags with that name. Note that Beautiful Soup is not an HTTP client: to scrape a live website, you first download the page (for example, with the requests module) and then pass its HTML to Beautiful Soup.

# Import Beautiful Soup
from bs4 import BeautifulSoup
   
# Initialize the object with an HTML page
soup = BeautifulSoup('''
    <html>
        <b>Geeks for Geeks</b>
    </html>
    ''', "html.parser")
   
# Get the tag
tag = soup.b
   
# Print the output
print(type(tag))

Output:

<class 'bs4.element.Tag'>
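The "first match only" behaviour of attribute-style access is easy to miss. The sketch below uses hypothetical markup with two <b> tags and contrasts soup.b (equivalent to soup.find("b")) with find_all(), which returns every match. It assumes bs4 with "html.parser".

```python
from bs4 import BeautifulSoup

html = """
<html>
    <b>First</b>
    <b>Second</b>
</html>
"""
soup = BeautifulSoup(html, "html.parser")

# Attribute-style access returns only the first matching tag
first = soup.b
print(first)  # <b>First</b>

# find() returns the very same Tag object
print(soup.find("b") is first)  # True

# find_all() returns every matching tag
all_b = soup.find_all("b")
print([tag.get_text() for tag in all_b])  # ['First', 'Second']
```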

A tag has many methods and attributes. Two of its most important features are its name and its attributes:

Name
Attributes

# Name:

The name of a tag can be accessed through its .name attribute.

Syntax: tag.name

Return: the name of the tag as a string, for example 'b' for a <b> tag.

The name of a tag can also be changed by assigning a new string to .name.

# Import Beautiful Soup
from bs4 import BeautifulSoup
   
# Initialize the object with an HTML page
soup = BeautifulSoup('''
    <html>
        <b>Geeks for Geeks</b>
    </html>
    ''', "html.parser")
   
# Get the tag
tag = soup.b
   
# Print the output
print(tag.name)
 
# changing the tag
tag.name = "Strong"
print(tag)

Output:

b
<Strong>Geeks for Geeks</Strong>
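Renaming a tag changes more than its printed form: later searches match the new name, not the old one. A minimal sketch, assuming bs4 with "html.parser" and a lowercase new name "strong" chosen for illustration.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<html><b>Geeks for Geeks</b></html>", "html.parser")
tag = soup.b

# .name is an ordinary Python string
print(type(tag.name))  # <class 'str'>

# After renaming, the tag is found under its new name only
tag.name = "strong"
print(soup.find("b"))       # None
print(soup.find("strong"))  # <strong>Geeks for Geeks</strong>
```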

# Attributes:

Example 1: An attribute is a name-value pair written inside a tag's opening tag, such as class="gfg" in <b class="gfg">. A Tag can have any number of attributes, and they are accessed like dictionary entries, using the attribute name as the key (for example, tag["class"]). Attributes can be modified, added, or deleted in the same way.

# Import Beautiful Soup
from bs4 import BeautifulSoup
   
# Initialize the object with an HTML page
soup = BeautifulSoup('''
    <html>
        <b class="gfg">Geeks for Geeks</b>
    </html>
    ''', "html.parser")
   
# Get the tag
tag = soup.b
 
print(tag["class"])
 
# modifying class
tag["class"] = "geeks"
print(tag)
 
# delete the class attributes
del tag["class"]
print(tag)

Output:

['gfg']
<b class="geeks">Geeks for Geeks</b>
<b>Geeks for Geeks</b>
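Because attributes behave like a dictionary, the usual dictionary idioms apply. The sketch below, assuming bs4 with "html.parser" and made-up attribute values, shows the full .attrs dictionary, the KeyError raised for a missing key, the safer get() method, and adding a new attribute.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<b class="gfg" id="title">Geeks for Geeks</b>', "html.parser")
tag = soup.b

# All attributes at once, as a dictionary
print(tag.attrs)

# Square-bracket access raises KeyError for a missing attribute
try:
    tag["href"]
except KeyError:
    print("no href attribute")

# get() returns None, or a supplied default, instead of raising
print(tag.get("href"))       # None
print(tag.get("href", "#"))  # #

# Adding an attribute works like assigning to a dictionary key
tag["title"] = "GfG"
print(tag)  # <b class="gfg" id="title" title="GfG">Geeks for Geeks</b>
```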

Example 2: Some attributes, such as class, can hold several space-separated values. Beautiful Soup returns the value of such a multi-valued attribute as a list.

# Import Beautiful Soup
from bs4 import BeautifulSoup
 
# Initialize the object with an HTML page
# soup for multi_valued attributes
soup = BeautifulSoup('''
    <html>
        <b class="gfg geeks">Geeks for Geeks</b>
    </html>
    ''', "html.parser")
 
# Get the tag
tag = soup.b
 
print(tag["class"])

Output:

['gfg', 'geeks']
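Only attributes that HTML defines as multi-valued are split into lists; others keep their whole value as one string even if it contains spaces. A minimal sketch, assuming bs4 with "html.parser" and a made-up title attribute, comparing the two and using get_attribute_list() to get a list either way.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<b class="gfg geeks" title="GfG heading">Geeks for Geeks</b>', "html.parser"
)
tag = soup.b

# class is multi-valued, so it comes back as a list
print(tag["class"])  # ['gfg', 'geeks']

# title is not multi-valued, so the value stays a single string
print(tag["title"])  # GfG heading

# Join the list back into its original space-separated form if needed
print(" ".join(tag["class"]))  # gfg geeks

# get_attribute_list() always returns a list
print(tag.get_attribute_list("title"))  # ['GfG heading']
```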

3. NavigableString Object: A string corresponds to a bit of text within a tag. Beautiful Soup uses the NavigableString class to contain these bits of text.

Syntax: <tag> String here </tag>

# Import Beautiful Soup
from bs4 import BeautifulSoup
   
# Initialize the object with an HTML page
soup = BeautifulSoup('''
    <html>
        <b>Geeks for Geeks</b>
    </html>
    ''', "html.parser")
   
# Get the tag
tag = soup.b
 
# Get the string inside the tag
string = tag.string
   
# Print the output
print(type(string))

Output:

<class 'bs4.element.NavigableString'>
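A NavigableString is a subclass of Python's str, so ordinary string methods work on it. The .string shortcut has a limit, though: it returns None when a tag has more than one child. A minimal sketch, assuming bs4 with "html.parser" and hypothetical markup, showing both points and get_text() as the alternative for mixed content.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Hello <b>Geeks</b></p>", "html.parser")

text = soup.b.string

# NavigableString subclasses str, so string methods work
print(isinstance(text, str))  # True
print(text.upper())           # GEEKS

# str() produces a plain Python string
plain = str(text)
print(type(plain))  # <class 'str'>

# <p> has two children ("Hello " and <b>), so .string is None
print(soup.p.string)  # None

# get_text() concatenates all the text inside the tag instead
print(soup.p.get_text())  # Hello Geeks
```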

4. Comment Object: The Comment object is a special type of NavigableString that represents an HTML comment, such as <!-- COMMENT -->, in the document.

# Import Beautiful Soup
from bs4 import BeautifulSoup
   
# Create the document
markup = "<b><!-- COMMENT --></b>"
   
# Initialize the object with the document
soup = BeautifulSoup(markup, "html.parser")
   
# Get the whole comment inside b tag
comment = soup.b.string
   
# Print the type of the comment
print(type(comment))

Output:

<class 'bs4.element.Comment'>
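Since comments are usually not visible content, a common task is to find and remove them. The sketch below, assuming bs4 with "html.parser" and hypothetical markup, filters strings by type with find_all(string=...) and detaches each comment with extract().

```python
from bs4 import BeautifulSoup, Comment, NavigableString

markup = "<p>Visible text<!-- hidden note --></p>"
soup = BeautifulSoup(markup, "html.parser")

# <p> has two children: a NavigableString and a Comment
text, comment = soup.p.contents
print(type(text).__name__, type(comment).__name__)  # NavigableString Comment

# Comment subclasses NavigableString
print(isinstance(comment, NavigableString))  # True

# Collect every comment by filtering strings on their type
comments = soup.find_all(string=lambda s: isinstance(s, Comment))
print(len(comments))  # 1

# Remove the comments from the tree, keeping the visible text
for c in comments:
    c.extract()
print(soup)  # <p>Visible text</p>
```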
