
BeautifulSoup – Modifying the tree

Last Updated : 07 Sep, 2021

Prerequisites: BeautifulSoup

BeautifulSoup is a Python library used for web scraping, and it can also be used to modify HTML documents. This article shows how BeautifulSoup can be used to search a parse tree and then modify it: you can rename a tag, change the values of its attributes, and add or delete attributes, as well as insert, wrap, replace, and remove elements.

Modifying the name of the tag and its attributes

You can change the name of a tag and modify its attributes by adding, changing, or deleting them.

  • To change a tag's name:

Syntax: tag.name = "new_tag"

  • To modify an existing attribute or add a new one:

Syntax: tag["attribute"] = "value"

  • To delete an attribute:

Syntax: del tag["attribute"]
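The same three operations can be applied to many tags at once by combining them with find_all(). The sketch below uses a made-up fragment; turning <b> into <strong> and stripping inline style attributes is only an illustration, not something this article prescribes.

```python
from bs4 import BeautifulSoup

# Hypothetical input: presentational <b> tags and an inline style attribute
markup = '<p style="color:red">Hello <b>world</b> and <b>you</b></p>'
soup = BeautifulSoup(markup, "html.parser")

# Rename every <b> tag; only the name changes, the text inside is kept
for tag in soup.find_all("b"):
    tag.name = "strong"

# style=True matches any tag that has a style attribute, whatever its value
for tag in soup.find_all(style=True):
    del tag["style"]

# Add a new attribute to the paragraph
soup.p["data-checked"] = "yes"

print(soup)
# <p data-checked="yes">Hello <strong>world</strong> and <strong>you</strong></p>
```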

A tree can also be modified by inserting new elements at the required places.

  • insert() inserts a new element into a tag's children at the given index position.

Syntax: tag.insert(position, new_element)

  • insert_after() inserts a new element immediately after an existing element in the parse tree.

Syntax: element.insert_after(new_element)

  • insert_before() inserts a new element immediately before an existing element in the parse tree.

Syntax: element.insert_before(new_element)
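The difference between the three methods is easiest to see when building a list. In this sketch, insert() is called on the parent with an index, while insert_after() and insert_before() are called on an element that is already in the tree and add the new element as its sibling. The make_li() helper is hypothetical and exists only to keep the example short.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<ul><li>two</li></ul>", "html.parser")
ul = soup.ul

def make_li(text):
    """Hypothetical helper: create a detached <li> tag holding the given text."""
    li = soup.new_tag("li")
    li.string = text
    return li

# insert(position, element): position is an index into ul.contents
first = make_li("one")
ul.insert(0, first)

# insert_after()/insert_before() are called on an element already in the tree
# and place the new element next to it as a sibling
ul.find_all("li")[-1].insert_after(make_li("three"))
first.insert_before(make_li("zero"))

print(ul)
# <ul><li>zero</li><li>one</li><li>two</li><li>three</li></ul>
```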

Approach:

  • Import the module
  • Scrape the HTML from a webpage (the examples below use a hard-coded string instead)
  • Parse the HTML string with BeautifulSoup
  • Select the tag within which the modification has to be performed
  • Make the required changes
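The steps above can be wrapped in a small function. This is a sketch under assumptions: retitle() is a hypothetical helper, and in a real scraper the html argument would typically come from something like requests.get(url).text (requests is a third-party library and is not used here), so a literal string stands in to keep the sketch runnable offline.

```python
from bs4 import BeautifulSoup

def retitle(html, new_title):
    """Hypothetical helper: parse html, change its <title> text, return the markup."""
    # Parse the HTML string into a tree
    soup = BeautifulSoup(html, "html.parser")
    # Select the tag to modify; soup.title is None when there is no <title>
    title = soup.title
    if title is not None:
        # Make the required change
        title.string = new_title
    # Serialise the modified tree back to a string
    return str(soup)

page = "<html><head><title>Old</title></head><body><p>Hi</p></body></html>"
print(retitle(page, "New"))
# <html><head><title>New</title></head><body><p>Hi</p></body></html>
```

Returning str(soup) keeps the function independent of BeautifulSoup for its callers; a document without a <title> is returned unchanged.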

Example 1: 

Python3




# importing module
from bs4 import BeautifulSoup
 
markup = """<p class="para">gfg</p>"""
 
# parsing string to HTML
soup = BeautifulSoup(markup, 'html.parser')
 
# extracting a tag
tag = soup.p
 
print("Before modifying the tag name: ")
print(tag)
print()
 
# modifying tag name
tag.name = "div"
 
print("After modifying the tag name: ")
print(tag)
print()
# modifying its class attribute
tag['class'] = "div_class"
 
# adding new attribute
tag['id'] = "div_id"
 
print("After modifying and adding attributes: ")
print(tag)
print()
 
# to delete any attributes
del tag["class"]
 
print("After deleting class attribute: ")
print(tag)
print()
 
# modifying the tag's content
tag.string = "Geeks"
 
print("After modifying tag string: ")
print(tag)
print()
 
# using insert function.
tag = soup.div
print("Before inserting: ")
print(tag)
print()
 
# inserting content
tag.insert(1, " for Geeks")
print("After inserting: ")
print(tag)
print()


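Output:

Before modifying the tag name:
<p class="para">gfg</p>

After modifying the tag name:
<div class="para">gfg</div>

After modifying and adding attributes:
<div class="div_class" id="div_id">gfg</div>

After deleting class attribute:
<div id="div_id">gfg</div>

After modifying tag string:
<div id="div_id">Geeks</div>

Before inserting:
<div id="div_id">Geeks</div>

After inserting:
<div id="div_id">Geeks for Geeks</div>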
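One detail of Example 1 is worth checking separately: assigning to tag.string replaces all of a tag's children, not just its text, so nested tags are discarded as well. A quick check with a made-up fragment:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Hello <b>bold</b> world</p>", "html.parser")

# Assigning .string clears every child of <p>, including the <b> tag
soup.p.string = "plain text"
print(soup)             # <p>plain text</p>
print(soup.find("b"))   # None
```

To change only part of the text while keeping nested tags, target the specific string or tag instead, for example with replace_with().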

Example 2:

Python3




# importing module
from bs4 import BeautifulSoup
 
soup = BeautifulSoup("<b>| A Computer Science portal</b>", 'html.parser')
 
tag = soup.new_tag("p")
tag.string = "Geeks"
 
 
# insert before
soup.b.string.insert_before(tag)
print(soup.b)
print()
 
# insert after
soup.b.p.insert_after(soup.new_string(" for Geeks"))
print(soup.b)


Output:

<b><p>Geeks</p>| A Computer Science portal</b>

<b><p>Geeks</p> for Geeks| A Computer Science portal</b>

Adding a new tag and wrapping elements

The tree can be modified by creating a new tag and adding it at the required location. An element can also be wrapped in a new tag, and a wrapping tag can be removed again while keeping its contents.

  • new_tag() creates a new tag; it is not part of the tree until you insert, append, or wrap with it.

Syntax: soup.new_tag("tag_name")

  • wrap() encloses an element in the tag you specify and returns that wrapper tag.

Syntax: element.wrap(wrapper_tag)

  • unwrap() removes a tag from the tree and puts its contents in its place.

Syntax: tag.unwrap()
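The sketch below shows the three methods together on a made-up fragment: new_tag() also accepts attributes as keyword arguments, wrap() returns the wrapper it was given, and unwrap() undoes the wrapping. The example.com URL is only a placeholder.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>example.com</p>", "html.parser")

# new_tag(name, **attrs) only creates the tag; it has no parent yet
link = soup.new_tag("a", href="https://example.com")
print(link.parent)          # None

# wrap() puts the text inside the new tag and returns that wrapper tag
wrapper = soup.p.string.wrap(link)
print(wrapper is link)      # True
print(soup)                 # <p><a href="https://example.com">example.com</a></p>

# unwrap() replaces the <a> tag with its own contents
soup.a.unwrap()
print(soup)                 # <p>example.com</p>
```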

Example:

Python3




# importing module
from bs4 import BeautifulSoup
 
markup = '<p>Geeks for Geeks</p>'
 
# parsing string to HTML
soup = BeautifulSoup(markup, 'html.parser')
print(soup)
 
# wrapping around the string
soup.p.string.wrap(soup.new_tag("i"))
print(soup)
 
# wrapping around the tag
soup.p.wrap(soup.new_tag("div"))
print(soup)
 
# unwrapping the i tag
 
soup.p.i.unwrap()
 
print(soup)
 
old_tag = soup.div
 
# new tag
new_tag = soup.new_tag('div')
new_tag.string = "| A Computer Science portal for geeks"
 
# adding new tag
old_tag.append(new_tag)
 
print(soup)


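Output:

<p>Geeks for Geeks</p>
<p><i>Geeks for Geeks</i></p>
<div><p><i>Geeks for Geeks</i></p></div>
<div><p>Geeks for Geeks</p></div>
<div><p>Geeks for Geeks</p><div>| A Computer Science portal for geeks</div></div>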

Replacing an element

replace_with() removes a tag or string from the parse tree and puts a new tag or string in its place.

Syntax: old_element.replace_with(new_element)
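replace_with() also accepts a plain string and returns the element it removed. As a sketch, assuming the goal is to hide e-mail links in a made-up fragment, a function passed as the href filter selects only mailto: links; is_mailto() is a hypothetical helper.

```python
from bs4 import BeautifulSoup

markup = ('<p>Mail <a href="mailto:a@example.com">a@example.com</a> '
          'or see <a href="/help">help</a>.</p>')
soup = BeautifulSoup(markup, "html.parser")

def is_mailto(href):
    """Hypothetical filter: called with each tag's href value (None if absent)."""
    return href is not None and href.startswith("mailto:")

for a in soup.find_all("a", href=is_mailto):
    # replace_with() puts the string in the tag's place and returns the old tag
    removed = a.replace_with("[email hidden]")
    print("removed:", removed)

print(soup)
# <p>Mail [email hidden] or see <a href="/help">help</a>.</p>
```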

Example:

Python3




# importing BeautifulSoup Module
from bs4 import BeautifulSoup
 
markup = '<a href="http://gfg.com/">Geeks for Geeks <i>gfg.com</i></a>'
 
# parsing string to HTML
soup = BeautifulSoup(markup, 'html.parser')
 
# tag to be replaced
old_tag = soup.a
 
# new tag
new_tag = soup.new_tag("p")
 
# input string
new_tag.string = "gfg.in"
 
# replace_with() removes a tag or string from the tree and
# replaces it with the tag or string of your choice
 
old_tag.i.replace_with(new_tag)
 
print(old_tag)


Output:

<a href="http://gfg.com/">Geeks for Geeks <p>gfg.in</p></a>

Adding new content to an existing tag

New content can be added to the end of an existing tag with the append() method, passing either a plain string or a NavigableString created with the NavigableString() constructor.

Syntax: tag.append("content")
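append() is not limited to strings: it also accepts tags, which makes it handy for building markup from data. A minimal sketch, where the list of items is hypothetical data:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<ul></ul>", "html.parser")
items = ["HTML", "CSS", "Python"]   # hypothetical data to render

for text in items:
    li = soup.new_tag("li")
    li.append(text)       # append a plain string to the new tag
    soup.ul.append(li)    # append the new tag as the last child of <ul>

print(soup)
# <ul><li>HTML</li><li>CSS</li><li>Python</li></ul>
```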

Example:

Python3




# importing module
from bs4 import BeautifulSoup
from bs4 import NavigableString
 
markup = """<a href="https://www.geeksforgeeks.org/">Geeks for Geeks</a>"""
 
# parsing string to HTML
soup = BeautifulSoup(markup, 'html.parser')
 
# extracting a tag
tag = soup.a
 
# appending content
tag.append("| A Computer Science portal")
print(tag)
 
# appending content using the NavigableString constructor
new_str = NavigableString(" for geeks")
tag.append(new_str)
print(tag)


Output:

<a href="https://www.geeksforgeeks.org/">Geeks for Geeks| A Computer Science portal</a>

<a href="https://www.geeksforgeeks.org/">Geeks for Geeks| A Computer Science portal for geeks</a>

Removing content and elements

A tree can also be modified by removing content from a tag or by removing whole elements.

  • clear() removes all the contents of a tag but keeps the tag itself.

Syntax: tag.clear()

  • extract() removes a tag or string from the tree and returns it, so it can be reused.

Syntax: element.extract()

  • decompose() removes a tag from the tree and destroys it together with all of its contents.

Syntax: tag.decompose()
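A practical way to choose between extract() and decompose(): decompose() suits elements you will never need again, while extract() hands the element back so it can be moved. The sketch below assumes a made-up fragment where scripts are dropped and an <aside> is moved to the front of its parent.

```python
from bs4 import BeautifulSoup

markup = "<div><script>track()</script><p>Keep me</p><aside>Note</aside></div>"
soup = BeautifulSoup(markup, "html.parser")

# decompose(): destroy elements that are never needed again
for script in soup.find_all("script"):
    script.decompose()

# extract(): detach an element but keep it, here to move it elsewhere
note = soup.aside.extract()
soup.div.insert(0, note)

print(soup)
# <div><aside>Note</aside><p>Keep me</p></div>
```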

Example:

Python3




# importing module
from bs4 import BeautifulSoup
 
markup = '<a href="https://www.geeksforgeeks.org/">Geeks for Geeks <i>| A Computer Science portal</i></a>'
 
# parsing string to HTML
soup = BeautifulSoup(markup, 'html.parser')
 
tag = soup.a
print(tag)
print()
 
# clearing its all content
tag.clear()
print(tag)
print()
 
# extracting the i tag
# re-parsing the markup, because clear() emptied the first tree
soup2 = BeautifulSoup(markup, 'html.parser')
 
a_tag = soup2.a
 
print(a_tag)
print()
i_tag = soup2.i.extract()
 
print(a_tag)
print()
 
# decomposing the i tag
# re-parsing the markup to start again from the original tree
soup2 = BeautifulSoup(markup, 'html.parser')
 
a_tag = soup2.a
 
print(a_tag)
print()
soup2.i.decompose()  # decompose() returns None, so there is nothing to keep
 
print(a_tag)


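Output:

<a href="https://www.geeksforgeeks.org/">Geeks for Geeks <i>| A Computer Science portal</i></a>

<a href="https://www.geeksforgeeks.org/"></a>

<a href="https://www.geeksforgeeks.org/">Geeks for Geeks <i>| A Computer Science portal</i></a>

<a href="https://www.geeksforgeeks.org/">Geeks for Geeks </a>

<a href="https://www.geeksforgeeks.org/">Geeks for Geeks <i>| A Computer Science portal</i></a>

<a href="https://www.geeksforgeeks.org/">Geeks for Geeks </a>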


