
BeautifulSoup – Modifying the tree

Prerequisites: BeautifulSoup

BeautifulSoup is a Python library used for web scraping. Besides searching the parse tree of an HTML document, it can also modify that tree. This article shows how to rename a tag, change the values of its attributes, and add or delete attributes.



Modifying the name of the tag and its attributes

You can change the name of a tag, and modify its attributes by setting or deleting them.

Syntax: tag.name = "new_tag"

Syntax: tag["attribute"] = "value"

Syntax: del tag["attribute"]
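As a complement to the syntax above, here is a minimal sketch of the same operations on a tag whose class attribute holds more than one value. The markup and the attribute names ("intro", "box", "title") are made up for illustration.

```python
from bs4 import BeautifulSoup

# Illustrative markup, not taken from the article.
soup = BeautifulSoup('<p class="intro lead" id="first">gfg</p>', "html.parser")
tag = soup.p

# class is a multi-valued attribute, so it is exposed as a list.
print(tag["class"])

# Rename the tag and assign a list to the multi-valued attribute.
tag.name = "div"
tag["class"] = ["box", "wide"]

# get() returns None for a missing attribute instead of raising KeyError.
missing = tag.get("title")

# Delete an attribute only when it is present.
if tag.has_attr("id"):
    del tag["id"]

result = str(tag)
print(result)
```

Checking with has_attr() before del avoids a KeyError when the attribute might be absent.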

A tree can also be modified by inserting new elements at required places.

Syntax: tag.insert(position, element)

Syntax: tag.insert_after(element)

Syntax: tag.insert_before(element)
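The three insertion methods differ mainly in how the target position is chosen. The sketch below, using made-up markup and a hypothetical helper make_b(), builds an ordered sequence of tags to show where each call places the new element.

```python
from bs4 import BeautifulSoup

# Illustrative markup, not taken from the article.
soup = BeautifulSoup("<p><b>one</b><b>three</b></p>", "html.parser")
p = soup.p


def make_b(text):
    """Hypothetical helper: build a <b> tag holding the given text."""
    tag = soup.new_tag("b")
    tag.string = text
    return tag


# insert(position, element): position is an index into p.contents.
p.insert(1, make_b("two"))

# insert_before() adds a sibling just before the tag it is called on.
p.b.insert_before(make_b("zero"))  # p.b is the first <b>, "one"

# insert_after() adds a sibling just after the tag it is called on.
p.find_all("b")[-1].insert_after(make_b("four"))

order = [b.get_text() for b in p.find_all("b")]
print(order)
```

insert() is called on the parent and takes an index, while insert_before() and insert_after() are called on an existing child and need no index.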

Approach :

Example 1: 




# importing module
from bs4 import BeautifulSoup
 
markup = """<p class="para">gfg</p>"""

# parsing the string as HTML
soup = BeautifulSoup(markup, 'html.parser')
 
# extracting a tag
tag = soup.p
 
print("Before modifying the tag name: ")
print(tag)
print()
 
# modifying tag name
tag.name = "div"
 
print("After modifying the tag name: ")
print(tag)
print()
# modifying its class attribute
tag['class'] = "div_class"
 
# adding new attribute
tag['id'] = "div_id"
 
print("After modifying and adding attributes: ")
print(tag)
print()
 
# to delete any attributes
del tag["class"]
 
print("After deleting class attribute: ")
print(tag)
print()
 
# modifying the tag's content
tag.string = "Geeks"
 
print("After modifying tag string: ")
print(tag)
print()
 
# using insert function.
tag = soup.div
print("Before inserting: ")
print(tag)
print()
 
# inserting content
tag.insert(1, " for Geeks")
print("After inserting: ")
print(tag)
print()

Output:

Example 2:




# importing module
from bs4 import BeautifulSoup
 
soup = BeautifulSoup("<b>| A Computer Science portal</b>", 'html.parser')
 
tag = soup.new_tag("p")
tag.string = "Geeks"
 
 
# insert before
soup.b.string.insert_before(tag)
print(soup.b)
print()
 
# insert after
soup.b.p.insert_after(soup.new_string(" for Geeks"))
print(soup.b)

Output:

Adding new tag and wrapping element 

The tree can be modified by creating a new tag and adding it at any required location. An existing element can also be wrapped in a new tag, or have its enclosing tag removed.

Syntax: soup.new_tag("tag_name")

Syntax: element.wrap(wrapper_tag)

Syntax: tag.unwrap()
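A short sketch of how these calls fit together, assuming the same kind of one-paragraph markup used in this article. The id value "container" is made up for illustration; new_tag() accepts attributes as keyword arguments.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Geeks for Geeks</p>", "html.parser")

# new_tag() can set attributes through keyword arguments.
wrapper = soup.new_tag("div", id="container")

# wrap() puts the paragraph inside the wrapper and returns the wrapper.
returned = soup.p.wrap(wrapper)
after_wrap = str(soup)
print(after_wrap)

# unwrap() removes the tag itself but leaves its children in place.
soup.div.unwrap()
after_unwrap = str(soup)
print(after_unwrap)
```

wrap() and unwrap() are inverses here: unwrapping the div restores the original markup.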

Example:




# importing module
from bs4 import BeautifulSoup
 
markup = '<p>Geeks for Geeks</p>'

# parsing the string as HTML
soup = BeautifulSoup(markup, 'html.parser')
print(soup)
 
# wrapping around the string
soup.p.string.wrap(soup.new_tag("i"))
print(soup)
 
# wrapping around the tag
soup.p.wrap(soup.new_tag("div"))
print(soup)
 
# unwrapping the i tag
 
soup.p.i.unwrap()
 
print(soup)
 
old_tag = soup.div
 
# new tag
new_tag = soup.new_tag('div')
new_tag.string = "| A Computer Science portal for geeks"
 
# adding new tag
old_tag.append(new_tag)
 
print(soup)

Output:

Replacing element

The replace_with() function replaces an existing tag or string in the parse tree with a new tag or string.

Syntax: replace_with()
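replace_with() also hands back the element it removed, which can be useful if that element is needed elsewhere. A minimal sketch, replacing a tag with a plain string created by new_string():

```python
from bs4 import BeautifulSoup

markup = '<a href="http://gfg.com/">Geeks for Geeks <i>gfg.com</i></a>'
soup = BeautifulSoup(markup, "html.parser")

# Replace the <i> tag with a plain string; the removed tag is returned.
removed = soup.a.i.replace_with(soup.new_string("gfg.in"))

result = str(soup.a)
print(result)

# The removed tag is detached from the tree but still usable.
print(removed, removed.parent)
```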

Example:




# importing BeautifulSoup Module
from bs4 import BeautifulSoup
 
markup = '<a href="http://gfg.com/">Geeks for Geeks <i>gfg.com</i></a>'
 
# parsing the string as HTML
soup = BeautifulSoup(markup, 'html.parser')
 
# tag to be replaced
old_tag = soup.a
 
# new tag
new_tag = soup.new_tag("p")
 
# input string
new_tag.string = "gfg.in"
 
# replacing a tag: page_element.replace_with(new_element)
# removes a tag or string from the tree and replaces
# it with the tag or string of your choice.
 
old_tag.i.replace_with(new_tag)
 
print(old_tag)

Output:

<a href="http://gfg.com/">Geeks for Geeks <p>gfg.in</p></a>

Adding new content to an existing tag

New content can be added to an existing tag with the append() function. The content can be a plain string or a string built with the NavigableString() constructor.

Syntax: tag.append("content")
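append() is not limited to strings; it also accepts a tag. The sketch below appends one of each and inspects the result. The "new" badge tag is made up for illustration.

```python
from bs4 import BeautifulSoup

markup = '<a href="https://www.geeksforgeeks.org/">Geeks for Geeks</a>'
soup = BeautifulSoup(markup, "html.parser")
tag = soup.a

# Appending a string adds a second, separate child after the existing text.
tag.append(" | A Computer Science portal")
child_count = len(tag.contents)
text = tag.get_text()

# Appending a tag works the same way.
badge = soup.new_tag("b")
badge.string = "new"
tag.append(badge)

result = str(tag)
print(result)
```

Because the appended string stays a separate child, get_text() is a convenient way to read the combined text.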

Example:




# importing module
from bs4 import BeautifulSoup
from bs4 import NavigableString
 
markup = """<a href="https://www.geeksforgeeks.org/">Geeks for Geeks</a>"""
 
# parsing the string as HTML
soup = BeautifulSoup(markup, 'html.parser')
 
# extracting a tag
tag = soup.a
 
# appending content
tag.append("| A Computer Science portal")
print(tag)
 
# appending content using navigableString constructor
new_str = NavigableString(" for geeks")
tag.append(new_str)
print(tag)

Output:

<a href="https://www.geeksforgeeks.org/">Geeks for Geeks| A Computer Science portal</a>

<a href="https://www.geeksforgeeks.org/">Geeks for Geeks| A Computer Science portal for geeks</a>

Removing content and element

A tree can also be modified by removing the content of a tag or by removing an element entirely.

Syntax: clear()

Syntax: extract()

Syntax: decompose()
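The three methods differ in what they remove and what they return. A compact side-by-side sketch, parsing the same markup three times so that each call starts from an unmodified tree (the markup is a shortened variant of the one used in this article):

```python
from bs4 import BeautifulSoup

markup = '<a href="https://www.geeksforgeeks.org/">Geeks for Geeks <i>portal</i></a>'

# clear(): keeps the tag but removes everything inside it.
soup_clear = BeautifulSoup(markup, "html.parser")
soup_clear.a.clear()
cleared = str(soup_clear.a)

# extract(): removes the element from the tree and returns it.
soup_extract = BeautifulSoup(markup, "html.parser")
i_tag = soup_extract.i.extract()
after_extract = str(soup_extract.a)

# decompose(): removes the element and destroys it; returns None.
soup_decompose = BeautifulSoup(markup, "html.parser")
returned = soup_decompose.i.decompose()
after_decompose = str(soup_decompose.a)

print(cleared)
print(after_extract, "| extracted:", i_tag)
print(after_decompose, "| returned:", returned)
```

Use extract() when the removed element is still needed, and decompose() when it can be discarded.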

Example:




# importing module
from bs4 import BeautifulSoup
 
markup = '<a href="https://www.geeksforgeeks.org/">Geeks for Geeks <i>| A Computer Science portal</i></a>'
 
# parsing the string as HTML
soup = BeautifulSoup(markup, 'html.parser')
 
tag = soup.a
print(tag)
print()
 
# clearing all of its content
tag.clear()
print(tag)
print()
 
# extracting the i tag
# re-parsing the markup, since clear() emptied the first tree
soup2 = BeautifulSoup(markup, 'html.parser')
 
a_tag = soup2.a
 
print(a_tag)
print()
i_tag = soup2.i.extract()
 
print(a_tag)
print()
 
# decomposing the i tag
# re-parsing the markup so the i tag is present again
soup2 = BeautifulSoup(markup, 'html.parser')

a_tag = soup2.a

print(a_tag)
print()
# decompose() destroys the tag in place and returns None
soup2.i.decompose()
 
print(a_tag)

Output:

