BeautifulSoup – Modifying the tree

Prerequisites: BeautifulSoup

BeautifulSoup is a Python library used for web scraping. It can also be used to modify HTML documents: once a page is parsed, you can search the parse tree and change it in place. This article shows how to rename a tag, change the values of its attributes, add or delete attributes, and insert, wrap, replace, or remove elements.

Modifying the name of the tag and its attributes

You can rename a tag and change its attributes by adding, modifying, or deleting them.

  • To change the tag name:

Syntax: tag.name = "new_tag"

  • To modify an existing attribute or add a new one:

Syntax: tag["attribute"] = "value"

  • To delete an attribute:

Syntax: del tag["attribute"]
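The three operations above can be sketched together as follows. This is a minimal illustration rather than a prescribed workflow; the attribute values "note" and "intro" are made up for the example.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p class="para">gfg</p>', "html.parser")
tag = soup.p

tag.name = "div"        # rename <p> to <div>
tag["class"] = "note"   # overwrite an existing attribute (illustrative value)
tag["id"] = "intro"     # add a new attribute (illustrative value)
del tag["class"]        # delete an attribute

result = str(tag)
print(result)

# tag.get() returns None for a missing attribute instead of raising KeyError
missing = tag.get("class")
print(missing)

# the tag was renamed in place, so the tree no longer contains a <p>
print(soup.p)
```

Reading an attribute with tag.get() is a convenient way to check whether a deletion took effect without handling a KeyError.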

A tree can also be modified by inserting new elements at the required places.

  • insert() inserts a new element at a given position among the tag's children.

Syntax: tag.insert(position, new_element)

  • insert_after() inserts an element immediately after a tag or string in the parse tree.

Syntax: element.insert_after(new_element)

  • insert_before() inserts an element immediately before a tag or string in the parse tree.

Syntax: element.insert_before(new_element)
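As a rough sketch of how the three insertion methods differ, the snippet below builds a small paragraph around one existing tag. The markup and the text values are assumptions chosen only for illustration.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p><b>two</b></p>", "html.parser")
anchor = soup.b  # the existing element that positions are relative to

# insert_before(): place a new tag immediately before the anchor
one = soup.new_tag("i")
one.string = "one"
anchor.insert_before(one)

# insert_after(): place a new tag immediately after the anchor
three = soup.new_tag("i")
three.string = "three"
anchor.insert_after(three)

# insert(): place a string at index 0 of the parent's list of children
soup.p.insert(0, soup.new_string("start: "))

result = str(soup.p)
print(result)
```

Note that insert() is called on the parent and takes an index into its children, while insert_before() and insert_after() are called on the sibling that serves as the reference point.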

Approach:

  • Import the module
  • Scrape data from the webpage
  • Parse the scraped string as HTML
  • Select the tag in which the modification has to be performed
  • Make the required changes
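The steps above might look like the following sketch. To keep it self-contained, a hard-coded string stands in for the scraped page; in practice that string would come from an HTTP response. The markup and the class name "headline" are hypothetical.

```python
# Step 1: import the module
from bs4 import BeautifulSoup

# Step 2 (stand-in): a literal string is used here instead of a real scraped page
html = "<html><body><h1 class='old'>Title</h1><p>Body text</p></body></html>"

# Step 3: parse the string as HTML
soup = BeautifulSoup(html, "html.parser")

# Step 4: select the tag to modify
heading = soup.find("h1")

# Step 5: make the required changes
heading.name = "h2"
heading["class"] = "headline"

# the modified document can be serialized back to a string
modified_html = str(soup)
print(modified_html)
```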

Example 1: 

Python3




# importing module
from bs4 import BeautifulSoup
 
markup = """<p class="para">gfg</p>"""
 
# parsing the string as HTML
soup = BeautifulSoup(markup, 'html.parser')
 
# extracting a tag
tag = soup.p
 
print("Before modifying the tag name: ")
print(tag)
print()
 
# modifying tag name
tag.name = "div"
 
print("After modifying the tag name: ")
print(tag)
print()
# modifying its class attribute
tag['class'] = "div_class"
 
# adding new attribute
tag['id'] = "div_id"
 
print("After modifying and adding attributes: ")
print(tag)
print()
 
# to delete any attributes
del tag["class"]
 
print("After deleting class attribute: ")
print(tag)
print()
 
# modifying the tag's content
tag.string = "Geeks"
 
print("After modifying tag string: ")
print(tag)
print()
 
# using insert function.
tag = soup.div
print("Before inserting: ")
print(tag)
print()
 
# inserting content
tag.insert(1, " for Geeks")
print("After inserting: ")
print(tag)
print()


Example 2:

Python3




# importing module
from bs4 import BeautifulSoup
 
soup = BeautifulSoup("<b>| A Computer Science portal</b>", 'html.parser')
 
tag = soup.new_tag("p")
tag.string = "Geeks"
 
 
# insert before
soup.b.string.insert_before(tag)
print(soup.b)
print()
 
# insert after
soup.b.p.insert_after(soup.new_string(" for Geeks"))
print(soup.b)



Adding new tag and wrapping element 

The tree can be modified by adding a new tag at any required location. An element can also be wrapped in another tag.

  • new_tag() creates a new tag, which can then be added to the tree with a method such as append() or insert().

Syntax: soup.new_tag("tag_name")

  • wrap() encloses an element in the tag you specify and returns the new wrapper.

Syntax: element.wrap(wrapper_tag)

  • unwrap() replaces a tag with its contents, removing the tag itself.

Syntax: tag.unwrap()
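A small sketch of the round trip between wrap() and unwrap() is given below. It assumes a one-paragraph document and creates the wrapper with an href attribute passed as a keyword argument to new_tag(); the markup and URL are chosen only for illustration.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Geeks for Geeks</p>", "html.parser")

# new_tag() creates a detached tag; keyword arguments become attributes
link = soup.new_tag("a", href="https://www.geeksforgeeks.org/")

# wrap() encloses the paragraph's string in the new tag and returns the wrapper
wrapper = soup.p.string.wrap(link)
after_wrap = str(soup)
print(after_wrap)

# unwrap() replaces the <a> tag with its contents and returns the removed tag
removed = soup.p.a.unwrap()
after_unwrap = str(soup)
print(after_unwrap)
```

After unwrap(), the document is back to its original form, which makes the pair handy for adding and later stripping presentational tags.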

Example:

Python3




# importing module
from bs4 import BeautifulSoup
 
markup = '<p>Geeks for Geeks</p>'
 
# parsing the string as HTML
soup = BeautifulSoup(markup, 'html.parser')
print(soup)
 
# wrapping around the string
soup.p.string.wrap(soup.new_tag("i"))
print(soup)
 
# wrapping around the tag
soup.p.wrap(soup.new_tag("div"))
print(soup)
 
# unwrapping the i tag
 
soup.p.i.unwrap()
 
print(soup)
 
old_tag = soup.div
 
# new tag
new_tag = soup.new_tag('div')
new_tag.string = "| A Computer Science portal for geeks"
 
# adding new tag
old_tag.append(new_tag)
 
print(soup)


Replacing element

replace_with() removes a tag or string from the parse tree, replaces it with a new tag or string, and returns the element that was removed.

Syntax: element.replace_with(new_element)
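The sketch below illustrates both uses: replacing a tag with another tag, and replacing a string with another string. The replacement values "gfg.in" and "gfg.org" are placeholders for the example.

```python
from bs4 import BeautifulSoup

markup = '<a href="http://gfg.com/">Geeks for Geeks <i>gfg.com</i></a>'
soup = BeautifulSoup(markup, "html.parser")

# replace the <i> tag with a newly created <b> tag
new_tag = soup.new_tag("b")
new_tag.string = "gfg.in"
old = soup.a.i.replace_with(new_tag)

after_tag_swap = str(soup.a)
print(after_tag_swap)

# the replaced element is returned, detached from the tree
print(old, old.parent)

# a string inside a tag can be replaced the same way
soup.a.b.string.replace_with("gfg.org")
after_string_swap = str(soup.a)
print(after_string_swap)
```

Because the removed element is returned rather than destroyed, it can be kept and inserted elsewhere if needed.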

Example:

Python3




# importing BeautifulSoup Module
from bs4 import BeautifulSoup
 
markup = '<a href="http://gfg.com/">Geeks for Geeks <i>gfg.com</i></a>'
 
# parsing the string as HTML
soup = BeautifulSoup(markup, 'html.parser')
 
# tag to be replaced
old_tag = soup.a
 
# new tag
new_tag = soup.new_tag("p")
 
# input string
new_tag.string = "gfg.in"
 
# element.replace_with(new_element) removes a tag or string
# from the tree and replaces it with the tag or string
# of your choice.
 
old_tag.i.replace_with(new_tag)
 
print(old_tag)


Output:

<a href="http://gfg.com/">Geeks for Geeks <p>gfg.in</p></a>

Adding new content to an existing tag

New content can be added to an existing tag with append(). The content can be a plain string, a NavigableString object, or a tag.

Syntax: tag.append("content")
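As a brief illustration, the sketch below appends first a string and then a tag to the same link. The separator " | " and the em tag are assumptions made for the example.

```python
from bs4 import BeautifulSoup

markup = '<a href="https://www.geeksforgeeks.org/">Geeks for Geeks</a>'
soup = BeautifulSoup(markup, "html.parser")
tag = soup.a

# appending a plain string adds a new text node at the end of the tag
tag.append(" | ")

# appending a tag works the same way
em = soup.new_tag("em")
em.string = "portal"
tag.append(em)

result = str(tag)
print(result)

# each append adds a separate child; existing text nodes are not merged
print(len(tag.contents))
print(tag.get_text())
```

Since the tag now has several children, tag.string is None; get_text() is the way to read the combined text.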

Example:

Python3




# importing module
from bs4 import BeautifulSoup
from bs4 import NavigableString
 
markup = """<a href="https://www.geeksforgeeks.org/">Geeks for Geeks</a>"""
 
# parsing the string as HTML
soup = BeautifulSoup(markup, 'html.parser')
 
# extracting a tag
tag = soup.a
 
# appending content
tag.append("| A Computer Science portal")
print(tag)
 
# appending content using the NavigableString constructor
new_str = NavigableString(" for geeks")
tag.append(new_str)
print(tag)


Output:

<a href="https://www.geeksforgeeks.org/">Geeks for Geeks| A Computer Science portal</a>

<a href="https://www.geeksforgeeks.org/">Geeks for Geeks| A Computer Science portal for geeks</a>

Removing content and element

A tree can also be modified by removing content or whole elements from it.

  • clear() removes the contents of a tag but keeps the tag itself.

Syntax: tag.clear()

  • extract() removes a tag or string from the tree and returns it.

Syntax: element.extract()

  • decompose() removes a tag from the tree and destroys it along with all of its contents.

Syntax: tag.decompose()
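The difference between the three methods can be sketched side by side. Each call below runs on a freshly parsed copy of the same markup; the helper fresh_soup() is a hypothetical convenience function defined only for this example.

```python
from bs4 import BeautifulSoup

markup = '<a href="https://www.geeksforgeeks.org/">Geeks for Geeks <i>portal</i></a>'


def fresh_soup():
    """Hypothetical helper: parse the markup again so each demo starts clean."""
    return BeautifulSoup(markup, "html.parser")


# clear(): the <a> tag stays, its children are removed
soup = fresh_soup()
soup.a.clear()
cleared = str(soup.a)
print(cleared)

# extract(): the <i> tag leaves the tree and is returned for reuse
soup = fresh_soup()
extracted = soup.i.extract()
after_extract = str(soup.a)
print(after_extract, extracted)

# decompose(): the <i> tag leaves the tree and is destroyed; nothing is returned
soup = fresh_soup()
returned = soup.i.decompose()
after_decompose = str(soup.a)
print(after_decompose, returned)
```

A reasonable rule of thumb is to use extract() when the removed element is still needed and decompose() when it is not.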

Example:

Python3




# importing module
from bs4 import BeautifulSoup
 
markup = '<a href="https://www.geeksforgeeks.org/">Geeks for Geeks <i>| A Computer Science portal</i></a>'
 
# parsing the string as HTML
soup = BeautifulSoup(markup, 'html.parser')
 
tag = soup.a
print(tag)
print()
 
# clearing all of its content
tag.clear()
print(tag)
print()
 
# extracting the i tag
# parsing the string as HTML again to start from a fresh tree
soup2 = BeautifulSoup(markup, 'html.parser')
 
a_tag = soup2.a
 
print(a_tag)
print()
i_tag = soup2.i.extract()
 
print(a_tag)
print()
 
# decomposing the i tag
# parsing the string as HTML again to start from a fresh tree
soup2 = BeautifulSoup(markup, 'html.parser')
 
a_tag = soup2.a
 
print(a_tag)
print()
# decompose() returns None, so its result is not stored
soup2.i.decompose()
 
print(a_tag)





Last Updated : 07 Sep, 2021