BeautifulSoup – Modifying the tree
Last Updated: 07 Sep, 2021
Prerequisites: BeautifulSoup
BeautifulSoup is a Python library used for web scraping. It can also be used to modify HTML documents: after parsing a page, you can search the parse tree and change it in place. This article shows how BeautifulSoup can be used to modify the parse tree. You can rename a tag, change the values of its attributes, and add or delete attributes.
Modifying the name of the tag and its attributes
You can rename a tag and modify its attributes by adding, changing, or deleting them.
- To rename a tag:
Syntax: tag.name = "new_tag"
- To modify an attribute or add a new one:
Syntax: tag["attribute"] = "value"
- To delete an attribute:
Syntax: del tag["attribute"]
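As a minimal sketch of the syntax above, the snippet below renames a tag and then changes, adds, and deletes attributes on it. The markup and the attribute values are hypothetical and chosen only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
soup = BeautifulSoup('<p class="intro" id="first">Hello</p>', "html.parser")
tag = soup.p

# Rename the tag; its attributes and children are kept.
tag.name = "span"

# Attributes behave like dictionary entries.
tag["id"] = "greeting"        # change an existing attribute
tag["title"] = "A greeting"   # add a new attribute
del tag["class"]              # delete an attribute

# tag.attrs holds all attributes of the tag as a dictionary.
attributes = dict(tag.attrs)
result = str(tag)
print(attributes)
print(result)
```

Accessing a deleted attribute with tag["class"] raises a KeyError, while tag.get("class") returns None, so get() is the safer choice when an attribute may be absent.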
A tree can also be modified by inserting new elements at the required places.
- insert() inserts a new element at a given numeric position among the children of a tag.
Syntax: tag.insert(position, new_element)
- insert_after() inserts an element immediately after a tag or string in the parse tree.
Syntax: tag.insert_after(new_element)
- insert_before() inserts an element immediately before a tag or string in the parse tree.
Syntax: tag.insert_before(new_element)
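The three insertion functions can be compared side by side in a short sketch. The markup and the variable names (p, middle) are assumptions made for this illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
soup = BeautifulSoup("<p><b>one</b><b>three</b></p>", "html.parser")
p = soup.p

# insert(): place a new tag at index 1 of p.contents.
middle = soup.new_tag("b")
middle.string = "two"
p.insert(1, middle)

# insert_before(): add a string just before the first <b> tag.
p.b.insert_before("zero ")

# insert_after(): add a string just after the tag inserted above.
middle.insert_after(" and ")

after_insert = str(p)
print(after_insert)
# <p>zero <b>one</b><b>two</b> and <b>three</b></p>
```

The position passed to insert() indexes into the tag's list of children (tag.contents), much like list.insert() in Python.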
Approach:
- Import the module
- Scrape data from the webpage
- Parse the scraped string as HTML
- Select the tag in which the modification has to be performed
- Make the required changes
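The steps above can be sketched end to end as follows. To keep the snippet self-contained, a hard-coded string stands in for the scraped page; the markup and the variable names (html, heading) are assumptions for illustration only.

```python
from bs4 import BeautifulSoup

# Stand-in for scraped data. With a live page, this string would
# usually come from an HTTP client instead (not shown here).
html = "<html><body><h1 class='old'>Title</h1><p>Body text</p></body></html>"

# Parse the string as HTML.
soup = BeautifulSoup(html, "html.parser")

# Select the tag in which the modification has to be performed.
heading = soup.find("h1")

# Make the required changes.
heading.name = "h2"
heading["class"] = "new"
heading.string = "Updated title"

modified = str(soup)
print(modified)
# <html><body><h2 class="new">Updated title</h2><p>Body text</p></body></html>
```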
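Example 1:
Python3
from bs4 import BeautifulSoup

# Sample markup (assumed; the original value is missing from the source)
markup = '<p class="para">gfg</p>'
soup = BeautifulSoup(markup, 'html.parser')

tag = soup.p
print("Before modifying the tag name: ")
print(tag)
print()

# Rename the tag
tag.name = "div"
print("After modifying the tag name: ")
print(tag)
print()

# Modify an existing attribute and add a new one
tag['class'] = "div_class"
tag['id'] = "div_id"
print("After modifying and adding attributes: ")
print(tag)
print()

# Delete an attribute
del tag["class"]
print("After deleting class attribute: ")
print(tag)
print()

# Replace the text inside the tag
tag.string = "Geeks"
print("After modifying tag string: ")
print(tag)
print()

# Insert a string after the existing text (position 1)
tag = soup.div
print("Before inserting: ")
print(tag)
print()
tag.insert(1, " for Geeks")
print("After inserting: ")
print(tag)
print()
Output:
Before modifying the tag name: 
<p class="para">gfg</p>

After modifying the tag name: 
<div class="para">gfg</div>

After modifying and adding attributes: 
<div class="div_class" id="div_id">gfg</div>

After deleting class attribute: 
<div id="div_id">gfg</div>

After modifying tag string: 
<div id="div_id">Geeks</div>

Before inserting: 
<div id="div_id">Geeks</div>

After inserting: 
<div id="div_id">Geeks for Geeks</div>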
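Example 2:
Python3
from bs4 import BeautifulSoup

soup = BeautifulSoup("<b>| A Computer Science portal</b>", 'html.parser')

# Create a new <p> tag and insert it before the string inside <b>
tag = soup.new_tag("p")
tag.string = "Geeks"
soup.b.string.insert_before(tag)
print(soup.b)
print()

# Insert a new string right after the <p> tag
soup.b.p.insert_after(soup.new_string(" for Geeks"))
print(soup.b)
Output:
<b><p>Geeks</p>| A Computer Science portal</b>

<b><p>Geeks</p> for Geeks| A Computer Science portal</b>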
Adding new tag and wrapping element
The tree can be modified by adding a new tag at any required location. An element can also be wrapped in another tag.
- new_tag() creates a new tag, which can then be placed in the tree.
Syntax: soup.new_tag("tag_name")
- wrap() encloses an element in the tag you specify and returns the new wrapper.
Syntax: element.wrap(soup.new_tag("tag_name"))
- unwrap() is the opposite of wrap(): it replaces a tag with its contents.
Syntax: tag.unwrap()
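The following sketch illustrates two details of these functions: new_tag() accepts attributes as keyword arguments, and wrap() returns the wrapper it was given. The markup, the URL, and the variable names are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
soup = BeautifulSoup("<p>Read the docs</p>", "html.parser")

# new_tag() takes the tag name plus optional attributes as keywords.
link = soup.new_tag("a", href="https://example.com/")

# wrap() encloses the string in the new tag and returns the wrapper.
wrapper = soup.p.string.wrap(link)
wrapped = str(soup)
print(wrapped)
# <p><a href="https://example.com/">Read the docs</a></p>

# unwrap() removes the <a> tag but leaves its contents in place.
soup.a.unwrap()
unwrapped = str(soup)
print(unwrapped)
# <p>Read the docs</p>
```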
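Example:
Python3
from bs4 import BeautifulSoup

markup = '<p>Geeks for Geeks</p>'
soup = BeautifulSoup(markup, 'html.parser')
print(soup)

# Wrap the string inside <p> in a new <i> tag
soup.p.string.wrap(soup.new_tag("i"))
print(soup)

# Wrap the whole <p> tag in a new <div> tag
soup.p.wrap(soup.new_tag("div"))
print(soup)

# Remove the <i> tag but keep its contents
soup.p.i.unwrap()
print(soup)

# Create a new <div> tag and append it to the existing one
old_tag = soup.div
new_tag = soup.new_tag('div')
new_tag.string = "| A Computer Science portal for geeks"
old_tag.append(new_tag)
print(soup)
Output:
<p>Geeks for Geeks</p>
<p><i>Geeks for Geeks</i></p>
<div><p><i>Geeks for Geeks</i></p></div>
<div><p>Geeks for Geeks</p></div>
<div><p>Geeks for Geeks</p><div>| A Computer Science portal for geeks</div></div>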
Replacing element
The replace_with() function replaces a tag or string in the parse tree with a new tag or string.
Syntax: old_element.replace_with(new_element)
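As a small sketch, replace_with() works on strings as well as tags, and it returns the element that was removed. The markup and the variable names here are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
soup = BeautifulSoup("<p>Status: <b>draft</b></p>", "html.parser")

# Replace the string inside <b> with another string.
soup.b.string.replace_with("final")
after_string = str(soup)
print(after_string)
# <p>Status: <b>final</b></p>

# Replace the whole <b> tag with a new <em> tag.
new_tag = soup.new_tag("em")
new_tag.string = "published"
removed = soup.b.replace_with(new_tag)
after_tag = str(soup)
print(after_tag)
# <p>Status: <em>published</em></p>

# The removed element is returned and is no longer part of the tree.
print(removed)
# <b>final</b>
```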
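Example:
Python3
from bs4 import BeautifulSoup

# Sample markup (assumed; the original value is missing from the source)
markup = '<a href="http://gfg.com/">Geeks for Geeks <i>gfg.in</i></a>'
soup = BeautifulSoup(markup, 'html.parser')

# Replace the <i> tag inside <a> with a new <p> tag
old_tag = soup.a
new_tag = soup.new_tag("p")
new_tag.string = "gfg.in"
old_tag.i.replace_with(new_tag)
print(old_tag)
Output:
<a href="http://gfg.com/">Geeks for Geeks <p>gfg.in</p></a>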
Adding new content to an existing tag
New content can be added to an existing tag with the append() function. The content can be passed as a plain string or as an object created with the NavigableString() constructor.
Syntax: tag.append("content")
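Besides strings, append() also accepts tags. The sketch below, which uses hypothetical markup and variable names, appends a new tag and then a string to the end of an existing tag.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
soup = BeautifulSoup("<ul><li>first</li></ul>", "html.parser")

# Append a newly created tag as the last child of <ul>.
item = soup.new_tag("li")
item.string = "second"
soup.ul.append(item)

# Append a plain string; it becomes the new last child.
soup.ul.append(" trailing text")

appended = str(soup)
print(appended)
# <ul><li>first</li><li>second</li> trailing text</ul>
```

append() always adds to the end of the tag's children, similar to list.append() in Python.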
Example:
Python3
from bs4 import BeautifulSoup
from bs4 import NavigableString

# Sample markup (inferred from the output shown below)
markup = '<a href="https://www.geeksforgeeks.org/">Geeks for Geeks</a>'
soup = BeautifulSoup(markup, 'html.parser')
tag = soup.a

# Append a plain string
tag.append("| A Computer Science portal")
print(tag)

# Append a NavigableString object
new_str = NavigableString(" for geeks")
tag.append(new_str)
print(tag)
Output:
<a href="https://www.geeksforgeeks.org/">Geeks for Geeks| A Computer Science portal</a>
<a href="https://www.geeksforgeeks.org/">Geeks for Geeks| A Computer Science portal for geeks</a>
Removing content and element
A tree can also be modified by removing content or whole elements from it.
- clear() removes the contents of a tag but keeps the tag itself.
Syntax: tag.clear()
- extract() removes a tag or string from the tree and returns it.
Syntax: tag.extract()
- decompose() removes a tag from the tree and destroys it along with all of its contents.
Syntax: tag.decompose()
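The difference between the three functions is easiest to see side by side. In this sketch (markup and variable names are hypothetical), the element returned by extract() is reused elsewhere in the tree, while decompose() returns nothing.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
html = "<div><p>keep <i>move me</i></p><span>drop me</span></div>"
soup = BeautifulSoup(html, "html.parser")

# extract() detaches the element and returns it, so it can be reused.
moved = soup.i.extract()
soup.div.append(moved)
after_extract = str(soup)
print(after_extract)
# <div><p>keep </p><span>drop me</span><i>move me</i></div>

# decompose() removes and destroys the element; it returns None.
outcome = soup.span.decompose()
after_decompose = str(soup)
print(after_decompose)
# <div><p>keep </p><i>move me</i></div>

# clear() empties the tag but leaves the tag itself in the tree.
soup.p.clear()
after_clear = str(soup)
print(after_clear)
# <div><p></p><i>move me</i></div>
```

A reasonable rule of thumb: use extract() when the removed element is still needed, and decompose() when it can be discarded.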
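Example:
Python3
from bs4 import BeautifulSoup

# Sample markup (assumed; the original value is missing from the source)
markup = '<a href="https://www.geeksforgeeks.org/">Geeks for Geeks <i>gfg.in</i></a>'

# clear(): remove the contents of the <a> tag
soup = BeautifulSoup(markup, 'html.parser')
tag = soup.a
print(tag)
print()
tag.clear()
print(tag)
print()

# extract(): remove the <i> tag and keep it in i_tag
soup2 = BeautifulSoup(markup, 'html.parser')
a_tag = soup2.a
print(a_tag)
print()
i_tag = soup2.i.extract()
print(a_tag)
print()

# decompose(): remove and destroy the <i> tag (returns None)
soup2 = BeautifulSoup(markup, 'html.parser')
a_tag = soup2.a
print(a_tag)
print()
soup2.i.decompose()
print(a_tag)
Output:
<a href="https://www.geeksforgeeks.org/">Geeks for Geeks <i>gfg.in</i></a>

<a href="https://www.geeksforgeeks.org/"></a>

<a href="https://www.geeksforgeeks.org/">Geeks for Geeks <i>gfg.in</i></a>

<a href="https://www.geeksforgeeks.org/">Geeks for Geeks </a>

<a href="https://www.geeksforgeeks.org/">Geeks for Geeks <i>gfg.in</i></a>

<a href="https://www.geeksforgeeks.org/">Geeks for Geeks </a>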