Beautiful Soup is a Python library used for web scraping. It can also be used to modify HTML documents. This article shows how Beautiful Soup can be used to extract a div and its content by its ID, using the module's find() method to locate the div.
Approach:
- Import module
- Scrape the HTML from a webpage (the examples below use a hard-coded string instead)
- Parse the HTML string with BeautifulSoup
- Find the div by its ID
- Print its content
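The approach above can be sketched as two small helpers. The names fetch_html and find_div_by_id are hypothetical, chosen only for this illustration; fetching uses the standard library's urllib.request.urlopen, and the demo runs on a hard-coded string so it needs no network access. This is a minimal sketch under those assumptions, not a robust scraper (there is no error handling, custom headers, or retry logic).

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch_html(url):
    """Hypothetical helper: download a page and return its raw bytes.

    BeautifulSoup accepts bytes and detects the encoding itself.
    """
    with urlopen(url) as response:
        return response.read()


def find_div_by_id(html, div_id):
    """Hypothetical helper: parse HTML (str or bytes) and return the
    <div> with the given id, or None if there is no such div."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find("div", id=div_id)


if __name__ == "__main__":
    # A hard-coded string stands in for scraped HTML here; for a live page,
    # pass fetch_html(url) instead (this needs network access).
    html = '<html><body><div id="main">Hello</div></body></html>'
    div = find_div_by_id(html, "main")
    print(div.string if div is not None else "No div with that id")
```

Keeping the download separate from the parsing makes the lookup easy to test on plain strings.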
Syntax: soup.find(tag_name, **kwargs)
Parameters:
- The tag_name argument tells Beautiful Soup to find only tags with the given name. Text strings are ignored, as are tags whose names don't match.
- The **kwargs arguments filter against the tag's attributes; passing id="container", for example, matches only a tag whose 'id' attribute is "container".
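The keyword-argument filter is only one way to write the lookup. The sketch below, on a small made-up snippet, compares it with an explicit attrs dictionary (useful for attribute names that are not valid Python identifiers, such as data-id), a search by id alone without a tag name, and the CSS-selector method select_one(), which relies on the soupsieve package that is normally installed alongside beautifulsoup4. It also shows that find() returns None when nothing matches.

```python
from bs4 import BeautifulSoup

markup = '<html><body><div id="container">Div Content</div></body></html>'
soup = BeautifulSoup(markup, "html.parser")

# Keyword argument: id="..." becomes a filter on the tag's id attribute
by_kwarg = soup.find("div", id="container")

# The same filter written as an explicit attrs dictionary
by_attrs = soup.find("div", attrs={"id": "container"})

# The tag name is optional: this matches any tag whose id is "container"
by_id_only = soup.find(id="container")

# CSS selector alternative (uses the soupsieve package)
by_css = soup.select_one("div#container")

# All four lookups return the very same Tag object from the parse tree
print(by_kwarg is by_attrs is by_id_only is by_css)  # True

# find() returns None when nothing matches, so check before using the result
missing = soup.find("div", id="does-not-exist")
print(missing)  # None
```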
Below is the implementation:
Example 1:
Python3
# importing module
from bs4 import BeautifulSoup

markup = '''<html><body><div id="container">Div Content</div></body></html>'''
soup = BeautifulSoup(markup, 'html.parser')

# finding the div with the id
div_bs4 = soup.find('div', id="container")
print(div_bs4.string)
Output:
Div Content
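find() returns a Tag object, so "the div and its content" can be read in several ways. The sketch below uses a variant of the Example 1 markup with a nested <b> tag (an assumption added only to show the difference): str() gives the whole element, .string becomes None once the div has more than one child node, and get_text() collects all the text inside the div.

```python
from bs4 import BeautifulSoup

# Variant of Example 1 with a nested <b> tag (illustrative markup)
markup = '<html><body><div id="container">Div <b>Content</b></div></body></html>'
soup = BeautifulSoup(markup, "html.parser")
div = soup.find("div", id="container")

if div is not None:
    print(div)             # whole element: <div id="container">Div <b>Content</b></div>
    print(div.string)      # None: the div has two child nodes ("Div " and <b>)
    print(div.get_text())  # all text inside the div: Div Content
    print(div["id"])       # attribute access: container
```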
Example 2:
Python3
# importing module
from bs4 import BeautifulSoup

markup = """
<!DOCTYPE html> <html> <head><title>Example</title></head>
<body>
<p> Nested div
</p>
<div id="first"> Div with ID first
<div id="second"> Div with id second
</div>
</div>
</body>
</html> """

# parsing string to HTML
soup = BeautifulSoup(markup, 'html.parser')

# finding the div with the id
div_bs4 = soup.find('div', id="second")
# .string keeps the surrounding space and newline, so strip them
print(div_bs4.string.strip())
Output:
Div with id second
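Example 2 also relies on find() searching all descendants, which is why the nested div is found directly from the soup. A short sketch, reusing the two divs from Example 2 without the surrounding page, shows what happens with the outer div and with a non-recursive search (recursive=False checks only direct children):

```python
from bs4 import BeautifulSoup

markup = """
<div id="first"> Div with ID first
<div id="second"> Div with id second
</div>
</div>
"""
soup = BeautifulSoup(markup, "html.parser")

outer = soup.find("div", id="first")
inner = soup.find("div", id="second")

# The outer div holds a text node and a nested div, so .string is None
print(outer.string)  # None

# get_text() gathers the text of the div and everything nested inside it
print(outer.get_text(" ", strip=True))  # Div with ID first Div with id second

# Searching from the outer div finds the same nested Tag object
print(outer.find("div", id="second") is inner)  # True

# "second" is not a direct child of the document, so a non-recursive search misses it
print(soup.find("div", id="second", recursive=False))  # None
```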