How to extract a div tag and its contents by id with BeautifulSoup?
Beautiful Soup is a Python library for parsing HTML and XML, and it is widely used for web scraping. It can also be used to modify HTML documents. This article shows how to use Beautiful Soup to extract a div and its content by its ID. The find() method of a BeautifulSoup object is used to locate the div by its ID.
Approach:
- Import module
- Scrape data from a webpage
- Parse the scraped string as HTML
- Find the div with its ID
- Print its content
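The steps above can be sketched as two small helpers. This is a minimal sketch, not the only way to do it: the names fetch_html and extract_div_text are hypothetical, the standard library's urlopen is assumed for the download step, and the URL in the usage comment is a placeholder.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return its raw bytes (hypothetical helper)."""
    with urlopen(url) as response:
        return response.read()


def extract_div_text(html, div_id):
    """Parse html and return the text of the div with the given id, or None."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", id=div_id)
    if div is None:
        # find() returns None when no tag matches
        return None
    return div.get_text(strip=True)


# Usage (placeholder URL and id, shown for illustration only):
# print(extract_div_text(fetch_html("https://example.com"), "container"))
```

Keeping the download separate from the parsing makes the parsing step easy to try on a literal HTML string, which is what the examples in this article do.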
Syntax : find(tag_name, **kwargs)
Parameters:
- The tag_name argument tells Beautiful Soup to find only tags with the given name. Text strings are ignored, as are tags whose names don't match.
- The **kwargs arguments filter on tag attributes. Here, the keyword argument id is matched against each tag's 'id' attribute.
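As a short illustration of these parameters, the sketch below shows a few ways of looking up the same element by its ID. The markup and variable names are made up for the example. select_one() is Beautiful Soup's CSS-selector lookup and is included only for comparison with find().

```python
from bs4 import BeautifulSoup

markup = (
    '<html><body>'
    '<div id="container">Div Content</div>'
    '<p id="note">Note</p>'
    '</body></html>'
)
soup = BeautifulSoup(markup, "html.parser")

# Tag name plus a keyword filter on the id attribute
by_kwarg = soup.find("div", id="container")

# The same filter passed as an attrs dictionary
by_attrs = soup.find("div", attrs={"id": "container"})

# A CSS selector for the same element
by_css = soup.select_one("div#container")

# Omitting the tag name matches any tag with that id
any_tag = soup.find(id="note")

# find() returns None when nothing matches, so check before using the result
missing = soup.find("div", id="does-not-exist")

print(by_kwarg.string)
print(any_tag.name)
print(missing)
```

Because find() returns None for a missing ID, accessing an attribute such as .string on the result without a check raises an AttributeError.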
Below is the implementation:
Example 1:
Python3
# Import the module
from bs4 import BeautifulSoup

markup = '''<html><body><div id="container">Div Content</div></body></html>'''

# Parse the string as HTML
soup = BeautifulSoup(markup, 'html.parser')

# Find the div with the given id
div_bs4 = soup.find('div', id="container")

print(div_bs4.string)
Output:
Div Content
Example 2:
Python3
# Import the module
from bs4 import BeautifulSoup

markup = """
<!DOCTYPE html>
<html>
<head><title>Example</title></head>
<body>
<p> Nested div </p>
<div id="first">
    Div with ID first
    <div id="second">
        Div with id second
    </div>
</div>
</body>
</html>
"""

# Parse the string as HTML
soup = BeautifulSoup(markup, 'html.parser')

# Find the div with the given id
div_bs4 = soup.find('div', id="second")

# Strip the surrounding whitespace and newlines before printing
print(div_bs4.string.strip())
Output:
Div with id second
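Both examples read the div's content through .string, which works only when the tag has a single child. For a div that contains other tags, such as the div with ID first in Example 2, .string is None. The sketch below reuses a trimmed version of that markup (an assumption made for brevity) and shows get_text(), decode_contents() and str() as ways to get the nested content.

```python
from bs4 import BeautifulSoup

markup = """
<div id="first">
  Div with ID first
  <div id="second">Div with id second</div>
</div>
"""
soup = BeautifulSoup(markup, "html.parser")
outer = soup.find("div", id="first")

# .string is None because the div has more than one child
print(outer.string)

# get_text() joins the text of the div and all of its descendants
text = outer.get_text(" ", strip=True)
print(text)  # Div with ID first Div with id second

# decode_contents() returns the inner HTML as a string
inner_html = outer.decode_contents()

# str() returns the div tag itself together with its contents
whole = str(outer)
print(whole)
```

Use get_text() when only the text is needed, and str() or decode_contents() when the markup inside the div should be kept.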