Prerequisite: Beautiful Soup installation (the examples below also use the lxml parser, which is installed separately)
Beautiful Soup is a Python library for parsing HTML and XML, commonly used for web scraping, which is the automated extraction of data from websites. A tag may have any number of attributes. For example, the tag <b class="active"> has an attribute "class" whose value is "active". A tag's attributes can be accessed by treating the tag like a dictionary, and the complete set of attributes is available as a dictionary through tag.attrs.
Syntax:
tag.attrs
Implementation:
Example 1: Program to extract the attributes using the attrs approach.
# Import Beautiful Soup
from bs4 import BeautifulSoup

# Initialize the object with an HTML page
soup = BeautifulSoup('''
<html>
<h2 class="hello"> Heading 1 </h2>
<h1> Heading 2 </h1>
</html>
''', "lxml")

# Get the whole h2 tag
tag = soup.h2

# Get all attributes of the tag as a dictionary
attribute = tag.attrs

# Print the output
print(attribute)
Output:
{'class': ['hello']}
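Because tag.attrs behaves like a dictionary, it can be iterated like one. The following is a minimal sketch rather than part of the original examples: the <a> tag and its attribute values are made up for illustration, and it uses Python's built-in "html.parser" so that it runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one tag carrying three attributes
html = '<a href="https://example.com" id="link1" class="external">Example</a>'

# "html.parser" ships with Python; "lxml" should work here as well
soup = BeautifulSoup(html, "html.parser")
tag = soup.a

# Walk over every attribute name and value on the tag
for name, value in tag.attrs.items():
    print(name, "=", value)

# The attribute names, in the order they appear in the markup
attribute_names = list(tag.attrs)
print(attribute_names)
```

Note that "class" is still returned as a list, while "href" and "id" come back as plain strings.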
Example 2: Program to extract the attributes using the dictionary approach.
# Import Beautiful Soup
from bs4 import BeautifulSoup

# Initialize the object with an HTML page
soup = BeautifulSoup('''
<html>
<h2 class="hello"> Heading 1 </h2>
<h1> Heading 2 </h1>
</html>
''', "lxml")

# Get the whole h2 tag
tag = soup.h2

# Get the value of the class attribute
attribute = tag['class']

# Print the output
print(attribute)
Output:
['hello']
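The dictionary approach raises a KeyError when the requested attribute is not present on the tag. A small sketch of the usual ways around that, assuming the same h2 tag as above (which has no "id" attribute) and the built-in "html.parser"; the "no-id" fallback value is only illustrative:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<h2 class="hello"> Heading 1 </h2>', "html.parser")
tag = soup.h2

# get() returns None, or a supplied default, when the attribute is absent
missing = tag.get("id")
fallback = tag.get("id", "no-id")

# has_attr() reports whether the attribute exists at all
has_class = tag.has_attr("class")
has_id = tag.has_attr("id")

# Square-bracket access raises KeyError for a missing attribute
try:
    tag["id"]
    raised = False
except KeyError:
    raised = True

print(missing, fallback, has_class, has_id, raised)
```

Using get() tends to be the safer choice when scraping pages where an attribute may or may not be present.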
Example 3: Program to extract multiple attribute values using the dictionary approach.
# Import Beautiful Soup
from bs4 import BeautifulSoup

# Initialize the object with an HTML page
soup = BeautifulSoup('''
<html>
<h2 class="first second third"> Heading 1 </h2>
<h1> Heading 2 </h1>
</html>
''', "lxml")

# Get the whole h2 tag
tag = soup.h2

# Get the values of the class attribute
attribute = tag['class']

# Print the output
print(attribute)
Output:
['first', 'second', 'third']
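The list result above is specific to attributes that HTML defines as multi-valued, such as "class". As a rough illustration, the sketch below adds a made-up "id" attribute containing a space to the same h2 tag and parses it with the built-in "html.parser"; since "id" is not multi-valued, it is returned as a single string.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the id value is invented for this illustration
html = '<h2 class="first second third" id="main title"> Heading 1 </h2>'
soup = BeautifulSoup(html, "html.parser")
tag = soup.h2

# class is multi-valued, so it is split on whitespace into a list
classes = tag["class"]

# Join the list to recover the attribute as it was written in the markup
class_string = " ".join(classes)

# id is not multi-valued, so the space is kept inside one string
tag_id = tag["id"]

print(classes)
print(class_string)
print(tag_id)
```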