Skip to content
Related Articles

Related Articles

Extracting an attribute value with beautifulsoup in Python
  • Last Updated : 29 Dec, 2020

Prerequisite: Beautifulsoup Installation

Attributes are provided by Beautiful Soup which is a web scraping framework for Python. Web scraping is the process of extracting data from the website using automated tools to make the process faster. A tag may have any number of attributes. For example, the tag <b class=”active”> has an attribute “class” whose value is “active”. We can access a tag’s attributes by treating it like a dictionary.

Syntax: 

tag.attrs

Implementation:
Example 1: Program to extract the attributes using attrs approach.
 

Python3



filter_none

edit
close

play_arrow

link
brightness_4
code

# Import Beautiful Soup
from bs4 import BeautifulSoup
  
# Initialize the object with a HTML page
soup = BeautifulSoup('''
    <html>
        <h2 class="hello"> Heading 1 </h2>
        <h1> Heading 2 </h1>
    </html>
    ''', "lxml")
  
# Get the whole h2 tag
tag = soup.h2
  
# Get the attribute
attribute = tag.attrs
  
# Print the output
print(attribute)

chevron_right


Output: 

{'class': ['hello']}

Example 2: Program to extract the attributes using dictionary approach.
 

Python3

filter_none

edit
close

play_arrow

link
brightness_4
code

# Import Beautiful Soup
from bs4 import BeautifulSoup
  
# Initialize the object with a HTML page
soup = BeautifulSoup('''
    <html>
        <h2 class="hello"> Heading 1 </h2>
        <h1> Heading 2 </h1>
    </html>
    ''', "lxml")
  
# Get the whole h2 tag
tag = soup.h2
  
# Get the attribute
attribute = tag['class']
  
# Print the output
print(attribute)

chevron_right


Output: 

['hello']

Example 3: Program to extract the multiple attribute values using dictionary approach.

Python3

filter_none

edit
close

play_arrow

link
brightness_4
code

# Import Beautiful Soup
from bs4 import BeautifulSoup
  
# Initialize the object with a HTML page
soup = BeautifulSoup('''
    <html>
        <h2 class="first second third"> Heading 1 </h2>
        <h1> Heading 2 </h1>
    </html>
    ''', "lxml")
  
# Get the whole h2 tag
tag = soup.h2
  
# Get the attribute
attribute = tag['class']
  
# Print the output
print(attribute)

chevron_right


Output: 

['first', 'second', 'third']

Attention geek! Strengthen your foundations with the Python Programming Foundation Course and learn the basics.

To begin with, your interview preparations Enhance your Data Structures concepts with the Python DS Course.

My Personal Notes arrow_drop_up
Recommended Articles
Page :