Python BeautifulSoup – find all class
  • Last Updated : 26 Nov, 2020

Prerequisites: Requests, BeautifulSoup

The task is to write a program that finds all the classes used on the page at a given website URL. Beautiful Soup has no built-in method that returns every class, so the classes have to be collected by iterating over the tags.

Modules needed:

  • bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. It is not part of the Python standard library. To install it, run the command below in the terminal.
pip install bs4

  • requests: Requests lets you send HTTP/1.1 requests with very little code. It is also not part of the Python standard library. To install it, run the command below in the terminal.
pip install requests

Method #1: Finding a class in a given HTML document.

Approach:

  • Import the module.
  • Create an HTML document.
  • Parse the content with BeautifulSoup.
  • Search the parsed document by class name.

Code:

Python3


# import module
from bs4 import BeautifulSoup

# html code
html_doc = """<html><head><title>Welcome to geeksforgeeks</title></head>
<body>
<p class="title"><b>Geeks</b></p>

<p class="body">geeksforgeeks a computer science portal for geeks
</body>
"""

# parse html content
soup = BeautifulSoup(html_doc, 'html.parser')

# find the first element whose class is "body" and print it
print(soup.find(class_="body"))


Output:

<p class="body">geeksforgeeks a computer science portal for geeks
</p>
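
Note that find() returns only the first matching element. As a minimal sketch, assuming you want every element that carries a given class, find_all() accepts the same class_ argument. The sample markup and the variable names below are illustrative and are not part of the example above.

```python
from bs4 import BeautifulSoup

# Illustrative markup (an assumption, not taken from the example above)
sample_html = """
<div class="note">first</div>
<p class="note highlight">second</p>
<p class="other">third</p>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# find_all() returns every matching element; class_ matches any single
# class of a multi-class attribute, so "note highlight" also matches "note"
matches = soup.find_all(class_="note")
note_texts = [tag.get_text() for tag in matches]

print(note_texts)  # ['first', 'second']
```

The third element is skipped because its only class is "other".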

 

Method #2: Finding all the classes on the page at a URL.

Approach:

  • Import the modules.
  • Send a GET request to the URL with requests.
  • Pass the response content to the BeautifulSoup() constructor.
  • Iterate over all tags and collect their class names.

Code:

Python3


# Import Module
from bs4 import BeautifulSoup
import requests

# Website URL (placeholder: replace with the page you want to inspect)
URL = "https://www.example.com/"

# class list set
class_list = set()

# Page content from Website URL
page = requests.get(URL)

# parse html content
soup = BeautifulSoup(page.content, 'html.parser')

# iterate over every tag in the document
for tag in soup.find_all():

    # classes of this tag as a list, or None if it has no class attribute
    classes = tag.get("class")

    # skip tags with no class attribute or an empty one
    if classes:
        class_list.add(" ".join(classes))

print(class_list)


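Output: a set of the class strings found on the page. Its exact contents depend on the URL that was fetched.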
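
Because the program joins each tag's classes into one string, an element with class="a b" contributes the single entry "a b" rather than "a" and "b". If individual class names are wanted instead, one possible variation is sketched below. It runs on an inline HTML string so that it needs no network access; the function name collect_class_names and the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup


def collect_class_names(html):
    """Return the set of individual class names used in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    names = set()
    # class_=True matches only tags that have a class attribute
    for tag in soup.find_all(class_=True):
        # bs4 exposes the class attribute as a list, one item per class name
        names.update(tag["class"])
    return names


# Hypothetical sample markup used only for this sketch
sample_html = '<div class="card wide"><p class="card">text</p><span>plain</span></div>'
print(sorted(collect_class_names(sample_html)))  # ['card', 'wide']
```

To apply this to a live page, the text of a requests response could be passed in place of sample_html.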

