Prerequisites: Requests, BeautifulSoup
The task is to write a program that finds all the classes used on a page at a given website URL. Beautiful Soup has no built-in method that lists every class in a document, so we collect them by walking over the tags ourselves.
Modules needed:
- bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. It is not part of the Python standard library. To install it, run the command below in the terminal.
pip install bs4
- requests: Requests lets you send HTTP/1.1 requests with very little code. It is also not part of the Python standard library. To install it, run the command below in the terminal.
pip install requests
Method #1: Finding a class in a given HTML document.
Approach:
- Create an HTML doc.
- Import module.
- Parse the content into BeautifulSoup.
- Search the parsed document by class name.
Code:
Python3
# import module
from bs4 import BeautifulSoup

# html code
html_doc = """<html><head><title>Welcome to geeksforgeeks</title></head>
<body>
<p class="title"><b>Geeks</b></p>
<p class="body">geeksforgeeks a computer science portal for geeks </p>
</body>
"""

# parse html content
soup = BeautifulSoup(html_doc, 'html.parser')

# find the first element with the given class name and print it
print(soup.find(class_="body"))
Output:
<p class="body">geeksforgeeks a computer science portal for geeks </p>
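Note that `find()` returns only the first matching element. When several elements share a class, `find_all()` or a CSS selector passed to `select()` returns all of them. The following is a minimal sketch, assuming a small hypothetical document (`sample_html`, not the one used above) in which two paragraphs carry the class `body`:

```python
from bs4 import BeautifulSoup

# Hypothetical sample document: two <p> elements carry the class "body"
sample_html = """
<p class="title"><b>Geeks</b></p>
<p class="body">first paragraph</p>
<p class="body note">second paragraph</p>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# find() stops at the first match
first = soup.find(class_="body")

# find_all() returns every element whose class list contains "body"
all_body = soup.find_all(class_="body")

# select() takes a CSS selector and also returns every match
selected = soup.select("p.body")

print(first.get_text())
print([p.get_text() for p in all_body])
print([p["class"] for p in selected])
```

The last line shows that Beautiful Soup stores `class` as a list of names, which is why an element with `class="body note"` still matches a search for `body`.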
Method #2: Finding all classes on the page at a given URL.
Approach:
- Import the modules.
- Send a GET request to the URL with requests.
- Pass the response content to BeautifulSoup().
- Iterate over all tags and collect their class names.
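These steps can also be split into a download step and a parsing step, which makes the parsing part easy to try without a network connection. The sketch below is one possible arrangement, not the article's program: `fetch_html` and `classes_in_html` are hypothetical helper names, the 10-second timeout is an arbitrary assumption, and the single loop over `find_all(True)` is a simplification of iterating tag by tag.

```python
from bs4 import BeautifulSoup
import requests


def fetch_html(url, timeout=10):
    """Download a page and return its raw bytes (hypothetical helper)."""
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of parsing an error page
    response.raise_for_status()
    return response.content


def classes_in_html(html):
    """Return the set of class attribute values found in an HTML document."""
    soup = BeautifulSoup(html, "html.parser")
    found = set()
    # find_all(True) yields every tag in the document
    for tag in soup.find_all(True):
        classes = tag.get("class")  # list of names, or None if absent
        if classes:
            found.add(" ".join(classes))
    return found


# Offline usage on a hypothetical snippet of markup
sample = '<div class="card big"><p class="body">text</p><span>no class</span></div>'
print(sorted(classes_in_html(sample)))
```

For a live page, the two helpers would be combined as `classes_in_html(fetch_html(URL))`.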
Code:
Python3
# Import modules
from bs4 import BeautifulSoup
import requests

# Website URL (set this to the page you want to scan)
URL = ""

# Set that collects the class values
class_list = set()

# Page content from the website URL
page = requests.get(URL)

# Parse the HTML content
soup = BeautifulSoup(page.content, 'html.parser')

# Get the names of all tags used in the page
tags = {tag.name for tag in soup.find_all()}

# Iterate over all tag names
for tag in tags:

    # Find all elements with this tag name
    for i in soup.find_all(tag):

        # If the element has a class attribute
        if i.has_attr("class"):

            # Skip empty class attributes
            if len(i['class']) != 0:
                class_list.add(" ".join(i['class']))

print(class_list)
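Output:
The program prints a set with one string per distinct class attribute value found on the page. The exact contents depend on the page at the given URL.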
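Because the class names of an element are joined with a space, an element with `class="card featured"` is recorded as the single string `"card featured"`. If individual class names are wanted instead, one option is to count each name separately. The sketch below assumes a hypothetical inline document (`sample_html`) so that it runs without a network request, and uses `collections.Counter` in place of the set:

```python
from collections import Counter

from bs4 import BeautifulSoup

# Hypothetical sample markup, used so the sketch runs offline
sample_html = """
<div class="card featured">
  <p class="card-text">one</p>
  <p class="card-text muted">two</p>
</div>
<div class="card">three</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

class_counts = Counter()
for tag in soup.find_all(True):
    # tag.get("class") is a list of names, or None when there is no class attribute
    class_counts.update(tag.get("class") or [])

print(sorted(class_counts.items()))
# [('card', 2), ('card-text', 2), ('featured', 1), ('muted', 1)]
```

Counting individual names shows which classes are used most often, at the cost of losing the information about which names appeared together on one element.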