Extract the HTML code of the given tag and its parent using BeautifulSoup

In this article, we will discuss how to extract the HTML code of the given tag and its parent using BeautifulSoup.

Modules Needed

First, we need to install all these modules on our computer.



pip install bs4
pip install lxml
pip install requests

Scraping A Sample Website




# importing the modules
from bs4 import BeautifulSoup
import requests
  
# URL to be scraped (assign the address of the target page to URL)
  
# getting the contents of the website and parsing them
webpage = requests.get(URL)
soup = BeautifulSoup(webpage.content, "lxml")
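A request can fail or return an error page, so it may help to check the response before parsing it. The following is a minimal sketch, not part of the original program: the helper names make_soup and fetch_soup, the 10-second timeout, and the use of Python's built-in "html.parser" instead of "lxml" are assumptions made for illustration.

```python
from bs4 import BeautifulSoup
import requests


def make_soup(markup):
    # Hypothetical helper: parse HTML (str or bytes) into a BeautifulSoup object.
    # "html.parser" ships with Python; "lxml" can be passed instead if installed.
    return BeautifulSoup(markup, "html.parser")


def fetch_soup(url, timeout=10):
    # Hypothetical helper: download a page and parse it.
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of parsing an error page.
    response.raise_for_status()
    return make_soup(response.content)


# Offline check that needs no network access
demo_soup = make_soup("<html><head><title>Demo</title></head><body></body></html>")
print(demo_soup.title)
```

With a real page, fetch_soup(URL) would take the place of the two lines that call requests.get and BeautifulSoup directly.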

To extract the HTML of the page's title, we can look up the h1 tag by its id, firstHeading.






# getting the h1 with id as firstHeading and printing it
title = soup.find("h1", attrs={"id": 'firstHeading'})
print(title)




# getting the text/content inside the h1 tag we
# parsed on the previous line
cont = title.get_text()
print(cont)
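To see the difference between a tag's HTML and its text without fetching a page, the same calls can be run on a small inline document. This is an illustrative sketch: the HTML string below is made up, and "html.parser" is used so that no extra parser is required.

```python
from bs4 import BeautifulSoup

# Made-up sample markup standing in for a downloaded page
html = '<html><body><h1 id="firstHeading">Machine <i>learning</i></h1></body></html>'
soup = BeautifulSoup(html, "html.parser")

title = soup.find("h1", attrs={"id": "firstHeading"})

# str(tag) gives the full HTML of the tag, including nested tags
title_html = str(title)
# get_text() gives only the text, with all tags stripped
title_text = title.get_text()

print(title_html)
print(title_text)
```

Printing a tag shows its HTML, while get_text() returns only the text nodes inside it.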

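Next, we extract the HTML of a tag's parent. Here we find the span with the id Machine_learning_approaches and read its .parent attribute, which gives the tag that directly encloses it.

# getting the HTML of the parent of the span
# with id Machine_learning_approaches
# (.parent is an attribute; calling .parent() would instead
# return a list of the parent's descendant tags)
parent = soup.find("span",
                   attrs={"id": 'Machine_learning_approaches'}).parent
print(parent)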
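The parent lookup can be tried offline in the same way. In this sketch, which uses a made-up HTML snippet and made-up ids, .parent returns the single enclosing tag, find_parent() searches upward for a matching ancestor, and calling a tag like a function is BeautifulSoup's shorthand for find_all(), which returns a list of descendant tags.

```python
from bs4 import BeautifulSoup

# Made-up sample markup; the ids are assumptions for illustration
html = (
    '<div id="content">'
    '<h2><span id="Approaches">Approaches</span></h2>'
    '<p>Some text</p>'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")

span = soup.find("span", attrs={"id": "Approaches"})

# .parent is the tag that directly encloses the span
parent = span.parent
print(parent)

# .parent can be chained to move further up the tree
grandparent = span.parent.parent
print(grandparent.name)

# find_parent() looks upward for the nearest ancestor with the given name
container = span.find_parent("div")
print(container["id"])

# Calling a tag is shorthand for find_all(): a list of its descendant tags
descendants = span.parent()
print(descendants)
```

If find() does not match anything it returns None, so in a real script it may be worth checking the result before reading .parent.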

Below is the complete program:




# importing the modules
from bs4 import BeautifulSoup 
import requests 
  
# URL to be scraped (assign the address of the target page to URL)
  
# getting the contents of the website and parsing them
webpage = requests.get(URL) 
soup = BeautifulSoup(webpage.content, "lxml")
  
# getting the h1 with id as firstHeading and printing it
title = soup.find("h1", attrs={"id": 'firstHeading'})
print(title)
  
# getting the text/content inside the h1 tag we 
# parsed on the previous line
cont = title.get_text()
print(cont)
  
# getting the HTML of the parent of the span
# with id Machine_learning_approaches
# (.parent is an attribute, not a method)
parent = soup.find("span",
                   attrs={"id": 'Machine_learning_approaches'}).parent
print(parent)
