Open In App

Retrieve children of the html tag using BeautifulSoup

Improve
Improve
Like Article
Like
Save
Share
Report

Prerequisites: Beautifulsoup

Beautifulsoup is a Python module that is used for web scraping. This article discusses how children tags of given HTML tags can be scraped and displayed.

Sample Website:https://www.geeksforgeeks.org/caching-page-tables/

For First Child:

Approach

  • Import module
  • Pass the URL
  • Request page
  • Display the first child using findChild() function

Syntax:

findChild()

Example:

Python3




from bs4 import BeautifulSoup
import requests
  
# sample web page
  
# call get method to request that page
page = requests.get(sample_web_page)
  
# with the help of beautifulSoup and html parser create soup
soup = BeautifulSoup(page.content, "html.parser")
  
child_soup = soup.find('p')
  
print("child :  ", child_soup.findChild())


Output:

child :   <a href=”https://www.geeksforgeeks.org/paging-in-operating-system/” rel=”noopener” target=”_blank”>Paging</a>

For all children:

For Retrieving children of the HTML tag we have to option either we use .children or .contents. The difference between children and contents is children do not take any memory it gives us an iterable list and the contents give the child tag, but it uses the memory. For large HTML files use children is a better option and for storing value need contents will be better.

Approach

  • Import module
  • Pass the website URL
  • Request page
  • Use either keyword to display the child tags

Using .children:

For Retrieve all the children .children will used.

Example:

Python3




from bs4 import BeautifulSoup
import requests
  
# sample web page
  
# call get method to request that page
page = requests.get(sample_web_page)
  
# with the help of beautifulSoup and html parser create soup
soup = BeautifulSoup(page.content, "html.parser")
child_soup = soup.find('p')
  
for i in child_soup.children:
    print("child :  ", i)


Output:

child :   <a href=”https://www.geeksforgeeks.org/paging-in-operating-system/” rel=”noopener” target=”_blank”>Paging</a>

child :    is a memory management scheme which allows physical address space of a process to be non-contiguous. The basic idea of paging is to break physical memory into fixed-size blocks called 

child :   <strong>frames</strong>

child :    and logical memory into blocks of same size called 

child :   <strong>pages</strong>

child :   . While executing the process the required pages of that process are loaded into available frames from their source that is disc or any backup storage device.

Using .contents

It will also return all the child tag and store them in memory.

Example

Python3




from bs4 import BeautifulSoup
import requests
  
# sample web page
  
# call get method to request that page
page = requests.get(sample_web_page)
  
# with the help of beautifulSoup and html parser create soup
soup = BeautifulSoup(page.content, "html.parser")
  
child_soup = soup.find('p')
  
print("child :  ", child_soup.contents)


Output

child :   [<a href=”https://www.geeksforgeeks.org/paging-in-operating-system/” rel=”noopener” target=”_blank”>Paging</a>, ‘ is a memory management scheme which allows physical address space of a process to be non-contiguous. The basic idea of paging is to break physical memory into fixed-size blocks called ‘, <strong>frames</strong>, ‘ and logical memory into blocks of same size called ‘, <strong>pages</strong>, ‘. While executing the process the required pages of that process are loaded into available frames from their source that is disc or any backup storage device.’]



Last Updated : 15 Mar, 2021
Like Article
Save Article
Previous
Next
Share your thoughts in the comments
Similar Reads