Retrieve children of the html tag using BeautifulSoup

Last Updated : 15 Mar, 2021

BeautifulSoup is a Python library used for web scraping. This article shows how to retrieve and display the child tags of a given HTML tag.

Sample Website: https://www.geeksforgeeks.org/caching-page-tables/

For First Child:

Approach

Import module
Pass the URL
Request page
Display the first child tag using the findChild() function (text nodes are skipped)

Syntax:

findChild()

Example:

Python3

from bs4 import BeautifulSoup 
import requests 
  
# sample web page 
sample_web_page = 'https://www.geeksforgeeks.org/caching-page-tables/'
  
# call get method to request that page 
page = requests.get(sample_web_page) 
  
# with the help of beautifulSoup and html parser create soup 
soup = BeautifulSoup(page.content, "html.parser") 
  
# locate the first <p> tag on the page
child_soup = soup.find('p')

# findChild() returns the first tag nested inside that <p>
print("child :  ", child_soup.findChild())

Output:

child : <a href="https://www.geeksforgeeks.org/paging-in-operating-system/" rel="noopener" target="_blank">Paging</a>
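In Beautiful Soup 4, findChild() is a legacy alias of find(), so the same lookup can be written with find(). The sketch below is a minimal illustration on a hypothetical inline HTML string (not the sample page), so it runs without a network request. It also shows the recursive=False argument, which limits a search to direct children.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML, used here instead of a live page.
html = '<p><a href="/paging"><b>Paging</b></a> is a scheme. <i>frames</i></p>'
soup = BeautifulSoup(html, "html.parser")
p_tag = soup.find("p")

# True matches a tag of any name, so this is the first tag inside <p>.
first_tag = p_tag.find(True)

# recursive=False restricts the search to direct children of <p>.
direct_italic = p_tag.find("i", recursive=False)

# <b> is nested inside <a>, so it is not a direct child of <p>.
direct_bold = p_tag.find("b", recursive=False)

print("first tag :", first_tag)
print("direct <i>:", direct_italic)
print("direct <b>:", direct_bold)
```

Without recursive=False, a named search such as p_tag.find("b") looks through all descendants, not only the direct children.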

For all children:

To retrieve all children of an HTML tag, there are two options: .children or .contents. Both expose the same child nodes, which include text nodes as well as tags. .contents is a list, so it supports indexing, slicing, and len(). .children is an iterator over those nodes and is meant to be consumed in a loop. Use .children when a single pass over the children is enough, and .contents when the children need to be stored or accessed by position.
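The relationship between .children and .contents can be checked directly. The following is a small sketch on a hypothetical inline HTML string (an assumption made for illustration, not the sample page) that compares the two attributes.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML, used here instead of a live page.
html = "<p><a href='/paging'>Paging</a> splits memory into <b>frames</b>.</p>"
soup = BeautifulSoup(html, "html.parser")
p_tag = soup.find("p")

contents = p_tag.contents          # a plain Python list of child nodes
children = list(p_tag.children)    # materialize the iterator to compare

print("contents is a list :", isinstance(contents, list))
print("children is a list :", isinstance(p_tag.children, list))
print("number of children :", len(contents))
print("same node objects  :", all(a is b for a, b in zip(contents, children)))
```

Both attributes yield the same node objects in the same order; only the container type differs.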

Approach

Import module
Pass the website URL
Request page
Use either attribute (.children or .contents) to display the child tags

Using .children:

To retrieve all the children one at a time, iterate over .children.

Example:

Python3

from bs4 import BeautifulSoup 
import requests 
  
# sample web page 
sample_web_page = 'https://www.geeksforgeeks.org/caching-page-tables/'
  
# call get method to request that page 
page = requests.get(sample_web_page) 
  
# with the help of beautifulSoup and html parser create soup 
soup = BeautifulSoup(page.content, "html.parser") 
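
# locate the first <p> tag on the page
child_soup = soup.find('p')

# .children yields each child node (tags and text) in document order
for i in child_soup.children:
    print("child :  ", i)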

Output:

child : <a href="https://www.geeksforgeeks.org/paging-in-operating-system/" rel="noopener" target="_blank">Paging</a>

child : is a memory management scheme which allows physical address space of a process to be non-contiguous. The basic idea of paging is to break physical memory into fixed-size blocks called

child : frames

child : and logical memory into blocks of same size called

child : pages

child : . While executing the process the required pages of that process are loaded into available frames from their source that is disc or any backup storage device.
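As the output shows, .children yields text nodes alongside tags. If only the child tags are wanted, one option is to filter by type. The sketch below assumes a hypothetical inline HTML string in place of the sample page.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical inline HTML, used here instead of a live page.
html = (
    "<p><a href='/paging'>Paging</a> splits memory into "
    "<b>frames</b> and <b>pages</b>.</p>"
)
soup = BeautifulSoup(html, "html.parser")
p_tag = soup.find("p")

# Keep only Tag objects; text nodes (NavigableString) are dropped.
tag_children = [child for child in p_tag.children if isinstance(child, Tag)]

for child in tag_children:
    print("child tag :", child.name, "->", child.get_text())
```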

Using .contents:

.contents also returns all the children of the tag, as a list that can be indexed and stored.

Example:

Python3

from bs4 import BeautifulSoup 
import requests 
  
# sample web page 
sample_web_page = 'https://www.geeksforgeeks.org/caching-page-tables/'
  
# call get method to request that page 
page = requests.get(sample_web_page) 
  
# with the help of beautifulSoup and html parser create soup 
soup = BeautifulSoup(page.content, "html.parser") 
  
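# locate the first <p> tag on the page
child_soup = soup.find('p')

# .contents is a list holding every child node (tags and text)
print("child :  ", child_soup.contents)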

Output:

child : [<a href="https://www.geeksforgeeks.org/paging-in-operating-system/" rel="noopener" target="_blank">Paging</a>, ' is a memory management scheme which allows physical address space of a process to be non-contiguous. The basic idea of paging is to break physical memory into fixed-size blocks called ', frames, ' and logical memory into blocks of same size called ', pages, '. While executing the process the required pages of that process are loaded into available frames from their source that is disc or any backup storage device.']
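Because .contents is a list, individual children can be picked out by position. The following minimal sketch, again on a hypothetical inline HTML string and not the sample page, shows indexing and slicing.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML, used here instead of a live page.
html = "<p><a href='/paging'>Paging</a> splits memory into <b>frames</b>.</p>"
soup = BeautifulSoup(html, "html.parser")
p_tag = soup.find("p")

contents = p_tag.contents

first_child = contents[0]     # the <a> tag
last_child = contents[-1]     # the trailing text node
middle = contents[1:3]        # a text node followed by the <b> tag

print("first  :", first_child)
print("last   :", repr(str(last_child)))
print("middle :", middle)
```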
