
BeautifulSoup – Search by text inside a tag

Prerequisites: BeautifulSoup

BeautifulSoup is a powerful Python module used for web scraping. This article discusses how to search for specific text inside a given tag.



INTRODUCTION:

BeautifulSoup is a Python library for parsing HTML and XML documents. It provides a simple and intuitive API for navigating, searching, and modifying the parse tree of an HTML or XML document. It is designed to make it easy to extract data from web pages, and can be used for web scraping, data mining, and other types of data extraction tasks. BeautifulSoup is not a parser itself: it sits on top of a parser such as Python's built-in html.parser, lxml, or html5lib. Of these, lxml is generally the fastest, but it has to be installed separately.

BeautifulSoup is often used in combination with other Python libraries such as requests or Selenium to automate the process of downloading and parsing web pages. It can handle malformed or incomplete HTML, which is common in the real world, and provides several methods for searching for tags based on their contents, such as find(), find_all(), and select(). It also has built-in support for searching for tags using regular expressions.
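As a rough illustration of the three search methods named above, the following sketch runs them against a small inline HTML string. The HTML snippet and the variable names are made up for this example and do not come from any real page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, invented for this illustration
html = """
<div id="intro">
  <p class="lead">Paging uses a <strong>page table</strong>.</p>
  <p>The <strong>PTBR</strong> points to it.</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

first_strong = soup.find("strong")           # first matching tag, or None
all_strong = soup.find_all("strong")         # list-like collection of all matches
lead_strong = soup.select("p.lead strong")   # matches for a CSS selector

print(first_strong)      # <strong>page table</strong>
print(len(all_strong))   # 2
print(lead_strong)       # [<strong>page table</strong>]
```

Using "html.parser" keeps the sketch free of extra dependencies; another installed parser could be passed instead.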



Overall, BeautifulSoup is a valuable tool for anyone working with HTML or XML data, providing a simple and intuitive API for parsing and manipulating data, and it’s widely used in web scraping and data mining tasks.

Approach

We will search for text inside a tag using two methods.

Method 1: Iterative

This method uses a for loop to compare the string of each candidate tag with the text being searched for.

Example




from bs4 import BeautifulSoup
import requests
 
# sample web page (the URL is elided in the source; set it before running)
sample_web_page = "..."
 
# call get method to request that page
page = requests.get(sample_web_page)
 
# create the soup from the response using Python's built-in HTML parser
soup = BeautifulSoup(page.content, "html.parser")
 
# collect every <strong> tag on the page
strong_tags = soup.find_all("strong")
 
text = "page table base register (PTBR)"
 
# print each tag whose string is exactly equal to the given text
for tag in strong_tags:
    if tag.string == text:
        print(tag)

Output

<strong>page table base register (PTBR)</strong>
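The same loop can be tried without a network request. The sketch below is a minimal offline variant that assumes a hypothetical inline HTML string in place of the downloaded page; only the search text is taken from the example above.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the downloaded page
html = """
<p>The <strong>page table base register (PTBR)</strong> holds the base address.</p>
<p>A <strong>translation lookaside buffer</strong> caches entries.</p>
"""

soup = BeautifulSoup(html, "html.parser")
text = "page table base register (PTBR)"

matches = []
for tag in soup.find_all("strong"):
    # tag.string is None when a tag has more than one child,
    # so such tags never compare equal to the search text
    if tag.string == text:
        matches.append(tag)

print(matches)   # [<strong>page table base register (PTBR)</strong>]
```

Collecting the matches in a list rather than printing inside the loop makes the result easier to reuse later in a script.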

Method 2: Using lambda

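This is a one-line alternative to the loop above: a lambda function passed to find_all() acts as the filter. Note that it checks whether the text occurs anywhere in the tag's text (using in) rather than requiring an exact match.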

Example




from bs4 import BeautifulSoup
import requests
 
# sample web page (the URL is elided in the source; set it before running)
sample_web_page = "..."
 
# call get method to request that page
page = requests.get(sample_web_page)
 
# create the soup from the response using Python's built-in HTML parser
soup = BeautifulSoup(page.content, "html.parser")
 
text = "CS Theory Course"
 
# keep every <strong> tag whose text contains the given text
gfg = soup.find_all(lambda tag: tag.name == "strong" and text in tag.text)
 
print(gfg)

Output

[<strong>CS Theory Course</strong>]
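Because the lambda uses a substring test, it can match more tags than an exact comparison would. The following sketch, run on a hypothetical inline HTML list invented for this purpose, contrasts the two filters.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, invented to show the difference between the two filters
html = """
<ul>
  <li><strong>CS Theory Course</strong></li>
  <li><strong>Advanced CS Theory Course Notes</strong></li>
  <li><em>CS Theory Course</em></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
text = "CS Theory Course"

# Substring test, as in the example above: both <strong> tags match
contains = soup.find_all(lambda tag: tag.name == "strong" and text in tag.text)

# Exact test: only the tag whose whole text equals the search text matches
exact = soup.find_all(
    lambda tag: tag.name == "strong" and tag.get_text(strip=True) == text
)

print(len(contains))   # 2
print(exact)           # [<strong>CS Theory Course</strong>]
```

The em tag is skipped by both filters because of the tag.name check, even though its text is identical.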

 

IMPORTANT POINTS:

Here are some important points to consider when using BeautifulSoup to search for text inside a tag:

BeautifulSoup provides several methods for searching for tags based on their contents, such as find(), find_all(), and select().
The find_all() method returns a list of all tags that match a given filter, while the find() method returns the first tag that matches the filter, or None if nothing matches.
You can use the string keyword argument (called text in releases of BeautifulSoup before 4.4.0) to search for tags whose string matches specific text.
You can also use regular expressions to search for text inside a tag.
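The last two points can be sketched as follows. This example assumes BeautifulSoup 4.4.0 or later, where the keyword argument is named string, and it uses a hypothetical inline HTML string rather than a real page.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical HTML, invented for this illustration
html = """
<p><strong>page table base register (PTBR)</strong></p>
<p><strong>Page table entry</strong></p>
<p><strong>CS Theory Course</strong></p>
"""

soup = BeautifulSoup(html, "html.parser")

# Exact match on the tag's string
exact = soup.find_all("strong", string="CS Theory Course")

# Regular expression: "page table" anywhere in the string, ignoring case
pattern = re.compile(r"page table", re.IGNORECASE)
by_regex = soup.find_all("strong", string=pattern)

# find() returns only the first match, or None when nothing matches
first = soup.find("strong", string=pattern)
missing = soup.find("strong", string="no such text")

print(exact)            # [<strong>CS Theory Course</strong>]
print(len(by_regex))    # 2
print(first)            # <strong>page table base register (PTBR)</strong>
print(missing)          # None
```

Passing a tag name together with string returns the tags themselves; the match is made against each tag's string, so tags containing nested markup may need the lambda approach instead.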

ADVANTAGES AND DISADVANTAGES:

Advantages of BeautifulSoup:

Disadvantages of BeautifulSoup:

