Get all HTML tags with BeautifulSoup
Last Updated: 25 Feb, 2021
Web scraping is the process of extracting information from HTML or XML content with automated programs called web scrapers. Beautiful Soup is a Python library used for this purpose. It parses the HTML content of a web page and lets you iterate over, search, and modify it. To provide these features it works with a parser that converts the content into a parse tree. With a parser you are comfortable with, it is fairly easy to work through web pages using BeautifulSoup.
To get all the HTML tags of a web page, first import BeautifulSoup, along with the requests library, which is used to make a GET request to the web page.
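As a minimal sketch of the iteration, searching and modification features mentioned above, the snippet below parses a small HTML string with the built-in "html.parser". The markup and the variable names are made up for illustration and are not part of any real page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
html = (
    "<html><head><title>Demo</title></head>"
    "<body><p class='intro'>Hello <b>world</b></p></body></html>"
)

# The parser turns the raw string into a parse tree.
soup = BeautifulSoup(html, "html.parser")

# Iteration: walk the direct children of the <html> tag.
child_names = [child.name for child in soup.html.children]

# Searching: find the first <p> tag and read the <title> text.
first_p = soup.find("p")
title_text = soup.title.string

# Modification: replace the text inside the <b> tag.
first_p.b.string = "reader"
paragraph_text = first_p.get_text()

print(child_names)     # ['head', 'body']
print(title_text)      # Demo
print(paragraph_text)  # Hello reader
```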
Step-by-step Approach:
Python3
from bs4 import BeautifulSoup
import requests
- After importing the libraries, assign the URL of the web page to a variable and make a GET request to fetch the raw HTML content:
Python3
url = "https://www.example.com/"  # placeholder; replace with the page you want to scrape
html_content = requests.get(url).text
- Now parse the HTML content:
Python3
soup = BeautifulSoup(html_content, "html.parser")
- Now, to get all the HTML tags of the web page, call find_all() with no arguments and read the .name attribute of each tag it returns:
Python3
[tag.name for tag in soup.find_all()]
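The steps above can also be tried without a network request. The sketch below runs the same find_all() loop on a small, made-up HTML string (sample_html is a hypothetical stand-in for a downloaded page), which makes the result easy to check by hand. It also shows one way to reduce the list to unique tag names.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for html_content from a real page.
sample_html = """
<html>
  <head><title>Sample</title></head>
  <body>
    <h1>Heading</h1>
    <p>First <a href="#">link</a></p>
    <p>Second</p>
  </body>
</html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# find_all() with no arguments returns every tag in document order.
all_tags = [tag.name for tag in soup.find_all()]

# A set removes duplicates; sorting gives a stable order for display.
unique_tags = sorted(set(all_tags))

print(all_tags)     # ['html', 'head', 'title', 'body', 'h1', 'p', 'a', 'p']
print(unique_tags)  # ['a', 'body', 'h1', 'head', 'html', 'p', 'title']
```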
Below is the complete program:
Python3
from bs4 import BeautifulSoup
import requests

url = "https://www.example.com/"  # placeholder; replace with the page you want to scrape
html_content = requests.get(url).text
soup = BeautifulSoup(html_content, "html.parser")
print([tag.name for tag in soup.find_all()])
Output (the tags listed depend on the page that is fetched; shown here one per line):
['html',
'head',
'meta',
'meta',
'meta',
'link',
'meta',
'meta',
'meta',
'meta',
'meta',
'script',
'script',
'link',
'title',
'link',
'link',
'script',
'script']
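Because the output repeats names such as 'meta' and 'script', it can be easier to read as a tally. The following is a rough sketch under a few assumptions: fetch_html and count_tags are hypothetical helper names, the 10-second timeout is an arbitrary choice, and sample_html is made-up markup so that the example runs without network access.

```python
from collections import Counter

import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page, raising an exception on network or HTTP errors."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # turn 4xx/5xx responses into exceptions
    return response.text


def count_tags(html):
    """Return a Counter mapping each tag name to its number of occurrences."""
    soup = BeautifulSoup(html, "html.parser")
    return Counter(tag.name for tag in soup.find_all())


# Hypothetical markup standing in for a downloaded page.
sample_html = (
    "<html><head>"
    "<meta charset='utf-8'><meta name='a' content='b'>"
    "<link rel='x' href='y'><title>T</title>"
    "</head></html>"
)

tag_counts = count_tags(sample_html)

# Print each tag name with its count, most frequent first.
for name, count in tag_counts.most_common():
    print(name, count)
```

To tally the tags of a live page instead, pass the result of fetch_html(url) to count_tags.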