How to Scrape Paragraphs using Python?

Prerequisite: Implementing Web Scraping in Python with BeautifulSoup

In this article, we will see how to extract all the paragraphs from a given HTML document or URL using Python.

Module Needed:

bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. It is not part of the Python standard library. The package is published on PyPI as beautifulsoup4 and is imported as bs4. To install it, run the following command in the terminal.

pip install beautifulsoup4

requests: Requests lets you send HTTP/1.1 requests with very little code. It is also not part of the Python standard library. To install it, run the following command in the terminal.

pip install requests
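
As a quick check that both installs succeeded, a short script along these lines can be run. It only imports the two modules and prints their version strings; the printed labels are illustrative.

```python
# Confirm that both third-party modules can be imported
import bs4
import requests

# Each module exposes its version as a string attribute
print("beautifulsoup4:", bs4.__version__)
print("requests:", requests.__version__)
```

If either import fails with ModuleNotFoundError, the corresponding package is not installed in the interpreter being used.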

Approach:

Import the module
Create an HTML document that contains ‘<p>’ tags
Pass the HTML document to the BeautifulSoup() constructor
Call find_all() with the ‘p’ tag name to get the paragraphs from the BeautifulSoup object
Get the text of each paragraph with get_text().

Code:

Python3

# import module
from bs4 import BeautifulSoup

# HTML document
html_doc = """
<html>
<head>
<title>Geeks</title>
</head>
<body>
<h2>paragraphs</h2>

<p>Welcome geeks.</p>

<p>Hello geeks.</p>

</body>
</html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

# traverse paragraphs from soup
for data in soup.find_all("p"):
    print(data.get_text())

Output:

Welcome geeks.
Hello geeks.
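
The loop above prints each paragraph as soon as it is found. When the text is needed for further processing, a minimal sketch like the following collects it into a list instead. The sample HTML and the helper name paragraph_texts are illustrative assumptions, not part of the original example. Whitespace is trimmed with str.strip() and empty paragraphs are skipped.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, used only for illustration
html_doc = """
<html>
<body>
<h2>paragraphs</h2>
<p>  Welcome geeks.  </p>
<p>Hello <b>geeks</b>.</p>
<p>   </p>
</body>
</html>
"""


def paragraph_texts(html):
    """Return the text of every non-empty <p> element as a list of strings."""
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    for tag in soup.find_all("p"):
        # get_text() also includes text from nested tags such as <b>
        text = tag.get_text().strip()
        if text:
            texts.append(text)
    return texts


print(paragraph_texts(html_doc))
# prints ['Welcome geeks.', 'Hello geeks.']
```

Returning a list rather than printing keeps the extraction step reusable, for example for counting paragraphs or writing them to a file.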

Now let's extract the paragraphs from a given URL.

Code:

Python3

# import modules
import requests
from bs4 import BeautifulSoup


# download the page and return its HTML as text
def getdata(url):
    r = requests.get(url)
    return r.text


htmldata = getdata("https://www.geeksforgeeks.org/")
soup = BeautifulSoup(htmldata, 'html.parser')

# print the text of every paragraph on the page
for data in soup.find_all("p"):
    print(data.get_text())

Output: the text of each <p> element on the page, one per line.
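
Since that output changes with the live page, it can help to separate downloading from parsing so the parsing step can be checked without a network connection. The sketch below is one possible arrangement, not the only one: the function names fetch_html and extract_paragraphs and the 10-second timeout are assumptions chosen for illustration.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its HTML, raising on HTTP errors."""
    # Without a timeout, requests.get() can wait indefinitely
    response = requests.get(url, timeout=timeout)
    # Raises requests.HTTPError for 4xx and 5xx status codes
    response.raise_for_status()
    return response.text


def extract_paragraphs(html):
    """Return the stripped text of each non-empty <p> element."""
    soup = BeautifulSoup(html, "html.parser")
    texts = (p.get_text().strip() for p in soup.find_all("p"))
    return [text for text in texts if text]
```

To scrape a live page, pass the result of fetch_html("https://www.geeksforgeeks.org/") to extract_paragraphs() and loop over the returned list. Connection problems and HTTP errors surface as requests.RequestException, which the caller can catch.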
