Prerequisites:
Scraping is a technique for retrieving useful data from a URL or an HTML file so that it can be used elsewhere. This article shows how to extract the paragraphs from a URL and save them to a text file.
Modules Needed
bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. It can be installed as follows:
pip install bs4
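As a quick illustration of what Beautiful Soup does, the sketch below parses a small HTML string and collects the text of its paragraphs. The HTML snippet is a made-up example, not taken from any real page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, used only for illustration.
html = (
    "<html><body><h1>Title</h1>"
    "<p>First paragraph.</p>"
    "<p>Second <b>paragraph</b>.</p>"
    "</body></html>"
)

# "html.parser" is the parser bundled with Python, so nothing extra is needed.
soup = BeautifulSoup(html, "html.parser")

# find_all("p") returns every <p> element; get_text() drops nested tags like <b>.
paragraphs = [p.get_text() for p in soup.find_all("p")]
print(paragraphs)  # ['First paragraph.', 'Second paragraph.']
```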
urllib: urllib is a package that collects several modules for working with URLs. It is part of the Python standard library, so it is already available in the environment and does not need to be installed with pip.
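To give a feel for the modules inside urllib, here is a minimal sketch that splits a URL into its parts with urllib.parse and builds a request object with urllib.request. The URL and the User-Agent string are placeholders chosen for this example, and no network traffic is generated.

```python
from urllib.parse import urlsplit
from urllib.request import Request

# Hypothetical URL, used only for illustration; nothing is downloaded here.
url = "https://example.com/articles/page.html?ref=home"

# urllib.parse breaks a URL into its components.
parts = urlsplit(url)
print(parts.scheme)  # https
print(parts.netloc)  # example.com
print(parts.path)    # /articles/page.html
print(parts.query)   # ref=home

# A Request object only describes what to fetch. It is sent when passed to
# urllib.request.urlopen(request), which is not called in this sketch.
request = Request(url, headers={"User-Agent": "paragraph-scraper-example"})
print(request.full_url)
```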
Approach:
- Create a text file to hold the downloaded HTML.
- In the program, import the required modules and pass the URL and the .txt file path. This saves a copy of that page's HTML code on your local machine.
- Make the request, passing in the URL.
- Open the saved file in read mode and read its contents.
- Pass the contents to the BeautifulSoup() constructor.
- Create another file for the output (or write/append to an existing file).
- Iterate over all the 'p' tags and write the text of each paragraph to the output file.
The implementation is given below:
Example:
Python3
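import urllib.request
from bs4 import BeautifulSoup

# Placeholder: the URL is missing from this listing, so substitute the page to scrape.
url = "https://example.com/"
html_path = "/home/gpt/PycharmProjects/pythonProject1/test/text_file.txt"

# Download the page and save its HTML to html_path.
urllib.request.urlretrieve(url, html_path)

# Read the saved HTML back and parse it.
with open(html_path, "r", encoding="utf-8") as file:
    soup = BeautifulSoup(file.read(), "html.parser")

# Write the text of each <p> tag to the output file, one paragraph per line.
with open("test1.txt", "w", encoding="utf-8") as f:
    for data in soup.find_all("p"):
        f.write(data.get_text() + "\n")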
Output:
Running the script writes the text of every 'p' tag on the page to test1.txt.
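As a variation, the intermediate HTML file can be skipped by parsing the response in memory. The sketch below is one possible way to do that, not the article's method: the function names extract_paragraphs and save_paragraphs are hypothetical, and the sample HTML is made up so the parsing step can be checked without a network connection.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def extract_paragraphs(html):
    """Return the text of every <p> element in an HTML string or bytes object."""
    soup = BeautifulSoup(html, "html.parser")
    # strip=True trims surrounding whitespace from each paragraph's text.
    return [p.get_text(strip=True) for p in soup.find_all("p")]


def save_paragraphs(url, out_path):
    """Download url and write one paragraph per line to out_path."""
    with urlopen(url) as response:
        html = response.read()  # raw bytes of the page
    paragraphs = extract_paragraphs(html)
    with open(out_path, "w", encoding="utf-8") as out_file:
        out_file.write("\n".join(paragraphs))
    return len(paragraphs)


# Offline check with a hypothetical snippet; no request is made here.
sample = "<div><p> One </p><span>skip</span><p>Two</p></div>"
print(extract_paragraphs(sample))  # ['One', 'Two']
```

To scrape a real page, call save_paragraphs with the page's URL and an output path of your choice, for example save_paragraphs(url, "test1.txt").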

Last Updated: 13 Jan, 2023