How to extract paragraphs from a website and save them as a text file?

  • Last Updated : 29 Dec, 2020

Prerequisites:

Web scraping is an essential technique for retrieving useful data from a URL or an HTML file so that it can be reused elsewhere. This article shows how to extract the paragraphs from a URL and save them as a text file.

Modules Needed

bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. It can be installed as follows:

pip install beautifulsoup4
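Once installed, Beautiful Soup parses an HTML string into a tree that can be searched by tag name. The minimal sketch below runs on a small, made-up HTML string and shows the two calls the article relies on, `find_all()` and `get_text()`:

```python
from bs4 import BeautifulSoup

# A small, made-up HTML document used only for illustration
html = """
<html>
  <body>
    <h1>Title</h1>
    <p>First paragraph.</p>
    <div><p>Second <b>paragraph</b>.</p></div>
  </body>
</html>
"""

# Parse with Python's built-in "html.parser" backend
soup = BeautifulSoup(html, "html.parser")

# find_all("p") returns every <p> tag, including nested ones;
# get_text() joins the text of a tag and all of its children
paragraphs = [p.get_text() for p in soup.find_all("p")]
print(paragraphs)  # ['First paragraph.', 'Second paragraph.']
```

Note that the `<h1>` text is ignored because only `<p>` tags are requested, and the `<b>` inside the second paragraph is flattened into plain text.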

urllib: urllib is a package that collects several modules for working with URLs. It is part of Python's standard library, so it does not need to be installed separately.
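Because `urllib.request` ships with Python, `urlretrieve()` can be tried right away. The sketch below assumes no network access, so it serves a small HTML file through a `file://` URL; the file names and page content are made up for illustration, and with a real site you would pass an `https://` URL instead:

```python
import tempfile
import urllib.request
from pathlib import Path

# Build a local "web page" so the example runs without network access
workdir = Path(tempfile.mkdtemp())
page = workdir / "page.html"
page.write_text("<html><body><p>Hello from urllib.</p></body></html>", encoding="utf-8")

url = page.as_uri()                  # a file:///... URL pointing at page.html
target = workdir / "text_file.txt"   # where the downloaded copy is saved

# urlretrieve(url, filename) writes the raw response bytes to `filename`
# and returns a (filename, headers) tuple
saved_path, headers = urllib.request.urlretrieve(url, target)

print(saved_path)
print(target.read_text(encoding="utf-8"))
```

The saved file is a byte-for-byte copy of the page, which is why it can later be read back and handed to Beautiful Soup.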

Approach:

  • Import the required modules (urllib.request and BeautifulSoup from bs4).
  • Pass the URL and a local .txt file path to urllib.request.urlretrieve(). This saves a copy of the page's HTML code on your machine.
  • Open the saved file for reading and read its contents.
  • Pass the contents to the BeautifulSoup() constructor together with a parser such as 'html.parser'.
  • Open another file in write mode (or open an existing file in append mode).
  • Iterate over all the ‘p’ tags returned by find_all() and write the text of each paragraph to the output file.
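The steps above can be sketched as a small reusable function. `save_paragraphs` is a hypothetical helper, not part of the article: it reads an HTML file that has already been saved locally, writes the text of each `<p>` tag on its own line, and takes a `mode` argument so the output either overwrites (`"w"`) or appends to (`"a"`) an existing file. The sample HTML is made up for illustration:

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup


def save_paragraphs(html_path, out_path, mode="w"):
    """Hypothetical helper: write the text of every <p> tag, one per line.

    mode="w" overwrites out_path; mode="a" appends to an existing file.
    Returns the number of paragraphs written.
    """
    # Read the saved HTML as bytes so Beautiful Soup can detect the encoding
    with open(html_path, "rb") as src:
        soup = BeautifulSoup(src.read(), "html.parser")

    count = 0
    with open(out_path, mode, encoding="utf-8") as dst:
        for p in soup.find_all("p"):
            dst.write(p.get_text() + "\n")
            count += 1
    return count


# Usage with a made-up HTML file standing in for a downloaded page
workdir = Path(tempfile.mkdtemp())
html_file = workdir / "text_file.txt"
html_file.write_text("<p>One</p><p>Two</p>", encoding="utf-8")
out_file = workdir / "test1.txt"

save_paragraphs(html_file, out_file)            # writes "One" and "Two"
save_paragraphs(html_file, out_file, mode="a")  # appends them a second time
print(out_file.read_text(encoding="utf-8"))
```

Writing a newline after each paragraph keeps them on separate lines; without it, consecutive paragraphs would run together in the text file.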

The implementation is given below:

Example:

Python3




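import urllib.request
from bs4 import BeautifulSoup

# here we pass the URL and the path where the downloaded HTML is saved;
# the URL below is only a placeholder, replace it with the page you want
url = "https://example.com/"
# a relative path is used here; an absolute path such as
# /home/gpt/PycharmProjects/pythonProject1/test/text_file.txt also works
html_path = "text_file.txt"
urllib.request.urlretrieve(url, html_path)

# open the saved copy in binary mode so Beautiful Soup can detect its encoding
with open(html_path, "rb") as file:
    contents = file.read()
soup = BeautifulSoup(contents, "html.parser")

# traverse paragraphs from soup and write each one on its own line
with open("test1.txt", "w", encoding="utf-8") as f:
    for data in soup.find_all("p"):
        text = data.get_text()
        f.write(text + "\n")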
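The version above saves the HTML to disk first and then reads it back. If the intermediate file is not needed, one possible variation (a sketch, not the article's method) reads the response from `urllib.request.urlopen()` straight into Beautiful Soup and collapses whitespace so that empty paragraphs can be skipped. The helper name `scrape_paragraphs` and the sample page are hypothetical, and a `file://` URL stands in for a real site so the example runs offline:

```python
import tempfile
import urllib.request
from pathlib import Path

from bs4 import BeautifulSoup


def scrape_paragraphs(url, out_path):
    """Hypothetical helper: fetch url, keep non-empty <p> texts, save them."""
    # urlopen returns a file-like response; read() gives the raw bytes
    with urllib.request.urlopen(url) as response:
        soup = BeautifulSoup(response.read(), "html.parser")

    # split()/join collapses runs of spaces and newlines into single spaces
    paragraphs = [" ".join(p.get_text().split()) for p in soup.find_all("p")]
    paragraphs = [text for text in paragraphs if text]  # drop empty ones

    with open(out_path, "w", encoding="utf-8") as f:
        for text in paragraphs:
            f.write(text + "\n")
    return paragraphs


# Usage with a made-up local page served through a file:// URL
workdir = Path(tempfile.mkdtemp())
page = workdir / "page.html"
page.write_text("<p>  First\n paragraph </p><p>   </p><p>Second</p>", encoding="utf-8")
result = scrape_paragraphs(page.as_uri(), workdir / "test1.txt")
print(result)  # ['First paragraph', 'Second']
```

Collapsing whitespace with `split()` is used instead of `get_text(strip=True)`, because the latter strips each text fragment separately and can glue words together across inline tags such as `<b>`.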
