Web scraping is a technique for retrieving useful data from a URL or an HTML file so that it can be reused elsewhere. This article shows how to extract the paragraphs from a URL and save them as a text file.
bs4: Beautiful Soup (bs4) is a Python library for extracting data from HTML and XML files. It can be installed as follows:
pip install bs4
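As a quick illustration of what the library does, the sketch below parses a small, made-up HTML string and collects the text of its 'p' tags. The HTML snippet and variable names are assumptions chosen for the example, not part of the original program.

```python
from bs4 import BeautifulSoup

# A small, made-up HTML snippet used only for illustration.
html = (
    "<html><body><h1>Title</h1>"
    "<p>First paragraph.</p><p>Second paragraph.</p>"
    "</body></html>"
)

# "html.parser" is Python's built-in parser, so no extra dependency is needed.
soup = BeautifulSoup(html, "html.parser")

# find_all("p") returns every <p> tag; get_text() returns its text without markup.
paragraphs = [p.get_text() for p in soup.find_all("p")]
print(paragraphs)
```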
urllib: urllib is a package that collects several modules for working with URLs. It is part of the Python 3 standard library, so it is already available in the environment and does not need to be installed with pip.
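To give a feel for the modules urllib bundles, here is a small sketch using urllib.parse to split a URL into its components. It imports directly from the standard library and makes no network request. The example.com address is a placeholder assumption, not a URL from this article.

```python
from urllib.parse import urlparse

# example.com is a placeholder domain used only for illustration.
parts = urlparse("https://www.example.com/articles/page.html?lang=en")

print(parts.scheme)  # https
print(parts.netloc)  # www.example.com
print(parts.path)    # /articles/page.html
print(parts.query)   # lang=en
```

The urllib.request module from the same package is the one that actually fetches URLs.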
- Create a text file.
- In the program, import the required modules, then pass the URL and the path of the .txt file. This saves a copy of that URL's HTML code on your local machine.
- Make the request with urllib.request, passing it the URL.
- Open the saved file in read mode, passing any required parameters.
- Pass the file contents to the BeautifulSoup() constructor.
- Create another file (or write or append to an existing one).
- Iterate over all the ‘p’ tags and write each paragraph to the text file.
The implementation is given below:
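A minimal sketch of the steps above, under stated assumptions: the function name save_paragraphs, its parameter names, and the choice of urllib.request.urlretrieve for the download step are illustrative and may differ from the original program. The saved copy is opened in binary read mode so that Beautiful Soup can detect the page encoding itself.

```python
import urllib.request

from bs4 import BeautifulSoup


def save_paragraphs(url, source_path, output_path):
    """Copy the page at `url` to `source_path`, then write the text of
    every <p> tag to `output_path`, one paragraph per line.

    The function and parameter names are hypothetical, chosen for this sketch.
    """
    # Save a local copy of the page's HTML code.
    urllib.request.urlretrieve(url, source_path)

    # Open the saved copy in (binary) read mode and hand it to BeautifulSoup.
    with open(source_path, "rb") as source_file:
        soup = BeautifulSoup(source_file, "html.parser")

    # Collect the text of every <p> tag, trimming surrounding whitespace.
    paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]

    # "w" creates or overwrites the output file; use "a" to append instead.
    with open(output_path, "w", encoding="utf-8") as output_file:
        for paragraph in paragraphs:
            output_file.write(paragraph + "\n")

    return paragraphs
```

For example, calling save_paragraphs("https://www.example.com/", "page_source.txt", "paragraphs.txt") would store the page's HTML in page_source.txt and its paragraphs in paragraphs.txt. Both file names and the URL are placeholders.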