Extract all the URLs from the webpage Using Python

  • Last Updated : 07 Jan, 2021

Web scraping is a useful skill for collecting data from websites. In this article, we write Python scripts that extract all the URLs from a web page and, optionally, save them to a file.

Modules Needed:

  • bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. It is not part of the Python standard library. To install it, run the following command in the terminal.
pip install bs4
  • requests: Requests lets you send HTTP/1.1 requests with very little code. It is also not part of the Python standard library. To install it, run the following command in the terminal.
pip install requests

Approach:

  • Import the modules
  • Send a GET request to the URL with requests.get()
  • Pass the text of the response to the BeautifulSoup() constructor
  • Use find_all('a') to collect every ‘a’ tag and read its ‘href’ attribute

Example 1:

Python3




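import requests
from bs4 import BeautifulSoup

# Placeholder address; replace it with the page you want to scrape
url = "https://example.com/"

reqs = requests.get(url)
soup = BeautifulSoup(reqs.text, 'html.parser')

# Collect and print the href attribute of every <a> tag
urls = []
for link in soup.find_all('a'):
    href = link.get('href')  # None when the tag has no href
    urls.append(href)
    print(href)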
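
The href values returned by link.get('href') are often relative paths, may repeat, and are None for ‘a’ tags without an href. The following is a minimal sketch of one way to clean them up, using urljoin from the standard library. The helper name normalize_links, the sample values, and the base address are hypothetical and only serve as illustration.

```python
from urllib.parse import urljoin


def normalize_links(hrefs, base_url):
    """Turn raw href values into a de-duplicated list of absolute URLs."""
    seen = set()
    result = []
    for href in hrefs:
        # link.get('href') returns None for <a> tags without an href
        if not href:
            continue
        # Resolve relative paths against the page address
        absolute = urljoin(base_url, href)
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


# Hypothetical sample values, as they might come from link.get('href')
sample = ["/about", "contact.html", None, "https://example.org/x", "/about"]
print(normalize_links(sample, "https://example.com/docs/"))
# ['https://example.com/about', 'https://example.com/docs/contact.html', 'https://example.org/x']
```

In a scraper, base_url would normally be the same address that was passed to requests.get().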

Example 2:

Extracting the URLs and saving them to a file, one URL per line.

Python3




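import requests
from bs4 import BeautifulSoup

# Placeholder address; replace it with the page you want to scrape
url = "https://example.com/"

grab = requests.get(url)
soup = BeautifulSoup(grab.text, 'html.parser')

# Open the output file in write mode; it is closed automatically
with open("test1.txt", "w") as f:
    # Visit every <a> tag in the parsed page
    for link in soup.find_all("a"):
        data = link.get('href')
        # Skip tags without an href, since write() cannot take None
        if data:
            f.write(data)
            f.write("\n")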
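
If a real CSV file is wanted rather than a plain text file, the csv module from the standard library can handle quoting and separators. The sketch below is one possible approach, not the only one: the helper name write_links_csv, the two-column layout (link text and URL), and the sample rows are assumptions made for illustration.

```python
import csv
import io


def write_links_csv(rows, fileobj):
    """Write (text, href) pairs as CSV rows below a header line."""
    writer = csv.writer(fileobj)
    writer.writerow(["text", "url"])
    for text, href in rows:
        # Skip anchors that have no href attribute
        if href:
            writer.writerow([text, href])


# Hypothetical sample rows; in the scraper they could be built as
# (link.get_text(strip=True), link.get('href')) for each <a> tag.
rows = [("Home", "/"), ("Docs, guides", "/docs"), ("No link", None)]

# An in-memory buffer stands in for a file so the result can be printed
buffer = io.StringIO()
write_links_csv(rows, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with open("links.csv", "w", newline="", encoding="utf-8"), where links.csv is a placeholder name. The newline="" argument is what the csv module documentation recommends for files handed to csv.writer.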
