Web scraping is a technique for fetching data from websites. Many websites do not let users save their data for personal use, and manually copy-pasting it is both tedious and time-consuming. Web scraping automates the data-extraction process. In this article, we will discuss how to download all images from a web page using Python.
Modules Needed
- bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. This module does not come built-in with Python.
- requests: Requests allows you to send HTTP/1.1 requests extremely easily. This module also does not come built-in with Python.
- os: The OS module in Python provides functions for interacting with the operating system. It comes under Python's standard utility modules and provides a portable way of using operating-system-dependent functionality.
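The two third-party modules can be installed with pip (os ships with Python's standard library):

```shell
pip install bs4 requests
```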
Approach
- Import modules.
- Get the HTML code of the page.
- Get the list of img tags from the HTML code using the findAll method in Beautiful Soup.
images = soup.findAll('img')
- Create a separate folder for the downloaded images using the mkdir method in os.
os.mkdir(folder_name)
- Iterate over all the images and get the source URL of each image.
- After getting the source URL, the last step is to download the image.
- Fetch the content of the image.
r = requests.get(image_link).content
- Download the image using file handling.
# Enter the file name with an extension like jpg, png, etc.
with open("File Name", "wb+") as f:
    f.write(r)
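One detail the steps above gloss over: the source URL taken from an img tag is often a relative path, so it may need to be joined with the page URL before it can be fetched. A minimal sketch using urllib.parse.urljoin (the URLs here are illustrative, not from the original article):

```python
from urllib.parse import urljoin

# hypothetical page URL and a relative src pulled from an img tag
page_url = "https://example.com/gallery/index.html"
image_link = "/static/cat.jpg"

# urljoin resolves the relative path against the page URL
full_url = urljoin(page_url, image_link)
print(full_url)  # https://example.com/static/cat.jpg
```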
Program:
from bs4 import BeautifulSoup
import requests
import os


# CREATE FOLDER
def folder_create(images):
    try:
        folder_name = input("Enter Folder Name:- ")
        # folder creation
        os.mkdir(folder_name)
    # if a folder already exists with that name, ask for another name
    except FileExistsError:
        print("Folder exists with that name!")
        folder_create(images)
        return
    # image downloading starts
    download_images(images, folder_name)


# DOWNLOAD ALL IMAGES FROM THAT URL
def download_images(images, folder_name):
    # initial count is zero
    count = 0
    # print the total number of images found on the page
    print(f"Total {len(images)} Image Found!")
    # proceed only if at least one image was found
    if len(images) != 0:
        for i, image in enumerate(images):
            # From the image tag, fetch the image source URL.
            # Attributes are tried in this order:
            # 1. data-srcset
            # 2. data-src
            # 3. data-fallback-src
            # 4. src
            # Here we use exception handling:
            try:
                # in the image tag, search for "data-srcset"
                image_link = image["data-srcset"]
            except KeyError:
                try:
                    # then search for "data-src"
                    image_link = image["data-src"]
                except KeyError:
                    try:
                        # then search for "data-fallback-src"
                        image_link = image["data-fallback-src"]
                    except KeyError:
                        try:
                            # finally, search for "src"
                            image_link = image["src"]
                        # if no source URL is found, skip this tag
                        except KeyError:
                            continue
            # After getting the image source URL,
            # try to get the content of the image
            try:
                r = requests.get(image_link).content
                try:
                    # if the content decodes as UTF-8 text, it is
                    # probably an error page rather than an image
                    r = str(r, 'utf-8')
                except UnicodeDecodeError:
                    # binary content: write the image to disk
                    with open(f"{folder_name}/images{i + 1}.jpg", "wb+") as f:
                        f.write(r)
                    # count the downloaded image
                    count += 1
            except requests.exceptions.RequestException:
                pass
    # it is possible that not all images were downloaded
    if count == len(images):
        print("All Images Downloaded!")
    else:
        print(f"Total {count} Images Downloaded Out of {len(images)}")


# MAIN FUNCTION
def main(url):
    # content of the URL
    r = requests.get(url)
    # parse the HTML code
    soup = BeautifulSoup(r.text, 'html.parser')
    # find all images on the page
    images = soup.findAll('img')
    # call the folder-creation function
    folder_create(images)


# take the URL as input and call the main function
url = input("Enter URL:- ")
main(url)
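The program above saves every file with a .jpg extension, even if the source was a PNG or GIF. A small helper (hypothetical, not part of the original program) could instead guess the extension from the image URL's path, falling back to .jpg when none is present:

```python
import os
from urllib.parse import urlparse

def guess_extension(image_url, default=".jpg"):
    # take only the path part of the URL (drops any ?query string)
    path = urlparse(image_url).path
    # split off the extension, e.g. "/static/logo.png" -> ".png"
    ext = os.path.splitext(path)[1]
    return ext if ext else default

print(guess_extension("https://example.com/static/logo.png"))  # .png
print(guess_extension("https://example.com/img?id=42"))        # .jpg
```

This extension could then be used in place of the hard-coded ".jpg" when building the file name passed to open().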
Output: