Download XKCD Comics using Python

Last Updated : 02 Aug, 2022

In this article, we are going to learn how to download XKCD comics using Python.

XKCD is a webcomic that spans many genres, including sarcasm, mathematics, language, and programming topics such as Python. The website hosts many interesting comics, and sometimes a user wants to save a comic image to a local device. Doing so manually is tedious, because the user has to visit each comic's page on the website “https://xkcd.com/”. To make this easier, we are going to create a Python program that downloads a comic's image when given its page number.

Required Modules

To download XKCD comic pages using Python, we need to install the beautifulsoup4 and requests modules. To do so, run the following commands in the command prompt.

pip install beautifulsoup4
pip install requests

Requests Module

The requests module is used to send HTTP requests to a specified URL. Whether you are doing web scraping or working with REST APIs, this module is worth learning.

Beautifulsoup4 Module

The beautifulsoup4 module is used to scrape information from web pages. It parses HTML, including poorly formed HTML, into a tree structure that is easy to traverse and search.
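As a small illustration of how BeautifulSoup makes markup searchable, the sketch below parses a hand-written HTML snippet and looks up an image with the CSS selector '#comic img'. The snippet is not the real XKCD page, and the names SAMPLE_HTML and find_comic_src are assumptions made up for this example.

```python
import bs4

# Hand-written HTML that imitates the structure this article relies on.
# It is an assumption for illustration, not a copy of the real page.
SAMPLE_HTML = """
<html><body>
  <div id="comic">
    <img src="//imgs.example.com/comics/sample.png" alt="Sample">
  </div>
</body></html>
"""


def find_comic_src(html):
    """Return the src of the first <img> inside #comic, or None if absent."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    images = soup.select("#comic img")  # always a list, possibly empty
    if not images:
        return None
    return images[0].get("src")


print(find_comic_src(SAMPLE_HTML))
print(find_comic_src("<p>No comic here</p>"))
```

The second call returns None because select() gives back an empty list when nothing matches. Checking for that case avoids an IndexError when a page has no image inside the 'comic' element.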

Stepwise implementation:

Step 1: Import all the required libraries and modules.

Python3

# Importing required modules 
import requests as req 
import os, bs4

Step 2: Store the URL of the XKCD website from which we will download the comic page. Then use the os.makedirs() method to create a local folder for storing the images. Passing exist_ok=True means that if the folder already exists, no error is raised and the images are stored in the existing folder.

Python3

# Storing website URL 
url = 'https://xkcd.com/'
# Make Directory to store image 
os.makedirs('xkcd', exist_ok=True) 
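To see what exist_ok changes, here is a minimal sketch that runs inside a temporary directory, so nothing is left on disk afterwards. The variable names are chosen only for this example.

```python
import os
import tempfile

# Work inside a throwaway directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as base:
    folder = os.path.join(base, "xkcd")

    os.makedirs(folder, exist_ok=True)  # creates the folder
    os.makedirs(folder, exist_ok=True)  # folder already exists: no error

    try:
        os.makedirs(folder)             # exist_ok defaults to False
        raised = False
    except FileExistsError:
        raised = True

    folder_was_created = os.path.isdir(folder)

print(folder_was_created, raised)
```

Without exist_ok=True, running the downloader a second time would stop with FileExistsError before any comic is fetched.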

Step 3: Take the comic number as input from the user and append it to the end of the URL declared in Step 2. Fetch that page using requests.get() and store the response in the variable ‘res’. Parse the HTML of the page with bs4.BeautifulSoup() and store the result in the variable ‘soup’. Then use the soup.select() method to find the image element, and read the URL of the image we want to download from its ‘src’ attribute. Finally, make that URL complete by prepending ‘http:’ so that it can be used in the next step. We can also see the image URL by inspecting the element in the web browser, as shown in the image below.

Python3

n = input("Input the comic number ")
# Append the user comic number in the url
url += n
print('Downloading image from %s...' % url)
# Request the url from the web
res = req.get(url)
# Stop with an error if the page could not be fetched
res.raise_for_status()

# Parse the HTML page returned for the url
# (naming the parser explicitly avoids a bs4 warning)
soup = bs4.BeautifulSoup(res.text, 'html.parser')

# Find the image tag inside the element with id "comic"
comicElem = soup.select('#comic img')

# Get the source of the image and make it a complete url
comicUrl = 'http:' + comicElem[0].get('src')
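The URL handling in this step can also be made a little more defensive. The following is a sketch using only the standard library: it rejects input that is not a positive whole number, and it resolves the image ‘src’ against the page URL with urljoin() instead of prepending a scheme by hand. The helper names build_comic_url and absolute_image_url, and the sample ‘src’ value, are assumptions made for illustration.

```python
from urllib.parse import urljoin

BASE_URL = "https://xkcd.com/"


def build_comic_url(raw_number):
    """Validate the user's input and return the comic page URL."""
    number = raw_number.strip()
    # Accept plain ASCII digits only, and no comic number below 1.
    if not (number.isascii() and number.isdigit()) or int(number) < 1:
        raise ValueError("Comic number must be a positive integer: %r" % raw_number)
    return "%s%d/" % (BASE_URL, int(number))


def absolute_image_url(page_url, src):
    """Resolve an image src (possibly starting with '//') against the page URL."""
    return urljoin(page_url, src)


page = build_comic_url(" 353 ")
print(page)
# The src below is an example value, assumed for this sketch.
print(absolute_image_url(page, "//imgs.xkcd.com/comics/python.png"))
```

Because urljoin() takes the scheme from the page URL, a ‘src’ that starts with ‘//’ ends up with ‘https:’ here, and a ‘src’ that is already a full URL is returned unchanged.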

Step 4: Request the image from the URL built in the previous step and save it in the folder created earlier. The file is opened in binary write mode using Python's file handling, and methods of the os module are used to build its path.

Python3

# Request the image url from the web
res = req.get(comicUrl)
# Stop with an error if the image could not be fetched
res.raise_for_status()

# Save the file in the directory
# 'wb' means the file is opened for writing in binary mode
imageFile = open(os.path.join('xkcd', os.path.basename(comicUrl)), 'wb')
# Read the image in chunks of up to 100000 bytes instead of one byte at a time
for chunk in res.iter_content(100000):
    # Writing the binary image file
    imageFile.write(chunk)
# Closing the binary image file
imageFile.close()
print('Successfully downloaded')
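The saving logic can also be wrapped in a small function that uses a with block, so the file is closed even if writing fails part-way. This is a sketch under stated assumptions: save_response is a hypothetical helper, and FakeResponse is a stand-in written only so the example runs without a network connection. It mimics the two methods of a requests response used here, raise_for_status() and iter_content().

```python
import os
import tempfile


def save_response(response, path, chunk_size=8192):
    """Write the body of a requests-style response to path in binary mode."""
    response.raise_for_status()           # stop early on 4xx/5xx replies
    with open(path, "wb") as image_file:  # closed automatically, even on error
        for chunk in response.iter_content(chunk_size):
            image_file.write(chunk)
    return path


class FakeResponse:
    """Stand-in for a requests response so the sketch runs offline."""

    def __init__(self, body):
        self._body = body

    def raise_for_status(self):
        pass  # pretend the request succeeded

    def iter_content(self, chunk_size):
        # Yield the body in pieces of at most chunk_size bytes.
        for start in range(0, len(self._body), chunk_size):
            yield self._body[start:start + chunk_size]


with tempfile.TemporaryDirectory() as folder:
    target = save_response(FakeResponse(b"fake image bytes"),
                           os.path.join(folder, "comic.png"), chunk_size=4)
    with open(target, "rb") as saved:
        saved_bytes = saved.read()

print(saved_bytes)
```

With the real library, the same function could be called as save_response(req.get(comicUrl, stream=True), os.path.join('xkcd', os.path.basename(comicUrl))). Passing stream=True lets requests fetch the body piece by piece rather than loading it all into memory first.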

Final Code:

Python3

# Importing required modules
import requests as req
import os, bs4

# Storing website URL
url = 'https://xkcd.com/'
# Make Directory to store image
os.makedirs('xkcd', exist_ok=True)
# exist_ok prevents the program from raising an exception if the folder already exists

n = input("Input the comic number ")
# Append the user comic number in the url
url += n
print('Downloading image from %s...' % url)
# Request the url from the web
res = req.get(url)
# Stop with an error if the page could not be fetched
res.raise_for_status()

# Parse the HTML page returned for the url
# (naming the parser explicitly avoids a bs4 warning)
soup = bs4.BeautifulSoup(res.text, 'html.parser')

# Find the image tag inside the element with id "comic"
comicElem = soup.select('#comic img')

# Get the source of the image and make it a complete url
comicUrl = 'http:' + comicElem[0].get('src')

# Request the image url from the web
res = req.get(comicUrl)
# Stop with an error if the image could not be fetched
res.raise_for_status()

# Save the file in the directory
# 'wb' means the file is opened for writing in binary mode
imageFile = open(os.path.join('xkcd', os.path.basename(comicUrl)), 'wb')
# Read the image in chunks of up to 100000 bytes instead of one byte at a time
for chunk in res.iter_content(100000):
    # Writing the binary image file
    imageFile.write(chunk)
# Closing the binary image file
imageFile.close()
print('Successfully downloaded')