Download XKCD Comics using Python
Last Updated :
02 Aug, 2022
In this article, we will learn how to download XKCD comics using Python.
XKCD is a webcomic that covers a variety of topics, including sarcasm, mathematics, language, Python, and many more. The website hosts many interesting comics, and sometimes a user wants to save a comic image on a local device. Doing so manually is tedious, because the user has to visit each comic's page on the website “https://xkcd.com/” and save the image by hand. To make this easier, we are going to create a Python program that downloads a comic image when the user enters its page number.
Required Modules
To download XKCD comic pages using Python, we need to install the beautifulsoup4 and requests modules. To do so, run the following commands in the command prompt.
pip install beautifulsoup4
pip install requests
Requests Module
The requests module is used to send HTTP requests to a specified URL. Whether you are doing web scraping or working with REST APIs, it is a useful module to learn.
Beautifulsoup4 Module
The beautifulsoup4 module is used to extract information from web pages. It parses HTML, including poorly formed markup, into a tree structure that is easy to traverse and search.
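As a minimal sketch of how Beautiful Soup exposes a parsed page, the snippet below parses a small hand-written HTML string and looks up an image tag with a CSS selector. The markup, host name, and file name are assumptions made up for illustration; they are not copied from the XKCD site.

```python
import bs4

# Hand-written sample markup, assumed for illustration only.
SAMPLE_HTML = """
<html><body>
  <div id="comic">
    <img src="//imgs.example.com/comics/sample.png" alt="Sample comic">
  </div>
</body></html>
"""

# Use Python's built-in HTML parser so no extra parser package is needed.
soup = bs4.BeautifulSoup(SAMPLE_HTML, "html.parser")

# select() takes a CSS selector and returns a list of matching tags.
images = soup.select("#comic img")

# Attributes of a tag are read with get().
first_src = images[0].get("src")
first_alt = images[0].get("alt")
print(first_src, first_alt)
```

The same `select()` call is what the steps below rely on to locate the comic image.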
Stepwise implementation:
Step 1: Import all the required libraries and modules.
Python3
import requests as req
import os
import bs4
Step 2: Store the URL of the XKCD website from which we will download the comic page. Then create a folder for the images using the os.makedirs() method. Passing exist_ok=True means no error is raised if the folder already exists, so the images are simply stored in the existing folder.
Python3
url = 'https://xkcd.com/'  # base URL; the comic number is appended later
os.makedirs('xkcd', exist_ok=True)  # create the folder if it is missing
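Since the comic number typed by the user is appended to this URL, it can help to check the input first. The following is a minimal sketch under that assumption; the helper name `build_comic_url` and the constant `BASE_URL` are hypothetical and not part of the original program.

```python
BASE_URL = "https://xkcd.com/"


def build_comic_url(number_text, base_url=BASE_URL):
    """Return the page URL for a comic number typed by the user."""
    cleaned = number_text.strip()
    # Reject anything that is not a positive whole number.
    if not cleaned.isdecimal() or int(cleaned) < 1:
        raise ValueError("comic number must be a positive integer: %r" % number_text)
    # int() drops leading zeros, so "007" and "7" give the same URL.
    return base_url + str(int(cleaned)) + "/"


print(build_comic_url(" 353 "))
```

A call such as `build_comic_url(input("Input the comic number "))` could then take the place of plain string concatenation.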
Step 3: Take the comic number as input from the user and append it to the end of the URL declared in step 2. Fetch that page with requests.get() and store the response in the variable ‘res’. Parse the HTML of the page into the variable ‘soup’ using bs4.BeautifulSoup(), then use the soup.select() method to find the image element that we want to download. Finally, complete the image URL by prefixing it with ‘http:’, because the src attribute omits the scheme, so that it can be used in the next step. The image URL can also be found by inspecting the comic element in a web browser.
Python3
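n = input("Input the comic number ")
url += n
print('Downloading image from %s...' % url)
res = req.get(url)
# Name the parser explicitly so Beautiful Soup does not have to guess one
soup = bs4.BeautifulSoup(res.text, 'html.parser')
# select() returns a list of the elements matching the CSS selector
comicElem = soup.select('#comic img')
comicUrl = 'http:' + comicElem[0].get('src')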
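The step above indexes the first match directly, which raises an IndexError when a page has no image inside the comic element. A slightly more defensive variant might look like the sketch below. The function name `extract_comic_image_url` is hypothetical, and using `urljoin` from the standard library in place of the manual ‘http:’ prefix is a substitution made here, not the article's method.

```python
from urllib.parse import urljoin

import bs4


def extract_comic_image_url(page_html, page_url):
    """Return the absolute URL of the first image inside #comic, or None."""
    soup = bs4.BeautifulSoup(page_html, "html.parser")
    matches = soup.select("#comic img")
    if not matches:
        return None  # no image element found on this page
    src = matches[0].get("src")
    if not src:
        return None  # the tag exists but has no src attribute
    # urljoin resolves scheme-less ("//host/path") and relative
    # sources against the URL of the page they came from.
    return urljoin(page_url, src)


# Made-up markup for illustration; not the real page source.
sample = '<div id="comic"><img src="//imgs.example.com/comics/sample.png"></div>'
print(extract_comic_image_url(sample, "https://xkcd.com/1/"))
```

Returning None instead of raising lets the caller decide whether to skip the comic or stop.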
Step 4: Request the image from the URL built in the previous step. Open a file in binary write mode inside the ‘xkcd’ folder, named after the last part of the image URL, and write the downloaded data to it using Python file handling and the methods of the os module.
Python3
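res = req.get(comicUrl)
# Open the file in binary write mode; the with block closes it automatically
with open(os.path.join('xkcd', os.path.basename(comicUrl)), 'wb') as imageFile:
    # Write the image in chunks of up to 100000 bytes
    for chunk in res.iter_content(100000):
        imageFile.write(chunk)
print('Successfully downloaded')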
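To illustrate the file-handling part of this step on its own, here is a small sketch that writes an iterable of byte chunks to a file named after the last part of a URL. The helper name `save_chunks`, the example URL, and the sample bytes are assumptions for illustration; a temporary folder stands in for the ‘xkcd’ folder so that the demo leaves nothing behind.

```python
import os
import tempfile


def save_chunks(chunks, image_url, folder="xkcd"):
    """Write byte chunks to folder/<file name taken from image_url>."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, os.path.basename(image_url))
    # Binary mode is required because image data is not text.
    with open(path, "wb") as image_file:
        for chunk in chunks:
            image_file.write(chunk)
    return path


# Demo with made-up bytes standing in for a real download.
with tempfile.TemporaryDirectory() as demo_folder:
    demo_path = save_chunks(
        [b"abc", b"def"], "https://example.com/comics/sample.png", demo_folder
    )
    demo_name = os.path.basename(demo_path)
    with open(demo_path, "rb") as saved:
        demo_bytes = saved.read()

print(demo_name, demo_bytes)
```

With a real response object, the first argument would be the iterator returned by `iter_content()`.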
Final Code:
Python3
import requests as req
import os
import bs4

# Base URL of the XKCD website and the folder for the downloaded images
url = 'https://xkcd.com/'
os.makedirs('xkcd', exist_ok=True)

# Build the page URL from the comic number and fetch the page
n = input("Input the comic number ")
url += n
print('Downloading image from %s...' % url)
res = req.get(url)

# Find the comic image and complete its URL
soup = bs4.BeautifulSoup(res.text, 'html.parser')
comicElem = soup.select('#comic img')
comicUrl = 'http:' + comicElem[0].get('src')

# Download the image and write it to the folder in binary mode
res = req.get(comicUrl)
with open(os.path.join('xkcd', os.path.basename(comicUrl)), 'wb') as imageFile:
    for chunk in res.iter_content(100000):
        imageFile.write(chunk)
print('Successfully downloaded')
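One possible way to reuse the final program for several comics is to wrap the steps in a function. The sketch below is not the article's code: the name `download_comic`, the `get` parameter (which defaults to `requests.get` and exists so the function can be tried without network access), and the 10-second timeout are all assumptions.

```python
import os
from urllib.parse import urljoin

import bs4
import requests


def download_comic(number, folder="xkcd", get=requests.get):
    """Download comic `number` into `folder`; return the saved path or None."""
    page_url = "https://xkcd.com/%d/" % number
    page = get(page_url, timeout=10)
    page.raise_for_status()  # stop early on HTTP errors such as 404

    soup = bs4.BeautifulSoup(page.text, "html.parser")
    matches = soup.select("#comic img")
    if not matches or not matches[0].get("src"):
        return None  # nothing to download on this page

    # Resolve the scheme-less image source against the page URL.
    image_url = urljoin(page_url, matches[0].get("src"))
    image = get(image_url, timeout=10)
    image.raise_for_status()

    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, os.path.basename(image_url))
    with open(path, "wb") as image_file:
        for chunk in image.iter_content(100000):
            image_file.write(chunk)
    return path
```

A loop such as `for n in range(1, 6): download_comic(n)` would then fetch the first five comics, skipping any page where the function returns None.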