Skip to content
Related Articles

Related Articles

Improve Article
Save Article
Like Article

Scrape IMDB movie rating and details using Python

  • Difficulty Level : Medium
  • Last Updated : 01 Jun, 2021

We can scrape the IMDb movie ratings and their details with the help of the BeautifulSoup library of Python. 

Modules Needed:

Below is the list of modules required to scrape from IMDB.

 Attention geek! Strengthen your foundations with the Python Programming Foundation Course and learn the basics.  

To begin with, your interview preparations Enhance your Data Structures concepts with the Python DS Course. And to begin with your Machine Learning Journey, join the Machine Learning - Basic Level Course

  • requests: Requests library is an integral part of Python for making HTTP requests to a specified URL. Whether it be REST APIs or Web Scrapping, requests is must to be learned for proceeding further with these technologies. When one makes a request to a URI, it returns a response.
  • html5lib: A pure-python library for parsing HTML. It is designed to conform to the WHATWG HTML specification, as is implemented by all major web browsers.
  • bs4: BeautifulSoup object is provided by Beautiful Soup which is a web scraping framework for Python. Web scraping is the process of extracting data from the website using automated tools to make the process faster.

Approach:

Steps to implement web scraping in python to extract IMDb movie ratings and its ratings:



  • Import the required modules.

Python3




from bs4 import BeautifulSoup
import requests
import re

 
 

  • Access the HTML content from the webpage by assigning the URL and creating a soap object.

 

Python3




# Downloading imdb top 250 movie's data
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')
  • Extract the movie ratings and their details. Here, we are extracting data from BeautifulSoup object using Html tags like href, title, etc.

Python3




movies = soup.select('td.titleColumn')
links = [a.attrs.get('href') for a in soup.select('td.titleColumn a')]
crew = [a.attrs.get('title') for a in soup.select('td.titleColumn a')]
ratings = [b.attrs.get('data-value') for b in soup.select('td.posterColumn span[name=ir]')]
votes = [b.attrs.get('data-value') for b in soup.select('td.ratingColumn strong')]

 
 

  • After extracting the movie details, create an empty list and store the details in a dictionary, and then adding them to a list.

 

Python3




# create a empty list for storing
# movie information
list = []
 
# Iterating over movies to extract
# each movie's details
for index in range(0, len(movies)):
   
    # Separating  movie into: 'place',
    # 'title', 'year'
    movie_string = movies[index].get_text()
    movie = (' '.join(movie_string.split()).replace('.', ''))
    movie_title = movie[len(str(index))+1:-7]
    year = re.search('\((.*?)\)', movie_string).group(1)
    place = movie[:len(str(index))-(len(movie))]
     
    data = {"movie_title": movie_title,
            "year": year,
            "place": place,
            "star_cast": crew[index],
            "rating": ratings[index],
            "vote": votes[index],
            "link": links[index]}
    list.append(data)
  • Now or list is filled with top IMBD movies along with their details. Finally display the list of movie details

Python3




for movie in list:
    print(movie['place'], '-', movie['movie_title'], '('+movie['year'] +
          ') -', 'Starring:', moive['star_cast'], movie['rating'])

Complete Code:

Python3




from bs4 import BeautifulSoup
import requests
import re
 
 
# Downloading imdb top 250 movie's data
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')
 
movies = soup.select('td.titleColumn')
links = [a.attrs.get('href') for a in soup.select('td.titleColumn a')]
crew = [a.attrs.get('title') for a in soup.select('td.titleColumn a')]
 
ratings = [b.attrs.get('data-value')
           for b in soup.select('td.posterColumn span[name=ir]')]
 
votes = [b.attrs.get('data-value')
         for b in soup.select('td.ratingColumn strong')]
 
list = []
 
# create a empty list for storing
# movie information
list = []
 
# Iterating over movies to extract
# each movie's details
for index in range(0, len(movies)):
     
    # Separating  movie into: 'place',
    # 'title', 'year'
    movie_string = movies[index].get_text()
    movie = (' '.join(movie_string.split()).replace('.', ''))
    movie_title = movie[len(str(index))+1:-7]
    year = re.search('\((.*?)\)', movie_string).group(1)
    place = movie[:len(str(index))-(len(movie))]
    data = {"movie_title": movie_title,
            "year": year,
            "place": place,
            "star_cast": crew[index],
            "rating": ratings[index],
            "vote": votes[index],
            "link": links[index]}
    list.append(data)
 
# printing movie details with its rating.
for movie in list:
    print(movie['place'], '-', movie['movie_title'], '('+movie['year'] +
          ') -', 'Starring:', movie['star_cast'], movie['rating'])

Output:




My Personal Notes arrow_drop_up
Recommended Articles
Page :