
IPL Score Prediction using Deep Learning

Since the IPL began in 2008, it has attracted viewers from all around the globe. A high level of uncertainty and last-minute nail-biters keep fans watching the matches. Within a short period, the IPL has become the highest revenue-generating league in cricket. During a cricket match, the scoreline often shows the probability of a team winning based on the current match situation. This prediction is usually made with the help of data analytics. Before machine learning became widespread, predictions were usually based on intuition or on basic rules such as projecting the current run rate over the remaining overs. The picture above shows how poorly run rate works as the single factor for predicting the final score in a limited-overs cricket match.
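To make the run-rate baseline concrete, here is a minimal sketch of that naive projection. The helper names `overs_to_balls` and `run_rate_projection` are hypothetical and not part of the tutorial's code; the sketch assumes the usual scorecard notation in which 0.4 means four balls bowled.

```python
def overs_to_balls(overs: float) -> int:
    """Convert scorecard notation (7.3 = 7 overs and 3 balls) to balls bowled."""
    whole = int(overs)
    balls = round((overs - whole) * 10)
    return whole * 6 + balls


def run_rate_projection(runs: int, overs: float, total_overs: int = 20) -> float:
    """Project the final total by assuming the current scoring rate holds."""
    balls = overs_to_balls(overs)
    if balls == 0:
        raise ValueError("no balls bowled yet")
    return runs / balls * total_overs * 6


# Hypothetical input: 2 runs after 0.4 overs of a 20-over innings
print(run_rate_projection(2, 0.4))
```

A projection made this early in an innings can land very far from the eventual total, which is why run rate alone is a weak predictor.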

IPL Score Prediction

For a cricket fan, visualizing cricket statistics is mesmerizing. We went through various blogs and found patterns that could be used to predict the score of an IPL match in advance.



Why Deep Learning?

Humans can’t easily identify patterns in huge volumes of data, and this is where machine learning and deep learning come into play. The model learns how players and teams have performed against their opponents in the past and is trained accordingly. Classical machine learning algorithms alone gave only moderate accuracy, so we used deep learning, which performed much better than our previous model and makes use of the attributes that lead to accurate results.

Tools used:

Technology used:

Step-by-step implementation:

First, let’s import all the necessary libraries:






import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import preprocessing
import keras
import tensorflow as tf

Step 1: Loading the dataset!

The dataset contains IPL data from 2008 to 2017. The dataset can be downloaded from here. It contains features such as venue, date, batting and bowling team, names of the batsman and bowler, wickets and more.

We imported the dataset into a pandas dataframe using the .read_csv() method and displayed its first 5 rows.




ipl = pd.read_csv('ipl_dataset.csv')
ipl.head()

Output:

   mid        date                  venue               bat_team  \
0    1  2008-04-18  M Chinnaswamy Stadium  Kolkata Knight Riders
1    1  2008-04-18  M Chinnaswamy Stadium  Kolkata Knight Riders
2    1  2008-04-18  M Chinnaswamy Stadium  Kolkata Knight Riders
3    1  2008-04-18  M Chinnaswamy Stadium  Kolkata Knight Riders
4    1  2008-04-18  M Chinnaswamy Stadium  Kolkata Knight Riders

                     bowl_team      batsman   bowler  runs  wickets  overs  \
0  Royal Challengers Bangalore   SC Ganguly  P Kumar     1        0    0.1
1  Royal Challengers Bangalore  BB McCullum  P Kumar     1        0    0.2
2  Royal Challengers Bangalore  BB McCullum  P Kumar     2        0    0.2
3  Royal Challengers Bangalore  BB McCullum  P Kumar     2        0    0.3
4  Royal Challengers Bangalore  BB McCullum  P Kumar     2        0    0.4

   runs_last_5  wickets_last_5  striker  non-striker  total
0            1               0        0            0    222
1            1               0        0            0    222
2            2               0        0            0    222
3            2               0        0            0    222
4            2               0        0            0    222

Step 3: Data Pre-processing

Dropping unimportant features




#Dropping certain features
df = ipl.drop(['date', 'runs', 'wickets', 'overs', 'runs_last_5', 'wickets_last_5','mid', 'striker', 'non-striker'], axis =1)

Further Pre-Processing




X = df.drop(['total'], axis =1)
y = df['total']

Label Encoding




#Label Encoding
 
from sklearn.preprocessing import LabelEncoder
 
# Create a LabelEncoder object for each categorical feature
venue_encoder = LabelEncoder()
batting_team_encoder = LabelEncoder()
bowling_team_encoder = LabelEncoder()
striker_encoder = LabelEncoder()
bowler_encoder = LabelEncoder()
 
# Fit and transform the categorical features with label encoding
X['venue'] = venue_encoder.fit_transform(X['venue'])
X['bat_team'] = batting_team_encoder.fit_transform(X['bat_team'])
X['bowl_team'] = bowling_team_encoder.fit_transform(X['bowl_team'])
X['batsman'] = striker_encoder.fit_transform(X['batsman'])
X['bowler'] = bowler_encoder.fit_transform(X['bowler'])
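As a small illustration of what label encoding does, the sketch below fits a `LabelEncoder` on a toy list of team names (the list is made up for the example, not taken from the dataset). The encoder sorts the distinct labels and assigns each one an integer index, and `inverse_transform` maps the integers back to names.

```python
from sklearn.preprocessing import LabelEncoder

# Toy data for illustration only
teams = ["Kolkata Knight Riders", "Chennai Super Kings", "Kolkata Knight Riders"]

encoder = LabelEncoder()
codes = encoder.fit_transform(teams)

print(list(encoder.classes_))                   # distinct labels in sorted order
print(codes.tolist())                           # [1, 0, 1]
print(encoder.inverse_transform([0]).tolist())  # ['Chennai Super Kings']
```

Keeping one encoder per column, as in the step above, matters later: the same fitted encoder has to be reused to turn a new venue or team name into the integer the model was trained on, and `transform` raises a `ValueError` for a label it never saw during fitting.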

Train Test Split




# Train test Split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

Feature Scaling




from sklearn.preprocessing import MinMaxScaler
 
scaler = MinMaxScaler()
 
# Fit the scaler on the training data and transform both training and testing data
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
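The reason for fitting the scaler on the training data only can be seen in a tiny sketch. The arrays and the name `scaler_demo` below are invented for illustration. `MinMaxScaler` learns the minimum and maximum from the data it is fitted on, so test values outside that range map outside [0, 1] rather than changing the learned scale.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy single-feature data for illustration only
train = np.array([[0.0], [5.0], [10.0]])
test = np.array([[12.0]])

scaler_demo = MinMaxScaler()
train_scaled = scaler_demo.fit_transform(train)  # min and max learned from train only
test_scaled = scaler_demo.transform(test)        # same min and max reused

print(train_scaled.ravel())  # training values span 0 to 1
print(test_scaled.ravel())   # roughly 1.2, outside the training range
```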

Step 4: Define the Neural Network




# Define the neural network model
model = keras.Sequential([
    keras.layers.Input( shape=(X_train_scaled.shape[1],)),  # Input layer
    keras.layers.Dense(512, activation='relu'),  # Hidden layer with 512 units and ReLU activation
    keras.layers.Dense(216, activation='relu'),  # Hidden layer with 216 units and ReLU activation
    keras.layers.Dense(1, activation='linear')  # Output layer with linear activation for regression
])
 
# Compile the model with Huber loss
huber_loss = tf.keras.losses.Huber(delta=1.0)  # You can adjust the 'delta' parameter as needed
model.compile(optimizer='adam', loss=huber_loss)  # Use Huber loss for regression
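For intuition about what the Huber loss computes, here is a minimal NumPy sketch of the standard formula: quadratic for errors no larger than `delta`, linear beyond that. The function `huber` is a hypothetical helper written for this illustration, not the Keras implementation itself.

```python
import numpy as np


def huber(y_true, y_pred, delta=1.0):
    """Mean Huber loss: 0.5*e^2 if |e| <= delta, else delta*(|e| - 0.5*delta)."""
    error = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    abs_error = np.abs(error)
    quadratic = 0.5 * error ** 2
    linear = delta * (abs_error - 0.5 * delta)
    return float(np.mean(np.where(abs_error <= delta, quadratic, linear)))


# Hypothetical scores: one prediction off by 0.5 runs, one off by 10 runs
print(huber([160, 160], [160.5, 150]))
```

With `delta=1.0` and errors measured in runs, most errors fall in the linear branch, where each sample contributes its absolute error minus 0.5. This makes the loss less sensitive to the occasional very large miss than mean squared error would be.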

Step 5: Model Training




# Train the model
model.fit(X_train_scaled, y_train, epochs=50, batch_size=64, validation_data=(X_test_scaled, y_test))

Output:

Epoch 1/50
832/832 [==============================] - 4s 4ms/step - loss: 32.9487 - val_loss: 22.0690
Epoch 2/50
832/832 [==============================] - 3s 3ms/step - loss: 22.3249 - val_loss: 22.5012
Epoch 3/50
832/832 [==============================] - 3s 4ms/step - loss: 22.2967 - val_loss: 22.0187
Epoch 4/50
832/832 [==============================] - 3s 4ms/step - loss: 22.2845 - val_loss: 21.9685
Epoch 5/50
832/832 [==============================] - 3s 3ms/step - loss: 22.2155 - val_loss: 21.9134




model_losses = pd.DataFrame(model.history.history)
model_losses.plot()

Output:

Epoch vs Loss & Validation Loss

Step 6: Model Evaluation




# Make predictions
predictions = model.predict(X_test_scaled)
 
from sklearn.metrics import mean_absolute_error,mean_squared_error
mean_absolute_error(y_test,predictions)

Output:

9.62950576317203
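To show what this metric measures, the sketch below computes mean absolute error and root mean squared error by hand on made-up numbers. It assumes, as is typical for a Keras model with a single output unit, that predictions come back with shape (n, 1), so they are flattened before subtracting.

```python
import numpy as np

# Made-up values for illustration only
y_true = np.array([150.0, 180.0, 200.0])
y_pred = np.array([[160.0], [170.0], [205.0]])  # shape (n, 1), like model.predict output

flat_pred = y_pred.ravel()  # flatten to shape (n,)
mae = float(np.mean(np.abs(y_true - flat_pred)))
rmse = float(np.sqrt(np.mean((y_true - flat_pred) ** 2)))

# Without flattening, NumPy broadcasting produces an (n, n) matrix of differences
broadcast_shape = (y_true - y_pred).shape

print(mae, rmse, broadcast_shape)
```

MAE reads directly as the average number of runs by which a prediction misses, while RMSE penalizes large misses more heavily.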

Step 7: Let’s create an Interactive Widget




import ipywidgets as widgets
from IPython.display import display, clear_output
 
import warnings
warnings.filterwarnings("ignore")
 
venue = widgets.Dropdown(options=df['venue'].unique().tolist(),description='Select Venue:')
batting_team = widgets.Dropdown(options =df['bat_team'].unique().tolist(),  description='Select Batting Team:')
bowling_team = widgets.Dropdown(options=df['bowl_team'].unique().tolist(),  description='Select Bowling Team:')
striker = widgets.Dropdown(options=df['batsman'].unique().tolist(), description='Select Striker:')
bowler = widgets.Dropdown(options=df['bowler'].unique().tolist(), description='Select Bowler:')
 
predict_button = widgets.Button(description="Predict Score")
 
def predict_score(b):
    with output:
        clear_output()  # Clear the previous output
         
 
        # Encode the selected values with the encoders fitted during pre-processing
        encoded_venue = venue_encoder.transform([venue.value])
        encoded_batting_team = batting_team_encoder.transform([batting_team.value])
        encoded_bowling_team = bowling_team_encoder.transform([bowling_team.value])
        encoded_striker = striker_encoder.transform([striker.value])
        encoded_bowler = bowler_encoder.transform([bowler.value])

        # Build a single row in the same column order used for training, then scale it
        features = np.array([encoded_venue, encoded_batting_team, encoded_bowling_team, encoded_striker, encoded_bowler])
        features = features.reshape(1, 5)
        features = scaler.transform(features)

        predicted_score = model.predict(features)
        predicted_score = int(predicted_score[0, 0])

        print(predicted_score)

The widget-based interface lets you interactively predict the score for specific match scenarios. Next, we register the predict_score function as the button's click handler and display the widgets for venue, batting team, bowling team, striker and bowler.




predict_button.on_click(predict_score)
output = widgets.Output()
display(venue, batting_team, bowling_team, striker, bowler, predict_button, output)

Output:

We have predicted the score of the match between CSK and Kings XI Punjab at the Punjab Cricket Stadium. The predicted score of the match is 183.

By applying ML and DL to historical data, we have predicted cricket scores with a mean absolute error of about 9.6 runs on the test set. The model’s ability to predict cricket scores can be a valuable asset for IPL enthusiasts, teams, and analysts. It can provide insights into the dynamics of a match and help anticipate how different factors affect the final score.

