Prerequisites: Apriori Algorithm
The Apriori algorithm is a machine learning algorithm used to discover association rules, that is, relationships between items that tend to appear together in the same transactions. Its most prominent practical application is recommending products based on the items already in a user’s cart. Walmart in particular has made extensive use of the algorithm to suggest products to its users.
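To make the idea concrete, here is a minimal pure-Python sketch of the level-wise search behind Apriori: count single items, keep those that meet a minimum support, then grow candidates one item at a time and prune any candidate with an infrequent subset. The function name `frequent_itemsets` and the toy baskets are made up for illustration; this is not how mlxtend implements it.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset.
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items.
    items = sorted({item for b in baskets for item in b})
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    result = dict(current)

    k = 2
    while current:
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        prev = list(current)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        current = {}
        for cand in candidates:
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in result for sub in combinations(cand, k - 1)):
                s = support(cand)
                if s >= min_support:
                    current[cand] = s
        result.update(current)
        k += 1
    return result


# Hypothetical toy baskets, invented for this example only.
toy_transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk"],
]
toy_frequent = frequent_itemsets(toy_transactions, min_support=0.5)
```

With a minimum support of 0.5, the pair butter and milk is dropped because it occurs in only one of the four baskets, so no three-item candidate survives the prune step.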
Dataset: Groceries data (the code below reads it from the file Online_Retail.xlsx)
Implementation of the algorithm in Python:
Step 1: Importing the required libraries
Python3
import numpy as np
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
Step 2: Loading and exploring the data
Python3
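import os
# Change to the folder holding the dataset (path is specific to the author's machine)
os.chdir(r'C:\Users\Dev\Desktop\Kaggle\Apriori Algorithm')
# Load the transactions and preview the first few rows
data = pd.read_excel('Online_Retail.xlsx')
data.head()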
Step 3: Cleaning the Data
Python3
# Strip extra whitespace from the product descriptions
data['Description'] = data['Description'].str.strip()
# Drop rows that have no invoice number
data.dropna(axis=0, subset=['InvoiceNo'], inplace=True)
data['InvoiceNo'] = data['InvoiceNo'].astype('str')
# Drop transactions whose invoice number contains 'C'
data = data[~data['InvoiceNo'].str.contains('C')]
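The order of the cleaning steps matters: missing invoice numbers are dropped before the cast to string, otherwise they would turn into text and slip through. The sketch below replays the same four steps on a tiny hand-made frame; the rows are hypothetical and only mimic the two columns used above.

```python
import pandas as pd

# Hypothetical rows, invented for illustration; not taken from the dataset.
toy = pd.DataFrame({
    "InvoiceNo": [536365, "C536379", None, 536366],
    "Description": [" WHITE MUG ", "DISCOUNT", "RED CUP", "BLUE PLATE "],
})

# Same steps as above, written without inplace=True.
toy["Description"] = toy["Description"].str.strip()
toy = toy.dropna(axis=0, subset=["InvoiceNo"]).copy()
toy["InvoiceNo"] = toy["InvoiceNo"].astype("str")
toy = toy[~toy["InvoiceNo"].str.contains("C")]

cleaned_invoices = toy["InvoiceNo"].tolist()
cleaned_descriptions = toy["Description"].tolist()
```

Only the two ordinary invoices remain, with their descriptions trimmed.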
Step 4: Splitting the data according to the region of transaction
Python3
# Build one invoice-by-product quantity table per country
basket_France = (data[data['Country'] == "France"]
                 .groupby(['InvoiceNo', 'Description'])['Quantity']
                 .sum().unstack().reset_index().fillna(0)
                 .set_index('InvoiceNo'))
basket_UK = (data[data['Country'] == "United Kingdom"]
             .groupby(['InvoiceNo', 'Description'])['Quantity']
             .sum().unstack().reset_index().fillna(0)
             .set_index('InvoiceNo'))
basket_Por = (data[data['Country'] == "Portugal"]
              .groupby(['InvoiceNo', 'Description'])['Quantity']
              .sum().unstack().reset_index().fillna(0)
              .set_index('InvoiceNo'))
basket_Sweden = (data[data['Country'] == "Sweden"]
                 .groupby(['InvoiceNo', 'Description'])['Quantity']
                 .sum().unstack().reset_index().fillna(0)
                 .set_index('InvoiceNo'))
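Since the four baskets differ only in the country name, the pivot could be wrapped in a small helper. The sketch below is one way to do that; `build_basket` is a hypothetical name, the toy rows are invented, and the helper skips the `reset_index`/`set_index` round trip because `unstack` already leaves `InvoiceNo` as the index.

```python
import pandas as pd


def build_basket(data, country):
    """Pivot one country's rows into an invoice-by-product quantity table."""
    return (data[data["Country"] == country]
            .groupby(["InvoiceNo", "Description"])["Quantity"]
            .sum()
            .unstack()      # one column per product, NaN where not bought
            .fillna(0))     # a missing product means quantity 0


# Hypothetical rows, invented for illustration only.
toy_data = pd.DataFrame({
    "InvoiceNo": ["1", "1", "2", "3"],
    "Description": ["CUP", "PLATE", "CUP", "CUP"],
    "Quantity": [2, 1, 4, 5],
    "Country": ["France", "France", "France", "Sweden"],
})
toy_basket = build_basket(toy_data, "France")
```

Each row of the result is one invoice and each column one product, which is the layout the encoding step expects.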
Step 5: One-hot encoding the Data
Python3
# Map each quantity to 0 (product not in the invoice) or 1 (product in the invoice)
def hot_encode(x):
    if x <= 0:
        return 0
    if x >= 1:
        return 1

# Encode every basket
basket_France = basket_France.applymap(hot_encode)
basket_UK = basket_UK.applymap(hot_encode)
basket_Por = basket_Por.applymap(hot_encode)
basket_Sweden = basket_Sweden.applymap(hot_encode)
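The same encoding can also be sketched as a single vectorized comparison, which avoids calling a Python function on every cell. The example assumes quantities are whole numbers, so `>= 1` marks exactly the cells that `hot_encode` maps to 1; the toy table is invented. mlxtend's `apriori` is documented to work with boolean frames, but treat that as something to check against the installed version.

```python
import pandas as pd

# Hypothetical quantity table, invented for illustration only.
toy_basket = pd.DataFrame(
    {"CUP": [2.0, 0.0, -1.0], "PLATE": [0.0, 3.0, 1.0]},
    index=pd.Index(["1", "2", "3"], name="InvoiceNo"),
)

# True where the product was bought in that invoice, False otherwise.
toy_encoded = toy_basket >= 1
```

Negative quantities end up as False, matching the `x <= 0` branch above.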
Step 6: Building the models and analyzing the results
a) France:
Python3
# Find itemsets that appear in at least 5% of French invoices
frq_items = apriori(basket_France, min_support=0.05, use_colnames=True)
# Derive rules with lift >= 1, then rank them by confidence and lift
rules = association_rules(frq_items, metric="lift", min_threshold=1)
rules = rules.sort_values(['confidence', 'lift'], ascending=[False, False])
print(rules.head())

From the above output, it can be seen that paper cups and paper plates are bought together in France. A plausible explanation is that the French have a culture of getting together with friends and family at least once a week. Also, since France restricts single-use plastic tableware, people have to buy paper-based alternatives.
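The rules are ranked by confidence and lift, so it helps to see how those numbers come out of raw baskets. The sketch below computes support, confidence, and lift for a single rule in plain Python; `rule_metrics` is a hypothetical helper and the baskets are invented, not taken from the French data.

```python
def rule_metrics(baskets, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    n = len(baskets)
    a, c = set(antecedent), set(consequent)
    support_a = sum(a <= b for b in baskets) / n          # P(A)
    support_c = sum(c <= b for b in baskets) / n          # P(C)
    support_ac = sum((a | c) <= b for b in baskets) / n   # P(A and C)
    # Assumes the antecedent and consequent each occur at least once.
    confidence = support_ac / support_a                   # P(C | A)
    lift = confidence / support_c                         # > 1: bought together more than by chance
    return {"support": support_ac, "confidence": confidence, "lift": lift}


# Hypothetical baskets, invented for illustration only.
toy_baskets = [
    {"cups", "plates"},
    {"cups", "plates", "napkins"},
    {"cups"},
    {"napkins"},
]
toy_metrics = rule_metrics(toy_baskets, ["cups"], ["plates"])
```

Here cups appear in three of four baskets and plates in two, both of which also contain cups, so the confidence is 2/3 and the lift is 4/3. A lift above 1 is what the `min_threshold = 1` filter keeps.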
b) United Kingdom:
Python3
# Find itemsets that appear in at least 1% of UK invoices
frq_items = apriori(basket_UK, min_support=0.01, use_colnames=True)
# Derive rules with lift >= 1, then rank them by confidence and lift
rules = association_rules(frq_items, metric="lift", min_threshold=1)
rules = rules.sort_values(['confidence', 'lift'], ascending=[False, False])
print(rules.head())

Looking more closely at the rules for British transactions shows that British customers buy differently colored tea plates together. One possible reason is that the British typically enjoy tea and often collect tea plates in different colors for different occasions.
c) Portugal:
Python3
# Find itemsets that appear in at least 5% of Portuguese invoices
frq_items = apriori(basket_Por, min_support=0.05, use_colnames=True)
# Derive rules with lift >= 1, then rank them by confidence and lift
rules = association_rules(frq_items, metric="lift", min_threshold=1)
rules = rules.sort_values(['confidence', 'lift'], ascending=[False, False])
print(rules.head())

On analyzing the association rules for Portuguese transactions, it is observed that tiffin sets (Knick Knack Tins) and color pencils are bought together. Both products typically belong to a primary-school child, who needs them to carry lunch and to do creative work respectively, so it makes sense for them to be paired.
d) Sweden:
Python3
# Find itemsets that appear in at least 5% of Swedish invoices
frq_items = apriori(basket_Sweden, min_support=0.05, use_colnames=True)
# Derive rules with lift >= 1, then rank them by confidence and lift
rules = association_rules(frq_items, metric="lift", min_threshold=1)
rules = rules.sort_values(['confidence', 'lift'], ascending=[False, False])
print(rules.head())

On analyzing the above rules, it is found that boys’ and girls’ cutlery are paired together. This makes practical sense because parents shopping for cutlery for their children would want the product tailored to each child’s wishes.
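Reading rules straight from the printed frame can be awkward because the antecedents and consequents are stored as frozensets. As a rough sketch, a small formatter can turn the top rules into one line each. `describe_rules` is a hypothetical helper and the rules frame below is hand-built with placeholder items; it only assumes the `antecedents`, `consequents`, `confidence` and `lift` columns used in the code above.

```python
import pandas as pd


def describe_rules(rules, n=5):
    """Format the n best rules (by confidence, then lift) as readable strings."""
    top = rules.sort_values(["confidence", "lift"],
                            ascending=[False, False]).head(n)
    lines = []
    for row in top.itertuples(index=False):
        lhs = ", ".join(sorted(row.antecedents))
        rhs = ", ".join(sorted(row.consequents))
        lines.append(f"{lhs} -> {rhs} "
                     f"(confidence={row.confidence:.2f}, lift={row.lift:.2f})")
    return lines


# Hand-built placeholder rules, invented for illustration only.
toy_rules = pd.DataFrame({
    "antecedents": [frozenset(["A"]), frozenset(["B", "C"])],
    "consequents": [frozenset(["B"]), frozenset(["A"])],
    "confidence": [0.80, 0.95],
    "lift": [2.0, 3.5],
})
toy_lines = describe_rules(toy_rules)
```

The same helper could be applied to the `rules` frame of any of the four countries.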
Last Updated: 11 Jan, 2023