Implementing Apriori algorithm in Python

Prerequisites: Apriori Algorithm
The Apriori algorithm is a machine learning algorithm used to uncover association relationships between the items that appear together in transactions. Its most prominent practical application is recommending products based on the products already present in the user’s cart. Walmart in particular has made great use of the algorithm in suggesting products to its users.
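The relationships the algorithm reports are usually scored with three measures: support, confidence, and lift. The snippet below is a minimal sketch of those measures in plain Python. The `transactions` list and the helper names `support`, `confidence`, and `lift` are made up for illustration and are not part of the dataset or of mlxtend.

```python
# Toy transactions (hypothetical data, not taken from the dataset)
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Share of transactions with `antecedent` that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(both) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence relative to how common `consequent` is on its own."""
    return confidence(antecedent, consequent) / support(consequent)

print(support({"bread", "milk"}))
print(confidence({"bread"}, {"milk"}))
print(lift({"bread"}, {"milk"}))
```

A lift above 1 suggests the two sides of a rule occur together more often than they would if they were independent, while a lift below 1 suggests the opposite.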

Dataset: Online retail transactions data (Online_Retail.xlsx)

Implementation of algorithm in Python:
Step 1: Importing the required libraries

Python3

import numpy as np
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

Step 2: Loading and exploring the data

Python3

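# Changing the working location to the location of the file
# (a bare `cd` only works as an IPython/Jupyter magic, so os.chdir is used here)
import os
os.chdir(r'C:\Users\Dev\Desktop\Kaggle\Apriori Algorithm')

# Loading the Data
data = pd.read_excel('Online_Retail.xlsx')
data.head()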

Python3

# Exploring the columns of the data 
data.columns 

Python3

# Exploring the different regions of transactions 
data.Country.unique() 

Step 3: Cleaning the Data

Python3

# Stripping extra spaces in the description
data['Description'] = data['Description'].str.strip()

# Dropping the rows without any invoice number
data.dropna(axis=0, subset=['InvoiceNo'], inplace=True)
data['InvoiceNo'] = data['InvoiceNo'].astype('str')

# Dropping all cancelled transactions
# (invoice numbers containing 'C' mark cancellations in this dataset)
data = data[~data['InvoiceNo'].str.contains('C')]
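To make the effect of each cleaning step visible, here is a small sketch that applies the same operations to a made-up frame. The `toy` DataFrame and its values are hypothetical; only the column names match the dataset.

```python
import pandas as pd

# Hypothetical rows: one padded description, one invoice containing 'C',
# and one row with no invoice number at all
toy = pd.DataFrame({
    "InvoiceNo": ["536365", "C536379", None, "536381"],
    "Description": [" CUP ", "PLATE", "MUG", "BOWL  "],
})

# Strip surrounding spaces from the descriptions
toy["Description"] = toy["Description"].str.strip()

# Drop rows without an invoice number, then treat invoice numbers as text
toy = toy.dropna(axis=0, subset=["InvoiceNo"]).reset_index(drop=True)
toy["InvoiceNo"] = toy["InvoiceNo"].astype("str")

# Drop invoices whose number contains 'C'
toy = toy[~toy["InvoiceNo"].str.contains("C")]

print(toy)
```

Only the first and last rows should survive, with their descriptions trimmed.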

Step 4: Splitting the data according to the region of transaction

Python3

# Transactions done in France
basket_France = (data[data['Country'] == "France"]
          .groupby(['InvoiceNo', 'Description'])['Quantity']
          .sum().unstack().reset_index().fillna(0)
          .set_index('InvoiceNo'))

# Transactions done in the United Kingdom
basket_UK = (data[data['Country'] == "United Kingdom"]
          .groupby(['InvoiceNo', 'Description'])['Quantity']
          .sum().unstack().reset_index().fillna(0)
          .set_index('InvoiceNo'))

# Transactions done in Portugal
basket_Por = (data[data['Country'] == "Portugal"]
          .groupby(['InvoiceNo', 'Description'])['Quantity']
          .sum().unstack().reset_index().fillna(0)
          .set_index('InvoiceNo'))

# Transactions done in Sweden
basket_Sweden = (data[data['Country'] == "Sweden"]
          .groupby(['InvoiceNo', 'Description'])['Quantity']
          .sum().unstack().reset_index().fillna(0)
          .set_index('InvoiceNo'))
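Since the four baskets are built with the same chain of calls, the pattern can also be wrapped in a helper. The following is a sketch under that assumption: `build_basket` is a hypothetical name, and the `toy` frame holds made-up values that only mimic the dataset's column names.

```python
import pandas as pd

def build_basket(data, country):
    """Return an invoice-by-item table of summed quantities for one country."""
    return (data[data["Country"] == country]
            .groupby(["InvoiceNo", "Description"])["Quantity"]
            .sum().unstack().reset_index().fillna(0)
            .set_index("InvoiceNo"))

# Toy data with the same column names as the dataset (values are made up)
toy = pd.DataFrame({
    "InvoiceNo": ["1", "1", "2", "3"],
    "Description": ["CUP", "PLATE", "CUP", "CUP"],
    "Quantity": [2, 1, 4, 6],
    "Country": ["France", "France", "France", "Sweden"],
})

basket_toy = build_basket(toy, "France")
print(basket_toy)
```

Each row of the result is one invoice and each column one product; a product that does not appear on an invoice gets a quantity of 0 from `fillna(0)`.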

Step 5: One-hot encoding the Data

Python3

# Defining the hot encoding function to make the data suitable
# for the concerned libraries
def hot_encode(x):
    # 1 if the item was bought on the invoice, 0 otherwise
    # (a single expression, so every value maps to 0 or 1)
    return 1 if x >= 1 else 0

# Encoding the datasets
# (applymap is deprecated from pandas 2.1; DataFrame.map replaces it there)
basket_France = basket_France.applymap(hot_encode)
basket_UK = basket_UK.applymap(hot_encode)
basket_Por = basket_Por.applymap(hot_encode)
basket_Sweden = basket_Sweden.applymap(hot_encode)
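The same encoding can be sketched without an element-wise function by comparing the whole table at once. This is an alternative to the approach above, not the method used in this article; `basket_toy` and `encoded` are hypothetical names with made-up values. The result is a boolean table rather than 0/1 integers, which recent mlxtend versions appear to prefer as input (worth checking against the installed version).

```python
import pandas as pd

# Hypothetical quantity table: rows are invoices, columns are products
basket_toy = pd.DataFrame(
    {"CUP": [2.0, 4.0, 0.0], "PLATE": [1.0, 0.0, -1.0]},
    index=pd.Index(["1", "2", "3"], name="InvoiceNo"),
)

# True where at least one unit was bought; zero or negative quantities become False
encoded = basket_toy >= 1

print(encoded)
```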

Step 6: Building the models and analyzing the results
a) France:

Python3

# Building the model
frq_items = apriori(basket_France, min_support=0.05, use_colnames=True)

# Collecting the inferred rules in a dataframe
rules = association_rules(frq_items, metric="lift", min_threshold=1)
rules = rules.sort_values(['confidence', 'lift'], ascending=[False, False])
print(rules.head())

From the above output, it can be seen that paper cups and paper plates are bought together in France. A possible explanation is that the French have a culture of getting together with friends and family at least once a week. Also, since the French government has restricted single-use plastic tableware, people have to purchase the paper-based alternatives.
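To see what the min_support threshold does during model building, here is a minimal plain-Python sketch of the level-wise search behind Apriori. It is an illustration under simplifying assumptions, not mlxtend's implementation; `frequent_itemsets` and the toy transactions are made up for this example.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise search: grow itemsets one item at a time and keep only
    those whose support is at least `min_support`."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items]  # candidates of size 1
    result = {}
    k = 1
    while current:
        # Keep the size-k candidates that meet the support threshold
        level = {}
        for c in current:
            s = sum(c <= t for t in transactions) / n
            if s >= min_support:
                level[c] = s
        result.update(level)
        # Join frequent size-k itemsets into candidates of size k + 1
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune candidates that have an infrequent subset of size k - 1
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return result

# Toy transactions (hypothetical data)
toy_transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

result = frequent_itemsets(toy_transactions, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The pruning step is what gives the algorithm its name: an itemset can only be frequent if all of its subsets are frequent, so larger candidates built on an infrequent subset are never counted.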
b) United Kingdom:

Python3

frq_items = apriori(basket_UK, min_support=0.01, use_colnames=True)
rules = association_rules(frq_items, metric="lift", min_threshold=1)
rules = rules.sort_values(['confidence', 'lift'], ascending=[False, False])
print(rules.head())

If the rules for British transactions are analyzed a little deeper, it is seen that British customers buy different colored tea-plates together. One possible reason is that the British typically enjoy tea very much and often collect different colored tea-plates for different occasions.
c) Portugal:

Python3

frq_items = apriori(basket_Por, min_support=0.05, use_colnames=True)
rules = association_rules(frq_items, metric="lift", min_threshold=1)
rules = rules.sort_values(['confidence', 'lift'], ascending=[False, False])
print(rules.head())

On analyzing the association rules for Portuguese transactions, it is observed that tiffin sets (Knick Knack Tins) and color pencils are bought together. Both products typically belong to a primary-school-going child, who needs them to carry lunch and for creative work respectively, so it makes sense for them to be paired together.
d) Sweden:

Python3

frq_items = apriori(basket_Sweden, min_support=0.05, use_colnames=True)
rules = association_rules(frq_items, metric="lift", min_threshold=1)
rules = rules.sort_values(['confidence', 'lift'], ascending=[False, False])
print(rules.head())

On analyzing the above rules, it is found that boys’ and girls’ cutlery are paired together. This makes practical sense because a parent shopping for cutlery for their children would want the product to be a little customized according to each kid’s wishes.
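In every country above, the rules are kept only if their lift is at least 1 and are then sorted by confidence, with lift as the tie-breaker. The sketch below reproduces that selection logic in plain Python on made-up rules; the `rules_toy` list and its numbers are hypothetical and do not come from the dataset.

```python
# Hypothetical rules with made-up scores
rules_toy = [
    {"antecedent": "A", "consequent": "B", "confidence": 0.80, "lift": 1.6},
    {"antecedent": "B", "consequent": "A", "confidence": 0.80, "lift": 2.1},
    {"antecedent": "A", "consequent": "C", "confidence": 0.95, "lift": 0.9},
    {"antecedent": "C", "consequent": "D", "confidence": 0.60, "lift": 1.2},
]

# Keep rules whose lift meets the threshold (mirrors metric="lift", min_threshold=1)
kept = [r for r in rules_toy if r["lift"] >= 1]

# Sort by confidence, then lift, both descending
ranked = sorted(kept, key=lambda r: (r["confidence"], r["lift"]), reverse=True)

for r in ranked:
    print(r["antecedent"], "->", r["consequent"], r["confidence"], r["lift"])
```

Note that the rule with the highest confidence (A -> C) is dropped because its lift is below 1, which is why the lift filter matters when reading the top rows.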
