Association Rule Mining in R Programming

Association rule mining in R is an unsupervised learning technique for uncovering how items are associated with each other. Frequent itemset mining shows which items appear together in a transaction or relation. It is mainly used by retailers, grocery stores, and online marketplaces that hold large transactional databases. Social media platforms, marketplaces, and e-commerce websites use it in the same way in their recommendation engines to predict what you are likely to buy next. The recommendations you see at checkout come from association rules mined from past customer data. There are three common ways to measure association:

  • Support
  • Confidence
  • Lift

Theory

In association rule mining, Support, Confidence, and Lift measure association.

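Support says how popular an itemset is, measured as the proportion of transactions in which the itemset appears.

Confidence says how likely item Y is to be purchased when item X is purchased, written as {X -> Y}.
It is measured as the proportion of transactions containing item X in which item Y also appears. Confidence can misrepresent the importance of an association because it accounts only for how popular X is, not Y: if Y is popular in general, confidence is high even when X and Y are unrelated.

Lift says how likely item Y is to be purchased when item X is purchased, while controlling for how popular item Y is. A lift of 1 means no association, a lift above 1 means Y is more likely to be bought when X is bought, and a lift below 1 means it is less likely.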

The Apriori algorithm is used in association rule mining to discover frequent itemsets in a transaction database. It was proposed by Agrawal and Srikant in 1994.
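The level-wise idea behind Apriori can be sketched in a few lines of base R. This is a simplified illustration, not the arules implementation: the function name `apriori_sketch`, its arguments, and the four small baskets are assumptions made for this example, and the explicit subset-pruning optimisation of the real algorithm is left out. Frequent itemsets of size k are built only from frequent itemsets of size k - 1, and every candidate is checked against the minimum support.

```r
# Level-wise frequent itemset search (illustrative sketch, hypothetical helper)
apriori_sketch <- function(trans, min_support) {
  # Share of transactions that contain every item in `items`
  support <- function(items) {
    mean(vapply(trans, function(t) all(items %in% t), logical(1)))
  }
  # Level 1: frequent single items
  items <- sort(unique(unlist(trans)))
  current <- Filter(function(s) support(s) >= min_support, as.list(items))
  frequent <- list()
  while (length(current) > 0) {
    frequent <- c(frequent, current)
    k <- length(current[[1]]) + 1
    # Candidate generation: unions of two frequent itemsets that have size k
    candidates <- list()
    n <- length(current)
    if (n >= 2) {
      for (i in 1:(n - 1)) {
        for (j in (i + 1):n) {
          u <- sort(union(current[[i]], current[[j]]))
          if (length(u) == k) candidates[[paste(u, collapse = ",")]] <- u
        }
      }
    }
    # Keep only the candidates that meet the support threshold
    current <- Filter(function(s) support(s) >= min_support, unname(candidates))
  }
  frequent
}

# Four small baskets used only for illustration
transactions <- list(
  c("apple", "beer", "rice", "chicken"),
  c("apple", "beer", "rice"),
  c("apple", "beer"),
  c("apple", "orange")
)

freq <- apriori_sketch(transactions, min_support = 0.5)
freq_labels <- vapply(freq, paste, character(1), collapse = ",")
print(freq_labels)
```

With a minimum support of 0.5, chicken and orange are dropped at the first level, so no larger itemset containing them is ever generated or counted.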

Example:
A customer makes 4 transactions with you. In the first transaction, she buys 1 apple, 1 beer, 1 rice, and 1 chicken. In the second transaction, she buys 1 apple, 1 beer, and 1 rice. In the third transaction, she buys 1 apple and 1 beer only. In the fourth transaction, she buys 1 apple and 1 orange.

Support(Apple) = 4/4 

So, Support of {Apple} is 4 out of 4 or 100%

Confidence(Apple -> Beer) =  Support(Apple, Beer)/Support(Apple)
                          = (3/4)/(4/4)
                          = 3/4

So, Confidence of {Apple -> Beer} is 3 out of 4 or 75%

Lift(Beer -> Rice) = Support(Beer, Rice)/(Support(Beer) * Support(Rice))
                   = (2/4)/((3/4) * (2/4))
                   = 1.33

So, a Lift value greater than 1 implies that Rice is likely to be bought if Beer is bought.
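The three measures in this example can be checked with a short base R sketch. The helper names `support`, `confidence`, and `lift` are defined here for illustration only and are not arules functions. The transactions are the four baskets from the example.

```r
# The four example transactions, one character vector per basket
transactions <- list(
  c("apple", "beer", "rice", "chicken"),
  c("apple", "beer", "rice"),
  c("apple", "beer"),
  c("apple", "orange")
)

# Support: share of transactions that contain every item in `items`
support <- function(items, trans) {
  mean(vapply(trans, function(t) all(items %in% t), logical(1)))
}

# Confidence of lhs -> rhs: Support(lhs, rhs) / Support(lhs)
confidence <- function(lhs, rhs, trans) {
  support(c(lhs, rhs), trans) / support(lhs, trans)
}

# Lift of lhs -> rhs: Support(lhs, rhs) / (Support(lhs) * Support(rhs))
lift <- function(lhs, rhs, trans) {
  support(c(lhs, rhs), trans) / (support(lhs, trans) * support(rhs, trans))
}

print(support("apple", transactions))             # 1
print(confidence("apple", "beer", transactions))  # 0.75
print(lift("beer", "rice", transactions))         # about 1.33
```

Note that Confidence(Apple -> Beer) equals Support(Beer) here because every basket contains an apple, which is the kind of case where lift is more informative than confidence.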

The Dataset

The Market Basket dataset consists of 15010 observations with the columns Date, Time, Transaction and Item. The Date column ranges from 30/10/2016 to 09/04/2017. Time is a categorical variable giving the time of the transaction. Transaction is a quantitative variable that distinguishes one transaction from another. Item is a categorical variable that identifies the product.




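# Loading package (provides read.transactions)
library(arules)

# Loading data: one transaction per line, items separated by commas
dataset = read.transactions('Market_Basket_Optimisation.csv',
                            sep = ',', rm.duplicates = TRUE)

# Structure
str(dataset)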
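By default, read.transactions reads the basket format, in which each line of the file is one transaction and the items are separated by sep. The base R sketch below mimics that parsing on three made-up lines to show what the loader has to do. The data are hypothetical and the code is an illustration of the idea, not the arules implementation.

```r
# Three made-up lines in basket format (hypothetical data, not from the dataset)
lines <- c("milk,bread,eggs", "bread,butter", "milk,milk,bread")

# Split each line on the separator and trim surrounding whitespace
baskets <- lapply(strsplit(lines, ",", fixed = TRUE), trimws)

# Drop repeated items within a basket, similar in spirit to rm.duplicates = TRUE
baskets <- lapply(baskets, unique)

# Item frequency: share of baskets that contain each item
item_freq <- sort(table(unlist(baskets)) / length(baskets), decreasing = TRUE)
print(item_freq)
```

The relative item frequencies computed this way are the same quantity that an item frequency plot displays for the most common items.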


Performing Association Rule Mining on Dataset

The Apriori algorithm is now run on the dataset, which includes 15010 observations.




# Installing Packages
install.packages("arules")
install.packages("arulesViz")

# Loading package
library(arules)
library(arulesViz)

# Fitting model
# Training Apriori on the dataset
set.seed(220) # Setting seed
associa_rules = apriori(data = dataset,
                        parameter = list(support = 0.004,
                                         confidence = 0.2))

# Plot of the 10 most frequent items
itemFrequencyPlot(dataset, topN = 10)

# Visualising the results: top 10 rules by lift, then a graph of the rules
inspect(sort(associa_rules, by = 'lift')[1:10])
plot(associa_rules, method = "graph",
     measure = "confidence", shading = "lift")


Output:

  • Model associa_rules:

    The model uses a minimum rule length of 1 and a maximum rule length of 10, targets rules, and has an absolute minimum support count of 30.

  • Item Frequency Plot:

    Mineral water is the best-selling product, followed by eggs, spaghetti, french fries, etc.

  • Visualizing the model:

    The graph-based plot displays 100 rules.

Association rule mining is widely used in industry, notably in recommendation systems for e-commerce sites, online marketplaces, and social media websites.



Last Updated : 22 Jun, 2020