ASSOCIATION IN R
Description of Association
Association rule mining finds interesting associations and relationships among large sets of data items. Such a rule shows how frequently an itemset occurs across the transactions and how often its items are bought together. A typical example is Market Basket Analysis.
Market Basket Analysis is one of the key techniques used by large retailers to uncover associations between items. It allows retailers to identify relationships between the items that people frequently buy together.
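To make "how frequently an itemset occurs" concrete, here is a minimal base-R sketch of support: the fraction of transactions that contain every item of an itemset. The baskets and the helper name itemset_support() are hypothetical, made up only for illustration.

```r
# Hypothetical toy data: one character vector of items per transaction
baskets <- list(
  c("bread", "milk"),
  c("bread", "butter", "milk"),
  c("butter", "eggs"),
  c("bread", "milk", "eggs")
)

# Hypothetical helper: support = share of transactions containing all items in `itemset`
itemset_support <- function(itemset, transactions) {
  mean(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}

itemset_support(c("bread", "milk"), baskets)  # 3 of 4 baskets -> 0.75
itemset_support("eggs", baskets)              # 2 of 4 baskets -> 0.5
```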
PROBLEM - Groceries Data Set
Source Code
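# Install the package once (skip if it is already installed), then load it
install.packages("arules")
library(arules)
# Load the built-in Groceries data, which is already of class "transactions"
data(Groceries)
class(Groceries)
inspect(head(Groceries, 2))
# Mine rules with minimum support 1% and minimum confidence 50%
grocery_rules <- apriori(Groceries, parameter = list(support = 0.01, confidence = 0.5))
inspect(head(sort(grocery_rules, by = "confidence"), 3))
# Keep only rules with "whole milk" on the RHS; default = "lhs" puts every other item on the LHS
wholemilk_rules <- apriori(data = Groceries, parameter = list(supp = 0.001, conf = 0.08),
                           appearance = list(default = "lhs", rhs = "whole milk"))
inspect(head(sort(wholemilk_rules, by = "confidence"), 3))
# Raising the minimum support to 2% typically returns fewer rules
grocery_rules_increased_support <- apriori(Groceries, parameter = list(support = 0.02, confidence = 0.5))
inspect(head(sort(grocery_rules_increased_support, by = "confidence"), 3))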
Output
To perform Association Rule Mining, we use the arules package in R.
The Groceries data set ships with the arules package.
Since association mining deals with transactions, the data has to be converted to an object of class transactions, which the arules package provides. This step matters because apriori() works on transactions; other inputs, such as a binary matrix or a data frame, are coerced to transactions first.
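When your own data is not already a transactions object, one common route in arules is coercion with as(). The sketch below assumes the raw baskets are stored as a list of character vectors; the item names are hypothetical.

```r
library(arules)

# Hypothetical baskets: one character vector of item labels per transaction
baskets <- list(
  c("bread", "milk"),
  c("bread", "butter", "milk"),
  c("butter", "eggs")
)

# Coerce the list to the sparse "transactions" class used by apriori()
trans <- as(baskets, "transactions")

trans           # prints a summary: 3 transactions (rows), 4 items (columns)
inspect(trans)  # lists the items in each transaction
size(trans)     # number of items per transaction
```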
Unlike with a data frame, calling head(Groceries) does not display the items in each transaction. To view the transactions, use the inspect() function instead.
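A quick way to see the difference: printing a subset of Groceries only reports its dimensions, inspect() lists the items, and as(..., "list") turns transactions back into ordinary R character vectors. This sketch indexes with [1:2], which selects the first two transactions just as head(Groceries, 2) does in the source code.

```r
library(arules)
data(Groceries)

first_two <- Groceries[1:2]

# Printing only shows a summary (number of transactions and items)
print(first_two)

# inspect() shows the actual item sets
inspect(first_two)

# Coerce to a plain list: one character vector of items per transaction
basket_list <- as(first_two, "list")
str(basket_list)
```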
Adjust the maxlen, supp and conf entries of the parameter list passed to apriori() to control the number of rules generated.
minval is the minimum value that the additional rule evaluation measure selected with arem must reach for a rule to be kept.
smax is the maximum support value for an itemset.
arem is an Additional Rule Evaluation Parameter. In the code above, the number of rules is constrained only by support and confidence; arem offers further ways to filter rules, using measures such as "diff", "quot", "aimp", "info" or "chi2".
aval is a logical indicating whether to return the additional rule evaluation measure selected with arem.
originalSupport is a logical. The traditional support of a rule counts both the LHS and RHS items; set originalSupport to FALSE to use only the LHS items when checking minimum support.
maxtime is the maximum time, in seconds, allowed for checking subsets (default 5; 0 disables the limit).
minlen is the minimum number of items required in the rule.
maxlen is the maximum number of items that can be present in the rule.
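All of the parameters above are entries of the parameter list passed to apriori(). The sketch below sets them explicitly on Groceries; the values are illustrative, and arem = "diff" is just one example measure (minval only has an effect when arem is not "none").

```r
library(arules)
data(Groceries)

rules <- apriori(
  Groceries,
  parameter = list(
    supp = 0.01, conf = 0.5,          # minimum support and confidence
    minlen = 2, maxlen = 4,           # 2 to 4 items per rule (LHS + RHS)
    smax = 1,                         # maximum support of an itemset
    arem = "diff", aval = TRUE,       # extra evaluation measure, returned in quality()
    minval = 0.1,                     # threshold for the arem measure
    originalSupport = TRUE,           # support counts LHS and RHS items
    maxtime = 5                       # seconds allowed for subset checking
  ),
  control = list(verbose = FALSE)
)

inspect(head(sort(rules, by = "confidence"), 3))
head(quality(rules))
```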
To get stronger rules, increase the confidence threshold. For longer rules, increase the maxlen parameter. To eliminate shorter rules, increase the minlen parameter.
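A rough way to see the effect of these settings is to mine Groceries several times and compare rule counts; a stricter threshold can only keep the same number of rules or fewer. The thresholds and the helper mine_rules() below are hypothetical choices for illustration, not values from the original analysis.

```r
library(arules)
data(Groceries)

# Hypothetical helper: run apriori() quietly with the given parameter entries
mine_rules <- function(...) {
  apriori(Groceries, parameter = list(...), control = list(verbose = FALSE))
}

baseline   <- mine_rules(supp = 0.01, conf = 0.3)
stronger   <- mine_rules(supp = 0.01, conf = 0.5)              # higher confidence threshold
no_short   <- mine_rules(supp = 0.01, conf = 0.3, minlen = 3)  # drop rules with fewer than 3 items
short_only <- mine_rules(supp = 0.01, conf = 0.3, maxlen = 2)  # keep rules with at most 2 items

c(baseline = length(baseline), stronger = length(stronger),
  no_short = length(no_short), short_only = length(short_only))
```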
Rules with a confidence of 1 imply that whenever the LHS items were purchased, the RHS item was also purchased, 100% of the time.
A rule with a lift of 3.91 implies that the LHS and RHS items are bought together 3.91 times as often as would be expected if they were unrelated (statistically independent).
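The two interpretations above follow from the definitions confidence(LHS => RHS) = support(LHS and RHS) / support(LHS) and lift = confidence / support(RHS). Here is a minimal base-R sketch on hypothetical toy baskets; the data and the helper rule_measures() are made up for illustration and do not reproduce the Groceries numbers.

```r
# Hypothetical toy data: one character vector of items per transaction
baskets <- list(
  c("bread", "milk"),
  c("bread", "butter", "milk"),
  c("butter", "eggs"),
  c("bread", "milk", "eggs"),
  c("bread", "butter")
)

# Hypothetical helper computing support, confidence and lift of LHS => RHS
rule_measures <- function(lhs, rhs, transactions) {
  supp <- function(items) mean(vapply(transactions, function(t) all(items %in% t), logical(1)))
  rule_support <- supp(c(lhs, rhs))         # P(LHS and RHS)
  confidence   <- rule_support / supp(lhs)  # P(RHS | LHS)
  lift         <- confidence / supp(rhs)    # P(RHS | LHS) / P(RHS)
  c(support = rule_support, confidence = confidence, lift = lift)
}

rule_measures("bread", "milk", baskets)
# support 0.6, confidence 0.75, lift 1.25
rule_measures(c("butter", "milk"), "bread", baskets)
# support 0.2, confidence 1 (every basket with butter and milk also has bread), lift 1.25
```

A lift above 1 means the items co-occur more often than independence would predict; a lift of exactly 1 means no association.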