Association Rule Learning and Apriori Algorithm Explanation with an Example

What is Association Rule Learning?

Beyza Cevik
2 min read · Dec 1, 2020

It is a rule-based machine learning technique.

What does it do?

It examines occurrences of events together and discovers relations between variables in databases.

Why is it useful?

It is a strong option for finding patterns in categorical data, and the discovered rules can inform new business strategies.

What are the key notations and measures of association rules?

Let's briefly cover the essential notions of association rule learning with the following example:

Information: A customer who buys cereals tends to buy milk. (cereals => milk)

Item: Each product in the basket {Item1, Item2, …, Itemk}.
ex: Item1 = milk, etc.

Itemset: Group of items bought together in a transaction t.

Let freq(X) be the number of transactions containing itemset X, and N be the total number of transactions.

Support: freq(Item1, Item2) / N (the fraction of transactions containing both items)

Confidence: freq(Item1, Item2) / freq(Item1) (how often the rule Item1 => Item2 holds when Item1 is bought)

Support and confidence are the primary measures of an association rule; Lift, Conviction, All-confidence, etc. can also be leveraged.
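The two measures above can be computed directly from a transaction database. Below is a minimal sketch; the transaction contents are invented for illustration (the cereals/milk items come from the example rule above, the rest are assumptions):

```python
# Hypothetical transaction database (invented for illustration only)
transactions = [
    {"milk", "cereals", "bread"},
    {"milk", "cereals"},
    {"milk", "eggs"},
    {"cereals", "bread"},
]
N = len(transactions)

def freq(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset):
    """Fraction of transactions containing the itemset."""
    return freq(itemset) / N

def confidence(antecedent, consequent):
    """How often the rule antecedent => consequent holds."""
    return freq(antecedent | consequent) / freq(antecedent)

print(support({"cereals"}))               # 0.75
print(confidence({"cereals"}, {"milk"}))  # 0.666... (2 of 3 cereal buyers also buy milk)
```

With this toy data, cereals appear in 3 of 4 baskets, and 2 of those 3 baskets also contain milk.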

How do we generate Association Rules?

There are algorithms such as Apriori and Eclat to generate association rules. We will learn about Apriori Algorithm.

Apriori Algorithm

Frequent Itemset: an itemset whose support is at least a given threshold.

This algorithm relies on the Apriori property: all subsets of a frequent itemset must also be frequent. So any candidate with an infrequent subset can be discarded without counting it in the database.

Two main steps are repeated:

  1. Join: Generate candidate itemsets.
  2. Prune: If the support value of an itemset is lower than the threshold, remove it.
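The two steps above can be sketched as short functions. This is a simplified illustration, not a full implementation; the function names `join` and `prune` mirror the step names, and the example data is assumed:

```python
from itertools import combinations

def join(prev_frequent, k):
    """Join step: build candidate k-itemsets from the frequent (k-1)-itemsets.
    The Apriori property lets us drop any candidate that has an
    infrequent (k-1)-subset before ever counting it in the database."""
    items = sorted({i for s in prev_frequent for i in s})
    return [frozenset(c) for c in combinations(items, k)
            if all(frozenset(sub) in prev_frequent
                   for sub in combinations(c, k - 1))]

def prune(candidates, transactions, min_support):
    """Prune step: keep only candidates whose frequency meets the threshold."""
    return {c for c in candidates
            if sum(1 for t in transactions if c <= t) >= min_support}

# Example: from the frequent 2-itemsets {A,C}, {B,C}, {B,E}, {C,E},
# the join step yields a single candidate 3-itemset, {B, C, E}.
L2 = {frozenset(p) for p in [("A", "C"), ("B", "C"), ("B", "E"), ("C", "E")]}
print(join(L2, 3))  # [frozenset({'B', 'C', 'E'})] (element order may vary)
```

Note how the join step already exploits the Apriori property: {A, B, C} is never generated because its subset {A, B} is not frequent.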

For example;

Suppose our database contains 4 transactions over 5 different items.

Min. threshold support: 2

  1. The frequent 1-itemset L1 does not contain D, since freq(D) < min threshold.
  2. We join and generate the candidate 2-itemsets C2 by combining items, then count each candidate's occurrences in the database.
  3. We prune the itemsets with low support (supp. val < min threshold) and get L2.
  4. We join L2 and create the single candidate 3-itemset C3 = {B, C, E}. Its support value is 2, as {B, C, E} occurs 2 times in the database.
  5. As {B, C, E} is the only 3-itemset with sufficient support, we have found our frequent itemset {B, C, E}.
  6. Rules derived from it can be added to the strong rule set after computing and checking their confidence values.
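The full walkthrough can be reproduced end to end. The transaction contents below are an assumption chosen to match the counts described above (4 transactions, 5 items, freq(D) = 1, and {B, C, E} occurring twice); the original table is not shown in this text:

```python
from itertools import combinations

# Hypothetical database consistent with the walkthrough above.
transactions = [
    {"A", "C", "D"},
    {"B", "C", "E"},
    {"A", "B", "C", "E"},
    {"B", "E"},
]
min_support = 2  # minimum number of occurrences

def apriori(transactions, min_support):
    """Return all frequent itemsets via iterated join and prune steps."""
    items = {i for t in transactions for i in t}
    # L1: frequent 1-itemsets (this is where D drops out).
    frequent = {frozenset([i]) for i in items
                if sum(1 for t in transactions if i in t) >= min_support}
    result, k = set(frequent), 2
    while frequent:
        # Join: candidate k-itemsets from items still in play.
        candidate_items = sorted({i for s in frequent for i in s})
        candidates = [frozenset(c) for c in combinations(candidate_items, k)]
        # Prune: keep candidates meeting the support threshold.
        frequent = {c for c in candidates
                    if sum(1 for t in transactions if c <= t) >= min_support}
        result |= frequent
        k += 1
    return result

frequent_itemsets = apriori(transactions, min_support)
print(sorted(max(frequent_itemsets, key=len)))  # ['B', 'C', 'E']
```

Running this recovers {B, C, E} as the largest frequent itemset, matching the result of the manual steps.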

To adapt this to a real-life example, let's say B is a camera, C a memory card, and E a battery. You could then place these items on the same shelf, or discount one of them to increase sales.
