In this article, we introduce the SON algorithm and show how it can be expressed with map-reduce. We cover the first map and first reduce functions, followed by the second map and second reduce functions.
The SON algorithm :
The SON algorithm lends itself well to a parallel-computing environment. Each of the chunks can be processed in parallel, and the frequent itemsets from each chunk are combined to form the candidates.
You can distribute the candidates to many processors, have each processor count the support for each candidate in a subset of the baskets, and finally sum those supports to get the support for each candidate itemset in the complete dataset.
This process does not have to be carried out in map-reduce, but there is a natural way of expressing each of the two passes as a map-reduce operation. We summarize this map-reduce sequence below.
- First Map Function :
Take the assigned subset of the baskets and find the itemsets that are frequent in that subset using the simple randomized algorithm. As in that algorithm, lower the support threshold from s to ps if each map task gets fraction p of the total input file. The output is a set of key-value pairs (F, 1), where F is a frequent itemset from the sample. The value is always 1 and is irrelevant.
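As a rough illustration of the first map function, here is a minimal Python sketch. The names `frequent_itemsets` and `first_map`, the `max_size` cap, and the level-wise (Apriori-style) in-memory search are assumptions made for this example; the article does not prescribe a particular in-memory algorithm or implementation.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(baskets, threshold, max_size=3):
    """Level-wise search for itemsets whose support in `baskets` is >= threshold.

    Hypothetical helper: it stands in for whatever in-memory algorithm
    a map task would run on its chunk.
    """
    baskets = [frozenset(b) for b in baskets]
    frequent = []
    # Level 1 candidates: every single item seen in the chunk.
    candidates = {frozenset([item]) for b in baskets for item in b}
    size = 1
    while candidates and size <= max_size:
        counts = Counter()
        for basket in baskets:
            for cand in candidates:
                if cand <= basket:  # candidate is contained in the basket
                    counts[cand] += 1
        level = {c for c, n in counts.items() if n >= threshold}
        frequent.extend(level)
        size += 1
        # Next-level candidates: unions of frequent sets, kept only if
        # every subset one item smaller is itself frequent.
        unions = {a | b for a in level for b in level if len(a | b) == size}
        candidates = {
            c for c in unions
            if all(frozenset(sub) in level for sub in combinations(c, size - 1))
        }
    return frequent


def first_map(chunk, s, p):
    """Emit (F, 1) for every itemset F frequent in this chunk at threshold p * s."""
    return [(itemset, 1) for itemset in frequent_itemsets(chunk, p * s)]


# Example chunk holding half of the data (p = 0.5) with global threshold s = 4.
chunk = [{"bread", "milk"}, {"bread", "milk", "beer"}, {"bread"}, {"milk", "beer"}]
pairs = first_map(chunk, s=4, p=0.5)
```

In this sketch the local threshold is p * s = 2, so an itemset must appear in at least two of the four baskets of the chunk to be emitted.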
- First Reduce Function :
Each reduce task is assigned a set of keys, which are itemsets. The value is ignored, and the reduce task simply produces those itemsets that appear one or more times. Thus, the output of the first reduce function is the candidate itemsets.
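The first reduce step can be sketched in Python as follows. The helper names `shuffle`, `first_reduce`, and `candidate_itemsets` are hypothetical, and the grouping step only imitates in memory what a map-reduce framework would do between the map and reduce phases.

```python
from collections import defaultdict


def shuffle(pairs):
    """Group values by key, imitating the framework's shuffle phase (assumed helper)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def first_reduce(itemset, values):
    """Ignore the values: a key that arrives at all was frequent in some chunk."""
    return itemset


def candidate_itemsets(pairs):
    """Run shuffle + first reduce over the (F, 1) pairs from all map tasks."""
    return {first_reduce(key, values) for key, values in shuffle(pairs).items()}


# (F, 1) pairs as two hypothetical map tasks might have produced them.
map_output = [
    (frozenset({"bread"}), 1),
    (frozenset({"bread", "milk"}), 1),
    (frozenset({"bread"}), 1),
]
candidates = candidate_itemsets(map_output)
```

An itemset reported by several map tasks, such as {bread} here, appears only once among the candidates.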
- Second Map Function :
The map tasks for the second map function take all the output from the first reduce function (the candidate itemsets) and a portion of the input data file. Each map task counts the number of occurrences of each of the candidate itemsets among the baskets in the portion of the dataset that it was assigned. The output of the second map function is a set of key-value pairs (C, v), where:
C – one of the candidate itemsets.
v – the support for that itemset among the baskets that were input to the map task.
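A minimal Python sketch of the second map function might look like this. The name `second_map` and the choice to emit a pair even when the local count is zero are assumptions made for illustration.

```python
def second_map(candidates, chunk):
    """Emit (C, v): v is the number of baskets in this chunk that contain candidate C."""
    baskets = [frozenset(b) for b in chunk]
    output = []
    for cand in candidates:
        cand = frozenset(cand)
        support = sum(1 for basket in baskets if cand <= basket)
        output.append((cand, support))
    return output


# Candidates from the first pass and one portion of the input file.
example_candidates = [frozenset({"bread"}), frozenset({"bread", "milk"})]
example_chunk = [{"bread", "milk"}, {"bread"}, {"milk"}]
local_counts = dict(second_map(example_candidates, example_chunk))
```

Every map task receives the full candidate list but only its own portion of the baskets, so the counts it emits are partial supports.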
- Second Reduce Function :
The reduce tasks take the itemsets they are given as keys and sum the associated values. The result is the total support for each of the itemsets that the reduce task was assigned to handle. Those itemsets whose sum of values is at least s are frequent in the entire dataset, so the reduce task outputs these itemsets with their counts. Itemsets that do not have total support of at least s are not transmitted to the output of the reduce task.
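The second reduce step can be sketched in Python as below. The function name `second_reduce` and the in-memory summation are assumptions for this example; a real map-reduce framework would deliver the values grouped by key.

```python
from collections import defaultdict


def second_reduce(pairs, s):
    """Sum the partial supports per itemset and keep those with total >= s."""
    totals = defaultdict(int)
    for itemset, partial_support in pairs:
        totals[itemset] += partial_support
    return {itemset: total for itemset, total in totals.items() if total >= s}


# (C, v) pairs as two hypothetical second-pass map tasks might emit them.
bread = frozenset({"bread"})
bread_milk = frozenset({"bread", "milk"})
second_map_output = [(bread, 2), (bread_milk, 1), (bread, 1), (bread_milk, 1)]
frequent = second_reduce(second_map_output, s=3)
```

With s = 3, {bread} has total support 2 + 1 = 3 and is kept, while {bread, milk} has total support 1 + 1 = 2 and is dropped.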