In this article, we introduce the SON algorithm and show how it can be expressed with map-reduce. We cover the First Map and First Reduce functions, followed by the Second Map and Second Reduce functions.
The SON algorithm :
The SON algorithm lends itself well to a parallel-computing environment. Each of the chunks can be processed in parallel, and the frequent itemsets from each chunk are combined to form the candidates.
You can distribute the candidates to many processors, have each processor count the support for each candidate in a subset of the baskets, and finally sum those supports to get the support for each candidate itemset in the whole dataset.
This process does not have to be implemented in map-reduce, but there is a natural way of expressing each of the two passes as a map-reduce operation. We summarize this map-reduce sequence below.
- First Map Function :
Take the assigned subset of the baskets and find the itemsets that are frequent in that subset using the simple randomized algorithm. As in that algorithm, lower the support threshold from s to ps if each map task gets fraction p of the total input file. The output is a set of key-value pairs (F, 1), where F is a frequent itemset from the sample. The value is always 1 and is irrelevant.
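As a rough illustration of the first map step, here is a minimal Python sketch. It is not a definitive implementation: the function name `first_map`, the brute-force counting, and the `max_size` cap on itemset size are all assumptions made for this example, and a real map task would typically run an in-memory algorithm such as Apriori on its chunk instead.

```python
from collections import Counter
from itertools import combinations


def first_map(chunk, s, p, max_size=2):
    """Emit (itemset, 1) for every itemset that is frequent in this chunk.

    chunk    -- list of baskets, each an iterable of items
    s        -- support threshold for the whole dataset
    p        -- fraction of the whole input that this chunk represents
    max_size -- hypothetical cap on itemset size (keeps brute force cheap)
    """
    local_threshold = p * s  # lowered threshold for the sample
    counts = Counter()
    for basket in chunk:
        items = sorted(set(basket))  # canonical order so tuples compare equal
        for k in range(1, max_size + 1):
            for itemset in combinations(items, k):
                counts[itemset] += 1
    # The value 1 carries no information; only the key matters.
    return [(itemset, 1) for itemset, c in counts.items() if c >= local_threshold]


chunk = [["milk", "bread"], ["milk", "bread", "eggs"], ["milk"], ["bread", "eggs"]]
pairs = first_map(chunk, s=4, p=0.5)  # local threshold is 0.5 * 4 = 2
```

Here the pair ("eggs", "milk") occurs in only one basket, so it falls below the local threshold of 2 and is not emitted.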
- First Reduce Function :
Each reduce task is assigned a set of keys, which are itemsets. The value is ignored, and the reduce task simply outputs those itemsets that appear one or more times. Thus, the output of the first reduce function is the set of candidate itemsets.
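The first reduce step amounts to removing duplicates among the keys. A minimal sketch, assuming the map outputs arrive as a flat list of (itemset, 1) pairs and that itemsets are represented as sorted tuples (both are assumptions for this example, as is the name `first_reduce`):

```python
def first_reduce(map_outputs):
    """Return the distinct itemsets seen as keys; the values are ignored."""
    candidates = set()
    for itemset, _ in map_outputs:
        candidates.add(itemset)
    return sorted(candidates)  # sorted only to make the output deterministic


# ("bread",) was reported as locally frequent by two different map tasks.
map_outputs = [
    (("bread",), 1),
    (("milk",), 1),
    (("bread",), 1),
    (("bread", "milk"), 1),
]
candidates = first_reduce(map_outputs)
```

An itemset reported by several map tasks appears only once among the candidates.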
- Second Map Function :
The map tasks for the second map function take all the output from the first reduce function (the candidate itemsets) and a portion of the input data file. Each map task counts the number of occurrences of each candidate itemset among the baskets in the portion of the dataset that it was assigned. The output of the second map function is a set of key-value pairs (C, v), where:
C – It is one of the candidate itemsets.
v – It is the support for that itemset among the baskets that were input to this map task.
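The counting done by each second map task can be sketched as follows. The name `second_map` and the tuple representation of itemsets are assumptions for illustration. This sketch also emits pairs with a count of 0, which is a simplification; skipping them would not change the final sums.

```python
def second_map(candidates, chunk):
    """Emit (C, v): candidate C and its support v within this chunk."""
    basket_sets = [set(basket) for basket in chunk]
    output = []
    for candidate in candidates:
        needed = set(candidate)
        # A basket supports the candidate if it contains every item in it.
        v = sum(1 for basket in basket_sets if needed <= basket)
        output.append((candidate, v))
    return output


candidates = [("bread",), ("bread", "milk"), ("eggs",)]
chunk = [["milk", "bread"], ["bread"], ["milk"]]
counts = second_map(candidates, chunk)
```

Every map task receives the full candidate list but only its own portion of the baskets, so the values v are partial supports.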
- Second Reduce Function :
The reduce tasks take the itemsets they are given as keys and sum the associated values. The result is the total support for each of the itemsets that the reduce task was assigned to handle. Those itemsets whose sum of values is at least s are frequent in the whole dataset, so the reduce task outputs these itemsets together with their counts. Itemsets that do not have total support of at least s are not passed to the output of the reduce task.
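A minimal sketch of the second reduce step, assuming the (C, v) pairs from all map tasks are available as one list (the name `second_reduce` and the input layout are assumptions for this example):

```python
from collections import defaultdict


def second_reduce(pairs, s):
    """Sum the partial supports per itemset and keep those with total >= s."""
    totals = defaultdict(int)
    for itemset, v in pairs:
        totals[itemset] += v
    return {itemset: total for itemset, total in totals.items() if total >= s}


# Partial supports for two candidates, as reported by two map tasks.
pairs = [
    (("bread",), 2),
    (("milk",), 1),
    (("bread",), 3),
    (("milk",), 2),
]
frequent = second_reduce(pairs, s=4)
```

With s = 4, ("bread",) has total support 5 and is kept, while ("milk",) has total support 3 and is dropped.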