The reduce() method is a higher-order function that takes all the elements in a collection (Array, List, etc.) and combines them using a binary operation to produce a single value. The operation should be associative, because reduce() does not guarantee the order in which elements are combined. Spark's RDD reduce() additionally requires it to be commutative. The binary operation is usually passed to reduce() as an anonymous function.
val l = List(2, 5, 3, 6, 4, 7)
// returns the largest number from the collection
l.reduce((x, y) => x max y)
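The order in which the reduce method combines elements is unspecified. It is not random, but it may vary between collection types and may be nondeterministic for parallel collections or Spark RDDs. This is why non-associative operations (and, in Spark, non-commutative ones) should be avoided: with such an operation, a different order can produce a different result.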
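To make the point about associativity concrete, here is a minimal sketch using the list from the example above. It uses reduceLeft and reduceRight only because they fix the grouping explicitly (reduce itself does not promise one), and compares a non-associative operator, subtraction, with max.

```scala
val nums = List(2, 5, 3, 6, 4, 7)

// Subtraction is not associative, so the grouping changes the answer.
val leftToRight = nums.reduceLeft(_ - _)   // ((((2 - 5) - 3) - 6) - 4) - 7
val rightToLeft = nums.reduceRight(_ - _)  // 2 - (5 - (3 - (6 - (4 - 7))))

// max is associative (and commutative), so any grouping gives the same answer.
val maxLeft  = nums.reduceLeft(_ max _)
val maxRight = nums.reduceRight(_ max _)

println(s"subtraction: $leftToRight vs $rightToLeft")  // subtraction: -23 vs -9
println(s"max: $maxLeft vs $maxRight")                 // max: 7 vs 7
```

If reduce(_ - _) were used instead, the answer would depend on whichever grouping the collection happens to choose, which is exactly the situation the text warns against.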
In the above program, the reduce method combines the elements pairwise: it finds the larger value of a pair, then compares that intermediate result with the remaining elements (or with other intermediate results) until a single maximum value remains. Because max is associative and commutative, the result is 7 regardless of the order used. We generally use the reduce() method together with the map() method when working with Resilient Distributed Datasets (RDDs) in Spark. The map() method transforms a collection into another collection, while the reduce() method combines its elements into a single result (in Spark terms, map is a transformation and reduce is an action).
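The pairwise combination described above can be observed by recording each call to the operator. This is a minimal sketch; the ListBuffer named calls is a hypothetical helper added only for tracing. A collection of n elements always needs n - 1 calls, but the particular pairs printed reflect whatever order the current standard library uses, which reduce() does not guarantee. The last lines show a map() step followed by reduce() on an ordinary local List, mirroring the map-then-reduce pattern used with Spark RDDs.

```scala
import scala.collection.mutable.ListBuffer

val l = List(2, 5, 3, 6, 4, 7)

// Record every (x, y) pair passed to the operator (tracing only).
val calls = ListBuffer.empty[(Int, Int)]
val largest = l.reduce { (x, y) =>
  calls += ((x, y))
  x max y
}

calls.foreach { case (x, y) => println(s"max($x, $y) = ${x max y}") }
println(s"largest = $largest")  // largest = 7

// map() transforms each element, then reduce() combines the results.
val sumOfSquares = l.map(x => x * x).reduce(_ + _)
println(s"sum of squares = $sumOfSquares")  // sum of squares = 139
```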
Finding the average using map() and reduce():
Output:
(21, 4)
Average= 5.25
In the above program, every element of the collection is transformed into a tuple with two elements: the first is the number itself and the second is a counter, initially set to 1. reduce() then adds the tuples component-wise, so the result is itself a tuple with two elements: the first is the sum of the numbers and the second is the number of elements. Dividing the sum by the count gives the average, here 21 / 4 = 5.25.
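The program for this example is not reproduced in the text, so the following is only a minimal sketch of the approach just described. The input List(1, 5, 7, 8) is hypothetical, chosen because its four values sum to 21 and therefore match the output shown. Note that Scala's default tuple formatting prints (21,4) without a space.

```scala
// Hypothetical input: four values that sum to 21.
val collection = List(1, 5, 7, 8)

// Pair each element with a counter of 1, then add the pairs component-wise.
val result = collection
  .map(x => (x, 1))
  .reduce((a, b) => (a._1 + b._1, a._2 + b._2))

// Convert to Double before dividing to avoid integer division.
val average = result._1.toDouble / result._2

println(result)                 // (21,4)
println(s"Average= $average")   // Average= 5.25
```

Converting the sum to Double matters: result._1 / result._2 on two Ints would truncate to 5.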
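Note: The result of the reduce() method has the same type as the elements of the collection (strictly, it may be a supertype of the element type). That is why the average example first maps each number to a (number, 1) tuple: the tuples being combined and the final (sum, count) result then share the same type.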
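As a small illustration of this note, here is a minimal sketch: reduce() on a List[String] yields a String, and a (sum, count) pair can only come out of reduce() after the elements have been mapped to tuples. foldLeft appears only as an alternative technique for comparison (it is not used in the original example); its initial value may have a different type from the elements.

```scala
val words = List("map", "and", "reduce")

// The operator takes and returns Strings, so the result is a String.
val joined: String = words.reduce((a, b) => a + " " + b)

// reduce cannot turn a List[Int] directly into a (sum, count) pair, because the
// operator must accept and return the element type. Mapping to tuples first
// makes the element type (Int, Int).
val nums = List(1, 5, 7, 8)
val pairResult: (Int, Int) =
  nums.map(x => (x, 1)).reduce((a, b) => (a._1 + b._1, a._2 + b._2))

// foldLeft starts from an initial value whose type can differ from the elements.
val folded: (Int, Int) =
  nums.foldLeft((0, 0)) { case ((sum, count), x) => (sum + x, count + 1) }

println(joined)      // map and reduce
println(pairResult)  // (21,4)
println(folded)      // (21,4)
```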