Prune-and-Search | A Complexity Analysis Overview
The word “prune” means to reduce something by removing things that are not necessary. Prune-and-Search is an excellent algorithmic paradigm, built on this idea, for solving various optimization problems. It was first suggested by Nimrod Megiddo in 1983. The approach proceeds in iterations. At each iteration, it discards a fraction, say f, of the input data that cannot affect the answer, and then invokes the same algorithm recursively on the remaining data.
The main idea of this approach is to reduce the search space by pruning a fraction of the input elements and recursing on the remaining valid input elements. After enough iterations, the input becomes so small that it can be solved by brute force in constant time c’.
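Time Complexity Analysis for such algorithms:
Let the time required in each iteration be O(n^k), where n is the size of the input data and k is some constant.
Let f be the fraction of data removed in each iteration. Then the running time satisfies the recurrence
T(n) = T((1-f)n) + O(n^k)
For sufficiently large n, there is a constant c such that
T(n) <= T((1-f)n) + c*n^k
<= T((1-f)^2 * n) + c*n^k + c*(1-f)^k * n^k
<= c’ + c*n^k + c*(1-f)^k * n^k + c*(1-f)^(2k) * n^k + ... + c*(1-f)^(pk) * n^k
= c’ + c*n^k * (1 + (1-f)^k + (1-f)^(2k) + ... + (1-f)^(pk))
where p is the number of iterations performed before the input becomes small enough for brute force. Since 0 < 1-f < 1 and k > 0, we have (1-f)^k < 1, so the geometric series in parentheses is bounded by 1/(1-(1-f)^k), a constant that does not depend on n. Hence
T(n) <= c’ + c*n^k / (1-(1-f)^k)
Therefore, T(n) = O(n^k).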
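To see the bound numerically, the short Python sketch below evaluates the recurrence T(n) = T((1-f)n) + c*n^k directly, rounding the surviving input size down to an integer. The function name `recurrence_cost`, the choice c = 1 and the base cost T(n) = 1 for n <= 1 are illustrative assumptions, not part of the original analysis. For k >= 1 the result should never exceed 1 + n^k/(1-(1-f)^k), the geometric-series bound.

```python
def recurrence_cost(n: int, f: float, k: int, c: float = 1.0, base: float = 1.0) -> float:
    """Evaluate T(n) = T(floor((1-f)*n)) + c*n^k, with T(n) = base for n <= 1.

    Hypothetical helper for illustration: it unrolls the recurrence iteratively.
    """
    total = 0.0
    while n > 1:
        total += c * n ** k      # cost of one prune-and-search iteration
        n = int((1 - f) * n)     # prune: only a (1-f) fraction survives
    return total + base          # brute-force cost on the tiny remainder


for f, k in [(0.5, 1), (0.25, 2)]:
    n = 1024
    bound = 1 + n ** k / (1 - (1 - f) ** k)
    print(f, k, recurrence_cost(n, f, k), bound)
```

For example, with f = 1/2, k = 1 and n = 1024 the iterations cost 1024 + 512 + ... + 2 = 2046, plus 1 for the base case, which stays below the bound 1 + 1024/(1/2) = 2049.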
This derivation shows that the time complexity of the whole process is of the same order as the time complexity of a single prune-and-search iteration. This approach can be used to analyze algorithms for many well-known problems, such as Binary Search, finding the Kth largest/smallest element of an unsorted array (the Selection Problem), the 1-center problem (Smallest Enclosing Circle), and linear programming in two variables.
1. Binary Search:
As we know, this technique is applied to a sorted list to find the index of a particular value (say ‘val‘). We compare val with the middle element. If the middle element is equal to val, we return its index. Otherwise, we prune the half of the list that cannot contain val (the left half if val is larger than the middle element, the right half if it is smaller) and apply the same technique to the remaining elements.
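A minimal iterative sketch of the procedure just described, assuming a list sorted in ascending order and returning `None` when val is absent (the function name and the `None` convention are our own choices):

```python
from typing import Optional, Sequence


def binary_search(arr: Sequence[int], val: int) -> Optional[int]:
    """Return an index of val in the ascending sequence arr, or None if absent."""
    lo, hi = 0, len(arr) - 1          # current search range [lo, hi]
    while lo <= hi:
        mid = (lo + hi) // 2          # compare val with the middle element only
        if arr[mid] == val:
            return mid
        if arr[mid] < val:
            lo = mid + 1              # val can only be to the right: prune the left half
        else:
            hi = mid - 1              # val can only be to the left: prune the right half
    return None


print(binary_search([1, 3, 5, 7, 9, 11], 7))   # 3
print(binary_search([1, 3, 5], 4))             # None
```

Keeping two indices instead of slicing the list means each iteration really does O(1) work, which is what the analysis below assumes.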
Time Complexity Analysis:
In each step, val is compared with only the middle element, so the step takes O(1) time; call it c (some constant). Half of the list is then removed, so T(n) = T(n/2) + O(1) if n >= 2, and otherwise T(n) = O(1) = c.
In simple terms,
T(n) = T(n/2) + c
= T(n/4) + c + c
= T(n/8) + c + c + c
= T(1) + c + … + c + c
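= m*c, where m is the number of iterations (the halvings plus the final step on a single element).
Since half of the input is discarded in each iteration, m is at most floor(log2(n)) + 1, so m grows with n rather than being a constant. This is the k = 0 case of the general analysis: each iteration costs O(n^0) = O(1), and the series ratio (1-f)^0 = 1 is no longer less than 1, so the total is proportional to the number of iterations. Therefore, the worst-case complexity of Binary Search is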
T(n) = O(log(n)).
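The count of iterations can be checked by evaluating the recurrence T(n) = T(n/2) + c directly with integer halving. This sketch assumes c = 1 and T(1) = 1, so the result is exactly the number of iterations; the helper name `binary_search_steps` is hypothetical. It agrees with floor(log2(n)) + 1, which in Python is `n.bit_length()` for positive n.

```python
def binary_search_steps(n: int) -> int:
    """Evaluate T(n) = T(n // 2) + 1 with T(1) = 1, i.e. count the iterations."""
    steps = 1                  # the final step on a single element
    while n > 1:
        n //= 2                # one halving of the remaining input
        steps += 1
    return steps


for n in (1, 2, 1000, 10**6):
    print(n, binary_search_steps(n), n.bit_length())
```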
2. The Selection Problem:
Given an unordered list of n elements, the task is to find the Kth smallest element in the list. The most basic approach is to sort the list in ascending order and directly pick the Kth value. Sorting takes O(n*log(n)) time in general, and retrieving the Kth value takes O(1), so this approach runs in T(n) = O(n*log(n)) overall.
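For reference, the sorting-based approach fits in a few lines of Python. K is taken as 1-based here, and the function name is our own:

```python
from typing import Sequence


def kth_smallest_by_sorting(s: Sequence[int], k: int) -> int:
    """Return the kth smallest element of s (1-based k) by sorting: O(n log n)."""
    if not 1 <= k <= len(s):
        raise ValueError("k must be in 1..len(s)")
    return sorted(s)[k - 1]    # sorted() copies, so s is left unchanged


print(kth_smallest_by_sorting([7, 2, 9, 4, 1], 2))   # 2
```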
The second approach is the QuickSelect method, i.e. the Prune-and-Search technique. The basic idea of this prune-and-search selection algorithm is to identify a fraction of the elements that cannot contain the Kth element and discard it from the next iterations. To obtain an O(n) algorithm, we need a method that can prune away a constant fraction of the elements in O(n) time in each iteration. Let P be an element of the list that partitions the given list S into two parts, S1 and S2, such that S1 contains all elements less than or equal to P and S2 contains all elements greater than P. Now we can say that
- If |S1| == K, then P, the largest element of S1, is the Kth smallest element.
- If |S1| > K, then the Kth element must be present in S1. So discard the second part S2 and recurse on S1 with the same algorithm.
- Otherwise, the Kth element must be in S2. So discard S1 and recurse on S2 for (K-|S1|)th element in S2.
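The three cases translate into the loop below. This sketch deliberately picks P at random with `random.choice`, which is a swapped-in pivot rule rather than the median-of-medians rule discussed next, so it runs in expected O(n) time but O(n^2) in the worst case. It also splits S1 into elements strictly less than P and copies of P; without that split, a list of repeated values could keep |S1| > K forever.

```python
import random
from typing import List


def quickselect(s: List[int], k: int) -> int:
    """Return the kth smallest element of s (1-based k) using a random pivot P."""
    if not 1 <= k <= len(s):
        raise ValueError("k must be in 1..len(s)")
    while True:
        p = random.choice(s)                    # assumption: random pivot, not median of medians
        less = [x for x in s if x < p]          # part of S1 strictly below P
        n_equal = sum(1 for x in s if x == p)   # copies of P (the rest of S1)
        size_s1 = len(less) + n_equal           # |S1| = number of elements <= P
        if k <= len(less):
            s = less                            # Kth element lies in S1 below P: prune S2
        elif k <= size_s1:
            return p                            # Kth element is P itself (covers |S1| == K)
        else:
            s = [x for x in s if x > p]         # prune S1, keep S2
            k -= size_s1                        # look for the (K - |S1|)th element of S2


print(quickselect([7, 2, 9, 4, 1, 4, 8], 4))    # 4
```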
The key point here is how to select P so that we can always discard a fraction of S, no matter whether we prune S1 or S2. Ideally, P should be the median of S. However, finding the median is itself a special case of this problem (K = n/2).
Instead, an approximate median that is good enough as a pivot can be computed efficiently using the following algorithm:
- Divide the list into n/5 sublists, each containing at most 5 elements.
- Sort each sublist and find its median; since each sublist has at most 5 elements, this takes constant time per sublist.
- Find the median of these n/5 medians by applying the same selection algorithm recursively. The resulting median of medians, P, is a good pivot for the Quick Selection Algorithm. [Note that Insertion sort is a good choice for sorting the small sublists of size 5.]
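The first two bullets can be sketched as follows: split the list into groups of at most 5, insertion-sort each group, and collect the group medians. The helper names are hypothetical, and taking the lower median of an even-sized last group is our own convention. The third bullet, selecting the median of the returned list recursively, is left out here because it needs the full selection algorithm.

```python
from typing import List


def insertion_sort(group: List[int]) -> List[int]:
    """Sort a small list in place with insertion sort and return it."""
    for i in range(1, len(group)):
        key = group[i]
        j = i - 1
        while j >= 0 and group[j] > key:   # shift larger elements one slot right
            group[j + 1] = group[j]
            j -= 1
        group[j + 1] = key
    return group


def group_medians(s: List[int]) -> List[int]:
    """Split s into groups of at most 5, sort each, and return each group's median."""
    medians = []
    for start in range(0, len(s), 5):
        group = insertion_sort(s[start:start + 5])      # slicing copies, so s is untouched
        medians.append(group[(len(group) - 1) // 2])    # lower median for an even-sized group
    return medians


print(group_medians([9, 1, 8, 2, 7, 3, 6, 4, 5, 0, 11, 10]))   # [7, 4, 10]
```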
Why only 5?
Splitting the list into groups of 5 guarantees a worst-case split of roughly 70-30. At least half of the medians are greater than or equal to the median of medians P, so at least half of the n/5 groups contain at least 3 elements (the group median and the two elements above it) that are greater than or equal to P. This gives roughly 3n/10 elements on that side of P, which implies that the other partition contains at most about 7n/10 elements in the worst case, i.e. T(n) = T(n/5) + T(7n/10) + O(n). Since n/5 + 7n/10 = 9n/10 < n, T(n) = O(n) in the worst case. Let P be the median of medians, which can be represented as-
[Figure: the sorted sublists drawn as columns of 5 elements (rows 1 to 5), ordered by their medians (row 3), with the median of medians P in the middle column.]
As shown here, at least one-fourth of the elements in S are less than or equal to P, and at least one-fourth of the elements in S are greater than or equal to P. (This is a looser but simpler bound than the roughly 3n/10 derived above.) Thus, if we choose P in this way, we can always prune away at least one-fourth of the elements in each iteration. Hence, splitting S into sublists of size 5 is an efficient way to find a good pivot. Now we can state the algorithm as-
Prune-and-Search Algorithm to find Kth smallest element
Input: A set S of n elements and K.
Output: The Kth smallest element in S.
Step-1: If |S| <= 5, apply any Brute Force method.
Step-2: Divide S into n/5 sublists, each containing at most 5 elements.
Step-3: Sort each sublist (Insertion sort is a good choice here).
Step-4: Recursively apply this algorithm to the n/5 sublist medians to find P, their median (the median of medians).
Step-5: Partition S into S1 and S2 such that S1 contains all elements less than or equal to P and S2 contains all elements greater than P.
Step-6: Now there can be three cases as
a) If |S1| == K, then P, the largest element of S1, is the Kth smallest element.
b) If |S1| > K, then the Kth element must be present in S1. So discard the second part S2 and recurse on S1 with the same algorithm.
c) Otherwise, the Kth element must be in S2. So discard S1 and recurse on S2 for the (K-|S1|)th element in S2.
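Steps 1 to 6 can be sketched in Python as below, assuming 1-based K and a list of mutually comparable elements. The names `select` and `insertion_sort` are our own, and this is not the implementation from the article referenced next. One detail deviates slightly from Steps 5 and 6: S1 is kept as the elements strictly less than P plus a count of the copies of P, so that lists with repeated values cannot make the recursion stall.

```python
from typing import List


def insertion_sort(group: List[int]) -> List[int]:
    """Sort a small list in place with insertion sort and return it."""
    for i in range(1, len(group)):
        key = group[i]
        j = i - 1
        while j >= 0 and group[j] > key:
            group[j + 1] = group[j]
            j -= 1
        group[j + 1] = key
    return group


def select(s: List[int], k: int) -> int:
    """Return the kth smallest element of s (1-based k) in worst-case O(n) time."""
    if not 1 <= k <= len(s):
        raise ValueError("k must be in 1..len(s)")
    # Step-1: small inputs are solved by brute force.
    if len(s) <= 5:
        return insertion_sort(list(s))[k - 1]
    # Steps 2-3: split into sublists of at most 5 elements and sort each one.
    groups = [insertion_sort(s[i:i + 5]) for i in range(0, len(s), 5)]
    medians = [g[(len(g) - 1) // 2] for g in groups]
    # Step-4: P is the (lower) median of the medians, found with this same algorithm.
    p = select(medians, (len(medians) + 1) // 2)
    # Step-5: partition around P. S1 = `less` plus the copies of P; S2 = `greater`.
    less = [x for x in s if x < p]
    greater = [x for x in s if x > p]
    size_s1 = len(s) - len(greater)        # |S1| = number of elements <= P
    # Step-6: keep only the part that can contain the Kth element.
    if k <= len(less):
        return select(less, k)             # case b): answer lies in S1, below P
    if k <= size_s1:
        return p                           # case a) (and ties): answer is P
    return select(greater, k - size_s1)    # case c): (K - |S1|)th element of S2


print(select([(i * 37) % 101 for i in range(101)], 50))   # 49
```

Both `less` and `greater` exclude P, so every recursive call works on a strictly smaller list even when all elements are equal.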
For detailed code and implementation, visit K’th Smallest/Largest Element in Unsorted Array | Set 3 (Worst-Case Linear Time)
Complexity Analysis: Since each sublist contains at most 5 elements, sorting it takes a constant amount of time. Thus, steps 2, 3, and 5 can be done in O(n) time, and step 4 needs T(n/5) time because we use the same algorithm recursively to find the median of the n/5 medians. Since we prune at least n/4 elements in each iteration, at most 3n/4 elements remain after each iteration in the worst case, so the recursive call in step 6 needs at most T(3n/4) time. Hence, T(n) = T(3n/4) + T(n/5) + O(n).
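Before the algebraic argument, a numerical check of this recurrence can be reassuring. The sketch below evaluates T(n) = T(3n/4) + T(n/5) + n with floor division, assuming a constant factor of 1 for the linear work and T(n) = n for n <= 5; the function name `selection_cost` is hypothetical. Since 3/4 + 1/5 = 19/20, an induction on n shows T(n) <= 20n, where 20 = 1/(1 - 19/20), so the printed ratio T(n)/n should stay below 20.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def selection_cost(n: int) -> int:
    """Evaluate T(n) = T(3n // 4) + T(n // 5) + n, with T(n) = n for n <= 5."""
    if n <= 5:
        return n                      # brute-force base case (Step-1)
    # T(3n/4): recursive call in Step-6; T(n/5): median of medians in Step-4;
    # n: the linear work of Steps 2, 3 and 5 (constant factor taken as 1).
    return selection_cost(3 * n // 4) + selection_cost(n // 5) + n


for n in (10, 1000, 10**6):
    print(n, selection_cost(n) / n)
```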
Let T(n) = a0 + a1*n + a2*(n^2) + …. where all coefficients ai are non-negative and a1 != 0.
T(3n/4) = a0 + (3/4)a1*n + (9/16)a2*(n^2) + ….
T(n/5) = a0 + (1/5)a1*n + (1/25)a2*(n^2) + ….
T((3n/4) + (n/5)) = T(19n/20) = a0 + (19/20)a1*n + (361/400)a2*(n^2) + ….
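Since (3/4)^i + (1/5)^i <= (3/4 + 1/5)^i = (19/20)^i for every i >= 1 (for example, 9/16 + 1/25 = 241/400 <= 361/400), comparing the expansions term by term gives T(3n/4) + T(n/5) <= a0 + T(19n/20).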
T(n) = T(3n/4) + T(n/5) + O(n)
<= T(19n/20) + cn.
Applying the general formula derived at the beginning, with f = 1/20 and k = 1, to this inequality,
we get T(n) = O(n).
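The coefficient comparison used above can be verified with exact rational arithmetic. This sketch uses Python's `fractions.Fraction` to compare, for each degree i, the coefficient multiplying ai*n^i in T(3n/4) + T(n/5), namely (3/4)^i + (1/5)^i, with the one in T(19n/20), namely (19/20)^i. The helper name `coefficient_pairs` is hypothetical, and, as in the derivation, the coefficients ai are assumed to be non-negative.

```python
from fractions import Fraction


def coefficient_pairs(max_degree: int):
    """For each degree i >= 1, return (i, (3/4)^i + (1/5)^i, (19/20)^i) exactly."""
    a, b = Fraction(3, 4), Fraction(1, 5)
    return [(i, a ** i + b ** i, (a + b) ** i) for i in range(1, max_degree + 1)]


for i, lhs, rhs in coefficient_pairs(4):
    print(i, lhs, rhs, lhs <= rhs)
```

Degree 1 gives equality (19/20 on both sides) and degree 2 gives 241/400 against 361/400. The constant terms contribute 2*a0 on both sides of T(3n/4) + T(n/5) <= a0 + T(19n/20).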
Thus, we have a worst-case linear-time algorithm for the selection problem based on the Prune-and-Search technique. Similar strategies can be applied to linear programming in two variables and to the Smallest Enclosing Circle problem.