# Clustering/Partitioning an array such that sum of square differences is minimum

Given an array of n numbers and a number k, we need to divide the array into k contiguous partitions (clusters) of the same or different lengths. For a given k, there can be one or more ways to form the clusters. We define a function Cost(S) for a cluster S as the square of the difference between its last and first elements: if the current cluster is $S = \{a_i, a_{i+1}, \ldots, a_{i+l-1}\}$, where $l$ is the length of the cluster, then $\mathrm{Cost}(S) = (a_{i+l-1} - a_i)^2$.
Amongst all possible partitions $S_1, S_2, \ldots, S_k$, we have to find the one that minimizes the function

$$f(x) = \sum_{t=1}^{k} \mathrm{Cost}(S_t).$$

Example:

Input : arr[] = {1, 5, 8, 10}
k = 2
Output : 20
Explanation :
Consider clustering 4 elements 1, 5, 8, 10
into 2 clusters. There are three options:
1. S1 = {1}, S2 = {5, 8, 10}, with total cost
$(1-1)^2 + (10-5)^2 = 25$.
2. S1 = {1, 5}, S2 = {8, 10}, with total cost
$(5-1)^2 + (10-8)^2 = 20$.
3. S1 = {1, 5, 8}, S2 = {10}, with total cost
$(8-1)^2 + (10-10)^2 = 49$.
The optimal clustering is therefore the second one,
and the output of the above problem is 20.

Input : arr[] = {5, 8, 1, 10}
k = 3
Output : 9
Explanation :
The three partitions are {5, 8}, {1} and {10},
with total cost $(8-5)^2 + (1-1)^2 + (10-10)^2 = 9$.



To solve the problem, imagine that we have slabs (dividers) to insert between adjacent elements. Placing k-1 slabs in k-1 of the n-1 gaps of the array gives a partition into k clusters, and the placement with the minimum value of f(x) is the answer.

Naive solution:

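If we solve the problem naively, we simply try every possible partition and take the minimum cost. The recursive function below fixes the end of each partition in turn and updates the answer whenever exactly k partitions cover the whole array.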

```cpp
// C++ program to find the minimum cost of k partitions
// of an array (naive recursion).
#include <algorithm>
#include <iostream>
using namespace std;

const int inf = 1000000000;
int ans = inf;

// Generates all possible partitions and records the
// minimum of all costs in the global ans.
// i           --> index of the last element of the previous partition
// par         --> number of partitions made so far
// a[] and n   --> input array and its size
// k           --> required number of partitions
// current_ans --> cost of the partitions made so far
void solve(int i, int par, int a[], int n, int k, int current_ans)
{
    // More than k partitions: abandon this branch
    if (par > k)
        return;

    // We have made k partitions and reached the last element
    if (par == k && i == n - 1)
    {
        ans = min(ans, current_ans);
        return;
    }

    // 1) The next partition starts at a[i+1]; try every end point j.
    // 2) For every end point, increase the partition count "par" by 1.
    // 3) Before the recursive call, add the cost of the
    //    partition a[i+1..j] to current_ans.
    for (int j = i + 1; j < n; j++)
        solve(j, par + 1, a, n, k,
              current_ans + (a[j] - a[i + 1]) * (a[j] - a[i + 1]));
}

// Driver code
int main()
{
    int k = 2;
    int a[] = {1, 5, 8, 10};
    int n = sizeof(a) / sizeof(a[0]);
    solve(-1, 0, a, n, k, 0);  // i = -1: the first partition starts at a[0]
    cout << ans << endl;
    return 0;
}
```


Output:

20


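Time Complexity: The algorithm is exponential. Each recursive call corresponds to a distinct increasing sequence of cut positions in a[0..n-1], so there are at most $2^n$ calls, each doing $O(n)$ work, for $O(n \cdot 2^n)$ time in the worst case.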

Dynamic Programming:

We create a table dp[n+1][k+1] and initialize all values as infinite.

dp[i][j] stores optimal partition cost
for arr[0..i-1] and j partitions.


Let us compute the value of dp[i][j]. We take an index m < i and place the last slab just before a[m], so that the last cluster is a[m..i-1] with no slab inside it. The cost of this choice is dp[m][j-1] + (a[i-1]-a[m])*(a[i-1]-a[m]), where the first term is the minimum cost of splitting a[0..m-1] into j-1 partitions and the second is the cost of the current cluster. dp[i][j] is the minimum of this expression over all possible m:

$$dp[i][j] = \min_{0 \le m < i} \Big( dp[m][j-1] + (a[i-1]-a[m])^2 \Big), \qquad dp[0][0] = 0.$$
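As a hand check of the recurrence (a worked example we computed, not part of the original article), here is the full table for the first example, a = {1, 5, 8, 10} and k = 2, with rows j and columns i. Row j = 0 and column i = 0 hold the base cases.

$$
\begin{array}{c|ccccc}
 & i=0 & i=1 & i=2 & i=3 & i=4 \\ \hline
j=0 & 0 & \infty & \infty & \infty & \infty \\
j=1 & \infty & 0 & 16 & 49 & 81 \\
j=2 & \infty & \infty & 0 & 9 & 20
\end{array}
$$

The answer cell, for instance, expands as

$$
dp[4][2] = \min\big(dp[3][1] + (10-10)^2,\; dp[2][1] + (10-8)^2,\; dp[1][1] + (10-5)^2,\; dp[0][1] + (10-1)^2\big) = \min(49,\, 20,\, 25,\, \infty) = 20.
$$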

```cpp
// C++ program to find the minimum cost of k partitions
// of an array (dynamic programming).
#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;
const int inf = 1000000000;

// Returns the minimum cost of partitioning a[] into
// k clusters.
int minCost(int a[], int n, int k)
{
    // dp[i][j] stores the optimal partition cost for
    // a[0..i-1] split into j partitions; every entry
    // starts as infinite. (std::vector replaces the
    // non-standard variable-length array int dp[n+1][k+1].)
    vector<vector<int>> dp(n + 1, vector<int>(k + 1, inf));

    // Base case: an empty prefix with zero partitions costs nothing
    dp[0][0] = 0;

    // Fill dp[][] bottom up. i is the current ending
    // position: after the i-th iteration the result for
    // a[0..i-1] is computed.
    for (int i = 1; i <= n; i++)

        // j is the number of partitions
        for (int j = 1; j <= k; j++)

            // m is the start of the last partition a[m..i-1]
            for (int m = i - 1; m >= 0; m--)
            {
                // Skip prefixes that cannot form j-1 partitions
                // (this also avoids overflowing inf + cost)
                if (dp[m][j - 1] == inf)
                    continue;
                dp[i][j] = min(dp[i][j], dp[m][j - 1] +
                               (a[i - 1] - a[m]) * (a[i - 1] - a[m]));
            }

    return dp[n][k];
}

// Driver code
int main()
{
    int k = 2;
    int a[] = {1, 5, 8, 10};
    int n = sizeof(a) / sizeof(a[0]);
    cout << minCost(a, n, k) << endl;
    return 0;
}
```


Output:

20


Time Complexity: The three nested loops run over n values of i, k values of j and up to n values of m, so the algorithm takes $O(n^2 k)$ time, and the table needs $O(nk)$ space.

This article is contributed by Amritya Vagmi.
