Sorting with Tapes : Balanced Merge

Balanced merge is a type of sorting algorithm where data is stored on multiple tapes (or other forms of storage). The tapes are divided into two groups: odd-numbered tapes and even-numbered tapes. The data on each tape is initially sorted, and the algorithm’s goal is to merge the data from all of the tapes into a single sorted output.

Approach: Here’s an approach for the above algorithm:

The algorithm works by repeatedly making passes over the data, merging pairs of tapes at a time. On each pass, the tapes are divided into pairs, and each pair’s data is merged into a temporary output tape. After all, pairs have been processed, the data on the temporary output tapes are copied back to the original tapes, and the process is repeated with a new set of tape pairs until all of the data is on a single tape.

Illustration: Here is an illustration with the approach used:

Suppose we have the following list of data that we want to sort: data[] = [5, 2, 4, 6, 1, 3]

Here are the steps involved in the sorting process:

Step 1: Initialize the tapes and divide the data among them. We create three empty tapes and divide the data among them like this: tapes = [[], [], []]
for i, x in enumerate(data):
tapes[i % 3].append(x)
# tapes is now [[5, 4, 1], [2, 6, 3], []]

Step 2: Sort the data on each tape using an internal sorting technique. We can use the built-in sort function to sort the data on each:
for tape in tapes:
tape.sort()
# tapes is now [[1, 4, 5], [2, 3, 6], []]

Step 3: Repeatedly make passes over the tapes, merging pairs of tapes at a time. On each pass, we divide the tapes into pairs and merge the data from each pair into a temporary output tape using the merge function:
while len(tapes) > 1:
new_tapes = []
for i in range(0, len(tapes), 2):
tape1 = tapes[i]
tape2 = tapes[i+1] if i+1 < len(tapes) else None
new_tapes.append(merge(tape1, tape2))
tapes = new_tapes
# After the first pass, tapes is [[1, 2, 3, 4, 5, 6]]

Step 4: Copy the data from the temporary output tapes back to the original tapes. Since we only have one tape at this point, we can skip this step.

Step 5: Repeat the process until all of the data is on a single tape. Since we only have one tape at this point, the sorting process is complete.

The final result is a sorted version of the original data:
sorted_data = tapes[0]
# sorted_data is [1, 2, 3, 4, 5, 6]

The steps involved in implementing the balanced merge sorting algorithm are:

Initialize the tapes and divide the data among them.
Sort the data on each tape using an internal sorting technique.
Repeatedly make passes over the tapes, merging pairs of tapes at a time.
Copy the data from the temporary output tapes back to the original tapes.
Repeat the process until all of the data is on a single tape.

Here is the code that demonstrates this approach:

C++

// C++ code for the above approach
#include <bits/stdc++.h>

using namespace std;
 
vector<int> merge(vector<int> tape1, vector<int> tape2) {

    if (tape2.size() == 0) {

        return tape1;

    }
 
    // Merge the data from tape1 and tape2

    // into a temporary output tape

    vector<int> output_tape;
 
    int i = 0, j = 0;
 
    // Merge all the tapes left

    while (i < tape1.size() && j < tape2.size()) {

        if (tape1[i] < tape2[j]) {

            output_tape.push_back(tape1[i]);

            i++;

        } else {

            output_tape.push_back(tape2[j]);

            j++;

        }

    }

    output_tape.insert(output_tape.end(), tape1.begin() + i, tape1.end());

    output_tape.insert(output_tape.end(), tape2.begin() + j, tape2.end());

    return output_tape;
}
 
// Function for Balanced merge sort

vector<int> balanced_merge_sort(vector<int> data, int num_tapes) {

    // Initialize the tapes

    vector<vector<int>> tapes(num_tapes);
 
    // Divide the data among the tapes

    for (int i = 0; i < data.size(); i++) {

        tapes[i % num_tapes].push_back(data[i]);

    }
 
    // Sort each tape

    for (int i = 0; i < num_tapes; i++) {

        sort(tapes[i].begin(), tapes[i].end());

    }
 
    // Repeatedly make passes over the tapes,

    // merging pairs of tapes at a time

    while (tapes.size() > 1) {

        vector<vector<int>> new_tapes;

        for (int i = 0; i < tapes.size(); i += 2) {

            vector<int> tape1 = tapes[i];

            vector<int> tape2 = (i + 1 < tapes.size()) ? tapes[i + 1] : vector<int>();

            new_tapes.push_back(merge(tape1, tape2));

        }

        tapes = new_tapes;

    }
 
    // Return the final merged tape

    return tapes[0];
}
 
// Driver code

int main() 
{

    // Input

    vector<int> data = {5, 2, 4, 6, 1, 3};
 
    // Function call

    vector<int> sorted_data = balanced_merge_sort(data, 3);
 
    for (int i = 0; i < sorted_data.size(); i++) {

        cout << sorted_data[i] << " ";

    }

    cout << endl;

    return 0;
}
 
// This code is contributed by lokeshpotta20.

Java

// Java code for the above approach

import java.io.*;

import java.util.*;
 
class GFG {
 
  public static List<Integer> merge(List<Integer> tape1,

                                    List<Integer> tape2)

  {

    if (tape2.size() == 0) {

      return tape1;

    }
 
    // Merge the data from tape1 and tape2

    // into a temporary output tape

    List<Integer> outputTape = new ArrayList<>();
 
    int i = 0, j = 0;
 
    // Merge all the tapes left

    while (i < tape1.size() && j < tape2.size()) {

      if (tape1.get(i) < tape2.get(j)) {

        outputTape.add(tape1.get(i));

        i++;

      }

      else {

        outputTape.add(tape2.get(j));

        j++;

      }

    }

    outputTape.addAll(tape1.subList(i, tape1.size()));

    outputTape.addAll(tape2.subList(j, tape2.size()));

    return outputTape;

  }
 
  // Function for Balanced merge sort

  public static List<Integer>

    balancedMergeSort(List<Integer> data, int numTapes)

  {

    // Initialize the tapes

    List<List<Integer> > tapes = new ArrayList<>();

    for (int i = 0; i < numTapes; i++)

      tapes.add(new ArrayList<>());
 
    // Divide the data among the tapes

    for (int i = 0; i < data.size(); i++) {

      tapes.get(i % numTapes).add(data.get(i));

    }
 
    // Sort each tape

    for (int i = 0; i < numTapes; i++) {

      tapes.get(i).sort(Integer::compareTo);

    }
 
    // Repeatedly make passes over the tapes,

    // merging pairs of tapes at a time

    while (tapes.size() > 1) {

      List<List<Integer> > newTapes

        = new ArrayList<>();

      for (int i = 0; i < tapes.size(); i += 2) {

        List<Integer> tape1 = tapes.get(i);

        List<Integer> tape2

          = (i + 1 < tapes.size())

          ? tapes.get(i + 1)

          : new ArrayList<>();

        newTapes.add(merge(tape1, tape2));

      }

      tapes = newTapes;

    }
 
    // Return the final merged tape

    return tapes.get(0);

  }
 
  public static void main(String[] args)

  {

    // Input

    List<Integer> data = List.of(5, 2, 4, 6, 1, 3);
 
    // Function call

    List<Integer> sortedData

      = balancedMergeSort(data, 3);
 
    for (int i = 0; i < sortedData.size(); i++) {

      System.out.print(sortedData.get(i) + " ");

    }

    System.out.println();

  }
}
 
// This code is contributed by lokesh.

Python3

# Python code for the above approach
 
# Function for Balanced merge sort
 
def balanced_merge_sort(data, num_tapes):
 
    # Initialize the tapes

    tapes = [[] for _ in range(num_tapes)]
 
    # Divide the data among the tapes

    for i, x in enumerate(data):
 
        tapes[i % num_tapes].append(x)
 
    # Sort each tape

    for tape in tapes:

        tape.sort()
 
    # Repeatedly make passes over the tapes,

    # merging pairs of tapes at a time

    while len(tapes) > 1:
 
        new_tapes = []

        for i in range(0, len(tapes), 2):

            tape1 = tapes[i]

            tape2 = tapes[i + 1] if i + 1 < len(tapes) else None

            new_tapes.append(merge(tape1, tape2))

        tapes = new_tapes
 
    # Return the final merged tape

    return tapes[0]
 
def merge(tape1, tape2):

    if tape2 is None:

        return tape1
 
    # Merge the data from tape1 and tape2 

    # into a temporary output tape

    output_tape = []
 
    i = j = 0
 
    # Merge all the tapes left

    while i < len(tape1) and j < len(tape2):
 
        if tape1[i] < tape2[j]:

            output_tape.append(tape1[i])

            i += 1

        else:

            output_tape.append(tape2[j])

            j += 1

    output_tape.extend(tape1[i:])

    output_tape.extend(tape2[j:])

    return output_tape
 
# Driver code
 
# Input

data = [5, 2, 4, 6, 1, 3]
 
# Function call

sorted_data = balanced_merge_sort(data, 3)
 
print(sorted_data)

// C# code for the above approach

using System;

using System.Collections.Generic;
 
public class GFG {
 
  public static List<int> merge(List<int> tape1,

                                List<int> tape2)

  {

    if (tape2.Count == 0) {

      return tape1;

    }
 
    // Merge the data from tape1 and tape2

    // into a temporary output tape

    List<int> outputTape = new List<int>();
 
    int i = 0, j = 0;
 
    // Merge all the tapes left

    while (i < tape1.Count && j < tape2.Count) {

      if (tape1[i] < tape2[j]) {

        outputTape.Add(tape1[i]);

        i++;

      }

      else {

        outputTape.Add(tape2[j]);

        j++;

      }

    }

    outputTape.AddRange(

      tape1.GetRange(i, tape1.Count - i));

    outputTape.AddRange(

      tape2.GetRange(j, tape2.Count - j));

    return outputTape;

  }
 
  // Function for Balanced merge sort

  public static List<int>

    balancedMergeSort(List<int> data, int numTapes)

  {

    // Initialize the tapes

    List<List<int> > tapes = new List<List<int> >();

    for (int i = 0; i < numTapes; i++)

      tapes.Add(new List<int>());

    // Divide the data among the tapes

    for (int i = 0; i < data.Count; i++) {

      tapes[i % numTapes].Add(data[i]);

    }
 
    // Sort each tape

    for (int i = 0; i < numTapes; i++) {

      tapes[i].Sort();

    }
 
    // Repeatedly make passes over the tapes,

    // merging pairs of tapes at a time

    while (tapes.Count > 1) {

      List<List<int> > newTapes

        = new List<List<int> >();

      for (int i = 0; i < tapes.Count; i += 2) {

        List<int> tape1 = tapes[i];

        List<int> tape2 = (i + 1 < tapes.Count)

          ? tapes[i + 1]

          : new List<int>();

        newTapes.Add(merge(tape1, tape2));

      }

      tapes = newTapes;

    }
 
    // Return the final merged tape

    return tapes[0];

  }
 
  static public void Main()

  {
 
    // Input

    List<int> data = new List<int>{ 5, 2, 4, 6, 1, 3 };
 
    // Function call

    List<int> sortedData = balancedMergeSort(data, 3);
 
    for (int i = 0; i < sortedData.Count; i++) {

      Console.Write(sortedData[i] + " ");

    }

    Console.WriteLine();

  }
}
 
// This code is contributed by lokeshmvs21.

Javascript

// JavaScript code for the above approach
 
// Function to merge two sorted tapes

function merge(tape1, tape2) {

    // If one of the tapes is empty, 

    // return the other tape

    if (tape2.length == 0) {

        return tape1;

    }
 
    // Merge the data from tape1 and tape2

    // into a temporary output tape

    let output_tape = [];
 
    let i = 0, j = 0;
 
    // Merge all the tapes left

    while (i < tape1.length && j < tape2.length) {

        if (tape1[i] < tape2[j]) {

            output_tape.push(tape1[i]);

            i++;

        } else {

            output_tape.push(tape2[j]);

            j++;

        }

    }

    output_tape.push(...tape1.slice(i));

    output_tape.push(...tape2.slice(j));

    return output_tape;
}
 
// Function for Balanced merge sort

function balanced_merge_sort(data, num_tapes) {

    // Initialize the tapes

    let tapes = new Array(num_tapes);

    for (let i=0; i<num_tapes; i++) {

        tapes[i] = [];

    }
 
    // Divide the data among the tapes

    for (let i = 0; i < data.length; i++) {

        tapes[i % num_tapes].push(data[i]);

    }
 
    // Sort each tape

    for (let i = 0; i < num_tapes; i++) {

        tapes[i].sort(function(a, b) {

            return a - b;

        });

    }
 
    // Repeatedly make passes over the tapes,

    // merging pairs of tapes at a time

    while (tapes.length > 1) {

        let new_tapes = [];

        for (let i = 0; i < tapes.length; i += 2) {

            let tape1 = tapes[i];

            let tape2 = (i + 1 < tapes.length) ? tapes[i + 1] : [];

            new_tapes.push(merge(tape1, tape2));

        }

        tapes = new_tapes;

    }
 
    // Return the final merged tape

    return tapes[0];
}
 
// Driver code

let data = [5, 2, 4, 6, 1, 3]; // Input
let sorted_data = balanced_merge_sort(data, 3);
 
console.log(sorted_data); // Output

Output

[1, 2, 3, 4, 5, 6]

Time Complexity: O(n log n)
Auxiliary Space: O(n)

Advantages of Balanced Merge:

Can take advantage of parallelism to speed up the sorting process
The time complexity of O(n log n), which is generally considered to be efficient for large amounts of data
Can handle large amounts of data

Disadvantages of Balanced Merge:

Requires additional space to store the data on the tapes and the temporary output tapes during the merge process
May not be the most efficient choice for smaller datasets

Benefits of using this over other sorting techniques:

Good choice for sorting large amounts of data when multiple tapes (or other forms of sequential storage) are available
Can be used to take advantage of parallelism to speed up the sorting process
Maybe a good choice when you need to sort data in an external storage device (such as a tape drive or hard drive)

Article Tags :

DSA

Sorting

Technical Scripter

Technical Scripter 2022