Open In App

Optimization for Unordered Map O(1) operations

Last Updated : 25 Apr, 2024
Improve
Improve
Like Article
Like
Save
Share
Report

C++ provides convenient data structures like std::set and std::map, tree-based structures with operations taking O(log n) time. With C++11, hash-based alternatives were introduced: std::unordered_set and std::unordered_map. However, concerns arise regarding potential vulnerabilities and efficiency in real-world scenarios. In this article, we’re going to explore strategies to optimize unordered maps for O(1) operations and address security issues.

Implementation of Unordered Map:

To comprehend the vulnerabilities, it is very important to know the implementation details of std::unordered_map. The GCC implementation reveals the use of _Mod_range_hashing and _Prime_rehash_policy. The hash function involves hashing the input and applying modulo with a prime number to determine the position in the hash table. The _Prime_rehash_policy contains a list of primes stored in __prime_list. These prime numbers are used to modulo the input and get the correct position in the hash table. Also, before taking the modulo of the input with these prime numbers, Unordered map uses a hash funtion to hash the input values. The hash used by Unordered map is std::hash which is simply the identity function for integers. Depending on the compiler version, specific primes trigger resizing during operations. For instance, 126271 works for GCC 6 or earlier, and 107897 is suitable for GCC 7 or later.

Understanding Vulnerabilities of Unordered Map:

Unordered map is presumed to have O(1) time complexity per operation, assuming minimal collisions. The assumption is valid for random input data but becomes precarious with non-random or adversarial inputs during hacking attempts. If the hash function is known, attackers can craft inputs causing O(n^2) collisions, compromising the expected constant-time complexity of unordered map operations.

Vulnerable Implementation for Unordered Map:

A simple code snippet illustrates the vulnerability by measuring execution time for different primes:

C++
#include <ctime>
#include <iostream>
#include <unordered_map>

const int N = 2e5;

void insert_numbers(long long x)
{
    clock_t begin = clock();
    std::unordered_map<long long, int> numbers;

    for (int i = 1; i <= N; i++)
        numbers[i * x] = i;

    long long sum = 0;

    for (const auto& entry : numbers)
        sum += (entry.first / x) * entry.second;

    std::cout << "x = " << x << ": " << std::fixed
            << static_cast<double>(clock() - begin) / CLOCKS_PER_SEC
            << " seconds, sum = " << sum << "\n";
}

int main()
{
    insert_numbers(107897);
    insert_numbers(126271);

    return 0;
}
// Note: differences in language semantics, standard libraries, data structure
//implementations, and time measurement mechanisms can lead to 
//variations in execution time and output.
Java
import java.util.HashMap;
import java.util.Map;

public class Main {
    static final int N = 200000;

    static void insertNumbers(long x) {
        long startTime = System.nanoTime();
        Map<Long, Integer> numbers = new HashMap<>();

        for (int i = 1; i <= N; i++) {
            numbers.put((long) i * x, i);
        }

        long sum = 0;

        for (Map.Entry<Long, Integer> entry : numbers.entrySet()) {
            sum += (entry.getKey() / x) * entry.getValue();
        }

        double elapsedTime = (System.nanoTime() - startTime) / 1e9;
        System.out.printf("x = %d: %.6f seconds, sum = %d%n", x, elapsedTime, sum);
    }

    public static void main(String[] args) {
        insertNumbers(107897);
        insertNumbers(126271);
    }
}
// Note: differences in language semantics, standard libraries, data structure
//implementations, and time measurement mechanisms can lead to 
//variations in execution time and output.
Python
import time

N = int(2e5)

def insert_numbers(x):
    # Measure the start time
    begin = time.time()
    # Initialize an empty dictionary to store numbers and their corresponding indices
    numbers = {}

    # Populate the dictionary with numbers and their indices
    for i in range(1, N + 1):
        numbers[i * x] = i

    # Initialize sum
    sum_val = 0

    # Calculate the sum by iterating through the dictionary
    for key, value in numbers.items():
        sum_val += (key // x) * value

    # Calculate the time taken and print the result using the format method
    print("x = {}: {:.6f} seconds, sum = {}".format(x, time.time() - begin, sum_val))

def main():
    # Call insert_numbers function with different values of x
    insert_numbers(107897)
    insert_numbers(126271)

if __name__ == "__main__":
    main()
# Note: differences in language semantics, standard libraries, data structure
# implementations, and time measurement mechanisms can lead to 
# variations in execution time and output.
C#
using System;
using System.Collections.Generic;
using System.Diagnostics;

public class Program
{
    const int N = 200000;

    // Method to insert numbers into a dictionary and calculate sum
    static void InsertNumbers(long x)
    {
        // Start measuring time
        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();

        // Dictionary to store key-value pairs
        Dictionary<long, int> numbers = new Dictionary<long, int>();

        // Insert numbers into the dictionary
        for (int i = 1; i <= N; i++)
        {
            numbers[i * x] = i;
        }

        // Variable to store the sum
        long sum = 0;

        // Calculate sum using dictionary entries
        foreach (var entry in numbers)
        {
            sum += (entry.Key / x) * entry.Value;
        }

        // Stop measuring time
        stopwatch.Stop();

        // Output the result
        Console.WriteLine($"x = {x}: {stopwatch.Elapsed.TotalSeconds} seconds, sum = {sum}");
    }

    // Main method, entry point of the program
    public static void Main(string[] args)
    {
        // Call InsertNumbers method with different values
        InsertNumbers(107897);
        InsertNumbers(126271);
    }
}

// Note: differences in language semantics, standard libraries, data structure
//implementations, and time measurement mechanisms can lead to 
//variations in execution time and output.
JavaScript
// Import the 'performance-now' module to measure time in milliseconds
const { performance } = require('perf_hooks');

const N = 2e5;

// Function to insert numbers into a dictionary and calculate their sum
function insertNumbers(x) {
    // Measure the start time
    const begin = performance.now();
    // Initialize an empty object to store numbers and their corresponding indices
    const numbers = {};

    // Populate the object with numbers and their indices
    for (let i = 1; i <= N; i++) {
        numbers[i * x] = i;
    }

    // Initialize sum
    let sum = 0;

    // Calculate the sum by iterating through the object
    for (const key in numbers) {
        sum += Math.floor(key / x) * numbers[key];
    }

    // Calculate the time taken and print the result using the 'toFixed' method
    console.log(`x = ${x}: ${(performance.now() - begin).toFixed(6)} milliseconds, sum = ${sum}`);
}

// Main function
function main() {
    // Call insertNumbers function with different values of x
    insertNumbers(107897);
    insertNumbers(126271);
}

// Call the main function to execute the code
main();

// Note: differences in language semantics, standard libraries, data structure
//implementations, and time measurement mechanisms can lead to 
//variations in execution time and output.

Output:

x = 107897: 0.062000 seconds, sum = 2666686666700000
x = 126271: 56.123000 seconds, sum = 2666686666700000

Secure Unordered Map from Collision Attacks:

To secure unordered map from collision attacks, a custom hash function can be used. We know that any deterministic hash function can be hacked to produce a large number of collisions, so the first thing we should do is add some non-determinism. The standard hash function can be modified to introduce non-determinism, making it harder to hack:

C++
struct custom_hash {
    size_t operator()(uint64_t x) const
    {
        static const uint64_t FIXED_RANDOM
            = chrono::steady_clock::now()
                  .time_since_epoch()
                  .count();
        return x + FIXED_RANDOM;
    }
};
Python3
import time


class CustomHash:
    def __init__(self):
        # Using time since epoch as a fixed random seed
        self.FIXED_RANDOM = int(time.time() * 1e9)

    def __call__(self, x):
        return x + self.FIXED_RANDOM


# Test the custom hash function
custom_hash = CustomHash()
print(custom_hash(123))  # Example usage: hashing the value 123

However, adding a fixed number to every input doesn’t ensure complete security. A more robust approach is using a hash function like splitmix64 for better unpredictability:

C++
struct custom_hash {
    static uint64_t splitmix64(uint64_t x)
    {
        // Implementation of splitmix64
        x += 0x9e3779b97f4a7c15;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9;
        x = (x ^ (x >> 27)) * 0x94d049bb133111eb;
        return x ^ (x >> 31);
    }

    size_t operator()(uint64_t x) const
    {
        static const uint64_t FIXED_RANDOM
            = chrono::steady_clock::now()
                  .time_since_epoch()
                  .count();
        return splitmix64(x + FIXED_RANDOM);
    }
};
Java
public class CustomHash {
    // Method to generate custom hash
    public long generateHash(long x) {
        return 0; // Always return 0
    }

    // Example usage
    public static void main(String[] args) {
        CustomHash customHash = new CustomHash();
        long hashedValue = customHash.generateHash(123);
        System.out.println(hashedValue); // Output should be 0
    }
}
Python3
import time

class CustomHash:
    @staticmethod
    def splitmix64(x):
        # Implementation of splitmix64
        x += 0x9e3779b97f4a7c15
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9
        x = (x ^ (x >> 27)) * 0x94d049bb133111eb
        return x ^ (x >> 31)

    def __init__(self):
        self.FIXED_RANDOM = int(time.time() * 1e6)

    def __call__(self, x):
        return self.splitmix64(x + self.FIXED_RANDOM)
JavaScript
class CustomHash {
    static splitmix64(x) {
        // Implementation of splitmix64
        x += 0x9e3779b97f4a7c15;
        x = (x ^ (x >>> 30)) * 0xbf58476d1ce4e5b9;
        x = (x ^ (x >>> 27)) * 0x94d049bb133111eb;
        return x ^ (x >>> 31);
    }

    constructor() {
        // Generate a fixed random number based on current time
        this.FIXED_RANDOM = Math.floor(Date.now() * 1e6);
    }

    // Method to generate custom hash
    generateHash(x) {
        return CustomHash.splitmix64(x + this.FIXED_RANDOM); // Corrected call to static method
    }
}

// Example usage
const customHash = new CustomHash();
const hashedValue = customHash.generateHash(123);
console.log(hashedValue);

Better Implementation of Unordered Map:

The custom hash function can be integrated into the unordered map or other hash tables like gp_hash_table. Below is a better implementation of the unordered map which almost guarantees O(1) time for all operations like Insertion, deletion, etc.

C++
#include <bits/stdc++.h>

struct custom_hash {
    static uint64_t splitmix64(uint64_t x)
    {
        // Implementation of splitmix64
        x += 0x9e3779b97f4a7c15;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9;
        x = (x ^ (x >> 27)) * 0x94d049bb133111eb;
        return x ^ (x >> 31);
    }

    size_t operator()(uint64_t x) const
    {
        static const uint64_t FIXED_RANDOM
            = std::chrono::steady_clock::now()
                  .time_since_epoch()
                  .count();
        return splitmix64(x + FIXED_RANDOM);
    }
};

const int N = 2e5;

void insert_numbers(long long x)
{
    clock_t begin = clock();
    std::unordered_map<long long, int, custom_hash> numbers;

    for (int i = 1; i <= N; i++)
        numbers[i * x] = i;

    long long sum = 0;

    for (const auto& entry : numbers)
        sum += (entry.first / x) * entry.second;

    std::cout << "x = " << x << ": " << std::fixed
              << static_cast<double>(clock() - begin) / CLOCKS_PER_SEC
              << " seconds, sum = " << sum << "\n";
}

int main()
{
    insert_numbers(107897);
    insert_numbers(126271);

    return 0;
}
Java
import java.util.*;
import java.time.*;

public class Main {
    // Define the size of the numbers
    static final int N = 200000;

    // Custom hash function
    static class CustomHash {
        long splitmix64(long x) {
            // Implementation of splitmix64
            x += 0x9e3779b97f4a7c15L;
            x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9L;
            x = (x ^ (x >> 27)) * 0x94d049bb133111ebL;
            return x ^ (x >> 31);
        }

        public int hashCode(long x) {
            long FIXED_RANDOM = Instant.now().toEpochMilli();
            return (int) splitmix64(x + FIXED_RANDOM);
        }
    }

    // Function to insert numbers and calculate the sum
    static void insertNumbers(long x) {
        long begin = System.currentTimeMillis();
        HashMap<Long, Integer> numbers = new HashMap<>(N, 1.0f);

        for (int i = 1; i <= N; i++)
            numbers.put(i * x, i);

        long sum = 0;

        for (Map.Entry<Long, Integer> entry : numbers.entrySet())
            sum += (entry.getKey() / x) * entry.getValue();

        System.out.printf("x = %d: %.2f seconds, sum = %d\n", x, (System.currentTimeMillis() - begin) / 1000.0, sum);
    }

    // Main function
    public static void main(String[] args) {
        insertNumbers(107897);
        insertNumbers(126271);
    }
}
Python3
import time
import hashlib

class CustomHash:
    @staticmethod
    def splitmix64(x):
        # Implementation of splitmix64
        x += 0x9e3779b97f4a7c15
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9
        x = (x ^ (x >> 27)) * 0x94d049bb133111eb
        return x ^ (x >> 31)

    def __call__(self, x):
        # FIXED_RANDOM is set to the current time since epoch
        FIXED_RANDOM = int(time.time() * 1000)
        return self.splitmix64(x + FIXED_RANDOM)

def insert_numbers(x):
    begin = time.process_time()  # or time.perf_counter()
    numbers = {}
    custom_hash = CustomHash()

    for i in range(1, N + 1):
        numbers[i * x] = i

    sum_val = 0

    for key, value in numbers.items():
        sum_val += (key // x) * value

    print(f"x = {x}: {time.process_time() - begin:.6f} seconds, sum = {sum_val}")

if __name__ == "__main__":
    N = int(2e5)
    insert_numbers(107897)
    insert_numbers(126271)
JavaScript
class CustomHash {
    static splitmix64(x) {
        // Implementation of splitmix64
        x += 0x9e3779b97f4a7c15n;
        x = (x ^ (x >> 30n)) * 0xbf58476d1ce4e5b9n;
        x = (x ^ (x >> 27n)) * 0x94d049bb133111ebn;
        return Number(x ^ (x >> 31n));
    }

    static customHash(x) {
        // FIXED_RANDOM is set to the current time since epoch
        const FIXED_RANDOM = Date.now();
        return this.splitmix64(BigInt(x) + BigInt(FIXED_RANDOM));
    }
}

function insertNumbers(x) {
    const begin = Date.now();
    const numbers = new Map();

    for (let i = 1; i <= N; i++) {
        numbers.set(BigInt(i) * BigInt(x), i);
    }

    let sum = 0n;

    for (const [key, value] of numbers) {
        sum += (key / BigInt(x)) * BigInt(value);
    }

    console.log(`x = ${x}: ${(Date.now() - begin) / 1000} seconds, sum = ${sum}`);
}

const N = 200000;
insertNumbers(107897);
insertNumbers(126271);

Output:

x = 107897: 0.068000 seconds, sum = 2666686666700000
x = 126271: 0.064000 seconds, sum = 2666686666700000

Conclusion:

Making unordered map operations efficient (O(1)) requires dealing with security issues and creating strong hash functions. By ensuring security and adding randomness, developers can use unordered map with confidence, achieving good performance and avoiding vulnerabilities to hacks.



Like Article
Suggest improvement
Share your thoughts in the comments

Similar Reads