Python 3.6 dictionary implementation using hash tables


Dictionary in Python is a collection of data values, used to store data values like a map, which unlike other Data Types that hold only single value as an element, Dictionary holds key:value pair. Key value is provided in the dictionary to make it more optimized. Each key-value pair in a Dictionary is separated by a colon :, whereas each key is separated by a ‘comma’. To know more about dictionary click here.

Based on a proposal by Raymond Hettinger the new dict() function has 20% to 25% less memory usage compared to python v.3.5 or less. It relies upon the order-preserving semantics proposed by Raymond Hettinger. This implementation makes the dictionaries more compact and provides a faster iteration over them.

The memory layout of dictionaries in earlier versions was unnecessarily inefficient. It comprised of a sparse table of 24-byte entries that stored the hash value, the key pointer, and the value pointer. The memory layout of dictionaries in version 3.5 and less were implemented to store in a single sparse table.

Example:

for the below dictionary:

d = {'banana':'yellow', 'grapes':'green', 'apple':'red'}

used to store as:

    entries = [['--', '--', '--'],
               [-5850766811922200084, 'grapes', 'green'],
               ['--', '--', '--'],
               ['--', '--', '--'],
               ['--', '--', '--'],
               [2247849978273412954, 'banana', 'yellow'],
               ['--', '--', '--'],
               [-2069363430498323624, 'apple', 'red']]

Instead, in the new dict() implementation the data is now being organized in a dense table referenced by a sparse table of indices as follows:



 indices  = [None, 1, None, None, None, 0, None, 2]
 enteries = [[2247849978273412954, 'banana', 'yellow']
             [-5850766811922200084, 'grapes', 'green'],
             [-2069363430498323624, 'apple', 'red']]

It is important to notice that in the new dict() implementation only the data layout has been changed and no changes are made in the hash table algorithms. Neither the collision statics nor the table search order has been changed.

This new implementation of dict() is believed to significantly compress dictionaries for memory saving depending upon the size of dictionary. Small dictionaries gets the most benefit out of it.

For a sparse table of size t with n entries, the sizes are:

curr_size = 24 * t
new_size = 24 * n + sizeof(index) * t

In the above example banana/grapes/apple, the size of the former implementation is 192 bytes ( eight 24-byte entries) and the later implementation has a size of 90 bytes ( three 24-byte entries and eight 1-byte indices ). That shows around 58% compression in size of the dictionary.

In addition to saving memory, the new memory layout makes iteration faster. Now functions like Keys(), items() and values can loop over the dense table without having to skip empty slots, unlike the older implementation. Other benefits of this new implementation are better cache utilization, faster resizing and fewer touches to the memory.

My Personal Notes arrow_drop_up

Check out this Author's contributed articles.

If you like GeeksforGeeks and would like to contribute, you can also write an article using contribute.geeksforgeeks.org or mail your article to contribute@geeksforgeeks.org. See your article appearing on the GeeksforGeeks main page and help other Geeks.

Please Improve this article if you find anything incorrect by clicking on the "Improve Article" button below.


Article Tags :

Be the First to upvote.


Please write to us at contribute@geeksforgeeks.org to report any issue with the above content.