Hash File Organization in DBMS

Last Updated : 11 Mar, 2024

Hashing techniques are used to retrieve specific data. When searching through all index values to reach the desired data becomes too inefficient, hashing offers an efficient way to locate the desired data directly on disk without using an index structure.

Hash file organization is also known as direct file organization.

What is Hash File Organization?

Hash file organization is a method of storing and accessing data in a file that uses a hash function to calculate the address of each record within the file.

This allows fast retrieval of records based on a key.

Hashing mainly involves the following terms:

Data Bucket: A data bucket is a storage location where records are stored. Buckets are the units of storage in a hash file; each one can hold one or more records and typically corresponds to a disk block.

Hash Function: A hash function is a mapping function that maps each search key to the address of the bucket that holds the record. Generally, a hash function uses the primary key to generate a hash index (the address of a data block). Hash functions range from simple to complex mathematical functions.

Hash Index: Part of the full hash value, for example its prefix (some schemes use the suffix instead), is used as the hash index. Each hash index has a depth value that indicates how many bits of the hash value are used to form the index.
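The hash function and hash index terms above can be illustrated with a small sketch. It assumes CRC-32 from Python's standard `zlib` module as the hash function (a stand-in; the article does not name one) and 32-bit hash values; `bucket_address` and `hash_index` are hypothetical helper names. CRC-32 is chosen because it gives the same result on every run, whereas Python's built-in `hash()` on strings is randomized per process and so unsuitable for addresses that must persist on disk.

```python
import zlib

def bucket_address(key: str, num_buckets: int) -> int:
    """Map a primary key to one of num_buckets bucket addresses (hypothetical helper)."""
    # CRC-32 is deterministic across runs, unlike Python's built-in hash() for str.
    h = zlib.crc32(key.encode("utf-8"))
    return h % num_buckets

def hash_index(hash_value: int, depth: int, width: int = 32) -> int:
    """Return the first `depth` bits (the prefix) of a `width`-bit hash value."""
    return hash_value >> (width - depth)

# The same key always yields the same bucket address.
print(bucket_address("CUST-75", 8) == bucket_address("CUST-75", 8))  # True

# Prefixes of the 4-bit hash value 1010 at increasing depths.
for depth in (1, 2, 3):
    print(depth, format(hash_index(0b1010, depth, width=4), f"0{depth}b"))
# prints: 1 1, then 2 10, then 3 101
```

Increasing the depth by one doubles the number of distinct indexes the prefix can express, which is what dynamic hashing relies on when it needs more buckets.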

Hashing Technique

Data is stored in data blocks whose addresses are generated by a hash function. The location where these records are stored is called a data block or data bucket. In this organization, a record's storage address is computed from its key rather than found by searching. To write a record, the address is first calculated by applying the hash function to the record's key, and the record is then saved at that address; to read it, the same address is recomputed from the key. The records are stored in buckets, which are storage units that can hold one or more records. For example, the hash function h(K) = K mod 7 maps the keys 35 and 43 to addresses 0 and 1 respectively, as shown below:

35 mod 7 = 0

43 mod 7 = 1
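The write and read paths described above can be sketched as a toy in-memory model. It assumes a file of 7 buckets addressed 0 to 6, with each bucket modeled as a Python list standing in for a disk block; the class name `HashFile` and the sample records are hypothetical.

```python
class HashFile:
    """Toy hash file: each bucket is a list standing in for a disk block."""

    def __init__(self, num_buckets: int = 7):
        self.num_buckets = num_buckets
        self.buckets = [[] for _ in range(num_buckets)]

    def address(self, key: int) -> int:
        # h(K) = K mod num_buckets
        return key % self.num_buckets

    def write(self, key: int, record: str) -> int:
        # Compute the address from the key, then store the record in that bucket.
        addr = self.address(key)
        self.buckets[addr].append((key, record))
        return addr

    def read(self, key: int):
        # Recompute the address and search only that one bucket.
        for k, record in self.buckets[self.address(key)]:
            if k == key:
                return record
        return None

f = HashFile(7)
print(f.write(35, "record 35"))  # 0
print(f.write(43, "record 43"))  # 1
print(f.read(43))                # record 43
```

A read touches only the bucket at the computed address, which is where the speedup over scanning the whole file comes from.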

Hashing Types

There are two types of hashing:

Static Hashing
Dynamic Hashing

Static Hashing

For a given search key value, the hash function always calculates the same address. For example, a mod 5 hash function can generate only 5 different addresses (0 to 4). The number of available buckets always remains constant, so the bucket addresses generated with static hashing never change.

For example,

If you use the hash function mod 5 to get the address for customer ID 75, you will always get the same bucket address, 0.

The bucket address does not change in this scenario.

75 mod 5 = 0

66 mod 5 = 1

82 mod 5 = 2

93 mod 5 = 3

104 mod 5 = 4

and so on.
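The mappings above can be reproduced with a short sketch. It assumes, purely for illustration, a fixed table of 5 buckets keyed by customer ID; `NUM_BUCKETS` and `static_address` are hypothetical names. Because the bucket count is fixed when the file is created, each ID's address stays the same for the life of the file.

```python
NUM_BUCKETS = 5  # fixed when the file is created; static hashing never changes it

def static_address(customer_id: int) -> int:
    # h(K) = K mod 5 can only produce the addresses 0, 1, 2, 3, 4.
    return customer_id % NUM_BUCKETS

for cid in (75, 66, 82, 93, 104):
    print(f"{cid} mod {NUM_BUCKETS} = {static_address(cid)}")
# 75 mod 5 = 0, 66 mod 5 = 1, 82 mod 5 = 2, 93 mod 5 = 3, 104 mod 5 = 4

# However many keys are hashed, only the 5 fixed addresses ever appear.
print(sorted({static_address(k) for k in range(1000)}))  # [0, 1, 2, 3, 4]
```

The flip side of this fixed layout is that a growing data set keeps piling records into the same 5 buckets, which is the limitation dynamic hashing addresses.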

Static Hashing mapping with example

Dynamic Hashing

In dynamic hashing, data buckets grow or shrink (buckets are dynamically added or removed) as the data set grows or shrinks. A widely used form of dynamic hashing is extendible hashing (sometimes called extended hashing). Dynamic hashing requires a hash function that can generate a large range of values, of which only a few bits are used at first.

For example, suppose there are three records: Data1, Data2, and Data3.

The hash function produces the hash values 1010, 1011, and 1001 for them.

At first, this storage method uses only part of each hash value as the bucket address: its last (lowest-order) bit.

So we try to load the three records into buckets 0 and 1. Data1 ends in 0 and goes to bucket 0, while Data2 and Data3 both end in 1 and go to bucket 1:

h(Data1) -> 1010

h(Data2) -> 1011

h(Data3) -> 1001

Dynamic Hashing Mapping Case 1

Assuming each bucket can hold only one record, the problem is that bucket 1 is already taken by Data2, so there is no bucket address left for Data3. The buckets must be expanded dynamically to accommodate it. Therefore, we use 2 bits of the address instead of 1 and update the existing records to 2-bit addresses.

Next, try to store Data3 again. With 2-bit addresses, Data1 maps to 10, Data2 to 11, and Data3 to 01, so every record now has its own bucket.

Dynamic Hashing Mapping Case 2
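The example above can be sketched with extendible hashing, a common form of dynamic hashing. The sketch rests on several assumptions that the example does not spell out: each bucket holds one record, the hash values 1010, 1011, and 1001 are supplied directly rather than computed, buckets are selected by the low-order bits of the hash value, and the class and method names are hypothetical. Unlike the simplified description, a real extendible hash file splits only the overflowing bucket and doubles a directory of bucket pointers, so records in other buckets are not moved.

```python
class Bucket:
    def __init__(self, local_depth: int, capacity: int):
        self.local_depth = local_depth  # number of low-order bits this bucket's records share
        self.capacity = capacity
        self.items = {}  # record key -> hash value

class ExtendibleHashFile:
    """Hypothetical sketch of extendible hashing on low-order hash bits."""

    def __init__(self, capacity: int = 1, max_depth: int = 16):
        self.global_depth = 1
        self.max_depth = max_depth
        self.directory = [Bucket(1, capacity), Bucket(1, capacity)]

    def _index(self, h: int) -> int:
        # Use the low-order `global_depth` bits of the hash value.
        return h & ((1 << self.global_depth) - 1)

    def insert(self, key: str, h: int) -> None:
        while True:
            bucket = self.directory[self._index(h)]
            if key in bucket.items or len(bucket.items) < bucket.capacity:
                bucket.items[key] = h
                return
            self._split(bucket)  # bucket is full: split it, then retry

    def _split(self, bucket: Bucket) -> None:
        if bucket.local_depth == self.global_depth:
            if self.global_depth >= self.max_depth:
                raise OverflowError("hash values do not differ in enough bits")
            # Double the directory; entries i and i + old_size point to the same bucket.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        new_bucket = Bucket(bucket.local_depth, bucket.capacity)
        bit = 1 << (bucket.local_depth - 1)  # the newly significant bit
        # Redirect directory entries whose new bit is set to the new bucket.
        for i, b in enumerate(self.directory):
            if b is bucket and i & bit:
                self.directory[i] = new_bucket
        # Move records whose new bit is set into the new bucket.
        old_items, bucket.items = bucket.items, {}
        for k, hv in old_items.items():
            target = new_bucket if hv & bit else bucket
            target.items[k] = hv

f = ExtendibleHashFile(capacity=1)
for name, h in [("Data1", 0b1010), ("Data2", 0b1011), ("Data3", 0b1001)]:
    f.insert(name, h)

print(f.global_depth)  # 2
for i, b in enumerate(f.directory):
    print(format(i, "02b"), sorted(b.items))
# 00 ['Data1']
# 01 ['Data3']
# 10 ['Data1']  (same bucket as 00; it never overflowed, so it was not split)
# 11 ['Data2']
```

Inserting Data3 overflows bucket 1, which doubles the directory to 2-bit addresses and splits only that bucket: Data2 moves to address 11 and Data3 takes address 01.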

Conclusion

Hashing is an efficient technique for locating desired data directly on disk without using an index structure. It is especially useful for retrieving specific records when searching through all index values to reach the desired data would be too inefficient.

Frequently Asked Questions on Hash File Organization – FAQs

What are the advantages of Hash File Organization?

One advantage of hash file organization is fast data retrieval: the hash function maps a key directly to the record's position in the file. This is especially useful when retrieving specific records by key.

Are there any downsides to Hash File Organization?

One downside is the potential for collisions, where different keys hash to the same address. Resolving collisions requires extra handling, which can reduce the efficiency of data retrieval.
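One common way to handle collisions in a hash file is overflow chaining: when a bucket's block is full, further records spill into an overflow block linked to it. The sketch below assumes 5 buckets, mod 5 hashing, and a capacity of 2 records per block; `Block` and `ChainedHashFile` are hypothetical names, and chaining is only one of several strategies (open addressing, such as linear probing, is another).

```python
class Block:
    """A fixed-capacity block; `overflow` links to the next block in the chain."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.records = []
        self.overflow = None

class ChainedHashFile:
    def __init__(self, num_buckets: int = 5, capacity: int = 2):
        self.num_buckets = num_buckets
        self.buckets = [Block(capacity) for _ in range(num_buckets)]

    def insert(self, key: int, record: str) -> None:
        block = self.buckets[key % self.num_buckets]
        # Walk the chain until a block with free space is found, adding one if needed.
        while len(block.records) >= block.capacity:
            if block.overflow is None:
                block.overflow = Block(block.capacity)
            block = block.overflow
        block.records.append((key, record))

    def search(self, key: int):
        """Return (record or None, number of blocks read)."""
        block = self.buckets[key % self.num_buckets]
        blocks_read = 0
        while block is not None:
            blocks_read += 1
            for k, record in block.records:
                if k == key:
                    return record, blocks_read
            block = block.overflow
        return None, blocks_read

f = ChainedHashFile()
for cid in (75, 80, 85):  # all three hash to bucket 0 under mod 5
    f.insert(cid, f"customer {cid}")
print(f.search(75))  # ('customer 75', 1)
print(f.search(85))  # ('customer 85', 2)
```

The second block read needed to find key 85 is the kind of extra cost collisions add to retrieval.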

Why is Hash File Organization advantageous?

Hash file organization enables fast retrieval of records, especially when managing large datasets. It offers efficient, simple data retrieval, which makes it a common choice for applications that need fast lookups and searches by key.
