Difference between Hash Join and Sort Merge Join

1. Hash Join :
It is also known as “go-to-guy” in case of join operators. This means that in case no other join is preferred (maybe due to no sorting or indexing etc), then, Hash join is used. Hash join is best algorithm when large, unsorted, and non-indexed data (residing in tables) is to be joined. Hash join algorithm consists of probe phase and build phase.

In the case of 2 relations named R and S, algorithm for Hash join would be as follows :

Hash records of R, one by one, using A values
(Use same M buckets and same hash function h)
Hash matching pair of records into same bucket
End



2. Sort Merge Join :
Sort Merge Join as name suggests, has 2 phases in join algorithm, namely, sort phase and merge phase. Merge algorithm is fastest join algorithm. This is reason that Sort Merge Join is fastest in case of sorted relations. Assuming that 2 sorted relations R and S need to be merged, algorithm is as follows :

If R is sorted on A, S is sorted on B do
Merge R and S to get join result
End



Difference between Hash Join and Sort Merge Join :

S.No. Hash Join Sort Merge Join
1. It is specifically used in case of joining of larger tables. It is usually used to join two independent sources of data represented in a table.
2. It has best performance in case of large and sorted and non-indexed inputs. It is better than hash join in case of performance in large tables.
3. Two phases in this are build and probe. It consists of 2 phases consisting sort operation and merge operation.
4. Steps involved are building a Hash table on a small table. It is used to probe hash value of Hash table is applicable for each element in second row. First row from first table and second row from table is taken, if it is not end then, selected rows are checked for merger. If they can be merged, merged row is returned else next rows are taken from tables and steps are repeated until rows are exhausted.
5. It is not as fast as sort merge join in case of sorted tables. It is fastest join operation in case of sorted tables. This is because it uses merge phase and sort phase, where, if sort is already previously done, then merge is fastest operation.
6. Its types are classic hash join, Grace hash join, hybrid hash join, hash anti join, hash semi-join, recursive hash join and hash bailout. It does not have further classifications.
7. This join is automatically selected in case there is no specific reason to adopt other types of join algorithms. It is also known as go-to guy of all join operators. It is not automatically selected.

Attention reader! Don’t stop learning now. Get hold of all the important CS Theory concepts for SDE interviews with the CS Theory Course at a student-friendly price and become industry ready.

My Personal Notes arrow_drop_up

I am an Information Technology final year student I have done professional Digital Marketing(including Blogging) course from Lurn Inc, Rockville, USA I love learning and teaching Do Let me know how you like my articles Improvements are most welcome

If you like GeeksforGeeks and would like to contribute, you can also write an article using contribute.geeksforgeeks.org or mail your article to contribute@geeksforgeeks.org. See your article appearing on the GeeksforGeeks main page and help other Geeks.

Please Improve this article if you find anything incorrect by clicking on the "Improve Article" button below.


Article Tags :
Practice Tags :


Be the First to upvote.


Please write to us at contribute@geeksforgeeks.org to report any issue with the above content.