Open In App

Difference between Hash Join and Sort Merge Join

Last Updated : 29 Jul, 2020
Like Article

1. Hash Join :
It is also known as “go-to-guy” in case of join operators. This means that in case no other join is preferred (maybe due to no sorting or indexing etc), then, Hash join is used. Hash join is best algorithm when large, unsorted, and non-indexed data (residing in tables) is to be joined. Hash join algorithm consists of probe phase and build phase.

In the case of 2 relations named R and S, algorithm for Hash join would be as follows :

Hash records of R, one by one, using A values
(Use same M buckets and same hash function h)
Hash matching pair of records into same bucket

2. Sort Merge Join :
Sort Merge Join as name suggests, has 2 phases in join algorithm, namely, sort phase and merge phase. Merge algorithm is fastest join algorithm. This is reason that Sort Merge Join is fastest in case of sorted relations. Assuming that 2 sorted relations R and S need to be merged, algorithm is as follows :

If R is sorted on A, S is sorted on B do
Merge R and S to get join result

Difference between Hash Join and Sort Merge Join :

S.No. Hash Join Sort Merge Join
1. It is specifically used in case of joining of larger tables. It is usually used to join two independent sources of data represented in a table.
2. It has best performance in case of large and sorted and non-indexed inputs. It is better than hash join in case of performance in large tables.
3. Two phases in this are build and probe. It consists of 2 phases consisting sort operation and merge operation.
4. Steps involved are building a Hash table on a small table. It is used to probe hash value of Hash table is applicable for each element in second row. First row from first table and second row from table is taken, if it is not end then, selected rows are checked for merger. If they can be merged, merged row is returned else next rows are taken from tables and steps are repeated until rows are exhausted.
5. It is not as fast as sort merge join in case of sorted tables. It is fastest join operation in case of sorted tables. This is because it uses merge phase and sort phase, where, if sort is already previously done, then merge is fastest operation.
6. Its types are classic hash join, Grace hash join, hybrid hash join, hash anti join, hash semi-join, recursive hash join and hash bailout. It does not have further classifications.
7. This join is automatically selected in case there is no specific reason to adopt other types of join algorithms. It is also known as go-to guy of all join operators. It is not automatically selected.

Like Article
Suggest improvement
Share your thoughts in the comments

Similar Reads