Introduction to B+ Tree
A B+ tree is a variation of the B-tree data structure. In a B+ tree, data pointers are stored only at the leaf nodes, so the structure of a leaf node differs from that of an internal node. The leaf nodes have an entry for every value of the search field, along with a data pointer to the record (or to the block that contains the record). The leaf nodes are linked together to provide ordered access to the records on the search field. Internal nodes are used only to guide the search, and some search field values from the leaf nodes are repeated in the internal nodes.
Features of B+ Trees
- Balanced: B+ Trees are self-balancing, which means that as data is added or removed, the tree automatically adjusts itself to maintain a balanced structure. This ensures that search time grows only logarithmically with the number of keys, and that every search descends the same number of levels.
- Multi-level: B+ Trees are multi-level data structures, with a root node at the top and one or more levels of internal nodes below it. The leaf nodes at the bottom level hold the actual data or the pointers to it.
- Ordered: B+ Trees maintain the order of the keys in the tree, which makes it easy to perform range queries and other operations that require sorted data.
- Fan-out: B+ Trees have a high fan-out, which means that each node can have many child nodes. This reduces the height of the tree and increases the efficiency of searching and indexing operations.
- Cache-friendly: Because each node packs many keys into one contiguous block, B+ Trees can take advantage of the caching mechanisms in modern computer architectures to improve performance.
- Disk-oriented: B+ Trees are often used for disk-based storage systems because they are efficient at storing and retrieving data from disk.
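To make the fan-out point concrete, the following rough sketch estimates how many levels a tree needs in order to address a given number of keys. The function name `estimated_levels` and the sample figures are illustrative assumptions, not part of any real library, and the model assumes every node is completely full, so it gives a best-case estimate only.

```python
def estimated_levels(num_keys: int, fan_out: int) -> int:
    """Return the fewest levels needed to address num_keys entries.

    Simplifying assumption: every node is completely full and each node
    fans out to exactly `fan_out` children (or entries, at the leaf level).
    """
    if num_keys < 1 or fan_out < 2:
        raise ValueError("need num_keys >= 1 and fan_out >= 2")
    levels = 1
    capacity = fan_out          # entries reachable with a single level
    while capacity < num_keys:
        capacity *= fan_out     # each extra level multiplies the reach
        levels += 1
    return levels


# One million keys: binary fan-out versus a fan-out of 100.
binary_levels = estimated_levels(1_000_000, 2)
wide_levels = estimated_levels(1_000_000, 100)
print(binary_levels, wide_levels)
```

Under these assumptions a fan-out of 2 needs 20 levels for one million keys, while a fan-out of 100 needs only 3, which is why wide nodes mean fewer node reads per search.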
Why Use B+ Tree?
- B+ Trees are well suited to storage systems with slow data access, such as disks, because they minimise the number of I/O operations needed to reach a record.
- B+ Trees are a good choice for database systems and applications needing quick data retrieval because of their balanced structure, which guarantees predictable performance for a variety of operations and supports efficient range-based queries.
Difference between B+ Tree and B Tree
Some differences between the B+ Tree and the B Tree:
|Parameters||B+ Tree||B Tree|
|Structure||Separate leaf nodes for data storage and internal nodes for indexing||Nodes store both keys and data values|
|Leaf Nodes||Leaf nodes form a linked list for efficient range-based queries||Leaf nodes do not form a linked list|
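|Order||Higher order: internal nodes hold more keys because they store no data pointers||Lower order: nodes hold fewer keys because each key is stored with its data pointer|
|Key Duplication||Some keys from the leaf nodes are repeated in the internal nodes||Each key appears only once in the tree|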
|Disk Access||Better disk access due to sequential reads in linked list structure||More disk I/O due to non-sequential reads in internal nodes|
|Applications||Database systems, file systems, where range queries are common||In-memory data structures, databases, general-purpose use|
|Performance||Better performance for range queries and bulk data retrieval||Balanced performance for search, insert, and delete operations|
|Memory Usage||Requires extra memory for the keys repeated in internal nodes||Requires less memory as keys and values are stored in the same node|
Implementation of B+ Tree
To implement dynamic multilevel indexing, B-trees and B+ trees are generally employed. The drawback of using a B-tree for indexing, however, is that each node stores the data pointer (a pointer to the disk file block containing the key value) corresponding to a key value along with that key value. This greatly reduces the number of entries that can be packed into a node, which increases the number of levels in the B-tree and hence the search time for a record. The B+ tree eliminates this drawback by storing data pointers only at the leaf nodes. Thus, the structure of the leaf nodes of a B+ tree is quite different from the structure of its internal nodes. Note that, since data pointers are present only at the leaf nodes, the leaf nodes must store all the key values along with their corresponding data pointers to the disk file blocks in order to access the records.
Moreover, the leaf nodes are linked to provide ordered access to the records. The leaf nodes therefore form the first level of the index, with the internal nodes forming the other levels of a multilevel index. Some of the key values of the leaf nodes also appear in the internal nodes, simply to guide the search for a record. From the above discussion, it is apparent that a B+ tree, unlike a B-tree, has two orders, ‘a’ and ‘b’: one for the internal nodes and the other for the external (or leaf) nodes.
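The effect of moving data pointers out of the internal nodes can be checked with a little arithmetic. The sketch below is only illustrative: the block size, key size, and pointer sizes are assumed values chosen for the example, and the function names are hypothetical. Each function finds the largest count that still fits in one disk block.

```python
# Assumed sizes in bytes (illustrative values, not from any specific system).
BLOCK_SIZE = 512     # one disk block holds one node
KEY_SIZE = 9         # one search key value
DATA_PTR_SIZE = 7    # pointer to a record or data block
TREE_PTR_SIZE = 6    # pointer to a child node or to the next leaf


def btree_max_pointers(block, key, data_ptr, tree_ptr):
    """B-tree node: p tree pointers plus (p - 1) pairs of key and data pointer.

    p * tree_ptr + (p - 1) * (key + data_ptr) <= block
    """
    return (block + key + data_ptr) // (tree_ptr + key + data_ptr)


def bplus_internal_max_pointers(block, key, tree_ptr):
    """B+ tree internal node: a tree pointers plus (a - 1) keys, no data pointers.

    a * tree_ptr + (a - 1) * key <= block
    """
    return (block + key) // (tree_ptr + key)


def bplus_leaf_max_entries(block, key, data_ptr, tree_ptr):
    """B+ tree leaf: n pairs of key and data pointer plus one next-leaf pointer.

    n * (key + data_ptr) + tree_ptr <= block
    """
    return (block - tree_ptr) // (key + data_ptr)


btree_p = btree_max_pointers(BLOCK_SIZE, KEY_SIZE, DATA_PTR_SIZE, TREE_PTR_SIZE)
bplus_a = bplus_internal_max_pointers(BLOCK_SIZE, KEY_SIZE, TREE_PTR_SIZE)
leaf_n = bplus_leaf_max_entries(BLOCK_SIZE, KEY_SIZE, DATA_PTR_SIZE, TREE_PTR_SIZE)
print(btree_p, bplus_a, leaf_n)
```

With these assumed sizes, a B-tree node holds at most 24 tree pointers (23 keys), whereas a B+ tree internal node holds 34 tree pointers (33 keys) and a leaf holds 31 key and data pointer pairs.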
The structure of the internal nodes of a B+ tree of order ‘a’ is as follows:
- Each internal node is of the form: <P1, K1, P2, K2, ..., Pc-1, Kc-1, Pc> where c <= a, each Pi is a tree pointer (i.e. it points to another node of the tree), and each Ki is a key value (see diagram-I for reference).
- Every internal node has: K1 < K2 < ... < Kc-1
- For each search field value ‘X’ in the sub-tree pointed at by Pi, the following conditions hold: X <= Ki for i = 1; Ki-1 < X <= Ki for 1 < i < c; and Ki-1 < X for i = c (see diagram-I for reference).
- Each internal node has at most ‘a’ tree pointers.
- The root node has at least two tree pointers (when it is not itself a leaf), while the other internal nodes have at least ⌈a/2⌉ tree pointers each.
- If an internal node has ‘c’ pointers, c <= a, then it has ‘c – 1’ key values.
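The internal-node rules above can be sketched in a few lines. This is a minimal illustration, not a full implementation: the class name `InternalNode`, its methods, and the string stand-ins for child pointers are all hypothetical. Because keys equal to Ki belong to the sub-tree of Pi, the child to follow is the number of keys strictly less than the search value, which is what `bisect_left` returns.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class InternalNode:
    keys: list       # K1 < K2 < ... < Kc-1
    children: list   # P1, P2, ..., Pc (any objects standing in for tree pointers)

    def __post_init__(self):
        # c pointers always go with c - 1 keys.
        if len(self.children) != len(self.keys) + 1:
            raise ValueError("an internal node with c pointers needs c - 1 keys")
        # Keys must be strictly increasing.
        if any(a >= b for a, b in zip(self.keys, self.keys[1:])):
            raise ValueError("keys must be strictly increasing")

    def child_index(self, x):
        """0-based index of the pointer to follow for search value x.

        Values with K(i-1) < x <= Ki go to Pi, so we count the keys < x.
        """
        return bisect_left(self.keys, x)

    def satisfies_order(self, a, is_root=False):
        """Check the pointer-count bounds for an internal node of order a."""
        minimum = 2 if is_root else -(-a // 2)   # ceil(a / 2) for non-root nodes
        return minimum <= len(self.children) <= a


node = InternalNode(keys=[10, 20], children=["P1", "P2", "P3"])
for x in (10, 15, 20, 21):
    print(x, "->", node.children[node.child_index(x)])
```

Here 10 is routed to P1, 15 and 20 to P2, and 21 to P3, matching the conditions on Ki listed above.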
Diagram-I: structure of an internal node
The structure of the leaf nodes of a B+ tree of order ‘b’ is as follows:
- Each leaf node is of the form: <<K1, D1>, <K2, D2>, ..., <Kc-1, Dc-1>, Pnext> where c <= b, each Di is a data pointer (i.e. it points to the actual record on disk whose key value is Ki, or to a disk file block containing that record), each Ki is a key value, and Pnext points to the next leaf node in the B+ tree (see diagram II for reference).
- Every leaf node has: K1 < K2 < ... < Kc-1, c <= b
- Each leaf node has at least ⌈b/2⌉ values.
- All leaf nodes are at the same level.
Diagram-II: structure of a leaf node
Using the Pnext pointer, it is possible to traverse all the leaf nodes just like a linked list, thereby achieving ordered access to the records stored on disk.
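As a minimal sketch of this linked-list traversal, the code below chains three leaf nodes through a next pointer and walks them in key order. The class `LeafNode`, the functions `scan_in_order` and `range_scan`, and the record labels such as "r10" are hypothetical names used only for illustration. For simplicity the range scan starts from the first leaf; a real tree would first descend through the internal nodes to the leaf that contains the lower bound.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeafNode:
    keys: list                              # K1 < K2 < ... within the leaf
    data_ptrs: list                         # Di paired with Ki (record stand-ins)
    next_leaf: Optional["LeafNode"] = None  # Pnext


def scan_in_order(first_leaf):
    """Yield every (key, data pointer) pair by following Pnext links."""
    leaf = first_leaf
    while leaf is not None:
        yield from zip(leaf.keys, leaf.data_ptrs)
        leaf = leaf.next_leaf


def range_scan(first_leaf, low, high):
    """Yield the pairs with low <= key <= high, stopping once past high."""
    for key, ptr in scan_in_order(first_leaf):
        if key > high:
            break          # keys are sorted, so nothing further can match
        if key >= low:
            yield key, ptr


# Three leaves linked left to right.
leaf3 = LeafNode([30, 35], ["r30", "r35"])
leaf2 = LeafNode([20, 25], ["r20", "r25"], next_leaf=leaf3)
leaf1 = LeafNode([5, 10], ["r5", "r10"], next_leaf=leaf2)

all_keys = [key for key, _ in scan_in_order(leaf1)]
matches = list(range_scan(leaf1, 10, 30))
print(all_keys)
print(matches)
```

The scan never revisits an internal node: once the starting leaf is known, the remaining keys in the range are reached purely through the Pnext links.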
Advantages of B+ Trees
- A B+ tree with ‘l’ levels can store more entries in its internal nodes than a B-tree with the same ‘l’ levels. This significantly improves the search time for any given key. Having fewer levels, together with the Pnext pointers, means that a B+ tree is very quick and efficient at accessing records on disk.
- Data stored in a B+ tree can be accessed both sequentially and directly.
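- Because all leaf nodes are at the same level, fetching any record takes the same number of disk accesses.
- B+ trees store some search keys redundantly: keys used in the internal nodes to guide the search also appear in the leaf nodes. A B-tree, by contrast, stores each search key only once.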
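The points about equal access cost and redundant keys can be seen in a tiny two-level example. The sketch below is an assumption-laden illustration: the classes `Internal` and `Leaf`, the function `search`, and the record labels are hypothetical, node visits stand in for disk accesses, and routing sends values less than or equal to Ki to the i-th child.

```python
from bisect import bisect_left


class Leaf:
    def __init__(self, keys, records):
        self.keys = keys          # sorted key values
        self.records = records    # stand-ins for data pointers


class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # routing keys, repeated from the leaves
        self.children = children  # one more child than keys


def search(root, key):
    """Return (record or None, number of nodes visited)."""
    node, visited = root, 1
    while isinstance(node, Internal):
        # Follow the child whose range contains the key.
        node = node.children[bisect_left(node.keys, key)]
        visited += 1
    # Only a leaf can answer the query, even if the key was seen higher up.
    i = bisect_left(node.keys, key)
    found = i < len(node.keys) and node.keys[i] == key
    return (node.records[i] if found else None), visited


# Keys 10 and 20 appear both in the root and in the leaves.
root = Internal(
    keys=[10, 20],
    children=[
        Leaf([5, 10], ["r5", "r10"]),
        Leaf([15, 20], ["r15", "r20"]),
        Leaf([25, 30], ["r25", "r30"]),
    ],
)

for k in (10, 25, 12):
    print(k, search(root, k))
```

Every lookup here, whether it succeeds or not, visits exactly two nodes, and the lookup for 10 still descends to a leaf even though 10 also appears in the root.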
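Disadvantages of B+ Trees
- The B+ tree was designed to address the major drawback of the B-tree, the difficulty of traversing the keys sequentially: it retains the rapid random access property of the B-tree while also allowing rapid sequential access. The cost is redundancy: search keys repeated in the internal nodes take extra space, and every lookup must descend to a leaf node to reach the data pointer.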
Application of B+ Trees
- Multilevel Indexing
- Faster operations on the tree (insertion, deletion, search)
- Database indexing