Cache oblivious kd-Tree

Last Updated : 08 Mar, 2024

Cache-oblivious kd-trees are data structures for multi-dimensional orthogonal range searching. A kd-tree is a form of binary space partitioning: it recursively subdivides space into two convex sets, using hyperplanes as the splitting boundaries.

This article focuses on discussing the following topics in detail-

Cache Oblivious kd-tree.
Van Emde Boas Layout.
Operation on Cache Oblivious kd tree.
Advantages of Cache Oblivious kd-tree.
Limitations of Cache Oblivious kd-tree.

Cache-Oblivious kd-tree

A cache-oblivious kd-tree answers range queries and supports updates within a bounded number of memory transfers, without the algorithm knowing the block size or the cache size.
The structure can be extended to support d-dimensional range queries with the same update bound.
To build the structure, cache-oblivious techniques such as the van Emde Boas layout and exponential search trees are used.
The kd-tree, introduced by Jon Louis Bentley, is a binary tree of height O(log₂ N) with the N points stored in the leaves of the tree. The internal nodes (each of which is also the root of a subtree) represent a recursive decomposition of the plane by means of axis-orthogonal lines, each of which splits a set of points into two subsets of equal size.
On even levels of the tree, the dividing lines are horizontal, and on odd levels they are vertical. In this way, a rectangular region A_v is naturally associated with each node v, and the nodes on any given level of the tree partition the plane into disjoint regions. The level of a node determines which coordinate is used for its split: if the x-coordinate is used with splitting value x, all points with a smaller x-coordinate go to the left subtree and all points with a greater x-coordinate go to the right subtree.
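The construction just described can be sketched in a few lines of Python. This is a minimal illustration, not a cache-oblivious implementation: the names `Node`, `build`, `height` and `leaf_points` are hypothetical, the split axis at the root is an arbitrary choice, coordinates are assumed to be distinct, and sorting is used in place of a linear-time median algorithm for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Node:
    # Internal nodes carry a split axis and value; leaves carry exactly one point.
    axis: Optional[int] = None        # 0 = split on x, 1 = split on y
    split: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    point: Optional[Point] = None     # set only on leaves


def build(points: List[Point], depth: int = 0) -> Node:
    """Build a kd-tree with the points in the leaves (needs at least one point)."""
    if len(points) == 1:
        return Node(point=points[0])
    axis = depth % 2                  # alternate the split coordinate per level
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2               # the two halves differ in size by at most one
    return Node(axis=axis, split=pts[mid][axis],
                left=build(pts[:mid], depth + 1),
                right=build(pts[mid:], depth + 1))


def height(node: Node) -> int:
    """Number of edges on the longest root-to-leaf path."""
    if node.point is not None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def leaf_points(node: Node) -> List[Point]:
    """Collect the points stored in the leaves, left to right."""
    if node.point is not None:
        return [node.point]
    return leaf_points(node.left) + leaf_points(node.right)
```

With 8 points the tree has height 3, matching the O(log₂ N) height stated above, and every point ends up in exactly one leaf.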

Partitioning of the plane

Van Emde Boas Layout

The van Emde Boas layout is the standard way of laying out a balanced tree in memory such that a root-to-leaf path can be traversed efficiently in the cache-oblivious model.
The layout is named after the van Emde Boas tree, a structure that implements an associative array with operations such as Insert, Delete, Lookup, FindNext and FindPrevious. The layout borrows that structure's recursive decomposition to decide where nodes are placed in memory.
Van Emde Boas observed that if keys are restricted to the set {1, 2, 3, …, n}, then each operation can be done in O(log log n) time and O(n) space.
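As a rough illustration of the recursive idea behind the layout, the sketch below computes the order in which the nodes of a complete binary tree are written to memory: the tree is cut at about half its height, the top part is stored first, and each bottom subtree follows, all recursively. The function name `veb_order`, the heap-style node numbering (root 1, children 2i and 2i+1) and the choice to round the top height down are assumptions made for this example.

```python
from typing import List


def veb_order(root: int, height: int) -> List[int]:
    """Memory order of a complete binary tree with `height` levels.

    Nodes use heap numbering: the children of node i are 2*i and 2*i + 1.
    """
    if height == 1:
        return [root]
    top_h = height // 2               # levels in the top part
    bottom_h = height - top_h         # levels in each bottom subtree
    order = veb_order(root, top_h)    # the top part is stored first
    first = root << top_h             # leftmost node top_h levels below root
    for j in range(1 << top_h):       # then the bottom subtrees, left to right
        order.extend(veb_order(first + j, bottom_h))
    return order


print(veb_order(1, 4))
# [1, 2, 3, 4, 8, 9, 5, 10, 11, 6, 12, 13, 7, 14, 15]
```

Each small subtree ends up in a contiguous run of cells, which is what lets a root-to-leaf walk touch few memory blocks regardless of the block size.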

Operation on Cache Oblivious kd-tree

This section focuses on discussing the three operations on Cache Oblivious kd-tree-

Insertion
Querying
Deletion

Let’s discuss each of these operations in detail-

1. Insertion:

The exponential layout is defined with the help of the van Emde Boas layout. In this layout, a balanced binary tree T with N leaves is recursively decomposed into a set of components, each of which is laid out using the van Emde Boas layout.

For the analysis of the structure, let us assume that N is of the form 2^(2^C), where C is a non-negative integer.

Memory format: The standard way of laying out a balanced binary tree is the van Emde Boas layout. A root-to-leaf path in this layout can be traversed efficiently in the cache-oblivious model.

A binary tree T of height O(log₂ N) with N leaves can be laid out in O(N) consecutive memory locations, such that any root-to-leaf path can be traversed cache-obliviously in O(log_B N) memory transfers, where B is the block size.

In this layout, a balanced binary tree T with N leaves is recursively decomposed into a set of components, each of which is laid out using the van Emde Boas layout. We define component C₀ to consist of the first (1/2) log₂ N levels of T. C₀ contains Θ(√N) nodes and is called an N-component because its root is the root of a tree (T) with N leaves.
The objective is to obtain the exponential layout of T using the van Emde Boas layout.
First, store C₀ using the van Emde Boas layout, followed immediately by the recursive layouts of the √N subtrees T₁, T₂, …, T_√N, each of size √N, beneath C₀ in T, ordered from left to right. The recursion stops when a subtree has 2 leaves; such a 2-component is laid out in 3 consecutive memory locations.
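To make the recursion concrete, the following sketch lists the leaf counts met along the decomposition when N has the form 2^(2^C): each step replaces N by √N until a subtree with 2 leaves is reached. The helper names `component_chain` and `top_component_nodes` are hypothetical, and the node count assumes a complete binary tree.

```python
from math import isqrt
from typing import List


def component_chain(c: int) -> List[int]:
    """Leaf counts N, sqrt(N), ..., 2 along the recursion for N = 2**(2**c)."""
    sizes = []
    exponent = 2 ** c                 # N = 2**exponent
    while exponent >= 1:
        sizes.append(2 ** exponent)
        exponent //= 2                # taking a square root halves the exponent
    return sizes


def top_component_nodes(x: int) -> int:
    """Nodes in the first (1/2) log2(x) levels of a complete tree with x leaves."""
    return isqrt(x) - 1               # 1 + 2 + ... + sqrt(x)/2 = sqrt(x) - 1


print(component_chain(3))             # [256, 16, 4, 2]
print(top_component_nodes(256))       # 15
```

For N = 256, the top component holds 15 nodes (four full levels), which is Θ(√N), and the 16 subtrees below it each have 16 leaves.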

Insertion

The definition of the exponential layout naturally defines a decomposition of T into log₂ log₂ N + 2 layers, with layer i consisting of a number of N^(1/2^(i−1))-components. An X-component is of size Θ(√X), and its √X/2 leaves are connected to √X-components. Thus, the root of an X-component is the root of a kd-tree containing X points.

Insertion

2. Querying:

Consider an exponential layout of a balanced binary tree F with W leaves, and let e be the root of a subtree F_e of F holding B leaves, where B is the block size. Any traversal of F_e can be performed in O(1) memory transfers.
To see why, note that the node e is stored inside an X-component with B ≤ X ≤ B². The X-component is of size O(√X) = O(B) and is therefore stored in O(1) blocks. Moreover, the part of F_e that is not included in the X-component is stored consecutively in memory in O(1) blocks. Consequently, the optimal paging strategy can make sure that any traversal of F_e is performed in O(1) memory transfers, simply by loading the O(1) relevant blocks.

Querying

How is querying done?

We answer a range query C on a cache-oblivious kd-tree Q recursively, starting at the root: at a node e we advance the query to a child e_c of e if C intersects the region R_{e_c} associated with e_c.
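The query procedure can be sketched as below. This is an in-memory illustration under several assumptions, not the cache-oblivious structure itself: the tree is a nested tuple built by a hypothetical `build` helper, rectangles are closed and written as `(xmin, ymin, xmax, ymax)`, and the region of each child is derived on the fly by cutting the parent's region at the split value.

```python
from math import inf
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]     # (xmin, ymin, xmax, ymax), closed


def build(points: List[Point], depth: int = 0):
    """Leaf = the point itself; internal node = (axis, split, left, right)."""
    if len(points) == 1:
        return points[0]
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (axis, pts[mid][axis],
            build(pts[:mid], depth + 1), build(pts[mid:], depth + 1))


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def query(node, region: Rect, box: Rect, out: List[Point]) -> None:
    """Append to `out` every point below `node` that lies inside `box`."""
    if len(node) == 2:                       # leaf: test the single point
        x, y = node
        if box[0] <= x <= box[2] and box[1] <= y <= box[3]:
            out.append(node)
        return
    axis, split, left, right = node
    low, high = list(region), list(region)
    low[axis + 2] = split                    # left child's region ends at the split
    high[axis] = split                       # right child's region starts there
    for child, child_region in ((left, tuple(low)), (right, tuple(high))):
        if intersects(child_region, box):    # descend only where the query reaches
            query(child, child_region, box, out)


pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2), (1, 9), (6, 8)]
found: List[Point] = []
query(build(pts), (-inf, -inf, inf, inf), (3, 1, 8, 5), found)
print(sorted(found))                         # [(5, 4), (7, 2), (8, 1)]
```

Subtrees whose region misses the query rectangle are never entered, which is the property the node-count analysis relies on.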
The standard way to bound the number of nodes in Q visited when answering a query C, or equivalently the number of nodes e where R_e intersects C, is to first bound the number of nodes e where R_e intersects a vertical line j.
The region R_r associated with the root r is obviously intersected by j, but as the regions associated with its two children represent a subdivision of R_r by a vertical line, only the region R_{r_c} associated with one of these children r_c is intersected. Because the region R_{r_c} is subdivided by a horizontal line, the regions associated with both children of r_c are intersected.
As each of these children contains N/4 points, the recurrence for the number of regions intersected by j (written C(N) here, not to be confused with the query C) is-

C(N)=2+2C(N/4) = O(√N).

Similarly, the number of regions intersected by a horizontal line is O(√N). It follows that the number of regions intersected by the boundary of C is O(√N). The number of additional nodes visited when answering C is bounded by O(Q), where Q here denotes the number of points reported, as their corresponding regions are completely contained in C. Thus, in total O(√N + Q) nodes are visited.
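The recurrence C(N) = 2 + 2C(N/4) can be checked numerically with a few lines of Python. The base case C(1) = 1 is an assumption made for this example (the text does not state one); with it, the recurrence solves to exactly 3√N − 2 when N is a power of 4.

```python
from math import isqrt


def crossings(n: int) -> int:
    """Evaluate C(N) = 2 + 2*C(N/4), assuming the base case C(1) = 1."""
    if n <= 1:
        return 1
    return 2 + 2 * crossings(n // 4)


for k in range(6):
    n = 4 ** k
    # For powers of 4 the closed form is 3*sqrt(N) - 2, i.e. O(sqrt(N)).
    print(n, crossings(n), 3 * isqrt(n) - 2)
```

The two printed columns agree for every N = 4^k, for example C(16) = 10 and C(1024) = 94.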
The structure supports updates in O((log₂ N / B) · log_{M/B} N) = O(log²_B N) memory transfers. Here B is the size of a memory block and M is the size of the cache.

3. Deletion:

Deleting a point from a kd-tree laid out using the relaxed exponential layout must preserve two invariants: the first concerns the number of points stored below the root of each component, and the second concerns the amount of memory used by the subtrees. To delete a point, we first find the relevant leaf W and remove it together with its parent.
First Invariant: The removal of W can cause this invariant to be violated for each of the O(log log N) components along the path from the root of T to W.
Let C be the topmost component where Invariant 1 is violated.
Let V be the root of C, and let T_V be the subtree rooted at V.
If C is an X-component, then T_V contains X/2 − 1 points.
To restore the invariant, we first collect the X/2 − 1 points in T_V as well as the X′ points (X/2 ≤ X′ ≤ 2X) in the subtree T_V′ rooted at the sibling V′ of V, and destroy T_V, T_V′, and their parent Y.
We then rebuild on the collected K points. In the first case, X − 1 ≤ K ≤ 2X, we construct a single kd-tree T′, lay it out in the space previously occupied by T_V and T_V′ using the exponential layout, and connect the grandparent of V to the root of T′. In effect, we merge T_V and T_V′ into T′. In the second case, K > 2X, the collected points are instead divided between two subtrees, T_U and T_U′.
Since T′ contains between X − 1 and 2X points in the first case, and T_U and T_U′ each contain between X and 5X/4 points in the second case, we can in both cases lay out the constructed trees such that their roots are roots of X-components. Thus, Invariant 1 is restored.

Deletion

The search for leaf W requires O(log_B N) memory transfers.
The total amortized cost of restoring the first invariant is-

∑_{i=0}^{log log N} O((1/B) · log_{M/B} N^(1/2^i)) = O((1/B) · log_B N) = O(log_B N) memory transfers.

Second Invariant: The removal of W can cause this invariant to be violated at nodes on the path from the root of T to W.
Let V be the topmost node where Invariant 2 is violated, and let K be the number of points in the subtree T_V rooted at V.
If V is in an X-component, the O(√X) subtrees T₁, …, T_O(√X) of T_V below the X-component containing V use more than 4K adjacent memory cells.
Invariant 2 can be restored by compressing all of these subtrees.
First, compress a subtree T_i containing |T_i| nodes by traversing T_i and rewriting its nodes into |T_i| adjacent memory cells. The compressed layout of T_i is followed immediately in memory by the compressed layout of T_{i+1}, which effectively pushes all the unused space in each subtree past the end of the last subtree. The subtrees now use fewer than 2K adjacent memory cells, and Invariant 2 is restored.
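The compression step can be pictured with a toy model of memory. In the sketch below, memory is a Python list in which `None` marks an unused cell, and each subtree occupies a half-open span `(start, stop)`; these representations and the name `compress` are assumptions for illustration only. A real implementation would also have to update the pointers between nodes, which this sketch ignores.

```python
from typing import List, Optional, Tuple


def compress(memory: List[Optional[str]],
             spans: List[Tuple[int, int]]) -> Tuple[List[int], int]:
    """Slide the used cells of consecutive subtree spans to the left, in place.

    `spans` must be given in left-to-right order. Returns the new start of each
    subtree and the index of the first free cell after the last subtree.
    """
    write = spans[0][0]               # next cell to fill
    new_starts = []
    for start, stop in spans:
        new_starts.append(write)      # this subtree now begins here
        for i in range(start, stop):
            cell = memory[i]
            if cell is not None:
                memory[i] = None      # clear first, so write == i stays correct
                memory[write] = cell
                write += 1
    return new_starts, write


memory = ["a", None, "b", None, None, "c", "d", None]
starts, free = compress(memory, [(0, 4), (4, 8)])
print(memory)                         # ['a', 'b', 'c', 'd', None, None, None, None]
print(starts, free)                   # [0, 2] 4
```

After the pass, each subtree is stored contiguously, the second starts right where the first ends, and all unused cells sit past the end of the last subtree.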

Advantages of Cache Oblivious kd-tree:

Construction is fast. In the RAM model, a kd-tree on N points can be built recursively in O(N log₂ N) time: the dividing line at the root is found using an O(N)-time median algorithm, the points are distributed into two sets according to this line in O(N) time, and the two subtrees are constructed recursively.

Limitations of Cache Oblivious kd-tree:

A number of challenging problems remain open for cache-oblivious multidimensional range searching data structures, such as improving the kd-tree update bound, improving the space bound of the range tree, and removing the block size assumption from the range tree.
Queries are fast, but the structure can be comparatively expensive to maintain under updates.
