Cache oblivious kd-Tree

Last Updated : 08 Mar, 2024

Cache-oblivious kd-trees are data structures for multi-dimensional orthogonal range searching. A key idea behind kd-trees is binary space partitioning, which recursively subdivides space into two convex sets using hyperplanes as splitters.

This article discusses the following topics in detail:

  1. Cache Oblivious kd-tree.
  2. Van Emde Boas Layout.
  3. Operation on Cache Oblivious kd tree.
  4. Advantages of Cache Oblivious kd-tree.
  5. Limitations of Cache Oblivious kd-tree.

Cache-Oblivious kd-tree

  • A cache-oblivious kd-tree answers range queries and supports updates within a bounded number of memory transfers, without knowing the block size or memory size of the machine.
  • The structure can be extended to support d-dimensional range queries with the same update bound.
  • To build the structure, cache-oblivious techniques such as the van Emde Boas layout and exponential search trees are used.
  • The kd-tree, introduced by Jon Louis Bentley, is a binary tree of height O(log_2 N) with the N points stored in the leaves of the tree. The internal nodes (each of which is the root of a subtree) represent a recursive decomposition of the plane by means of axis-orthogonal lines that split the set of points into two subsets of equal size.
  • On even levels of the tree the dividing lines are horizontal, and on odd levels they are vertical. In this way, a rectangular region R_v is naturally associated with each node v, and the nodes on any given level of the tree partition the plane into disjoint regions. At each node, a splitting coordinate and a splitting value are chosen: if, for example, the x-axis is chosen with splitting value x, all points with an x-coordinate less than x go to the left subtree and all points with an x-coordinate greater than x go to the right subtree.
[Figure: a cache-oblivious kd-tree and the corresponding partitioning of the plane]
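The recursive decomposition described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the cache-oblivious layout itself. The names `Node`, `build`, `height`, and `leaves` are made up for this example, the points are assumed to have distinct coordinates, and sorting is used to find the median for brevity (a linear-time median algorithm would be needed for the O(N log_2 N) construction bound).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Node:
    axis: Optional[int] = None     # coordinate compared at this node (0 = x, 1 = y)
    split: Optional[float] = None  # position of the dividing line
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    point: Optional[Point] = None  # set only in leaves


def build(points: List[Point], depth: int = 0) -> Node:
    """Build a kd-tree with the points stored in the leaves (non-empty input)."""
    if len(points) == 1:
        return Node(point=points[0])
    # Even levels use a horizontal dividing line (compare y),
    # odd levels use a vertical dividing line (compare x).
    axis = 1 if depth % 2 == 0 else 0
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2            # two subsets of (almost) equal size
    return Node(axis, pts[mid][axis],
                build(pts[:mid], depth + 1),
                build(pts[mid:], depth + 1))


def height(node: Node) -> int:
    if node.point is not None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def leaves(node: Node) -> List[Point]:
    if node.point is not None:
        return [node.point]
    return leaves(node.left) + leaves(node.right)


pts = [(1, 5), (2, 3), (3, 8), (4, 1), (5, 7), (6, 2), (7, 6), (8, 4)]
tree = build(pts)
print(height(tree), tree.axis, tree.split)
```

With 8 points the tree has height 3, which matches the O(log_2 N) height stated above.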

Van Emde Boas Layout 

  • The van Emde Boas layout is the standard way of laying out a balanced tree in memory such that a root-leaf path can be traversed efficiently in the cache-oblivious model.
  • The layout takes its name from the van Emde Boas tree, a data structure that implements an associative array with operations such as Insert, Delete, Lookup, FindNext, and FindPrevious. The layout applies the same recursive decomposition to the placement of tree nodes in memory.
  • Van Emde Boas observed that if the keys are restricted to the set {1, 2, 3, …, n}, then all of these operations can be performed in O(log log n) time and O(n) space.
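To make the layout concrete, the following sketch computes the order in which the nodes of a complete binary tree are written to memory under a van Emde Boas layout. It assumes heap-style node numbering (root 1, children 2i and 2i + 1) and splits the tree at half its height, rounding the top part down. The function name `veb_order` and the rounding choice are assumptions made for this example.

```python
def veb_order(levels, root=1):
    """Return heap indices of a complete binary tree with `levels` levels
    in van Emde Boas order: top half first, then each bottom subtree."""
    if levels == 1:
        return [root]
    top = levels // 2          # levels in the top recursive subtree
    bottom = levels - top      # levels in each bottom recursive subtree
    order = veb_order(top, root)
    # The leaves of the top subtree sit `top - 1` levels below `root`.
    first = root << (top - 1)
    for leaf in range(first, first + (1 << (top - 1))):
        order += veb_order(bottom, 2 * leaf)      # left child's subtree
        order += veb_order(bottom, 2 * leaf + 1)  # right child's subtree
    return order


print(veb_order(4))
# [1, 2, 3, 4, 8, 9, 5, 10, 11, 6, 12, 13, 7, 14, 15]
```

Each group of three consecutive entries in the output is a small subtree stored contiguously, which is why a root-leaf path touches few memory blocks.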

Operation on Cache Oblivious kd-tree

This section discusses three operations on a cache-oblivious kd-tree:

  1. Insertion
  2. Querying
  3. Deletion

Let's discuss each of these operations in detail.

1. Insertion:

The exponential layout is defined with the help of the van Emde Boas layout: a balanced binary tree T with N leaves is recursively decomposed into a set of components, each of which is laid out using the van Emde Boas layout.

For the analysis of the structure, assume that N is of the form 2^(2^C), where C is a non-negative integer.

  • Memory format: The standard way of laying out a balanced binary tree is the van Emde Boas layout. A root-leaf path in this layout can be traversed efficiently in the cache-oblivious model.

A binary tree T of height O(log_2 N) with N leaves can be laid out in O(N) adjacent memory locations such that any root-leaf path can be traversed cache-obliviously in O(log_B N) memory transfers, where B is the block size.

  • We define component C0 to consist of the first (1/2) log_2 N levels of T. C0 contains Θ(√N) nodes and is called an N-component because its root is the root of a tree (T) with N leaves.
  • The objective is to obtain the exponential layout of T using the van Emde Boas layout.
  • First, store C0 using the van Emde Boas layout, followed immediately by the recursive layout of the √N subtrees T1, T2, …, T√N, each of size √N, that lie beneath C0 in T, ordered from left to right. The recursion stops when a subtree has 2 leaves; such a 2-component is laid out in 3 consecutive memory locations.

  • The definition of the exponential layout naturally decomposes T into log_2 log_2 N + 2 layers, with layer i consisting of a number of N^(1/2^(i−1))-components. An X-component is of size Θ(√X), and its √X/2 leaves are connected to √X-components. Thus, the root of an X-component is the root of a kd-tree containing X points.
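The decomposition can be checked with a small calculation. The sketch below assumes N = 2^(2^c) and a complete binary tree, and lists, for each kind of X-component, how many such components there are and how many nodes each one stores. The function name `exponential_components` is hypothetical, and the count N/X per component type is derived from the description above rather than quoted from it.

```python
def exponential_components(c):
    """For a tree with N = 2**(2**c) leaves, return a list of
    (X, number of X-components, nodes stored per X-component)."""
    n = 2 ** (2 ** c)
    rows = []
    for j in range(c, -1, -1):
        x = 2 ** (2 ** j)              # X = N, sqrt(N), ..., 4, 2
        if x > 2:
            levels = (2 ** j) // 2     # first (1/2) * log2(X) levels
            nodes = 2 ** levels - 1    # complete tree with that many levels
        else:
            nodes = 3                  # 2-component: a root and two leaves
        rows.append((x, n // x, nodes))
    return rows


for x, count, nodes in exponential_components(2):   # N = 16
    print(f"{count} x {x}-component, {nodes} node(s) each")
```

For N = 16 this gives one 16-component of 3 nodes, four 4-components of 1 node, and eight 2-components of 3 nodes, which together account for all 2N − 1 = 31 nodes of the tree.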

2. Querying:

  • Consider an exponential layout of a balanced binary tree T with N leaves, and let v be the root of a subtree T_v of T holding B leaves, where B is the block size. Any traversal of T_v can be performed in O(1) memory transfers.
  • The node v is stored inside an X-component with B ≤ X ≤ B^2. The X-component is of size O(√X) = O(B) and is therefore stored in O(1) blocks. Moreover, the part of T_v that is not included in the X-component is stored consecutively in memory in O(1) blocks. Consequently, the optimal paging strategy can ensure that any traversal of T_v is performed in O(1) memory transfers, simply by loading the O(1) relevant blocks.

How is querying done?

  • A range query C on a cache-oblivious kd-tree Q is answered recursively, starting at the root: at a node e, the query advances to a child e_c of e if C intersects the region R_{e_c} associated with e_c.
  • The standard way to bound the number of nodes in Q visited when answering a query C or, equivalently, the number of nodes e whose region R_e intersects C, is to first bound the number of nodes e whose region R_e intersects a vertical line j.
  • The region R_r associated with the root r is obviously intersected by j. Because the regions associated with its two children subdivide R_r by a vertical line, only the region R_{r_c} associated with one of these children, r_c, is intersected. Because R_{r_c} is in turn subdivided by a horizontal line, the regions associated with both children of r_c are intersected.
  • As each of these two children contains N/4 points, the recurrence for the number of regions intersected by j is:

C(N)=2+2C(N/4) = O(√N). 
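The recurrence can be unrolled to see where the square root comes from: every two levels the subproblem size drops by a factor of 4 while the number of subproblems only doubles. The short sketch below evaluates the recurrence directly, assuming the base case C(1) = 1 (the base case is not given above and is an assumption of this example). For N a power of 4 the result is exactly 3√N − 2.

```python
import math


def crossed_regions(n):
    """Evaluate C(N) = 2 + 2*C(N/4) with the assumed base case C(1) = 1."""
    if n <= 1:
        return 1
    return 2 + 2 * crossed_regions(n // 4)


for k in range(1, 6):
    n = 4 ** k
    print(n, crossed_regions(n), 3 * math.isqrt(n) - 2)
```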

  • Similarly, the number of regions intersected by a horizontal line is O(√N). It follows that the number of regions intersected by the boundary of C is O(√N). The number of additional nodes visited when answering C is O(k), where k is the number of points reported, because the regions of these nodes are completely contained in C. Thus, in total, O(√N + k) nodes are visited.
  • The structure supports updates in O((log_2 N / B) · log_{M/B} N) = O(log_B^2 N) memory transfers, where B is the block size and M is the memory size.
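The query procedure can be illustrated with a plain in-memory kd-tree. The sketch below is not cache-oblivious and ignores the memory layout; it only shows how the query descends into a child when the query rectangle can overlap that child's region, and it counts the visited nodes. The tuple encoding of nodes and the names `build`, `range_query`, `lo`, and `hi` are assumptions made for this example.

```python
def build(points, depth=0):
    """Leaf: ("leaf", point). Internal node: ("node", axis, split, left, right)."""
    if len(points) == 1:
        return ("leaf", points[0])
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return ("node", axis, pts[mid][axis],
            build(pts[:mid], depth + 1), build(pts[mid:], depth + 1))


def range_query(node, lo, hi, out, visited):
    """Append to `out` every point p with lo <= p <= hi in both coordinates.
    `visited` is a one-element list used as a counter of visited nodes."""
    visited[0] += 1
    if node[0] == "leaf":
        p = node[1]
        if lo[0] <= p[0] <= hi[0] and lo[1] <= p[1] <= hi[1]:
            out.append(p)
        return
    _, axis, split, left, right = node
    if lo[axis] <= split:      # query may overlap the left child's region
        range_query(left, lo, hi, out, visited)
    if hi[axis] >= split:      # query may overlap the right child's region
        range_query(right, lo, hi, out, visited)


points = [((7 * i) % 16, (5 * i) % 16) for i in range(16)]
tree = build(points)
found, visited = [], [0]
range_query(tree, (3, 2), (10, 9), found, visited)
print(sorted(found), visited[0])
```

Comparing `visited[0]` with the total of 2N − 1 nodes for different rectangles gives a feel for how much of the tree a query actually touches.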

3. Deletion:

  • Two invariants must be maintained when a point is deleted from a kd-tree laid out using the relaxed exponential layout. To delete a point, we find the relevant leaf w and remove it together with its parent.
  • First invariant: The removal of w can cause this invariant to be violated for each of the O(log log N) components along the path from the root of T to w.
  • Let C be the topmost component where Invariant 1 is violated.
  • Let V be the root of C, and let T_V be the subtree rooted at V.
  • If C is an X-component, then T_V contains X/2 − 1 points.
  • To restore the invariant, we first collect the X/2 − 1 points in T_V as well as the X' points (X/2 ≤ X' ≤ 2X) in the subtree T_V' rooted at the sibling V' of V, and destroy T_V, T_V', and their parent Y.
  • We then construct a kd-tree T' on the collected K points. If X − 1 ≤ K ≤ 2X, we lay out T' in the space previously occupied by T_V and T_V' using the exponential layout, and connect the grandparent of V to the root of T'. Otherwise (K > 2X), the collected points are instead divided between two kd-trees, T_U and T_U'.
  • In the first case, this in effect merges T_V and T_V' into T'. Since T' contains between X − 1 and 2X points in the first case, and T_U and T_U' each contain between X and 5X/4 points in the second case, the constructed trees can in both cases be laid out such that their roots are roots of X-components. Thus, Invariant 1 is restored.

  • The search for the leaf w requires O(log_B N) memory transfers.
  • The total amortized cost of restoring the first invariant is:

Σ_{i=0}^{log log N} O((1/B) · log_{M/B} N^(1/2^i)) = O((1/B) · log_B N) = O(log_B N) memory transfers.

  • Second invariant: The removal of w can cause this invariant to be violated at nodes on the path from the root of T to w.
  • Let v be the topmost node where Invariant 2 is violated, and let K be the number of points in the subtree T_v rooted at v.
  • If v is in an X-component, the O(√X) subtrees T_1, …, T_O(√X) of T_v below the X-component containing v use more than 4K adjacent memory cells.
  • Invariant 2 can be restored by compressing all of these subtrees.
  • First, compress a subtree T_i containing |T_i| nodes by traversing T_i and rewriting its nodes into |T_i| adjacent memory cells. The compressed layout of T_i is followed immediately in memory by the compressed layout of T_(i+1), which effectively pushes all the unused space in each subtree past the end of the last subtree. The subtrees then use fewer than 2K adjacent memory cells, and Invariant 2 is restored.
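The compression step amounts to sliding the live nodes of each subtree to the left so that the subtrees end up back to back. The sketch below models memory as a Python list in which `None` marks an unused cell and each subtree owns a half-open range of cells. The function name `compact_subtrees` and this memory model are assumptions for illustration; a real implementation would also have to update the pointers between nodes, which is omitted here.

```python
def compact_subtrees(memory, spans):
    """Compress subtrees in place.

    memory: list of cells, None meaning "unused".
    spans:  list of (start, end) half-open ranges, one per subtree,
            in left-to-right order and non-overlapping.
    Returns the new (start, end) range of each subtree.
    """
    write = spans[0][0]                  # next free cell for live nodes
    new_spans = []
    for start, end in spans:
        begin = write
        for read in range(start, end):
            cell = memory[read]
            if cell is not None:
                memory[read] = None      # vacate the old position
                memory[write] = cell     # write <= read always holds
                write += 1
        new_spans.append((begin, write))
    return new_spans


memory = ["a", None, "b", None, "c", None, None, "d", "e", None]
print(compact_subtrees(memory, [(0, 4), (4, 10)]))  # [(0, 2), (2, 5)]
print(memory)
```

After the call, all unused cells sit past the end of the last subtree, mirroring the description above.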

Advantages of Cache Oblivious kd-tree:

  • In the RAM model, a kd-tree on N points can be constructed recursively in O(N log_2 N) time: the root dividing line is found using an O(N)-time median algorithm, the points are distributed into two sets according to this line in O(N) time, and the two subtrees are constructed recursively.
  • Construction is fast.

Limitations of Cache Oblivious kd-tree:

  • Several challenging problems remain open for cache-oblivious multidimensional range-searching data structures, such as improving the kd-tree update bound, improving the space bound of the range tree, and removing the block-size assumption from the range tree.
  • Queries are fast, but the structure can be expensive to maintain under updates.

