# Inverting the Burrows – Wheeler Transform

**Prerequisite:** Burrows – Wheeler Data Transform Algorithm

**Why invert the BWT? The main idea behind it:**

1. The remarkable thing about the BWT is that it is invertible with minimal data overhead: besides the transformed string, only one row index has to be kept.

2. Computing the inverse of the BWT means undoing the transform and recovering the original string. The naive approach is slow and memory intensive, because it requires us to store |text| cyclic rotations of the string text.

3. Let’s discuss a faster algorithm that needs only two things:

i. **bwt_arr[]**, which is the **last column of the sorted rotations** list, given as **“annb$aa”**.

ii. **‘x’**, which is the row index at which our original string **“banana$”** appears in the sorted rotations list. We can see that **‘x’ is 4** in the example below.

| Row Index | Original Rotations | Sorted Rotations |
|-----------|--------------------|------------------|
| 0         | banana$            | $banana          |
| 1         | anana$b            | a$banan          |
| 2         | nana$ba            | ana$ban          |
| 3         | ana$ban            | anana$b          |
| *4        | na$bana            | banana$          |
| 5         | a$banan            | na$bana          |
| 6         | $banana            | nana$ba          |
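To see where the two inputs come from, here is a minimal sketch of the forward transform that produces them. It sorts rotation start offsets instead of materialising every rotation, but it is still the slow, naive construction and is shown only for illustration. The names `bwt_naive`, `cmp_rotation`, `g_text` and `g_len` are assumptions made for this sketch and are not part of the article's program.

```c
#include <stdlib.h>
#include <string.h>

/* Text being transformed; kept at file scope because a qsort()
   comparator receives no user-data argument. */
static const char *g_text;
static int g_len;

/* Orders two rotation start offsets by the rotations they denote. */
static int cmp_rotation(const void *a, const void *b)
{
    int i = *(const int *)a, j = *(const int *)b;
    for (int k = 0; k < g_len; k++) {
        unsigned char ci = (unsigned char)g_text[(i + k) % g_len];
        unsigned char cj = (unsigned char)g_text[(j + k) % g_len];
        if (ci != cj)
            return (ci > cj) - (ci < cj);
    }
    return 0;
}

/* Writes the last column of the sorted rotations into bwt_out
   (which must hold strlen(text) + 1 bytes) and returns the row at
   which the unrotated text appears, or -1 if allocation fails. */
int bwt_naive(const char *text, char *bwt_out)
{
    int n = (int)strlen(text), x = -1;
    int *start = malloc((n > 0 ? n : 1) * sizeof *start);
    if (start == NULL)
        return -1;
    for (int i = 0; i < n; i++)
        start[i] = i;
    g_text = text;
    g_len = n;
    qsort(start, (size_t)n, sizeof *start, cmp_rotation);
    for (int r = 0; r < n; r++) {
        /* Last character of the rotation that starts at start[r]. */
        bwt_out[r] = text[(start[r] + n - 1) % n];
        if (start[r] == 0)
            x = r;
    }
    bwt_out[n] = '\0';
    free(start);
    return x;
}
```

Calling `bwt_naive("banana$", out)` with an 8-byte buffer fills `out` with `annb$aa` and returns 4, matching the table.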

4. **An important observation:** If the jth original rotation (the original string shifted j characters to the left) is the ith row in the sorted order, then **l_shift[i]** is the row of the sorted order at which the (j+1)st original rotation appears. For example, the 0th original rotation **“banana$”** is row 4 of the sorted order, and since l_shift[4] is 3, the next original rotation **“anana$b”** is row 3 of the sorted order.

| Row Index | Original Rotations | Sorted Rotations | l_shift |
|-----------|--------------------|------------------|---------|
| 0         | banana$            | $banana          | 4       |
| 1         | anana$b            | a$banan          | 0       |
| 2         | nana$ba            | ana$ban          | 5       |
| 3         | ana$ban            | anana$b          | 6       |
| *4        | na$bana            | banana$          | 3       |
| 5         | a$banan            | na$bana          | 1       |
| 6         | $banana            | nana$ba          | 2       |
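To make the definition concrete, the sketch below computes l_shift[] directly from the original text: it sorts the rotations, records the row of every rotation, and then looks up the row of each rotation's left shift. This is only a check on the table above, since the decoder does not have the text available. The names `l_shift_from_text`, `cmp_rotation`, `g_text` and `g_len` are illustrative assumptions.

```c
#include <stdlib.h>
#include <string.h>

static const char *g_text;   /* text whose rotations are compared */
static int g_len;            /* its length */

/* Orders two rotation start offsets by the rotations they denote. */
static int cmp_rotation(const void *a, const void *b)
{
    int i = *(const int *)a, j = *(const int *)b;
    for (int k = 0; k < g_len; k++) {
        unsigned char ci = (unsigned char)g_text[(i + k) % g_len];
        unsigned char cj = (unsigned char)g_text[(j + k) % g_len];
        if (ci != cj)
            return (ci > cj) - (ci < cj);
    }
    return 0;
}

/* Fills l_shift[0..n-1] for text straight from the definition.
   Returns 0 on success, -1 if allocation fails. */
int l_shift_from_text(const char *text, int *l_shift)
{
    int n = (int)strlen(text);
    int *start = malloc((n > 0 ? n : 1) * sizeof *start); /* row -> offset */
    int *row = malloc((n > 0 ? n : 1) * sizeof *row);     /* offset -> row */
    if (start == NULL || row == NULL) {
        free(start);
        free(row);
        return -1;
    }
    for (int i = 0; i < n; i++)
        start[i] = i;
    g_text = text;
    g_len = n;
    qsort(start, (size_t)n, sizeof *start, cmp_rotation);
    for (int r = 0; r < n; r++)
        row[start[r]] = r;
    /* The left shift of the rotation at offset j starts at offset j + 1. */
    for (int r = 0; r < n; r++)
        l_shift[r] = row[(start[r] + 1) % n];
    free(start);
    free(row);
    return 0;
}
```

For `"banana$"` this fills `l_shift` with 4, 0, 5, 6, 3, 1, 2, the last column of the table.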

5. Our job is to deduce **l_shift[]** from the only information available to us, **bwt_arr[]** and **‘x’**, and then use it to compute the inverse of the BWT.

**How to compute l_shift[]?**

1. We know the BWT, which is **“annb$aa”**. This implies that we know all the characters of our original string, even though they are permuted into the wrong order.

2. By sorting **bwt_arr[]**, we can reconstruct the first column of the sorted rotations list, which we call **sorted_bwt[]**.

| Row Index | Sorted Rotations | bwt_arr | l_shift |
|-----------|------------------|---------|---------|
| 0         | $ ? ? ? ? ?      | a       | 4       |
| 1         | a ? ? ? ? ?      | n       |         |
| 2         | a ? ? ? ? ?      | n       |         |
| 3         | a ? ? ? ? ?      | b       |         |
| *4        | b ? ? ? ? ?      | $       | 3       |
| 5         | n ? ? ? ? ?      | a       |         |
| 6         | n ? ? ? ? ?      | a       |         |

3. Since **‘$’** occurs only once in **sorted_bwt[]** and rotations are formed using cyclic wrap-around, the left shift of row 0 must be the one row that ends with **‘$’**, which is row 4. So we can deduce that **l_shift[0] = 4.** Similarly, **‘b’** occurs once and the only row ending with **‘b’** is row 3, so we can deduce that **l_shift[4] = 3.**

4. But because **‘n’** appears twice, it seems ambiguous **whether l_shift[5] = 1 and l_shift[6] = 2 or whether l_shift[5] = 2 and l_shift[6] = 1.**

5. The rule that resolves this ambiguity is: **if rows i and j both start with the same letter and i < j, then l_shift[i] < l_shift[j]**. This implies l_shift[5] = 1 and l_shift[6] = 2. Continuing in a similar fashion, **l_shift[]** is computed as follows.

| Row Index | Sorted Rotations | bwt_arr | l_shift |
|-----------|------------------|---------|---------|
| 0         | $ ? ? ? ? ?      | a       | 4       |
| 1         | a ? ? ? ? ?      | n       | 0       |
| 2         | a ? ? ? ? ?      | n       | 5       |
| 3         | a ? ? ? ? ?      | b       | 6       |
| *4        | b ? ? ? ? ?      | $       | 3       |
| 5         | n ? ? ? ? ?      | a       | 1       |
| 6         | n ? ? ? ? ?      | a       | 2       |
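One way to read the rule is that l_shift[] is what you get by sorting the row indices of bwt_arr[] by character, with ties kept in ascending row order. The sketch below does exactly that with `qsort()` and an explicit tie-break on the row, since `qsort()` is not guaranteed to be stable. It is an alternative to the linked-list approach described later, not the article's own code. `struct entry`, `cmp_entry` and `l_shift_from_bwt` are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* A character of bwt_arr[] together with the row it was read from. */
struct entry {
    unsigned char ch;
    int row;
};

/* Orders by character. Equal characters are ordered by ascending row,
   which is the ambiguity-resolving rule. */
static int cmp_entry(const void *a, const void *b)
{
    const struct entry *ea = a, *eb = b;
    if (ea->ch != eb->ch)
        return (ea->ch > eb->ch) - (ea->ch < eb->ch);
    return (ea->row > eb->row) - (ea->row < eb->row);
}

/* Fills l_shift[0..n-1] from bwt_arr alone.
   Returns 0 on success, -1 if allocation fails. */
int l_shift_from_bwt(const char *bwt_arr, int *l_shift)
{
    int n = (int)strlen(bwt_arr);
    struct entry *e = malloc((n > 0 ? n : 1) * sizeof *e);
    if (e == NULL)
        return -1;
    for (int i = 0; i < n; i++) {
        e[i].ch = (unsigned char)bwt_arr[i];
        e[i].row = i;
    }
    qsort(e, (size_t)n, sizeof *e, cmp_entry);
    /* Sorted position i starts with e[i].ch; its left shift is the
       row of bwt_arr[] that this character was read from. */
    for (int i = 0; i < n; i++)
        l_shift[i] = e[i].row;
    free(e);
    return 0;
}
```

For `"annb$aa"` the sorted pairs are ($,4), (a,0), (a,5), (a,6), (b,3), (n,1), (n,2), so `l_shift` becomes 4, 0, 5, 6, 3, 1, 2.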

**Why is the ambiguity-resolving rule valid?**

1. The rotations are sorted, so row 5 is lexicographically less than row 6.

2. Thus, the five unknown characters in row 5 must be less than the five unknown characters in row 6 (as both rows start with **‘n’**).

3. We also know that, of the two rows that end with **‘n’**, row 1 comes before row 2.

4. But the five unknown characters in rows 5 and 6 are precisely the first five characters of the two rows that end with **‘n’**, so those rows must appear in the same relative order, or this would contradict the fact that the rotations were sorted.

5. Thus, **l_shift[5] = 1** and **l_shift[6] = 2.**

**Implementation:**

1. **Sort BWT:** Using **qsort()**, we arrange the characters of **bwt_arr[]** in sorted order and store them in **sorted_bwt[]**.

2. **Compute l_shift[]:**

i. We take an array of pointers **struct node *arr[]**, each of which points to the head of a linked list.

ii. Each distinct character of **bwt_arr[]** gets its own linked list. For every index at which that character occurs in **bwt_arr[]**, we append a node holding that index to the character's list.

| i   | *arr[128] | Linked List              |
|-----|-----------|--------------------------|
| 36  | $         | 4 -> NULL                |
| 97  | a         | 0 -> 5 -> 6 -> NULL      |
| 98  | b         | 3 -> NULL                |
| 110 | n         | 1 -> 2 -> NULL           |

iii. For each character **sorted_bwt[i]**, taken in order, we remove the node at the head of that character's linked list. The index it holds is **l_shift[i]**.

For the running example this yields `int l_shift[] = { 4, 0, 5, 6, 3, 1, 2 };`.

3. Iterating string-length times, we decode the BWT with **x = l_shift[x]** and output **bwt_arr[x]**.

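| Step | x = l_shift[x]     | Output           |
|------|--------------------|------------------|
| 1    | x = l_shift[4] = 3 | bwt_arr[3] = 'b' |
| 2    | x = l_shift[3] = 6 | bwt_arr[6] = 'a' |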
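The decoding step on its own can be sketched as a small function, assuming l_shift[] has already been computed. The name `bwt_decode` and its parameter list are assumptions for illustration. Unlike the full program below, it writes into a caller-supplied buffer instead of printing.

```c
/* Follows l_shift[] from row x for n steps and writes the decoded
   characters, plus a terminating '\0', into out (n + 1 bytes). */
void bwt_decode(const char *bwt_arr, const int *l_shift,
                int x, int n, char *out)
{
    for (int i = 0; i < n; i++) {
        x = l_shift[x];        /* row of the next original rotation */
        out[i] = bwt_arr[x];   /* its last character is the next output */
    }
    out[n] = '\0';
}
```

With `bwt_arr = "annb$aa"`, `l_shift = {4, 0, 5, 6, 3, 1, 2}`, `x = 4` and `n = 7`, the rows visited are 3, 6, 2, 5, 1, 0, 4 and `out` becomes `banana$`.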

Examples:

- Input: `annb$aa` (the Burrows – Wheeler Transform) and `4` (the row index at which the original message appears in the sorted rotations list). Output: `banana$`
- Input: `ard$rcaaaabb` and `3`. Output: `abracadabra$`

Following is the C code for the implementation explained above:

```c
// C program to find inverse of Burrows
// Wheeler transform
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Structure to store info of a node of
// linked list
struct node {
    int data;
    struct node* next;
};

// Compares two characters of bwt_arr[] so that
// qsort() arranges them in ascending order
int cmpfunc(const void* a, const void* b)
{
    unsigned char ca = *(const unsigned char*)a;
    unsigned char cb = *(const unsigned char*)b;
    return (ca > cb) - (ca < cb);
}

// Creates the new node
struct node* getNode(int i)
{
    struct node* nn = (struct node*)malloc(sizeof(struct node));
    if (nn == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    nn->data = i;
    nn->next = NULL;
    return nn;
}

// Does insertion at end in the linked list
void addAtLast(struct node** head, struct node* nn)
{
    if (*head == NULL) {
        *head = nn;
        return;
    }
    struct node* temp = *head;
    while (temp->next != NULL)
        temp = temp->next;
    temp->next = nn;
}

// Computes l_shift[index] by removing the node at the
// head of the list and freeing it
void computeLShift(struct node** head, int index, int* l_shift)
{
    struct node* front = *head;
    l_shift[index] = front->data;
    *head = front->next;
    free(front);
}

// x is the index at which the original string appears
// in the sorted rotations list
void invert(char bwt_arr[], int x)
{
    int i, len_bwt = (int)strlen(bwt_arr);
    if (len_bwt == 0)
        return;

    // +1 leaves room for the '\0' copied by strcpy()
    char* sorted_bwt = (char*)malloc((len_bwt + 1) * sizeof(char));
    int* l_shift = (int*)malloc(len_bwt * sizeof(int));
    if (sorted_bwt == NULL || l_shift == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    strcpy(sorted_bwt, bwt_arr);

    // Sorts the characters of bwt_arr[] alphabetically
    qsort(sorted_bwt, len_bwt, sizeof(char), cmpfunc);

    // Array of pointers that act as head nodes
    // to linked lists created to compute l_shift[].
    // The input is assumed to be 7-bit ASCII.
    struct node* arr[128] = { NULL };

    // Takes each distinct character of bwt_arr[] as head
    // of a linked list and appends to it the new node
    // whose data part contains index at which
    // character occurs in bwt_arr[]
    for (i = 0; i < len_bwt; i++) {
        struct node* nn = getNode(i);
        addAtLast(&arr[(unsigned char)bwt_arr[i]], nn);
    }

    // Takes each distinct character of sorted_bwt[] as head
    // of a linked list and finds l_shift[]
    for (i = 0; i < len_bwt; i++)
        computeLShift(&arr[(unsigned char)sorted_bwt[i]], i, l_shift);

    printf("Burrows - Wheeler Transform: %s\n", bwt_arr);
    printf("Inverse of Burrows - Wheeler Transform: ");

    // Decodes the bwt
    for (i = 0; i < len_bwt; i++) {
        x = l_shift[x];
        printf("%c", bwt_arr[x]);
    }
    printf("\n");

    free(l_shift);
    free(sorted_bwt);
}

// Driver program to test functions above
int main()
{
    char bwt_arr[] = "annb$aa";
    invert(bwt_arr, 4);
    return 0;
}
```


Output:

`Burrows - Wheeler Transform: annb$aa`

`Inverse of Burrows - Wheeler Transform: banana$`

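**Time Complexity:** The sorting step takes O(n log n), as qsort() typically runs in O(n log n) time. Note that addAtLast() walks to the end of a list on every insertion, so building the lists can take O(n²) in the worst case (for example, when almost all characters are equal) unless a tail pointer is kept for each list.

**Exercise:** Implement the inverse of the Burrows – Wheeler Transform in O(n) time.

**Source:**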

http://www.cs.princeton.edu/courses/archive/fall07/cos226/assignments/burrows.html

This article is contributed by **Anureet Kaur**.