Burrows Wheeler Transformation

The Burrows-Wheeler Transform: A Data Preparation Technique

What is the Burrows-Wheeler Transform?

The Burrows-Wheeler Transform (BWT) is a data transformation algorithm that restructures data in a way that enhances its compressibility. It is a reversible permutation of the characters of a string, with one procedure for converting a string into its BWT and another for inverting the process.

Motivation for the Burrows-Wheeler Transform

The BWT was developed for data compression and is used in tools such as bzip2. It tends to group identical characters into runs, because characters that precede similar contexts end up next to each other in the output. Later compression stages, such as move-to-front coding followed by entropy coding, can exploit those runs to reduce file size.

Applications of the Burrows-Wheeler Transform

The Burrows-Wheeler Transform has found applications in various areas, including:

* Data Compression: The BWT is the core transformation in the popular bzip2 compression tool.
* Suffix Arrays and Pattern Matching: The BWT is closely related to the suffix array. The transform can be read directly off the suffix array of the input, and indexes built on the BWT, such as the FM-index, support efficient pattern matching and other string analysis tasks.
* Genome Sequencing: The BWT underlies short read mapping algorithms, which align short DNA sequences to a reference genome for the purpose of genome resequencing.
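To make the pattern-matching application more concrete, here is a minimal sketch of backward search, the counting step used by BWT-based indexes. It assumes the BWT was built from a string ending in a unique `$` marker (for example `annb$aa` for `banana`). The function name `count_occurrences` and the helper `occ` are illustrative, and the linear scan inside `occ` stands in for the precomputed rank tables a real index would use.

```python
from collections import Counter


def count_occurrences(bwt_text: str, pattern: str) -> int:
    """Count occurrences of pattern in the original text, given only its BWT."""
    counts = Counter(bwt_text)

    # first[ch] = row where the block of rows starting with ch begins
    # in the sorted first column.
    first = {}
    total = 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]

    def occ(ch: str, i: int) -> int:
        # Number of occurrences of ch in bwt_text[:i] (illustrative linear scan).
        return bwt_text.count(ch, 0, i)

    # Half-open range [lo, hi) of sorted rows that start with the current suffix
    # of the pattern. Initially every row matches the empty suffix.
    lo, hi = 0, len(bwt_text)
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + occ(ch, lo)
        hi = first[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


print(count_occurrences("annb$aa", "ana"))  # 2
```

The pattern is processed from its last character to its first, and each step narrows the range of matching rows, so the number of steps depends on the pattern length rather than on the text length.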

Implementing the Burrows-Wheeler Transform

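The simplest way to compute the Burrows-Wheeler Transform sorts all rotations of the input string. This is easy to implement but takes O(n^2 log n) time in the worst case, where n is the length of the input string. A linear-time algorithm that takes O(n) space is also possible, by first building the suffix array with a linear-time construction algorithm and reading the BWT off it. The simple approach involves the following steps:

1. Append a unique end-of-string marker to the input (or record the row index of the original string) so that the transform can be inverted unambiguously.
2. Sort the rotations of the resulting string lexicographically.
3. Extract the last column of the sorted rotations to form the BWT.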
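The rotation-sorting procedure can be sketched in a few lines of Python. This is the simple quadratic approach, not a linear-time construction, and it assumes a `$` end-of-string marker that does not occur in the input. The function name `bwt` and the `sentinel` parameter are illustrative choices.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive BWT: sort all rotations and read off the last column."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    # Build every rotation of s and sort them lexicographically.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last character of each sorted rotation.
    return "".join(rotation[-1] for rotation in rotations)


print(bwt("banana"))  # annb$aa
```

Materializing every rotation keeps the sketch short but uses quadratic memory, so it is only suitable for small inputs.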

Inversion of the Burrows-Wheeler Transform

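The BWT is invertible, meaning that the original string can be recovered from its BWT. The inverse BWT algorithm uses the last-to-first (LF) mapping: the i-th occurrence of a character in the last column of the sorted rotations corresponds to the i-th occurrence of that character in the first column. Following this mapping recovers the original string backwards, with the last character recovered first, followed by the second-to-last character, and so on.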
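The last-to-first approach can be sketched as follows. This version assumes the transform was produced from a string ending in a unique `$` marker and that rows were sorted in ordinary character order. The names `inverse_bwt`, `rank`, and `first` are illustrative.

```python
def inverse_bwt(last: str, sentinel: str = "$") -> str:
    """Invert a BWT by following the last-to-first (LF) mapping."""
    if last.count(sentinel) != 1:
        raise ValueError("input must contain the sentinel exactly once")

    # rank[i] = number of earlier occurrences of last[i] in the last column.
    counts = {}
    rank = []
    for ch in last:
        rank.append(counts.get(ch, 0))
        counts[ch] = counts.get(ch, 0) + 1

    # first[ch] = row where the block of ch begins in the sorted first column.
    first = {}
    total = 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]

    # The row starting with the sentinel ends with the last character
    # of the original text, so the walk starts there.
    row = first[sentinel]
    recovered = []
    for _ in range(len(last) - 1):
        ch = last[row]
        recovered.append(ch)            # characters arrive in reverse order
        row = first[ch] + rank[row]     # LF step: jump to the row starting with ch
    return "".join(reversed(recovered))


print(inverse_bwt("annb$aa"))  # banana
```

Each step does a constant amount of work after the two counting passes, so the whole inversion is linear in the length of the input apart from sorting the distinct characters.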

Conclusion

The Burrows-Wheeler Transform is a powerful data transformation technique that has gained widespread adoption in data compression and other applications. Its ability to group similar characters together makes it an effective preprocessing step for subsequent compression algorithms. The BWT is invertible, making it possible to recover the original string from its transformed form.

