Advanced compression and encoding techniques in HP Vertica
One of the main goals when storing data in a computer system is to reduce the amount of storage space the data occupies. HP Vertica uses encoding and compression techniques both to optimize how data is stored and to improve query performance.
Encoding converts data into a standard format that Vertica can process directly, without first turning it back into its original form.
The most commonly used encoding techniques in Vertica are:
- Run length encoding (RLE)
- Delta value encoding (DELTAVAL)
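To show what "processed directly" means, here is a minimal Python sketch. It assumes a column already stored as (value, run_length) pairs, as RLE produces. The function names `sum_rle` and `count_equal_rle` are hypothetical and do not describe Vertica's internal implementation. An aggregate touches each run once instead of each row, so the column never has to be expanded.

```python
from typing import List, Tuple

# Hypothetical RLE layout: (value, number of consecutive repeats).
# Illustrative only, not Vertica's internal storage format.
Run = Tuple[int, int]

def sum_rle(runs: List[Run]) -> int:
    """Sum the column without expanding it: one multiply per run."""
    return sum(value * length for value, length in runs)

def count_equal_rle(runs: List[Run], target: int) -> int:
    """Count rows equal to target by inspecting runs, not rows."""
    return sum(length for value, length in runs if value == target)

# The column [5, 5, 5, 7, 7, 9] stored as three runs.
column_runs: List[Run] = [(5, 3), (7, 2), (9, 1)]
print(sum_rle(column_runs))             # 38  (15 + 14 + 9)
print(count_equal_rle(column_runs, 7))  # 2
```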
Compression compacts data into a form that Vertica cannot process directly: the data must first be decompressed, and only the decompressed data can be operated on. The most commonly used compression technique is:
- LZO (Lempel-Ziv-Oberhumer-based) compression
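The difference from encoding can be sketched in Python. Python's standard library has no LZO module, so this example uses `zlib` (DEFLATE, another Lempel-Ziv-based compressor) purely as a stand-in; the column values are made up for illustration. The point is that the compressed bytes cannot be filtered as values until they are decompressed.

```python
import json
import zlib

# Hypothetical column of distinct, unsorted strings.
values = ["alpha-17", "beta-42", "gamma-03", "alpha-99", "delta-58"]

# Compress the serialized column. zlib stands in for LZO here.
compressed = zlib.compress(json.dumps(values).encode("utf-8"))

# The compressed bytes are opaque: decompress before filtering.
restored = json.loads(zlib.decompress(compressed).decode("utf-8"))
matches = [v for v in restored if v.startswith("alpha")]
print(matches)  # ['alpha-17', 'alpha-99']
```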
Run length encoding (RLE):
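RLE works best on sorted columns with few distinct values. It replaces each run of identical consecutive values with a single copy of the value and a count of how many times it repeats, so sorting matters: it places equal values next to each other and makes the runs long.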
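A minimal Python sketch of run length encoding follows. The names `rle_encode` and `rle_decode` are illustrative and are not part of any Vertica API; the example only shows the idea, not how Vertica lays runs out on disk.

```python
from itertools import groupby
from typing import List, Tuple, TypeVar

T = TypeVar("T")

def rle_encode(column: List[T]) -> List[Tuple[T, int]]:
    """Collapse each run of equal consecutive values into (value, count)."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(column)]

def rle_decode(runs: List[Tuple[T, int]]) -> List[T]:
    """Expand (value, count) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]

# Sorted, low-cardinality column: 9 rows collapse into 3 runs.
states = ["CA", "CA", "CA", "CA", "NY", "NY", "TX", "TX", "TX"]
runs = rle_encode(states)
print(runs)  # [('CA', 4), ('NY', 2), ('TX', 3)]
```

On an unsorted column such as `["CA", "NY", "CA", "NY"]`, `rle_encode` returns four runs of length 1, which is larger than the input. This is why RLE is paired with sorted, low-cardinality data.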
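Delta value encoding (DELTAVAL):
DELTAVAL encoding suits integer-based columns whose data may be sorted or unsorted and whose values lie close to one another. One value is kept as the base (Vertica uses the smallest value in the data block), and every value is stored as its difference from that base. Small differences need fewer bits than the original values.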
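Here is a minimal Python sketch of delta-from-minimum encoding. It assumes a single non-empty in-memory block of integers and uses the minimum as the base, as described above; `deltaval_encode` and `deltaval_decode` are hypothetical names, and Vertica's actual bit packing on disk is not shown.

```python
from typing import List, Tuple

def deltaval_encode(block: List[int]) -> Tuple[int, List[int]]:
    """Store the block's minimum once, then each value as its offset from it.
    Assumes a non-empty block."""
    base = min(block)
    return base, [v - base for v in block]

def deltaval_decode(base: int, deltas: List[int]) -> List[int]:
    """Add the base back to every offset."""
    return [base + d for d in deltas]

# Unsorted integers that lie close together.
block = [1_000_047, 1_000_003, 1_000_120, 1_000_015]
base, deltas = deltaval_encode(block)
print(base, deltas)  # 1000003 [44, 0, 117, 12]

# Bits needed for the largest raw value vs. the largest delta.
print(max(block).bit_length(), max(deltas).bit_length())  # 20 7
```

The order of the values does not matter, which is why DELTAVAL works on unsorted data; what matters is that the values are close together so the offsets stay small.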
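LZO compression:
LZO compression is used when the data is unsorted and has many distinct values, so there are neither long runs for RLE nor small differences for DELTAVAL to exploit. LZO is a Lempel-Ziv dictionary compressor: it finds byte sequences that repeat within the data and replaces later occurrences with short references to earlier ones, much like zipping a document before sending it by email.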
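The core Lempel-Ziv idea can be sketched with a simplified LZ77 compressor in Python. This is not LZO itself: LZO uses its own byte-level format tuned for fast decompression, and the token layout, window size, and function names here are assumptions made for illustration. Each token says "go back `distance` characters, copy `length` characters, then emit one literal".

```python
from typing import List, Tuple

Token = Tuple[int, int, str]  # (distance back, match length, next literal char)

def lz77_compress(text: str, window: int = 255) -> List[Token]:
    """Greedy LZ77: replace repeated substrings with back-references."""
    tokens: List[Token] = []
    i = 0
    while i < len(text):
        best_len, best_dist = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # Stop one short of the end so a literal always follows the match.
            while (i + length < len(text) - 1
                   and text[j + length] == text[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        tokens.append((best_dist, best_len, text[i + best_len]))
        i += best_len + 1
    return tokens

def lz77_decompress(tokens: List[Token]) -> str:
    """Rebuild the text by copying from earlier output, then adding literals."""
    out: List[str] = []
    for dist, length, literal in tokens:
        for _ in range(length):
            out.append(out[-dist])  # overlapping copies are allowed
        out.append(literal)
    return "".join(out)

tokens = lz77_compress("abcabcabcabc")
print(tokens)  # [(0, 0, 'a'), (0, 0, 'b'), (0, 0, 'c'), (3, 8, 'c')]
print(lz77_decompress(tokens))  # abcabcabcabc
```

Twelve characters shrink to four tokens because the repeated pattern is replaced by a single back-reference. On data with no repetition every token is a literal, so this style of compression only pays off when byte sequences recur.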