Related Articles

# Arithmetic Encoding and Decoding Using MATLAB

• Difficulty Level : Expert
• Last Updated : 17 Aug, 2020

Arithmetic coding is a type of entropy encoding utilized in lossless data compression. Ordinarily, a string of characters, for example, the words “hey” is represented for utilizing a fixed number of bits per character.

In the most straightforward case, the probability of every symbol occurring is equivalent. For instance, think about a set of three symbols, A, B, and C, each similarly prone to happen. Straightforward block encoding would require 2 bits for every symbol, which is inefficient as one of the bit varieties is rarely utilized.
In other words, A = 00, B = 01, and C = 10, however, 11 is unused. An increasingly productive arrangement is to speak to a succession of these three symbols as a rational number in base 3 where every digit speaks to a symbol. For instance, the arrangement “ABBCAB” could become 0.011201. In arithmetic coding as an incentive in the stretch [0, 1). The subsequent stage is to encode this ternary number utilizing a fixed-guide paired number of adequate exactness toward recuperating it, for example, 0.00101100102 — this is just 10 bits. This is achievable for long arrangements in light of the fact that there are productive, set up calculations for changing over the base of subjectively exact numbers.

When all is said and done, arithmetic coders can deliver close ideal output for some random arrangement of symbols and probabilities (the ideal value is – `log2P` bits for every symbol of likelihood P). Compression algorithms that utilize arithmetic coding go by deciding a structure of the data – fundamentally an expectation of what examples will be found in the symbols of the message. The more precise this prediction is, the closer to ideal the output will be.

When all is said and done, each progression of the encoding procedure, aside from the absolute last, is the equivalent; the encoder has fundamentally only three bits of information to consider: The following symbol that should be encoded. The current span (at the very beginning of the encoding procedure, the stretch is set to [0,1], yet that will change) The probabilities the model allows to every one of the different symbols that are conceivable at this stage (as referenced prior, higher-request or versatile models imply that these probabilities are not really the equivalent in each progression.) The encoder isolates the current span into sub-spans, each speaking to a small amount of the current span relative to the likelihood of that symbol in the current setting. Whichever stretch relates to the real symbol that is close to being encoded turns into the span utilized in the subsequent stage. At the point when the sum total of what symbols have been encoded, the subsequent span unambiguously recognizes the arrangement of symbols that delivered it. Any individual who has a similar last span and model that is being utilized can remake the symbol succession that more likely than not entered the encoder to bring about that last stretch. It isn’t important to transmit the last stretch, in any case; it is just important to transmit one division that exists in that span. Specifically, it is just important to transmit enough digits (in whatever base) of the part so all divisions that start with those digits fall into the last stretch; this will ensure that the subsequent code is a prefix code.

Encoding

 `% input string``str = ``'GeeksforGeeks'``;``fprintf(``'The entered string is : %s\n'``, str);`` ` `% length of the string``len = length(str);``fprintf(``'The length of the string is : %d\n'``, len);`` ` `% get unique characters from the string``u = unique(str);``fprintf(``'The unique characters are : %s\n'``, u);`` ` `% length of the unique characters string``len_unique = length(u);``fprintf(``'The length of unique character string is : %d\n'``, len_unique);`` ` `% General lookup table`` ` `% get zeros of length of unique characters``z = zeros(1, len_unique);``p = zeros(1, len_unique);`` ` `for` `i = 1 : len_unique`` ` `   ``% in 'z' variable we will find the ``   ``% occurrence of each characters from 'str'  ``   ``z(i) = length(findstr(str, u(i)));`` ` `   ``% in 'p' variable we will get ``   ``% probability of those occurrences``   ``p(i) = z(i) / len;``end``display(z);``display(p);`` ` `% in 'cpr' variable we will get the cumulative ``% summation of 'p' from '1' till last value of 'p'``cpr = cumsum(p);`` ` `% in 'newcpr' variable we are taking ``% 'cpr' from '0' till last value of 'p' ``newcpr = [0 cpr];`` ` `display(cpr);``display(newcpr);`` ` `% make table till 'len_unique' size``for` `i = 1 : len_unique`` ` `   ``% in first column we are then ``   ``% placing 'newcpr' values``   ``interval(i, 1) = newcpr(i);`` ` `   ``% in second column we are ``   ``% placing 'cpr' values``   ``interval(i, 2) = cpr(i);``end`` ` `% Displaying the lookup table``display(``'The lookip table is : '``)``display(interval);`` ` `% Encoder Table`` ` `low = 0;``high = 1;``for` `i = 1 : len``   ``for` `j = 1 : len_unique`` ` `       ``% if the value from 'str'``       ``% matches with 'u' then``       ``if` `str(i) == u(j);``           ``pos = j;``           ``j = j + 1;  `` ` `           ``% displaying the matched length ``           ``% of unique characters``           ``display(pos);`` ` `           ``% getting the tag value ``           ``% of the matched character``           ``range = high - low;``           ``high = low + (range .* interval(pos, 2));``           ``low = low + (range .* interval(pos, 1));``           ``i = i + 1;``           ``break``       ``end``   ``end``end`` ` `% displaying tag value``tag = low;``display(tag);`

Output :

```The entered string is : GeeksforGeeks
The length of the string is : 13
The unique characters are : Gefkors
The length of unique character string is : 7

z =

2     4     1     2     1     1     2

p =

0.1538    0.3077    0.0769    0.1538    0.0769    0.0769    0.1538

cpr =

0.1538    0.4615    0.5385    0.6923    0.7692    0.8462    1.0000

newcpr =

0    0.1538    0.4615    0.5385    0.6923    0.7692    0.8462    1.0000

The lookip table is :

interval =

0    0.1538
0.1538    0.4615
0.4615    0.5385
0.5385    0.6923
0.6923    0.7692
0.7692    0.8462
0.8462    1.0000

pos =

1

pos =

2

pos =

2

pos =

4

pos =

7

pos =

3

pos =

5

pos =

6

pos =

1

pos =

2

pos =

2

pos =

4

pos =

7

tag =

0.0409
```

Decoding

 `str = ``''``;`` ` `for` `i = 1 : len``   ``for` `j = 1 : len_unique``      ` `       ``% If tag value falls between a certain ``       ``% range in lookup table then``       ``if` `tag > interval(j, 1) && tag < interval(j, 2)``           ``pos = j;``           ``tag = (tag - interval(pos, 1)) / p(j);`` ` `           ``% Geeting the matched tag ``           ``% value character``           ``decoded_str = u(pos);`` ` `           ``% String concatenating ``           ``% 'decoded_str' into 'str'``           ``str = strcat(str, decoded_str);``           ``break``       ``end``   ``end``end`` ` `% Displaying final decoded string``disp(str)`

Output :

`GeeksforGeeks`

My Personal Notes arrow_drop_up