Arithmetic Encoding and Decoding Using MATLAB
Arithmetic coding is a type of entropy encoding utilized in lossless data compression. Ordinarily, a string of characters, for example, the words “hey” is represented for utilizing a fixed number of bits per character.
In the most straightforward case, the probability of every symbol occurring is equivalent. For instance, think about a set of three symbols, A, B, and C, each similarly prone to happen. Straightforward block encoding would require 2 bits for every symbol, which is inefficient as one of the bit varieties is rarely utilized. In other words, A = 00, B = 01, and C = 10, however, 11 is unused. An increasingly productive arrangement is to speak to a succession of these three symbols as a rational number in base 3 where every digit speaks to a symbol. For instance, the arrangement “ABBCAB” could become 0.011201. In arithmetic coding as an incentive in the stretch [0, 1). The subsequent stage is to encode this ternary number utilizing a fixed-guide paired number of adequate exactness toward recuperating it, for example, 0.00101100102 — this is just 10 bits. This is achievable for long arrangements in light of the fact that there are productive, set up calculations for changing over the base of subjectively exact numbers.
When all is said and done, arithmetic coders can deliver close ideal output for some random arrangement of symbols and probabilities (the ideal value is – log2P bits for every symbol of likelihood P). Compression algorithms that utilize arithmetic coding go by deciding a structure of the data – fundamentally an expectation of what examples will be found in the symbols of the message. The more precise this prediction is, the closer to ideal the output will be.
When all is said and done, each progression of the encoding procedure, aside from the absolute last, is the equivalent; the encoder has fundamentally only three bits of information to consider: The following symbol that should be encoded. The current span (at the very beginning of the encoding procedure, the stretch is set to [0,1], yet that will change) The probabilities the model allows to every one of the different symbols that are conceivable at this stage (as referenced prior, higher-request or versatile models imply that these probabilities are not really the equivalent in each progression.) The encoder isolates the current span into sub-spans, each speaking to a small amount of the current span relative to the likelihood of that symbol in the current setting. Whichever stretch relates to the real symbol that is close to being encoded turns into the span utilized in the subsequent stage. At the point when the sum total of what symbols have been encoded, the subsequent span unambiguously recognizes the arrangement of symbols that delivered it. Any individual who has a similar last span and model that is being utilized can remake the symbol succession that more likely than not entered the encoder to bring about that last stretch. It isn’t important to transmit the last stretch, in any case; it is just important to transmit one division that exists in that span. Specifically, it is just important to transmit enough digits (in whatever base) of the part so all divisions that start with those digits fall into the last stretch; this will ensure that the subsequent code is a prefix code.
Encoding
MATLAB
str = 'GeeksforGeeks' ;
fprintf( 'The entered string is : %s\n' , str);
len = length(str);
fprintf( 'The length of the string is : %d\n' , len);
u = unique(str);
fprintf( 'The unique characters are : %s\n' , u);
len_unique = length(u);
fprintf( 'The length of unique character string is : %d\n' , len_unique);
z = zeros(1, len_unique);
p = zeros(1, len_unique);
for i = 1 : len_unique
z(i) = length(findstr(str, u(i)));
p(i) = z(i) / len;
end
display(z);
display(p);
cpr = cumsum(p);
newcpr = [0 cpr];
display(cpr);
display(newcpr);
for i = 1 : len_unique
interval(i, 1) = newcpr(i);
interval(i, 2) = cpr(i);
end
display( 'The lookip table is : ' )
display(interval);
low = 0;
high = 1;
for i = 1 : len
for j = 1 : len_unique
if str(i) == u(j);
pos = j;
j = j + 1;
display(pos);
range = high - low;
high = low + (range .* interval(pos, 2));
low = low + (range .* interval(pos, 1));
i = i + 1;
break
end
end
end
tag = low;
display(tag);
|
Output :
The entered string is : GeeksforGeeks
The length of the string is : 13
The unique characters are : Gefkors
The length of unique character string is : 7
z =
2 4 1 2 1 1 2
p =
0.1538 0.3077 0.0769 0.1538 0.0769 0.0769 0.1538
cpr =
0.1538 0.4615 0.5385 0.6923 0.7692 0.8462 1.0000
newcpr =
0 0.1538 0.4615 0.5385 0.6923 0.7692 0.8462 1.0000
The lookip table is :
interval =
0 0.1538
0.1538 0.4615
0.4615 0.5385
0.5385 0.6923
0.6923 0.7692
0.7692 0.8462
0.8462 1.0000
pos =
1
pos =
2
pos =
2
pos =
4
pos =
7
pos =
3
pos =
5
pos =
6
pos =
1
pos =
2
pos =
2
pos =
4
pos =
7
tag =
0.0409
Decoding
MATLAB
str = '' ;
for i = 1 : len
for j = 1 : len_unique
if tag > interval(j, 1) && tag < interval(j, 2)
pos = j;
tag = (tag - interval(pos, 1)) / p(j);
decoded_str = u(pos);
str = strcat(str, decoded_str);
break
end
end
end
disp(str)
|
Output :
GeeksforGeeks
Last Updated :
01 Jun, 2022
Like Article
Save Article
Share your thoughts in the comments
Please Login to comment...