Explain extended huffman coding. What is the advantage of extended Huffman coding?
1 Answer

Extended Huffman Coding:

In applications where the alphabet size is large, pmax (the probability of the most frequently occurring letter) is generally quite small, and the deviation of the Huffman rate from the entropy, especially as a percentage of the rate, is also small.

However, when the alphabet is small and the probabilities of the letters are highly skewed, pmax can be quite large and the Huffman code can become rather inefficient compared to the entropy.

To overcome this inefficiency we use extended Huffman coding, in which codewords are assigned to blocks of symbols rather than to individual symbols. This can be illustrated with the following example:

Consider a source that puts out iid letters from the alphabet A = {a1, a2, a3} with the probability model P(a1) = 0.8, P(a2) = 0.02, and P(a3) = 0.18. The entropy of this source is 0.816 bits/symbol. A Huffman code for this source is shown in Table 1 below.

TABLE 1: The Huffman code.

Letter    Codeword
a1        0
a2        11
a3        10

The average length for this code is 1.2 bits/symbol. The difference between the average code length and the entropy, called the redundancy, is 0.384 bits/symbol, which is 47% of the entropy. In other words, coding this source with this Huffman code requires 47% more bits per symbol than the minimum.
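
These figures are straightforward to verify. The following minimal Python sketch (an illustrative cross-check, not part of the original answer) recomputes the entropy, the average code length, and the redundancy from Table 1:

    import math

    # Probability model and Huffman code from Table 1
    probs = {"a1": 0.8, "a2": 0.02, "a3": 0.18}
    code = {"a1": "0", "a2": "11", "a3": "10"}

    entropy = -sum(p * math.log2(p) for p in probs.values())
    avg_len = sum(probs[s] * len(code[s]) for s in probs)
    redundancy = avg_len - entropy

    print(f"entropy    = {entropy:.3f} bits/symbol")   # 0.816
    print(f"avg length = {avg_len:.1f} bits/symbol")   # 1.2
    print(f"redundancy = {redundancy:.3f} bits/symbol "
          f"({redundancy / entropy:.0%} of entropy)")  # 0.384 (47%)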

Now, for the source described above, instead of generating a codeword for every symbol, we will generate a codeword for every two symbols. If we look at the source sequence two symbols at a time, the number of possible symbol pairs, i.e. the size of the extended alphabet, is 3² = 9. The extended alphabet, probability model, and Huffman code for this example are shown in Table 2 below.

TABLE 2: The extended alphabet and corresponding Huffman code.

Letter    Probability    Code
a1a1      0.64           0
a1a2      0.016          10101
a1a3      0.144          11
a2a1      0.016          101000
a2a2      0.0004         10100101
a2a3      0.0036         1010011
a3a1      0.144          100
a3a2      0.0036         10100100
a3a3      0.0324         1011

The average codeword length for this extended code is 1.7228 bits per extended symbol. However, each symbol in the extended alphabet corresponds to two symbols from the original alphabet.

Therefore, in terms of the original alphabet, the average codeword length is 1.7228/2 = 0.8614 bits/symbol.

The redundancy is therefore 0.8614 − 0.816 ≈ 0.045 bits/symbol, which is only about 5.5% of the entropy.
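
As a cross-check, here is a minimal Python sketch (an illustration, not part of the original answer) that builds a Huffman code over the pair alphabet with heapq. The individual codewords it produces may depend on how ties are broken, since a source can have several valid Huffman codes, but the average length is the same:

    import heapq
    import itertools
    import math

    probs = {"a1": 0.8, "a2": 0.02, "a3": 0.18}

    # Extended alphabet: all ordered pairs with product probabilities (iid source)
    pairs = {x + y: probs[x] * probs[y] for x, y in itertools.product(probs, repeat=2)}

    # Standard Huffman construction; the integer counter breaks probability ties
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(pairs.items())]
    heapq.heapify(heap)
    ids = itertools.count(len(heap))
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(ids), merged))
    code = heap[0][2]

    avg_pair = sum(pairs[s] * len(code[s]) for s in pairs)
    entropy = -sum(p * math.log2(p) for p in probs.values())
    print(f"{avg_pair:.4f} bits/pair = {avg_pair / 2:.4f} bits/symbol")  # 1.7228, 0.8614
    print(f"redundancy = {avg_pair / 2 - entropy:.4f} bits/symbol "
          f"({(avg_pair / 2 - entropy) / entropy:.1%})")
    # prints ~0.0457 (5.6%); the 0.045 / 5.5% quoted above uses rounded values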

Advantage of extended Huffman coding

We see that by coding blocks of symbols together we can reduce the redundancy of Huffman codes: in this example the redundancy falls from 47% of the entropy to about 5.5%. In general, for an iid source, coding blocks of n symbols at a time guarantees an average rate within 1/n bits per symbol of the entropy.
