Huffman Codes are codes that are part of a compression algorithm that generates a tree called Huffman Tree. This algorithm is implemented on text, image, and video compressions. The idea is to generate codes that will hold information while it is being light or small in bit size.
In this reading, we will try to use the Huffman Code to compress
a text. The idea is to calculate the frequency of each letter and then generate
a Huffman Tree. Then for each left child line or left adjacent, we give 0 value.
For each right child line or right adjacent, we give 1 value. The Huffman Code
for each letter is the combination of the value of the path to the letter in Huffman
Tree.
Let’s us consider text “DESIGN AND ANALYSIS OF ALGORITHMS”
to be compressed. First, we need to create a frequency letter for each letter of
the text.
S |
sp |
A |
I |
N |
D |
G |
L |
O |
E |
Y |
F |
R |
T |
H |
M |
4 |
4 |
4 |
3 |
3 |
2 |
2 |
2 |
2 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
*Note: sp = space
Then, we generate a Huffman Tree. The way we create Huffman
Tree is to select the minimum frequency letter then combine its frequency as the parent node value. This step is repeated until it reaches the topmost node as its
root.
Then from that Huffman Tree, we can get the Huffman Code for each letter. Please refers to this table below.
Character/Letter |
Huffman
Code |
Bit |
S |
000 |
3 bits |
sp |
001 |
3 bits |
A |
010 |
3 bits |
I |
0110 |
4 bits |
N |
0111 |
4 bits |
D |
1100 |
4 bits |
G |
1000 |
4 bits |
L |
1001 |
4 bits |
O |
1010 |
4 bits |
E |
1011 |
4 bits |
Y |
11010 |
5 bits |
F |
11011 |
5 bits |
R |
11100 |
5 bits |
T |
11101 |
5 bits |
H |
11110 |
5 bits |
M |
11111 |
5 bits |
Total Bit |
67 bits |
Therefore, for text “DESIGN AND ANALYSIS OF ALGORITHMS”, the
Huffman Code is described below.
o
D:1100(4 bits)
o
E:1011(4 bits)
o
S:000(3 bits)
o
I:0110(4 bits)
o
G:1000(4 bits)
o
N:0111(4 bits)
o
sp:001(3 bits)
o
A:010(3 bits)
o
N:0111(4 bits)
o
D:1100(4 bits)
o
sp:001(3 bits)
o
A:010(3 bits)
o
N:0111(4 bits)
o
A:010(3 bits)
o
L:1001(4 bits)
o
Y:11010(5 bits)
o
S:000(3 bits)
o
I:0110(4 bits)
o
S:000(3 bits)
o
sp:001(3 bits)
o
O:1010(4 bits)
o
F:11011(5 bits)
o
sp:001(3 bits)
o
A:010(3 bits)
o
L:1001(4 bits)
o
G:1000(4 bits)
o
O:1010(4 bits)
o
R:11100(5 bits)
o
I:0110(4 bits)
o
T:11101(5 bits)
o
H:11110(5 bits)
o
M:11111(5 bits)
o
S:000(3 bits)
Total
bits = 126 bits
126 bits is much smaller than its original size which is 264 bits. This is why Huffman is a quite popular algorithm for compressions.
No comments:
Post a Comment