A priority queue is used to help build the tree along the way.

- build encoding map: Traverse the binary tree to discover the binary encodings of each character.
- encode data: Re-examine the source file's contents, and for each character, output the encoded binary version of that character to the destination file.

Your program's output format should exactly match the abridged log of execution above.

Step 2 of Huffman's algorithm places our counts into binary tree nodes, with each node storing a character and a count of its occurrences. The nodes are then put into a priority queue, which keeps them in prioritized order with smaller counts having higher priority, so that characters with lower counts will come out of the queue sooner. (The priority queue is somewhat arbitrary in how it breaks ties, such as 'c' coming before EOF and 'a' before 'b'.) The algorithm then repeatedly removes the two nodes from the front of the queue (the two with the smallest frequencies) and joins them into a new node whose frequency is their sum. In the new node, the first node removed becomes the left child and the second becomes the right child. This process is repeated until the queue contains only one binary tree node, with all the others as its children. That node is the root of our finished Huffman tree. The following diagram shows this process. Notice that nodes with low frequencies end up far down in the tree, while nodes with high frequencies end up near the top. This structure can be used to create an efficient encoding in the next step.

Using the encoding map, we can encode the file's text into a shorter binary representation. Using the preceding encoding map, the text "ab ab cab" would be encoded into just 22 bits, or almost 3 bytes, compared to the original file of 10 bytes. The following table details the char-to-binary mapping in more detail.

Since the character encodings have different lengths, the length of a Huffman-encoded file often does not come out to an exact multiple of 8 bits. Files are stored as sequences of whole bytes, so in cases like this the remaining bits of the last byte are filled with 0s. You do not need to worry about this; it is handled by the underlying file system. It might also worry you that the characters are stored without any delimiters between them, since their encodings can be different lengths and characters can cross byte boundaries, as with 'a' at the end of the second byte. But this will not cause problems in decoding the file, because Huffman encodings by definition have a useful prefix property: no character's encoding can ever occur as the start of another character's encoding.

You can use a Huffman tree to decode text that was previously encoded with its binary patterns. The decoding algorithm is to read each bit from the file, one at a time, and use that bit to traverse the Huffman tree: if the bit is a 0, you move left in the tree; if it is a 1, you move right. Leaf nodes represent characters, so once you reach a leaf, you output that character and return to the root. For example, suppose we are given the same encoding tree above, and we are asked to decode a file containing the following bits:
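The tree-building step described above can be sketched in Python. This is a minimal sketch, not the assignment's required implementation: the `HuffNode` class and `build_tree` helper are made up here, a binary heap stands in for the priority queue, and the EOF pseudo-character and exact tie-breaking order from the text are omitted.

```python
import heapq
import itertools
from collections import Counter

class HuffNode:
    """Leaves hold a character; internal nodes hold char=None."""
    def __init__(self, freq, char=None, left=None, right=None):
        self.freq, self.char, self.left, self.right = freq, char, left, right

def build_tree(text):
    counts = Counter(text)
    order = itertools.count()  # tie-breaker so heap comparisons stay well defined
    heap = [(freq, next(order), HuffNode(freq, char))
            for char, freq in counts.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Remove the two nodes with the smallest frequencies...
        f1, _, first = heapq.heappop(heap)
        f2, _, second = heapq.heappop(heap)
        # ...and join them: the first removed becomes the left child, the
        # second the right, and the new node's frequency is their sum.
        joined = HuffNode(f1 + f2, left=first, right=second)
        heapq.heappush(heap, (f1 + f2, next(order), joined))
    return heap[0][2]  # the single remaining node is the root

root = build_tree("ab ab cab")
print(root.freq)  # 9: the sum of all character counts
```

The `itertools.count()` tick simply keeps ties well ordered; as the text notes, how a priority queue breaks ties is somewhat arbitrary.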
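The "build encoding map" and "encode data" steps can be sketched the same way. This is a hypothetical example: the tuple tree representation, the hand-built toy tree, and the `build_map` and `encode` helpers are illustrative, not part of the assignment.

```python
# A hand-built toy tree (illustrative, not the tree from the text):
# internal nodes are ("node", left, right); leaves are ("leaf", ch).
tree = ("node", ("leaf", "a"), ("node", ("leaf", "b"), ("leaf", "c")))

def build_map(node, prefix="", table=None):
    """Traverse the tree; left edges append '0', right edges append '1'."""
    if table is None:
        table = {}
    if node[0] == "leaf":      # leaf: record this character's full code
        table[node[1]] = prefix
    else:                      # internal node: recurse down both subtrees
        build_map(node[1], prefix + "0", table)
        build_map(node[2], prefix + "1", table)
    return table

def encode(text, table):
    # Output the encoded binary version of each character as a bit string.
    return "".join(table[ch] for ch in text)

table = build_map(tree)        # {'a': '0', 'b': '10', 'c': '11'}
print(encode("cab", table))    # -> 11010
```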
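The decoding loop described here can also be sketched, again with a hand-built toy tree and an illustrative `decode` helper rather than the text's actual tree. Note how the prefix property lets the loop emit a character the moment it hits a leaf, with no delimiters needed.

```python
# A hand-built toy tree (illustrative): internal nodes are
# ("node", left, right); leaves are ("leaf", ch). Codes: a=0, b=10, c=11.
tree = ("node", ("leaf", "a"), ("node", ("leaf", "b"), ("leaf", "c")))

def decode(bits, tree):
    out = []
    node = tree
    for bit in bits:
        node = node[1] if bit == "0" else node[2]  # 0 moves left, 1 moves right
        if node[0] == "leaf":   # reached a leaf: output its character...
            out.append(node[1])
            node = tree         # ...and restart at the root
    return "".join(out)

print(decode("01011", tree))  # -> abc
```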