Information about Adaptive Huffman Coding

Adaptive Huffman coding (also called Dynamic Huffman coding) is an adaptive coding technique based on Huffman coding, building the code as the symbols are being transmitted, having no initial knowledge of source distribution, that allows one-pass encoding and adaptation to changing conditions in data.

The benefit of one-pass procedure is that the source can be encoded in real time, though it becomes more sensitive to transmission errors, since just a single loss ruins the whole code.

Algorithms

There are a number of implementations of this method, the most notable are FGK (Faller-Gallager-Knuth) and Vitter algorithm.

Vitter algorithm

Code is represented as a tree structure in which every node has a corresponding weight and a unique number.

Numbers go down, and from right to left.

Weights must suffice sibling property, that is what nodes can be listed in order of nonincreasing weight with each node adjacent to its sibling. Thus if A is parent node of B and node C is child of B, then .

The weight is merely the count of symbols transmitted which codes are associated with children of that node.

A set of nodes with same weights make a block.

To get the code for every node, in case of binary tree we could just traverse all the path from the root to the node, writing down (for example) "1" if we go to the right and "0" if we go to the left.

We need some general and straightforward method to transmit symbols which are not yet transmitted (NYT), we could use, for example, transmission of binary numbers for every symbol in alphabet.

Encoder and decoder start with only the root node, which has the maximum number. In the beginning it is our initial NYT node.

When we transmit an NYT symbol we have to transmit code for the NYT node, then for its generic code.

For every symbol which is already in the tree we only have to transmit code for its leaf node.

For every symbol transmitted on both sides we must execute update procedure:

1. If current symbol is NYT, add two child nodes to NYT node, one will be a new NYT node the other is leaf node for our symbol, increase weight for new leaf node and old NYT, go to step 4, else go to symbol's leaf node.

2. If this node does not have the highest number in a block swap it with which has the highest number

3. Increase weight for current node

4. If this is not the root node go to parent node, go to step 2, else end.

Note: swapping nodes means swapping weights and corresponding symbols, but not the numbers.

Example

Developing adapive Huffman tree


Start with empty tree.

For "a" transmit its binary code.

NYT spawns two child nodes 254 and 255.

Weight for root is now 1.

Now code for "a", associated with node 255 is 1.

For "b" transmit 0 for NYT node, then its binary code.

NYT spawns two child nodes 252 for NYT and 253 for leaf node.

Increase weights for 253 and 254 and root.

For second "b" transmit 01.

Go to its leaf node 253, we have a block of weights of 1 and the biggest number in the block is 255, so swap the nodes, increase weight, go to root, increase weight for root.

Future code for "b" is 1, and for "a" is now 01, which reflects their frequency.

External links

Adaptive coding refers to variants of entropy encoding methods of lossless data compression. They are particularly suited to streaming data, as they adapt to localized changes in the characteristics of the data, and don't require a first pass over the data to calculate a
..... Click the link for more information.
Huffman coding is an entropy encoding algorithm used for lossless data compression. The term refers to the use of a variable-length code table for encoding a source symbol (such as a character in a file) where the variable-length code table has been derived in a particular way
..... Click the link for more information.
Newton Faller (January 25, 1947–1996) the son of Kurt Faller and Ada Faller from Rio Grande do Sul, was a Brazilian computer scientist and electrical engineer. He is credited with the discovery of adaptive Huffman codes while an employee of IBM do Brasil in Rio.
..... Click the link for more information.
Robert G. Gallager (born May 29, 1931 in Philadelphia, PA) is an American electrical engineer known for his work on information theory and communications networks. He was elected an IEEE Fellow in 1968 and a member of the NAE in 1979.
..... Click the link for more information.
Donald Ervin Knuth

Photographed by Jacob Appelbaum, 25 October 2005
Born January 10 1938 (1938--)
..... Click the link for more information.
Jeffrey Scott Vitter (born 1955 in New Orleans, Louisiana) serves as the Frederick L. Hovde Dean of Science at Purdue University in West Lafayette, Indiana. Previously he was the Gilbert, Louis, and Edward Lehrman Professor of Computer Science and department chair at Duke
..... Click the link for more information.
The National Institute of Standards and Technology (NIST), known between 1901–1988 as the National Bureau of Standards (NBS), is a non-regulatory agency of the United States Department of Commerce. The institute's mission is to promote U.S.
..... Click the link for more information.
The Dictionary of Algorithms and Data Structures is a dictionary style reference for many algorithms, "algorithmic techniques", "archetypal problems" and data structures found in the field of Computer Science [1]. The dictionary is maintained by Paul E.
..... Click the link for more information.
data compression or source coding is the process of encoding information using fewer bits (or other information-bearing units) than an un-encoded representation would use through use of specific encoding schemes.
..... Click the link for more information.
Lossless data compression is a class of data compression algorithms that allows the exact original data to be reconstructed from the compressed data. This can be contrasted to lossy data compression, which does not allow the exact original data to be reconstructed from the
..... Click the link for more information.
Information theory is a branch of applied mathematics and engineering involving the quantification of information to find fundamental limits on compressing and reliably communicating data.
..... Click the link for more information.
Shannon entropy or information entropy is a measure of the uncertainty associated with a random variable.

Shannon entropy quantifies the information contained in a piece of data: it is the minimum average message length, in bits (if using base-2 logarithms), that must
..... Click the link for more information.
In computer science, the Kolmogorov complexity (also known as descriptive complexity, Kolmogorov-Chaitin complexity, stochastic complexity, algorithmic entropy, or program-size complexity
..... Click the link for more information.
Redundancy in information theory is the number of bits used to transmit a message minus the number of bits of actual information in the message. Informally, it is the amount of wasted "space" used to transmit certain data.
..... Click the link for more information.
In information theory an entropy encoding is a lossless data compression scheme that is independent of the media’s specific characteristics.

One of the main types of entropy coding assigns codes to symbols so as to match code lengths with the probabilities of the
..... Click the link for more information.
Huffman coding is an entropy encoding algorithm used for lossless data compression. The term refers to the use of a variable-length code table for encoding a source symbol (such as a character in a file) where the variable-length code table has been derived in a particular way
..... Click the link for more information.
Arithmetic coding is a method for lossless data compression. Normally, a string of characters such as the words "hello there" is represented using a fixed number of bits per character, as in the ASCII code.
..... Click the link for more information.
Range encoding is a form of arithmetic coding, a data compression method, that is believed to be free from arithmetic coding related patents. It is on this basis that interest in range encoding has arisen, particularly in the open source community.
..... Click the link for more information.

..... Click the link for more information.
An Exponential-Golomb code (or just Exp-Golomb code) of order is a type of universal code, parameterized by a whole number . To encode a nonnegative integer in an order- exp-Golomb code, one can use the following method:

..... Click the link for more information.
universal code for integers is a prefix code that maps the positive integers onto binary codewords, with the additional property that whatever the true probability distribution on integers, as long as the distribution is monotonic (i.e.
..... Click the link for more information.
Elias gamma code is a universal code encoding positive integers. It is used most commonly when coding integers whose upper-bound cannot be determined beforehand.

To code a number:
  1. Write it in binary.

..... Click the link for more information.
Fibonacci coding is a universal code which encodes positive integers into binary code words. All tokens end with "11" and have no "11" before the end.

The formula used to generate Fibonacci codes is:


where F(i) is the i-th Fibonacci number.
..... Click the link for more information.
A dictionary coder, also sometimes known as a substitution coder, is any of a number of lossless data compression algorithms which operate by searching for matches between the text to be compressed and a set of strings contained in a data structure (called the 'dictionary')
..... Click the link for more information.
LZ77 and LZ78 are the names for the two lossless data compression algorithms published in papers by Abraham Lempel and Jacob Ziv in 1977 and 1978. They are also known as LZ1 and LZ2 respectively [1] .
..... Click the link for more information.
Lempel-Ziv-Welch (LZW) is a universal lossless data compression algorithm created by Abraham Lempel, Jacob Ziv, and Terry Welch. It was published by Welch in 1984 as an improved implementation of the LZ78 algorithm published by Lempel and Ziv in 1978.
..... Click the link for more information.
Lempel-Ziv-Oberhumer (LZO) is a data compression algorithm that is focused on decompression speed. The algorithm is lossless and the reference implementation is thread safe.

A free software tool which implements it is lzop.
..... Click the link for more information.
DEFLATE is a lossless data compression algorithm that uses a combination of the LZ77 algorithm and Huffman coding. It was originally defined by Phil Katz for version 2 of his PKZIP archiving tool, and was later specified in RFC 1951.
..... Click the link for more information.
The Lempel-Ziv-Markov chain-Algorithm (LZMA) is an algorithm for data compression in development since 1998[1] and used in the 7z format of the 7-Zip archiver.
..... Click the link for more information.
LZX is the name of an LZ77 family compression algorithm. It is also the name of a file archiver with the same name. Both were invented by Jonathan Forbes and Tomi Poutanen.
..... Click the link for more information.


This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.
Herod_Archelaus


page counter