Information about Variable Length Coding
This article is about the transmission of data across noisy channels. For the storage of text in computers, see variable-width encoding.
In coding theory a variable-length code is a code which maps source symbols to a variable number of bits.
Variable-length codes can allow sources to be compressed and decompressed with zero error (lossless data compression) and still be read back symbol by symbol. With the right coding strategy an i.i.d. source may be compressed almost arbitrarily close to its entropy. This is in contrast to fixed length coding methods, for which data compression is only possible for large blocks of data, and any compression beyond the logarithm of the total number of possibilities comes with a finite (though perhaps arbitrarily small) probability of failure.
Some examples of well-known variable-length coding strategies are Huffman coding, Lempel-Ziv coding and arithmetic coding.
Extension of a code
The extension of a code is the mapping of finite length source sequences to finite length bit strings, that is obtained by concatenating for each symbol of the source sequence the corresponding codeword produced by the original code.Classes of variable-length codes
Variable-length codes can be strictly nested in order of decreasing generality as non-singular, uniquely decodable and instantaneous (prefix free). Instantaneous codes are always uniquely decodable, which in turn are always non-singular :Non-singular codes
A code is non-singular if each source symbol is mapped to a different non-empty bit string, i.e. the mapping from source symbols to bit strings is one-to-one.- For example the mapping M1 = {(a,0), (b,0), (c,1)} is not non-singular because both "a" and "b" map to the same bit string "0" ; any extension of this mapping will generate a lossy (non-lossless) coding. Such singular coding may still be useful when some loss of information is acceptable (for example when such code is used in audio or video compression, where a lossy coding becomes equivalent to source quantification).
- However, the mapping M2 = {(a,0), (b,1), (c,00), (d,01)} is non-singular ; its extension will generate a lossless coding, which will be useful for general data transmission (but this feature is not always required). Note that it is not necessary for the non-singular code to be more compact than the source (and in many applications, a larger code is useful, for example as a way to detect and/or recover from encoding or transmission errors, or in security applications to protect a source from undetectable tamperring).
Uniquely decodable codes
A code is uniquely decodable if its extension is non-singular.- The non-singular example mapping M2 in the previous paragraph is not uniquely decodable because (for example) the source sequence "aa" maps to bit string "00" using the extension, exactly like the source sequence "c". However, such a code is useful when the set of all possible source symbols is completely known and finite, or when there are restrictions (for example a formal syntax) that determine if source elements of this extension are acceptable. Such restrictions permit the decoding of the original message by checking which of the possible source symbols mapped to the same symbol are valid under those restrictions.
- The extension of the mapping M3 = {(a,0), (b,01), (c,011)} is uniquely decodable (this can be demonstrated by looking at the follow-set after each target bit string in the map, because each bitstring is terminated as soon as we see a 0 bit which cannot follow any existing code to create a longer valid code in the map, but unambiguously starts a new code).
Instantaneous codes
A code is instantaneous (also said context-free) if no target bit string in the mapping is a prefix of the target bit string of a different source symbol in the same mapping. This means that symbols can be decoded instantaneously after their entire codeword is received.
- The example mapping M3 in the previous paragraph is not instantaneous because we don't know after reading the bit string "0" if it encodes a "a" source symbol, or if it is the prefix of the encodings of the "b" or "c" symbols.
- An example of an instantaneous variable-length code is shown below.
- ::
- : The source is one of four symbols: a, b,c or d. The binary codeword for each symbol is given.
- : The encoding and decoding of a portion of an example source sequence is given below:
- ::
Advantages
The advantage of a variable-length code is that unlikely source symbols can be assigned longer codewords and likely source symbols can be assigned shorter codewords, thus giving a low expected codeword length. For the above example, if the probabilities of (a, b, c, d) were
, the expected number of bits used to represent a source symbol using the code above would be:
- :
.
| Lossless compression methods | ||||
|---|---|---|---|---|
| Audio compression methods |
| |||
| Image compression methods |
| |||
| Video compression |
| |||
| Timeline of information theory, data compression, and error-correcting codes | ||||
variable-width encoding is a type of character encoding scheme in which codes of differing lengths are used to encode a character set (a repertoire of symbols) for representation in a computer.
..... Click the link for more information.
..... Click the link for more information.
Coding theory is a branch of mathematics and computer science dealing with the error-prone process of transmitting data across noisy channels, via clever means, so that a large number of errors that occur can be corrected.
..... Click the link for more information.
..... Click the link for more information.
In communications, a code is a rule for converting a piece of information (for example, a letter, word, or phrase) into another form or representation, not necessarily of the same type.
..... Click the link for more information.
..... Click the link for more information.
data compression or source coding is the process of encoding information using fewer bits (or other information-bearing units) than an un-encoded representation would use through use of specific encoding schemes.
..... Click the link for more information.
..... Click the link for more information.
Lossless data compression is a class of data compression algorithms that allows the exact original data to be reconstructed from the compressed data. This can be contrasted to lossy data compression, which does not allow the exact original data to be reconstructed from the
..... Click the link for more information.
..... Click the link for more information.
independent and identically distributed (i.i.d.) if each has the same probability distribution as the others and all are mutually independent.
The abbreviation i.i.d.
..... Click the link for more information.
The abbreviation i.i.d.
..... Click the link for more information.
Shannon entropy or information entropy is a measure of the uncertainty associated with a random variable.
Shannon entropy quantifies the information contained in a piece of data: it is the minimum average message length, in bits (if using base-2 logarithms), that must
..... Click the link for more information.
Shannon entropy quantifies the information contained in a piece of data: it is the minimum average message length, in bits (if using base-2 logarithms), that must
..... Click the link for more information.
Huffman coding is an entropy encoding algorithm used for lossless data compression. The term refers to the use of a variable-length code table for encoding a source symbol (such as a character in a file) where the variable-length code table has been derived in a particular way
..... Click the link for more information.
..... Click the link for more information.
Arithmetic coding is a method for lossless data compression. Normally, a string of characters such as the words "hello there" is represented using a fixed number of bits per character, as in the ASCII code.
..... Click the link for more information.
..... Click the link for more information.
The term quantification has several meanings, general and specific. Primarily it covers all those acts which quantify observations and experiences by converting them into numbers through counting and measuring. It is thus the basis for mathematics and for science.
..... Click the link for more information.
..... Click the link for more information.
A prefix code is a code, typically a variable-length code, with the "prefix property": no code word is a prefix of any other code word in the set. A code with code words has the prefix property; a code consisting of does not, because "1" is a prefix of both "10" and "11".
..... Click the link for more information.
..... Click the link for more information.
data compression or source coding is the process of encoding information using fewer bits (or other information-bearing units) than an un-encoded representation would use through use of specific encoding schemes.
..... Click the link for more information.
..... Click the link for more information.
Lossless data compression is a class of data compression algorithms that allows the exact original data to be reconstructed from the compressed data. This can be contrasted to lossy data compression, which does not allow the exact original data to be reconstructed from the
..... Click the link for more information.
..... Click the link for more information.
Information theory is a branch of applied mathematics and engineering involving the quantification of information to find fundamental limits on compressing and reliably communicating data.
..... Click the link for more information.
..... Click the link for more information.
Shannon entropy or information entropy is a measure of the uncertainty associated with a random variable.
Shannon entropy quantifies the information contained in a piece of data: it is the minimum average message length, in bits (if using base-2 logarithms), that must
..... Click the link for more information.
Shannon entropy quantifies the information contained in a piece of data: it is the minimum average message length, in bits (if using base-2 logarithms), that must
..... Click the link for more information.
In computer science, the Kolmogorov complexity (also known as descriptive complexity, Kolmogorov-Chaitin complexity, stochastic complexity, algorithmic entropy, or program-size complexity
..... Click the link for more information.
..... Click the link for more information.
Redundancy in information theory is the number of bits used to transmit a message minus the number of bits of actual information in the message. Informally, it is the amount of wasted "space" used to transmit certain data.
..... Click the link for more information.
..... Click the link for more information.
In information theory an entropy encoding is a lossless data compression scheme that is independent of the media’s specific characteristics.
One of the main types of entropy coding assigns codes to symbols so as to match code lengths with the probabilities of the
..... Click the link for more information.
One of the main types of entropy coding assigns codes to symbols so as to match code lengths with the probabilities of the
..... Click the link for more information.
Huffman coding is an entropy encoding algorithm used for lossless data compression. The term refers to the use of a variable-length code table for encoding a source symbol (such as a character in a file) where the variable-length code table has been derived in a particular way
..... Click the link for more information.
..... Click the link for more information.
Adaptive Huffman coding (also called Dynamic Huffman coding) is an adaptive coding technique based on Huffman coding, building the code as the symbols are being transmitted, having no initial knowledge of source distribution, that allows one-pass encoding and adaptation to
..... Click the link for more information.
..... Click the link for more information.
Arithmetic coding is a method for lossless data compression. Normally, a string of characters such as the words "hello there" is represented using a fixed number of bits per character, as in the ASCII code.
..... Click the link for more information.
..... Click the link for more information.
Range encoding is a form of arithmetic coding, a data compression method, that is believed to be free from arithmetic coding related patents. It is on this basis that interest in range encoding has arisen, particularly in the open source community.
..... Click the link for more information.
..... Click the link for more information.
..... Click the link for more information.
An Exponential-Golomb code (or just Exp-Golomb code) of order is a type of universal code, parameterized by a whole number . To encode a nonnegative integer in an order- exp-Golomb code, one can use the following method:
..... Click the link for more information.
..... Click the link for more information.
universal code for integers is a prefix code that maps the positive integers onto binary codewords, with the additional property that whatever the true probability distribution on integers, as long as the distribution is monotonic (i.e.
..... Click the link for more information.
..... Click the link for more information.
Elias gamma code is a universal code encoding positive integers. It is used most commonly when coding integers whose upper-bound cannot be determined beforehand.
To code a number:
..... Click the link for more information.
To code a number:
- Write it in binary.
..... Click the link for more information.
Fibonacci coding is a universal code which encodes positive integers into binary code words. All tokens end with "11" and have no "11" before the end.
The formula used to generate Fibonacci codes is:
where F(i) is the i-th Fibonacci number.
..... Click the link for more information.
The formula used to generate Fibonacci codes is:
where F(i) is the i-th Fibonacci number.
..... Click the link for more information.
A dictionary coder, also sometimes known as a substitution coder, is any of a number of lossless data compression algorithms which operate by searching for matches between the text to be compressed and a set of strings contained in a data structure (called the 'dictionary')
..... Click the link for more information.
..... Click the link for more information.
LZ77 and LZ78 are the names for the two lossless data compression algorithms published in papers by Abraham Lempel and Jacob Ziv in 1977 and 1978. They are also known as LZ1 and LZ2 respectively [1] .
..... Click the link for more information.
..... Click the link for more information.
Lempel-Ziv-Welch (LZW) is a universal lossless data compression algorithm created by Abraham Lempel, Jacob Ziv, and Terry Welch. It was published by Welch in 1984 as an improved implementation of the LZ78 algorithm published by Lempel and Ziv in 1978.
..... Click the link for more information.
..... Click the link for more information.
This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.
Herod_Archelaus