Information about Code Excited Linear Prediction
Code Excited Linear Prediction (CELP) is a speech coding algorithm originally proposed by M.R. Schroeder and B.S. Atal in 1985. At the time, it provided significantly better quality than existing low bit-rate algorithms, such as RELP and LPC vocoders (e.g. FS-1015). Along with its variants, such as ACELP, RCELP, LD-CELP and VSELP, it is currently the most widely used speech coding algorithm. CELP is now used as a generic term for a class of algorithms and not for a particular codec.
Before exploring the complex encoding process of CELP we introduce the Speex decoder here. Figure 1 describes a generic CELP decoder. The excitation is produced by summing the contributions from an adaptive (aka pitch) codebook and a fixed (aka innovation) codebook:
where
is the adaptive (pitch) codebook contribution and
is the fixed (innovation) codebook contribution. The fixed codebook is a vector quantization dictionary that is (implicitly or explicitly) hard-coded into the codec. This codebook can be algebraic (ACELP) or be stored explicitly (e.g. Speex). The entries in the adaptive codebook consist of delayed versions of the excitation. This makes it possible to efficiently code periodic signals, such as voiced sounds.
The filter that shapes the excitation has an all-pole model of the form
, where
is called the prediction filter and is obtained using linear prediction (Levinson-Durbin algorithm). An all-pole filter is used because it is a good representation of the human vocal tract and because it is easy to compute.
In order to achieve real-time encoding using limited computing resources, the CELP search is broken down into smaller, more manageable, sequential searches using a simple perceptual weighting function. Typically, the encoding is performed in the following order:
where
.
(See for formats and for codecs)
..... Click the link for more information.
Introduction
The CELP algorithm is based on four main ideas:- Using the source-filter model of speech production through linear prediction (LP);
- Using an adaptive and a fixed codebook as the input (excitation) of the LP model;
- Performing a search in closed-loop in a “perceptually weighted domain”.
- Applying vector quantization (VQ)
CELP Decoder
Before exploring the complex encoding process of CELP we introduce the Speex decoder here. Figure 1 describes a generic CELP decoder. The excitation is produced by summing the contributions from an adaptive (aka pitch) codebook and a fixed (aka innovation) codebook:
where
is the adaptive (pitch) codebook contribution and
is the fixed (innovation) codebook contribution. The fixed codebook is a vector quantization dictionary that is (implicitly or explicitly) hard-coded into the codec. This codebook can be algebraic (ACELP) or be stored explicitly (e.g. Speex). The entries in the adaptive codebook consist of delayed versions of the excitation. This makes it possible to efficiently code periodic signals, such as voiced sounds.
The filter that shapes the excitation has an all-pole model of the form
, where
is called the prediction filter and is obtained using linear prediction (Levinson-Durbin algorithm). An all-pole filter is used because it is a good representation of the human vocal tract and because it is easy to compute.
CELP Encoder
The main principle behind CELP is called Analysis-by-Synthesis (AbS) and means that the encoding (analysis) is performed by perceptually optimising the decoded (synthesis) signal in a closed loop. In theory, the best CELP stream would be produced by trying all possible bit combinations and selecting the one that produces the best-sounding decoded signal. This is obviously not possible in practice for two reasons: the required complexity is beyond any currently available hardware and the "best sounding" selection criterion implies a human listener.In order to achieve real-time encoding using limited computing resources, the CELP search is broken down into smaller, more manageable, sequential searches using a simple perceptual weighting function. Typically, the encoding is performed in the following order:
- LPC coefficients are computed and quantized, usually as LSPs
- The adaptive (pitch) codebook is searched and its contribution removed
- The fixed (innovation) codebook is searched
Noise Weighting
Most (if not all) modern audio codecs attempt to shape the coding noise so that it appears mostly in the frequency regions where the ear cannot detect it. For example, the ear is more tolerant to noise in parts of the spectrum that are louder and vice versa. That's why instead of minimizing the simple quadratic error, CELP minimizes the error for the perceptually weighted domain. The weighting filter W(z) is typically derived from the LPC filter by the use of bandwidth expansion:where
.
External links
- This is based on a paper presented at Linux.Conf.Au
- Some parts based on the Speex codec manual
- reference implementations of CELP 1016A and LPC 10e.
References
M. R. Schroeder and B. S. Atal, "Code-excited linear prediction (CELP): high-quality speech at very low bit rates," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 10, pp. 937-940, 1985.| Lossless compression methods | ||||
|---|---|---|---|---|
| Audio compression methods |
| |||
| Image compression methods |
| |||
| Video compression |
| |||
| Timeline of information theory, data compression, and error-correcting codes | ||||
Speech coding is the application of data compression of digital audio signals containing speech. Speech coding uses speech-specific parameter estimation using audio signal processing techniques to model the speech signal, combined with generic data compression algorithms to
..... Click the link for more information.
..... Click the link for more information.
RELP stands for Residual Excited Linear Prediction and is an old (no longer used) speech coding algorithm.
..... Click the link for more information.
..... Click the link for more information.
Linear predictive coding (LPC) is a tool used mostly in audio signal processing and speech processing for representing the spectral envelope of a digital signal of speech in compressed form, using the information of a linear predictive model.
..... Click the link for more information.
..... Click the link for more information.
A vocoder or voder (a portmanteau of voice and encoder) is a speech analyzer and synthesizer. It was originally developed as a speech coder for telecommunications applications in the 1930s, the idea being to code speech for transmission.
..... Click the link for more information.
..... Click the link for more information.
FS-1015 is a secure telephony speech encoding standard developed by the United States Department of Defense and later by NATO. It is also known as LPC-10 and STANAG 4198.
The standard was finished 1984. The algorithm used is a linear predictive coding vocoder.
..... Click the link for more information.
The standard was finished 1984. The algorithm used is a linear predictive coding vocoder.
..... Click the link for more information.
The source-filter model of speech production models speech as a combination of a sound source, such as the vocal cords, and a filter, the vocal tract (and radiation characteristic).
..... Click the link for more information.
..... Click the link for more information.
Linear prediction is a mathematical operation where future values of a discrete-time signal are estimated as a linear function of previous samples.
In digital signal processing, linear prediction is often called linear predictive coding (LPC) and can thus be viewed as a
..... Click the link for more information.
In digital signal processing, linear prediction is often called linear predictive coding (LPC) and can thus be viewed as a
..... Click the link for more information.
Vector quantization is a classical quantization technique from signal processing which allows the modeling of probability density functions by the distribution of prototype vectors. It was originally used for data compression.
..... Click the link for more information.
..... Click the link for more information.
Pitch is the perceived fundamental frequency of a sound. While the actual fundamental frequency can be precisely determined through physical measurement, it may differ from the perceived pitch because of overtones, or partials, in the sound.
..... Click the link for more information.
..... Click the link for more information.
Speex
File extension:
Developed by: Xiph.Org Foundation
Type of format: Audio codec
Contained by: Ogg
Standard(s): Specification
Speex is a free software speech codec that may be used on VoIP applications and podcasts.
..... Click the link for more information.
File extension:
.spxDeveloped by: Xiph.Org Foundation
Type of format: Audio codec
Contained by: Ogg
Standard(s): Specification
Speex is a free software speech codec that may be used on VoIP applications and podcasts.
..... Click the link for more information.
Levinson or Levinson-Durbin recursion is a procedure in linear algebra to recursively calculate the solution to an equation involving a Toeplitz matrix. The algorithm runs in Θ(n2
..... Click the link for more information.
..... Click the link for more information.
Line Spectral Pairs (LSP) or Line Spectral Frequencies (LSF) are used to represent Linear Prediction Coefficients (LPC) for transmission over a channel. LSPs have several properties (e.g.
..... Click the link for more information.
..... Click the link for more information.
Psychoacoustics is the study of subjective human perception of sounds. Alternatively it can be described as the study of the psychological correlates of the physical parameters of acoustics.
..... Click the link for more information.
..... Click the link for more information.
Bandwidth expansion is a technique for widening the bandwidth or the resonances in an LPC filter. This is done by moving all the poles towards the origin by a constant factor .
..... Click the link for more information.
..... Click the link for more information.
Speex
File extension:
Developed by: Xiph.Org Foundation
Type of format: Audio codec
Contained by: Ogg
Standard(s): Specification
Speex is a free software speech codec that may be used on VoIP applications and podcasts.
..... Click the link for more information.
File extension:
.spxDeveloped by: Xiph.Org Foundation
Type of format: Audio codec
Contained by: Ogg
Standard(s): Specification
Speex is a free software speech codec that may be used on VoIP applications and podcasts.
..... Click the link for more information.
data compression or source coding is the process of encoding information using fewer bits (or other information-bearing units) than an un-encoded representation would use through use of specific encoding schemes.
..... Click the link for more information.
..... Click the link for more information.
Lossless data compression is a class of data compression algorithms that allows the exact original data to be reconstructed from the compressed data. This can be contrasted to lossy data compression, which does not allow the exact original data to be reconstructed from the
..... Click the link for more information.
..... Click the link for more information.
Information theory is a branch of applied mathematics and engineering involving the quantification of information to find fundamental limits on compressing and reliably communicating data.
..... Click the link for more information.
..... Click the link for more information.
Shannon entropy or information entropy is a measure of the uncertainty associated with a random variable.
Shannon entropy quantifies the information contained in a piece of data: it is the minimum average message length, in bits (if using base-2 logarithms), that must
..... Click the link for more information.
Shannon entropy quantifies the information contained in a piece of data: it is the minimum average message length, in bits (if using base-2 logarithms), that must
..... Click the link for more information.
In computer science, the Kolmogorov complexity (also known as descriptive complexity, Kolmogorov-Chaitin complexity, stochastic complexity, algorithmic entropy, or program-size complexity
..... Click the link for more information.
..... Click the link for more information.
Redundancy in information theory is the number of bits used to transmit a message minus the number of bits of actual information in the message. Informally, it is the amount of wasted "space" used to transmit certain data.
..... Click the link for more information.
..... Click the link for more information.
In information theory an entropy encoding is a lossless data compression scheme that is independent of the media’s specific characteristics.
One of the main types of entropy coding assigns codes to symbols so as to match code lengths with the probabilities of the
..... Click the link for more information.
One of the main types of entropy coding assigns codes to symbols so as to match code lengths with the probabilities of the
..... Click the link for more information.
Huffman coding is an entropy encoding algorithm used for lossless data compression. The term refers to the use of a variable-length code table for encoding a source symbol (such as a character in a file) where the variable-length code table has been derived in a particular way
..... Click the link for more information.
..... Click the link for more information.
Adaptive Huffman coding (also called Dynamic Huffman coding) is an adaptive coding technique based on Huffman coding, building the code as the symbols are being transmitted, having no initial knowledge of source distribution, that allows one-pass encoding and adaptation to
..... Click the link for more information.
..... Click the link for more information.
Arithmetic coding is a method for lossless data compression. Normally, a string of characters such as the words "hello there" is represented using a fixed number of bits per character, as in the ASCII code.
..... Click the link for more information.
..... Click the link for more information.
Range encoding is a form of arithmetic coding, a data compression method, that is believed to be free from arithmetic coding related patents. It is on this basis that interest in range encoding has arisen, particularly in the open source community.
..... Click the link for more information.
..... Click the link for more information.
..... Click the link for more information.
An Exponential-Golomb code (or just Exp-Golomb code) of order is a type of universal code, parameterized by a whole number . To encode a nonnegative integer in an order- exp-Golomb code, one can use the following method:
..... Click the link for more information.
..... Click the link for more information.
universal code for integers is a prefix code that maps the positive integers onto binary codewords, with the additional property that whatever the true probability distribution on integers, as long as the distribution is monotonic (i.e.
..... Click the link for more information.
..... Click the link for more information.
Elias gamma code is a universal code encoding positive integers. It is used most commonly when coding integers whose upper-bound cannot be determined beforehand.
To code a number:
..... Click the link for more information.
To code a number:
- Write it in binary.
..... Click the link for more information.
This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.
Herod_Archelaus

