Information about Motion Compensation
In video compression, motion compensation is a technique for describing a picture in terms of translated copies of portions of a reference picture, often 8x8 or 16x16-pixel blocks. This increases compression ratios by making better use of redundant information between successive frames.
A simple block replacement scheme would successfully compress only those portions of the screen that do not move. However when the block in a new frame and a previous frame are not similar, it is often the case that the information contained in the new frame's block is, in fact, available in the previous frame - but it has moved. Consider a car moving across the screen. Even though it has moved from one frame to the next, much of the data required for the latest frame is still in the previous frame, just not in the same place. Motion compensation attempts to achieve greater compression by reusing portions from the previous frame to construct the new frame, even if they have moved.
A first approach would be to simply subtract a reference frame from a given frame. The difference is then called residual and usually contains less energy (or information) than the original frame. The residual can be encoded at a lower bit-rate with the same quality. The decoder can reconstruct the original frame by adding the reference frame again.
A more sophisticated approach is to approximate the motion of the whole scene and the objects of a video sequence. The motion is described by some parameters that have to be encoded in the bit-stream. The pixels of the predicted frame are approximated by appropriately translated pixels of the reference frame. This gives much better residuals than a simple subtraction. However, the bit-rate occupied by the parameters of the motion model must not become too large.
Usually, the frames are processed in groups. One frame (usually the first) is encoded without motion compensation just as a normal image. This frame is called I-frame (intra-coded frame, MPEG terminology) or I-picture. The other frames are called P-frames or P-pictures and are predicted from the I-frame or P-frame that comes (temporally) immediately before it. The prediction schemes are, for instance, described as IPPPP, meaning that a group consists of one I-frame followed by four P-frames.
Frames can also be predicted from future frames. The future frames then need to be encoded before the predicted frames and thus, the encoding order does not necessarily match the real frame order. Such frames are usually predicted from two directions, i.e. from the I- or P-frames that immediately precede or follow the predicted frame. These bidirectionally predicted frames are called B-frames. A coding scheme could, for instance, be IBBPBBPBBPBB.
Moving objects within a frame are not sufficiently represented by global motion compensation. Thus, other methods are needed.
To exploit the redundancy between neighboring block vectors, (e.g. for a single moving object covered by multiple blocks) it is common to encode only the difference between the current and previous motion vector in the bit-stream. The result of this differencing process is mathematically equivalent to a global motion compensation capable of panning. Further down the encoding pipeline, an entropy coder will take advantage of the resulting statistical distribution of the motion vectors around the zero vector to reduce the output size.
It is possible to shift a block by a non-integer number of pixels, which is called sub-pixel precision. The in-between pixels are generated by interpolating neighboring pixels. Commonly, half-pixel or quarter pixel precision (Qpel, used by H.264 and MPEG-2 ASP) is used. The computational expense of sub-pixel precision is much higher due to the extra processing required for interpolation and on the encoder side, a much greater number of potential source blocks to be evaluated.
The main disadvantage of block motion compensation is that it introduces discontinuities at the block borders (blocking artifacts). These artifacts appear in the form of sharp horizontal and vertical edges which are easily spotted by the human eye and produce ringing effects (large coefficients in high frequency sub-bands) in the Fourier-related transform used for transform coding of the residual frames.
Block motion compensation divides up the current frame into non-overlapping blocks, and the motion compensation vector tells where those blocks come from (a common misconception is that the previous frame is divided up into non-overlapping blocks, and the motion compensation vectors tell where those blocks move to). The source blocks typically overlap in the source frame. Some video compression algorithms assemble the current frame out of pieces of several different previously-transmitted frames.
Studies of methods for reducing the complexity of OBMC have shown that the contribution to the window function is smallest for the diagonally-adjacent block. Reducing the weight for this contribution to zero and increasing the other weights by an equal amount leads to a substantial reduction in complexity without a large penalty in quality. In such a scheme, each pixel then belongs to 3 blocks rather than 4, and rather than using 8 neighboring blocks, only 4 are used for each block to be compensated. Such a scheme is found in the H.263 Annex F Advanced Prediction mode
To find optimal motion vectors, one basically has to calculate the block prediction error for each motion vector within a certain search range and pick the one that has the best compromise between the amount of error and the number of bits needed for motion vector data. The motion estimation technique of simply exhaustively testing all possible motion representations to perform such an optimization is called full search. A faster method, which is sub-optimal with respect to rate-distortion, is to use a coarse search grid for a first approximation and to refine the grid in the surrounding of this approximation in further steps. One common method is the 3-step search, which uses search grids of 3×3 motion vectors and 3 refinement steps to get an overall search range of 15×15 pixel.
For OBME, the pixel-wise prediction errors of a block and its overlapping neighbouring blocks have to be weighted and summed according to the window function before being squared. As in the process of successively finding/refining motion vectors some neighbouring MVs are not known yet, the corresponding prediction errors can be ignored (not added) as a sub-optimal solution.
The major disadvantages of OBMC are increased computational complexity of OBME, and the fact that prediction errors and, thus, also the optimal motion vectors depend on neighbouring blocks/motion vectors. Therefore, there is no algorithm with polynomial computational complexity that guarantees optimal motion vectors. However, there are near-optimal iterative and non-iterative methods with acceptable computational complexity.
(See for formats and for codecs)
..... Click the link for more information.
A simple block replacement scheme would successfully compress only those portions of the screen that do not move. However when the block in a new frame and a previous frame are not similar, it is often the case that the information contained in the new frame's block is, in fact, available in the previous frame - but it has moved. Consider a car moving across the screen. Even though it has moved from one frame to the next, much of the data required for the latest frame is still in the previous frame, just not in the same place. Motion compensation attempts to achieve greater compression by reusing portions from the previous frame to construct the new frame, even if they have moved.
A first approach would be to simply subtract a reference frame from a given frame. The difference is then called residual and usually contains less energy (or information) than the original frame. The residual can be encoded at a lower bit-rate with the same quality. The decoder can reconstruct the original frame by adding the reference frame again.
A more sophisticated approach is to approximate the motion of the whole scene and the objects of a video sequence. The motion is described by some parameters that have to be encoded in the bit-stream. The pixels of the predicted frame are approximated by appropriately translated pixels of the reference frame. This gives much better residuals than a simple subtraction. However, the bit-rate occupied by the parameters of the motion model must not become too large.
Usually, the frames are processed in groups. One frame (usually the first) is encoded without motion compensation just as a normal image. This frame is called I-frame (intra-coded frame, MPEG terminology) or I-picture. The other frames are called P-frames or P-pictures and are predicted from the I-frame or P-frame that comes (temporally) immediately before it. The prediction schemes are, for instance, described as IPPPP, meaning that a group consists of one I-frame followed by four P-frames.
Frames can also be predicted from future frames. The future frames then need to be encoded before the predicted frames and thus, the encoding order does not necessarily match the real frame order. Such frames are usually predicted from two directions, i.e. from the I- or P-frames that immediately precede or follow the predicted frame. These bidirectionally predicted frames are called B-frames. A coding scheme could, for instance, be IBBPBBPBBPBB.
Global motion compensation
In global motion compensation, the motion model basically reflects camera motions such as dolly (forward, backwards), track (left, right), boom (up, down), pan (left, right), tilt (up, down) and roll (along the view axis). It works best for still scenes without moving objects. There are several advantages of global motion compensation:- It models precisely the major part of motion usually found in video sequences with just a few parameters. The share in bit-rate of these parameters is negligible.
- It does not partition the frames. This avoids artifacts at partition borders.
- A straight line (in the time direction) of pixels with equal spatial positions in the frame corresponds to a continuously moving point in the real scene. Other MC schemes introduce discontinuities in the time direction.
Moving objects within a frame are not sufficiently represented by global motion compensation. Thus, other methods are needed.
Block motion compensation
In block motion compensation (BMC), the frames are partitioned in blocks of pixels (e.g. macroblocks of 16×16 pixels in MPEG). Each block is predicted from a block of equal size in the reference frame. The blocks are not transformed in any way apart from being shifted to the position of the predicted block. This shift is represented by a motion vector.To exploit the redundancy between neighboring block vectors, (e.g. for a single moving object covered by multiple blocks) it is common to encode only the difference between the current and previous motion vector in the bit-stream. The result of this differencing process is mathematically equivalent to a global motion compensation capable of panning. Further down the encoding pipeline, an entropy coder will take advantage of the resulting statistical distribution of the motion vectors around the zero vector to reduce the output size.
It is possible to shift a block by a non-integer number of pixels, which is called sub-pixel precision. The in-between pixels are generated by interpolating neighboring pixels. Commonly, half-pixel or quarter pixel precision (Qpel, used by H.264 and MPEG-2 ASP) is used. The computational expense of sub-pixel precision is much higher due to the extra processing required for interpolation and on the encoder side, a much greater number of potential source blocks to be evaluated.
The main disadvantage of block motion compensation is that it introduces discontinuities at the block borders (blocking artifacts). These artifacts appear in the form of sharp horizontal and vertical edges which are easily spotted by the human eye and produce ringing effects (large coefficients in high frequency sub-bands) in the Fourier-related transform used for transform coding of the residual frames.
Block motion compensation divides up the current frame into non-overlapping blocks, and the motion compensation vector tells where those blocks come from (a common misconception is that the previous frame is divided up into non-overlapping blocks, and the motion compensation vectors tell where those blocks move to). The source blocks typically overlap in the source frame. Some video compression algorithms assemble the current frame out of pieces of several different previously-transmitted frames.
Variable block-size motion compensation
Variable block-size motion compensation (VBSMC) is the use of BMC with the ability for the encoder to dynamically select the size of the blocks. When coding video, the use of larger blocks can reduce the number of bits needed to represent the motion vectors, while the use of smaller blocks can result in a smaller amount of prediction residual information to encode. Older designs such as H.261 and MPEG-1 video typically use a fixed block size, while newer ones such as H.263, MPEG-4 Part 2, H.264/MPEG-4 AVC, and VC-1 give the encoder the ability to dynamically choose what block size will be used to represent the motion.Overlapped block motion compensation
Overlapped block motion compensation (OBMC) is a good solution to these problems because it not only increases prediction accuracy but also avoids blocking artifacts. When using OBMC, blocks are typically twice as big in each dimension and overlap quadrant-wise with all 8 neighbouring blocks. Thus, each pixel belongs to 4 blocks. In such a scheme, there are 4 predictions for each pixel which are summed up to a weighted mean. For this purpose, blocks are associated with a window function that has the property that the sum of 4 overlapped windows is equal to 1 everywhere.Studies of methods for reducing the complexity of OBMC have shown that the contribution to the window function is smallest for the diagonally-adjacent block. Reducing the weight for this contribution to zero and increasing the other weights by an equal amount leads to a substantial reduction in complexity without a large penalty in quality. In such a scheme, each pixel then belongs to 3 blocks rather than 4, and rather than using 8 neighboring blocks, only 4 are used for each block to be compensated. Such a scheme is found in the H.263 Annex F Advanced Prediction mode
Quarter Pixel (QPel) and Half Pixel motion compensation
In Motion Compensation, quarter or half samples are actually interpolated sub-samples caused by fractional motion vectors. Based on the vectors and full-samples, the sub-samples can be calculated by using bicubic or bilinear 2-D filtering. See subclause 8.4.2.2 "Fractional sample interpolation process" of the H.264 standard.Motion estimation
Motion estimation (BME, OBME) is the process of finding optimal or near-optimal motion vectors. The amount of prediction error for a block is often measured using the mean squared error (MSE) or sum of absolute differences (SAD) between the predicted and actual pixel values over all pixels of the motion-compensated region.To find optimal motion vectors, one basically has to calculate the block prediction error for each motion vector within a certain search range and pick the one that has the best compromise between the amount of error and the number of bits needed for motion vector data. The motion estimation technique of simply exhaustively testing all possible motion representations to perform such an optimization is called full search. A faster method, which is sub-optimal with respect to rate-distortion, is to use a coarse search grid for a first approximation and to refine the grid in the surrounding of this approximation in further steps. One common method is the 3-step search, which uses search grids of 3×3 motion vectors and 3 refinement steps to get an overall search range of 15×15 pixel.
For OBME, the pixel-wise prediction errors of a block and its overlapping neighbouring blocks have to be weighted and summed according to the window function before being squared. As in the process of successively finding/refining motion vectors some neighbouring MVs are not known yet, the corresponding prediction errors can be ignored (not added) as a sub-optimal solution.
The major disadvantages of OBMC are increased computational complexity of OBME, and the fact that prediction errors and, thus, also the optimal motion vectors depend on neighbouring blocks/motion vectors. Therefore, there is no algorithm with polynomial computational complexity that guarantees optimal motion vectors. However, there are near-optimal iterative and non-iterative methods with acceptable computational complexity.
External links
- Temporal Rate Conversion - article giving an overview of motion compensation techniques.
| Lossless compression methods | ||||
|---|---|---|---|---|
| Audio compression methods |
| |||
| Image compression methods |
| |||
| Video compression |
| |||
| Timeline of information theory, data compression, and error-correcting codes | ||||
Video compression refers to reducing the quantity of data used to represent video images, and this is almost always coupled with the goal of retaining as much of the original's quality as possible.
..... Click the link for more information.
..... Click the link for more information.
For the physics topic, see .
Reference frames are frames of a compressed video that are used to define future frames. As such they are only used in inter-frame compression techniques...... Click the link for more information.
In video compression algorithms a residual frame is formed by subtracting the reference frame from the desired frame. This difference is known as the error or residual frame.
..... Click the link for more information.
..... Click the link for more information.
The three major picture types found in typical video compression designs are I(ntra) pictures, P(redicted) pictures, and B(i-predictive) pictures (or B(i-directional) pictures).
..... Click the link for more information.
..... Click the link for more information.
The three major picture types found in typical video compression designs are I(ntra) pictures, P(redicted) pictures, and B(i-predictive) pictures (or B(i-directional) pictures).
..... Click the link for more information.
..... Click the link for more information.
The three major picture types found in typical video compression designs are I(ntra) pictures, P(redicted) pictures, and B(i-predictive) pictures (or B(i-directional) pictures).
..... Click the link for more information.
..... Click the link for more information.
Moving Picture Experts Group, commonly referred to as simply MPEG, is a working group of ISO/IEC charged with the development of video and audio encoding standards. Its first meeting was in May of 1988 in Ottawa, Canada.
..... Click the link for more information.
..... Click the link for more information.
In information theory an entropy encoding is a lossless data compression scheme that is independent of the media’s specific characteristics.
One of the main types of entropy coding assigns codes to symbols so as to match code lengths with the probabilities of the
..... Click the link for more information.
One of the main types of entropy coding assigns codes to symbols so as to match code lengths with the probabilities of the
..... Click the link for more information.
Quarter pixel (also known as Q-pel or Qpel) refers to a quarter of a standard pixel. It is used in many modern video encoding standards such as MPEG-4 ASP and H.264/AVC to refer to a motion estimation precision of a quarter pixel.
..... Click the link for more information.
..... Click the link for more information.
This is a list of linear transformations of functions related to Fourier analysis. Such transformations map a function to a set of coefficients of basis functions, where the basis functions are sinusoidal and are therefore strongly localized in the frequency spectrum.
..... Click the link for more information.
..... Click the link for more information.
Transform coding is a type of data compression for "natural" data like audio signals or photographic images. The transformation is typically "lossy", resulting in a lower quality copy of the original input.
..... Click the link for more information.
..... Click the link for more information.
In video compression algorithms a residual frame is formed by subtracting the reference frame from the desired frame. This difference is known as the error or residual frame.
..... Click the link for more information.
..... Click the link for more information.
H.261 is a 1990 ITU-T video coding standard originally designed for transmission over ISDN lines on which data rates are multiples of 64 kbit/s. It is one member of the H.26x family of video coding standards in the domain of the ITU-T Video Coding Experts Group (VCEG).
..... Click the link for more information.
..... Click the link for more information.
MPEG-1 defines a group of Audio and Video (AV) coding and compression standards agreed upon by MPEG (Moving Picture Experts Group). MPEG-1 video is used by the Video CD (VCD) format and less commonly by the DVD-Video format.
..... Click the link for more information.
..... Click the link for more information.
H.263 is a video codec standard originally designed by the ITU-T in a project ending in 1995/1996 as a low-bitrate compressed format for videoconferencing. It is one member of the H.26x family of video coding standards in the domain of the ITU-T Video Coding Experts Group (VCEG).
..... Click the link for more information.
..... Click the link for more information.
MPEG-4 Part 2 is a video compression technology developed by MPEG. It belongs to the MPEG-4 ISO/IEC standard (ISO/IEC 14496-2). It is a Discrete Cosine Transform compression standard, similar to previous standards such as MPEG-1 and MPEG-2.
..... Click the link for more information.
..... Click the link for more information.
H.264 is a standard for video compression. It is also known as MPEG-4 Part 10, or AVC (for Advanced Video Coding). It was written by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC Moving Picture Experts Group (MPEG) as
..... Click the link for more information.
..... Click the link for more information.
VC-1 is the informal name of the SMPTE 421M video codec standard initially developed by Microsoft. WMV3, better known as Windows Media Video 9 codec, served as the basis for development of the VC-1 codec specification.
..... Click the link for more information.
..... Click the link for more information.
H.263 is a video codec standard originally designed by the ITU-T in a project ending in 1995/1996 as a low-bitrate compressed format for videoconferencing. It is one member of the H.26x family of video coding standards in the domain of the ITU-T Video Coding Experts Group (VCEG).
..... Click the link for more information.
..... Click the link for more information.
In statistics, the mean squared error or MSE of an estimator is the expected value of the square of the "error." The error is the amount by which the estimator differs from the quantity to be estimated.
..... Click the link for more information.
..... Click the link for more information.
Sum of Absolute Differences (SAD) is a widely used, extremely simple video quality metric used for block-matching in motion estimation for video compression.
..... Click the link for more information.
..... Click the link for more information.
As a branch of the theory of computation in computer science, computational complexity theory investigates the problems related to the amounts of resources required for the execution of algorithms (e.g.
..... Click the link for more information.
..... Click the link for more information.
data compression or source coding is the process of encoding information using fewer bits (or other information-bearing units) than an un-encoded representation would use through use of specific encoding schemes.
..... Click the link for more information.
..... Click the link for more information.
Lossless data compression is a class of data compression algorithms that allows the exact original data to be reconstructed from the compressed data. This can be contrasted to lossy data compression, which does not allow the exact original data to be reconstructed from the
..... Click the link for more information.
..... Click the link for more information.
Information theory is a branch of applied mathematics and engineering involving the quantification of information to find fundamental limits on compressing and reliably communicating data.
..... Click the link for more information.
..... Click the link for more information.
Shannon entropy or information entropy is a measure of the uncertainty associated with a random variable.
Shannon entropy quantifies the information contained in a piece of data: it is the minimum average message length, in bits (if using base-2 logarithms), that must
..... Click the link for more information.
Shannon entropy quantifies the information contained in a piece of data: it is the minimum average message length, in bits (if using base-2 logarithms), that must
..... Click the link for more information.
In computer science, the Kolmogorov complexity (also known as descriptive complexity, Kolmogorov-Chaitin complexity, stochastic complexity, algorithmic entropy, or program-size complexity
..... Click the link for more information.
..... Click the link for more information.
Redundancy in information theory is the number of bits used to transmit a message minus the number of bits of actual information in the message. Informally, it is the amount of wasted "space" used to transmit certain data.
..... Click the link for more information.
..... Click the link for more information.
In information theory an entropy encoding is a lossless data compression scheme that is independent of the media’s specific characteristics.
One of the main types of entropy coding assigns codes to symbols so as to match code lengths with the probabilities of the
..... Click the link for more information.
One of the main types of entropy coding assigns codes to symbols so as to match code lengths with the probabilities of the
..... Click the link for more information.
Huffman coding is an entropy encoding algorithm used for lossless data compression. The term refers to the use of a variable-length code table for encoding a source symbol (such as a character in a file) where the variable-length code table has been derived in a particular way
..... Click the link for more information.
..... Click the link for more information.
This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.
Herod_Archelaus