
Video Codec Design

Iain E. G. Richardson
Copyright © 2002 John Wiley & Sons, Ltd
ISBNs: 0-471-48553-5 (Hardback); 0-470-84783-2 (Electronic)

Image and Video Compression Fundamentals

3.1 INTRODUCTION
Representing video material in a digital form requires a large number of bits. The volume of
data generated by digitising a video signal is too large for most storage and transmission
systems (despite the continual increase in storage capacity and transmission ‘bandwidth’).
This means that compression is essential for most digital video applications.
The ITU-R 601 standard (described in Chapter 2) describes a digital format for video that
is roughly equivalent to analogue television, in terms of spatial resolution and frame rate.
One channel of ITU-R 601 television, broadcast in uncompressed digital form, requires a
transmission bit rate of 216 Mbps. At this bit rate, a 4.7 Gbyte DVD could store just
87 seconds of uncompressed video.
Table 3.1 shows the uncompressed bit rates of several popular video formats. From this
table it can be seen that even QCIF at 15 frames per second (i.e. relatively low-quality video,
suitable for video telephony) requires 4.6 Mbps for transmission or storage. Table 3.2 lists
typical capacities of popular storage media and transmission networks.
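The bit rates in Table 3.1 follow directly from each format’s luminance resolution, chrominance resolution and frame rate. A minimal sketch of the calculation, assuming 8 bits per sample (the dimensions are taken from Table 3.1):

```python
def bit_rate(luma_w, luma_h, chroma_w, chroma_h, fps, bits=8):
    # Two chrominance components accompany each luminance frame.
    samples_per_frame = luma_w * luma_h + 2 * chroma_w * chroma_h
    return samples_per_frame * bits * fps

print(bit_rate(858, 525, 429, 525, 30) / 1e6)  # ITU-R 601: ~216 Mbps
print(bit_rate(352, 288, 176, 144, 30) / 1e6)  # CIF: ~36.5 Mbps
print(bit_rate(176, 144, 88, 72, 15) / 1e6)    # QCIF: ~4.6 Mbps
```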
There is a clear gap between the high bit rates demanded by uncompressed video and the
available capacity of current networks and storage media. The purpose of video compression
(video coding) is to fill this gap. A video compression system aims to reduce the amount of
data required to store or transmit video whilst maintaining an ‘acceptable’ level of video
quality. Most of the practical systems and standards for video compression are ‘lossy’, i.e.
the volume of data is reduced (compressed) at the expense of a loss of visual quality. The
quality loss depends on many factors, but in general, higher compression results in a greater
loss of quality.

3.1.1 Do We Need Compression?

The following statement (or something similar) has been made many times over the 20-year
history of image and video compression: ‘Video compression will become redundant very
soon, once transmission and storage capacities have increased to a sufficient level to cope
with uncompressed video.’ It is true that both storage and transmission capacities continue
to increase. However, an efficient and well-designed video compression system gives very
significant performance advantages for visual communications at both low and high
transmission bandwidths. At low bandwidths, compression enables applications that would
not otherwise be possible, such as basic-quality video telephony over a standard telephone

Table 3.1 Uncompressed bit rates

Format      Luminance resolution   Chrominance resolution   Frames per second   Bits per second
ITU-R 601   858 x 525              429 x 525                30                  216 Mbps
CIF         352 x 288              176 x 144                30                  36.5 Mbps
QCIF        176 x 144              88 x 72                  15                  4.6 Mbps

Table 3.2 Typical transmission / storage capacities

Media / network          Capacity
Ethernet LAN (10 Mbps)   Max. 10 Mbps / typical 1-2 Mbps
ADSL                     Typical 1-2 Mbps downstream
ISDN-2                   128 kbps
V.90 modem               56 kbps downstream / 33 kbps upstream
DVD-5                    4.7 Gbytes
CD-ROM                   ~650 Mbytes

connection. At high bandwidths, compression can support a much higher visual quality. For
example, a 4.7 Gbyte DVD can store approximately 2 hours of uncompressed QCIF video (at
15 frames per second) or 2 hours of compressed ITU-R 601 video (at 30 frames per second).
Most users would prefer to see ‘television-quality’ video with smooth motion rather than
‘postage-stamp’ video with jerky motion.
Video compression and video CODECs will therefore remain a vital part of the emerging
multimedia industry for the foreseeable future, allowing designers to make the most
efficient use of available transmission or storage capacity. In this chapter we introduce the
basic components of an image or video compression system. We begin by defining the
concept of an image or video encoder (compressor) and decoder (decompressor). We then
describe the main functional blocks of an image encoder/decoder (CODEC) and a video
CODEC.

3.2 IMAGE AND VIDEO COMPRESSION
Information-carrying signals may be compressed, i.e. converted to a representation or form
that requires fewer bits than the original (uncompressed) signal. A device or program that
compresses a signal is an encoder and a device or program that decompresses a signal is a
decoder. An enCOder/DECoder pair is a CODEC.
Figure 3.1 shows a typical example of a CODEC as part of a communication system. The
original (uncompressed) information is encoded (compressed): this is source coding. The
source coded signal is then encoded further to add error protection (channel coding) prior to
transmission over a channel. At the receiver, a channel decoder detects and/or corrects
transmission errors and a source decoder decompresses the signal. The decompressed signal
may be identical to the original signal (lossless compression) or it may be distorted or
degraded in some way (lossy compression).

Figure 3.1 Source coder, channel coder, channel

General-purpose compression CODECs are available that are designed to encode and
compress data containing statistical redundancy. An information-carrying signal usually
contains redundancy, which means that it may (in theory) be represented in a more compact
way. For example, characters within a text file occur with varying frequencies: in English,
the letters E, T and A occur more often than the letters Q, Z and X. This makes it possible to
compress a text file by representing frequently occurring characters with short codes and
infrequently occurring characters with longer codes (this principle is used in Huffman coding,
described in Chapter 8). Compression is achieved by reducing the statistical redundancy in
the text file. This type of general-purpose CODEC is known as an entropy CODEC.
Photographic images and sequences of video frames are not amenable to compression
using general-purpose CODECs. Their contents (pixel values) tend to be highly correlated,
i.e. neighbouring pixels have similar values, whereas an entropy encoder performs best with
data values that have a certain degree of independence (decorrelated data). Figure 3.2
illustrates the poor performance of a general-purpose entropy encoder with image data.
The original image (a) is compressed and decompressed using a ZIP program to produce

Figure 3.2 (a) Original image; (b) ZIP encoded and decoded; (c) JPEG encoded and decoded

Figure 3.3 Image or video CODEC (encoder: source model, then entropy encoder; decoder: entropy decoder, then source model)

image (b). This is identical to the original (lossless compression), but the compressed file is
only 92% of the size of the original, i.e. there is very little compression. Image (c) is
obtained by compressing and decompressing the original using the JPEG compression
method. The compressed version is less than a quarter of the size of the original (over
4 x compression) and the decompressed image looks almost identical to the original. (It is in
fact slightly ‘degraded’ due to the lossy compression process.)
In this example, the JPEG method achieved good compression performance by applying a
source model to the image before compression. The source model attempts to exploit the
properties of video or image data and to represent it in a form that can readily be compressed
by an entropy encoder. Figure 3.3 shows the basic design of an image or video CODEC
consisting of a source model and an entropy encoder/decoder.
Images and video signals have a number of properties that may be exploited by source
models. Neighbouring samples (pixels) within an image or a video frame tend to be highly
correlated and so there is significant spatial redundancy. Neighbouring regions within
successive video frames also tend to be highly correlated (temporal redundancy). As well
as these statistical properties (statistical redundancy), a source model may take advantage of
subjective redundancy, exploiting the sensitivity of the human visual system (HVS) to various
characteristics of images and video. For example, the HVS is much more sensitive to low
frequencies than to high ones and so it is possible to compress an image by eliminating
certain high-frequency components. Image (c) in Figure 3.2 was compressed by discarding
certain subjectively redundant components of the information: the decoded image is not
identical to the original but the information loss is not obvious to the human viewer.
Examples of image and video source models include the following:

3.2.1 DPCM (Differential Pulse Code Modulation)

Each sample or pixel is predicted from one or more previously transmitted samples. The
simplest prediction is formed from the previous pixel (pixel A in Figure 3.4). A more
accurate prediction can be obtained using a weighted average of neighbouring pixels (for
example, A, B and C in Figure 3.4). The actual pixel value X is subtracted from the
prediction and the difference (the prediction error) is transmitted to the receiver. The
prediction error will typically be small due to spatial correlation, and compression can
be achieved by representing common, small prediction errors with short binary codes and
larger, less common errors with longer codes. Further compression may be achieved by
quantising the prediction error and reducing its precision: this is lossy compression as it

Figure 3.4 DPCM (the pixel to be predicted is formed from previously transmitted pixels on the current and previous lines)

becomes impossible to exactly reproduce the original values at the decoder. DPCM may be
applied spatially (using adjacent pixels in the same frame) and/or temporally (using adjacent
pixels in a previous frame to form the prediction) and gives modest compression with low
complexity.
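A minimal sketch of spatial DPCM along one row of pixels, assuming left-neighbour prediction and a fixed quantiser step size of 4 (the sample values are illustrative):

```python
import numpy as np

def dpcm_encode_row(row, step=4):
    """Predict each pixel from the reconstructed previous pixel and
    quantise the prediction error (lossy)."""
    codes = []
    prediction = 0  # assume a fixed initial prediction
    for pixel in row:
        error = int(pixel) - prediction
        q = round(error / step)      # quantised prediction error
        codes.append(q)
        prediction += q * step       # track the decoder's reconstruction
    return codes

def dpcm_decode_row(codes, step=4):
    pixels, prediction = [], 0
    for q in codes:
        prediction += q * step
        pixels.append(prediction)
    return pixels

row = np.array([100, 102, 101, 105, 110, 110, 108])
codes = dpcm_encode_row(row)
print(codes)                   # small values, cheap to entropy code
print(dpcm_decode_row(codes))  # close to, but not exactly, the input
```

Note that the encoder predicts from its own reconstructed values rather than the original pixels; the reason for this choice is discussed in Section 3.4.1.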

3.2.2 Transform Coding

The image samples are transformed into another domain (or representation) and are
represented by transform coefficients. In the ‘spatial domain’ (i.e. the original form of the
image), samples are highly spatially correlated. The aim of transform coding is to reduce this
correlation, ideally leaving a small number of visually significant transform coefficients
(important to the appearance of the original image) and a large number of insignificant
coefficients (that may be discarded without significantly affecting the visual quality of the
image). The transform process itself does not achieve compression: it is usually followed by
a lossy quantisation process in which the insignificant coefficients are removed, leaving
behind a small number of significant coefficients. Transform coding (Figure 3.5) forms the basis
of most of the popular image and video compression systems and is described in more detail
in this chapter and in Chapter 7.

3.2.3 Motion-compensated Prediction

Using a similar principle to DPCM, the encoder forms a model of the current frame based on
the samples of a previously transmitted frame. The encoder attempts to ‘compensate’ for
motion in a video sequence by translating (moving) or warping the samples of the previously
transmitted ‘reference’ frame. The resulting motion-compensated predicted frame (the
model of the current frame) is subtracted from the current frame to produce a residual
‘error’ frame (Figure 3.6). Further coding usually follows motion-compensated prediction,
e.g. transform coding of the residual frame.

Figure 3.5 Transform coding



Figure 3.6 Motion-compensated prediction (motion compensation of the previous frame, subtracted from the current frame, gives the residual)

3.2.4 Model-based Coding

The encoder attempts to create a semantic model of the video scene, for example by
analysing and interpreting the content of the scene. An example is a ‘talking head’ model:
the encoder analyses a scene containing a person’s head and shoulders (a typical video
conferencing scene) and models the head as a 3-D object. The decoder maintains its own
3-D model of the head. Instead of transmitting information that describes the entire image,
the encoder sends only the animation parameters required to ‘move’ the model, together with
an error signal that compensates for the difference between the modelled scene and the
actual video scene (Figure 3.7). Model-based coding has the potential for far greater

Figure 3.7 Model-based coding (the encoder and decoder each maintain a 3-D model of the scene)



compression than the other source models described above: however, the computational
complexity required to analyse and synthesise 3-D models of a video scene in real time is
very high.

3.3 IMAGE CODEC

An image CODEC encodes and decodes single images or individual frames from a video
sequence (Figure 3.8) and may consist of a transform coding stage followed by quantisation
and entropy coding.

3.3.1 Transform Coding

The transform coding stage converts (transforms) the image from the spatial domain into
another domain in order to make it more amenable to compression. The transform may be
applied to discrete blocks of the image (block transform) or to the entire image.

Block transforms

The spatial image samples are processed in discrete blocks, typically 8 x 8 or 16 x 16
samples. Each block is transformed using a 2-D transform to produce a block of transform
coefficients. The performance of a block-based transform for image compression depends on
how well it can decorrelate the information in each block.
The Karhunen-Loeve transform (KLT) has the ‘best’ performance of any block-based
image transform. The coefficients produced by the KLT are decorrelated and the energy in
the block is packed into a minimal number of coefficients. The KLT is, however, very
computationally inefficient, and it is impractical because the functions required to carry out
the transform (‘basis functions’) must be calculated in advance and transmitted to the
decoder for every image. The discrete cosine transform (DCT) performs nearly as well as
the KLT and is much more computationally efficient. Figure 3.9 shows a 16 x 16 block of

Figure 3.8 Image CODEC (encoder: transform, quantise, reorder, entropy encode, then store/transmit; decoder: entropy decode, reorder, rescale, inverse transform)



Figure 3.9 (a) 16 x 16 block of pixels; (b) DCT coefficients

image samples (a) and the corresponding block of coefficients produced by the DCT (b). In
the original block, the energy is distributed across the 256 samples and the latter are clearly
closely interrelated (correlated). In the coefficient block, the energy is concentrated into a few
significant coefficients (at the top left). The coefficients are decorrelated: this means that the
smaller-valued coefficients may be discarded (for example by quantisation) without
significantly affecting the quality of the reconstructed image block at the decoder.
The 16 x 16 array of coefficients shown in Figure 3.9 represents spatial frequencies in the
original block. At the top left of the array are the low-frequency components, representing
the gradual changes of brightness (luminance) in the original block. At the bottom right of
the array are the high-frequency components, and these represent rapid changes in brightness.
These frequency components are analogous to the components produced by Fourier analysis
of a time-varying signal (and in fact the DCT is closely related to the discrete Fourier
transform) except that here the components are 2-D. The example shown in Figure 3.9 is
typical for a photographic image: most of the coefficients produced by the DCT are
insignificant and can be discarded. This makes the DCT a powerful tool for image and
video compression.
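This energy compaction is easy to observe numerically. A minimal sketch, assuming SciPy’s scipy.fft transforms and a synthetic smooth 8 x 8 block standing in for correlated image samples:

```python
import numpy as np
from scipy.fft import dctn, idctn

# A synthetic 8 x 8 block with a smooth horizontal brightness ramp,
# standing in for the highly correlated samples of a real image block.
x = np.arange(8)
block = np.tile(100 + 5 * x, (8, 1)).astype(float)

coeffs = dctn(block, norm='ortho')  # forward 2-D DCT
print(np.round(coeffs, 1))          # energy packed into top-left coefficients

# The inverse transform alone reproduces the block exactly: it is
# quantisation (Section 3.3.2), not the transform, that loses information.
reconstructed = idctn(coeffs, norm='ortho')
print(np.allclose(reconstructed, block))  # True
```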

Image transforms

The DCT is usually applied to small, discrete blocks of an image, for reasons of practicality.
In contrast, an image transform may be applied to a complete video image (or to a large ‘tile’
within the image). The most popular transform of this type is the discrete wavelet transform.
A 2-D wavelet transform is applied to the original image in order to decompose it into a
series of filtered ‘sub-band’ images (Figure 3.10). Image (a) is processed in a series of stages
to produce the ‘wavelet decomposition’ image (b). This is made up of a series of components,
each containing a subset of the spatial frequencies in the image. At the top left is a low-pass
filtered version of the original and, moving to the bottom right, each component contains
progressively higher-frequency information that adds the ‘detail’ of the image. It is clear that
the higher-frequency components are relatively ‘sparse’, i.e. many of the values (or
‘coefficients’) in these components are zero or insignificant. The wavelet transform is thus
an efficient way of decorrelating or concentrating the important information into a few
significant coefficients.
The wavelet transform is particularly effective for still image compression and has been
adopted as part of the JPEG-2000 standard and for still image ‘texture’ coding in the MPEG-4
standard. Wavelet-based compression is discussed further in Chapter 7.
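A single stage of such a decomposition can be sketched with the PyWavelets package (an assumed third-party library; the test image is synthetic):

```python
import numpy as np
import pywt

# A synthetic image: smooth gradient plus a small amount of detail.
rng = np.random.default_rng(0)
image = np.outer(np.linspace(0, 255, 64), np.ones(64)) + rng.normal(0, 2, (64, 64))

# One stage of a 2-D wavelet decomposition: a low-pass approximation (cA)
# plus horizontal, vertical and diagonal detail sub-bands (cH, cV, cD).
cA, (cH, cV, cD) = pywt.dwt2(image, 'haar')

# The detail sub-bands are 'sparse': most coefficients are near zero.
for name, band in [('cA', cA), ('cH', cH), ('cV', cV), ('cD', cD)]:
    print(name, 'energy:', np.sum(band ** 2).round())
```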
Another image transform that has received much attention is the so-called fractal
transform. A fractal transform coder attempts to represent an image as a set of scaled and
translated arbitrary ‘basis patterns’. Fractal-based coding has not, however, shown
sufficiently good performance to be included in any of the international standards for video and
image coding and so we will not discuss it in detail.

3.3.2 Quantisation

The block and image transforms described above do not themselves achieve any
compression. Instead, they represent the image in a different domain in which the image data is

Figure 3.10 Wavelet decomposition of image

separated into components of varying ‘importance’ to the appearance of the image. The
purpose of quantisation is to remove the components of the transformed data that are
unimportant to the visual appearance of the image and to retain the visually important
components. Once removed, the less important components cannot be replaced and so
quantisation is a lossy process.

Example

1. The DCT coefficients shown earlier in Figure 3.9 are quantised by dividing each
coefficient by an integer. The resulting array of quantised coefficients is shown in
Figure 3.11(a): the large-value coefficients map to non-zero integers and the small-
value coefficients map to zero.
2. Rescaling the quantised array (multiplying each coefficient by the same integer) gives
Figure 3.11(b). The magnitudes of the larger coefficients are similar to the original
coefficients; however, the smaller coefficients (set to zero during quantisation) cannot
be recreated and remain at zero.
3. Applying an inverse DCT to the rescaled array gives the block of image samples shown
in Figure 3.12: this looks superficially similar to the original image block but some of
the information has been lost through quantisation.
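These three steps can be expressed directly in code. A minimal sketch, reusing the scipy.fft transforms from the earlier example and assuming an integer step size of 16:

```python
import numpy as np
from scipy.fft import dctn, idctn

def quantise_block(block, step=16):
    """Forward DCT, divide by an integer step size and round (lossy)."""
    return np.round(dctn(block, norm='ortho') / step).astype(int)

def reconstruct_block(q_coeffs, step=16):
    """Rescale (multiply by the step size) and inverse DCT."""
    return idctn(q_coeffs.astype(float) * step, norm='ortho')

x = np.arange(8)
block = np.tile(100 + 5 * x, (8, 1)).astype(float)
q = quantise_block(block)
print(np.count_nonzero(q), 'of', q.size, 'coefficients survive quantisation')
decoded = reconstruct_block(q)
print(np.abs(decoded - block).max())  # small, but non-zero: lossy
```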

It is possible to vary the ‘coarseness’ of the quantisation process (using a quantiser ‘scale
factor’ or ‘step size’). ‘Coarse’ quantisation will tend to discard most of the coefficients,
leaving only the most significant, whereas ‘fine’ quantisation will tend to leave more
coefficients in the quantised block. Coarse quantisation usually gives higher compression at
the expense of a greater loss in image quality. The quantiser scale factor or step size is often
the main parameter used to control image quality and compression in an image or video
CODEC. Figure 3.13 shows a small original image (left) and the effect of compression and
decompression with fine quantisation (middle) and coarse quantisation (right).

3.3.3 Entropy Coding

A typical image block will contain a few significant non-zero coefficients and a large number
of zero coefficients after block transform coding and quantisation. The remaining non-zero
data can be efficiently compressed using a statistical compression method (‘entropy
coding’):

1. Reorder the quantised coefficients. The non-zero quantised coefficients of a typical image
block tend to be clustered around the ‘top-left corner’, i.e. around the low frequencies
(e.g. Figure 3.9). These non-zero values can be grouped together in sequence by
reordering the 64 coefficients, for example in a zigzag scanning order (Figure 3.14).
Scanning through in a zigzag sequence from the top-left (lowest frequency) to the
bottom-right (highest frequency) coefficient groups together the significant low-
frequency coefficients.

Figure 3.11 (a) Quantised DCT coefficients; (b) rescaled

2. Run-level coding. The reordered coefficient array is usually ‘sparse’, consisting of a
group of non-zero coefficients followed by zeros (with occasional non-zero higher-
frequency coefficients). This type of array may be compactly represented as a series of
(run, level) pairs, as shown in the example in Table 3.3 and in the sketch after this list.
The first number in the (run,

Figure 3.12 Reconstructed block of image samples

level) pair represents the number of preceding zeros and the second number represents a
non-zero value (level). For example, (5, 12) represents five zeros followed by 12.
3. Entropy coding. A statistical coding algorithm is applied to the (run, level) data. The
purpose of the entropy coding algorithm is to represent frequently occurring (run, level)
pairs with a short code and infrequently occurring (run, level) pairs with a longer code. In
this way, the run-level data may be compressed into a small number of bits.
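Steps 1 and 2 can be sketched as follows. A minimal example (the scan direction follows the common JPEG-style zigzag convention; the block contents reproduce the coefficient data of Table 3.3):

```python
import numpy as np

def zigzag_order(n=8):
    """(row, col) index pairs in zigzag scanning order: anti-diagonals,
    alternating direction, from top-left to bottom-right."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def run_level_encode(block):
    """Zigzag-scan a quantised block and emit (run, level) pairs."""
    pairs, run = [], 0
    for r, c in zigzag_order(block.shape[0]):
        level = int(block[r, c])
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs  # trailing zeros are dropped (a real codec sends an end marker)

# Reproduce the reordered coefficient data of Table 3.3.
block = np.zeros((8, 8), dtype=int)
block[0, 0], block[0, 1], block[1, 0] = 24, 3, -9
block[1, 1], block[4, 0], block[0, 4] = -2, 12, 2
print(run_level_encode(block))
# [(0, 24), (0, 3), (0, -9), (1, -2), (5, 12), (3, 2)]
```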

Huffman coding and arithmetic coding are widely used for entropy coding of image and
video data.

Huffman coding replaces each ‘symbol’ (e.g. a [run, level] pair) with a codeword containing
a variable number of bits. The codewords are allocated based on the statistical distribution of
Figure 3.13 (a) Original image; (b) fine quantisation; (c) coarse quantisation

Figure 3.14 Zigzag reordering of quantised coefficients (non-zero low-frequency coefficients are grouped at the start of the reordered linear array)

the symbols. Short codewords are allocated to common symbols and longer codewords are
allocated to infrequent symbols. Each codeword is chosen to be ‘uniquely decodeable’, so
that a decoder can extract the series of variable-length codewords without ambiguity.
Huffman coding is well suited to practical implementation and is widely used in practice.
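A minimal sketch of Huffman code construction using Python’s heapq (the (run, level) symbol frequencies are assumed for illustration):

```python
import heapq
from itertools import count

def huffman_code(freqs):
    """Build a Huffman code table from {symbol: frequency}."""
    tiebreak = count()  # keeps heap comparisons away from unlike types
    heap = [(f, next(tiebreak), [sym]) for sym, f in freqs.items()]
    codes = {sym: '' for sym in freqs}
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, syms0 = heapq.heappop(heap)  # two least frequent subtrees
        f1, _, syms1 = heapq.heappop(heap)
        for s in syms0:
            codes[s] = '0' + codes[s]       # prefix a bit as trees merge
        for s in syms1:
            codes[s] = '1' + codes[s]
        heapq.heappush(heap, (f0 + f1, next(tiebreak), syms0 + syms1))
    return codes

# Assumed (run, level) symbol frequencies: common pairs get short codes.
freqs = {(0, 1): 40, (0, -1): 38, (1, 1): 20, (0, 2): 10, (5, 12): 1}
for sym, code in sorted(huffman_code(freqs).items(), key=lambda kv: len(kv[1])):
    print(sym, '->', code)
```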

Arithmetic coding maps a series of symbols to a fractional number (see Chapter 8) that is
then converted into a binary number and transmitted. Arithmetic coding has the potential for
higher compression than Huffman coding. Each symbol may be represented with a fractional
number of bits (rather than just an integral number of bits) and this means that the bits
allocated per symbol may be more accurately matched to the statistical distribution of the
coded data.

3.3.4 Decoding

The output of the entropy encoder is a sequence of binary codes representing the original image
in compressed form. In order to recreate the image it is necessary to decode this sequence,
and the decoding process (shown in Figure 3.8) is almost the reverse of the encoding process.
An entropy decoder extracts run-level symbols from the bit sequence. These are converted
to a sequence of coefficients that are reordered into a block of quantised coefficients. The
decoding operations up to this point are the inverse of the equivalent encoding operations.
Each coefficient is multiplied by the integer scale factor (‘rescaled’). This is often described

Table 3.3 Run-level coding example

Reordered coefficient data   24, 3, -9, 0, -2, 0, 0, 0, 0, 0, 12, 0, 0, 0, 2, ...
(Run, level) pairs           (0, 24), (0, 3), (0, -9), (1, -2), (5, 12), (3, 2), ...

as ‘inverse quantisation’, but in fact the loss of precision due to quantisation cannot be
reversed and so the rescaled coefficients are not identical to the original transform
coefficients.
The rescaled coefficients are transformed with an inverse transform to reconstruct a
decoded image. Because of the data loss during quantisation, this image will not be identical
to the original image: the amount of difference depends partly on the ‘coarseness’ of
quantisation.

3.4 VIDEO CODEC


A video signal consists of a sequence of individual frames. Each frame may be compressed
individually using an image CODEC as described above: this is described as intra-frame
coding, where each frame is ‘intra’ coded without any reference to other frames. However,
better compression performance may be achieved by exploiting the temporal redundancy in
a video sequence (the similarities between successive video frames). This may be achieved
by adding a ‘front end’ to the image CODEC, with two main functions:

1. Prediction: create a prediction of the current frame based on one or more previously
transmitted frames.
2. Compensation: subtract the prediction from the current frame to produce a ‘residual
frame’.

The residual frame is then processed using an ‘image CODEC’. The key to this approach is
the prediction function: if the prediction is accurate, the residual frame will contain little data
and will hence be compressed to a very small size by the image CODEC. In order to decode
the frame, the decoder must ‘reverse’ the compensation process, adding the prediction to the
decoded residual frame (reconstruction) (Figure 3.15). This is inter-frame coding: frames
are coded based on some relationship with other video frames, i.e. coding exploits the
interdependencies of video frames.

Figure 3.15 Video CODEC with prediction (encoder and decoder each create a prediction from previous frames)



3.4.1 Frame Differencing

The simplest predictor is just the previous transmitted frame. Figure 3.16 shows the residual
frame produced by subtracting the previous frame from the current frame in a video
sequence. Mid-grey areas of the residual frame contain zero data: light and dark areas
indicate positive and negative residual data respectively. It is clear that much of the residual
data is zero: hence, compression efficiency can be improved by compressing the residual
frame rather than the current frame.

Table 3.4 Prediction ‘drift’

Encoder input      Encoder prediction   Encoder output / decoder input   Decoder prediction   Decoder output
Original frame 1   Zero                 Compressed frame 1               Zero                 Decoded frame 1
Original frame 2   Original frame 1     Compressed residual frame 2      Decoded frame 1      Decoded frame 2
Original frame 3   Original frame 2     Compressed residual frame 3      Decoded frame 2      Decoded frame 3
...                ...                  ...                              ...                  ...

Figure 3.17 Encoder with decoding loop (the prediction is created from previous frames reconstructed by an image decoder within the encoder)

The decoder faces a potential problem that can be illustrated as follows. Table 3.4 shows
the sequence of operations required to encode and decode a series of video frames using
frame differencing. For the first frame the encoder and decoder use no prediction. The
problem starts with frame 2: the encoder uses the original frame 1 as a prediction and
encodes the resulting residual. However, the decoder only has the decoded frame 1 available
to form the prediction. Because the coding process is lossy, there is a difference between the
decoded and original frame 1 which leads to a small error in the prediction of frame 2 at
the decoder. This error will build up with each successive frame and the encoder and decoder
predictors will rapidly ‘drift’ apart, leading to a significant drop in decoded quality.
The solution to this problem is for the encoder to use a decoded frame to form the
prediction. Hence the encoder in the above example decodes (or reconstructs) frame 1 to
form a prediction for frame 2. The encoder and decoder use the same prediction and drift
should be reduced or removed. Figure 3.17 shows the complete encoder, which now includes
a decoding ‘loop’ in order to reconstruct its prediction reference. The reconstructed (or
‘reference’) frame is stored in the encoder and in the decoder to form the prediction for the
next coded frame.
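The decoding loop can be demonstrated with a toy frame-differencing codec, a minimal sketch in which coarse quantisation of the residual stands in for the full image CODEC (step size and test frames are assumed):

```python
import numpy as np

STEP = 8  # assumed quantiser step size: makes the 'image CODEC' lossy

def encode_sequence(frames):
    """Frame differencing with a decoding loop: the prediction is always
    the *reconstructed* previous frame, exactly as the decoder sees it."""
    reference = np.zeros_like(frames[0])
    encoded = []
    for frame in frames:
        residual = frame - reference
        q = np.round(residual / STEP).astype(int)  # lossy 'image CODEC'
        encoded.append(q)
        reference = reference + q * STEP           # reconstruct, as decoder will
    return encoded

def decode_sequence(encoded):
    reference = np.zeros_like(encoded[0])
    decoded = []
    for q in encoded:
        reference = reference + q * STEP
        decoded.append(reference.copy())
    return decoded

rng = np.random.default_rng(1)
frames = [rng.integers(0, 256, (4, 4)) for _ in range(5)]
decoded = decode_sequence(encode_sequence(frames))
# Error stays bounded by the quantiser, with no drift from frame to frame.
print(max(np.abs(d - f).max() for d, f in zip(decoded, frames)))
```

If the encoder predicted from the original frames instead (replacing the reconstruction line with `reference = frame`), the decoder’s error would grow with each frame: exactly the drift described in Table 3.4.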

3.4.2 Motion-compensated Prediction

Frame differencing gives better compression performance than intra-frame coding when
successive frames are very similar, but does not perform well when there is a significant
change between the previous and current frames. Such changes are usually due to movement
in the video scene, and a significantly better prediction can be achieved by estimating this
movement and compensating for it.
Figure 3.18 shows a video CODEC that uses motion-compensated prediction. Two new
steps are required in the encoder:

1. Motion estimation: a region of the current frame (often a rectangular block of luminance
samples) is compared with neighbouring regions of the previous reconstructed frame.

Figure 3.18 Video CODEC with motion estimation and compensation

The motion estimator attempts to find the ‘best match’, i.e. the neighbouring block in the
reference frame that gives the smallest residual block.
2. Motion compensation: the ‘matching’ region or block from the reference frame
(identified by the motion estimator) is subtracted from the current region or block.

The decoder carries out the same motion compensation operation to reconstruct the
current frame. This means that the encoder has to transmit the location of the ‘best’ matching
blocks to the decoder (typically in the form of a set of motion vectors).
Figure 3.19 shows a residual frame produced by subtracting a motion-compensated
version of the previous frame from the current frame (shown in Figure 3.16). The residual
frame clearly contains less data than the residual in Figure 3.16. This improvement in

compression does not come without a price: motion estimation can be very computationally
intensive. The design of a motion estimation algorithm can have a dramatic effect on
the compression performance and computational complexity of a video CODEC.
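A full-search block-matching estimator illustrates both the technique and its computational cost. A minimal sketch, using the sum of absolute differences (SAD) as the matching criterion (the block size and search range are assumptions; practical CODECs use faster search strategies):

```python
import numpy as np

def best_match(current, reference, top, left, block=8, search=7):
    """Full-search block matching: compare one block of the current frame
    with every candidate block of the reference frame within +/- search
    samples, and return the motion vector with the smallest SAD."""
    target = current[top:top + block, left:left + block].astype(int)
    best_vec, best_sad = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = top + dy, left + dx
            if r < 0 or c < 0 or r + block > reference.shape[0] \
                    or c + block > reference.shape[1]:
                continue  # candidate block falls outside the frame
            cand = reference[r:r + block, c:c + block].astype(int)
            sad = np.abs(target - cand).sum()
            if best_sad is None or sad < best_sad:
                best_vec, best_sad = (dy, dx), sad
    return best_vec, best_sad

# A reference frame, and a current frame whose content has moved by (2, 3).
rng = np.random.default_rng(2)
reference = rng.integers(0, 256, (64, 64))
current = np.roll(reference, (2, 3), axis=(0, 1))
print(best_match(current, reference, 16, 16))
# ((-2, -3), 0): the block at (16, 16) came from (14, 13) in the reference
```

The nested search over (2 x 7 + 1)² = 225 candidate positions per block is what makes full-search motion estimation expensive.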

3.4.3 Transform, Quantisation and Entropy Encoding

A block or image transform is applied to the residual frame and the coefficients are quantised
and reordered. Run-level pairs are entropy coded as before (although the statistical
distribution and hence the coding tables are generally different for inter-coded data). If
motion-compensated prediction is used, motion vector information must be sent in addition
to the run-level data. The motion vectors are typically entropy encoded in a similar way to
run-level pairs, i.e. commonly occurring motion vectors are coded with shorter codes and
uncommon vectors are coded with longer codes.

3.4.4 Decoding

A motion-compensated decoder (Figure 3.18) is usually simpler than the corresponding
encoder. The decoder does not need a motion estimation function (since the motion
information is transmitted in the coded bit stream) and it contains only a decoding path
(compared with the encoding and decoding paths in the encoder).

3.5 SUMMARY
Efficient coding of images and video sequences involves creating a model of the source data
that converts it into a form that can be compressed. Most image and video CODECs
developed over the last two decades have been based around a common set of ‘building
blocks’. For motion video compression, the first step is to create a motion-compensated
prediction of the frame to be compressed, based on one or more previously transmitted
frames. The difference between this model and the actual input frame is then coded using an
image CODEC. The data is transformed into another domain (e.g. the DCT or wavelet
domain), quantised, reordered and compressed using an entropy encoder. A decoder must
reverse these steps to reconstruct the frame: however, quantisation cannot be reversed and so
the decoded frame is an imperfect copy of the original.
An encoder and decoder must clearly use a compatible set of algorithms in order to
successfully exchange compressed image or video data. Of prime importance is the syntax or
structure of the compressed data. In the past 15 years there has been a significant worldwide
effort to develop standards for video and image compression. These standards generally
describe a syntax (and a decoding process) to support video or image communications for a
wide range of applications. Chapters 4 and 5 provide an overview of the main standards
bodies and the JPEG, MPEG and H.26x video and image coding standards.
