Table of Contents
Page 2 of 4
Sequential DCT-Based Encoding
There are a number of steps which make up the JPEG operation in lossy sequential mode.
An image to be compressed is made up, as is every television image, of a combination of Red, Green and Blue. The luminance of a display is the sum of the luminance (brightness) of the red green and blue colours. Luminance and chrominance (colour) are the key elements of each image. The human eye is not as sensitive to chrominance as it is to luminance. Their values are calculated using the following mathematical formula.
luminance Y = 0.3R + 0.6G + 0.1B chrominance I = 0.6R - 0.28G - 0.32B chrominance Q = 0.21R - 0.52G + 0.31B
After research, it was found that these values provided the information which was required, it is not important to know how these values were computed, it is simply enough to realise that a formula is used to establish the luminance and chrominance for each image.
If we assume an image which has dimensions of 640 x 480 pixels with 24 bits/pixel.
The next stage is to split the source image into blocks of size 8 pixels * 8 pixels. This allows the use of the DCT as this operates on blocks of size 8 * 8. Because there are two chrominance channels, I and Q, and one luminance channel, it isr necessary to split the original image into three seperate images which, when combined, give the original image. Each of the pixels is assigned a value in the range 0-255 which is an accurate reflection of itself in relation to the formaulae given above.
Discrete Cosine Transformation - DCT
A DCT is applied to each of the blocks which makes up the two chrominance images and the luminance image. The output of each DCT is an 8*8 matrix of DCT coefficients. The matrix which is returned by actioning the DCT on the images introduces, in theory, no loss to the overall state of the information.
However, in reality, the DCT returns floating point values which have to be rounded in order for them to be manipulated in a sensible manner. This obviously introduces a little error. This error is, however, insubstantial in comparison to that introduced at later stages. It simply transforms the data contained within the images into a domain from where they can be more efficiently encoded.
The transformation is related to the Discrete Fourier Transform which is central to many different kinds of signal processing including the analysis and compression of sounds and images. The idea behind the DCT is to work out the difference between each of the entries in the matricies. The top leftmost location in each matrix, entry (0,0), is the average DCT value of the entire 8*8 block. The other entries in the matrix represent the spectral
If the entire image is a constant colour then the output of the DCT would be an 8*8 matrix which contained a value in the (0,0) location but no other values in the entire matrix. There is no direct
geographical mapping between the matrix returned by the DCT and the image which it is being applied to. As mentioned, the output of the DCT is a mapping from the difference in luminance or chrominance to a set of values which represent this. Because sample values typically vary very slowly from point to point across an image the DCT processing step lays the foundation for achieving data compression by concentrating on the lower spatial frequencies. A typical 8*8 block from a sample image has very little amplitude in the spatial frequency and does not need to be encoded.
There is currently a lot of ongoing research into finding fast DCT algorithms, the difficulty apparently is
optimal for particular hardware implementations.
Once the DCT has completed operation, the next stage is to use a quantization table to uniformly quantize the matrices. The Quantization table which is required to do this job must be specified by the user or the application at the encoding stage. It is also required that the Quantization table is sent to the decoder so that the image can be correctly decoded.
The Quantization table takes the form of a 64 entry 8*8 matrix which is applied to each of the matrices which the DCT has operated on. Each of the frequency entries in the DCT returned matrices are divide by the corresponding value in the Quantization table. In the entire JPEG process this is the stage where most information is lost. The larger the Quantization value, the more information that is likely to be lost when performing the division. If the Quantization value is 15, for example, and the value in the DCT returned matrix is 59, then division of 59 by 15 gives 3 remainder 14. This is a large loss, which cannot be recouped at any stage. There are obvious ways around this particular problem, for eaxple, rounding 59 up to sixty, but for an entire image this would take a lengthy period and in the end may not provide a clearer image.
We discussed earlier the fact that the DCT process introduces some round-off error as the outputs produced are not typically integers. Antother information loss comes as a result of this, due to the fact that the lowest Quantization value is 1. Upon division of a non integer value by one, some loss is
introduced as well. The main concern comes through repeated compression and decompression, losing a little information at each stage can be very detremental to the images produced after a period of time.
Tuning these Quantization tables to provide the best possible results is also an active research area.
For each image which is sent to the decoder, you should realise that there are a number of matrices which shall be acted upon through DCT and Quantization tables. There are the two for chrominance and one for luminance. Each of these matrices has to have its Quantization table sent alongside. An initial concern is that this is a substantial overhead, however, if in the case of the luminance matrix which consists of 64 smaller 8*8 matrices, one more matrix of size 8*8 does not contribute a sizeable overhead to the process.
The aim of the diffrential quantization process is to reduce the value of the element contained at location (0,0) in each block. The idea is to replace this value with another value which depicts the difference from the corresponding block in the previous matrix.
These values in the (0,0) location are averages of the respective blocks and because of this, the values should change slowly from block to block. By taking the diffrential values in this way, the large numbers should be reduced to smaller numbers. This operation is deemed to be worthwhile because
"DC coefficients frequently contain a significant fraction of the total image energy." The other values in the matrix are transmitted as they stand, the differentials from the previous blocks are not computed in the same manner as the (0,0) entry. To do so would be computationally expensive.
The value in location (0,0) is known as a DC coefficient while the other 63 values are known as AC coefficients.
The quantized coefficients are then ordered into the zig-zag sequence which is shown below in readiness to be encoded using entropy encoding.
As mentioned, the way that the quantized coefficients are ordered is in a zig-zag sequence. The thinking behind this move is to allow the grouping together of the many zeros which are generally to be found in the bottom right hand region of the quantized matrices. This can be observed in the following diagram.
If for example, there were thirty zeros, by looking at the matrix in a zig zag fashion it would be possible to compress this by saying that there are simpl 30 zeros. Obviously this is advantageous.
Figure 3. Zig Zag Sequence for Encoding
Statistical Output Encoding
In some papers the stage of Entropy Encoding and this stage are both described as one. Since the
quantized coefficients are simply manipulated, that is compressed, using some other technique, it is easy to see why. The JPEG standard specifies two coding methods, Huffman Coding and arithmetic coding.
There are issues with both of these methods. Using the Huffman Coding method, it is necessary that Huffman tables be sent along with the other information in order for the compressed image to be decompressed correctly using the same table as it was encoded with. With many of the issues surrounding JPEG, it is up to the implementation of the algorithm to state the Huffman tables and whether or not they are to be standard implementation or are to be transmitted with each run.
The arithmetic coding possibility does not require any tables to be transmitted with the information.
This method adapts to the image statistics as it encodes the image. According to research this method is deemed to be provide 5-10% better compression than the Huffman Coding method, although it is also more complex. There is once a gain a trade-off where it is up to the application to decide upon the more suitable method for implementation.
Table of Contents
Table of Contents
Run Length Encoding | Statistical Encoding Huffman Coding | Lempel Ziv
Source Encoding Differential Encoding| JPEG | JPEG Page 2 | JPEG Page 3 | JPEG Page 4|
Last modified: Fri Mar 20 13:33:12 GMT 1998 Press Here for Help