Compressed Image File Formats

John Miano

Since not all graphic formats are of equal complexity, author John Miano does not simply choose a number of file formats and devote a chapter to each one. Instead, he offers additional coverage for the more complex image file formats like PNG (a new standard) and JPEG, while providing all the information necessary to use the simpler file formats. While including the well-documented BMP, XBM, and GIF formats for completeness, along with some of their less-covered features, this book gives the most space to the more intricate PNG and JPEG, from basic concepts to creating and reading actual files. Among its highlights, this book covers:

  • JPEG Huffman coding, including decoding sequential-mode JPEG images and creating sequential JPEG files
  • Optimizing the DCT
  • The Portable Network Graphics format (PNG), including decompressing PNG image data and creating PNG files
  • Windows BMP, XBM, and GIF

Mentioned in questions and answers.

In libjpeg I am unable to locate the 8x8 DCT matrix. If I am not wrong, this matrix is always a constant for an 8x8 block. It must contain 1/sqrt(8) in the first row, but where is this matrix?

In an actual JPEG implementation, the DCT matrix is usually factored down to its Gaussian Normal Form. That gives a series of matrix multiplications. However, in the normal form, these only involve operations on the diagonal and values adjacent to the diagonal. Most of the values in the normalized matrices are zero so you can omit them.

That transforms the DCT into a series of 8 parallel operations.

This book describes a couple of ways the matrix operations can be transformed:

It also describes a tensor approach that is theoretically more efficient but tends not to be so in implementation.

Earlier I read about mozjpeg, a project from Mozilla to create a JPEG encoder that is more efficient, i.e., one that creates smaller files.

As I understand (jpeg) codecs, a jpeg encoder would need to create files that use an encoding scheme that can also be decoded by other jpeg codecs. So how is it possible to improve the codec without breaking compatibility with other codecs?

Mozilla does mention that the first step for their encoder is to add functionality that can detect the most efficient encoding scheme for a certain image, which would not break compatibility. However, they intend to add more functionality, first of which is "trellis quantization", which seems to be a highly technical algorithm to do something (I don't understand).

I'm also not entirely sure this question belongs on Stack Overflow; it might also fit Super User, since the question is not specifically about programming. So if anyone feels it should be on Super User, feel free to move this question.

JPEG is somewhat unique in that it involves a series of compression steps. There are two that provide the most opportunities for reducing the size of the image.

The first is sampling. In JPEG one usually converts from RGB to YCbCr. In RGB, each component is equally important. In YCbCr, the Y component is much more important than the Cb and Cr components. If you sample the latter at 4 to 1, a 4x4 block of pixels gets reduced from 16+16+16 values to 16+4+4. Just by sampling you have cut the amount of data to be compressed roughly in half.
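That sampling step can be sketched as follows. This is only an illustration (the class and values are made up, not from the book): each 2x2 block of a chroma plane is averaged down to one sample, so a 4x4 block keeps its 16 Y values but only 4 Cb and 4 Cr values.

```java
// Sketch of 4-to-1 chroma subsampling: each 2x2 block of a chroma plane
// is replaced by its average. A 4x4 pixel block keeps 16 Y samples but
// only 4 Cb and 4 Cr samples.
public class ChromaSubsample {
    // Average each 2x2 block of an even-sized chroma plane.
    public static int[][] subsample(int[][] plane) {
        int h = plane.length / 2, w = plane[0].length / 2;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                out[y][x] = (plane[2 * y][2 * x] + plane[2 * y][2 * x + 1]
                           + plane[2 * y + 1][2 * x] + plane[2 * y + 1][2 * x + 1] + 2) / 4;
        return out;
    }

    public static void main(String[] args) {
        int[][] cb = {
            {10, 10, 20, 20},
            {10, 10, 20, 20},
            {30, 30, 40, 40},
            {30, 30, 40, 40}
        };
        int[][] small = subsample(cb);  // 4x4 -> 2x2: 16 samples become 4
        System.out.println(small[0][0] + " " + small[0][1]); // 10 20
        System.out.println(small[1][0] + " " + small[1][1]); // 30 40
    }
}
```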

The other is quantization. You take the sampled pixel values, divide them into 8x8 blocks, and perform the Discrete Cosine Transform on them. With 8-bit components, this takes an 8x8 block of 8-bit data and converts it to an 8x8 block of 16-bit data (expansion rather than compression at that point).

The DCT process tends to produce larger values in the upper left corner and smaller values (close to zero) towards the lower right corner. The upper left coefficients are more valuable than the lower right coefficients.

The 16-bit values are then "quantized" (division, in plain English).

The compression process defines an 8x8 quantization matrix. Each DCT coefficient is divided by the corresponding entry in the quantization matrix. Because this is integer division, the small values go to zero. Long runs of zero values are combined using run-length compression. The more consecutive zeros you get, the better the compression.

Generally, the quantization values are much higher at the lower right than at the upper left. You try to force those DCT coefficients to zero unless they are very large.
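The quantization step can be sketched like this. The coefficient values are made up for illustration; the quantization row is the first row of the example luminance table given in Annex K of the JPEG standard:

```java
// Toy quantization of one row of DCT coefficients: integer division by the
// matching quantization value drives small coefficients to zero, producing
// runs of zeros that run-length coding then compresses well.
public class Quantize {
    public static int[] quantize(int[] coeffs, int[] quant) {
        int[] out = new int[coeffs.length];
        for (int i = 0; i < coeffs.length; i++)
            out[i] = coeffs[i] / quant[i];   // integer division: |c| < q -> 0
        return out;
    }

    public static void main(String[] args) {
        int[] coeffs = {240, 96, 50, 13, 7, 3, 2, 1};   // made-up coefficients
        int[] quant  = {16, 11, 10, 16, 24, 40, 51, 61}; // Annex K, first row
        int[] q = quantize(coeffs, quant);
        System.out.println(java.util.Arrays.toString(q)); // [15, 8, 5, 0, 0, 0, 0, 0]
    }
}
```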

This is where much of the loss (not all of it though) comes from in JPEG.

The trade-off is to get as many zeros as you can without noticeably degrading the image.

The choice of quantization matrices is the major factor in compression. Most JPEG libraries present a "quality" setting to the user. This translates into the selection of quantization matrices in the encoder. If someone could devise better quantization matrices, you could get better compression.
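As an illustration of how a quality setting can select quantization values, here is a sketch of the scaling convention used by the IJG libjpeg software (reproduced from memory, so treat the details as an assumption): quality 50 leaves a base table entry unchanged, higher quality scales it down (less loss), lower quality scales it up (more loss).

```java
// Sketch of the IJG-style quality-to-quantization mapping: a 1-100
// "quality" setting becomes a scale factor applied to a base table entry.
public class QualityScale {
    public static int scaleEntry(int base, int quality) {
        int scale = (quality < 50) ? 5000 / quality : 200 - 2 * quality;
        int q = (base * scale + 50) / 100;
        if (q < 1) q = 1;       // clamp to the legal 8-bit range
        if (q > 255) q = 255;
        return q;
    }

    public static void main(String[] args) {
        // Base value 16 (first entry of the standard luminance table):
        System.out.println(scaleEntry(16, 50));  // 16: unchanged at quality 50
        System.out.println(scaleEntry(16, 90));  // 3: smaller divisor, less loss
        System.out.println(scaleEntry(16, 10));  // 80: larger divisor, more loss
    }
}
```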

This book explains the JPEG process in plain English:

I have a question regarding the JPEG Huffman table and using it to construct the symbol/binary strings from a tree. Suppose that in a Huffman table the number of codes at the 3-bit code length is greater than 6. How do we add all those codes to the tree? If I am correct, only 6 codes can be added at the 3-bit level/depth of the tree. So how do we add the remaining codes if they won't fit at that level? Do we just ignore them?


code length | Total Codes | Codes  
3-Bit       |    10       | 25 43 34 53 92 A2 B2 63 73 C2

In the above example, if we go in order constructing the symbols/binary strings for the codes, then up till A2 we can add codes to the tree at the 3-bit level, but what about B2, 63, 73, C2? It's not possible to add them at the 3-bit level of the tree, so what do we do with them?

In JPEG, a Huffman code can be up to 16 bits. The DHT marker contains an array of 16 elements giving the number of codes for each length.

The JPEG standard explains how to use the code counts to do the Huffman translation. It is one of the few things explained in detail.

This book explains how it is done from a programmer's perspective.


The number of codes that exists at any code length depends upon the counts for other lengths.

I am wondering if you are really looking at the count of codes for length 4 rather than 3.
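As a sketch of that translation, here is the code-generation procedure described in Annex C of the standard: given the 16 length counts from a DHT marker, codes of each length are assigned consecutively, and the running code value is doubled when moving to the next length. The class and variable names are mine:

```java
// Turn DHT-style length counts into canonical Huffman code values.
public class CanonicalCodes {
    // counts[i] = number of codes of length i+1, as stored in a DHT marker
    public static int[] buildCodes(int[] counts) {
        int total = 0;
        for (int c : counts) total += c;
        int[] codes = new int[total];
        int code = 0, k = 0;
        for (int len = 1; len <= counts.length; len++) {
            for (int i = 0; i < counts[len - 1]; i++)
                codes[k++] = code++;     // consecutive codes at this length
            code <<= 1;                  // next length: append a zero bit
        }
        return codes;
    }

    public static void main(String[] args) {
        // 0 codes of length 1, 2 of length 2, 3 of length 3
        // -> bit strings 00, 01, 100, 101, 110
        int[] codes = buildCodes(new int[]{0, 2, 3});
        System.out.println(java.util.Arrays.toString(codes)); // [0, 1, 4, 5, 6]
    }
}
```

This also shows why the number of codes at one length depends on the counts at shorter lengths: each shorter code consumes a whole subtree of the longer lengths.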

I have a web gallery where I display user-uploaded images that vary in file size and resolution. Currently all the images are baseline, so I would like to know whether it would really have any significant impact if I converted them to progressive images. What are the advantages and trade-offs of using progressive images?

The JPEG standard defines a variety of compression modes. Only three of these are in widespread use:

  • Baseline Sequential
  • Extended Sequential
  • Progressive

The only difference between the first two is the number of tables allowed. Otherwise, they are encoded and decoded in exactly the same way.

JPEG divides images into frames that are then divided into scans. The modes above permit only one frame. The frame is the image. The scans are passes through the image data. A scan may contain the data for one color component, or it may be interleaved and contain data for multiple color components.

  • A grayscale sequential JPEG stream will have one scan.
  • A color sequential JPEG stream may have one or three scans.

JPEG takes 8x8 blocks of pixel data and applies the discrete cosine transform to that data. The 64 pixel data become 64 DCT coefficients. The first DCT coefficient is called the "DC" coefficient and the other 63 are called "AC" coefficients.

This is confusing terminology that draws on an analogy with DC and AC current. The DC coefficient is analogous to the average pixel value of the block.

In sequential JPEG, the 64 coefficients in a block are encoded together (with the DC and AC coefficients encoded differently). In progressive JPEG, the scans encode bit fields (of configurable size) within the DC and AC coefficients. In theory, you could have a separate scan for each bit of each component.

Progressive JPEG is much more complicated to implement and use. If you are creating an encoder for sequential JPEG, you just need to give the caller the option of interleaved or non-interleaved scans. For progressive JPEG, your encoder needs a mechanism for the caller to specify how many scans there are and which bits are encoded in each scan.

Progressive encoding can be slower than sequential because you have to make multiple passes over the data.

The speed issue in progressive decoding depends upon how it is done. If you decode the entire image at once, progressive is possibly marginally slower than sequential. If your decoder shows the image fading in as it processes the stream it will be much slower than sequential. Each time you update the display, you have to do the inverse DCT, upsampling, and color transformation.

On the other hand, it is possible to get much better compression using progressive JPEG with well-tuned scans.

There is no difference in quality between progressive and sequential JPEG.

This book describes the processes:

I know that renaming a file with a .txt extension to a .jpeg extension causes the file to open as a JPEG, but isn't really "valid" (meaning there is not an image shown). Is there any way to change a .txt file into a .jpg file so that the computer displays a random image of ? x ? dimensions with colors and patterns based on the text within the .txt file without modifying it? I don't have any set pattern in mind to compare to.

I'm willing to field any questions in case I wasn't clear on what I'm asking for.

What you are saying has no chance of happening. JPEG files have one of several file structures that share a common compressed data format.

A text file is going to have lines of characters separated by CR, LF, or CR-LF sequences that will just screw up a compressed JPEG stream.

If you want to understand how JPEG works, you should get this book:

I'm trying to implement an image compression algorithm based on the DCT for color JPEG. I'm a newbie in image processing, so I need some help. What I need is clarification of the algorithm.

I'm using DCT implementation from here

So, here is the algorithm as I understood it:

  1. Load an image using ImageIO into BufferedImage.
  2. Create 3 matrices (1 for each channel: red, green, blue):

    int rgb = bufferedImage.getRGB(i, j);
    int red = (rgb >> 16) & 0xFF;
    int green = (rgb >> 8) & 0xFF;
    int blue = rgb & 0xFF;
  3. Pad the matrices to a size where they can be split into 8x8 chunks (where 8 is the size of the DCT matrix, N)

  4. For each matrix, split it into chunks of the size 8x8 (result: splittedImage)
  5. Perform forwardDCT on matrices from splittedImage (result: dctImage).
  6. Perform quantization on matrices from dctImage (result: quantizedImage)

Here I don't know what to do. I can:

  • merge the quantizedImage matrices into one matrix, mergedImage, convert it into a Vector, and perform the compressImage method
  • or convert the small matrices from quantizedImage into Vectors, perform the compressImage method on them, and then merge them into one matrix

So here I've got 3 matrices for the red, green, and blue colors. Then I convert those matrices into one RGB matrix, create a new BufferedImage, and use the method setRGB to set the pixel values. Then I save the image to a file.

Extra questions:

  1. Is it better to convert RGB into YCbCr and perform DCT on Y, Cb and Cr?
  2. The Javadoc of the compressImage method says that it's not Huffman encoding but run-length encoding. So will the compressed image open in an image viewer? Or should I use Huffman encoding according to the JPEG specification, and is there any open-source Huffman encoding implementation in Java?

If you want to follow the implementation steps, I suggest reading:

In regard to your questions:

1) The JPEG standard knows nothing about color spaces and does not care whether you use RGB, YCbCr, or CMYK. There are several JPEG file formats (e.g., JFIF, EXIF, Adobe) that specify the color space--usually YCbCr.

The reason for using YCbCr is that it follows the JPEG trend of concentrating information. There tends to be more useful information in the Y component than in the Cb or Cr components. Using YCbCr, you can keep 4 Ys (or even 16) for every Cb and Cr sample. That reduces the amount of data to be compressed by 1/2.
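For reference, the usual JFIF RGB-to-YCbCr conversion (BT.601 weights) looks like this for a single pixel. This is a sketch with illustrative names, not code from the book:

```java
// JFIF-style RGB -> YCbCr conversion for one pixel; a real encoder
// applies this to the whole image before chroma sampling.
public class ColorConvert {
    public static int[] rgbToYCbCr(int r, int g, int b) {
        int y  = (int) Math.round( 0.299   * r + 0.587   * g + 0.114   * b);
        int cb = (int) Math.round(-0.16874 * r - 0.33126 * g + 0.5     * b) + 128;
        int cr = (int) Math.round( 0.5     * r - 0.41869 * g - 0.08131 * b) + 128;
        return new int[]{clamp(y), clamp(cb), clamp(cr)};
    }

    private static int clamp(int v) { return Math.max(0, Math.min(255, v)); }

    public static void main(String[] args) {
        // A pure gray pixel has Cb = Cr = 128 (no chroma information).
        int[] gray = rgbToYCbCr(100, 100, 100);
        System.out.println(gray[0] + " " + gray[1] + " " + gray[2]); // 100 128 128
    }
}
```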

Note that the JPEG file formats impose limits on sampling (JPEG itself allows ratios such as 2:3 that most implementations do not support).

2) The DCT coefficients are run-length encoded and then Huffman (or arithmetic) encoded. You have to use both.

I've been trying to implement a JPEG compression algorithm in Matlab. The only part I'm having trouble implementing is the Huffman encoding. I understand the DCT, quantization, and zig-zagging of that 8x8 matrix. I also understand how Huffman encoding works in general. What I do not understand is: after I have an output bitstream and a dictionary that translates consecutive bits to their original form, what do I do with the output? How can I tell a computer to translate that output bitstream using the dictionary I created for it? In addition, each 8x8 matrix will have its own output and dictionary. How can all these outputs be combined into one? Because at the end of the day, the result is supposed to be an image. I might have misunderstood some of the steps, in which case my apologies for any confusion caused by this. Any help would be extremely appreciated!

EDIT: I'm sorry, my question apparently hasn't been clear enough. Say I use Matlab's built-in Huffman functions (huffmanenco and huffmandict); what am I supposed to do with the value that huffmanenco returns? What to do with the output string of bits hasn't been clear to me as far as Huffman encoding goes in other IDEs and programming languages as well.

You have two choices with the Huffman coding.

  1. Use a pre-canned Huffman table.
  2. Make two passes over the data, where the first pass generates the Huffman tables and the second pass encodes.

You cannot have a different dictionary for each MCU.

You say you have the run-length encoded values. You Huffman encode those and write them to the output stream.


You need to be sure that the Matlab Huffman encoder is JPEG-compatible. There are different ways to Huffman encode.

You need to write the bits from the encoder to the JPEG stream. This means you need a bit-level I/O routine. PLUS, you need to convert FF values in the compressed data into FF00 values in the JPEG stream.
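That byte-stuffing step can be sketched as follows (the class name is illustrative):

```java
// Byte stuffing: any 0xFF byte in the entropy-coded data must be written
// as FF 00 so that decoders do not mistake it for a marker.
public class ByteStuff {
    public static byte[] stuff(byte[] data) {
        java.io.ByteArrayOutputStream out = new java.io.ByteArrayOutputStream();
        for (byte b : data) {
            out.write(b);
            if (b == (byte) 0xFF)
                out.write(0x00);   // stuffed zero byte
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] stuffed = stuff(new byte[]{(byte) 0xAB, (byte) 0xFF, (byte) 0x12});
        for (byte b : stuffed)
            System.out.printf("%02X ", b);   // AB FF 00 12
        System.out.println();
    }
}
```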

I suggest getting a copy of this book to see how the encoding is done.

I've been working on a custom video codec for use on the web. The custom codec will be powered by javascript and the html5 Canvas element.

There are several reasons for me wanting to do this that I will list at the bottom of this question, but first I want to explain what I have done so far and why I am looking for a fast DCT transform.

The main idea behind all video compression is that adjacent frames share a large amount of similarity. So what I'm doing is sending the first frame compressed as a JPEG. Then I send another JPEG image that is 8 times as wide as the first frame, holding the "differences" between the first frame and the next 8 frames after that.

This large JPEG image holding the "differences" is much easier to compress because it contains only the differences.

I've done many experiments with this large jpeg and I found out that when converted to a YCbCr color space the "chroma" channels are almost completely flat, with a few stand out exceptions. In other words there are few parts of the video that change much in the chroma channels, but some of the parts that do change are quite significant.

With this knowledge I looked up how JPEG compression works and saw that among other things it uses the DCT to compress each 8x8 block. This really interested me because I thought what if I could modify this so that it not only compresses "each" 8x8 block, but it also checks to see if the "next" 8x8 block is similar to the first one. If it is close enough then just send the first block and use the same data for both blocks.

This would increase both decoding speed, and improve bit rate transfer because there would be less data to work with.

I thought that this should be a simple task to accomplish. So I tried to build my own "modified" jpeg encoder/decoder. I built the RGB to YCbCr converter, I left "gzip" compression to do the huffman encoding and now the only main part I have left is to do the DCT transforms.

However, this has me stuck. I cannot find a fast 8-point 1-D DCT. I am looking for this specific transform because, according to many articles I've read, the 2-D 8x8 DCT can be separated into several 1-D 8-point transforms. This is the approach many JPEG implementations use because it's faster to process.

So I figured that with JPEG being such an old, well-known standard, a fast 8-point 1-D DCT should just jump out at me, but after weeks of searching I have yet to find one.

I have found many algorithms that use the O(N^2) complexity approach. However, that's bewilderingly slow. I have also found algorithms that use the Fast Fourier Transform, and I've modified them to compute the DCT, such as the one in the link below:

In theory this should have the "fast" complexity of O(N log2 N), but when I run it, it takes my i7 computer about 12 seconds to encode/decode the "modified" JPEG.

I don't understand why it's so slow. There are other JavaScript JPEG decoders that are much faster, but when I look through their source code, I can't tell which part is doing the DCT/IDCT transforms.

The only thing I can think of is that maybe the math behind the DCT has already been precomputed and stored in a lookup table or something. However, I have looked hard on Google and I can't find anything (that I understand, at least) that talks about this.

So my question is: where can I find, or how can I build, a fast way to compute an 8-point 1-D DCT for this "modified" JPEG encoder/decoder? Any help with this would be greatly appreciated.

Okay, as for why I want to do this: the main reason is I want to have "interactive" video for mobile phones on my website. This cannot be done because of things like iOS loading up its "native" QuickTime player every time it starts playing a video. Also, it's hard to make the transition to another point in the video seem "smooth" when you have so little control over how videos are rendered, especially on mobile devices.

Thank you again very much for any help that anyone can provide!

This book shows how the DCT matrix can be factored to Gaussian Normal Form. That would be the fastest way to do a DCT.
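Short of a full factorization, a table-driven 8-point 1-D DCT already avoids recomputing cosines on every call. This sketch uses JPEG's normalization and is only an illustration for comparison, not the book's factored version:

```java
// 8-point 1-D DCT (the O(N^2) form) with the cosine terms precomputed
// into a lookup table. A factored DCT is faster still, but this version
// is easy to verify against.
public class Dct8 {
    private static final double[][] COS = new double[8][8];
    static {
        for (int u = 0; u < 8; u++)
            for (int x = 0; x < 8; x++)
                COS[u][x] = Math.cos((2 * x + 1) * u * Math.PI / 16.0);
    }

    public static double[] forward(double[] s) {
        double[] out = new double[8];
        for (int u = 0; u < 8; u++) {
            double sum = 0;
            for (int x = 0; x < 8; x++)
                sum += s[x] * COS[u][x];
            double c = (u == 0) ? 1.0 / Math.sqrt(2) : 1.0;
            out[u] = 0.5 * c * sum;   // JPEG's normalization
        }
        return out;
    }

    public static void main(String[] args) {
        // A constant input puts all its energy in the DC coefficient.
        double[] f = forward(new double[]{8, 8, 8, 8, 8, 8, 8, 8});
        System.out.printf("DC = %.3f%n", f[0]);                    // ~22.627
        System.out.println("AC1 near zero: " + (Math.abs(f[1]) < 1e-9)); // true
    }
}
```

A 2-D 8x8 DCT is then eight of these on the rows followed by eight on the columns, which is the separation the question describes.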

I want to create and fill a JPEG file with RGB information. Why can't I find information in official sources about how to do this? I don't want to use any library for this.

If you have to do it yourself, you need to read the book and have a copy of the JPEG standard.

So I'm reading JFIF (JPEG) data from a file, as an exercise (I know there are libraries out there that already do this, I'm not looking for those). I've already got the image file size, color depth, and dimensions. However, I'm not too sure how to get the actual image data. I've looked at the data in a hex editor, and comparing that against the actual image leads me nowhere. If anyone has a good resource to start on this (I know it's probably an arduous and enlightening process, but that's why I'm doing it), that would be awesome.

My code so far, just for context:

// check header data, assign header data to important fields
// (assumes the file bytes were read elsewhere into a byte[] named data;
// the array name is an assumption)

        // Start Of Image (SOI) must be FFD8 and the next marker must be FF
        if (!(data[0] == (byte) 0xFF && data[1] == (byte) 0xD8
                && data[2] == (byte) 0xFF))
            this.isValid = false;

        // check if file is not valid
        if (!this.isValid)
            throw new IllegalArgumentException(
                    String.format("ERROR: File %s is not registered as a JFIF!\n", this.filename));

        // If the next values are correct, then the data stream starts at SOI
        // If not, the data stream is raw
        this.isRawDataStream = !(data[3] == (byte) 0xE0
                && data[6]  == (byte) 0x4A
                && data[7]  == (byte) 0x46
                && data[8]  == (byte) 0x49
                && data[9]  == (byte) 0x46
                && data[10] == (byte) 0x00);

        // Read until SOF0 marker (0xC0)
        int i = 11;
        while (data[i] != (byte) 0xC0) {
            i++;
        }
        System.out.println("SOF0 marker at offset " + i);

        // Skip two bytes, next byte is the color depth
        this.colorDepth = data[i + 3];

        // Next two bytes are the image height
        String h = String.format("%02X", data[i + 4]) + String.format("%02X", data[i + 5]);
        this.height = hexStringToInt(h);
        System.out.println("Height: " + this.height);

        // Next two bytes are the image width
        String w = String.format("%02X", data[i + 6]) + String.format("%02X", data[i + 7]);
        this.width = hexStringToInt(w);
        System.out.println("Width: " + this.width);

        System.out.println("Color depth: " + this.colorDepth);
        // load pixels into an image (the image type here is an assumption)
        this.image = new BufferedImage(this.width, this.height, BufferedImage.TYPE_INT_RGB);

Then, I need to get each pixel and send it to the image. How would I get each pixel and its respective RGB data?

What you are trying to do is not a simple afternoon project. This book explains the process: There is A LOT of code between JPEG compressed data and pixel values.

First of all, you have to deal with two separate but related compression methods: sequential and progressive.

As you read the bit data, you have to

  1. Huffman decode
  2. Run-length decode
  3. Inverse quantization
  4. Inverse Discrete Cosine Transform
  5. Upsample
  6. YCbCr to RGB conversion

That's in the simple case of sequential.

You are not going to get all of those steps explained on this forum.

I also recommend