Digital Image Processing

Rafael C. Gonzalez, Richard Eugene Woods

Mentioned 2

For courses in Image Processing and Computer Vision. Completely self-contained and heavily illustrated this introduction to basic concepts and methodologies for digital image processing is written at a level that truly is suitable for seniors and first-year graduate students in almost any technical discipline. The leading textbook in its field for more than twenty years, it continues its cutting-edge focus on contemporary developments in all mainstream areas of image processing e.g., image fundamentals, image enhancement in the spatial and frequency domains, restoration, color image processing, wavelets, image compression, morphology, segmentation, image description, and the fundamentals of object recognition. It focuses on material that is fundamental and has a broad scope of application.

More on

Mentioned in questions and answers.

I want to implement the two above mentioned image resampling algorithms (bicubic and Lanczos) in C++. I know that there are dozens of existing implementations out there, but I still want to make my own. I want to make it partly because I want to understand how they work, and partly because I want to give them some capabilities not found in mainstream implementations (like configurable multi-CPU support and progress reporting).

I tried reading Wikipedia, but the stuff is a bit too dry for me. Perhaps there are some nicer explanations of these algorithms? I couldn't find anything either on SO or Google.

Added: Seems like nobody can give me a good link about these topics. Can anyone at least try to explain them here?

While what Ants Aasma says roughly describes the difference, I don't think it is particularly informative as to why you might do such a thing.

As far as links go, you are asking a very basic question in image processing, and any decent introductory textbook on the subject will describe this. If I remember correctly, Gonzales and Woods is decent on it, but I'm away from my books and can't check.

Now on to the particulars, it should help to think about what you are doing fundamentally. You have a square lattice of measurements that you want to interpolate new values for. In the simple case of upsampling, lets imagine you want a new measurement in between every one that you already have (e.g. double the resolution).

Now you won't get the "correct" value, because in general you don't have that information. So you have to estimate it. How to do this? A very simple way would be to linearly interpolate. Everyone knows how to do this with two points, you just draw a line between them, and read the new value off the line (in this case, at the half way point).

Now an image is two dimensional, so you really want to do this in both the left-right and up-down directions. Use the result for your estimate and voila you have "bilinear" interpolation.

The main problem with this is that it isn't very accurate, although it's better (and slower) than the "nearest neighbor" approach which is also very local and fast.

To address the first problem, you want something better than a linear fit of two points, you want to fit something to more data points (pixels), and something that can be nonlinear. A good trade off on accuracy and computational cost is something called a cubic spline. So this will give you a smooth fit line, and again you approximate your new "measurement" by the value it takes in the middle. Do this in both directions and you've got "bicubic" interpolation.

So that's more accurate, but still heavy. One way to address the speed issue is to use a convolution, which has the nice property that in the Fourier domain, it's just a multiplication, so we can implement it quite quickly. But you don't need to worry about the implementation to understand that the convolution result at any point is one function (your image) being integrated in product another, typically much smaller support (the part that is non-zero) function called the kernel), after that kernel has been centered over that particular point. In the discrete world, these are just sums of the products.

It turns out that you can design a convolution kernel that has properties quite like the cubic spline, and use that to get a fast "bicubic"

Lancsoz resampling is a similar thing, with slightly different properties in the kernel, which primarily means they will have different characteristic artifacts. You can look up the details of these kernel functions easily enough (I'm sure wikipedia has them, or any intro text). The implementations used in graphics programs tend to be highly optimized and sometimes have specialized assumptions which make them more efficient but less general.

The basic operation principle of both algorithms is pretty simple. They're both convolution filters. A convolution filter that for each output value moves the convolution functions point of origin to be centered on the output and then multiplies all the values in the input with the value of the convolution function at that location and adds them together.

One property of convolution is that the integral of the output is the product of the integrals of the two input functions. If you consider the input and output images, then the integral means average brightness and if you want the brightness to remain the same the integral of the convolution function needs to add up to one.

One way how to understand them is to think of the convolution function as something that shows how much input pixels influence the output pixel depending on their distance.

Convolution functions are usually defined so that they are zero when the distance is larger than some value so that you don't have to consider every input value for every output value.

For lanczos interpolation the convolution function is based on the sinc(x) = sin(x*pi)/x function, but only the first few lobes are taken. Usually 3:

lanczos(x) = {
    0 if abs(x) > 3,
    1 if x == 0,
    else sin(x*pi)/x

This function is called the filter kernel.

To resample with lanczos imagine you overlay the output and input over eachother, with points signifying where the pixel locations are. For each output pixel location you take a box +- 3 output pixels from that point. For every input pixel that lies in that box, calculate the value of the lanczos function at that location with the distance from the output location in output pixel coordinates as the parameter. You then need to normalize the calculated values by scaling them so that they add up to 1. After that multiply each input pixel value with the corresponding scaling value and add the results together to get the value of the output pixel.

Because lanzos function has the separability property and, if you are resizing, the grid is regular, you can optimize this by doing the convolution horizontally and vertically separately and precalculate the vertical filters for each row and horizontal filters for each column.

Bicubic convolution is basically the same, with a different filter kernel function.

To get more detail, there's a pretty good and thorough explanation in the book Digital Image Processing, section 16.3.

Also, and in skia have a pretty well commented implementation of lanczos interpolation.

Does anyone know of any implementation of Split and merge segmentation other than the following package (which is difficult to customize)?

Find out my previous answer! I have done a simple implementation in c++ with opencv. It is not super optimized but it seems to works. I followed the procedure described in Gonzales's Digital Image Processing book.