When I was in 3rd year at the University of Glasgow, I learned about the Floyd-Steinberg error diffusion algorithm, a method for performing digital half-toning on images. This is an extremely simple algorithm which produces a wonderful effect. It converts a greyscale image, with many shades of grey:

into a black and white image which essentially looks the same; in particular which looks the same when viewed from a distance:

More generally the technique can be applied to a colour image using a large palette of colours, producing a similar image in a reduced palette.

The technique is pretty well-known, but I'll quickly summarise how it works for greyscale images: pixels are visited left-right, top-bottom. A pixel with greyscale value *p* is replaced with 0 if *p* <= 128 (assuming 0 is black, 255 is white and 128 is mid-grey). This has the effect of finding the “best match” for *p* in the reduced palette. But if *p* goes from say 100 to 0 this has darkened the pixel significantly, and if p goes from 130 to 255 this has lightened the pixel significantly. So, to compensate for this, the error associated with the palette decision for the pixel (old value minus new value) is propagated to surrounding pixels in the following way:

In the following diagrams, pixel 22 is set to 0 since it is less than 128. Then 22x7/16=10 is added to the pixel immediately to the right, 22x1/16=1 to the pixel to the bottom right, 22x5/16=7 to the pixel below, and 22x3/16=4 to the pixel to the bottom left.

(The Wikipedia entry for Floyd-Steinberg dithering gives some pseudo-code showing how the algorithm can be implemented.)

## Parallelizing error diffusion

I'm no expert on this technique, and I don't know where the authors got these coefficients from, or why they work so well, but when I started working at Codeplay Software Ltd. in 2007 I was looking for interesting algorithms to try parallelising using their Sieve technology. I remembered this algorithm as being particularly cool, and wondered if it was parallelizable. I started from scratchy, trying to work out by myself if it could be parallelized, but I got stuck, so I did a bit of searching and came across this paper, which presents a parallel version of the method, by Panagiotis T. Metaxas.

The paper makes the following observation: the top-left pixel in the image *must* be processed first, and the pixel to its right second. But once these two pixels have been processed, either the third pixel in the top row could be processed, *or* the first pixel in the second row. By continuing to reason in this way, we can number each pixel with an integer representing the time step at which it could be processed in a fully parallel machine:

Observe that there is a skewed diagonal of pixels which can be processed in parallel, and the length of this diagonal increases, reaching its peak as it crosses the middle of the image, then decreases. For a large image there would clearly be a fair bit of opportunity for parallelization.

When I read this paper I didn't feel too stupid for having got stuck with this problem: his algorithm is pretty clever, and apparently even Donald Knuth believed the Floyd-Steinberg algorithm to be inherently serial - see a quote in the paper. (It occurs to me that perhaps Knuth has been somewhat mis-quoted - Knuth states that the algorithm is inherently serial because the pixel in the bottom right hand corner depends on processing of all other pixels. While the point about dependences is true, Metaxas's approach shows that there is opportunity for parallelism. But maybe Knuth simply meant that there is no way to process *every* pixel in parallel.)

**To be continued.**

you have a nice site. thanks for sharing this site. you can download lots of ebook from here

ReplyDeletehttp://feboook.blogspot.com