I've been developing this software at work to take an image from a video stream (a webcam in this case) and detect if a green laser dot is present. After spending most of my day getting a project to compile and a program to recognize my webcam as a stream, and being distracted by numerous other projects and co-workers, I just wasn't at the top of my algorithm development game to detect the green dot. Combine that with office lighting and you know my dilemma (fluorescent lighting is often green tinted and throws off a white balance.)

I start by stepping through each pixel in the source and check its color. If you take the pixel's colors from each channel and find the average you've essentially got a gray scale value. This doesn't work so well since some shiny metal might throw off the detection (like a ring). If I wanted to find the brightest spot, this is a good approach, but I needed to emphasize the green. At first I added a weight to the green channel by simply doubling it. So now the average is calculated as (R+G+G+B)/4, counting the green channel twice. This was better, but also prone to problems.

At the end of the day I had a basic project setup and working with some hit or miss detection. I had invested over an hour into multiple approaches and although my algorithms for color matching were working somewhat, and I knew there had to be a better approach.

What dawned on me later was something I should have noted much earlier. I know that the RGB color concept is essentially a three dimensional array, but since it is color I'm used to thinking of it two dimensionally like you see in an image editing program. Once I visualized it as a 3D space, a big cube made of blocks so to speak, I knew my answer.

You know the shortest route from A to B is a straight line right. This is easy to calculate in 2D using the Pythagorean Theorem as you probably recognize:

a² + b² = c²

This also applies in 3D space, which is exactly what RGB color is. An RGB value is a point in a 3D space. The Theorem applies like this:

a² + b² + c² = d²

The obvious solution was in front of me but I didn't see it. Euclidean distance would give me exactly what I needed. I needed to treat the color difference as a distance.

r² + g² + b² = d²

For instance, say I wanted to find the closest pixel in the supplied source to a target color RGB 128, 255, 128. I check pixels against my target color by finding their channel distance. So imagine I have a black pixel of 0, 0, 0. My distance is calculated as:

r = pixel_color - target_color = 0 - 128 = -128

g = pixel_color - target_color = 0 - 255 = -255

b = pixel_color - target_color = 0 - 128 = -128

d = sqrt(r² + g² + b²) = 312.7

What if I have a pixel with color 0, 0 255, or a pixel with color 0, 255, 0. Which one is closer to my target? If I calculate the average, they are the same. But by distance, the green pixel is closer to my desired.

## Saturday, August 15, 2009

Subscribe to:
Post Comments (Atom)

## 1 comment:

This is cool. I never thought of RGB as xyz but it makes sense. On a related note. Sketchup uses RGB for its directions. Red is X, Green is Y and Blue is Z. It makes it really easy see where you are and type in coordinates when your zooming around the model. (other programs probably use this too)

Post a Comment