Wednesday, January 23, 2013

First attempt

This is my first attempt at doing some computer vision work.

The Camera

Since I haven't really decided what I need yet, I used the supplies and tools I had at hand: I went out and recorded some video on my Canon Rebel T3i. That's an SLR camera -- not a video camera per se, and certainly not a high-performance one. But it can shoot 60 fps at 1280x720 (a.k.a. 720p) and save it to a memory card; it cannot stream a live feed to a computer in real time. It was a bright day for this outdoor shoot, which took place in the shade. That, combined with the overall quality of the camera and lens, meant the video turned out fairly well. I positioned the camera behind me and to one side, above table height, which made it easy to keep the table in the frame.

Here is the footage that I'm working from. It has been webified down to 30 fps and compressed -- the original was in a higher quality .mov format at 60 fps.


Extracting Frames

With that video captured, I came back inside and started the processing. It was easy to copy over to my computer, as Ubuntu recognized the camera as a mass storage device, and I copied the .mov file. The file was 67MB. Simple.

I had some trouble getting the replay to work in some Ubuntu players. VLC seemed to do the best job, so that's what I've been using ever since. It plays nicely.

I wanted to extract individual frames from the video, so that I could attempt to identify interesting features in it. To do that, I used the ffmpeg/avconv tool, available in the Ubuntu repository. This is what I did:

ffmpeg -i movie1.mov -r 60 -f image2 pngofmov/image-%03d.png

This created a separate png file for each frame of the movie. With 7.5s of footage, I get about 7.5*60 = 450 frames. Each frame's png is about 1.3MB, for a total of 585MB -- much bigger than the original .mov.
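The arithmetic above is easy to sanity-check; a quick Python sketch using the approximate figures quoted in this post:

```python
# Back-of-the-envelope check of the frame extraction numbers.
duration_s = 7.5        # length of the clip
fps = 60                # extraction rate passed to ffmpeg via -r 60
frame_count = int(duration_s * fps)

png_mb = 1.3            # approximate size of one extracted PNG
total_mb = frame_count * png_mb

print(frame_count)      # 450 frames
print(round(total_mb))  # about 585 MB, versus 67 MB for the .mov
```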

Processing

I used Matlab to do this processing, as a) I have it available, b) I understand how to use it, c) I didn't know any better. Note that I do not own the Image Processing Toolbox, so I'm using the base Matlab package.

The good news is that Matlab makes it easy to load images, display them, and treat them as 3D matrices of numbers. That meant it took me just a few minutes to get started and under an hour to complete this whole task.

Background Subtraction

I chose to focus on a handful of images from the movie, to see if I could identify the ball in them. To do this, I had the idea of background subtraction: subtracting one image from another to identify the differences. If you subtract an image with only background in it from an image with action on top of the background, you should be left with just the action. Sounds easy.

Seriously, this is how much code this takes in Matlab:
img291 = imread('image-291.png');    % frame with the action
imgbase = imread('image-060.png');   % background-only frame
imgdiff = img291 - imgbase;          % uint8 subtraction: anything darker saturates to zero
image(imgdiff);                      % display the difference
print('-dpng', '291minus060.png');   % save the figure as a png
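For anyone following along without Matlab, here is a hedged NumPy sketch of the same subtraction, on synthetic pixels rather than the real frames. One trap: NumPy's uint8 subtraction wraps around rather than saturating at zero the way Matlab's does, so the sketch casts to a signed type and clips.

```python
import numpy as np

# Synthetic stand-ins for the two frames; real code would load the PNGs.
imgbase = np.full((4, 4, 3), 100, dtype=np.uint8)   # flat background
img291 = imgbase.copy()
img291[1, 1] = 200                                  # brighter: the "action"
img291[2, 2] = 40                                   # darker than the background

# Matlab's uint8 subtraction saturates at zero; NumPy's wraps around,
# so cast to a signed type and clip to reproduce the Matlab behaviour.
signed = img291.astype(np.int16) - imgbase.astype(np.int16)
imgdiff = np.clip(signed, 0, 255).astype(np.uint8)

print(imgdiff[1, 1])   # [100 100 100]  the brighter pixel survives
print(imgdiff[2, 2])   # [0 0 0]        the darker pixel is clipped away
```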

Well, here is my first attempt.

Background Image

Image With Action

Foreground Through Subtraction
Wow! That actually worked! The ball clearly appears, as does my arm with a paddle. Now, it might be hard to see in this web-sized image, but there is additional noise in the image. There is movement in the bush in the background and even the edges of the table seem to be shimmering enough to cause the odd pixel to light up. But the overall effect is pretty clearly a success.
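That shimmer could probably be suppressed with a simple threshold on the grayscale difference. A hedged NumPy sketch, with a made-up cutoff of 30 that would need tuning on real frames:

```python
import numpy as np

# Synthetic grayscale difference: weak background shimmer plus one strong blob.
rng = np.random.default_rng(0)
diff = rng.integers(0, 15, size=(20, 20)).astype(float)   # shimmer, all below 15
diff[8:11, 8:11] = 120.0                                  # the real action

mask = diff > 30          # hypothetical cutoff, chosen for illustration
print(mask.sum())         # 9 -- only the 3x3 blob survives the threshold
```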

Finding the Ball

How do I turn that into an algorithm to find a ball? Well, after a little bit of trial and error (and a little googling for how to create a circle mask), I came up with this little function. It searches for the greatest intensity increase that is in the shape of a circle of a particular size.

function circxy = findBall(imgbase, img)
    % do the subtraction (uint8: pixels that got darker saturate to zero)
    imgdiff = img - imgbase;
    % average the red, green, blue planes to get grayscale
    imgdiffgray = mean(imgdiff, 3);
    % define a mask/kernel with a 12-pixel-radius circle in the middle;
    % the kernel is +1 inside the circle and -1 outside it
    crad = 12;
    ix = sqrt(2*pi*crad^2); iy = ix;   % kernel a bit larger than the circle
    cx = ix/2; cy = cx;
    [x,y] = meshgrid(-(cx-1):(ix-cx), -(cy-1):(iy-cy));
    circmask = ((x.^2 + y.^2) < crad^2);
    circkern = circmask * 2 - 1;
    % score the kernel against every location in the grayscale difference
    circfind = conv2(imgdiffgray, circkern, 'same');
    % take the location where the kernel fit best
    % (note: find returns the row index first, then the column)
    [circx,circy] = find(circfind == max(circfind(:)));
    circxy = [circx,circy];
end

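For reference, here is a hedged NumPy sketch of the same circle-matching idea on a synthetic difference image, with a scaled-down radius; the double loop plays the role of Matlab's conv2:

```python
import numpy as np

crad = 3                                  # ball radius in pixels (12 in the post)
size = 2 * crad + 3                       # kernel a bit larger than the circle
y, x = np.mgrid[-(size // 2):size - size // 2, -(size // 2):size - size // 2]
circkern = np.where(x**2 + y**2 < crad**2, 1.0, -1.0)   # +1 inside, -1 outside

# Synthetic difference image: zeros with one bright disc centred at row 12, col 20.
img = np.zeros((30, 40))
yy, xx = np.mgrid[0:30, 0:40]
img[(yy - 12)**2 + (xx - 20)**2 < crad**2] = 1.0

# Valid cross-correlation of the kernel with the image (plain loops for clarity).
kh, kw = circkern.shape
score = np.zeros((30 - kh + 1, 40 - kw + 1))
for r in range(score.shape[0]):
    for c in range(score.shape[1]):
        score[r, c] = np.sum(img[r:r + kh, c:c + kw] * circkern)

# Best-fitting window, converted from window corner back to window centre.
r, c = np.unravel_index(np.argmax(score), score.shape)
print(r + kh // 2, c + kw // 2)   # 12 20 -- the disc centre
```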
So how does it do? Let me show you:

Yep, that's a red cross on top of the ball. It found it, and I would argue quite accurately. Here's a zoomed-in look.


Yep, that's pretty accurate. Was it a fluke? I ran it with three more images. Here are the results, zoomed in on where it placed the red cross.





Those red crosses are all on the ball.

I left this session feeling pretty good about how I was doing. In fact, I still feel pretty good about how I did. But, looking back on it later, I have a few concerns with this approach, even though it worked on all four of the frames I tried it on.

  • This required an initial background-only image. I'm not sure if that is realistic or not. For now I won't worry about it.
  • My image subtraction approach looks only for places where a particular color has increased in intensity. If something gets darker, it is excluded from the difference (with uint8 images, Matlab saturates the subtraction at zero, and those pixels display as black). Perhaps an absolute-value subtraction would be more appropriate?
  • This means that my method is very dependent on the contrast between the ball and the background. If I had a pale background -- say that light gray wall -- the contrast would be lower, and it would stand out less. It's possible that my circle-finding kernel wouldn't choose it as the strongest match.
  • My circle kernel is a predefined radius. Admittedly, I looked at an image or two, and decided it was about 12 pixels in radius. This varies depending on how far the ball is from the camera, and all these test images have the ball a fairly similar distance. (On the plus side, we might be able to use the radius to approximate its distance from the camera!)
  • Motion blur is clearly visible in some of these test images. The ball ceases to be circular, and instead turns into a round-ended rectangle (the path a circle sweeps as it moves). This is most obvious in the last image, where the ball must have had its greatest velocity. It still found the ball in this image, but I suspect it was less certain. (On the plus side, we might be able to use motion blur to estimate the velocity of the ball!)
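The absolute-value idea from the second bullet is cheap to demonstrate; a NumPy sketch on a pair of synthetic pixel values:

```python
import numpy as np

# Two synthetic frames at two pixels: one brightens, one darkens.
base = np.array([100, 100], dtype=np.int16)
cur = np.array([180, 30], dtype=np.int16)

# Clipped difference (what the uint8 subtraction effectively computes):
clipped = np.clip(cur - base, 0, 255)
# Absolute difference: darkening counts as change too.
absdiff = np.abs(cur - base)

print(clipped)   # [80  0]  the darkened pixel vanishes
print(absdiff)   # [80 70]  both changes survive
```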

Epilogue: I have run this algorithm on another set of photos, and it wasn't so successful. Those images had more movement (an actual whole person) and had patches of bright sunshine in the background. The first findBall I tried identified my forehead as the most likely ball in the image.
