Sunday, March 24, 2013

Triangulation the unglamorous way

After struggling for weeks to get OpenCV to perform the triangulation for me, I've weakened my usually-high academic integrity and have done something gritty and practical.

Images

Well, let me back up. First I took some new images of the ping pong table. These images are at a higher resolution, 640x480, in case the imprecision of pixel coordinates was part of my problem. They also include 6 new real-world points to build the correspondence from, and they use (approximately) parallel gaze directions for the two cameras, with the cameras kept close together, meaning that the left and right images are fairly similar to each other. Here are the images I'm using now.





You can see I've added the ping pong net to the half-table, put some cans with orange tops on the table surface, and marked out the spots on the floor below the front two corners of the table. Those are my six new points. This was motivated by a fear that my previous eight points included some that were collinear. Based on my (slow and painful) reading of the textbooks I bought, I got the impression that collinear points don't add to accuracy. And six points is insufficient for some algorithms to solve for everything.

I also improved the accuracy of my manual pixel-marking tool. It still can't provide sub-pixel accuracy, but it now reliably selects the right pixel. Before, I was just taking my first mouse click as close enough, but the cursor was often a pixel or two off from where I had intended to select. Now I can follow up the initial click with single-pixel movements using the arrow keys until I have the right pixel marked. Sub-pixel accuracy is theoretically possible by finding the intersection of lines, like the table edges, but I haven't gone that far yet.

Results

How does it work when run through the same program? About as well as the old images. Here are the rectified images with points.



But does the triangulation work? Nope. I still get something that doesn't match the real-world coordinates of my table.

However, there is something new. Remember how I said in past blog entries that the point cloud from the triangulation wasn't even in the shape of a table? Now it is. It's the correct shape, and apparently the correct scale, but it is rotated and translated from the real-world coordinates. Here is a 3D scatter plot from Matlab of the triangulation output.




Hard to make any sense of it, right? What about this one?



I hope that makes the shape of the table easier to see. All I did was rotate the view in the Matlab plot. While I did this rotation manually the first time, I found a way to solve for the best rotation and translation to bring the points into the correct orientation. I found the algorithm (and even some Matlab code!) from this guy. And you know what? It works! I can get a rotation matrix and a translation vector, and by applying them to the triangulated points I get something very close to the true real-world 3D coordinates of the table.

Back to OpenCV

I searched high and low to find that rotation matrix in the many outputs from OpenCV. No luck. I figured it might be that OpenCV's triangulation gives me answers with the camera at the origin (instead of my real-world origin at the near-left corner of the table). But the rotations from solvePnP don't seem to work. I experimented with handedness and the choice of axes. That didn't seem to work. Basically nothing works. I would be grateful if anyone reading this could leave a comment telling me where I can get the correct rotation to apply! Or, for that matter, why it needs a rotation in the first place!

After many days of frustration, this morning I gave up. You know what? If I can solve for the correct rotation/translation in Matlab, why can't I do it in C++? So that's what I did. I implemented the same algorithm in C++, so that I can apply it directly to OpenCV's output from triangulation. And it works too. It's unglamorous, having to solve for something that should be readily available, but it gets the job done.
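For the record, here's roughly what that C++ port boils down to. This is only a sketch of the SVD-based rigid alignment (the names and structure are illustrative, not my exact code); it finds the R and t that best map the triangulated points onto the measured table points.

```cpp
#include <opencv2/core/core.hpp>
#include <vector>

// Solve for R, t such that dst = R * src + t in the least-squares sense.
void rigidTransform(const std::vector<cv::Point3d>& src,
                    const std::vector<cv::Point3d>& dst,
                    cv::Mat& R, cv::Mat& t)
{
    CV_Assert(src.size() == dst.size() && !src.empty());
    const int n = static_cast<int>(src.size());

    // Centroids of both point sets.
    cv::Point3d cs(0, 0, 0), cd(0, 0, 0);
    for (int i = 0; i < n; ++i) { cs += src[i]; cd += dst[i]; }
    cs *= 1.0 / n;
    cd *= 1.0 / n;

    // Covariance matrix H = sum over points of (src_i - cs) * (dst_i - cd)^T.
    cv::Mat H = cv::Mat::zeros(3, 3, CV_64F);
    for (int i = 0; i < n; ++i) {
        cv::Mat a = (cv::Mat_<double>(3, 1) << src[i].x - cs.x, src[i].y - cs.y, src[i].z - cs.z);
        cv::Mat b = (cv::Mat_<double>(3, 1) << dst[i].x - cd.x, dst[i].y - cd.y, dst[i].z - cd.z);
        H += a * b.t();
    }

    // The rotation comes from the SVD of H; the determinant check guards
    // against getting a reflection instead of a rotation.
    cv::SVD svd(H);
    cv::Mat V = svd.vt.t();
    R = V * svd.u.t();
    if (cv::determinant(R) < 0) {
        V.col(2) = V.col(2) * -1.0;
        R = V * svd.u.t();
    }

    // The translation moves the rotated source centroid onto the destination centroid.
    cv::Mat csM = (cv::Mat_<double>(3, 1) << cs.x, cs.y, cs.z);
    cv::Mat cdM = (cv::Mat_<double>(3, 1) << cd.x, cd.y, cd.z);
    t = cdM - R * csM;
}
```

Applying that R and t to each triangulated point is what lines the point cloud up with the table coordinates.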

Now that I have good triangulated points, I can see how accurate the method is. I calculated the root-mean-squared distance between the true point (as measured from the scene and table dimensions) and the triangulated point. I get something around 12mm. So in this setup, I would expect to be able to turn accurate ball-centers in each image into a 3D ball location to within 12mm. That sounds pretty good to me.
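For reference, that error number comes from something like this (a sketch; both vectors are assumed to hold the same landmarks in the same order):

```cpp
#include <opencv2/core/core.hpp>
#include <cmath>
#include <vector>

// Root-mean-squared distance (in mm) between measured and triangulated landmarks.
double rmsError(const std::vector<cv::Point3d>& truePts,
                const std::vector<cv::Point3d>& triPts)
{
    double sumSq = 0.0;
    for (size_t i = 0; i < truePts.size(); ++i) {
        cv::Point3d d = truePts[i] - triPts[i];
        sumSq += d.dot(d);                      // squared Euclidean distance
    }
    return std::sqrt(sumSq / truePts.size());
}
```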

What's Next?

I feel a great sense of relief that I can triangulate the table, because I've been stuck on this for so long. I can't say that I'm delighted with how I did it, but at least I can move on, and maybe come back to solve this problem the right way another time.

Next, I need to return to video, from this detour into still images. I need to drop back to 320x240 images, and get a ping-pong ball bouncing. But I'm going to keep the new correspondence points (the net, the corners on the floor, and even the cans). I will experiment with having the cameras further apart and not having parallel gaze. Mr. W insists that this will result in better triangulation. I get his point -- it's a crappy triangle if two corners are in the same place -- but I need to make sure that all the OpenCV manipulation works just as well.

Sunday, March 17, 2013

Small progress in triangulation

That last post gave me new emotional strength to approach the problem again. The effort actually paid off, with a partial solution to the problems introduced in my last post.

I can now rectify the images without them looking all weird. Here is the fixed version of the rectified images, side-by-side.




What was wrong? Well, like I suspected, it was a small error. Two small errors, actually, in how I was using the stereoRectify function. First, I was using the flag CV_CALIB_ZERO_DISPARITY. That's the default, so I figured it made sense. Nope. I cleared that flag and things got better. Second, I was specifying an alpha of 1.0. The intent of the alpha parameter is to decide how much black filler you see versus how many good pixels you crop. My answer of 1.0 was intended to keep all the good pixels and allow as much filler as necessary to get that done. I think that was causing the zoomed-out look of the rectification. I changed my answer there to -1 -- which is the default alpha -- and things got better. So I feel pretty good about grinding away until it worked.
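In code, the fix amounts to two arguments in the stereoRectify call. Here's a sketch of how I'm calling it now (the camera matrices, distortion coefficients, and the R/T between the cameras come from the earlier calibration steps; the variable names are mine):

```cpp
// R1/R2 are the rectification rotations, P1/P2 the new projection matrices,
// and Q the disparity-to-depth mapping used later.
cv::Mat R1, R2, P1, P2, Q;
cv::stereoRectify(cameraMatrix1, distCoeffs1,
                  cameraMatrix2, distCoeffs2,
                  imageSize,          // size of my input images
                  R, T,               // rotation/translation between the cameras, from stereoCalibrate
                  R1, R2, P1, P2, Q,
                  0,                  // flags: CV_CALIB_ZERO_DISPARITY cleared
                  -1.0);              // alpha: the default scaling behavior
```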

I went a little further, and I also found out how to rectify points within the images. That has allowed me to map the table landmark points into the rectified images. You'd think that would be easy... and, in the end, it was. But I did it the hard way first. You see, the OpenCV functions to rectify the image (initUndistortRectifyMap and remap) actually work backwards: for each pixel in the rectified image, they calculate which pixel in the unrectified image to use. Whereas I now want to take specific pixels in the unrectified image, and find out what pixels those would be in the rectified image. That's the opposite direction, and when your grasp on the math behind these functions is tenuous, it takes a while to reverse it. However, after solving it on my own, I discovered that the undistortPoints function has some optional arguments that also allow you to rectify the points at the same time.
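For anyone who wants the short version, the call looks roughly like this (a sketch, reusing the camera matrix, distortion coefficients, and the R1/P1 outputs from the stereoRectify sketch above; the right eye works the same way with R2/P2):

```cpp
// Map hand-marked landmark pixels from the original left image into the
// rectified left image. Passing R1 and P1 tells undistortPoints to apply the
// rectification rotation and new projection, not just remove the distortion.
std::vector<cv::Point2f> leftLandmarks;     // filled with the marked pixels in the raw left image
std::vector<cv::Point2f> leftRectified;
cv::undistortPoints(leftLandmarks, leftRectified,
                    cameraMatrix1, distCoeffs1, R1, P1);
// leftRectified now holds pixel coordinates in the rectified left image.
```

Anyway, those rectified points are the circles in these two rectified images: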




Despite this progress, I still cannot triangulate. I assumed that fixing the rectification would also fix the triangulation, but this hasn't happened. In fact, my triangulation answers are unaltered by the fixes made in the rectification.

Going further, I also recreated the triangulation results using a different approach, and got the same (incorrect) answers. This time I used the disparity-to-depth "Q" matrix that stereoRectify produces, and fed the points through perspectiveTransform. The answers are within a few mm of the triangulatePoints answers.
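That alternative check looked roughly like this (a sketch; it reuses the rectified landmark pixels and the Q matrix from the sketches above):

```cpp
// Build (x, y, disparity) for each landmark from the rectified left/right
// pixels, then let the Q matrix reproject that into a 3D point.
std::vector<cv::Point3f> xyd, xyz;
for (size_t i = 0; i < leftRectified.size(); ++i) {
    float disparity = leftRectified[i].x - rightRectified[i].x;
    xyd.push_back(cv::Point3f(leftRectified[i].x, leftRectified[i].y, disparity));
}
cv::perspectiveTransform(xyd, xyz, Q);   // xyz should match the triangulatePoints output
```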

So, what's left to try? I have a suspicion that a mixture of left- and right-handed coordinates is to blame. So I'm going to push on that for a while, to see if it leads anywhere. My grasp of left- and right-handedness is flimsy and I have to keep referring to the wikipedia page.

After that, I'm buying at least one book on the math and logic that underlies all this 2D/3D vision stuff. I probably should have done that a month ago. I'm going to start with Hartley and Zisserman's "Multiple View Geometry in Computer Vision" which is apparently the bible of 3D vision, and I'll go from there.

Thursday, March 14, 2013

Why can't I triangulate?

EDIT: Some progress has been made. See my next post.

I've given up trying to reach concrete results before presenting them here. That is obviously leading to a lack of blog posts. So, instead, here is the point at which I am stuck.

I've been trying to use OpenCV to triangulate stuff from my scene using the left and right images from my two PS3 Eye cameras. I've been using the image of the ping pong table to calibrate the exact locations and angles of the cameras with respect to the table, as I would like all my coordinates to be relative to the table for easy comprehension. But it just isn't working. So let me walk you through the steps.

I have a video I've taken of my half-table. The cameras are above the table, about 50cm apart, looking down the center line of the half-table. I have about 45 seconds of just the table that I intend to use for priming background subtraction. Then I have about 10 seconds of me gently bouncing 6 balls across the table.

Landmarks

I've taken a single still image from each camera to use in determining the position of the cameras. Since neither the cameras nor the table are moving, there is no need for synchronization between the eyes. Using these two images, I have manually marked the pixel for a number of "landmarks" on the table: the six line intersections on its surface, plus where the front legs hit the ground. I did this manually because I'm not quite ready to tackle the full "Where's the table?" problem. Done manually, there should only be a pixel or two of error in marking the exact locations. I then measured the table (which does, indeed, match regulation specs) and its legs to get the real-world coordinates of these landmarks. Here are the two marked-up images. There are green circles around the landmarks.




Camera Calibration

I have calibrated the two cameras independently to get their effective field-of-view, optical center, and distortion coefficients. This uses OpenCV's pre-written program to find a known pattern of polka dots that you move about its field of view. I've had no trouble with that. The two cameras give similar calibration results, which makes sense since they probably were manufactured in the same place a few minutes apart.
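I'm relying on OpenCV's sample program rather than my own code here, but the heart of what it does is roughly this (a hedged sketch, assuming a 4x11 asymmetric circles grid and a set of captured grayscale views; the grid size and dot spacing are just example values):

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Calibrate one camera from views of an asymmetric circles grid.
// Returns the RMS reprojection error reported by calibrateCamera.
double calibrateOneCamera(const std::vector<cv::Mat>& views,
                          cv::Mat& cameraMatrix, cv::Mat& distCoeffs)
{
    const cv::Size patternSize(4, 11);   // dots per row/column (example)
    const float spacing = 20.0f;         // dot spacing in mm (example)

    // Ideal 3D layout of the asymmetric grid, all on the z = 0 plane.
    std::vector<cv::Point3f> pattern;
    for (int row = 0; row < patternSize.height; ++row)
        for (int col = 0; col < patternSize.width; ++col)
            pattern.push_back(cv::Point3f((2 * col + row % 2) * spacing,
                                          row * spacing, 0.0f));

    std::vector<std::vector<cv::Point2f> > imagePoints;
    std::vector<std::vector<cv::Point3f> > objectPoints;
    for (size_t i = 0; i < views.size(); ++i) {
        std::vector<cv::Point2f> centers;
        if (cv::findCirclesGrid(views[i], patternSize, centers,
                                cv::CALIB_CB_ASYMMETRIC_GRID)) {
            imagePoints.push_back(centers);   // detected dot centers in this view
            objectPoints.push_back(pattern);  // matching ideal coordinates
        }
    }

    std::vector<cv::Mat> rvecs, tvecs;
    return cv::calibrateCamera(objectPoints, imagePoints, views[0].size(),
                               cameraMatrix, distCoeffs, rvecs, tvecs);
}
```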

Here are the images with the distortion of the individual cameras removed. They look pretty normal, but are slightly different from the originals. That's easiest to see at the edges, where some of the pixels have been pushed outside the frame by the process. But the straight lines of the table are now actually straight lines.






Stereo Calibration

Using all this info (2d landmarks + camera matrix + distortion coefficients for each camera, and the 3d landmarks) I use OpenCV's stereoCalibrate function. This gives me a number of things, including the relative translation and rotation of the cameras -- where one camera is relative to the other. The angles are hard to interpret, but the translation seems to make sense -- it tells me the cameras are indeed about 50cm apart. So I felt pretty good about that result.
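For context, here's roughly the shape of that call (a sketch; leftLandmarkPixels/rightLandmarkPixels are my hand-marked pixels, cameraMatrix1/distCoeffs1 and friends come from the single-camera calibration, leftStill is the still image used for marking, and the 3D coordinates are my table measurements in mm):

```cpp
// Real-world landmark coordinates in mm. Origin is the near-left corner of the
// table surface; z is height above the surface, so the floor points come out negative.
std::vector<cv::Point3f> tablePoints;
tablePoints.push_back(cv::Point3f(0.0f, 0.0f, 0.0f));
tablePoints.push_back(cv::Point3f(762.5f, 0.0f, 0.0f));
tablePoints.push_back(cv::Point3f(1525.0f, 0.0f, 0.0f));
tablePoints.push_back(cv::Point3f(0.0f, 1370.0f, 0.0f));
tablePoints.push_back(cv::Point3f(762.5f, 1370.0f, 0.0f));
tablePoints.push_back(cv::Point3f(1525.0f, 1370.0f, 0.0f));
tablePoints.push_back(cv::Point3f(298.5f, 203.2f, -746.0f));    // floor below one front leg
tablePoints.push_back(cv::Point3f(1226.5f, 203.2f, -746.0f));   // floor below the other front leg

// stereoCalibrate wants one set of correspondences per "view"; I only have one.
std::vector<std::vector<cv::Point3f> > objectPoints(1, tablePoints);
std::vector<std::vector<cv::Point2f> > leftPoints(1, leftLandmarkPixels);
std::vector<std::vector<cv::Point2f> > rightPoints(1, rightLandmarkPixels);

cv::Mat R, T, E, F;   // inter-camera rotation/translation, essential and fundamental matrices
double rms = cv::stereoCalibrate(objectPoints, leftPoints, rightPoints,
                                 cameraMatrix1, distCoeffs1,
                                 cameraMatrix2, distCoeffs2,
                                 leftStill.size(),   // size of the still images
                                 R, T, E, F);        // default flags keep the intrinsics fixed
```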

Epilines

With the stereo calibration done, I can draw an epiline image. The way I understand it, an epiline traces the line across one eye's view that represents a single point in the other eye's view. We know it worked if each epiline passes through the true matching point. Let's see them:



Amazingly all those lines are right. They all go through one of the landmarks. So it would seem that my stereo calibration has been successful. I don't think the epilines actually serve a purpose here, except to show that so far my answers are working.
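Drawing them is mostly one OpenCV call plus some line drawing. A sketch of the idea (reusing the fundamental matrix F from the stereo calibration and the marked landmark pixels; rightImage is the right eye's still):

```cpp
// For each landmark marked in the left image (whichImage = 1), compute the
// corresponding epiline a*x + b*y + c = 0 in the right image and draw it.
std::vector<cv::Vec3f> epilines;
cv::computeCorrespondEpilines(leftLandmarkPixels, 1, F, epilines);

for (size_t i = 0; i < epilines.size(); ++i) {
    float a = epilines[i][0], b = epilines[i][1], c = epilines[i][2];
    // Endpoints where the line crosses the left and right borders of the image
    // (this assumes the line isn't vertical, i.e. b != 0).
    cv::Point p0(0, cvRound(-c / b));
    cv::Point p1(rightImage.cols, cvRound(-(c + a * rightImage.cols) / b));
    cv::line(rightImage, p0, p1, cv::Scalar(0, 255, 0));
}
```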

Rectify

The next step in OpenCV's workflow is to rectify the images using stereoRectify. Rectifying rotates and distorts the images such that the vertical component of an object in each image is the same. E.g. a table corner that is 100 pixels from the top of the left image is also 100 pixels from the top of the right image. This step is valuable in understanding a 3D scene because it simplifies the correspondence problem: the task of identifying points in each image that correspond to each other. I don't even have that problem yet, since I have hand-marked my landmarks, but eventually this will prove useful. Plus it's another way to show that my progress so far is correct.

Here is the pair of rectified images. They are now a single image side-by-side, because they have to be lined up accurately in the vertical. The red boxes highlight the rectangular region where each eye has valid pixels (i.e. no black filler). The lines drawn across the images highlight the vertical coordinates matching.



This is where I start to get worried. Am I supposed to get this kind of result? I copied this code from a fairly cohesive and simple example in the documentation, but I end up with shrunken images, and that odd swirly ghost of the image around the edges. That looks pretty wrong to me, and doesn't look like the example images from the documentation. This is the example from the documentation, and it shows none of that swirly ghost. The silver lining is that the images are indeed rectified. Those horizontal lines do connect corresponding points in the two images with fairly good accuracy.

Triangulation

Next I try to triangulate some points. I am trying to triangulate the landmarks because I know their true 3D positions, so I can check whether the answers are correct. In the future, I would want to triangulate the ball using this same method.

To triangulate, I use OpenCV's triangulatePoints method. That takes the 2D pixel coordinates of the landmarks, and the projection matrix from each eye. That projection matrix is an output of stereoRectify.
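Concretely, the call is roughly this (a sketch; the landmark pixel coordinates go in as one array per eye, and P1/P2 are the projection matrices from stereoRectify):

```cpp
// triangulatePoints returns 4xN homogeneous coordinates, so the results still
// need to be converted back to ordinary 3D points.
cv::Mat points4D;
cv::triangulatePoints(P1, P2, leftLandmarkPixels, rightLandmarkPixels, points4D);

// Transpose to Nx4 so each row is one homogeneous point, then de-homogenize.
cv::Mat homogeneous = points4D.t();
std::vector<cv::Point3f> points3D;
cv::convertPointsFromHomogeneous(homogeneous, points3D);   // one 3D point per landmark
```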

The answers simply don't work. After converting the answers back from homogeneous coordinates into 3D coordinates, they don't resemble the table they should represent. Not only are the values too large, but they don't recreate the shape of a table either. It's just a jumbled mess. So now I know that something went wrong. Here are the true points and the triangulation output (units are mm).


True 3D                 Triangulated
(0, 0, 0)               (3658.03, -1506.81, -6335.75)
(762.5, 0, 0)           (2462.99, 1025.58, 4136.15)
(1525, 0, 0)            (2620.73, 398.168, 1480.21)
(0, 1370, 0)            (323.729, 407.828, -1360.98)
(762.5, 1370, 0)        (-897.203, 594.634, -2136.74)
(1525, 1370, 0)         (-7611.69, 1850.22, -6986.95)
(298.5, 203.2, -746)    (-137.791, -5735.79, -7016.07)
(1226.5, 203.2, -746)   (5328.58, 4257.4, 5172.84)


What now?

This is very frustrating because my error is undoubtedly small. Probably something like a transposed matrix, or switching left for right, etc. Someone who knew what they were doing could fix it in a minute. But formal support for OpenCV is limited, since it is an open source project, and I've been unable to attract any help on its forums.

Since the epilines worked, I believe my error must be in the last two steps: rectifying or triangulating. That's frustrating because the intermediate results that I get are too cryptic for me to make use of, so I feel like it's either all-or-nothing with OpenCV. And either way, this task is now harder.

I've been banging my head against this roadblock off-and-on for a few weeks now, and nothing good is coming of it. And that is why I haven't been posting. No progress, no joy, no posts.

Wednesday, March 6, 2013

Two PS3 Eyes

I know it's been a long time since my last post. You would be forgiven for thinking that this project had died its predicted death. But you'd be wrong. If anything, I've been working harder on the project since my last post. I haven't written because I've been working so hard, and because I wanted to have something concrete to show you. Well, I don't have anything concrete, but I owe an update anyway.

Cameras

The biggest development was that I bought two cameras. While I had been doing lots of research into very expensive cameras that could provide 1MP resolutions at greater than 100fps, I was convinced to go a different way by an extended family member who has been getting involved -- that's right, a second fool is involved in this project! -- who I'll call Mr. W because I like privacy. So I bought two Playstation Eye cameras. As the name would suggest, they are intended to be used with a Playstation, but they use the ubiquitous USB 2.0 interface, and the open source community has developed drivers for Linux (and other platforms). They are almost like a regular webcam. Their first advantage is that they can output 125fps if you accept a resolution of only 320x240 (or 60fps at 640x480). Their second advantage is that they are cheap -- just $22 from Amazon. So I was convinced that there was nothing to lose in trying them out.

It was a good idea. While I'm not sure that this 320x240 resolution will be sufficient in the end, I am learning a lot without having to pay for expensive cameras yet. And it's possible that 320x240 will be enough. Mr. W argues that with 125 fps, there will be enough observations of the ball for the ambiguity introduced by the big pixels to be averaged out, leading to an accurate path prediction.

Do the cameras work? Yep. I managed to get them working with guvcview, a Linux webcam program. That software can select the resolution and frame rate and can make snapshots and video recordings. If I run two instances of guvcview, I can run both cameras at the same time. There are some difficulties: if I leave the preview windows for the two cameras live on my desktop while recording, the load on my poor laptop prevents it from processing all the frames. But minimizing those preview windows solves the problem. I also learned that guvcview needs to be restarted every time you change resolution, frame rate, or output format. The software doesn't suggest that this is necessary, but I couldn't get it to take effect without restarting the program. Once you know that, it's no problem.

I even got them to work with OpenCV directly with their calibration program. However, for the most part, it has been easier for my current work to just record two video files and work from those.
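For what it's worth, grabbing frames straight into OpenCV is the usual VideoCapture routine (a sketch; the device indices depend on plug-in order, and whether the 320x240/125fps request actually sticks is up to the driver):

```cpp
#include <opencv2/opencv.hpp>

int main()
{
    cv::VideoCapture left(0), right(1);          // one capture per eye
    if (!left.isOpened() || !right.isOpened())
        return 1;

    // Ask for the low-resolution, high-frame-rate mode on both cameras.
    left.set(CV_CAP_PROP_FRAME_WIDTH, 320);
    left.set(CV_CAP_PROP_FRAME_HEIGHT, 240);
    left.set(CV_CAP_PROP_FPS, 125);
    right.set(CV_CAP_PROP_FRAME_WIDTH, 320);
    right.set(CV_CAP_PROP_FRAME_HEIGHT, 240);
    right.set(CV_CAP_PROP_FPS, 125);

    cv::Mat frameL, frameR;
    for (;;) {
        left >> frameL;                          // pull the most recent frame from each eye
        right >> frameR;
        if (frameL.empty() || frameR.empty())
            break;
        cv::imshow("left", frameL);
        cv::imshow("right", frameR);
        if (cv::waitKey(1) == 27)                // Esc quits
            break;
    }
    return 0;
}
```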

Camera Synchronization

One of the downsides of these cameras is that there is no synchronization of the frames between the two eyes. They take 125 frames per second, but that means they could be offset from each other by as much as 4ms (i.e. half of 1000ms/125). So far I haven't found a sure way to determine the offset. Mr. W believes that once you know the offset, you can just interpolate the latest frame with its predecessor to match up with the latest frame from the opposite eye. Maybe. Sounds pretty noisy to me, and we're already starting with a lack of accuracy from our low resolution.
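To make the idea concrete, the interpolation itself would be trivial -- something like the sketch below, which assumes we somehow know the right eye's lag behind the left eye, and that we're interpolating a detected ball center rather than whole frames. Knowing that lag is the hard part.

```cpp
#include <opencv2/core/core.hpp>

// Linearly interpolate the left eye's ball center back in time to line up with
// the right eye's frame. 'offset' is the right eye's lag in seconds (assumed
// known) and 'framePeriod' is 1/125 s.
cv::Point2f interpolateBallCenter(const cv::Point2f& previous,   // ball center in the prior left frame
                                  const cv::Point2f& latest,     // ball center in the latest left frame
                                  double offset, double framePeriod)
{
    double alpha = offset / framePeriod;          // 0 = use the latest frame, 1 = use the prior frame
    return latest + alpha * (previous - latest);
}
```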

Even that requires knowing the offset between the cameras to calculate the interpolation. It's possible we could do that in software -- like maybe I can get the time the frame arrived at the computer. So far I've only seen "pull" commands to get the most recent frame, which is not conducive to knowing the time that frame arrived. I fear that would mean hacking the driver. Or it's possible we could do that with hardware -- like a sync-calibration thingy that moves at a steady speed against a yard stick. I can imagine a motor spinning a clock hand at a high speed. As long as it moves at a constant speed around the clock face, we could use the hour markings to measure its progress (which might mean making the clock hand point in both directions to negate gravity during the spinning). But it would have to be faster than a second hand. Ideally, I think it would pass a marking every 8ms or less... so that's 625 rpm instead of 1 rpm.

Actually, there is another way, if I want to get fancy. There are some instructions online for how to hack the electronics to make one camera drive the frame rate of the other camera. It might be easy. But more likely it will end badly. For instance, it requires some very fine soldering skills, and we've seen how my soldering is sub-optimal in a previous post.

Accessories

I bought two cheap tripods to stick these cameras onto. However, the cameras aren't designed for tripods, so they don't have the normal mounting hole. I've been taping them to the tripod, which is working well enough. (Side note: these tripods are horrible. They look nice, are tall, sturdy, and made of light aluminum. But the adjustment screws leave way too much play after they are tightened, making them useless for preserving the orientation of the camera between sessions. But they're good enough to hold the camera off the ground.)


Having introduced these cameras, I'll save my tales of woe for another post. There is indeed more to say here, and there is some minor progress on the building-the-robot front as well.