Tuesday, January 29, 2013

OpenCV

After my first attempt at doing some vision processing in Matlab, I concluded there must be a better way. I don't want to rule out Matlab entirely; it seems Matlab can be taken seriously for vision applications. But that would require buying the Toolbox that does that sort of stuff, and I'm not willing to put down money on software I might not need. I also worry that anything I need to implement myself (as opposed to what's built into the Toolbox) will be inefficient, since Matlab is really only efficient for matrix operations.

OpenCV

It didn't take long to find OpenCV, an open source computer vision library. It implements many of the algorithms common in vision applications, and provides enough of a framework to bring C++ closer to the simplicity of Matlab. Since it does much of what the Matlab Toolbox does, and lets me write efficient custom implementations in C++, I think it gives me more room to grow.

OpenCV was a bit of a pain to build from source. Some dependencies also had to be built from source, as the versions offered on my RHEL 6 machine were too old. It's been a week or two since I did the install, so I'm afraid I can't recount any of the details. In the end I got it installed and it seems to be working.

OpenCV also comes with Python bindings, which make quick exploration work more fun. I've been going back and forth between the two languages, depending on what I'm doing.

Background Subtraction

So, what can I do with OpenCV? I decided to stick with background subtraction for now. OpenCV takes a more sophisticated approach to background subtraction than my quick Matlab attempt did, and that's probably a good thing, since that attempt had flaws.

There is more than one background subtraction algorithm in OpenCV, but I've chosen the one that seems the most popular and/or newest: BackgroundSubtractorMOG2. My vague understanding is that this method builds up a history of each pixel's color and fits a statistical distribution to that history. Then, when you ask it to decide whether a new value for that pixel is foreground or background, it compares the new value to the distribution; if it is too different from the historical distribution, it is flagged as foreground. The MOG part refers to a "mixture of Gaussians", meaning each pixel's historical distribution is modeled as a weighted combination of several normal distributions. That's intended to capture different valid background states of the pixel. For example, a pixel might sometimes contain a leaf, and sometimes the leaf might shift out of the way, revealing the wall behind. Both of those should count as background, even though they are vastly different colors. Obviously I will need to do more reading if I want to understand it properly. It's described in this paper: Improved Adaptive Gaussian Mixture Model for Background Subtraction.
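
For the curious, the per-pixel model takes the standard mixture-of-Gaussians form (this is my paraphrase of the textbook formula, not something lifted from the paper):

p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \sigma_k^2 I)

where the K components have weights \pi_k (summing to 1), mean colors \mu_k, and variances \sigma_k^2. A new pixel value x is treated as background if it lands close enough to a component that carries enough of the weight.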

MOG2 also has built-in shadow detection. Again, I don't know the details, but it flags shadows as not being foreground, and it decides something is a shadow if it is an appropriately dimmer version of the same background color.

This algorithm needs many frames of input to decide what the background looks like, so I feed it every frame of the movie. The following code does all this, and displays/saves the foreground image every 100 frames.

#include <iostream>
#include <cstdio>
#include <opencv2/opencv.hpp>

using namespace std;

void DisplayProgress(cv::Mat& img, cv::Mat& background, cv::Mat& foregroundMask, int frameindex)
{
 cv::imshow("Original", img);
 cv::imshow("Background", background);
 cv::imshow("Foreground Mask", foregroundMask);
 // process the foreground further to remove noise and shadows, etc
 // shadows are masked with value 127
 cv::Mat noShadowForeMask = foregroundMask & (foregroundMask != 127);
 cv::Mat smoothForeMask;
 cv::GaussianBlur(noShadowForeMask, smoothForeMask, cv::Size(11,11), 4.0, 4.0);
 cv::imshow("Foreground Blurred", smoothForeMask);
 cv::Mat binarySmoothForeMask = (smoothForeMask > 64);
 cv::imshow("Foreground Blurred Binary", binarySmoothForeMask);
 // extract the foreground picture
 cv::Mat forePic;
 img.copyTo(forePic, binarySmoothForeMask);
 cv::imshow("Fore Picture", forePic);
 // save the foreground
 const char* filenameFormat = "/home/me/src/ping/out%03d.png";
 char namebuff[256];
 sprintf(namebuff, filenameFormat, frameindex);
 cv::imwrite(namebuff, forePic);
 // wait for user to hit a key before continuing
 cv::waitKey(-1);
 cv::destroyAllWindows();
}

bool LoadImage(const char* filenameFormat, int frameIndex, cv::Mat& fillMeWithImage)
{
 char filename[256];
 sprintf(filename, filenameFormat, frameIndex);
 fillMeWithImage = cv::imread(filename);
 return (fillMeWithImage.data != NULL);
}

int main(int argc, char** argv)
{
 cv::Mat img;
 cv::Mat foreground;
 cv::Mat background;

 cv::BackgroundSubtractorMOG2 bgSub(200, 10.0, true);
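 // constructor args: history of 200 frames, variance threshold of 10, shadow detection enabled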

 const char* filenameFormat = "/home/me/src/ping/movie1png/movie1-%03d.png";
 for (int frameindex = 1; /*infinite*/; ++frameindex)
 {
  if (!LoadImage(filenameFormat, frameindex, img))
  {
   cout << "Can't find frame " << frameindex << " so assuming we reached end of movie" << endl;
   // display last progress at last image in the movie
   // re-read the last image that existed
   --frameindex;
   LoadImage(filenameFormat, frameindex, img);
   bgSub.getBackgroundImage(background);
   DisplayProgress(img, background, foreground, frameindex);
   break;
  }
  // learn the new image
  bgSub(img, foreground);
  cout << "Added frame " << frameindex << " to background subtraction processor" << endl;
  // display progress occasionally (every ~1.7 seconds at 60 fps)
  if (frameindex % 100 == 0)
  {
   bgSub.getBackgroundImage(background);
   DisplayProgress(img, background, foreground, frameindex);
  }
 }

 cout << "Done" << endl;
 cv::destroyAllWindows();
 return 0;
}
I'm doing a little extra processing on the foreground mask it produces. First I drop shadow pixels, which the subtractor marks with the value 127, so they don't count as foreground (thank you, built-in functionality). I then blur the true/false mask and threshold it back to true/false. Effectively I'm looking for pixels whose neighbors are mostly foreground, even if they weren't foreground themselves. This nicely prevents lone pixels in the middle of a foreground blob from being excluded unfairly, and it discards isolated noise pixels. It also expands the region marked as foreground, which might be a bad thing.
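
Incidentally, OpenCV's morphology operations can do this kind of cleanup more directly than my blur-and-rethreshold trick. A minimal sketch (untested, operating on the same noShadowForeMask as in DisplayProgress above):

cv::Mat kernel = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(5, 5));
cv::Mat opened, cleanedForeMask;
// erode-then-dilate ("opening") removes small isolated specks of noise
cv::morphologyEx(noShadowForeMask, opened, cv::MORPH_OPEN, kernel);
// dilate-then-erode ("closing") fills small holes inside genuine blobs
cv::morphologyEx(opened, cleanedForeMask, cv::MORPH_CLOSE, kernel);

Unlike the blur approach, opening and closing with a small kernel shouldn't grow the foreground region much.
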
For ease of comparison with the Matlab results, I ran it on the same movie I used in that blog post, and manually grabbed the foreground pictures from the same frames I used there (instead of looking at every 100th frame). And here they are, in the same order they appear in the Matlab blog entry (stupid me: not in chronological order).

[Images: the extracted foreground pictures from those frames]

Of course, this is just the background subtraction, and ignores the findBall aspect. No red crosses in these images. But the subtraction seems to be fairly good, and the extra work I've done (blurring/thresholding) has removed the noise of the table and bush shimmering. It doesn't have some of the drawbacks of my Matlab method, like looking only at increases in intensity.

It would still suffer from requiring good contrast with the background; I've run it on my second video and had contrast problems. In fact, here is a raw frame from that other video. Where's the ball? Even a human would have a hard time finding it without the context of where the ball was in the previous frame. Yep, it's that little slightly-lighter-gray smudge in the middle. So contrast is still a problem.

[Image: a raw frame from the second video, the ball barely visible as a faint gray smudge near the middle]

Efficiency

Running background subtraction in OpenCV has not proven to be very efficient. With the ~1MP images I'm working with, processing a frame takes much longer than 1/60th of a second: I estimate about 25 frames per second, i.e. roughly 40 ms per frame against a 16.7 ms budget. That means it can't be run in real time without improvements.

I have some ideas there. The first is using the Region of Interest ("ROI") capabilities of a camera, or doing the equivalent on the software side. If you know the thing you are interested in (i.e., the ball) is in a particular region of the picture, only process that region. If ROI is implemented on the camera, the camera will only send the pixels in that region (which also saves bandwidth!). I imagine I can find the table and the ball once, and then in each subsequent frame I'd have a very good guess about where to look for the ball. I might only need to process the whole image again if I lose track of the ball.
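
Here's a minimal sketch of the software-side version; the rectangle coordinates are entirely made up, since in practice I'd derive them from wherever I found the table:

// hypothetical fixed region of interest covering the table area
cv::Rect tableRegion(200, 100, 640, 360);
cv::Mat cropped = img(tableRegion); // a view into img; no pixels are copied
bgSub(cropped, foreground);         // the model then only covers this region

One caveat: the model is per-pixel, so the rectangle has to stay put between frames. A region that follows the ball around would need separate handling, like searching the existing foreground mask near the ball's last known position.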

The second idea is to update the background model (the mixture of Gaussians) only periodically. Or, better yet, train it on a large number of background-only frames first. That would parameterize it to recognize the background (leaf, not leaf, etc.) without getting confused by a stationary-but-foreground object, like the opponent's body. Then running the model on live-action frames should be faster without the per-frame update. It looks like the subtractor's operator() may already support this through an optional learning-rate argument; and if not, that's what's good about open source: I can change it.
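
A sketch of what I have in mind, assuming I'm reading the 2.4 docs right that operator() accepts an optional learningRate, where 0 means the model is not updated at all (numBackgroundFrames, backgroundFormat, and liveImg are placeholders):

// phase 1: learn the background from background-only frames (default learning rate)
for (int i = 1; i <= numBackgroundFrames; ++i)
{
 if (!LoadImage(backgroundFormat, i, img))
  break;
 bgSub(img, foreground);
}
// phase 2: classify live-action frames without touching the mixture model
bgSub(liveImg, foreground, 0.0); // learningRate = 0: classify only, no update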

Conclusion

I conclude that OpenCV is going to be a great tool, but at the same time BackgroundSubtractorMOG2 may or may not be appropriate for me. It is compute-intensive to the point that I can't run at full frame rate this way, and my Matlab results were almost as good for background subtraction (and could easily be reimplemented in OpenCV).

So in this update I made progress only in that I was able to use my new tool: OpenCV. But at least it is progress.

3 comments:

  1. Hi, I have made a project using the BackgroundSubtractorMOG2 class, but I haven't completely understood the theory behind this algorithm. Can you please explain it a little more? Reply as soon as possible.

    Replies
    1. Hello, can you run this code? I'm having problems and don't know how to fix them. Sorry for my English.

  2. Sorry for the delay in responding.

    I'm certainly not an expert. Your best bet is to follow the link I provided in the blog entry to the paper referenced by the OpenCV documentation:
    http://members.tele2.nl/palindromoroz/Publications/zivkovic2004ICPR.pdf

    Big picture, here is my understanding. Each pixel is treated independently. A model is built, using the history of that pixel's values. The model tries to fit the values using a mixture of Gaussians (i.e. many normal distributions with different means and standard deviations added together in a weighted sum). New pixel values are compared to the model to arrive at a probability that the new value came from those normal distributions. If the probability exceeds some threshold, the pixel is marked as background. If it is too unlikely to come from those normal distributions, it is marked as foreground. The way the code implements this method, the model is updated after every frame to incorporate the new information.


Be nice, remember I'm an amateur, but by all means please give me feedback!