As my Google Summer of Code 2012 project, I am porting OpenTLD to Python using OpenCV and SimpleCV. OpenTLD, a.k.a. Predator, was first written by Zdenek Kalal in MATLAB. It is one of the most reliable algorithms for tracking objects. The algorithm includes on-line training and learning.
By contrast, algorithms that rely on off-line training take days and lots of data to train and learn.
Here’s my brief understanding of how the OpenTLD algorithm works.
As the name suggests (TLD stands for Tracking-Learning-Detection), it consists of three main parts: tracking, detection and learning.
Adaptive tracking is used in OpenTLD. A Median Flow tracker is built on the Lucas-Kanade tracker with pyramids. It uses the Forward-Backward error to measure how reliable each point is, and keeps only the 50% most reliable points.
As Zdenek Kalal said in his Google Tech Talk about Predator:
“Every tracker eventually fails and requires a detector.”
The classifiers are trained continuously on every frame. For each frame, the classifiers are evaluated and their errors are estimated via feedback. According to that feedback, the classifiers are updated so that they detect more reliably.
An ensemble classifier and a 1-NN (nearest-neighbour) classifier are used in detection.
The tracker learns using P-N learning (Positive-Negative), which learns an object classifier and labels all the patches as “object” (positive) or “background” (negative). It uses the tracker to provide positive training examples and the detector to provide negative ones.
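To make the labelling idea concrete, here is a minimal sketch of how patches could be split into positive and negative examples by their overlap with the tracked box. This is a simplification of P-N learning, not the OpenTLD implementation: the function names and the thresholds `pos_thresh=0.6` and `neg_thresh=0.2` are assumptions chosen for illustration.

```python
def overlap(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)


def label_patches(tracked_box, candidate_boxes, pos_thresh=0.6, neg_thresh=0.2):
    """Split candidate boxes into positive and negative training examples.

    Boxes that overlap the tracked box strongly are labelled "object";
    boxes far away from it are labelled "background". Boxes in between
    are ambiguous and skipped. Both thresholds are assumed values.
    """
    positives, negatives = [], []
    for box in candidate_boxes:
        ov = overlap(tracked_box, box)
        if ov >= pos_thresh:
            positives.append(box)
        elif ov <= neg_thresh:
            negatives.append(box)
    return positives, negatives
```

In this sketch `candidate_boxes` would be the windows the detector scans or fires on, and `tracked_box` the box reported by the tracker for the same frame.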
I have been working on OpenTLD for a couple of weeks now. I have made a Median Flow tracker for the “Tracking” part of OpenTLD. Here’s how the “Tracking” part works in OpenTLD.
This is how Median Flow Tracker works:
- Initialize points to a grid
- Track points between frames
- Estimate reliability of the points
- Filter out the 50% least reliable points (the outliers)
- Estimate the new bounding box
Get Filled points in the Bounding Box
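Filling the bounding box with points can be sketched in a few lines. The function name `grid_points`, the 10x10 grid size and the 5 pixel margin are assumptions for illustration, and the box is assumed to be given as `(x1, y1, x2, y2)`.

```python
def grid_points(bbox, rows=10, cols=10, margin=5):
    """Return rows * cols points spread evenly inside bbox = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = bbox
    # Keep the points a little away from the box border.
    left, top = x1 + margin, y1 + margin
    right, bottom = x2 - margin, y2 - margin
    # Spacing between neighbouring points; guard against a single row/column.
    step_x = (right - left) / (cols - 1) if cols > 1 else 0.0
    step_y = (bottom - top) / (rows - 1) if rows > 1 else 0.0
    return [(left + c * step_x, top + r * step_y)
            for r in range(rows)
            for c in range(cols)]
```

The points come back row by row, so the first point is the top-left corner of the grid and the last one is the bottom-right corner.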
Points are tracked using the Lucas-Kanade tracker with pyramids.
To get reliable points, the Forward-Backward (FB) error method is used. In the FB method, points are tracked twice:
- forward, from the previous image to the current image
- backward, from the current image to the previous image
A point whose backward track lands close to where it started is reliable, so the points on which both passes agree give me the reliable tracked points.
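Here is a rough sketch of the two-pass idea. `forward_backward_error` takes any point tracker as a callable, so it does not depend on a particular library. `lk_track` is a hypothetical wrapper showing how OpenCV's `cv2.calcOpticalFlowPyrLK` could be plugged in; it uses the default window size and pyramid depth and ignores the returned status flags, which real code would probably want to check.

```python
import math


def forward_backward_error(track, prev_img, cur_img, points):
    """Track points forward and backward; return (forward_points, fb_errors).

    `track(img_a, img_b, pts)` is any tracker that returns the new
    positions of `pts` (a list of (x, y) tuples) in img_b.
    """
    forward = track(prev_img, cur_img, points)    # previous -> current
    backward = track(cur_img, prev_img, forward)  # current -> previous
    # FB error: how far each point ends up from where it started.
    errors = [math.hypot(bx - px, by - py)
              for (px, py), (bx, by) in zip(points, backward)]
    return forward, errors


def lk_track(img_a, img_b, pts):
    """Hypothetical wrapper around OpenCV's pyramidal Lucas-Kanade tracker."""
    # Imported here so the function above can be used without OpenCV installed.
    import cv2
    import numpy as np
    p0 = np.array(pts, dtype=np.float32).reshape(-1, 1, 2)
    p1, status, err = cv2.calcOpticalFlowPyrLK(img_a, img_b, p0, None)
    return [(float(x), float(y)) for x, y in p1.reshape(-1, 2)]
```

With two 8-bit grayscale frames, a call would look like `forward_backward_error(lk_track, prev_gray, cur_gray, points)`. A small FB error means the point came back close to its starting position.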
50% of the points are filtered out using a median filter. First the median of the per-point errors is calculated, and then the points whose error is no larger than the median are kept as the most reliable ones.
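A minimal sketch of this filtering step, assuming each point already has an error value (for example its Forward-Backward error). The function name `reliable_indices` is made up for this example. It returns indices instead of points so that the same selection can be applied to both the old and the new point lists.

```python
from statistics import median


def reliable_indices(errors):
    """Return the indices of the points whose error is at or below the median."""
    if not errors:
        return []
    threshold = median(errors)
    return [i for i, e in enumerate(errors) if e <= threshold]
```

For example, with `keep = reliable_indices(errors)` the surviving points are `[points[i] for i in keep]`.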
The new bounding box is estimated from the relative changes in distance between every pair of points. The median of these relative values is used for the calculation.
Predict Bounding Box
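The prediction step could look roughly like this. The scale change is the median ratio of pairwise point distances, as described above. Shifting the box by the median displacement of the points is an assumption of this sketch, and so are the function name `predict_bbox` and the `(x1, y1, x2, y2)` box layout.

```python
import math
from itertools import combinations
from statistics import median


def predict_bbox(bbox, old_pts, new_pts):
    """Move and rescale bbox = (x1, y1, x2, y2) using matched point pairs."""
    if not old_pts:
        return bbox  # nothing was tracked, keep the old box
    x1, y1, x2, y2 = bbox
    # Translation: median displacement of the points (assumed here).
    dx = median(n[0] - o[0] for o, n in zip(old_pts, new_pts))
    dy = median(n[1] - o[1] for o, n in zip(old_pts, new_pts))
    # Scale: median ratio of new to old distance over every pair of points.
    ratios = []
    for i, j in combinations(range(len(old_pts)), 2):
        d_old = math.hypot(old_pts[i][0] - old_pts[j][0],
                           old_pts[i][1] - old_pts[j][1])
        d_new = math.hypot(new_pts[i][0] - new_pts[j][0],
                           new_pts[i][1] - new_pts[j][1])
        if d_old > 0:
            ratios.append(d_new / d_old)
    scale = median(ratios) if ratios else 1.0
    # Grow or shrink the box symmetrically around its centre, then shift it.
    grow_x = 0.5 * (scale - 1.0) * (x2 - x1)
    grow_y = 0.5 * (scale - 1.0) * (y2 - y1)
    return (x1 - grow_x + dx, y1 - grow_y + dy,
            x2 + grow_x + dx, y2 + grow_y + dy)
```

Using medians here means a few badly tracked points do not pull the box away, as long as most of the points moved consistently.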
P.S. I am now working on the Learning part.