# Motion-Tracking

This project uses convolutional neural networks to track an object through sequential video frames. It is inspired by *Recent Advances in Offline Object Tracking*; we aimed to recreate and improve on those advances.

Our pipeline involved collecting data from the ALOV300++ and ILSVRC2014 datasets, augmenting the data with random crops, and building several convolutional neural networks.
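As a rough sketch of the cropping step, something like the following (the `random_crop` helper and its `scale` and `jitter` parameters are illustrative assumptions, not the exact pipeline used here):

```python
import numpy as np

def random_crop(frame, box, rng, scale=2.0, jitter=0.1):
    """Crop a search region around a bounding box with random jitter.

    frame: HxWxC image array; box: (x0, y0, x1, y1) in pixels.
    Returns the cropped patch and the box shifted into crop coordinates.
    (Illustrative sketch; not the repository's actual augmentation code.)
    """
    h, w = frame.shape[:2]
    x0, y0, x1, y1 = box
    bw, bh = x1 - x0, y1 - y0
    # Randomly jitter the crop center around the box center.
    cx = (x0 + x1) / 2 + rng.uniform(-jitter, jitter) * bw
    cy = (y0 + y1) / 2 + rng.uniform(-jitter, jitter) * bh
    # Take a region `scale` times the box size, clipped to the frame.
    left = int(np.clip(cx - scale * bw / 2, 0, w - 1))
    top = int(np.clip(cy - scale * bh / 2, 0, h - 1))
    right = int(np.clip(cx + scale * bw / 2, left + 1, w))
    bottom = int(np.clip(cy + scale * bh / 2, top + 1, h))
    patch = frame[top:bottom, left:right]
    shifted = (x0 - left, y0 - top, x1 - left, y1 - top)
    return patch, shifted

# Usage: patch, box = random_crop(frame, (30, 40, 120, 160), np.random.default_rng(0))
```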

## Preliminary Findings

We have encouraging early results. Below are 10 randomly selected pairs of starting and ending frames (i.e., two consecutive frames from a video). Each starting frame on the left shows the bounding box given as input (green). Each ending frame on the right shows the ground-truth bounding box (green) and the bounding box predicted by our model (red).
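The color convention in these figures can be reproduced with a few lines of OpenCV (a hypothetical `draw_boxes` helper for illustration; it is not code from this repository):

```python
import cv2  # OpenCV

def draw_boxes(frame, truth, pred=None):
    """Draw a ground-truth box (green) and optionally a predicted box (red).

    Boxes are (x0, y0, x1, y1) pixel tuples; colors are BGR.
    """
    out = frame.copy()
    x0, y0, x1, y1 = map(int, truth)
    cv2.rectangle(out, (x0, y0), (x1, y1), (0, 255, 0), 2)  # green
    if pred is not None:
        x0, y0, x1, y1 = map(int, pred)
        cv2.rectangle(out, (x0, y0), (x1, y1), (0, 0, 255), 2)  # red
    return out
```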

*(Figures: start and end frames for videos 1 through 10, with bounding boxes drawn as described above.)*

## Error Analysis

Plotting the actual versus predicted coordinates for a random sample of 500 images gives a sense of how well the network is learning. The top-left panel shows x0, the top-right y0, the bottom-left x1, and the bottom-right y1; these correspond to the upper-left corner (x0, y0) and bottom-right corner (x1, y1) of the bounding box. The kernel density estimates below show that, on average, we are predicting fairly well (as the images above also suggest), but there is still some variability in how closely those predictions line up with the ground truth.
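A minimal sketch of how such panels can be generated with matplotlib, assuming the coordinates are stored as (N, 4) arrays ordered (x0, y0, x1, y1) (the function name and array layout are assumptions for illustration):

```python
import matplotlib.pyplot as plt

def plot_coordinate_fits(actual, predicted):
    """Scatter actual vs. predicted values for each of the four box
    coordinates; a perfect tracker would lie on the y = x diagonal.

    actual, predicted: arrays of shape (N, 4) ordered (x0, y0, x1, y1).
    """
    fig, axes = plt.subplots(2, 2, figsize=(8, 8))
    names = ["x0", "y0", "x1", "y1"]
    for i, (ax, name) in enumerate(zip(axes.ravel(), names)):
        ax.scatter(actual[:, i], predicted[:, i], s=5, alpha=0.3)
        lo, hi = actual[:, i].min(), actual[:, i].max()
        ax.plot([lo, hi], [lo, hi], "r--", linewidth=1)  # y = x reference
        ax.set_title(name)
        ax.set_xlabel("actual")
        ax.set_ylabel("predicted")
    fig.tight_layout()
    return fig
```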

*(Figures: actual versus predicted coordinates and kernel density estimates of the error, for the x and y dimensions.)*

Moving forward, we hope to keep improving the object tracker by experimenting with alternative architectures and larger networks.