Train model to return location in image

I want to take a small portion of a still frame from a video, then use that small image as input to a model that tracks the location the image was taken from in subsequent video frames, so I can perform motion tracking on the video. How would I train a model to perform this task?

edit: I should point out that I’m also trying to capture scale and rotation relative to the input image.

Hi there. Unfortunately, I don’t think the approach you outlined above is going to achieve good results. To track objects in videos across multiple frames, it is usually better to train a general object detection model that works on still images, then implement the tracking logic yourself in post-processing. Here is a guide on how to implement a simple object tracking system:
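To give a feel for what that post-processing step looks like, here is a minimal sketch of a tracker that matches each frame's detections to existing tracks by bounding-box overlap (IoU). The detector itself is assumed to exist elsewhere, and the 0.3 IoU threshold is an illustrative choice, not a recommendation:

```python
def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def update_tracks(tracks, detections, thresh=0.3):
    # Greedily assign each detection to the existing track with the
    # highest overlap; detections that match nothing start a new track.
    next_id = max(tracks, default=-1) + 1
    for det in detections:
        best_id, best_iou = None, thresh
        for tid, box in tracks.items():
            overlap = iou(box, det)
            if overlap > best_iou:
                best_id, best_iou = tid, overlap
        if best_id is None:
            best_id, next_id = next_id, next_id + 1
        tracks[best_id] = det
    return tracks
```

You would call `update_tracks` once per frame with that frame's detections; real systems add extras like expiring stale tracks and handling occlusion, but the matching core looks like this.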

Thanks for the feedback. I’m trying to figure out how SnapChat might have implemented a feature where you can pin stickers to a location in a frame of a video, and they somehow keep track of that location even if it goes out of frame and later comes back. I figured it might be machine learning, but I could be wrong.

That feature was most likely implemented with a more generic technique called “feature matching”. You use an algorithm (sometimes ML-based, sometimes not) to identify salient features in an image, then match them between frames.