Frame interpolation is the process of synthesizing in-between images from a given set of images. The technique is often used for temporal up-sampling to increase the refresh rate of videos or to create slow-motion effects. Nowadays, with digital cameras and smartphones, we often take several photos within a few seconds to capture the best picture. Interpolating between these “near-duplicate” photos can produce engaging videos that reveal scene motion, often delivering an even more pleasing sense of the moment than the original photos.
In "FILM: Frame Interpolation for Large Motion", published at ECCV 2022, a method to create high quality slow-motion videos from near-duplicate photos is presented. FILM is a new neural network architecture that achieves state-of-the-art results in large motion, while also handling smaller motions well.
The FILM model takes two images as input and outputs a middle image. At inference time, the model is recursively invoked to output in-between images. FILM has three components:
- A feature extractor that summarizes each input image with deep multi-scale (pyramid) features;
- A bi-directional motion estimator that computes pixel-wise motion (i.e., flows) at each pyramid level;
- A fusion module that outputs the final interpolated image.
FILM is trained on regular video frame triplets, with the middle frame serving as ground truth for supervision.
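To make the recursion concrete, here is a minimal sketch of how in-between frames can be generated by repeatedly invoking the model on adjacent pairs. The `interpolate(a, b)` callable is hypothetical and stands in for a single FILM forward pass; it is not the model's actual API.

```python
# Minimal sketch of recursive in-between frame generation, assuming a
# hypothetical `interpolate(a, b)` callable that returns the FILM
# prediction for the midpoint of the image pair (a, b).
def generate_frames(frame_a, frame_b, num_recursions, interpolate):
    """Return [frame_a, ..., frame_b] with 2**num_recursions - 1
    synthesized in-between frames."""
    if num_recursions == 0:
        return [frame_a, frame_b]
    mid = interpolate(frame_a, frame_b)
    left = generate_frames(frame_a, mid, num_recursions - 1, interpolate)
    right = generate_frames(mid, frame_b, num_recursions - 1, interpolate)
    return left + right[1:]  # drop the duplicated `mid` frame
```

Each recursion level doubles the frame rate, so `num_recursions` levels yield `2**num_recursions - 1` synthesized frames between the two inputs.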
In this tutorial, we will use TensorFlow Hub as a model source.
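As a preview of the loading step, the sketch below pulls the published FILM model from TensorFlow Hub. The model handle `https://tfhub.dev/google/film/1` and the `x0`/`x1`/`time` input signature follow the model's TF Hub documentation at the time of writing; treat them as assumptions that may change.

```python
import numpy as np
import tensorflow_hub as hub

# Load the published FILM model (handle assumed from its TF Hub page).
model = hub.load("https://tfhub.dev/google/film/1")

# Two RGB frames in [0, 1], shaped [batch, height, width, 3]
# (dummy arrays here; real inputs come from the "Prepare images" step).
x0 = np.zeros((1, 256, 256, 3), dtype=np.float32)
x1 = np.ones((1, 256, 256, 3), dtype=np.float32)

# `time` selects the temporal position of the output; 0.5 is the midpoint.
inputs = {"x0": x0, "x1": x1, "time": np.array([[0.5]], dtype=np.float32)}
mid_frame = model(inputs)["image"]  # the interpolated middle image
```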
- Prerequisites
- Prepare images
- Load the model
- Infer the model
  - Single middle frame interpolation
  - Recursive frame generation
- Convert the model to OpenVINO IR
- Inference
  - Select inference device
  - Single middle frame interpolation
  - Recursive frame generation
- Interactive inference
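The conversion and device-selection steps listed above can be previewed with the hedged sketch below. It assumes the hub-loaded `model` from the earlier snippet, uses the OpenVINO Python API (`ov.convert_model`, `ov.save_model`, `compile_model`), and the `film_saved_model` / `film.xml` paths are illustrative placeholders, not the notebook's actual file names.

```python
import tensorflow as tf
import openvino as ov

# Export the TF Hub model to a SavedModel directory (illustrative path),
# then convert that directory to OpenVINO IR.
tf.saved_model.save(model, "film_saved_model")
ov_model = ov.convert_model("film_saved_model")
ov.save_model(ov_model, "film.xml")  # writes film.xml + film.bin

# List available devices and compile the model for one of them.
core = ov.Core()
print(core.available_devices)  # e.g. ['CPU', 'GPU']
compiled_model = core.compile_model(ov_model, device_name="CPU")
```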
This is a self-contained example that relies solely on its own code.
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
For details, please refer to the Installation Guide.