Learned frame prediction for video codecs

Standard video codecs use a “motion compensation” module to predict the next frame. Rather than encoding each frame in its entirety, they encode only the motion vectors that map blocks from the previous frame, along with a residual image that is typically sparse.
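To make the idea concrete, here is a minimal sketch of block motion compensation: for each block of the current frame, we exhaustively search a small window in the previous frame for the best match (minimum sum of absolute differences), then store only the motion vector and the residual. This is an illustrative toy, not the actual search used by any real codec (real encoders use sub-pixel motion, variable block sizes, and fast search heuristics).

```python
import numpy as np

def motion_compensate(prev, curr, block=4, search=2):
    """Toy block motion compensation: per-block exhaustive search
    within +/- `search` pixels, minimizing sum of absolute differences."""
    h, w = curr.shape
    vectors, residuals = {}, {}
    for y in range(0, h, block):
        for x in range(0, w, block):
            target = curr[y:y + block, x:x + block].astype(int)
            best, best_sad = (0, 0), np.inf
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    py, px = y + dy, x + dx
                    if py < 0 or px < 0 or py + block > h or px + block > w:
                        continue  # candidate block falls outside the frame
                    cand = prev[py:py + block, px:px + block].astype(int)
                    sad = np.abs(cand - target).sum()
                    if sad < best_sad:
                        best, best_sad = (dy, dx), sad
            dy, dx = best
            vectors[(y, x)] = best
            residuals[(y, x)] = target - prev[y + dy:y + dy + block,
                                              x + dx:x + dx + block].astype(int)
    return vectors, residuals

def reconstruct(prev, vectors, residuals, block=4):
    """Decoder side: rebuild the frame from motion vectors + residuals."""
    h, w = prev.shape
    out = np.zeros((h, w), dtype=int)
    for (y, x), (dy, dx) in vectors.items():
        out[y:y + block, x:x + block] = (
            prev[y + dy:y + dy + block, x + dx:x + dx + block].astype(int)
            + residuals[(y, x)])
    return out
```

Because the residual stores exactly the prediction error, the decoder always recovers the frame losslessly in this sketch; a real codec then quantizes the residual, which is where the compression gain (and loss) comes from.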

In this project, I replace the motion compensation module with a deep neural network (DNN), eliminating motion vectors entirely. The DNN predicts the whole frame in a single forward pass, without relying on blocks or explicit motion estimation. This approach outperforms the well-established x264 video codec.
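The key structural difference from block-based prediction can be sketched as follows: the network consumes a stack of past frames and emits the entire next frame in one dense forward pass. This tiny numpy model is purely illustrative (the `TinyFramePredictor` name, layer sizes, and single-layer design are my own placeholders, not the paper's architecture, which is a deep trained CNN).

```python
import numpy as np

def conv2d(x, k):
    """'Same' 2D convolution of one channel with one kernel, zero-padded."""
    kh, kw = k.shape
    p = np.pad(x, ((kh // 2,), (kw // 2,)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = (p[i:i + kh, j:j + kw] * k).sum()
    return out

class TinyFramePredictor:
    """Hypothetical single-pass frame predictor (illustrative sketch only).

    Takes a stack of past frames and outputs the full next frame at once:
    no block partitioning, no motion vectors. A real model would be a deep
    CNN trained end-to-end with a reconstruction loss.
    """
    def __init__(self, n_past, rng):
        # one 3x3 kernel per past frame, plus a scalar bias
        self.kernels = rng.standard_normal((n_past, 3, 3)) * 0.1
        self.bias = 0.0

    def forward(self, past_frames):
        # sum per-frame convolutions -> one dense prediction of the next frame
        pred = sum(conv2d(f, k) for f, k in zip(past_frames, self.kernels))
        return np.maximum(pred + self.bias, 0.0)  # ReLU keeps pixels non-negative
```

The codec then only needs to transmit the residual between this prediction and the true frame, with no motion-vector bitstream at all.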

I furthermore show that incorporating adversarial learning yields more realistic video predictions, at the cost of lower PSNR. The code and papers are available (Sulun & Tekalp, 2021; Sulun, 2018). Supplementary material is presented below.


Supplementary material

Here are qualitative results for our learned video prediction models. We compare two models: one trained with L2 loss alone (L2) and one trained with a combination of adversarial and L2 losses (GAN). All results show the 9th frame of each video. The slideshow makes it easy to compare images; you can navigate with the keyboard arrow keys. Below the slideshow, the ground-truth and predicted images are shown at their original size.
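The trade-off between the two models comes from the generator objective. A minimal sketch of such a combined loss is below; the weighting `lam` and the non-saturating adversarial term are illustrative assumptions, not necessarily the exact formulation or hyperparameters used in the paper.

```python
import numpy as np

def generator_loss(pred, target, d_score, lam=0.01):
    """Sketch of a combined L2 + adversarial generator loss (illustrative).

    The L2 term keeps the prediction close to the ground truth (high PSNR);
    the adversarial term rewards predictions the discriminator rates as
    realistic. `d_score` is the discriminator's probability that `pred`
    is a real frame.
    """
    l2 = np.mean((pred - target) ** 2)
    adv = -np.log(d_score + 1e-12)  # non-saturating GAN loss for the generator
    return l2 + lam * adv
```

With `lam = 0` this reduces to the pure L2 model; increasing `lam` trades pixel-wise fidelity (PSNR) for perceptual realism, which matches the qualitative differences visible in the slideshow.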



References

  1. Can Learned Frame Prediction Compete with Block Motion Compensation for Video Coding?
     Serkan Sulun and A. Murat Tekalp
     Signal, Image and Video Processing, 2021
  2. Deep Learned Frame Prediction for Video Compression
     Serkan Sulun
     Master’s Thesis, Koç University, 2018