LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync

📖 Introduction

We present LatentSync, an end-to-end lip sync framework based on audio-conditioned latent diffusion models, without any intermediate motion representation. This diverges from previous diffusion-based lip sync methods, which rely on pixel-space diffusion or two-stage generation. Our framework can leverage the powerful capabilities of Stable Diffusion to directly model complex audio-visual correlations.
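To illustrate the overall idea, here is a minimal toy sketch of a reverse-diffusion sampling loop in latent space conditioned on an audio embedding. This is not the LatentSync implementation or its API: the denoiser here is a stand-in for the audio-conditioned UNet, the update rule is a simplified placeholder rather than a real DDIM/DDPM step, and all names (`toy_denoiser`, `lipsync_sample`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_denoiser(z, t, audio_emb):
    # Stand-in for the audio-conditioned UNet: predicts noise from the
    # current latent plus an audio-conditioning term (purely illustrative).
    return 0.1 * z + 0.05 * audio_emb

def lipsync_sample(audio_emb, steps=10, shape=(4, 8, 8)):
    """Toy reverse-diffusion loop over video-frame latents,
    conditioned on an audio embedding of the same shape."""
    z = rng.standard_normal(shape)          # start from pure noise
    for t in reversed(range(steps)):
        eps = toy_denoiser(z, t, audio_emb) # predicted noise at step t
        z = z - eps / steps                 # simplified denoising update
    return z  # in a real pipeline, a VAE decoder maps this to frames

audio = rng.standard_normal((4, 8, 8))      # placeholder audio features
frame_latent = lipsync_sample(audio)
```

The point of working in latent space is that each denoising step operates on a compact representation (here an abstract `(4, 8, 8)` array) rather than raw pixels, while the audio embedding steers every step of the generation.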

Demo

| Original video | Lip-synced video |
| --- | --- |
| demo1_video.mp4 | demo1_output.mp4 |
| demo2_video.mp4 | demo2_output.mp4 |
| demo3_video.mp4 | demo3_output.mp4 |
| demo4_video.mp4 | demo4_output.mp4 |
| demo5_video.mp4 | demo5_output.mp4 |

(Photorealistic videos were filmed by contracted models; anime videos are from VASA-1 and EMO.)

📑 Open-Source Plan

  • Inference code and checkpoints
  • Data processing pipeline
  • Training code

About

Taming Stable Diffusion for Lip Sync!
