Cinematic Frame Generation with Diffusion Models

Link to Final Paper Report

In this project, we evaluated various finetuning approaches and other strategies for making AI-generated images more "cinematic". We leave "cinematic" as a subjective term, but generally it translates to more dramatic lighting, grandiose landscapes, and lens flare effects. We evaluated the following strategies (a short inference sketch follows the list):

  1. Base stable diffusion model
  2. LoRA Finetuned model
  3. Textual Inversion finetuned model to encode a "cinematic style" token
  4. Dreambooth finetuned model to specify cinematic style
  5. ControlNet with prompting
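
As a rough sketch of how the first two strategies differ at inference time, the snippet below loads a base Stable Diffusion pipeline with diffusers and then applies LoRA weights on top of it. The checkpoint ID, the LoRA output path, and the prompt are illustrative assumptions, not values taken from this repository.

import torch
from diffusers import StableDiffusionPipeline

# Strategy 1: the base Stable Diffusion model, prompt-only.
# (The checkpoint ID is an assumption; use whichever base model you trained against.)
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
base_image = pipe("a lone hiker on a mountain ridge at sunset").images[0]

# Strategy 2: the same pipeline with LoRA weights layered on top.
# "finetune/lora-cinematic" is a hypothetical output directory.
pipe.load_lora_weights("finetune/lora-cinematic")
lora_image = pipe("a lone hiker on a mountain ridge at sunset").images[0]

Textual inversion and Dreambooth checkpoints load into the same pipeline class, while ControlNet uses a dedicated pipeline (sketched in the video section below).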

We then evaluated two techniques for taking an existing video and making it more cinematic (a per-frame sketch follows the list):

  1. ControlNet style transfer per frame
  2. Pretrained model from the Control-A-Video paper
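
A minimal sketch of the first, per-frame approach, assuming a Canny-edge ControlNet from diffusers (the conditioning type, model IDs, input path, and prompt are assumptions rather than this repository's actual settings):

import cv2
import numpy as np
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

cap = cv2.VideoCapture("input.mp4")  # hypothetical input video
styled_frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # An edge map preserves each frame's structure while the prompt restyles it.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)
    control = Image.fromarray(np.stack([edges] * 3, axis=-1)).resize((512, 512))
    styled = pipe(
        "cinematic film still, dramatic lighting, lens flare",
        image=control,
        # Reusing one seed keeps the style somewhat consistent across frames.
        generator=torch.Generator("cuda").manual_seed(0),
    ).images[0]
    styled_frames.append(styled)
cap.release()

Because every frame is generated independently, this naive approach tends to flicker from frame to frame, which is part of the motivation for a video-specific method like Control-A-Video.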

An example of the outputs of the different models is shown below:

[image: example outputs from each model]

Dataset

Since there is no existing dataset of cinematic photos/videos with corresponding captions, we constructed our own dataset of around 50 cinematic images and associated captions. The dataset combined images found on the internet with images generated by DALL-E 3, and consisted of shots of people, objects, and landscapes with features such as dramatic backdrops and rays of light that create a cinematic effect. We manually wrote captions for the images. The cinematic dataset is in the images directory. Some example images are shown below:

[image: sample images from the cinematic dataset]
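
If you load the dataset programmatically, a layout like the Hugging Face imagefolder convention is convenient for small image-caption sets. The sketch below assumes a metadata.jsonl file with file_name and text columns; check the images directory for the exact layout this repository uses.

from datasets import load_dataset

# Assumed layout (mirror the actual files in the images directory):
#   images/
#     metadata.jsonl   # one JSON object per line, e.g.
#                      # {"file_name": "0001.png", "text": "misty valley at dawn, rays of light"}
#     0001.png
#     ...
ds = load_dataset("imagefolder", data_dir="images", split="train")
print(ds[0]["image"].size, ds[0]["text"])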

Running

Setup

Set up a conda environment (we used Python 3.9) and run

pip install -r requirements.txt

Then run the following command to set up accelerate:

accelerate config

Finetuning

All finetuning scripts can be found in the finetune directory. Simply run any of the .sh scripts to finetune, for example:

cd finetune
./textual_inversion.sh
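
Once textual inversion finishes, the learned token can be loaded back into a pipeline at inference time. The sketch below uses diffusers' load_textual_inversion; the output directory and the placeholder token "<cinematic>" are assumptions, so match whatever textual_inversion.sh actually sets.

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Both the path and the token are hypothetical; use the values from your training run.
pipe.load_textual_inversion("finetune/textual-inversion-output", token="<cinematic>")

image = pipe("a city street at night in <cinematic> style").images[0]
image.save("textual_inversion_sample.png")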

If you wish to create your own finetuning dataset, feel free to copy the format found in the images directory and update the training script to point to your dataset.

Evaluation

All evaluation scripts can be found in the inference directory. Run any of these Python scripts to generate a grid of images that compares that model against base Stable Diffusion. For example:

cd inference
python inference_dreambooth.py
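
For a sense of what the comparison involves, here is a minimal sketch that generates the same prompt with the base model and a finetuned one from the same seed, then pastes the results side by side. The Dreambooth output path and the prompt are hypothetical.

import torch
from diffusers import StableDiffusionPipeline
from PIL import Image

prompt = "a castle on a cliff above a stormy sea"
seed = 0

base = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
tuned = StableDiffusionPipeline.from_pretrained(
    "finetune/dreambooth-output",  # hypothetical Dreambooth output directory
    torch_dtype=torch.float16,
).to("cuda")

# The shared seed isolates the effect of finetuning from sampling noise.
images = [
    p(prompt, generator=torch.Generator("cuda").manual_seed(seed)).images[0]
    for p in (base, tuned)
]

grid = Image.new("RGB", (images[0].width * 2, images[0].height))
for i, im in enumerate(images):
    grid.paste(im, (i * images[0].width, 0))
grid.save("comparison_grid.png")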
