In this project, we evaluated several finetuning approaches and other strategies for making AI-generated images more "cinematic". We leave "cinematic" as a subjective term, but in general it translates to more dramatic lighting, grandiose landscapes, and lens flare effects. We evaluated the following strategies:
- Base Stable Diffusion model
- LoRA finetuned model
- Textual Inversion finetuned model that encodes a "cinematic style" token
- DreamBooth finetuned model that specifies a cinematic style
- ControlNet with prompting
We then evaluated two techniques for making an existing video more cinematic:
- ControlNet style transfer applied per frame
- Control-A-Video, using the pretrained model from the paper
An example of the outputs of the different models can be seen below:
Because no existing dataset pairs cinematic photos or videos with captions, we constructed our own dataset of around 50 cinematic images with associated captions. The dataset combines images found on the internet with images generated by DALL-E 3, and consists of shots of people, objects, and landscapes with features such as dramatic backdrops and rays of light that create a cinematic effect. We wrote the captions manually. The cinematic dataset is in the images directory. Some example images can be found below:
Set up a conda environment (we used Python 3.9) and run
pip install -r requirements.txt
Then run the following command to set up accelerate
accelerate config
All finetuning scripts can be found in the finetune directory. Run any of the .sh scripts to finetune, for example:
cd finetune
./textual_inversion.sh
If you wish to create your own finetuning dataset, copy the format found in the images directory and update the training script to point to your dataset.
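When building a custom dataset, it helps to check that every image has a caption before training. The sketch below writes a metadata.jsonl file in the Hugging Face imagefolder convention (one JSON object per line with a file_name key and a caption column). This layout is an assumption and may not match the format used in this repository's images directory; the function name write_metadata, the text column name, and the accepted file extensions are all hypothetical.

```python
import json
from pathlib import Path


def write_metadata(image_dir, captions):
    """Write metadata.jsonl for the captioned images found in image_dir.

    captions maps an image file name to its caption text.
    Returns the names of images skipped because they had no caption.
    NOTE: the metadata.jsonl layout is an assumed convention, not
    necessarily the one this project uses.
    """
    image_dir = Path(image_dir)
    extensions = {".jpg", ".jpeg", ".png"}
    missing = []
    with open(image_dir / "metadata.jsonl", "w", encoding="utf-8") as f:
        for path in sorted(image_dir.iterdir()):
            if path.suffix.lower() not in extensions:
                continue  # ignore non-image files, including metadata.jsonl
            if path.name not in captions:
                missing.append(path.name)
                continue
            record = {"file_name": path.name, "text": captions[path.name]}
            f.write(json.dumps(record) + "\n")
    return missing
```

Calling write_metadata("my_images", {"shot1.png": "a lone figure on a cliff at sunset"}) would write one record and return the file names of any uncaptioned images, so they can be captioned or removed before finetuning.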
All evaluation scripts can be found in the inference directory. Run any of these Python scripts to generate a grid of images that compares that model to the base Stable Diffusion model. For example:
cd inference
python inference_dreambooth.py
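For readers who want to build a similar comparison grid from their own outputs, here is a minimal sketch of the layout logic. It is not the code used by the inference scripts; the helper names grid_positions, grid_size, and make_grid are hypothetical, and it assumes all images are the same size and that Pillow is installed.

```python
def grid_positions(n_images, cols, cell_w, cell_h):
    """Top-left pixel offset of each image in a row-major grid."""
    return [((i % cols) * cell_w, (i // cols) * cell_h) for i in range(n_images)]


def grid_size(n_images, cols, cell_w, cell_h):
    """Width and height of the canvas needed to hold n_images cells."""
    rows = (n_images + cols - 1) // cols  # ceiling division
    return cols * cell_w, rows * cell_h


def make_grid(images, cols):
    """Paste equally sized PIL images into one canvas, row by row."""
    # Imported here so the layout helpers above work without Pillow.
    from PIL import Image

    cell_w, cell_h = images[0].size
    canvas = Image.new("RGB", grid_size(len(images), cols, cell_w, cell_h))
    offsets = grid_positions(len(images), cols, cell_w, cell_h)
    for img, offset in zip(images, offsets):
        canvas.paste(img, offset)
    return canvas
```

One possible arrangement is to pass the base model's images followed by the finetuned model's images for the same prompts, with cols set to the number of prompts, so that each column compares the two models on one prompt.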