OminiControl: Minimal and Universal Control for Diffusion Transformer
Zhenxiong Tan, Songhua Liu, Xingyi Yang, Qiaochu Xue, and Xinchao Wang
Learning and Vision Lab, National University of Singapore
OminiControl is a minimal yet powerful universal control framework for Diffusion Transformer models like FLUX.
- Universal Control 🌐: A unified control framework that supports both subject-driven control and spatial control (such as edge-guided and in-painting generation).
- Minimal Design 🚀: Injects control signals while preserving the original model structure. Introduces only 0.1% additional parameters to the base model.
- Environment setup
conda create -n omini python=3.10
conda activate omini
- Requirements installation
pip install -r requirements.txt
- Subject-driven generation:
examples/subject.ipynb
- In-painting:
examples/inpainting.ipynb
- Canny edge to image, depth to image, colorization, deblurring:
examples/spatial.ipynb
To run the Gradio app for subject-driven generation:
python -m src.gradio.gradio_app
- Input images are automatically center-cropped and resized to 512x512 resolution.
- When writing prompts, refer to the subject using phrases like `this item`, `the object`, or `it`, e.g.
  - A close up view of this item. It is placed on a wooden table.
  - A young lady is wearing this shirt.
- The model primarily works with objects rather than human subjects currently, due to the absence of human data in training.
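The center-crop-and-resize step mentioned in the tips above can be sketched as follows. This is a minimal illustration, not the repository's actual preprocessing code: the helper names `center_crop_box` and `preprocess` are hypothetical, and `preprocess` assumes it receives a Pillow `Image` (which exposes `size`, `crop`, and `resize`). The resampling filter used by the app is not stated in this document, so the sketch leaves Pillow's default in place.

```python
def center_crop_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Return (left, upper, right, lower) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    upper = (height - side) // 2
    return (left, upper, left + side, upper + side)


def preprocess(image, size: int = 512):
    """Center-crop a Pillow image to a square, then resize it to size x size.

    Hypothetical helper: `image` is assumed to be a PIL.Image.Image.
    """
    box = center_crop_box(*image.size)  # Pillow reports size as (width, height)
    return image.crop(box).resize((size, size))


if __name__ == "__main__":
    # A 1024x768 landscape image keeps its full height and loses 128 px per side.
    print(center_crop_box(1024, 768))  # (128, 0, 896, 768)
```

Because the crop is centered, a subject near the edge of a wide or tall photo may be cut off, so it can help to frame the subject in the middle of the condition image before uploading it.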
Demos (Left: condition image; Right: generated image)
Text Prompts
- Prompt1: A close up view of this item. It is placed on a wooden table. The background is a dark room, the TV is on, and the screen is showing a cooking show. With text on the screen that reads 'Omini Control!.'
- Prompt2: A film style shot. On the moon, this item drives across the moon surface. A flag on it reads 'Omini'. The background is that Earth looms large in the foreground.
- Prompt3: In a Bauhaus style room, this item is placed on a shiny glass table, with a vase of flowers next to it. In the afternoon sun, the shadows of the blinds are cast on the wall.
- Prompt4: "On the beach, a lady sits under a beach umbrella with 'Omini' written on it. She's wearing this shirt and has a big smile on her face, with her surfboard behind her. The sun is setting in the background. The sky is a beautiful shade of orange and purple."
- Image Inpainting (Left: original image; Center: masked image; Right: filled image)
- Prompt: The Mona Lisa is wearing a white VR headset with 'Omini' written on it.
- Prompt: A yellow book with the word 'OMINI' in large font on the cover. The text 'for FLUX' appears at the bottom.
- Other spatially aligned tasks (Canny edge to image, depth to image, colorization, deblurring)
Subject-driven control:
Model | Base model | Description | Resolution |
---|---|---|---|
experimental / subject | FLUX.1-schnell | The model used in the paper. | (512, 512) |
omini / subject_512 | FLUX.1-schnell | The model has been fine-tuned on a larger dataset. | (512, 512) |
omini / subject_1024 | FLUX.1-schnell | The model has been fine-tuned on a larger dataset and accommodates higher resolution. (To be released) | (1024, 1024) |
Spatially aligned control:
Model | Base model | Description | Resolution |
---|---|---|---|
experimental / <task_name> | FLUX.1 | Canny edge to image, depth to image, colorization, deblurring, in-painting | (512, 512) |
experimental / <task_name>_1024 | FLUX.1 | Supports higher resolution. (To be released) | (1024, 1024) |
- ComfyUI-Diffusers-OminiControl - ComfyUI integration by @Macoron
- The model's subject-driven generation primarily works with objects rather than human subjects due to the absence of human data in training.
- The subject-driven generation model may not work well with `FLUX.1-dev`.
- The released model currently only supports the resolution of 512x512.
- Release the model for higher resolution (1024x1024).
- Release the training code.
@article{tan2024omini,
title={OminiControl: Minimal and Universal Control for Diffusion Transformer},
author={Zhenxiong Tan and Songhua Liu and Xingyi Yang and Qiaochu Xue and Xinchao Wang},
journal={arXiv preprint arXiv:2411.15098},
year={2024}
}