Model producing blank images with dataset of 128x128 or larger images #98

Open
dpmcgonigle opened this issue Jun 15, 2020 · 10 comments

Comments

@dpmcgonigle

dpmcgonigle commented Jun 15, 2020

Hello,

I have tried training the Glow model on several 256x256 datasets, using hyperparameter configurations as close as possible to the CelebA configuration your team used, and all I get are blank images when generating samples at any temperature (standard deviation).
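As an illustration of what "temperature" means here, the following is a minimal sketch that assumes the latent prior is a standard normal scaled by the temperature. The function names `sample_latents` and `is_blank` are hypothetical helpers, not part of the Glow code base; `is_blank` is just one simple way to flag a near-constant sample automatically.

```python
import numpy as np

def sample_latents(n_samples, latent_dim, temperature, seed=0):
    """Draw latents from N(0, temperature^2 * I) (assumed prior)."""
    rng = np.random.default_rng(seed)
    return temperature * rng.standard_normal((n_samples, latent_dim))

def is_blank(image, tol=1e-3):
    """Flag an image whose pixel values are (almost) constant."""
    return float(np.std(image)) < tol

# Temperature 0 collapses every latent to the prior mean.
z_cold = sample_latents(4, 3, temperature=0.0)
z_warm = sample_latents(4, 3, temperature=0.7)

blank = np.zeros((8, 8, 3))
noisy = np.random.default_rng(1).random((8, 8, 3))
```

Running `is_blank` on each decoded sample during training would give a numeric signal for when, or whether, the model stops producing blank images.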

For instance, I am currently training on LSUN-Tower 256x256 (~708K images) and am at epoch 800 (as this code base defines an epoch; that is approaching one full pass through the dataset). For this experiment I am using 6 levels with 32 steps per level, a learning rate of 0.001, and a local_batch_train of 2 images per GPU (on 4 GPUs), which is derived from an n_batch of 128 using the default anchor of 32, together with the affine coupling layer. There is no y-conditioning in the experiments I'm running (all images are of the same category), and I’m using n_bits_x of 8.
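The batch-size arithmetic above can be sketched as follows. This assumes that n_batch is defined at the anchor resolution and shrinks with the ratio of pixel counts; that rule is an assumption chosen because it reproduces the numbers in this report, and the helper name `local_batch_size` is hypothetical.

```python
def local_batch_size(n_batch: int, anchor_size: int, image_size: int) -> int:
    """Per-GPU batch size, assuming n_batch is specified at anchor_size and
    is scaled down by the pixel-count ratio (assumption, not verified)."""
    return n_batch * anchor_size * anchor_size // (image_size * image_size)

per_gpu = local_batch_size(n_batch=128, anchor_size=32, image_size=256)
global_batch = per_gpu * 4  # 4 GPUs, as in the report

print(per_gpu, global_batch)  # 2 8
```

Under this assumption, each optimization step sees only 8 images in total.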

I have run several similar experiments with other datasets and with the additive coupling layer, for longer times, and all have produced blank image samples. It appears that any dataset larger than 64x64 gives me trouble. Has your team run into this with larger images? I'm wondering whether training will eventually "break out" of this state if I let it run long enough, or whether I need to tweak the hyperparameters.

Thank you very much for your time.

Sincerely,
Dan McGonigle

@guilherme-pombo

Hello, I'm having a very similar problem to @dpmcgonigle, even when using the exact same setup as the original paper. It would be great to get some clarification on what might be causing this. By the way, @dpmcgonigle, have you been able to resolve this in the meantime?

@dpmcgonigle
Author

@guilherme-pombo, this is something we have not yet resolved. I should have noted that, interestingly, both the training loss and the test loss continue to decrease even while the model samples blank images during training. In some of our experiments, the sample images are all blank for a number of "epochs" before the model "breaks out" and starts producing good images, so I'm wondering whether it is just a matter of time before good images appear. One piece of evidence against this hypothesis, I think, is that a trained model producing all-blank images is not invertible from Z -> X -> Z (model.decode() followed by model.encode()), although it is invertible in the other direction, X -> Z -> X (model.encode() on a real image followed by model.decode()). I am very curious whether anyone else has insight into this problem.
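The two round-trip checks described above can be sketched as follows. This is a toy example under stated assumptions: a single affine coupling layer in NumPy stands in for the model, and the `encode` and `decode` names below are local stand-ins, not the repository's model.encode() and model.decode().

```python
import numpy as np

def coupling_forward(x, w, b):
    """Toy affine coupling: the first half passes through unchanged,
    the second half is scaled and shifted based on the first half."""
    x1, x2 = np.split(x, 2, axis=-1)
    log_s, t = np.split(np.tanh(x1 @ w + b), 2, axis=-1)
    return np.concatenate([x1, x2 * np.exp(log_s) + t], axis=-1)

def coupling_inverse(y, w, b):
    """Exact inverse of coupling_forward."""
    y1, y2 = np.split(y, 2, axis=-1)
    log_s, t = np.split(np.tanh(y1 @ w + b), 2, axis=-1)
    return np.concatenate([y1, (y2 - t) * np.exp(-log_s)], axis=-1)

def roundtrip_error(first, second, v):
    """Largest absolute difference between v and second(first(v))."""
    return float(np.max(np.abs(second(first(v)) - v)))

rng = np.random.default_rng(0)
d = 8
w = rng.standard_normal((d // 2, d))
b = rng.standard_normal(d)
encode = lambda x: coupling_forward(x, w, b)   # X -> Z
decode = lambda z: coupling_inverse(z, w, b)   # Z -> X

x = rng.standard_normal((16, d))
z = rng.standard_normal((16, d))
err_xzx = roundtrip_error(encode, decode, x)   # X -> Z -> X
err_zxz = roundtrip_error(decode, encode, z)   # Z -> X -> Z
```

For a well-behaved invertible map, both errors stay near floating-point precision. A large error in only one direction would mirror the asymmetry reported here and may help narrow down where it arises.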

@dpmcgonigle
Author

@guilherme-pombo, just wanted to let you know I've had limited success using:

  • L2 regularization
  • More homogeneous datasets, where the images have some distinct structure
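For readers unfamiliar with the first item, a minimal sketch of L2 regularization follows. It assumes the penalty is simply added to the negative log-likelihood; the comment does not say which parameters were regularized or with what coefficient, so `weight_decay` and both function names are illustrative only.

```python
import numpy as np

def l2_penalty(params, weight_decay):
    """weight_decay times the sum of squared entries over all parameter arrays."""
    return weight_decay * sum(float(np.sum(p ** 2)) for p in params)

def regularized_loss(nll, params, weight_decay=1e-4):
    """Negative log-likelihood plus an L2 penalty on the parameters."""
    return nll + l2_penalty(params, weight_decay)

# Two dummy parameter arrays with 4 + 3 = 7 entries, all equal to 1.
params = [np.ones((2, 2)), np.ones(3)]
loss = regularized_loss(nll=1.0, params=params, weight_decay=0.1)
```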

@Hzzone

Hzzone commented Jan 1, 2021

I have successfully reproduced the results shown in the paper on CelebA, using PyTorch. I trained the model with a batch size of 64 for about 10 days. The figure below shows the resulting images at a size of 256. Note that the results would be better if I trained the model on CelebA-HQ, as CelebA is more challenging. In a few months, I will release my source code and the pre-trained model for anyone who is interested.
image

@LeeeLiu

LeeeLiu commented Apr 29, 2021

Hello @Hzzone, would you please share your trained checkpoints for 256×256?
I am very short on GPU resources. Thanks a lot!

@Hzzone

Hzzone commented Apr 29, 2021

I am glad to hear you are interested in the pre-trained model. It is part of work I have submitted elsewhere, so it will become public once I release my paper on arXiv. It should not be long before I find out whether the paper has been accepted.

@Hzzone

Hzzone commented Apr 29, 2021

My paper has been accepted by IJCAI 2021.
Please stay tuned for my model!

@LeeeLiu

LeeeLiu commented Apr 30, 2021

Congratulations!
Looking forward to your paper link and trained model.

By the way, is the reversibility of your Glow model good?
That is, when Glow runs z -> x -> z', are z and z' exactly the same?
(In the official OpenAI code, the reversibility does not seem to be good.)

@Hzzone

Hzzone commented May 26, 2021

To reconstruct the exact inputs, the latent outputs z from all of the different scales must be gathered and passed back to the decoder.
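One reading of this point, as a toy sketch: in a multi-scale flow, part of the representation is factored out as a latent at each level, so an exact inverse needs every one of those latents. The code below uses a trivial affine map as a stand-in for the flow steps; `encode`, `decode`, and `n_levels` are illustrative names, not the API of any particular implementation.

```python
import numpy as np

def encode(x, n_levels):
    """Toy multi-scale encoder: at each level, factor out half of the features."""
    zs = []
    h = x
    for _ in range(n_levels - 1):
        h = h * 2.0 + 1.0                # stand-in for the invertible flow steps
        h, z = np.split(h, 2, axis=-1)   # z leaves the network at this scale
        zs.append(z)
    zs.append(h * 2.0 + 1.0)             # final-level latent
    return zs

def decode(zs):
    """Inverse of encode; needs the latent from every scale."""
    h = (zs[-1] - 1.0) / 2.0
    for z in reversed(zs[:-1]):
        h = np.concatenate([h, z], axis=-1)
        h = (h - 1.0) / 2.0
    return h

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))
zs = encode(x, n_levels=3)

x_exact = decode(zs)
# Replacing the intermediate latents (here with zeros) loses information.
x_lossy = decode([np.zeros_like(z) for z in zs[:-1]] + [zs[-1]])
```

Here `x_exact` matches `x`, while `x_lossy` does not, because only the final-scale latent was kept.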

@Hzzone

Hzzone commented Jun 3, 2021

Code for pre-training, and the pre-trained models on the image size of 256, are now available at https://github.com/Hzzone/AgeFlow.
