It seems there is an error in the NLL computations on MNIST. The article reports NLL in "bits per pixel". The "per pixel" part is computed by dividing by the shape of x (glow/model.py, line 185 at eaff217). But in "data_loaders/get_mnist_cifar.py" (line 40 at eaff217) the MNIST data is padded with zeros to size 32x32.
This has two consequences:
1. The NLL is divided by 32^2 and not 28^2. This improves (lowers) the reported loss.
2. The "scaling penalty" is also multiplied by 32^2 and not 28^2 (glow/model.py, line 172 at eaff217). This worsens the loss.
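To make the two effects concrete, here is a small sketch of the bits-per-dim bookkeeping. The total log-likelihood value is made up for illustration, and n_bins=256 is assumed to match the usual 8-bit quantization; neither number is taken from the repository:

```python
import numpy as np

def bits_per_dim(log_p, dims, n_bins=256.0):
    """Convert a total log-likelihood (nats) to bits per dimension.

    The "scaling penalty" -log(n_bins) is paid once per dimension,
    so in bits/dim it contributes a constant log2(n_bins) = 8 bits.
    """
    penalty = -np.log(n_bins) * dims
    return -(log_p + penalty) / (np.log(2.0) * dims)

log_p = -1500.0  # hypothetical total log-likelihood of one image, in nats

# Dividing by the padded dimensionality 32^2 instead of the original
# 28^2 spreads the same total over more dimensions, lowering bits/dim:
print(bits_per_dim(log_p, 32 * 32))   # lower
print(bits_per_dim(log_p, 28 * 28))   # higher
```

Note that in bits/dim the penalty cancels to a constant 8 bits per dimension, so the net effect of padding on the reported number comes entirely from the larger divisor.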
It is not clear to us why this would yield the correct likelihood computations.
On an intuitive level, it seems that the "per pixel" loss should be computed by dividing by the original data size 28x28 instead of the padded data size 32x32x3. Below we argue that if this computation were taken as correct, one could obtain a loss arbitrarily close to 0 just by increasing the amount of zero padding.
Suppose we normalize the input to [-127.5, +127.5] and change the Gaussian to be N(0, 127.5/2). The scaling is then 1, so the log "scaling penalty" becomes 0. Since more zero padding increases the dimensionality we divide by and thereby decreases the loss, we can add more and more zero padding and make the loss arbitrarily close to 0, which seems to be a problem.
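The limit can be sketched numerically. Assume (optimistically for the model) that the total NLL of the informative 28x28 content stays fixed, that the padded zeros cost essentially nothing, and that the scaling penalty is 0 as in the construction above; the concrete NLL value here is made up:

```python
import numpy as np

nll_content = 1500.0  # assumed total NLL (nats) of the 28x28 content

def bits_per_dim_padded(side):
    """Reported bits/dim if the image is zero-padded to side x side."""
    dims = side * side
    return nll_content / (np.log(2.0) * dims)  # scaling penalty is 0 here

for side in (28, 32, 64, 256, 1024):
    print(side, bits_per_dim_padded(side))
# the reported loss shrinks toward 0 as the padding grows, even though
# the actual image content never changes
```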
Our sincerest apologies if we understood something wrong. In either case, we'd be happy to see an argument for how one can compute the likelihood of a 28x28 MNIST image under a model that is trained on a 32x32 padded variant of MNIST. The reason we are interested in this is that it would allow a data augmentation trick that interpolates images to a larger resolution by zero padding in Fourier space. With appropriate scaling the Fourier transform is unitary and thus has unit determinant.
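The unitarity claim is easy to check numerically. The sketch below uses NumPy's `norm="ortho"` FFT convention, which is exactly the "appropriate scaling"; the 28-to-32 embedding offsets are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(28, 28))

# With norm="ortho" the 2-D DFT is unitary: it preserves the L2 norm,
# so as a linear map it has |det| = 1 and contributes zero log-det.
X = np.fft.fft2(x, norm="ortho")
assert np.isclose(np.linalg.norm(x), np.linalg.norm(X))

# Zero-pad the centered spectrum to interpolate 28x28 -> 32x32.
X_pad = np.zeros((32, 32), dtype=complex)
X_pad[2:30, 2:30] = np.fft.fftshift(X)
x_up = np.fft.ifft2(np.fft.ifftshift(X_pad), norm="ortho")

# The inverse transform is unitary too, so the norm is still preserved:
assert np.isclose(np.linalg.norm(x), np.linalg.norm(x_up))
```

(The upsampled array is complex here because the sketch does not enforce conjugate symmetry of the padded spectrum; a real-valued version would need to handle that.)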
I guess the zero padding is part of the pre-processing used to obtain the input x, and the paper says "M is the dimensionality of x", which would make M = 32x32?
While this wouldn't impact training/optimization, since these terms are constants, I agree that intuitively the absolute likelihood computation seems "wrong", as you pointed out, since the original x is 28x28.
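The "constants don't affect optimization" point can be sketched with a toy loss; the quadratic stand-in for the model NLL and its minimizer are hypothetical:

```python
import numpy as np

def nll(theta):
    """Toy stand-in for the model NLL in nats; minimized at theta = 3."""
    return (theta - 3.0) ** 2

def reported(theta, dims, n_bins=256.0):
    # add the dimension-dependent penalty and divide by the dimension
    # count, as in the bits/dim bookkeeping
    return (nll(theta) + np.log(n_bins) * dims) / (np.log(2.0) * dims)

thetas = np.linspace(0.0, 6.0, 601)
best_raw = thetas[np.argmin(nll(thetas))]
best_28 = thetas[np.argmin(reported(thetas, 28 * 28))]
best_32 = thetas[np.argmin(reported(thetas, 32 * 32))]

# Shifting by a constant and rescaling by a positive constant do not
# move the minimizer, so training is unaffected either way:
assert best_raw == best_28 == best_32
```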