It seems there is an error in the NLL computations on MNIST. The article reports NLL in "bits per pixel". The "per pixel" part is computed by dividing by the shape of x (glow/model.py, line 185 at eaff217). But in "data_loaders/get_mnist_cifar.py" (line 40 at eaff217) the MNIST data is padded with zeros to size 32x32.
This has two consequences:
1. The NLL is divided by 32^2 and not 28^2. This improves (lowers) the reported loss.
2. The "scaling penalty" is also multiplied by 32^2 and not 28^2 (glow/model.py, line 172 at eaff217). This worsens the loss.
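To make the two effects concrete, here is a small sketch of the bits-per-dim bookkeeping. The total log-likelihood value is made up for illustration, and n_bins=256 is assumed to match the usual 8-bit quantization; neither number is taken from the repository:

```python
import numpy as np

def bits_per_dim(log_p, dims, n_bins=256.0):
    """Convert a total log-likelihood (nats) to bits per dimension.

    The "scaling penalty" -log(n_bins) is paid once per dimension,
    so in bits/dim it contributes a constant log2(n_bins) = 8 bits.
    """
    penalty = -np.log(n_bins) * dims
    return -(log_p + penalty) / (np.log(2.0) * dims)

log_p = -1500.0  # hypothetical total log-likelihood of one image, in nats

# Dividing by the padded dimensionality 32^2 instead of the original
# 28^2 spreads the same total over more dimensions, lowering bits/dim:
print(bits_per_dim(log_p, 32 * 32))   # lower
print(bits_per_dim(log_p, 28 * 28))   # higher
```

Note that in bits/dim the penalty cancels to a constant 8 bits per dimension, so the net effect of padding on the reported number comes entirely from the larger divisor.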
It is not clear to us why this would yield the correct likelihood computations.
On an intuitive level, it seems that the "per pixel" loss should be computed by dividing by the original data size 28x28 instead of the padded data size 32x32x3. Below we argue that if this computation were taken as correct, one could obtain a loss arbitrarily close to 0 just by increasing the amount of zero padding.
Suppose we normalize the input to [-127.5, +127.5] and change the Gaussian to be N(0, 127.5/2). The scaling is then 1, so the log "scaling penalty" becomes 0. Since more zero padding increases the dimensionality we divide by and thereby decreases the loss, we can add more and more zero padding and make the loss arbitrarily close to 0, which seems to be a problem.
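The limit can be sketched numerically. Assume (optimistically for the model) that the total NLL of the informative 28x28 content stays fixed, that the padded zeros cost essentially nothing, and that the scaling penalty is 0 as in the construction above; the concrete NLL value here is made up:

```python
import numpy as np

nll_content = 1500.0  # assumed total NLL (nats) of the 28x28 content

def bits_per_dim_padded(side):
    """Reported bits/dim if the image is zero-padded to side x side."""
    dims = side * side
    return nll_content / (np.log(2.0) * dims)  # scaling penalty is 0 here

for side in (28, 32, 64, 256, 1024):
    print(side, bits_per_dim_padded(side))
# the reported loss shrinks toward 0 as the padding grows, even though
# the actual image content never changes
```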
Our sincerest apologies if we understood something wrong. In either case, we'd be happy to see an argument for how one can compute the likelihood of a 28x28 MNIST image under a model that is trained on a 32x32 padded variant of MNIST. The reason we are interested in this is that it would allow a data augmentation trick that interpolates images to a larger resolution by zero padding in Fourier space. With appropriate scaling the Fourier transform is unitary and thus has unit determinant.
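The unitarity claim is easy to check numerically. The sketch below uses NumPy's `norm="ortho"` FFT convention, which is exactly the "appropriate scaling"; the 28-to-32 embedding offsets are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(28, 28))

# With norm="ortho" the 2-D DFT is unitary: it preserves the L2 norm,
# so as a linear map it has |det| = 1 and contributes zero log-det.
X = np.fft.fft2(x, norm="ortho")
assert np.isclose(np.linalg.norm(x), np.linalg.norm(X))

# Zero-pad the centered spectrum to interpolate 28x28 -> 32x32.
X_pad = np.zeros((32, 32), dtype=complex)
X_pad[2:30, 2:30] = np.fft.fftshift(X)
x_up = np.fft.ifft2(np.fft.ifftshift(X_pad), norm="ortho")

# The inverse transform is unitary too, so the norm is still preserved:
assert np.isclose(np.linalg.norm(x), np.linalg.norm(x_up))
```

(The upsampled array is complex here because the sketch does not enforce conjugate symmetry of the padded spectrum; a real-valued version would need to handle that.)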
I guess the zero padding is part of the pre-processing used to obtain the input x, and the paper says "M is the dimensionality of x", which would make M = 32x32?
While this wouldn't impact training/optimization, since these terms are constants, I agree that intuitively the absolute likelihood computation seems "wrong", as you pointed out, since the original x is 28x28.
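The "constants don't affect optimization" point can be sketched with a toy loss; the quadratic stand-in for the model NLL and its minimizer are hypothetical:

```python
import numpy as np

def nll(theta):
    """Toy stand-in for the model NLL in nats; minimized at theta = 3."""
    return (theta - 3.0) ** 2

def reported(theta, dims, n_bins=256.0):
    # add the dimension-dependent penalty and divide by the dimension
    # count, as in the bits/dim bookkeeping
    return (nll(theta) + np.log(n_bins) * dims) / (np.log(2.0) * dims)

thetas = np.linspace(0.0, 6.0, 601)
best_raw = thetas[np.argmin(nll(thetas))]
best_28 = thetas[np.argmin(reported(thetas, 28 * 28))]
best_32 = thetas[np.argmin(reported(thetas, 32 * 32))]

# Shifting by a constant and rescaling by a positive constant do not
# move the minimizer, so training is unaffected either way:
assert best_raw == best_28 == best_32
```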