Derivative of a Function That Includes @diff Macro #670
Comments
Dear Barışcan, could you try the following:
Here is an MWE that I tried with the same logic, but it does not reproduce the error:
Hello again, I appreciate your comments, which helped me understand several issues. When I do not use the batchnorm function, I am able to take the derivative of my loss function.
Now I conclude that the error is related to the batch normalization layer. Without batchnorm, the optimization of the model goes fine; however, the final performance is worse than that of the model which uses batch normalization (in PyTorch). Therefore, I cannot obtain exactly the same results as the official GON code (https://github.com/BariscanBozkurt/GON/blob/master/Variational-GON.py). How can I arrange my code (or the modified example code I provided in this comment) to take the derivative of a loss that uses a model with a batch normalization layer?
In the above minimal working example, I use
I get the following error (which is very similar to the previous one):
I strongly believe that the issue is related to the affine implementation of the batchnorm function in Knet. Here I will report all the observations I made, as well as my custom solution, although I might be wrong on some points (please correct me). First of all, in our MWE, if I do not feed
Since I do not use
(see the batchnorm implementation in Knet.jl/src/ops20/batchnorm.jl, lines 19 to 54 at commit 0485870)
However, what I want is the following:
Therefore, I wrote a custom batch normalization function based on Knet's batchnorm.
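Roughly, it is structured like this (a minimal sketch, assuming 4-D WHCN input and a single params vector that stores the per-channel scale followed by the bias; the exact function may differ in details):

```julia
using Knet

# Custom batchnorm sketch: call Knet's batchnorm WITHOUT the params argument
# (normalization only, no built-in affine part), then apply the affine
# transform manually with an ordinary learnable parameter array.
function mybatchnorm(x, moments, params)
    c = size(x, 3)                                 # channel dimension for WHCN tensors
    gamma = reshape(params[1:c],    (1, 1, c, 1))  # learnable scale
    beta  = reshape(params[c+1:2c], (1, 1, c, 1))  # learnable bias
    xhat  = batchnorm(x, moments)                  # normalization only
    return gamma .* xhat .+ beta
end
```

Here `moments` is created with `bnmoments()` as usual.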
In this function, I feed my learnable parameter (holding the scale and bias) directly, so the affine transform is applied outside Knet's built-in batchnorm.
I could not figure out the reason for the error I get with the affine implementation of the batchnorm function in Knet. I hope my observations help us figure it out together. I will keep working on it, and I would appreciate any comments that help clarify the main cause of the issue.
Hello.
I am currently using Julia version 1.6.3 on the following platform: OS: Linux (x86_64-pc-linux-gnu), CPU: Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, GPU: CuDevice(0): Tesla T4. I am trying to implement a variational autoencoder called a Gradient Origin Network (GON). GONs are introduced as a generative model that does not require encoders or hypernetworks. Assume a variational GON model called F. First, a zero vector z_0 is passed through the model F, and the latent vector is initialized as the negative gradient of the loss with respect to this zero vector. Therefore, the latent space is determined by only one gradient step. Let us call this latent vector z. Then, the network parameters are optimized using the loss computed on the reconstruction F(z).
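In symbols (my paraphrase of the two steps):

z = -∇_{z_0} L(F_θ(z_0), x), with z_0 = 0
θ* = argmin_θ L(F_θ(z), x)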
I am currently performing my experiments on the MNIST dataset, where I linearly interpolate the images to a size of 32x32. The decoding and reparametrization functions are defined in the notebook linked below; theta is a vector of model weights.
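Roughly, they look like this (a minimal sketch: layer sizes, the number of layers, and the helper names are illustrative assumptions, and the noise generation may need adjusting for GPU arrays; the actual code is in the notebook):

```julia
using Knet, Random

# Reparametrization trick: sample = mu + exp(logvar/2) .* eps with eps ~ N(0, I).
function reparametrize(mu, logvar)
    eps = randn!(similar(value(mu)))   # noise with the same shape/array type as mu
    return mu .+ exp.(logvar ./ 2) .* eps
end

# Decoder sketch: the first two affine maps produce mu and logvar, and the sample
# is mapped to 32x32 = 1024 pixels in [0, 1]. theta holds weight/bias arrays in order.
function decode(theta, z)
    mu     = theta[1] * z .+ theta[2]
    logvar = theta[3] * z .+ theta[4]
    s      = reparametrize(mu, logvar)
    h      = relu.(theta[5] * s .+ theta[6])
    xhat   = sigm.(theta[7] * h .+ theta[8])
    return xhat, mu, logvar
end
```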
The loss is binary cross-entropy plus KL divergence; the exact code is in the notebook.
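Schematically (a sketch matching the decode sketch above; the reduction and any weighting in the real code may differ, and vae_loss is a hypothetical name):

```julia
# Binary cross-entropy between reconstruction and input, plus the analytic KL
# divergence of the diagonal Gaussian N(mu, exp(logvar)) from N(0, I).
function vae_loss(xhat, x, mu, logvar)
    eps = 1f-7                                # numerical safety for the logs
    bce = -sum(x .* log.(xhat .+ eps) .+ (1 .- x) .* log.(1 .- xhat .+ eps))
    kl  = -0.5f0 * sum(1 .+ logvar .- mu .^ 2 .- exp.(logvar))
    return (bce + kl) / size(x, ndims(x))     # average over the batch dimension
end
```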
Since GON has two steps (1: use the gradient w.r.t. the origin to determine the latent vector z; 2: use the latent vector for reconstruction), I need to track all gradients w.r.t. the model weights through both steps. Therefore, I wrote a decoding function and a loss function specifically for training.
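They are structured roughly like this (a sketch reusing the decode and vae_loss sketches above; the latent size nz, the inner_loss helper, and the assumption that x is flattened to 1024 x batchsize are mine, and the real code is in the notebook):

```julia
using Knet

# Loss of the first pass, starting from the zero latent z0.
function inner_loss(theta, z0, x)
    xhat, mu, logvar = decode(theta, z0)
    return vae_loss(xhat, x, mu, logvar)
end

# Step 1: differentiate the loss w.r.t. the zero latent on an inner tape and take
# the negative gradient as z. Step 2: decode from z. The outer @diff used for
# training must then differentiate through this inner @diff (a second-order derivative).
function decode_train(theta, x; nz=32)              # nz: latent size (assumed)
    z0 = Param(zeros(Float32, nz, size(x, 2)))      # the origin (move to GPU if needed)
    t  = @diff inner_loss(theta, z0, x)             # inner tape (step 1)
    z  = -grad(t, z0)                               # latent code from one gradient step
    return decode(theta, z)                         # step 2: reconstruct from z
end

function loss_train(theta, x)
    xhat, mu, logvar = decode_train(theta, x)
    return vae_loss(xhat, x, mu, logvar)
end
```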
However, I am not able to take the gradient of the loss_train(theta, x) function. I get the following error when I use the @diff macro of the AutoGrad package. How can I train this model, which requires a second-order derivative (I need the derivative of the function decode_train)?
To reproduce this result, you can run the following notebook :
https://github.com/BariscanBozkurt/Gradient-Origin-Networks/blob/main/GON_Implementation_Issue.ipynb
My code:
@diff loss_train(theta, x)
The error is: