questions #2
Hello, I haven't yet implemented ReF-ER in pytorch/tensorflow.
Do you mean computing the gradient for one sample of the mini-batch at a time? With the disclaimer that I have never implemented it myself, my first guess on how I would implement ReF-ER in pytorch/tensorflow would be:
I know that implementations of PPO use torch.clamp or tf.clip_by_value, but that only works for the off-policy policy gradient.
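For reference, here is a rough PyTorch-style sketch of that idea (all names are illustrative, and this is one reading of the rules rather than the paper's reference implementation): compute the per-sample importance ratio, keep the policy-gradient term only for near-policy samples, and let far-policy samples contribute only through the KL penalization weighted by β.

```python
import torch

def refer_style_loss(logp_new, logp_old, pg_loss, kl_penalty, beta, c_max):
    """Sketch of a ReF-ER-style mini-batch loss (illustrative names only).

    logp_new, logp_old: log pi(a|s) under the current / behavior policy,
        already summed over action dimensions, shape [batch].
    pg_loss: the usual off-policy policy-gradient loss per sample, shape [batch].
    kl_penalty: KL(behavior || current) per sample, shape [batch].
    """
    # Per-sample importance ratio rho = pi_new(a|s) / pi_old(a|s).
    rho = torch.exp(logp_new - logp_old.detach())

    # Near-policy mask: 1/c_max < rho < c_max. Far-policy samples are
    # zeroed out of the policy-gradient term, so no gradient flows
    # through them via this term.
    near = ((rho > 1.0 / c_max) & (rho < c_max)).float()

    # Far-policy samples still contribute through the KL penalization,
    # weighted by (1 - beta).
    loss = beta * (near * pg_loss).mean() + (1.0 - beta) * kl_penalty.mean()
    return loss, rho.detach()
```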
Sorry to reply so late; I have been working on the code these days. Thank you so much for your advice. After following your advice, the code works fine.
Hello, I did not understand point 1. What would be axis=1? What would be axis=0? Why would you compute the mean of the importance weights? Regarding point 2, I assume this is about updating the penalization coefficient. In the paper, I wrote that I store the most recently computed importance ratio for each experience in the RM. Each time an experience is sampled for a mini-batch gradient update, the associated importance ratio is updated. I do something similar to employ Retrace without having to train on episodes rather than on steps.
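To make that bookkeeping concrete, here is a minimal sketch of a replay memory that stores the most recent importance ratio next to each experience, and of a β-annealing step; the class, field names, and constants are hypothetical placeholders, not code from the paper.

```python
# Illustrative replay-memory bookkeeping (hypothetical names and constants).
class ReplayMemory:
    def __init__(self):
        # Each entry keeps the most recently computed importance ratio
        # alongside the transition itself.
        self.storage = []   # list of dicts: {"state", "action", ..., "rho"}

    def update_ratios(self, indices, new_ratios):
        # Called after every mini-batch gradient step: refresh the stored
        # ratio of each sampled experience with the freshly computed one.
        for idx, rho in zip(indices, new_ratios):
            self.storage[idx]["rho"] = float(rho)

    def far_policy_fraction(self, c_max):
        # Fraction of stored experiences whose ratio falls outside
        # [1/c_max, c_max], i.e. the "far-policy" samples.
        n_far = sum(1 for e in self.storage
                    if e["rho"] > c_max or e["rho"] < 1.0 / c_max)
        return n_far / max(len(self.storage), 1)


def update_beta(beta, far_fraction, target_D, eta=1e-4):
    # Anneal the penalization coefficient: decrease beta when too many
    # samples are far-policy (more penalization), otherwise push it back
    # toward 1 (more policy gradient). eta and target_D are placeholders.
    if far_fraction > target_D:
        return (1.0 - eta) * beta
    return (1.0 - eta) * beta + eta
```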
hello,
What version of tf are you using? Sorry, I did not think it through. |
Hi, sorry I missed this.
hello,
2. Is my ratio ρ calculation correct? And can I compare it directly with c_max?

```python
self.ratio = tf.reduce_prod(self.a_new_noise_policy.prob(self.action), axis=1) / \
             tf.reduce_prod(self.a_old_noise_policy.prob(self.action), axis=1)
self.kl = tf.reduce_sum(tf.distributions.kl_divergence(self.a_old_noise_policy, self.a_new_noise_policy),
```
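For comparison, the same ratio can be formed in log-space, which is numerically safer than multiplying per-dimension probabilities, and the near-policy test in ReF-ER checks both bounds, not c_max alone. A small PyTorch-style sketch with made-up shapes and values:

```python
import torch
from torch.distributions import Normal

# Hypothetical example: per-dimension Gaussian policies over a 3-D action.
new_pi = Normal(torch.zeros(5, 3), torch.ones(5, 3))
old_pi = Normal(0.1 * torch.ones(5, 3), torch.ones(5, 3))
action = torch.randn(5, 3)

# Same quantity as reduce_prod(prob) / reduce_prod(prob), but in log-space,
# which avoids underflow when the action has many dimensions.
log_rho = (new_pi.log_prob(action) - old_pi.log_prob(action)).sum(dim=1)
rho = torch.exp(log_rho)

# Near-policy test: compare rho against both c_max and 1/c_max.
c_max = 4.0
near_policy = (rho < c_max) & (rho > 1.0 / c_max)
```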
Hi, sorry for the delay.
Hello, is there implementation code in Python for "Remember and Forget for Experience Replay Supplementary Material"? I had trouble with the gradient calculation. Is it right for me to compute the gradients one by one? Looking forward to your reply, thanks a lot.