Question about code #1

Open
gunshi opened this issue Apr 26, 2019 · 2 comments
Comments

gunshi commented Apr 26, 2019

Hi, thank you for your code which helped me understand some of the concepts from the paper better.
I had a few remaining questions about the implementation; it would be great if you could clarify them.

```python
g_optim.zero_grad()
autograd.backward(-z_i,             # why minus, and why z_i?
                  grad_tensors=svgd)
g_optim.step()
```

I was a little confused about why we take the gradient with respect to -z_i rather than z_i in the lines above, and also why we compute the kernel over two different batches of particles (z_i and z_j) rather than among the particles of a single batch. Is that to help with something like training stability, for example?
Thanks!
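(For context, here is a minimal sketch of how an amortized-SVGD generator step can be wired up with autograd.backward and grad_tensors. It assumes an RBF kernel and that log_p(z) returns per-particle log-densities; the names G, noise, log_p, g_optim, rbf_kernel, and svgd_step are illustrative placeholders, not taken from this repo. Here phi is defined as the ascent direction of log p, so the minus sign on the particles is paired with a plain descent optimizer.)

```python
import torch
from torch import autograd

def rbf_kernel(z, h=1.0):
    """RBF kernel matrix over one batch of particles, plus the repulsive term."""
    diff = z.unsqueeze(1) - z.unsqueeze(0)               # (n, n, d): z_i - z_j
    k = torch.exp(-(diff ** 2).sum(-1) / (2 * h * h))    # (n, n)
    # sum_j grad_{z_j} k(z_j, z_i) = sum_j k_ij * (z_i - z_j) / h^2
    grad_k = (k.unsqueeze(-1) * diff).sum(1) / (h * h)   # (n, d)
    return k, grad_k

def svgd_step(G, noise, log_p, g_optim, h=1.0):
    """One amortized-SVGD update of the generator G (illustrative only)."""
    z = G(noise)                                          # particles, (n, d)
    z_fixed = z.detach().requires_grad_(True)
    score = autograd.grad(log_p(z_fixed).sum(), z_fixed)[0]   # d/dz log p(z)

    k, grad_k = rbf_kernel(z_fixed, h)
    phi = (k @ score + grad_k) / z.shape[0]               # SVGD ascent direction

    g_optim.zero_grad()
    # Vector-Jacobian product: accumulates -(dz/dtheta)^T phi into G's grads,
    # so a plain *descent* optimizer step moves the particles along +phi.
    autograd.backward(-z, grad_tensors=phi.detach())
    g_optim.step()
```

Whether the extra minus sign is needed in the repo's code depends on how its svgd tensor is signed, which is exactly the question above.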

neale commented Jun 28, 2019

@gunshi

  • You can compute the kernel among any number of particles. But since the particles are essentially random samples, it may make sense to approximate the posterior with more draws from the generator.
  • I think the -z_i term is wrong (see equation (10) from https://arxiv.org/pdf/1707.06626.pdf).
  • Still, the term autograd.backward(z_i, grad_tensors=svgd) is confusing. svgd is a tensor, so we need to compute the vector-Jacobian product to take gradients. Probably we want to update z_j with respect to the SVGD loss -- see the PyTorch autograd docs (there is a small grad_tensors demo at the end of this comment).

Empirically, the -z_i term didn't work for me; it maximized the loss, as you would expect. Flipping it works fine. I also found that computing k(z_j, z_j) and then updating w.r.t. z_j converges faster to the MAP estimate, which (for me) is not desirable.
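(A tiny standalone check of the grad_tensors semantics mentioned above, with made-up tensors:)

```python
import torch

# backward(y, grad_tensors=v) accumulates J^T v into x.grad, where J = dy/dx.
x = torch.randn(3, requires_grad=True)
y = 2.0 * x                      # J = 2 * I
v = torch.ones(3)
torch.autograd.backward(y, grad_tensors=v)
print(x.grad)                    # tensor([2., 2., 2.]) == J^T v

# Negating the output flips the sign of the accumulated gradient, which is why
# backward(-z_i, grad_tensors=svgd) vs. backward(z_i, grad_tensors=svgd)
# matters once the optimizer takes its descent step.
```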

mokeddembillel commented Apr 10, 2021

@gunshi @neale

  • I think he uses -z_i because the update is gradient ascent rather than gradient descent, see equation 10 (a short note on the sign convention follows this list).
  • The problem for me is that he uses the X coordinate to predict the Y coordinate of the particle, which I find odd: how can we mix two different axes? (See z_flow and data_energy in d_learn.)
  • What should we do if we want to sample particles with both coordinates (X and Y) instead of just Y?
  • He passes the observed data to the generator together with the noise. Why do we need to do that, especially since it is not part of the paper?
  • Something that confused me even more: in the paper the discriminator takes only one input (one particle), so why does he use two inputs (two particles)?
  • The way he updates the parameters, among other things, also differs from what is done in the paper.
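(For reference, my reading of the sign convention discussed in the first bullet: the standard SVGD direction, of which the cited equation should be a variant, and the amortized generator update are

$$\phi^*(z) = \mathbb{E}_{z' \sim q}\big[\, k(z', z)\, \nabla_{z'} \log p(z') + \nabla_{z'} k(z', z) \,\big],$$

$$\theta \leftarrow \theta + \epsilon \sum_i \Big(\tfrac{\partial f(\xi_i; \theta)}{\partial \theta}\Big)^{\!\top} \phi^*(z_i), \qquad z_i = f(\xi_i; \theta).$$

Since $\phi^*$ is an ascent direction, implementing this update with a descent-style optimizer means flipping a sign somewhere, either on the particles passed to backward or on grad_tensors, which is presumably where the -z_i comes from.)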

I tried to implement a version that follows the paper exactly, but for now it is still not working, even though I did everything the way they did in the paper. Here is the code if you want to take a look: https://github.com/mokeddembillel/Amortized-SVGD-GAN
