Question about code #1

Open
gunshi opened this issue Apr 26, 2019 · 2 comments
Comments

gunshi commented Apr 26, 2019

Hi, thank you for your code which helped me understand some of the concepts from the paper better.
I had a few remaining questions about the implementation; it would be great if you could clarify them.

```python
g_optim.zero_grad()
autograd.backward(-z_i,             # why minus, and why z_i?
                  grad_tensors=svgd)
g_optim.step()
```

I was a little confused about why we take the gradient with respect to -z_i rather than z_i in the lines above, and also why we compute the kernel over two different batches of particles (z_i and z_j) rather than among the particles of a single batch. Is that to help with something like training stability, for example?
Thanks!
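(For context, here is a minimal sketch of how an amortized-SVGD generator step can be wired up with autograd.backward and grad_tensors. It assumes an RBF kernel and that log_p(z) returns per-particle log-densities; the names G, noise, log_p, g_optim, rbf_kernel, and svgd_step are illustrative placeholders, not taken from this repo. Here phi is defined as the ascent direction of log p, so the minus sign on the particles is paired with a plain descent optimizer.)

```python
import torch
from torch import autograd

def rbf_kernel(z, h=1.0):
    """RBF kernel matrix over one batch of particles, plus the repulsive term."""
    diff = z.unsqueeze(1) - z.unsqueeze(0)               # (n, n, d): z_i - z_j
    k = torch.exp(-(diff ** 2).sum(-1) / (2 * h * h))    # (n, n)
    # sum_j grad_{z_j} k(z_j, z_i) = sum_j k_ij * (z_i - z_j) / h^2
    grad_k = (k.unsqueeze(-1) * diff).sum(1) / (h * h)   # (n, d)
    return k, grad_k

def svgd_step(G, noise, log_p, g_optim, h=1.0):
    """One amortized-SVGD update of the generator G (illustrative only)."""
    z = G(noise)                                          # particles, (n, d)
    z_fixed = z.detach().requires_grad_(True)
    score = autograd.grad(log_p(z_fixed).sum(), z_fixed)[0]   # d/dz log p(z)

    k, grad_k = rbf_kernel(z_fixed, h)
    phi = (k @ score + grad_k) / z.shape[0]               # SVGD ascent direction

    g_optim.zero_grad()
    # Vector-Jacobian product: accumulates -(dz/dtheta)^T phi into G's grads,
    # so a plain *descent* optimizer step moves the particles along +phi.
    autograd.backward(-z, grad_tensors=phi.detach())
    g_optim.step()
```

Whether the extra minus sign is needed in the repo's code depends on how its svgd tensor is signed, which is exactly the question above.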

neale commented Jun 28, 2019

@gunshi

  • You can compute the kernel among any number of particles. But since the particles are essentially random samples, it may make sense to approximate the posterior with more draws from the generator.
  • I think the -z_i term is wrong (see equation (10) from https://arxiv.org/pdf/1707.06626.pdf).
  • Still, the term autograd.backward(z_i, grad_tensors=svgd) is confusing. svgd is a tensor, so we need to compute the vector-Jacobian product to take gradients. Probably we want to update z_j with respect to the SVGD loss -- see the PyTorch autograd docs (there is a small grad_tensors demo at the end of this comment).

Empirically, the -z_i term didn't work for me; it maximized the loss, as you would expect. Flipping it works fine. I also found that computing k(z_j, z_j) and then updating w.r.t. z_j converges faster to the MAP estimate, which (for me) is not desirable.
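(A tiny standalone check of the grad_tensors semantics mentioned above, with made-up tensors:)

```python
import torch

# backward(y, grad_tensors=v) accumulates J^T v into x.grad, where J = dy/dx.
x = torch.randn(3, requires_grad=True)
y = 2.0 * x                      # J = 2 * I
v = torch.ones(3)
torch.autograd.backward(y, grad_tensors=v)
print(x.grad)                    # tensor([2., 2., 2.]) == J^T v

# Negating the output flips the sign of the accumulated gradient, which is why
# backward(-z_i, grad_tensors=svgd) vs. backward(z_i, grad_tensors=svgd)
# matters once the optimizer takes its descent step.
```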

mokeddembillel commented Apr 10, 2021

@gunshi @neale

  • I think he uses -z_i because the update is gradient ascent rather than gradient descent, see equation 10 (a short note on the sign convention follows this list).
  • The problem for me is that he uses the X coordinate to predict the Y coordinate of the particle, which I find odd: how can we mix two different axes? (See z_flow and data_energy in d_learn.)
  • What should we do if we want to sample particles with both coordinates (X and Y) instead of just Y?
  • He passes the observed data to the generator together with the noise. Why do we need to do that, especially since it is not part of the paper?
  • Something that confused me even more: in the paper the discriminator takes only one input (one particle), so why does he use two inputs (two particles)?
  • The way he updates the parameters, among other things, also differs from what is done in the paper.
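(For reference, my reading of the sign convention discussed in the first bullet: the standard SVGD direction, of which the cited equation should be a variant, and the amortized generator update are

$$\phi^*(z) = \mathbb{E}_{z' \sim q}\big[\, k(z', z)\, \nabla_{z'} \log p(z') + \nabla_{z'} k(z', z) \,\big],$$

$$\theta \leftarrow \theta + \epsilon \sum_i \Big(\tfrac{\partial f(\xi_i; \theta)}{\partial \theta}\Big)^{\!\top} \phi^*(z_i), \qquad z_i = f(\xi_i; \theta).$$

Since $\phi^*$ is an ascent direction, implementing this update with a descent-style optimizer means flipping a sign somewhere, either on the particles passed to backward or on grad_tensors, which is presumably where the -z_i comes from.)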

I tried to implement a version that follows the paper exactly, but for now it is still not working, even though I did everything the way they did in the paper. Here is the code if you want to take a look: https://github.com/mokeddembillel/Amortized-SVGD-GAN
