Why not output samples from student to teacher? #12

Open
neverjoe opened this issue Mar 27, 2018 · 6 comments

Comments

@neverjoe

cross_entropy = discretized_mix_logistic_loss(target_distribution, student_samples)

@neverjoe
Author

I think target_distribution should be computed from the student's samples, i.e. the student's output samples should be fed to the teacher.

@vincentherrmann
Owner

I don't really get what you mean. Do you think we should sample many inputs for the teacher network from the mu and s output of the student network? Then we would have to evaluate the whole teacher network multiple times, which would be very computationally expensive. Also, if we sample the student output, we lose the conditioning on the previous time samples, so I don't think it makes sense. The mu and s outputs of the student network exist only so that the distributions of the student and the teacher network can be compared.

@neverjoe
Author

I see your point, and I have the same concern, but the paper says we need to estimate the distributions of the teacher and the student by sampling. By the way, target_distribution and student_samples have different shapes. Is that a bug? Have you gotten any reasonable results?

@vincentherrmann
Owner

In the paper it says that x = g(z), where z is the input noise. I think the whole point of equations (9)-(13) in the paper is to save us from having to calculate the teacher network multiple times.
The target_distribution is a parameterization of the teacher distribution, and student_samples are multiple samples from the student distribution, so they should have different shapes. For me it seems to work reasonably well, although the output of the parallel wavenet is noisier than the original one (but I haven't implemented the additional loss terms yet, so that might help).
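To make the shape question concrete, here is a minimal sketch of a sampled cross-entropy estimate for a single time step. It is not the code from this repository: the names sample_logistic, mixture_log_pdf, cross_entropy_estimate and teacher_params are hypothetical, the numbers are made up, and it uses a continuous mixture-of-logistics density in plain Python instead of the discretized loss on tensors. It only illustrates why a teacher parameterization (K mixture components) and a set of student samples (N draws) naturally have different shapes.

```python
import math
import random


def sample_logistic(mu, s, n, rng):
    """Draw n samples from Logistic(mu, s) via x = mu + s * logit(u)."""
    samples = []
    for _ in range(n):
        # Keep u strictly inside (0, 1) so both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        samples.append(mu + s * (math.log(u) - math.log(1.0 - u)))
    return samples


def logistic_log_pdf(x, mu, s):
    """Log-density of Logistic(mu, s), written to stay finite for large |x - mu|."""
    a = abs(x - mu) / s
    return -a - math.log(s) - 2.0 * math.log1p(math.exp(-a))


def mixture_log_pdf(x, teacher_params):
    """Log-density of a mixture of logistics given as (weight, mu, s) tuples."""
    terms = [math.log(w) + logistic_log_pdf(x, mu, s) for w, mu, s in teacher_params]
    m = max(terms)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in terms))


def cross_entropy_estimate(teacher_params, student_samples):
    """Monte Carlo estimate of H(P_S, P_T) = -E_{x ~ P_S}[log p_T(x)]."""
    total = sum(mixture_log_pdf(x, teacher_params) for x in student_samples)
    return -total / len(student_samples)


rng = random.Random(0)

# Assumed student output for one time step: a single logistic (mu, s).
student_mu, student_s = 0.1, 0.05
student_samples = sample_logistic(student_mu, student_s, 2000, rng)  # N samples

# Assumed teacher output for the same time step: K = 2 mixture components.
teacher_params = [(0.7, 0.1, 0.05), (0.3, -0.4, 0.1)]

h_st = cross_entropy_estimate(teacher_params, student_samples)
h_s = math.log(student_s) + 2.0  # closed-form entropy of a logistic distribution

print("teacher components:", len(teacher_params), "student samples:", len(student_samples))
print("estimated KL(P_S || P_T):", h_st - h_s)
```

Only the cross-entropy term needs samples here. The entropy of a logistic distribution has the closed form ln(s) + 2, and the teacher is evaluated once to obtain its parameters, after which many student samples can be scored against them.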

@neverjoe
Author

neverjoe commented Mar 27, 2018

Great! I think the power loss and the contrastive loss are very important for good voice quality.

@neverjoe
Author

Can you show me your loss plot? My loss hasn't converged for days.


2 participants