
Why is the output of WaveNet generate() always the same? #11

Open
littleTwelve opened this issue Mar 23, 2018 · 18 comments

@littleTwelve

When I use the code to train a model, it seems fine. However, when I use the trained model to generate data, I get a sequence of numbers that all have the same value. For example, if I input a 1*5000 vector [2,99,34,...,45, 27,33] and then call generate(), I get [2,99,34,...,45, 27,33,33,33,33,...,33,33,33]. As you can see, the generated sequence consists of a single repeated value, and what is stranger is that this value equals the last number of the input. I can't find what's wrong with the code; I would appreciate it if someone could give me some advice.
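For context, an autoregressive generation loop of the kind described here typically has the following shape (a hypothetical sketch, not this repository's generate(); the dummy "model" below simply echoes its last input sample, which reproduces the repeated-value symptom):

```python
import torch
import torch.nn.functional as F

def generate(model, seed, num_samples):
    """Hypothetical autoregressive loop: run the model on the sequence
    so far, take the most likely class as the next sample, append it."""
    sequence = list(seed)
    for _ in range(num_samples):
        x = torch.tensor(sequence, dtype=torch.long).unsqueeze(0)
        logits = model(x)                  # (1, num_classes) for the next step
        sequence.append(int(logits.argmax(dim=1)))
    return sequence

# A dummy "model" that always predicts its last input sample. Feeding it
# [2, 99, 34, 45, 27, 33] yields an endless run of 33s, exactly the
# symptom reported above.
dummy = lambda x: F.one_hot(x[0, -1:], num_classes=256).float()
print(generate(dummy, [2, 99, 34, 45, 27, 33], 4))
# [2, 99, 34, 45, 27, 33, 33, 33, 33, 33]
```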

@vincentherrmann
Owner

Have you tried the generate_fast() method? I think there is probably a bug in the generate() function. I will try to fix it, but you shouldn't really be using it anyway since it's painfully slow.

@littleTwelve
Author

Thank you for your reply! I will try it later. Actually, I rewrote your code based on my understanding of it, so I use generate() just because I can understand it clearly. As for generate_fast(), I don't understand it well. You said there is probably a bug in the generate() function; although I can't use generate() to get what I want, I have no idea what's wrong with it. Could you explain it more clearly?

@vincentherrmann
Owner

The problem in the generate() function was simply that I didn't do one-hot encoding of the input. I have fixed it now, but let me know if there's anything I can do to help your understanding!
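The missing step can be illustrated with a minimal sketch (hypothetical helper and shapes, not the repository's exact fix): each quantized sample index must be expanded into a one-hot vector over the quantization classes before being fed back into the network.

```python
import torch

def one_hot_encode(samples, num_classes=256):
    """Turn a 1-D tensor of quantized sample indices (values in
    [0, num_classes)) into a (1, num_classes, length) tensor, the
    channel layout that nn.Conv1d expects."""
    length = samples.size(0)
    encoded = torch.zeros(num_classes, length)
    encoded.scatter_(0, samples.unsqueeze(0), 1.0)  # set class row to 1
    return encoded.unsqueeze(0)  # add the batch dimension

samples = torch.tensor([2, 99, 34, 45, 27, 33])
x = one_hot_encode(samples)
print(x.shape)      # torch.Size([1, 256, 6])
print(x[0, 33, 5])  # tensor(1.) -- the last sample's class is hot
```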

@littleTwelve
Author

Thanks again! I just found that I had a big misunderstanding of WaveNet and I'm trying to correct it, so I'm afraid I may only be able to discuss the generate_fast() method with you in one or two days. I'm sorry about that.

@littleTwelve
Author

I wonder why you need dilate() in wavenet_modules.py, rather than just using the 'dilation' parameter of nn.Conv1d?

@littleTwelve
Author

In your code, there is '(N, C, L), where N is the input dilation', but according to nn.Conv1d, 'N' is the batch size, so I don't understand why N is the input dilation.

@vincentherrmann
Owner

Here I answered the question regarding the dilate() function. The convolution is executed in parallel for every index in the first dimension, which in the WaveNet architecture is both the dilation and the batch number. So, to be exact, N = dilation * minibatch_count.
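The reshaping trick can be sketched like this (a simplified illustration of the idea, with assumed shapes; not the repository's exact dilate()): a (N, C, L) tensor is rearranged so that the dilation factor folds into the batch dimension, turning a dilated convolution into an ordinary one over interleaved sub-sequences.

```python
import torch

def dilate(x, new_dilation, old_dilation=1):
    """Reshape x of shape (N, C, L) so the first dimension becomes
    N * factor, where factor = new_dilation / old_dilation. A plain
    convolution on the result is then equivalent to a dilated
    convolution on the original signal."""
    n, c, l = x.size()
    factor = new_dilation // old_dilation
    x = x.permute(1, 2, 0).contiguous()        # (C, L, N)
    x = x.view(c, l // factor, n * factor)     # fold factor into batch
    return x.permute(2, 0, 1).contiguous()     # (N*factor, C, L/factor)

x = torch.arange(8.0).view(1, 1, 8)  # one batch, one channel, 8 steps
y = dilate(x, 2)
print(y.shape)  # torch.Size([2, 1, 4])
# y[0, 0] = [0, 2, 4, 6] and y[1, 0] = [1, 3, 5, 7]: the even and odd
# time steps become two "batch" entries.
```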

@littleTwelve
Author

Thanks! I also have a question about the item length. In your code, item_length = receptive_field + output_length - 1, and I noticed your output_length is always a small number like 32, 48, or 16. What I used to do at the training stage was set item_length to a large number, for example 21600 (because I seem to remember DeepMind mentioning in their paper that they need 2 minutes of data to generate 1 second of data), which would correspond to a very large output_length or a deeper WaveNet in your code. Then I just use the output whose length is 17507 (if receptive_field is 4093, then 21600 - 4093 = 17507) for the cross entropy. I want to know whether this idea is reasonable.

@vincentherrmann
Owner

vincentherrmann commented Mar 24, 2018

Intuitively it makes sense for the output_length to have the same order of magnitude as the receptive field of the model. Currently I use an output length of 4096 most of the time (you can see the stuff I'm working on in the parallel branch). If the output length is longer, the computation time increases linearly, and it would be better to use bigger mini-batches instead.
I'm not sure which passage of the paper you're referring to, though...
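The relation between item_length, receptive field, and output length discussed above can be written out explicitly, using the numbers mentioned in this thread:

```python
def item_length(receptive_field, output_length):
    """item_length = receptive_field + output_length - 1: each of the
    output_length predicted samples needs a full receptive field of
    context, and consecutive predictions share all but one sample.
    Only the last output_length outputs enter the cross-entropy loss."""
    return receptive_field + output_length - 1

print(item_length(4093, 32))    # 4124 -- a typical small output_length
print(item_length(4093, 4096))  # 8188 -- the setting mentioned above
```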

@littleTwelve
Author

littleTwelve commented Mar 24, 2018

Wow! You are awesome! I happen to be learning how to make a conditional WaveNet, so I think I will bother you a lot in the next few months. Could you tell me what your conditioning input is? Types of music, or something else?

@vincentherrmann
Owner

I'm trying to make the model learn the structure of a piece/song and condition the WaveNet on a local time embedding. Hopefully this will allow it to generate longer and more musically interesting sequences. It's a bit complicated; if it works, I will write a blog post about it.

@littleTwelve
Author

That's great! I am looking forward to it.

@littleTwelve
Author

It seems that the problem I mentioned two days ago has nothing to do with the generating function. I can use your code to generate a sine wave very well, but if I use my own dataset, I get nothing but a straight line, even though the training loss is 1e-08.

@littleTwelve
Author

littleTwelve commented Mar 28, 2018

Is there any trick to training a WaveNet? No matter how I change the WaveNet's parameters, I get nothing but a straight line. Do you have any suggestions? @vincentherrmann

@HTT1995

HTT1995 commented Mar 25, 2019

I had the same problem, not only with the generate function but also with the training result. I always get a raw audio output such as [20 20 20 20 20 20 20 20...], and I don't know why. I have checked my code very carefully, but it doesn't work. I would very much appreciate it if someone could help me.

@littleTwelve
Author

I think maybe you could increase the values of mu, the residual channels, and the skip channels. For example, mu=64, skip=64 and residual=512. I solved my problem just this way.
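The suggestion above might look like this as a hyperparameter set (the names are assumptions modeled on common WaveNet implementations, not necessarily this repository's exact constructor arguments); as a sanity check, the receptive-field arithmetic reproduces the 4093 quoted earlier in the thread:

```python
# Hypothetical hyperparameter set following the advice above.
hparams = {
    "classes": 64,            # "mu": number of quantization classes
    "skip_channels": 64,
    "residual_channels": 512,
    "layers": 10,             # dilations 1, 2, 4, ..., 512 per block
    "blocks": 4,
}

# With kernel size 2, stacked dilated convolutions give a receptive
# field of blocks * (2**layers - 1) + 1 samples.
receptive_field = hparams["blocks"] * (2 ** hparams["layers"] - 1) + 1
print(receptive_field)  # 4093
```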

@HTT1995

HTT1995 commented Mar 26, 2019

Thank you for your advice. I have tried different combinations of these parameters, but it doesn't seem to work. The input [batch_size, classes, length] goes through all the conv layers, and then I find that each column of the one-hot output [batch_size, classes, output_length] is very similar, so after de-one-hot, the raw audio output [batch_size, 1, output_length] is always the same. Do you have any other suggestions?

@ZXY1231

ZXY1231 commented Dec 1, 2020

Hello, I ran into the same problem as yours. Did you manage to solve it?
