
max_len doesnt crop samples properly #290

Open
FormMe opened this issue Oct 25, 2024 · 3 comments

Comments


FormMe commented Oct 25, 2024

Hi. It seems that max_len doesn't work properly.

mel_len should be mel_input_length_all.max(), not mel_input_length_all.min().
As written, the crop length is driven by the minimum length in the batch, so max_len only takes effect when even the shortest sample in the batch is longer than max_len.

mel_input_length_all = accelerator.gather(mel_input_length)  # for balanced load
mel_len = min([int(mel_input_length_all.min().item() / 2 - 1), max_len // 2])
mel_len_st = int(mel_input_length.min().item() / 2 - 1)
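To see the effect numerically, here is a minimal sketch (plain Python with hypothetical batch lengths, not the actual training code) comparing the current min()-based formula with the suggested max()-based one:

```python
max_len = 400
batch_lengths = [600, 434, 92]  # hypothetical mel lengths in one batch

# Current behaviour: the crop length is driven by the batch minimum,
# so max_len only matters when even the shortest sample exceeds it.
mel_len_current = min(int(min(batch_lengths) / 2 - 1), max_len // 2)

# Suggested behaviour: drive the crop length by the batch maximum,
# still capped at max_len.
mel_len_suggested = min(int(max(batch_lengths) / 2 - 1), max_len // 2)

print(mel_len_current)    # 45 -> every sample cropped to the shortest
print(mel_len_suggested)  # 200 -> the max_len cap actually applies
```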

For example, if max_len == 400 and the mel lengths in the batch range from 92 (minimum) to 600 (maximum), this formula gives mel_len = min(92, 400) = 92.
Thus all samples in the cropped batch end up with a maximum length of 92, because we do

gt.append(mels[bib, :, (random_start * 2) : ((random_start + mel_len) * 2)])
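With mel_len fixed per batch, the slice above always has length 2 * mel_len frames regardless of a sample's own length. A small sketch with hypothetical numbers (mel_len = 46, matching the 92-frame shapes printed below):

```python
mel_len = 46       # hypothetical per-batch crop length
random_start = 10  # hypothetical random offset
frames = list(range(662))  # stand-in for the time axis of a 662-frame mel

# Same slicing pattern as gt.append(...) above
gt = frames[random_start * 2 : (random_start + mel_len) * 2]
print(len(gt))  # 92 == 2 * mel_len
```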

It means that we always train on crops no longer than the shortest sample in the batch. Here are some shapes for example:

print(mels.shape, gt.shape, st.shape, wav.shape)
torch.Size([32, 80, 662]) torch.Size([32, 80, 92]) torch.Size([32, 80, 96]) torch.Size([32, 27600])
torch.Size([32, 80, 434]) torch.Size([32, 80, 92]) torch.Size([32, 80, 92]) torch.Size([32, 27600])
torch.Size([32, 80, 844]) torch.Size([32, 80, 92]) torch.Size([32, 80, 92]) torch.Size([32, 27600])

27600 / 300 = 92 (300 is the hop length)
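The waveform length above is consistent with the cropped mel length via the hop size; a quick sanity check under the stated hop length of 300:

```python
hop_len = 300     # hop length stated above
wav_len = 27600   # waveform samples per item in the batch
mel_frames = wav_len // hop_len
print(mel_frames)  # 92, matching the gt/st frame dimension
```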

Also, random_start causes samples shorter than max_len to have their beginning cropped and replaced with padding.
Moreover, we skip many samples:

if gt.shape[-1] < 80:
   continue

To fix it, we should crop only the samples whose length is greater than max_len.
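One possible shape for that fix (a hedged sketch, not a patch against the repository; crop_to_max_len is a hypothetical helper) is to crop per sample, leaving anything at or below max_len untouched:

```python
import random

def crop_to_max_len(mel, max_len):
    """Crop a single mel (a list here, standing in for a tensor's time axis)
    only when it exceeds max_len; shorter samples are returned unchanged."""
    T = len(mel)
    if T <= max_len:
        return mel
    start = random.randint(0, T - max_len)
    return mel[start:start + max_len]

batch = [[0.0] * 600, [0.0] * 434, [0.0] * 92]  # hypothetical mel lengths
cropped = [crop_to_max_len(m, 400) for m in batch]
print([len(m) for m in cropped])  # [400, 400, 92]
```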

Did I find a bug, or am I misunderstanding something?

@FormMe FormMe changed the title max_len max_len doesnt crop samples properly Oct 25, 2024

FormMe commented Oct 31, 2024

Hello @yl4579. Could you explain it please?

@martinambrus

> Hello @yl4579. Could you explain it please?

I'm afraid @yl4579 left the community around the time this repository was last updated. He, and most of the initial contributors, no longer respond to any questions. You might be able to find some answers if you also post this in Discussions; however, the community seems to have moved on to their own versions of StyleTTS2, including some commercial forks that don't contribute back to the community, which is a shame really. But that's the state of things right now.


Respaired commented Nov 6, 2024

I think there's nothing wrong with the code itself; it's working as intended. The purpose of that line is probably not to take the biggest sample in the batch, but rather to ensure no sample in your batch goes beyond that threshold. The author's previous works behave in a similar way.

I've tried doing it the other way, padding/trimming all the samples so they are always at max_len; as one would expect, this drastically increases memory consumption if you use a max_len close to 10 seconds of audio. Unless I'm confused about what you're trying to say, it's not a good idea to do that.
