Numerous enhancements and fixes (Multichannel audio support, cleaner predictions, scalable losses) #27

Open
wants to merge 45 commits into
base: main

pokepress

Admittedly, this is a pretty big update. I'll break down the changes:

  • Adds support for multichannel audio
  • Makes most losses scalable
  • Adds DC offset loss (DC offset measures how evenly a sound wave is centered around the zero line; I sometimes had cases with seanet where the waves would become un-centered, and having this loss helps track it)
  • When using predict on an audio file, borders between segments are crossfaded to make transitions less abrupt
  • Enumerates some lists before looping (seems to improve performance)
  • Adds support for bfloat16 (on modern GPUs, tends to be faster and use less memory)
  • Adds configurations for 44.1->44.1 audio cleanup
  • Updates requirements file to newer versions of components
  • Allows saving best models without using cross validation
  • Fixes some subscript names
  • Fixes some typos
  • Removes unused parameter from predict logic
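
For reviewers unfamiliar with DC offset, here is a minimal sketch of the quantity the new loss tracks. It is not the code in this PR: plain Python lists stand in for tensors, and the names `dc_offset`, `dc_offset_loss`, and `scale` are hypothetical, chosen only for illustration.

```python
from statistics import fmean


def dc_offset(channels):
    """Return the mean sample value of each channel (0.0 means centered)."""
    return [fmean(list(channel)) for channel in channels]


def dc_offset_loss(channels, scale=1.0):
    """Hypothetical loss: scaled mean absolute DC offset across channels."""
    offsets = dc_offset(channels)
    return scale * fmean([abs(offset) for offset in offsets])


# A wave centered on zero has no offset; shifting it up by 0.5 gives 0.5.
centered = [[1.0, -1.0, 1.0, -1.0]]
shifted = [[1.5, -0.5, 1.5, -0.5]]
print(dc_offset_loss(centered), dc_offset_loss(shifted))
```

Taking the absolute value per channel means a positive offset in one channel cannot cancel a negative offset in another.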

pokepress and others added 30 commits December 8, 2023 22:44
-Specifies hydra version to remove log warnings
-Uses correct(?) key for best model in prediction
-Enumerates iterators (appears to boost performance 10-20% in some cases)
-Uses replace rather than rename for file saving (seems to be needed for Windows)
-Fixed "msd" discriminator keyword
-Updated torch.stft call to preserve compatibility with later versions of torchaudio
-Save best state without cross-validation (uses total loss)
-Preserve best state across runs
-Adds support for training models with 2 or more audio channels
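
On the replace-versus-rename point: `os.rename` fails on Windows when the destination already exists, while `os.replace` overwrites it on every platform. A rough sketch of the save pattern, assuming a hypothetical helper `save_atomically` (not a function from this repository):

```python
import os
import tempfile


def save_atomically(data: bytes, path: str) -> None:
    """Write data to a temp file, then move it over path in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        # os.replace overwrites an existing destination on Windows too,
        # whereas os.rename would raise FileExistsError there.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything went wrong before the move.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Writing to a temporary file in the same directory first also means an interrupted save leaves the previous checkpoint intact.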
Add CD Quality Seanet experiment file
Allow predicting from raw model file, remove unused dset parameter, u…
Update readme for other changes
Add DC Offset loss. This is more for verifying that the upscaled audio isn't changing the center point of the waveform too much, a problem I've run into several times.
Add scaling for various losses. Documentation fixes. Requirements fil…
Add ability to specify distinct discriminator learning rate, blend predicted clip chunks to smooth segment transitions.
Add links for AM/FM models
Add scaling factors for msd/mpd, bfloat16 support
(partially) Fix handling of crossfade between segments when sample ra…
Temporarily remove some changes to prepare for pull request.
Add option to enlarge dataset via volume variance, apply beta 1 from config, update torch version to 2.2.1 (2.3.1 seems to have issues)
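
To illustrate the segment blending mentioned in the description and in the commits above, here is a rough sketch of a linear crossfade between two predicted chunks that overlap by a fixed number of samples. It is an assumption about the general technique, not the PR's implementation; `crossfade` and `overlap` are hypothetical names, and plain lists stand in for tensors.

```python
def crossfade(prev_seg, next_seg, overlap):
    """Join two segments, linearly blending the last `overlap` samples of
    prev_seg with the first `overlap` samples of next_seg."""
    prev_seg, next_seg = list(prev_seg), list(next_seg)
    if overlap <= 0:
        return prev_seg + next_seg
    if overlap > min(len(prev_seg), len(next_seg)):
        raise ValueError("overlap is longer than one of the segments")

    start = len(prev_seg) - overlap
    blended = []
    for i in range(overlap):
        # Weight of the incoming segment ramps up across the overlap.
        w = (i + 1) / (overlap + 1)
        blended.append((1.0 - w) * prev_seg[start + i] + w * next_seg[i])
    return prev_seg[:start] + blended + next_seg[overlap:]


# A hard step from 1.0 to 0.0 becomes a ramp across the overlap.
print(crossfade([1.0] * 4, [0.0] * 4, overlap=3))
```

The output is shorter than the two inputs combined by `overlap` samples, since the overlapping region is emitted once.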
pokepress and others added 15 commits July 6, 2024 19:32
-Randomly nudge sample start point during load
-expand gitignore
-Use sigmoid wasserstein gan for MSD and MPD discriminators
More conversion to wgan, add stft with configurable range, other corr…
Use Hugging face link for FM Super Resolution Model.
Correctly calculate overlap
Add ability to weight frequencies that are too loud/soft differently in STFT loss
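
One way to picture the loud/soft weighting in the last commit is an L1 magnitude loss whose weight depends on the sign of the error. This is only a guess at the idea, not the PR's loss: `weighted_magnitude_loss`, `loud_weight`, and `soft_weight` are hypothetical names, and the inputs are flat lists of STFT magnitudes rather than tensors.

```python
def weighted_magnitude_loss(pred_mag, target_mag,
                            loud_weight=1.0, soft_weight=1.0):
    """Mean absolute magnitude error, weighting bins that are too loud
    (pred > target) and too soft (pred <= target) separately."""
    pred_mag, target_mag = list(pred_mag), list(target_mag)
    if not pred_mag or len(pred_mag) != len(target_mag):
        raise ValueError("inputs must be non-empty and the same length")

    total = 0.0
    for pred, target in zip(pred_mag, target_mag):
        diff = pred - target
        weight = loud_weight if diff > 0 else soft_weight
        total += weight * abs(diff)
    return total / len(pred_mag)


# Penalize the too-loud bin twice as heavily as the too-soft one.
print(weighted_magnitude_loss([2.0, 0.5], [1.0, 1.0], loud_weight=2.0))
```

With both weights at 1.0 this reduces to an ordinary mean absolute error over the magnitudes.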