Add support for triple frequency resolution mono spectrograms using RGB channels #50

auryd · 2022-12-28T07:18:50Z

This is another way to use the color channels, as an alternative to GB stereo, and it improves audio quality considerably. It seems to be compatible with further fine-tuning of the existing model checkpoint on RGB mono spectrograms.
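To illustrate the general idea of tripling frequency resolution with the color channels, here is a minimal sketch. It assumes a mono magnitude spectrogram with three times as many frequency rows as the image height and interleaves the rows across R, G and B. The function names `pack_triple_res` and `unpack_triple_res` are hypothetical, and the actual channel layout used in this PR may differ.

```python
import numpy as np


def pack_triple_res(spec: np.ndarray) -> np.ndarray:
    """Pack a (3*H, W) mono magnitude spectrogram into an (H, W, 3) uint8 image.

    Assumed layout (not taken from the PR): frequency row 3*i + c is stored
    in channel c of image row i.
    """
    rows, width = spec.shape
    if rows % 3 != 0:
        raise ValueError("number of frequency rows must be divisible by 3")
    height = rows // 3

    # Normalize to [0, 1] so the values fit into 8 bits per channel.
    peak = float(spec.max())
    scaled = spec / peak if peak > 0 else spec

    bands = scaled.reshape(height, 3, width)   # (H, 3, W)
    image = np.transpose(bands, (0, 2, 1))     # (H, W, 3)
    return np.round(image * 255).astype(np.uint8)


def unpack_triple_res(image: np.ndarray) -> np.ndarray:
    """Invert pack_triple_res, returning a (3*H, W) array scaled to [0, 1]."""
    height, width, channels = image.shape
    if channels != 3:
        raise ValueError("expected an RGB image")
    bands = np.transpose(image.astype(np.float32) / 255.0, (0, 2, 1))  # (H, 3, W)
    return bands.reshape(height * 3, width)
```

The round trip is lossy only through the 8-bit quantization, so a 512-row image can carry 1536 frequency rows in this layout.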

dene- · 2022-12-28T22:05:57Z

Issues I found after cloning your fork and running the commands with no extra params besides -t (to use triple freq mono):

- Running audio-to-image with only the -t param converts the whole audio file into a single image instead of 512px chunks (the conversion itself looks correct, with a different spectrogram on each channel).
- Taking a 512x512px chunk from that long image and converting it back to audio with image-to-audio, default params, gives me this error:
- (relevant part of the error).
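For reference, one way to cut a long spectrogram image into 512px-wide chunks is sketched below. The thread does not say how the chunk was actually produced, so the helper name `split_into_chunks` and the choice to zero-pad the last chunk are assumptions made for illustration.

```python
import numpy as np


def split_into_chunks(image: np.ndarray, chunk_width: int = 512) -> list:
    """Split an (H, W) or (H, W, C) image array into chunks along the time axis.

    The last chunk is zero-padded on the right so every chunk has the same
    width (an assumption; dropping the remainder would also be reasonable).
    """
    width = image.shape[1]
    num_chunks = -(-width // chunk_width)  # ceiling division
    pad = num_chunks * chunk_width - width

    # Pad only the width axis; leave height and channels untouched.
    pad_spec = [(0, 0), (0, pad)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad_spec)

    return [
        padded[:, i * chunk_width:(i + 1) * chunk_width]
        for i in range(num_chunks)
    ]
```

Each returned chunk keeps the full height and all channels of the source image, so a 512px-tall RGB spectrogram yields 512x512 RGB chunks.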

auryd · 2022-12-28T23:45:39Z

Thanks for trying it out!

For the first part, that matches the existing behavior.

I don't repro the error, even with long audio files.

Does the error repro when you run that audio file through the default settings with audio-to-image as well? If not, and it's unique to triple_res_mono, can you share the file?

dene- · 2022-12-29T09:11:10Z

I managed to get it working, but only if I convert the whole image file back to audio. If I cut it into chunks (to train, for example), I can't convert them back; it throws that error.

Is there a possibility to add low freqs in one of the channels? I think adding them would do much more for the perceived quality. In both normal and triple mono there are no lows at all 🤔

I tried converting with the --num-frequencies param set to 768 and 1024, and although the quality is better, it still sounds like the low freqs were cut. I suppose that's a limitation of encoding as an image :/

auryd added 2 commits December 27, 2022 21:18

Add support for triple frequency resolution mono spectrograms using RGB channels (9081160)

Fix tests (e0bef76)
