mrpep/encodecmae-to-wav

Experiments sonifying frame-level EnCodecMAE features and EnCodecMAE summary vectors using generative audio models.

This repository implements models that invert EnCodecMAE features back to the waveform domain.

Inference

We provide pretrained weights for many of our models, and this Colab notebook demonstrates how to experiment with them; a short feature-extraction sketch follows the table below.

| Model Name | Upstream | Summary | Training Data | Model Type |
|---|---|---|---|---|
| ecmae2ec-base-1LTransformer | EnCodecMAE Base | None | AS + LL + FMA | Regressor |
| DiffTransformerAE2L8L1CLS-10s | EnCodecMAE Base | 10s | FMA + Jamendo | Diffusion |
| DiffTransformerAE2L8L1CLS-4s | EnCodecMAE Base | 4s | FMA | Diffusion |
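
The Colab notebook is the reference for running these models end to end. As a rough sketch of the first half of the pipeline, frame-level feature extraction with the encodecmae package looks roughly like this (load_model and extract_features_from_file follow the encodecmae README; treat the exact signatures, and the file path, as assumptions):

```python
from encodecmae import load_model

# Load the upstream EnCodecMAE Base model (the "Upstream" column above).
model = load_model('base', device='cuda:0')

# Extract frame-level features; EnCodecMAE models operate on 24 kHz audio.
features = model.extract_features_from_file('input.wav')
```

The inverters listed in the table then map these features (or summary vectors derived from them) back to a waveform; the Colab notebook shows the exact calls for each pretrained model.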

Training

For training, follow these steps:

  1. Gather the training datasets and put them in a folder. The datasets should have a sampling rate of 24 kHz (a resampling sketch follows this list).
  2. Install docker and docker-compose.
  3. Clone this repository and the encodecmae repository.
  4. Edit the docker-compose file. Modify the paths in volumes so that they point to the encodecmae repository, this repository, and the folder with the datasets; these folders will appear inside the container under /workspace. Update device_ids according to the GPUs you want to use for training inside the container (a hypothetical compose excerpt follows this list).
  5. Update the paths in configs/datasets as needed.
  6. Inside this repository's folder, run:
     docker compose up -d
     docker attach encodecmae-to-wav-train
  7. An interactive shell will open. Run:
     cd /workspace/encodecmae
     pip install -e .
     cd /workspace/encodecmae-to-wav
     pip install -e .
  8. Check that the datasets appear in /workspace/datasets.
  9. Navigate to /workspace/encodecmae-to-wav/encodecmae-to-wav.
  10. Run chmod +x scripts/train.sh.
  11. scripts/train.sh contains a list of commands, each corresponding to a different experiment. Comment out everything except the experiment to be run (a sketch follows this list). The batch size and other parameters can be modified through the --mods argument or by editing this config.
  12. Run scripts/train.sh and training should start.
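
Step 1 requires 24 kHz audio. A minimal resampling sketch, assuming torchaudio is available (the file paths are placeholders):

```python
import torchaudio

TARGET_SR = 24_000  # sampling rate expected by the training pipeline

def resample_to_24k(in_path: str, out_path: str) -> None:
    """Load a file, resample it to 24 kHz if needed, and write it back."""
    wav, sr = torchaudio.load(in_path)
    if sr != TARGET_SR:
        wav = torchaudio.functional.resample(wav, orig_freq=sr, new_freq=TARGET_SR)
    torchaudio.save(out_path, wav, TARGET_SR)

resample_to_24k('raw/track.wav', 'datasets/track.wav')
```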
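
For step 4, a hypothetical excerpt of what the edited docker-compose file might look like (the real keys and service name live in this repository's compose file; the host paths are placeholders, and the GPU stanza assumes the standard compose device_ids syntax):

```yaml
services:
  encodecmae-to-wav-train:
    volumes:
      - /host/path/encodecmae:/workspace/encodecmae
      - /host/path/encodecmae-to-wav:/workspace/encodecmae-to-wav
      - /host/path/datasets:/workspace/datasets
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0", "1"]  # GPUs visible inside the container
              capabilities: [gpu]
```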

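For step 11, the script is just a list of experiment commands, one of which is left uncommented. A hypothetical sketch of how it is edited (the script name, flags, and override key are placeholders; only --mods comes from the steps above):

```bash
#!/bin/bash
# One line per experiment; comment out all but the one you want to run.
# python run.py --experiment DiffTransformerAE2L8L1CLS-10s --mods "batch_size=16"
python run.py --experiment DiffTransformerAE2L8L1CLS-4s \
    --mods "batch_size=16"  # hypothetical override; see the config for real keys
```
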
Citation

If you use this code or our results in your paper, please cite our work as:

@article{alonso2024leveraging,
  title={Leveraging pre-trained autoencoders for interpretable prototype learning of music audio},
  author={Alonso Jim{\'e}nez, Pablo and Pepino, Leonardo and Batlle-Roca, Roser and Zinemanas, Pablo and Serra, Xavier and Rocamora, Mart{\'\i}n},
  year={2024},
  publisher={Institute of Electrical and Electronics Engineers (IEEE)}
}
@article{pepino2023encodecmae,
  title={EnCodecMAE: Leveraging neural codecs for universal audio representation learning},
  author={Pepino, Leonardo and Riera, Pablo and Ferrer, Luciana},
  journal={arXiv preprint arXiv:2309.07391},
  year={2023}
}
