Replies: 1 comment
-
Lhotse supports any kind of audio-to-audio modeling task. I don't know of an open-source recipe for voice conversion, but the closest thing you may find is speech-enhancement/audio-to-audio training in NVIDIA NeMo, which supports Lhotse dataloading. Generally speaking, you can follow an ASR data preparation recipe with the following modifications:
```python
from lhotse import CutSet, Recording, RecordingSet
from lhotse.dataset.collation import collate_audio

# Scan a directory for source audio and wrap it in a CutSet.
cuts = CutSet.from_manifests(
    recordings=RecordingSet.from_dir("path/to/dir", "*.flac", num_jobs=4)
)

# Attach the parallel target utterance to each cut as a custom field
# (target_audio_path is the path to the corresponding target-speaker audio).
for cut in cuts:
    cut.target_recording = Recording.from_file(target_audio_path)

cuts.to_file("src_tgt_cuts.jsonl.gz")

# In your dataset class, collate source and target audio separately:
src_audio, src_audio_lens = collate_audio(cuts)
tgt_audio, tgt_audio_lens = collate_audio(cuts, recording_field="target_recording")
```
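To make the collation step concrete: `collate_audio` pads variable-length signals in a batch to a common length and returns the original lengths alongside. A minimal sketch of that padding logic in plain Python (the `pad_batch` helper here is illustrative, not part of Lhotse, which does the same thing with torch tensors):

```python
def pad_batch(signals, pad_value=0.0):
    """Pad a list of variable-length audio signals to equal length.

    Returns (padded_batch, lengths), mirroring the (audio, audio_lens)
    pair returned by Lhotse's collate_audio.
    """
    lens = [len(s) for s in signals]
    max_len = max(lens)
    padded = [list(s) + [pad_value] * (max_len - len(s)) for s in signals]
    return padded, lens

# Example: two "signals" of different lengths.
batch, lens = pad_batch([[0.1, 0.2, 0.3], [0.4, 0.5]])
# batch -> [[0.1, 0.2, 0.3], [0.4, 0.5, 0.0]], lens -> [3, 2]
```

Keeping the lengths around matters for seq2seq training: the model masks out the padded frames of both the source and target audio using `src_audio_lens` and `tgt_audio_lens`.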
-
Our lab is looking into using Lhotse for voice conversion. While recipes exist for well-known tasks like speech recognition and text-to-speech, voice conversion seems a bit less explored. A quick search through the repository brought up the l2-arctic recipe and the vctk recipe, but how to use them to create parallel-speaker training data for speech-to-speech seq2seq voice conversion seems non-obvious. Is there a recipe for a similar task that someone could point to, so we can get started on our custom datasets and work from an existing setup?