training data to TTS voice model #48
Replies: 1 comment
-
A .npz file isn't a model; it's a file containing info about the voice (specifically the previous fine (EnCodec), coarse (EnCodec), and semantic (wav2vec) tokens). You can create a custom .npz from a short audio clip in the clone section on the Bark text-to-speech (TTS) page. Make sure you have a good clip for cloning, ideally in .wav format (you can use ffmpeg to convert it). serp-ai will soon release code for fine-tuning Bark as well, and I'll implement it as soon as I can. I'll also see if I can implement some other things to make fine-tuning easier.
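To see that a .npz is only a bundle of token arrays and not model weights, you can open one with NumPy and list what it holds. This is a minimal sketch, not the web UI's own code: `describe_voice_npz` is a hypothetical helper, and the key names (`semantic_prompt`, `coarse_prompt`, `fine_prompt`) and array shapes in the stand-in file are assumptions based on how Bark history prompts are usually laid out, so check them against a real file from the clone section.

```python
import os
import tempfile

import numpy as np


def describe_voice_npz(path):
    """Return {array_name: (shape, dtype)} for every array stored in an .npz file."""
    with np.load(path) as data:
        return {name: (data[name].shape, str(data[name].dtype)) for name in data.files}


# Build a stand-in file with random tokens so the example runs without a real voice.
# ASSUMPTION: key names and shapes below imitate a Bark-style voice prompt.
rng = np.random.default_rng(0)
demo_path = os.path.join(tempfile.mkdtemp(), "demo_voice.npz")
np.savez(
    demo_path,
    semantic_prompt=rng.integers(0, 10_000, size=300),    # 1-D semantic tokens
    coarse_prompt=rng.integers(0, 1024, size=(2, 450)),   # 2 coarse codebooks
    fine_prompt=rng.integers(0, 1024, size=(8, 450)),     # 8 fine codebooks
)

summary = describe_voice_npz(demo_path)
for name, (shape, dtype) in summary.items():
    print(f"{name}: shape={shape}, dtype={dtype}")
```

Pointing `describe_voice_npz` at a .npz you created in the clone section should show a handful of small integer arrays, which is why an RVC .pth (network weights) can't simply be converted into one.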
-
I am trying to make TTS models from audio clips that I have on my PC. I can train a model on the tab, and from there I know I can send it to RVC, generate speech from TTS, and then send that speech to RVC to have my voice generated. I was wondering if there is a way to make the data that I made work straight in TTS, or if there is a tool to turn those .pth files into a .npz model? If anyone has anything, it would be much appreciated!