🐛 Bug
When I generate this sentence with the code below, the baya voice sounds as if two people are speaking at the same time. Other speakers don't have this problem (tested with xenia). Generating the same text with the Telegram bot sounds fine.
```python
# V4
import torch
import torchaudio
import soundfile as sf
import numpy as np

language = 'ru'
model_id = 'v4_ru'
sample_rate = 48000
speaker = 'baya'
device = torch.device('cpu')

model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                     model='silero_tts',
                                     language=language,
                                     speaker=model_id)
model.to(device)  # gpu or cpu

audio = model.apply_tts(text="Добро пожаловать в компьютизированный экспериментальный центр при лаборатории исследования природы порталов.",
                        speaker=speaker,
                        sample_rate=sample_rate)
sf.write('test_baya.wav', audio.numpy(), sample_rate)
```
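To narrow down whether the doubled-voice artifact depends on the speaker or on the sample rate, one option is to render the same sentence for every combination and compare the files by ear. This is a minimal sketch, assuming that 8000, 24000 and 48000 Hz are the rates accepted by `v4_ru`; `output_path`, `main` and the `samples/` directory are hypothetical names for illustration, not part of silero-models.

```python
# Hypothetical diagnostic script (not part of silero-models): render the same
# sentence with several speakers and sample rates so the outputs can be compared.
import itertools
from pathlib import Path

TEXT = ("Добро пожаловать в компьютизированный экспериментальный центр "
        "при лаборатории исследования природы порталов.")
SPEAKERS = ["baya", "xenia"]
# Assumption: these are the sample rates accepted by the v4_ru model.
SAMPLE_RATES = [8000, 24000, 48000]


def output_path(speaker: str, sample_rate: int, out_dir: str = "samples") -> Path:
    """Build a file name such as samples/baya_48000.wav (the naming is arbitrary)."""
    return Path(out_dir) / f"{speaker}_{sample_rate}.wav"


def main() -> None:
    # Heavy imports live inside main() so the helper above has no dependencies.
    import soundfile as sf
    import torch

    device = torch.device("cpu")
    model, _ = torch.hub.load(repo_or_dir="snakers4/silero-models",
                              model="silero_tts", language="ru", speaker="v4_ru")
    model.to(device)

    # One file per (speaker, sample rate) pair, all from the same input text.
    for speaker, sample_rate in itertools.product(SPEAKERS, SAMPLE_RATES):
        audio = model.apply_tts(text=TEXT, speaker=speaker, sample_rate=sample_rate)
        path = output_path(speaker, sample_rate)
        path.parent.mkdir(parents=True, exist_ok=True)
        sf.write(str(path), audio.numpy(), sample_rate)
        print(f"wrote {path} ({len(audio) / sample_rate:.2f} s)")
```

Calling `main()` downloads the model via `torch.hub` and writes six WAV files. If only the baya files at a particular rate sound doubled, that rate is a useful detail to include in the report.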
Here are the samples from baya and xenia: samples.zip
To Reproduce
Steps to reproduce the behavior:
Run the code above.
Expected behavior
The baya audio should sound like a single speaker, as it does for xenia and in the Telegram bot.
Environment
Collecting environment information...
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Microsoft Windows 11 Home Single Language
GCC version: (x86_64-win32-seh-rev1, Built by MinGW-Builds project) 13.2.0
Clang version: 18.1.4
CMake version: version 3.29.2
Libc version: N/A
Python version: 3.11.9 (tags/v3.11.9:de54cf5, Apr 2 2024, 10:12:12) [MSC v.1938 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22621-SP0
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3060 Laptop GPU
Nvidia driver version: 556.12
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture=9
CurrentClockSpeed=2500
DeviceID=CPU0
Family=205
L2CacheSize=9216
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2500
Name=12th Gen Intel(R) Core(TM) i5-12500H
ProcessorType=3
Revision=
Versions of relevant libraries:
[pip3] flake8==7.1.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.25.2
[pip3] onnxruntime==1.18.1
[pip3] torch==2.3.1+cu121
[pip3] torchaudio==2.3.1
[pip3] torchvision==0.18.1+cu121
[conda] Could not collect
Additional context
None.