Bug report - [Baya ru speaker (v4) sounds weird] #281

JaanDev · 2024-07-07T21:36:12Z

🐛 Bug

When I generate a speech sample with the following code, the baya speaker sounds as if two people are speaking at the same time, while other speakers don't (tested with xenia). However, generating the same text with the Telegram bot sounds fine.

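```python
# V4
import torch
import soundfile as sf

language = 'ru'
model_id = 'v4_ru'
sample_rate = 48000  # v4_ru supports 8000 / 24000 / 48000
speaker = 'baya'
device = torch.device('cpu')

# torch.hub takes the model id (e.g. 'v4_ru') in the `speaker` argument;
# the actual voice is chosen later in apply_tts
model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                     model='silero_tts',
                                     language=language,
                                     speaker=model_id)
model.to(device)  # gpu or cpu

# "Welcome to the computerized experimental center at the laboratory
#  for research into the nature of portals."
audio = model.apply_tts(text="Добро пожаловать в компьютизированный экспериментальный центр при лаборатории исследования природы порталов.",
                        speaker=speaker,
                        sample_rate=sample_rate)

# apply_tts returns an audio tensor; convert it to NumPy and save it as WAV
sf.write('test_baya.wav', audio.numpy(), sample_rate)
```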
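To check whether the doubled-voice effect depends on the speaker or on the sample rate, the script can be extended to render the same text for several speakers and rates in one go. This is a hypothetical sketch, not part of silero-models: synthesize_variants and save_wav_pcm16 are illustrative names, and it assumes apply_tts accepts the text, speaker and sample_rate keywords used above and returns mono float samples in [-1, 1]. It writes 16-bit WAV files with the standard-library wave module, which also takes soundfile out of the picture.

```python
import os
import sys
import wave
from array import array

# Hypothetical helpers for narrowing down this report; not part of silero-models.


def save_wav_pcm16(path, audio, sample_rate):
    """Write mono float audio (assumed 1-D, values in [-1, 1]) as a 16-bit PCM WAV file."""
    # torch tensors and NumPy arrays both provide .tolist(); other sequences are copied
    samples = audio.tolist() if hasattr(audio, "tolist") else list(audio)
    # Clip each sample to [-1, 1], then scale it to the signed 16-bit range
    clipped = (min(1.0, max(-1.0, float(s))) for s in samples)
    pcm = array("h", (int(s * 32767) for s in clipped))
    # WAV stores samples little-endian; array() uses the machine's native byte order
    if sys.byteorder == "big":
        pcm.byteswap()
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)  # mono
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())


def synthesize_variants(model, text, speakers, sample_rates, out_dir="."):
    """Render the same text for every (speaker, sample_rate) pair and return the file paths."""
    paths = []
    for speaker in speakers:
        for rate in sample_rates:
            audio = model.apply_tts(text=text, speaker=speaker, sample_rate=rate)
            path = os.path.join(out_dir, f"test_{speaker}_{rate}.wav")
            save_wav_pcm16(path, audio, rate)
            paths.append(path)
    return paths
```

With the model loaded as in the script above, calling synthesize_variants(model, text, ["baya", "xenia"], [8000, 24000, 48000]) with the Russian sentence as text would write six files (test_baya_8000.wav, test_baya_24000.wav, and so on) that can be compared by ear.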

Here are the samples from baya and xenia:
samples.zip

To Reproduce

Steps to reproduce the behavior:

Run the code above.

Expected behavior

The baya output should sound like a single, clean voice, as it does with xenia and in the Telegram bot.

Environment

Collecting environment information...
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 11 Home Single Language
GCC version: (x86_64-win32-seh-rev1, Built by MinGW-Builds project) 13.2.0
Clang version: 18.1.4
CMake version: version 3.29.2
Libc version: N/A

Python version: 3.11.9 (tags/v3.11.9:de54cf5, Apr  2 2024, 10:12:12) [MSC v.1938 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22621-SP0
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3060 Laptop GPU
Nvidia driver version: 556.12
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=2500
DeviceID=CPU0
Family=205
L2CacheSize=9216
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2500
Name=12th Gen Intel(R) Core(TM) i5-12500H
ProcessorType=3
Revision=

Versions of relevant libraries:
[pip3] flake8==7.1.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.25.2
[pip3] onnxruntime==1.18.1
[pip3] torch==2.3.1+cu121
[pip3] torchaudio==2.3.1
[pip3] torchvision==0.18.1+cu121
[conda] Could not collect

Additional context

None.


JaanDev added the bug (Something isn't working) label on Jul 7, 2024.

JaanDev assigned snakers4 on Jul 7, 2024.
