Skip to content

rhasspy/rhasspy-speech

Repository files navigation

Rhasspy Speech

logo

Port of the speech-to-text system from Rhasspy. This uses Kaldi under the hood to recognize sentences from a set of pre-defined templates.

For example, the template:

sentences:
  - turn (on|off) [the] light

will allow rhasspy-speech to recognize the sentences:

  • turn on light
  • turn off light
  • turn on the light
  • turn off the light

Supported Languages

Pre-built models and derived from the corresponding voice2json models.

  • Czech, Czech Republic
  • German, Germany
  • English, United States
  • Spanish, Spain
  • French, France
  • Italian, Italy
  • Dutch, Netherlands
  • Russian, Russia

Tools and Dependencies

Pre-built tools must be downloaded for rhasspy-speech to work. This includes:

See the build_* scripts in script/ for how these tools are built. See the Dockerfile and script/build_docker.sh for how they are packaged.

You must also have the following system packages installed at runtime:

  • libopenblas0
  • libencode-perl

Handling Out of Vocabulary

rhasspy-speech generates two different Kaldi models from the sentence templates: one with a rigid grammar that only accepts the possible sentences, and another with a language model that allows new sentences to be made from the existing words.

Using both the grammar and language model, it's possible to robustly reject sentences outside of the templates. After transcripts are returned from both models, they can be compared to decide whether to accept or reject the grammar transcript.