System wide STT service #161

Open: wants to merge 6 commits into `master`
5 changes: 4 additions & 1 deletion README.md
@@ -40,6 +40,10 @@ Currently Dicio answers questions about:
## Speech to text

Dicio uses [Vosk](https://github.com/alphacep/vosk-api/) as its speech to text (`STT`) engine. To be able to run on every phone, it uses small models weighing `~50MB`. The download from [here](https://alphacephei.com/vosk/models) starts automatically whenever needed, so the app language can be changed seamlessly.
Dicio also exposes Vosk to the Android system as a speech-to-text service. Other apps can use it in different ways:
- [Via an intent](https://developer.android.com/reference/android/speech/RecognizerIntent), which shows a Dicio UI for speech input. The result is then returned to the requesting app, either automatically or after the user confirms it, as configured in Dicio's settings.
- [From the background](https://developer.android.com/reference/android/speech/SpeechRecognizer), if the requesting app has the record audio permission and Dicio is set as voice input under Settings -> Apps -> Default apps -> Assistant (the exact path may vary depending on the Android version).
- If you want to use it as a "speech keyboard" (IME), you currently still need an app that uses the Android speech-to-text service and provides an IME (e.g. [this one](https://github.com/Kaljurand/K6nele)).
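On the requesting app's side, both entry points hand back a list of candidate transcriptions: an activity started with `RecognizerIntent.ACTION_RECOGNIZE_SPEECH` receives them under `RecognizerIntent.EXTRA_RESULTS`, and a `SpeechRecognizer` listener receives them in its results bundle under `SpeechRecognizer.RESULTS_RECOGNITION`, optionally paired with confidence scores. The plain-Java sketch below (no Android dependency, so it runs anywhere) shows one way a client might pick the best candidate. `TranscriptionPicker` and `pickBest` are hypothetical names, and the sketch assumes that missing or mismatched confidence scores mean the recognizer's own ordering should be trusted.

```java
import java.util.List;

/** Hypothetical client-side helper; plain Java, no Android dependency. */
public final class TranscriptionPicker {

    private TranscriptionPicker() {}

    /**
     * Returns the most likely transcription, or null if there is none.
     *
     * @param results     candidate transcriptions, e.g. from RecognizerIntent.EXTRA_RESULTS,
     *                    usually ordered from most to least likely
     * @param confidences optional scores between 0 and 1, one per result, e.g. from
     *                    RecognizerIntent.EXTRA_CONFIDENCE_SCORES; may be null
     */
    public static String pickBest(List<String> results, float[] confidences) {
        if (results == null || results.isEmpty()) {
            return null;
        }
        // Assumption: missing or mismatched scores mean the recognizer's ordering is trusted.
        if (confidences == null || confidences.length != results.size()) {
            return results.get(0);
        }
        // Strict ">" keeps the earliest candidate when scores are tied.
        int best = 0;
        for (int i = 1; i < confidences.length; i++) {
            if (confidences[i] > confidences[best]) {
                best = i;
            }
        }
        return results.get(best);
    }

    public static void main(String[] args) {
        List<String> results = List.of("what time is it", "what time is this");
        System.out.println(pickBest(results, null));                     // what time is it
        System.out.println(pickBest(results, new float[] {0.4f, 0.9f})); // what time is this
    }
}
```

Falling back to the first entry relies on recognizers returning candidates roughly in order of decreasing confidence, which the `RecognizerIntent.EXTRA_RESULTS` documentation recommends but cannot enforce.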

## Contributing

@@ -57,7 +61,6 @@ When contributing keep in mind that other people may have **needs** and **views
If you want to translate Dicio to a new language you have to follow these **steps**:
<ul><li>
Translate the <b>strings used inside the app</b> via <a href="https://hosted.weblate.org/engage/dicio-android/">Weblate</a>. If your language isn't already there, add it with <a href="https://hosted.weblate.org/new-lang/dicio-android/strings/">tool -> start new translation</a>.
</br>
<a href="https://hosted.weblate.org/engage/dicio-android/">
<img src="https://hosted.weblate.org/widgets/dicio-android/-/287x66-grey.png" alt="Translation status" />
</a>
49 changes: 35 additions & 14 deletions app/src/main/AndroidManifest.xml
@@ -1,46 +1,51 @@
<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
xmlns:tools="http://schemas.android.com/tools"
android:installLocation="auto"> <!-- Allow installing also on external storage -->
android:installLocation="auto" >

<!-- Allow installing also on external storage -->
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />

<!-- required by the download manager for APIs < Q -->
<uses-permission
android:name="android.permission.WRITE_EXTERNAL_STORAGE"
android:maxSdkVersion="28" />

<!-- the open skill needs to query all apps -->
<uses-permission
android:name="android.permission.QUERY_ALL_PACKAGES"
tools:ignore="QueryAllPackagesPermission" />

<!-- the telephone skill needs to query contacts and call them -->
<uses-permission android:name="android.permission.READ_CONTACTS" />
<uses-permission android:name="android.permission.CALL_PHONE" />

<!-- To access the speech recognizer via the system interface on Android 11+, see
https://developer.android.com/reference/android/speech/SpeechRecognizer#createSpeechRecognizer(android.content.Context,%20android.content.ComponentName) -->
<queries>
<intent>
<action
android:name="android.speech.RecognitionService" />
</intent>
</queries>

<application
android:name=".App"
android:allowBackup="true"
android:dataExtractionRules="@xml/data_extraction_rules"
android:fullBackupContent="@xml/full_backup_content"
android:icon="@mipmap/ic_launcher"
android:label="@string/app_name"
android:supportsRtl="true"
android:theme="@style/DarkAppTheme"
tools:ignore="GoogleAppIndexingWarning"
android:dataExtractionRules="@xml/data_extraction_rules">
tools:ignore="GoogleAppIndexingWarning" >

<activity
android:name=".MainActivity"
android:exported="true"
android:theme="@style/SplashScreenTheme"
android:windowSoftInputMode="stateUnspecified|adjustResize">
android:windowSoftInputMode="stateUnspecified|adjustResize" >
<intent-filter>
<action android:name="android.intent.action.MAIN" />
<category android:name="android.intent.category.LAUNCHER" />
</intent-filter>

<intent-filter>
<action android:name="android.intent.action.ASSIST" />
<category android:name="android.intent.category.DEFAULT" />
@@ -50,28 +55,44 @@
android:name="com.android.systemui.action_assist_icon"
android:resource="@mipmap/ic_launcher" />
</activity>

<activity
android:name=".settings.SettingsActivity"
android:exported="false" />

<activity
android:name=".error.ErrorActivity"
android:exported="false" />

<activity
android:name=".input.stt_service.SttServiceActivity"
android:taskAffinity=""
android:excludeFromRecents="true"
android:exported="true"
android:taskAffinity=""
android:theme="@style/SttServiceDarkAppTheme"
android:windowSoftInputMode="adjustResize">

android:windowSoftInputMode="adjustResize" >
<intent-filter>
<category android:name="android.intent.category.DEFAULT" />
<action android:name="android.speech.action.RECOGNIZE_SPEECH" />
</intent-filter>
</activity>

<service
android:name=".input.stt_service.SttService"
android:enabled="true"
android:exported="true"
android:description="@string/pref_input_method_vosk"
android:icon="@mipmap/ic_launcher"
>
<!-- TODO check whether the following attributes would be helpful too -->
<!-- android:directBootAware=["true" | "false"]-->
<!-- android:foregroundServiceType="microphone" -->
<!-- android:label="string resource"-->
<intent-filter>
<action android:name="android.speech.RecognitionService"/>
<category android:name="android.intent.category.DEFAULT" />
</intent-filter>
<meta-data
android:name="android.speech"
android:resource="@xml/stt_service_metadata" />
</service>
</application>

</manifest>
36 changes: 21 additions & 15 deletions app/src/main/java/org/stypox/dicio/MainActivity.java
@@ -1,8 +1,5 @@
package org.stypox.dicio;

import static android.Manifest.permission.RECORD_AUDIO;
import static android.content.pm.PackageManager.PERMISSION_GRANTED;

import android.content.Intent;
import android.content.SharedPreferences;
import android.os.Bundle;
@@ -13,23 +10,16 @@
import android.widget.ProgressBar;
import android.widget.ScrollView;

import androidx.annotation.NonNull;
import androidx.annotation.Nullable;
import androidx.appcompat.app.ActionBarDrawerToggle;
import androidx.appcompat.widget.SearchView;
import androidx.appcompat.widget.Toolbar;
import androidx.core.app.ActivityCompat;
import androidx.core.view.GravityCompat;
import androidx.drawerlayout.widget.DrawerLayout;
import androidx.preference.PreferenceManager;

import com.google.android.material.floatingactionbutton.ExtendedFloatingActionButton;
import com.google.android.material.navigation.NavigationView;

import org.dicio.skill.output.GraphicalOutputDevice;
import org.dicio.skill.output.SpeechOutputDevice;
import org.stypox.dicio.eval.SkillEvaluator;
import org.stypox.dicio.eval.SkillRanker;
import org.stypox.dicio.input.InputDevice;
import org.stypox.dicio.input.SpeechInputDevice;
import org.stypox.dicio.input.AndroidSttServiceInputDevice;
import org.stypox.dicio.input.ToolbarInputDevice;
import org.stypox.dicio.input.VoskInputDevice;
import org.stypox.dicio.input.stt_service.SttServiceActivity;
@@ -42,8 +32,19 @@
import org.stypox.dicio.skills.SkillHandler;
import org.stypox.dicio.util.BaseActivity;
import org.stypox.dicio.util.PermissionUtils;
import org.dicio.skill.output.GraphicalOutputDevice;
import org.dicio.skill.output.SpeechOutputDevice;

import androidx.annotation.NonNull;
import androidx.annotation.Nullable;
import androidx.appcompat.app.ActionBarDrawerToggle;
import androidx.appcompat.widget.SearchView;
import androidx.appcompat.widget.Toolbar;
import androidx.core.app.ActivityCompat;
import androidx.core.view.GravityCompat;
import androidx.drawerlayout.widget.DrawerLayout;
import androidx.preference.PreferenceManager;

import static android.Manifest.permission.RECORD_AUDIO;
import static android.content.pm.PackageManager.PERMISSION_GRANTED;

public class MainActivity extends BaseActivity
implements NavigationView.OnNavigationItemSelectedListener {
@@ -304,6 +305,11 @@ private InputDevice buildPrimaryInputDevice() {
.getString(getString(R.string.pref_key_input_method), "");
if (preference.equals(getString(R.string.pref_val_input_method_text))) {
return new ToolbarInputDevice();
} else if (preference.equals(getString(R.string.pref_val_input_method_systemStt))) {
//TODO when this option is chosen, show a hint/privacy warning in the preference: the
// speech recorded by Dicio is passed to the third-party app selected in the system
// settings
return new AndroidSttServiceInputDevice(this);
} else { // default
return new VoskInputDevice(this);
}
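The if/else chain in `buildPrimaryInputDevice()` maps the stored preference value to an input device and falls back to Vosk for anything unrecognized, including the empty default. A plain-Java sketch of the same dispatch written as an enum also gives the privacy TODO a natural home: each option can declare whether recorded speech may leave Dicio. The `InputMethod` enum, its `fromPreference` method and the preference strings `"text"`, `"system_stt"` and `"vosk"` are all hypothetical; the real values come from the `pref_val_input_method_*` string resources, which are not shown in this diff.

```java
/** Hypothetical model of the input methods selectable in Dicio's settings. */
public enum InputMethod {
    TEXT("text", false),
    SYSTEM_STT("system_stt", true), // speech goes to whichever app the system settings select
    VOSK("vosk", false);            // recognition runs on-device with a downloaded model

    private final String preferenceValue; // hypothetical stored preference string
    private final boolean sendsSpeechToThirdParty;

    InputMethod(String preferenceValue, boolean sendsSpeechToThirdParty) {
        this.preferenceValue = preferenceValue;
        this.sendsSpeechToThirdParty = sendsSpeechToThirdParty;
    }

    /** Whether the settings screen should show a privacy warning for this option. */
    public boolean sendsSpeechToThirdParty() {
        return sendsSpeechToThirdParty;
    }

    /** Mirrors buildPrimaryInputDevice(): unknown, empty or null values fall back to VOSK. */
    public static InputMethod fromPreference(String value) {
        for (InputMethod method : values()) {
            if (method.preferenceValue.equals(value)) {
                return method;
            }
        }
        return VOSK;
    }

    public static void main(String[] args) {
        System.out.println(fromPreference("system_stt"));                      // SYSTEM_STT
        System.out.println(fromPreference(""));                                // VOSK
        System.out.println(fromPreference("system_stt").sendsSpeechToThirdParty()); // true
    }
}
```

Keeping the fallback inside `fromPreference` preserves the current behaviour of the `// default` branch while making it harder to add a new option without deciding whether it needs a privacy warning.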