Offline Speech to Text Without any Popup Dialog in Android

Last Updated : 31 Dec, 2020

In this article, we are going to implement speech-to-text functionality that works both online and offline. When there is no internet connectivity, it uses the language model stored on the device, so recognition is less accurate but still gives good results. When the device is online, recognition is more accurate. Note that we are going to implement this project using the Kotlin language.

Note: The offline method will not work on devices whose API level is lower than 23.

Step by Step Implementation

Step 1: Create a New Project

To create a new project in Android Studio, please refer to How to Create/Start a New Project in Android Studio. Select Kotlin as the programming language.

Step 2: Adding Permission

To access the device microphone, add the RECORD_AUDIO permission to the AndroidManifest.xml file as shown below:

<uses-permission android:name="android.permission.RECORD_AUDIO"/>

Step 3: Modify the colors.xml file

Add the lines below to the colors.xml file.

<color name="mic_enabled_color">#0E87E7</color>

<color name="mic_disabled_color">#6D6A6A</color>

Step 4: Working with the activity_main.xml file

Go to the activity_main.xml file and add the following code. The ImageView uses a drawable named ic_mic, so add a microphone vector asset with that name to the res/drawable folder.

XML

<?xml version="1.0" encoding="utf-8"?> 
<LinearLayout 
    xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:gravity="center"
    android:orientation="vertical"
    tools:context=".MainActivity"> 
  
    <TextView
        android:id="@+id/speak_output_tv"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_marginHorizontal="20dp"
        android:text="Output Text Here..."
        android:textAlignment="center"
        android:textSize="25sp" /> 
  
    <ImageView
        android:id="@+id/mic_speak_iv"
        android:layout_width="60dp"
        android:layout_height="60dp"
        android:layout_marginTop="20dp"
        android:src="@drawable/ic_mic"
        app:tint="@color/mic_disabled_color" /> 
  
</LinearLayout>

Output UI:

Step 5: Working with the MainActivity.kt file

Go to the MainActivity.kt file and refer to the following code.

Checking Audio Permission:

To get started, the app needs permission to use the microphone. This function checks whether the RECORD_AUDIO permission has been granted. If it has not, the function opens the app's settings screen, where the user can allow the microphone permission manually. Offline speech to text is not supported below API 23, and permissions are granted at runtime only from API 23 onward, so we first check the device API level with Build.VERSION.SDK_INT; Build.VERSION_CODES.M is the constant for Android M, i.e., 23. Replace the package name in the code with your own package name (you can find it in the AndroidManifest.xml file).

Kotlin

private fun checkAudioPermission() {
    if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.M) {  // M = 23
        if (ContextCompat.checkSelfPermission(this, "android.permission.RECORD_AUDIO") != PackageManager.PERMISSION_GRANTED) {
            // opens this app's details screen in Settings,
            // where the user can grant the permission
            val intent = Intent(Settings.ACTION_APPLICATION_DETAILS_SETTINGS, Uri.parse("package:com.programmingtech.offlinespeechtotext"))
            startActivity(intent)
            Toast.makeText(this, "Allow Microphone Permission", Toast.LENGTH_SHORT).show()
        }
    }
}
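As an alternative to sending the user to the settings screen, the permission can also be requested with the standard system permission dialog. The following is only a minimal sketch of that approach, not the method used in this article: the class name PermissionSketchActivity, the function ensureAudioPermission() and the request code 101 are hypothetical names chosen for illustration.

```kotlin
import android.Manifest
import android.content.pm.PackageManager
import androidx.appcompat.app.AppCompatActivity
import androidx.core.app.ActivityCompat
import androidx.core.content.ContextCompat

// Hypothetical request code, used only to match the callback to this request.
private const val RECORD_AUDIO_REQUEST_CODE = 101

class PermissionSketchActivity : AppCompatActivity() {

    // Returns true when the microphone permission is already granted;
    // otherwise shows the system permission dialog and returns false.
    fun ensureAudioPermission(): Boolean {
        val granted = ContextCompat.checkSelfPermission(
            this, Manifest.permission.RECORD_AUDIO
        ) == PackageManager.PERMISSION_GRANTED
        if (!granted) {
            ActivityCompat.requestPermissions(
                this, arrayOf(Manifest.permission.RECORD_AUDIO), RECORD_AUDIO_REQUEST_CODE
            )
        }
        return granted
    }

    override fun onRequestPermissionsResult(
        requestCode: Int, permissions: Array<out String>, grantResults: IntArray
    ) {
        super.onRequestPermissionsResult(requestCode, permissions, grantResults)
        if (requestCode == RECORD_AUDIO_REQUEST_CODE &&
            grantResults.firstOrNull() == PackageManager.PERMISSION_GRANTED
        ) {
            // Permission was just granted: listening could be started here.
        }
    }
}
```

With this variant, the click listener would start listening only when ensureAudioPermission() returns true, so the recognizer is never started without microphone access.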

The Function which Handles Speech to Text:

This is the main function of our project, which handles the speech. We first create a SpeechRecognizer object for the current Context, i.e., this (inside a Fragment, AlertDialog, etc., replace this with the available context). Then we create an intent and set its EXTRA_LANGUAGE_MODEL extra to LANGUAGE_MODEL_FREE_FORM. RecognitionListener is an interface, so the object passed to setRecognitionListener() has to override all of its functions, as shown below. To get the speech result, we use the onResults() method and read the ArrayList of strings stored in the Bundle. The first element (index 0) is the most likely transcription of the speech. Other useful callbacks are onBeginningOfSpeech(), which is called when the user starts speaking, and onEndOfSpeech(), which is called when the user stops speaking.

Kotlin

private fun startSpeechToText() {
    val speechRecognizer = SpeechRecognizer.createSpeechRecognizer(this)
    val speechRecognizerIntent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH)
    speechRecognizerIntent.putExtra(
        RecognizerIntent.EXTRA_LANGUAGE_MODEL,
        RecognizerIntent.LANGUAGE_MODEL_FREE_FORM
    )
    // EXTRA_LANGUAGE expects an IETF language tag (e.g. "en-US"), not a Locale object
    speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, Locale.getDefault().toLanguageTag())

    speechRecognizer.setRecognitionListener(object : RecognitionListener {
        override fun onReadyForSpeech(bundle: Bundle?) {}
        override fun onBeginningOfSpeech() {}
        override fun onRmsChanged(v: Float) {}
        override fun onBufferReceived(bytes: ByteArray?) {}
        override fun onEndOfSpeech() {}
        override fun onError(i: Int) {}

        override fun onResults(bundle: Bundle) {
            val result = bundle.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)
            if (!result.isNullOrEmpty()) {
                // result[0] is the most likely transcription of the speech
            }
        }

        override fun onPartialResults(bundle: Bundle) {}
        override fun onEvent(i: Int, bundle: Bundle?) {}
    })
    // starts listening ...
    speechRecognizer.startListening(speechRecognizerIntent)
}
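The function above does not explicitly ask for offline recognition, and it ignores the error code passed to onError(). The sketch below shows one possible way to cover both points. It assumes the RecognizerIntent.EXTRA_PREFER_OFFLINE extra (available from API 23) is honored by the device's recognition service, which also requires an offline language pack to be installed. The function names buildOfflineRecognizerIntent() and describeRecognitionError() and the message strings are hypothetical and not part of the original project.

```kotlin
import android.content.Intent
import android.os.Build
import android.speech.RecognizerIntent
import android.speech.SpeechRecognizer
import java.util.Locale

// Builds the recognition intent and asks the recognizer to prefer the
// on-device language model. EXTRA_PREFER_OFFLINE exists only on API 23+.
fun buildOfflineRecognizerIntent(): Intent =
    Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH).apply {
        putExtra(
            RecognizerIntent.EXTRA_LANGUAGE_MODEL,
            RecognizerIntent.LANGUAGE_MODEL_FREE_FORM
        )
        // EXTRA_LANGUAGE expects an IETF language tag such as "en-US".
        putExtra(RecognizerIntent.EXTRA_LANGUAGE, Locale.getDefault().toLanguageTag())
        if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.M) {
            putExtra(RecognizerIntent.EXTRA_PREFER_OFFLINE, true)
        }
    }

// Maps the integer passed to onError() to a readable message.
// The message texts are illustrative only.
fun describeRecognitionError(errorCode: Int): String = when (errorCode) {
    SpeechRecognizer.ERROR_AUDIO -> "Audio recording error"
    SpeechRecognizer.ERROR_CLIENT -> "Client-side error"
    SpeechRecognizer.ERROR_INSUFFICIENT_PERMISSIONS -> "Microphone permission is missing"
    SpeechRecognizer.ERROR_NETWORK -> "Network error"
    SpeechRecognizer.ERROR_NETWORK_TIMEOUT -> "Network timeout"
    SpeechRecognizer.ERROR_NO_MATCH -> "No recognition result matched"
    SpeechRecognizer.ERROR_RECOGNIZER_BUSY -> "Recognition service is busy"
    SpeechRecognizer.ERROR_SERVER -> "Server error"
    SpeechRecognizer.ERROR_SPEECH_TIMEOUT -> "No speech input"
    else -> "Unknown error ($errorCode)"
}
```

The intent returned by buildOfflineRecognizerIntent() could be passed to startListening() in place of speechRecognizerIntent, and onError() could show describeRecognitionError(i) in a Toast so that failures such as a missing permission are visible to the user.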

Below is the final code for the MainActivity.kt file. Comments are added inside the code to explain it in more detail.

Kotlin

import android.content.Intent 
import android.content.pm.PackageManager 
import android.net.Uri 
import android.os.Build 
import android.os.Bundle 
import android.provider.Settings 
import android.speech.RecognitionListener 
import android.speech.RecognizerIntent 
import android.speech.SpeechRecognizer 
import android.widget.ImageView 
import android.widget.TextView 
import android.widget.Toast 
import androidx.appcompat.app.AppCompatActivity 
import androidx.core.content.ContextCompat 
import java.util.* 
  
class MainActivity : AppCompatActivity() { 
      
    private lateinit var micIV: ImageView 
    private lateinit var outputTV: TextView 
  
    override fun onCreate(savedInstanceState: Bundle?) { 
        super.onCreate(savedInstanceState) 
        setContentView(R.layout.activity_main) 
  
        micIV = findViewById(R.id.mic_speak_iv) 
        outputTV = findViewById(R.id.speak_output_tv) 
  
        micIV.setOnClickListener { 
            checkAudioPermission() 
            // changing the color of mic icon, which  
            // indicates that it is currently listening 
            micIV.setColorFilter(ContextCompat.getColor(this, R.color.mic_enabled_color)) // #FF0E87E7 
            startSpeechToText() 
        } 
    } 
  
    private fun startSpeechToText() { 
        val speechRecognizer = SpeechRecognizer.createSpeechRecognizer(this) 
        val speechRecognizerIntent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH) 
        speechRecognizerIntent.putExtra( 
                RecognizerIntent.EXTRA_LANGUAGE_MODEL, 
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM 
        ) 
        // EXTRA_LANGUAGE expects an IETF language tag (e.g. "en-US"), not a Locale object
        speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, Locale.getDefault().toLanguageTag())
  
        speechRecognizer.setRecognitionListener(object : RecognitionListener { 
            override fun onReadyForSpeech(bundle: Bundle?) {} 
            override fun onBeginningOfSpeech() {} 
            override fun onRmsChanged(v: Float) {} 
            override fun onBufferReceived(bytes: ByteArray?) {} 
            override fun onEndOfSpeech() { 
                // changing the color of our mic icon to 
                // gray to indicate it is not listening 
                micIV.setColorFilter(ContextCompat.getColor(applicationContext, R.color.mic_disabled_color)) // #FF6D6A6A 
            } 
  
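            override fun onError(i: Int) {
                // recognition failed, so the mic icon goes
                // back to gray even if no speech was detected
                micIV.setColorFilter(ContextCompat.getColor(applicationContext, R.color.mic_disabled_color))
            }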
  
            override fun onResults(bundle: Bundle) { 
                val result = bundle.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION) 
                if (!result.isNullOrEmpty()) {
                    // attaching the most likely result
                    // to our textview
                    outputTV.text = result[0]
                }
            } 
  
            override fun onPartialResults(bundle: Bundle) {} 
            override fun onEvent(i: Int, bundle: Bundle?) {} 
  
        }) 
        speechRecognizer.startListening(speechRecognizerIntent) 
    } 
  
    private fun checkAudioPermission() { 
        if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.M) {  // M = 23 
            if (ContextCompat.checkSelfPermission(this, "android.permission.RECORD_AUDIO") != PackageManager.PERMISSION_GRANTED) { 
                val intent = Intent(Settings.ACTION_APPLICATION_DETAILS_SETTINGS, Uri.parse("package:com.programmingtech.offlinespeechtotext")) 
                startActivity(intent) 
                Toast.makeText(this, "Allow Microphone Permission", Toast.LENGTH_SHORT).show() 
            } 
        } 
    } 
}
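The final code creates a new SpeechRecognizer on every click and never releases it. A possible refinement, shown here only as a sketch, is to keep a single recognizer in a property, create it only when SpeechRecognizer.isRecognitionAvailable() reports that a recognition service exists, and call destroy() when the Activity is destroyed. The class name RecognizerLifecycleSketchActivity and the Toast message are hypothetical.

```kotlin
import android.os.Bundle
import android.speech.SpeechRecognizer
import android.widget.Toast
import androidx.appcompat.app.AppCompatActivity

class RecognizerLifecycleSketchActivity : AppCompatActivity() {

    // One recognizer for the whole Activity; stays null when the
    // device has no recognition service installed.
    private var speechRecognizer: SpeechRecognizer? = null

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        if (SpeechRecognizer.isRecognitionAvailable(this)) {
            speechRecognizer = SpeechRecognizer.createSpeechRecognizer(this)
            // setRecognitionListener(...) and startListening(...) would be
            // called on this instance, as in startSpeechToText().
        } else {
            Toast.makeText(this, "Speech recognition is not available", Toast.LENGTH_SHORT).show()
        }
    }

    override fun onDestroy() {
        // Releases the connection to the recognition service.
        speechRecognizer?.destroy()
        speechRecognizer = null
        super.onDestroy()
    }
}
```

Reusing one instance avoids leaving several recognizers connected to the service after repeated taps on the mic icon.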