Speech to Text Application in Android using Jetpack Compose

Last Updated : 02 Jun, 2022

Speech to Text is used in most applications such as Google Search for searching any query. For using this feature user simply has to tap on the microphone icon and speak the query he wants to search. The speech of the user will be converted to text. In this article, we will take a look at How we can use speech to text feature within our android application using Jetpack Compose.

Step by Step Implementation

Step 1: Create a New Project in Android Studio

To create a new project in Android Studio please refer to How to Create/Start a New Project in Android Studio. While choosing the template, select Empty Compose Activity. If you do not find this template, try upgrading the Android Studio to the latest version. We demonstrated the application in Kotlin, so make sure you select Kotlin as the primary language while creating a New Project.

Step 2: Adding dependency for using the custom icons

Navigate to Gradle Scripts > build.gradle and add the below dependency in the dependencies section. We will be using this dependency for using custom icons such as the Microphone icons within our application.

implementation "androidx.compose.material:material-icons-extended:1.1.1"

After adding this dependency simply sync your project to install it.

Step 3: Adding color in the Color.kt file

Navigate to the app > java > your app’s package name > ui.theme > Color.kt file and add the below code to it. Comments are added in the code to get to know in detail.

Kotlin

package com.example.newcanaryproject.ui.theme
 
import androidx.compose.ui.graphics.Color
 
val Purple200 = Color(0xFF0F9D58)
val Purple500 = Color(0xFF0F9D58)
val Purple700 = Color(0xFF3700B3)
val Teal200 = Color(0xFF03DAC5)
 
// on below line we are adding green color.
val greenColor = Color(0xFF0F9D58)

Step 4: Working with the MainActivity.kt file

Go to the MainActivity.kt file and refer to the following code. Below is the code for the MainActivity.kt file. Comments are added inside the code to understand the code in more detail.

Kotlin

package com.example.newcanaryproject
 
import android.app.Activity
import android.content.Context
import android.content.Intent
import android.os.Bundle
import android.speech.RecognizerIntent
import android.speech.SpeechRecognizer
import android.widget.Toast
import androidx.activity.ComponentActivity
import androidx.activity.compose.setContent
import androidx.compose.foundation.layout.*
import androidx.compose.material.*
import androidx.compose.material.icons.Icons
import androidx.compose.material.icons.filled.Mic
import androidx.compose.runtime.*
import androidx.compose.ui.Alignment
import androidx.compose.ui.Modifier
import androidx.compose.ui.graphics.Color
import androidx.compose.ui.platform.LocalContext
import androidx.compose.ui.text.style.TextAlign
import androidx.compose.ui.unit.*
import com.example.newcanaryproject.ui.theme.NewCanaryProjectTheme
import com.example.newcanaryproject.ui.theme.greenColor
import java.util.*
 
class MainActivity : ComponentActivity() {
     
    // on below line we are creating a variable
    // for our output of speech to text
    var outputTxt by mutableStateOf("Click button for Speech text ")
 
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContent {
            NewCanaryProjectTheme {
                // on below line we are specifying 
                // background color for our application
                Surface(color = MaterialTheme.colors.background) {
                    // on below line we are 
                    // specifying theme as scaffold
                    Scaffold(
                        // on below line we are 
                        // specifying top bar as our action bar.
                        topBar = {
                            TopAppBar(
                                // on below line we are specifying title
                                // for our action bar as Speech to Text
                                title = {
                                    Text(
                                        text = "Speech to Text",
                                       
                                        // on below line we are specifying
                                        // width of our text
                                        modifier = Modifier.fillMaxWidth(),
                                       
                                        // on below line we are specifying 
                                        // the text alignment for our text
                                        textAlign = TextAlign.Center
                                    )
                                })
                        }) {
                        // on below line we are calling
                        // speech to text method.
                        SpeechToText()
                    }
                }
            }
        }
    }
 
    // on below line we are creating 
    // a speech to text function.
    @Composable
    fun SpeechToText() {
        // on below line we are creating
        // variable for the context
        val context = LocalContext.current
        // on below line we are creating ui 
        // for our home screen in column
        Column(
            // on below line we are specifying 
            // modifier as max size for our column
            modifier = Modifier.fillMaxSize(),
             
            // on below line we are specifying 
            // horizontal and vertical arrangement
            horizontalAlignment = Alignment.CenterHorizontally,
            verticalArrangement = Arrangement.Center
        ) {
            // on the below line we are creating a simple
            // text for displaying the heading.
            Text(
                text = "Speech to Text Example",
                 
                // for this text we are specifying
                // style on below line
                style = MaterialTheme.typography.h6,
                 
                // on below line we are specifying the
                // modifier and padding for our text
                modifier = Modifier
                    .fillMaxWidth()
                    .padding(15.dp),
                 
                // on below line we are 
                // specifying the text alignment.
                textAlign = TextAlign.Center
            )
            // on below line we are specifying 
            // the spacer between two views.
            Spacer(modifier = Modifier.height(30.dp))
             
            // on below line we are creating button for our mic
            Button(
                // on below line we are specifying 
                // the elevation for our button.
                elevation = ButtonDefaults.elevation(
                    // we are specifying elevation for different state of buttons.
                    defaultElevation = 0.dp, pressedElevation = 0.dp, disabledElevation = 0.dp
                ),
                
                // on below line we are specifying the color for our button
                colors = ButtonDefaults.buttonColors(backgroundColor = Color.Transparent),
                 
                // on below line we are specifying on click listener for our button
                onClick = { getSpeechInput(context = context) },
            ) {
                // on below line we are specifying
                // the icon for our button,
                Icon(
                    // in this we are specifying 
                    // the icon as mic icon.
                    imageVector = Icons.Filled.Mic,
                    
                    // on below line we are specifying 
                    // content description as mic.
                    contentDescription = "Mic",
                    
                    // on below line we are specifying 
                    // tint color for our icon
                    tint = greenColor,
                     
                    // on below line we are specifying padding,
                    // height and width for our icon
                    modifier = Modifier
                        .height(100.dp)
                        .width(100.dp)
                        .padding(5.dp)
                )
            }
            // again we are adding a spacer between two views.
            Spacer(modifier = Modifier.height(30.dp))
            // on below line we are creating a text
            // for displaying the speech to text output.
            Text(
                // on below line we are setting
                // output text in our text view
                text = outputTxt,
                 
                // on below line we are specifying 
                // the style for our text.
                style = MaterialTheme.typography.h6,
                 
                // on below line we are adding
                // padding for our text
                modifier = Modifier
                    .fillMaxWidth()
                    .padding(16.dp),
                 
                // on below line we are specifying
                // alignment for our text
                textAlign = TextAlign.Center
            )
        }
    }
 
    // on below line we are creating a method 
    // to get the speech input from user.
    private fun getSpeechInput(context: Context) {
        // on below line we are checking if speech
        // recognizer intent is present or not.
        if (!SpeechRecognizer.isRecognitionAvailable(context)) {
            // if the intent is not present we are simply displaying a toast message.
            Toast.makeText(context, "Speech not Available", Toast.LENGTH_SHORT).show()
        } else {
            // on below line we are calling a speech recognizer intent
            val intent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH)
             
            // on the below line we are specifying language model as language web search
            intent.putExtra(
                RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_WEB_SEARCH
            )
             
            // on below line we are specifying extra language as default english language
            intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, Locale.getDefault())
             
            // on below line we are specifying prompt as Speak something
            intent.putExtra(RecognizerIntent.EXTRA_PROMPT, "Speak Something")
             
            // at last we are calling start activity
            // for result to start our activity.
            startActivityForResult(intent, 101)
        }
    }
 
    // on below line we are calling our on activity result method to get the output.
    override fun onActivityResult(requestCode: Int, resultCode: Int, data: Intent?) {
        super.onActivityResult(requestCode, resultCode, data)
        // on below line we are checking if the request
        // code is same and result code is ok
        if (requestCode == 101 && resultCode == Activity.RESULT_OK) {
            // if the condition is satisfied we are getting 
            // the data from our string array list in our result.
            val result = data?.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS)
            // on below line we are setting result 
            // in our output text method.
            outputTxt = result?.get(0).toString()
        }
    }
}

Step 5: Adding permissions in AndroidManifest.xml

Now for using the microphone we have to specify microphone permissions. So for adding these permissions. We have to navigate to app > manifest > AndroidManifest.xml and add the below permissions to it. Comments are added in the code to get to know in detail.

XML

<!--on below line we are adding permission for internet as well as recording audio-->
<uses-permission android:name="android.permission.INTERNET"/>
<uses-permission android:name="android.permission.RECORD_AUDIO"/>
 
<!--on below line we are adding queries for our speech reorganization intent-->
<queries>
        <intent>
            <action android:name="android.speech.RecognitionService"/>
        </intent>
</queries>