Easy Speech-to-Text with Python

In this blog, I demonstrate how to convert speech to text using Python, with the help of the SpeechRecognition library and the PyAudio library.



By Dhilip Subramanian, Data Scientist and AI Enthusiast

Figure. Source: Screenshot from Information-Age

 

Speech is the most common means of communication, and most of the world's population relies on it to communicate with one another. A speech recognition system translates spoken language into text. There are many real-life examples of such systems; for example, Apple's Siri recognizes speech and transcribes it into text.

 

How does speech recognition work?

 

Figure: Speech recognition process

 

Hidden Markov Models (HMMs) and deep neural network models are used to convert audio into text. A detailed walkthrough of that process is beyond the scope of this blog. Here, I focus on the practical side: converting speech to text in Python with the SpeechRecognition library and PyAudio.

The SpeechRecognition library supports several speech recognition engines and APIs; in this blog I used the Google speech recognition API, which converts speech into text. For more details, please check this.

 

Python Libraries

 

!pip install SpeechRecognition

 

 

Convert an audio file into text

 
Steps:

  1. Import the speech recognition library.
  2. Initialize the Recognizer class in order to recognize the speech. We are using Google speech recognition.
  3. Audio file formats supported by the library: WAV, AIFF, AIFF-C, and FLAC. I used a WAV file in this example.
  4. I used an audio clip from the movie “Taken”, which says “I don’t know who you are I don’t know what you want if you’re looking for ransom I can tell you I don’t have money”.
  5. By default, the Google recognizer reads English. It supports different languages; for more details, please check this documentation.
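
Since step 3 restricts the input to a few formats, it can help to check a file before handing it to the recognizer. The following is a minimal sketch using only the Python standard library. The names SUPPORTED_EXTENSIONS, is_supported_audio and describe_wav are hypothetical helpers of my own, not part of SpeechRecognition, and the extension list is an assumption about how these formats are usually named on disk.

```python
import wave
from pathlib import Path

# Assumed file extensions for the formats listed in step 3
# (WAV, AIFF, AIFF-C, FLAC). This mapping is illustrative only.
SUPPORTED_EXTENSIONS = {".wav", ".aiff", ".aif", ".aifc", ".flac"}


def is_supported_audio(path):
    """Return True if the file extension matches one of the listed formats."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def describe_wav(path):
    """Read basic properties of a WAV file with the standard wave module."""
    with wave.open(str(path), "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_rate": rate,
            "duration_seconds": frames / float(rate),
        }
```

For example, describe_wav('I-dont-know.wav') would report the channel count, sample rate and duration of the clip used in this post, which is a quick way to confirm the file is a readable WAV before transcription.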

 

Code

 

# Import the library
import speech_recognition as sr

# Initialize the recognizer class (for recognizing the speech)
r = sr.Recognizer()

# Open the audio file as the source and store its contents in audio_text.
# record() reads the whole file; listen() would stop at the first long pause.
with sr.AudioFile('I-dont-know.wav') as source:
    audio_text = r.record(source)

# recognize_google() raises RequestError if the API is unreachable and
# UnknownValueError if the speech is unintelligible, hence the exception handling
try:
    # Using Google speech recognition
    text = r.recognize_google(audio_text)
    print('Converting audio transcripts into text ...')
    print(text)
except (sr.UnknownValueError, sr.RequestError):
    print('Sorry.. run again...')

 

Output
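
The code above prints a message asking the user to run it again when the request fails. One possible refinement is to retry automatically when the API is temporarily unreachable. The sketch below is generic and uses only the standard library; call_with_retries is a hypothetical helper of my own, not something provided by SpeechRecognition.

```python
import time


def call_with_retries(func, retry_on, attempts=3, delay_seconds=1.0):
    """Call func(); if it raises an exception listed in retry_on, wait and retry.

    The final failure is re-raised so the caller can still handle it.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay_seconds)


# Hypothetical usage with the recognizer and audio from the code above:
# text = call_with_retries(lambda: r.recognize_google(audio_text),
#                          retry_on=sr.RequestError)
```

Only sr.RequestError is retried in the usage line, on the assumption that unintelligible audio (sr.UnknownValueError) would fail the same way on every attempt.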

 

How about converting audio in a different language?

 
For example, if we want to read a French-language audio file, we need to add the language option to recognize_google(). The rest of the code remains the same. Please refer to the documentation for more.

# Adding the French language option
text = r.recognize_google(audio_text, language = "fr-FR")

 

Output
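
If several languages are involved, the language codes can be kept in one place instead of being typed into each call. This is a minimal sketch, assuming a recognizer object with the same recognize_google(audio, language=...) call used above. LANGUAGE_CODES and recognize_in are hypothetical names of my own; the codes are the ones that appear in this post, plus "en-US", which I am assuming is the default for English.

```python
# Language codes used in this post, plus "en-US" (assumed default for English).
LANGUAGE_CODES = {
    "english": "en-US",
    "french": "fr-FR",
    "tamil": "ta-IN",
}


def recognize_in(recognizer, audio, language_name):
    """Look up the code for language_name and pass it to recognize_google()."""
    key = language_name.strip().lower()
    if key not in LANGUAGE_CODES:
        raise ValueError("No language code registered for %r" % language_name)
    return recognizer.recognize_google(audio, language=LANGUAGE_CODES[key])


# Hypothetical usage with the recognizer and audio from the code above:
# text = recognize_in(r, audio_text, "French")
```

An unknown language name raises a ValueError before any request is sent, which makes a mistyped name easier to spot than a failed recognition.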

 

Microphone speech into text

 
Steps:

  1. We need to install the PyAudio library, which is used to receive audio input through the microphone and play audio output through the speaker. Basically, it lets us capture our voice through the microphone.
!pip install PyAudio

 

  2. Instead of an audio file source, we have to use the Microphone class. The remaining steps are the same.

 

Code

 

# Import the library
import speech_recognition as sr

# Initialize the recognizer class (for recognizing the speech)
r = sr.Recognizer()

# Use the microphone as the source:
# listen to the speech and store it in the audio_text variable
with sr.Microphone() as source:
    print("Talk")
    audio_text = r.listen(source)
    print("Time over, thanks")

# recognize_google() raises RequestError if the API is unreachable and
# UnknownValueError if the speech is unintelligible, hence the exception handling
try:
    # Using Google speech recognition
    print("Text: " + r.recognize_google(audio_text))
except (sr.UnknownValueError, sr.RequestError):
    print("Sorry, I did not get that")

 

I just said “How are you?”

Output
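
In a noisy room, the microphone example may pick up background sound or wait a long time for speech. The recognizer offers adjust_for_ambient_noise() to calibrate against background noise, and listen() accepts timeout and phrase_time_limit arguments. The sketch below wraps both in one function; listen_once and its default values are my own assumptions, chosen for illustration rather than taken from the original code.

```python
def listen_once(recognizer, source, calibrate_seconds=1, timeout=5, phrase_time_limit=10):
    """Calibrate for background noise, then capture a single phrase.

    timeout is how long to wait for speech to start; phrase_time_limit caps
    the length of the recording. Both are in seconds.
    """
    recognizer.adjust_for_ambient_noise(source, duration=calibrate_seconds)
    return recognizer.listen(source, timeout=timeout, phrase_time_limit=phrase_time_limit)


# Hypothetical usage with the recognizer from the code above:
# with sr.Microphone() as source:
#     print("Talk")
#     audio_text = listen_once(r, source)
```

If no speech starts within the timeout, listen() raises sr.WaitTimeoutError, so a caller using this helper would want to catch that alongside the other two exceptions.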

 

How about talking in a different language?

 
Again, we need to add the required language option in recognize_google(). I am speaking in Tamil, an Indian language, and adding “ta-IN” as the language option.

# Adding the Tamil language option
print("Text: " + r.recognize_google(audio_text, language="ta-IN"))

 

I just said “how are you” in Tamil and it prints the text in Tamil accurately.

Output

 

Note:

 
The Google speech recognition API is an easy way to convert speech into text, but it requires an internet connection to operate.

In this blog, we have seen how to convert speech into text using the Google speech recognition API. This can be very helpful for NLP projects, especially those that handle audio transcript data. If you have anything to add, please feel free to leave a comment!

Thanks for reading. Keep learning and stay tuned for more!

 
Bio: Dhilip Subramanian is a Mechanical Engineer and has completed his Master's in Analytics. He has 9 years of experience with specialization in various domains related to data including IT, marketing, banking, power, and manufacturing. He is passionate about NLP and machine learning. He is a contributor to the SAS community and loves to write technical articles on various aspects of data science on the Medium platform.

Original. Reposted with permission.
