Easy Speech-to-Text with Python
In this blog, I demonstrate how to convert speech to text using Python, with the help of the SpeechRecognition library and the PyAudio library.
By Dhilip Subramanian, Data Scientist and AI Enthusiast
Speech is the most common means of communication, and most of the world's population relies on it to communicate with one another. A speech recognition system translates spoken language into text. There are many real-life examples of speech recognition systems, such as Apple's Siri, which recognizes speech and transcribes it into text.
How does speech recognition work?
Hidden Markov Models (HMMs) and deep neural network models are used to convert audio into text. A detailed account of the process is beyond the scope of this blog, which focuses on doing the conversion in Python with the SpeechRecognition library and PyAudio.
The SpeechRecognition library supports several speech recognition APIs; in this blog I used the Google speech recognition API, which converts speech into text. For more details, please check the library's documentation.
!pip install SpeechRecognition
Convert an audio file into text
- Import the speech recognition library.
- Initialize the recognizer class in order to recognize the speech. We are using Google speech recognition.
- Audio file formats supported by the library: WAV, AIFF, AIFF-C, FLAC. I used a WAV file in this example.
- I used an audio clip from the movie “Taken”, which says “I don’t know who you are I don’t know what you want if you’re looking for ransom I can tell you I don’t have money”.
- By default, the Google recognizer reads English. It supports other languages as well; for more details, please check the documentation.
    # Import the library
    import speech_recognition as sr

    # Initialize the recognizer class (for recognizing the speech)
    r = sr.Recognizer()

    # Open the audio file as the source and read it into audio_text.
    # record() reads the whole file; listen() may stop at the first long pause.
    with sr.AudioFile('I-dont-know.wav') as source:
        audio_text = r.record(source)

    # recognize_google() raises RequestError if the API is unreachable and
    # UnknownValueError if the speech is unintelligible, hence the exception handling
    try:
        # Using Google speech recognition
        text = r.recognize_google(audio_text)
        print('Converting audio transcripts into text ...')
        print(text)
    except (sr.UnknownValueError, sr.RequestError):
        print('Sorry.. run again...')
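Since only a few audio formats are accepted, it can help to check a file before handing it to the recognizer. The snippet below is a minimal sketch of such a check. The helper names and the exact set of file extensions are my own assumptions for illustration, not part of the SpeechRecognition library, and the check looks only at the file name, not at the audio data itself.

```python
from pathlib import Path

# Extensions matching the formats listed above (WAV, AIFF, AIFF-C, FLAC).
# The exact set of suffixes is an assumption made for this sketch.
SUPPORTED_EXTENSIONS = {".wav", ".aiff", ".aif", ".aifc", ".flac"}


def is_supported_audio(path):
    """Return True if the file extension looks like a supported format."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def check_audio_path(path):
    """Raise ValueError early for files (such as .mp3) that need converting first."""
    if not is_supported_audio(path):
        suffix = Path(path).suffix or "(none)"
        raise ValueError(f"Unsupported audio format: {suffix}")
    return Path(path)
```

Calling check_audio_path('I-dont-know.wav') before opening the file gives a clear error message up front instead of a failure inside the recognizer.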
How about converting different audio language?
For example, if we want to read a French-language audio file, we need to add the language option in recognize_google(). The rest of the code remains the same. Please refer to the documentation for more details.
    # Adding the French language option
    text = r.recognize_google(audio_text, language="fr-FR")
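If you switch languages often, one option is to keep the language tags in a small lookup table. This is only a sketch: the friendly names in LANGUAGE_TAGS and the transcribe_in helper are hypothetical names I made up for illustration, while the language keyword of recognize_google() is the same one used above.

```python
# Language tags used in this blog; the friendly names are hypothetical keys
# chosen for this sketch, not part of the SpeechRecognition library.
LANGUAGE_TAGS = {
    "english": "en-US",
    "french": "fr-FR",
    "tamil": "ta-IN",
}


def transcribe_in(recognizer, audio, language_name="english"):
    """Transcribe `audio` with recognize_google() using a named language.

    `recognizer` is expected to be an sr.Recognizer and `audio` the audio
    data captured earlier (audio_text in the examples above).
    """
    try:
        tag = LANGUAGE_TAGS[language_name.lower()]
    except KeyError:
        raise ValueError(f"No language tag configured for {language_name!r}") from None
    return recognizer.recognize_google(audio, language=tag)
```

With the recognizer and audio from the earlier example, transcribe_in(r, audio_text, "french") would make the same call as the line above.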
Microphone speech into text
- We need to install the PyAudio library, which is used to receive audio input and output through the microphone and speaker. Basically, it helps to capture our voice through the microphone.
!pip install PyAudio
- Instead of an audio file source, we have to use the Microphone class. The remaining steps are the same.
    # Import the library
    import speech_recognition as sr

    # Initialize the recognizer class (for recognizing the speech)
    r = sr.Recognizer()

    # Use the microphone as the source and store the captured speech in audio_text
    with sr.Microphone() as source:
        print("Talk")
        audio_text = r.listen(source)
        print("Time over, thanks")

    # recognize_google() raises RequestError if the API is unreachable and
    # UnknownValueError if the speech is unintelligible, hence the exception handling
    try:
        # Using Google speech recognition
        print("Text: " + r.recognize_google(audio_text))
    except (sr.UnknownValueError, sr.RequestError):
        print("Sorry, I did not get that")
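In a noisy room, or when nobody speaks, the plain listen() call can wait for a long time. The sketch below shows one way to handle that, using the recognizer's adjust_for_ambient_noise() method and the timeout and phrase_time_limit arguments of listen(). The helper names and the default durations are arbitrary choices of mine for illustration, so tune them for your own setup.

```python
def listen_once(recognizer, source, calibrate_seconds=1.0,
                timeout=5, phrase_time_limit=10):
    """Calibrate for background noise, then capture a single phrase.

    `recognizer` is expected to be an sr.Recognizer and `source` an open
    sr.Microphone; the default durations are arbitrary for this sketch.
    """
    # Sample the ambient noise so the energy threshold suits the room
    recognizer.adjust_for_ambient_noise(source, duration=calibrate_seconds)
    # timeout: seconds to wait for speech to start
    # phrase_time_limit: maximum seconds of speech to record
    return recognizer.listen(source, timeout=timeout,
                             phrase_time_limit=phrase_time_limit)


def demo_microphone():
    """Run against a real microphone (needs SpeechRecognition and PyAudio)."""
    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Talk")
        try:
            audio_text = listen_once(r, source)
        except sr.WaitTimeoutError:
            # Raised by listen() when no speech starts within `timeout`
            print("No speech detected before the timeout")
            return
    print("Text: " + r.recognize_google(audio_text))
```

Calling demo_microphone() runs the whole flow; listen_once() is kept separate so it can be reused with any recognizer and source.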
I just said “How are you?”
How about talking in a different language?
Again, we need to add the required language option in recognize_google(). I am speaking in Tamil, an Indian language, so I add “ta-IN” as the language option.
    # Adding the Tamil language option
    print("Text: " + r.recognize_google(audio_text, language="ta-IN"))
I just said “how are you” in Tamil and it prints the text in Tamil accurately.
The Google speech recognition API is an easy way to convert speech into text, but it requires an internet connection to operate.
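Because the API depends on the network, a request can fail simply because the connection dropped for a moment. One possible way to soften that is a small retry helper, sketched below. The with_retries and demo_retry names, the number of attempts, and the delay are all my own assumptions for illustration; the only library pieces used are recognize_google() and the RequestError exception it raises when the API is unreachable.

```python
import time


def with_retries(func, retry_on, attempts=3, delay_seconds=1.0):
    """Call func(); retry when it raises one of the `retry_on` exceptions."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            time.sleep(delay_seconds)


def demo_retry(audio_text):
    """Retry a Google transcription when the API is unreachable."""
    import speech_recognition as sr

    r = sr.Recognizer()
    return with_retries(lambda: r.recognize_google(audio_text),
                        retry_on=sr.RequestError)
```

Only RequestError is retried here; an UnknownValueError means the speech itself was unintelligible, so repeating the same request would not help.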
In this blog, we have seen how to convert speech into text using the Google speech recognition API. This is very helpful for NLP projects, especially those that handle audio transcript data. If you have anything to add, please feel free to leave a comment!
Thanks for reading. Keep learning and stay tuned for more!
Bio: Dhilip Subramanian is a Mechanical Engineer and has completed his Master's in Analytics. He has 9 years of experience with specialization in various domains related to data including IT, marketing, banking, power, and manufacturing. He is passionate about NLP and machine learning. He is a contributor to the SAS community and loves to write technical articles on various aspects of data science on the Medium platform.
Original. Reposted with permission.