Build Your First Voice Assistant
Hone your practical speech recognition application skills with this overview of building a voice assistant using Python.
Nowadays, it isn’t surprising to hear someone speak to someone that isn’t there. We ask Alexa for the weather and to turn the temperature down on the thermostat. Then, we ask Siri what our schedule for the day is and to call people. We are connected now more than ever using our voice and voice interface technology. I can’t imagine doing things manually anymore! It’s truly the future.
Who doesn't want to have the luxury to own an assistant who always listens for your call, anticipates your every need, and takes action when necessary? That luxury is now available thanks to artificial intelligence-based voice assistants.
Voice assistants come in somewhat small packages and can perform a variety of actions after hearing your command. They can turn on lights, answer questions, play music, place online orders and do all kinds of AI-based stuff.
Voice assistants are not to be confused with virtual assistants, which are people who work remotely and can, therefore, handle all kinds of tasks. Rather, voice assistants are technology based. As voice assistants become more robust, their utility in both the personal and business realms will grow as well.
What is a Voice Assistant?
A voice assistant or intelligent personal assistant is a software agent that can perform tasks or services for an individual based on verbal commands i.e. by interpreting human speech and respond via synthesized voices. Users can ask their assistants’ questions, control home automation devices, and media playback via voice, and manage other basic tasks such as email, to-do lists, open or close any application etc with verbal commands.
Let me give you the example of Braina (Brain Artificial) which is an intelligent personal assistant, human language interface, automation and voice recognition software for Windows PC. Braina is a multi-functional AI software that allows you to interact with your computer using voice commands in most of the languages of the world. Braina also allows you to accurately convert speech to text in over 100 different languages of the world.
History of Voice Assistants
In recent times, Voice assistants got the major platform after Apple integrated the most astonishing Virtual Assistant — Siri which is officially a part of Apple Inc. But the timeline of greatest evolution began with the year 1962 event at the Seattle World Fair where IBM displayed a unique apparatus called Shoebox. It was the actual size of a shoebox and could perform scientific functions and can perceive 16 words and also speak them in the human recognizable voice with 0 to 9 numerical digits.
During the period of the 1970s, researchers at Carnegie Mellon University in Pittsburgh, Pennsylvania — with the considerable help of the U.S Department of Defence and its Defence Advanced Research Projects Agency (DARPA) — made Harpy. It could understand almost 1,000 words, which is approximately the vocabulary of a three-year-old child.
Big organizations like Apple and IBM sooner in the 90s started to make things that utilized voice acknowledgment. In 1993, Macintosh began to building speech recognition with its Macintosh PCs with PlainTalk.
In April 1997, Dragon NaturallySpeaking was the first constant dictation product which could comprehend around 100 words and transform it into readable content.
Having said that, how cool it would be to build a simple voice-based desktop/laptop assistant that has the capability to:
- Open the subreddit in the browser.
- Open any website in the browser.
- Send an email to your contacts.
- Launch any system application.
- Tells you the current weather and temperature of almost any city
- Tells you the current time.
- Play you a song on VLC media player(of course you need to have VLC media player installed in your laptop/desktop)
- Change desktop wallpaper.
- Tells you latest news feeds.
- Tells you about almost anything you ask.
So here in this article, we are going to build a voice-based application which is capable of doing all the above-mentioned tasks. But first, check out this video below which I made while I was interacting with the desktop voice assistant and I call her Sofia.
New video by Nagesh Chauhan
Interaction with Sofia
I hope you guys have liked the above video in which I was interacting with Sofia. Now let’s start building this cool thing…
Dependencies and requirements :
System requirements: Python 2.7, Spyder IDE, MacOS Mojave(version 10.14)
Install all these python libraries :
Let’s start building our desktop voice assistant using python
Start by importing all the required libraries :
For our voice-assistant to perform all the above-discussed features, we have to code the logic of each of them in one method.
So our first step is to create the method which will interpret user voice response.
Next, create a method that will convert text to speech.
Now create a loop to continue executing multiple commands. Inside the method assistant() passing user command(myCommand()) as parameters.
Our next step is to create multiple if statements corresponding to each of the features. So let us see how to create these small modules inside if statement for each command.
1. Open the subreddit Reddit in the browser.
The user will give any command to open any subreddit from Reddit and the command should be “Hey Sofia! Can you please open Reddit subreddit_name”. only the italic bold phrase should be used as it is. You can use any kind of prefix, just take care of the italic bold phrase.
How it works : If you have said the phrase open reddit in your command then it will search for subreddit name in the user command using re.search(). The subreddit will be searched using www.reddit.com and will get opened in the browser using pythons Webbrowser module.The Webbrowser module provides a high-level interface to allow displaying Web-based documents to users.
So, the above code will open your desired Reddit in your default browser.
2. Open any website in the browser.
You can open any website just be saying “open website.com” or “open website.org”.
For example: “Please open facebook.com” or “Hey, can you open linkedin.com” like this you can ask Sofia to open any website for you.
How it works : If you have said the word open in your command then it will search for website name in the user command using re.search(). Next, it will append the website name to https://www. and using webbrowser module the complete URL gets opened in the browser.
3. Send Email.
You can also ask your desktop assistant to send the email.
How it works : If you have said the word email in your command then the bot will ask for receipient, If my response is rajat, the bot will use pthons smtplib library. The smtplib module defines an SMTP client session object that can be used to send mail to any Internet machine with an SMTP or ESMTP listener daemon. Sending mail is done with Python’s smtplib using an SMTP server. First it will initaite gmail SMTP using smtplib.SMTP(), then identify the server using ehlo() function, then encypting the session starttls(), then login to your mailbox using login(), then sending the message using sendmail().
4. Launch any system application.
Say “launch calendar” or “can you please launch skype” or “Sofia launch finder” etc. and Sofia will launch that system application for you.
How it works : If you have said the word launch in your command then it will search for application name(if it is present in your system) in the user command using re.search(). It will then append the suffix “.app” to the application name. Now your application name is for example say calender.app(In macOS the executable files end with extension .app unlike in Windows which ends with .exe). So the executable application name will be launched using python subprocess’s Popen() function. The subprocess module enables you to start new applications from your Python program.
5. Tells you the current weather and temperature of almost any city.
Sofia can also tell you the weather, maximum and minimum temperature of any city around the world. The user just needs to say something like “what is the current weather in London” or “tell me the current weather in Delhi”.
How it works : If you have said the phrase current weather in your command then it will search for city name using re.search(). I have used pythons pyowm library to get the weather of any city. get_status() will tell you about the weather condition like haze, cloudy, rainy etc and get_temperature() will tell you about the max and min temperature of the city.
6. Tells you the current time.
“Sofia can you tell me the current time ?” or “what is the time now ?” and Sofia will tell you the current time of your timezone.
How it works : Its pretty simple
7. Greetings/ leave
Say “hello Sofia” to greet your voice assistant or when you want the program to terminate say something like “shutdown Sofia” or “Sofia please shutdown” etc.
How it works : If you have said the word hello in your command, then depending on the time of the day, the bot will greet the user. If the time is more than 12 noon, the bot will respond “Hello Sir. Good afternoon”, likewise if the time is more than 6 ck pm, the bot will respond “Hello Sir. Good evening”. And when you give command as shutdown, sys.exit() will be called to terminate the program.
8. Play you a song on VLC media player
This feature allows your voice bot to play your desired song in VLC media player. The user will say “Sofia play me a song”, the bot will ask “What song shall I play Sir?”. Just say the name of the song and Sofia will download the song from youtube in your local drive, play that song on the VLC media player and if you again play a song the previously downloaded song will get deleted automatically.
How it works :If you have said the phrase play me a song in your command, then it will ask you what video song to play. The song you will ask will be searched in youtube.com, If found than the song will be downloaded in your local directory using pythons youtube_dl library. The youtube-dl is a command-line program to download videos from YouTube.com and a few more sites. Now the song will be played as soon as it gets downloded using pythons VLC library and play(path_to__videosong) module actually playes the song.
Now if the next time you ask for any other song, the local directory will be flushed and a new song will be downloaded in that directory.
9. Change desktop wallpaper.
You guys can also change your desktop wallpaper using this feature. When you say something like “change wallpaper” or “Sofia please change wallpaper” the bot will download random wallpaper from unsplash.comand sets it as your desktop background.
How it works : If you have said the phrase change wallpaper in your command, the program will download a random wallpaper from unsplash.com, store it in local directory and set it as your desktop wallpaper using subprocess.call(). I have used unsplash API to get access to its content.
Now if the next time you ask to change the wallpaper again, your local directory will be flushed and a new wallpaper will be downloaded in that directory.
10. Tells you latest news feeds.
Sofia can also tell you the latest news update. The user just has to say “Sofia what are the top news for today ?” or “tell me the news for today”.
How it works : If you have said the phrase news for today in your command then it will scrape data using Beautiful Soup from Google News RSS() and read it for you. For convineince I have set number of news limit to 15.
11. Tells you about almost anything you ask.
Your bot can fetch details of almost anything you ask her. Like “Sofia tell me about Google” or “Please tell me about Supercomputers” or “please tell me about the Internet”. So as you can see you can ask about almost anything.
How it works : If you have said the phrase tell me about in your command then it will search for the keyword in the user command using re.search(). Using pythons wikipedia library it will search for that topic and extract first 500 characters(if you dont specify the limit the bot will read the whole page for you). Wikipedia is a Python library that makes it easy to access and parse data from Wikipedia.
Let's put everything together
So you have seen how just by writing simple lines of python code we can create a very cool voice-based desktop/laptop assistant. Apart from these features, you can also include many different features in your voice assistant.
Please not that once you start executing your program, be loud and clear while you are interacting with voice assistant because it may happen that if your voice is not clear your voice assistant may not be able to interpret you properly.
Conclusion: What the future holds
Throughout the history of computing, user interfaces have become progressively natural to use. The screen and keyboard were one step in this direction. The mouse and graphical user interface were another. Touch screens are the most recent development. The next step will most likely consist of a mix of augmented reality, gestures and voice commands. After all, it is often easier to ask a question or have a conversation than it is to type something or enter multiple details in an online form.
The more a person interacts with voice-activated devices, the more trends, and patterns the system identifies based on the information it receives. Then, this data can be utilized to determine user preferences and tastes, which is a long-term selling point for making a home smarter. Google and Amazon are looking to integrate voice-enabled artificial intelligence capable of analyzing and responding to human emotion.
I hope you guys have enjoyed reading this article. Share your thoughts/comments/doubts in the comment section. You can reach me out over LinkedIn.
Bio: Nagesh Singh Chauhan is a Big data developer at CirrusLabs. He has over 4 years of working experience in various sectors like Telecom, Analytics, Sales, Data Science having specialisation in various Big data components.
Original. Reposted with permission.
- Practical Speech Recognition with Python: The Basics
- Comparison of the Top Speech Processing APIs
- TensorFlow vs PyTorch vs Keras for NLP
Top Stories Past 30 Days