Add Speech Recognition and Synthesis to your web apps with the Web Speech API

Recently, for a hackathon, I was toying with the idea of implementing a voice assistant along the lines of Siri, Google Now or Cortana for one of our popular web applications. The basic building blocks were Speech Recognition and Speech Synthesis, i.e. engines to convert speech to text and text to speech.

I knew that Google Chrome had introduced an HTML attribute called x-webkit-speech in 2011. It worked on the input tag and rendered as shown below. It was soon deprecated because of its limited flexibility and control, as well as a widely reported security loophole that allowed an eavesdropping website to listen to users without their consent or any indication. It was replaced by the Web Speech API.


x-webkit-speech in action

Web Speech API

The Web Speech API is easy to use, and you can create wonderful interactive experiences with it.


On the flip side, it is available only in Chrome, as an experimental implementation. But given Chrome’s growing popularity, it is reasonable to offer it as an additional feature or input method for your web application or blog. The Web Speech API offers a Speech Recognition interface (Speech To Text, or STT) as well as a Speech Synthesis interface (Text To Speech, or TTS).
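Because support is limited, a page should check for both interfaces before offering voice features. The sketch below makes several assumptions:

* Recognition is exposed either as the unprefixed SpeechRecognition constructor or as Chrome’s prefixed webkitSpeechRecognition.
* Synthesis is exposed as window.speechSynthesis plus the SpeechSynthesisUtterance constructor.
* SpeechHost, getRecognitionCtor and speak are hypothetical helper names, and the structural types declare only the members this illustration touches. They are not the full API.

```typescript
// Hypothetical minimal types: only the members used below are declared.
type Ctor<T> = new (text?: string) => T;

interface UtteranceLike {
  text: string;
  lang: string;
  rate: number;
}

interface SynthesisLike {
  speak(utterance: UtteranceLike): void;
}

// Stand-in for the browser's global object (window).
interface SpeechHost {
  SpeechRecognition?: Ctor<unknown>;
  webkitSpeechRecognition?: Ctor<unknown>; // Chrome's prefixed constructor
  speechSynthesis?: SynthesisLike;
  SpeechSynthesisUtterance?: Ctor<UtteranceLike>;
}

// Returns the recognition constructor (unprefixed preferred) or null if STT is missing.
function getRecognitionCtor(host: SpeechHost): Ctor<unknown> | null {
  return host.SpeechRecognition ?? host.webkitSpeechRecognition ?? null;
}

// Speaks `text` via TTS; returns false when synthesis is not available.
function speak(host: SpeechHost, text: string, lang = "en-US"): boolean {
  const synth = host.speechSynthesis;
  const Utterance = host.SpeechSynthesisUtterance;
  if (!synth || !Utterance) return false;
  const utterance = new Utterance(text);
  utterance.lang = lang;
  utterance.rate = 1; // 1 is the normal speaking rate
  synth.speak(utterance);
  return true;
}
```

In a page you would pass the real global object, e.g. speak(window as unknown as SpeechHost, "Hello"). The cast is needed only because the sketch uses its own narrow types.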

The Web Speech API was clearly designed to eliminate the drawbacks of the earlier approach by giving developers complete control and flexibility via JavaScript. Microphone permission is handled by the browser, and rightly so. Everything else is under the developer’s control. Below is a slide from a Google Developers presentation that highlights the features offered. I would recommend going through the complete presentation for details and exploring the examples and downloads it provides.
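To make that control concrete, here is a minimal sketch of configuring a recognizer and assembling its results. It assumes the SpeechRecognition members lang, continuous, interimResults, maxAlternatives, onresult and start(). It also assumes the result shape defined by the API: event.resultIndex, event.results[i].isFinal and event.results[i][0].transcript. The structural types and the collectTranscript and listen helpers are hypothetical, written so the logic can run outside a browser.

```typescript
// Hypothetical minimal types modelled on the recognition result objects.
interface AlternativeLike {
  readonly transcript: string;
  readonly confidence: number;
}

interface ResultLike {
  readonly isFinal: boolean;
  readonly length: number;
  readonly [index: number]: AlternativeLike;
}

interface ResultListLike {
  readonly length: number;
  readonly [index: number]: ResultLike;
}

interface ResultEventLike {
  readonly resultIndex: number; // lowest index in `results` that changed
  readonly results: ResultListLike;
}

interface RecognitionLike {
  lang: string;
  continuous: boolean;
  interimResults: boolean;
  maxAlternatives: number;
  onresult: ((event: ResultEventLike) => void) | null;
  start(): void;
}

// Splits the results that changed in this event into final and interim text,
// using the top alternative ([0]) of each result.
function collectTranscript(event: ResultEventLike): { final: string; interim: string } {
  let final = "";
  let interim = "";
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    const best = result[0];
    if (!best) continue;
    if (result.isFinal) final += best.transcript;
    else interim += best.transcript;
  }
  return { final, interim };
}

// Configures a recognizer for dictation-style input and reports text as it arrives.
function listen(
  rec: RecognitionLike,
  onText: (finalSoFar: string, interim: string) => void,
  lang = "en-US",
): void {
  rec.lang = lang;
  rec.continuous = true;     // keep listening across pauses
  rec.interimResults = true; // deliver partial guesses while the user speaks
  rec.maxAlternatives = 1;
  let finalSoFar = "";
  rec.onresult = (event) => {
    const { final, interim } = collectTranscript(event);
    finalSoFar += final;
    onText(finalSoFar, interim);
  };
  rec.start();
}
```

In Chrome you would pass an instance created with new webkitSpeechRecognition(), cast to RecognitionLike. The helper appends only final text to finalSoFar and re-renders interim text on each event, so words are not duplicated while the engine revises its guess.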

Slide from Google Developers Presentation

You can also download an easy-to-follow code lab from Google Developers that shows how to add it to your web application.

Note that you will need a web server to host your files. The Web Speech API only works for pages served via https:// or http://, not file:///. For quick testing, you can host your page on Google Drive or Dropbox. If you serve it over SSL (https://), the permission persists, and the user won’t have to grant or deny it every time. That is why it is advisable to host such applications over SSL for a better user experience. If you are wondering where exactly Chrome stores this permission, open chrome://settings/contentExceptions#media-stream to view or alter existing permissions.
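These hosting rules can be turned into a small guard that a page runs before enabling its voice features. This sketch simply encodes the behaviour described in this post:

* file:// is unsupported.
* http:// prompts each time.
* https:// remembers the grant.

Browsers have tightened their microphone rules since, so treat the mapping as an assumption to verify against your target browser. hostingStatus and HostingStatus are hypothetical names.

```typescript
// Assumption: encodes this post's observations about Chrome, not a browser API.
type HostingStatus = "unsupported" | "prompts-each-time" | "permission-persists";

// Pass window.location.protocol (e.g. "https:") when running in a page.
function hostingStatus(protocol: string): HostingStatus {
  switch (protocol.toLowerCase()) {
    case "https:":
      return "permission-persists"; // microphone grant is remembered
    case "http:":
      return "prompts-each-time"; // user is asked to grant/deny again
    default:
      return "unsupported"; // e.g. "file:" pages opened from disk
  }
}
```

A page could call hostingStatus(window.location.protocol) and, for "unsupported", hide the microphone button instead of letting recognition fail.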

Web Speech recognition is not 100% accurate, although it worked reasonably well in my experiments, and it should only improve over time.
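Because recognition can be wrong, an application such as a voice assistant may want to ignore low-confidence guesses. Each recognition alternative carries a confidence estimate between 0 and 1, and setting maxAlternatives above 1 asks for more than one guess per result. The sketch below picks the most confident alternative and rejects it if it falls below a threshold. bestTranscript and its 0.6 default threshold are hypothetical choices. A real SpeechRecognitionResult is array-like, so you would pass Array.from(event.results[i]).

```typescript
// Hypothetical minimal type for one recognition alternative.
interface AlternativeLike {
  readonly transcript: string;
  readonly confidence: number; // estimate between 0 and 1
}

// Returns the most confident transcript (trimmed), or null when there are no
// alternatives or even the best one is below `minConfidence`.
function bestTranscript(
  alternatives: readonly AlternativeLike[],
  minConfidence = 0.6, // arbitrary example threshold; tune for your app
): string | null {
  let best: AlternativeLike | null = null;
  for (const alt of alternatives) {
    if (best === null || alt.confidence > best.confidence) best = alt;
  }
  if (best === null || best.confidence < minConfidence) return null;
  return best.transcript.trim();
}
```

Returning null gives the caller a clear signal to ask the user to repeat, rather than acting on a doubtful command.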