Recently, for a hackathon event, I was toying with the idea of implementing a voice assistant along the lines of Siri, Google Now or Cortana for one of our popular web applications. The basic things needed were Speech Recognition and Speech Synthesis, i.e. an engine to convert Speech To Text and Text To Speech.
I knew that Google Chrome had introduced an HTML attribute called x-webkit-speech in 2011. The attribute worked with the input tag and added a small microphone icon to the field. It was soon deprecated because of flexibility and control issues (along with a widely reported security loophole that allowed an eavesdropping website to abuse it by listening to users without consent or any indication) and was replaced by the Web Speech API.
Web Speech API
The Web Speech API is easy to use, and one can create wonderful interactive experiences with it.
On the flip side, it is available only in Chrome, and only as an experimental implementation. But given Chrome’s growing popularity, it would not be out of place to offer it as an additional feature or input method for your web application or blog. The Web Speech API offers a Speech Recognition (Speech To Text or STT) interface as well as a Speech Synthesis (Text To Speech or TTS) interface.
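To give a feel for the two interfaces, here is a minimal sketch rather than a definitive implementation. It assumes Chrome exposes recognition under the prefixed name webkitSpeechRecognition and synthesis as speechSynthesis plus SpeechSynthesisUtterance. The helper names detectSpeechSupport, listenOnce and speak, and the env parameter (a stand-in for window), are my own inventions for illustration and are not part of the API.

```javascript
// Sketch only: `env` is a hypothetical stand-in for the browser's `window`,
// so the helpers can be tried outside a browser as well.

// Report which halves of the Web Speech API the environment exposes.
function detectSpeechSupport(env) {
  // Chrome ships the recognizer under a vendor prefix; check both names.
  const Recognition = env.SpeechRecognition || env.webkitSpeechRecognition || null;
  return {
    Recognition,
    canRecognize: typeof Recognition === 'function',
    canSpeak: Boolean(env.speechSynthesis) &&
      typeof env.SpeechSynthesisUtterance === 'function',
  };
}

// Speech To Text: listen for one phrase and pass the transcript to onText.
// Returns the recognizer, or null when recognition is not available.
function listenOnce(env, onText, onError) {
  const { Recognition, canRecognize } = detectSpeechSupport(env);
  if (!canRecognize) return null;

  const recognition = new Recognition();
  recognition.lang = 'en-US';          // assumed language, change as needed
  recognition.interimResults = false;  // only deliver final results
  recognition.onresult = (event) => {
    // First alternative of the first result is the best guess.
    onText(event.results[0][0].transcript);
  };
  recognition.onerror = (event) => {
    if (onError) onError(event.error);
  };
  recognition.start();                 // Chrome asks for microphone permission here
  return recognition;
}

// Text To Speech: queue the text for speaking. Returns false if unsupported.
function speak(env, text) {
  if (!detectSpeechSupport(env).canSpeak) return false;
  env.speechSynthesis.speak(new env.SpeechSynthesisUtterance(text));
  return true;
}
```

In a page, something like listenOnce(window, function (text) { speak(window, 'You said ' + text); }) would echo back whatever was heard. Because support is detected first, browsers without the API simply skip the feature, which fits the idea of offering voice as an additional input rather than the only one.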
You may also download an easy-to-follow code lab from Google Developers and learn how to add it to your web application.
It is important to note that you will need access to a web server to host your files. The Web Speech API only works for pages served via https:// or http://, not file:///. You may host your web page with Google Drive or Dropbox for quick testing. Also, if you serve it over SSL (https://), the permission is persistent and the user won’t have to grant or deny it every time. That is why it is advisable to host such applications over SSL for a better user experience. If you are wondering where exactly Chrome stores this permission, you can open chrome://settings/contentExceptions#media-stream to view or alter existing permissions.
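The hosting rules above can be turned into a small check that a page runs before showing its microphone button. This is only a sketch of the behaviour as I observed it in Chrome, not something the API itself provides. The function name describeOrigin and the fields it returns are hypothetical.

```javascript
// Sketch only: encodes the hosting rules described above.
// `protocol` is expected in the form returned by window.location.protocol,
// e.g. 'https:' (note the trailing colon).
function describeOrigin(protocol) {
  const servedByWebServer = protocol === 'http:' || protocol === 'https:';
  return {
    // Pages opened straight from disk (file:) cannot use the API.
    speechAvailable: servedByWebServer,
    // Only over SSL is the microphone permission remembered.
    permissionPersists: protocol === 'https:',
  };
}
```

In a page you would call describeOrigin(window.location.protocol) and, for example, hide the voice input when speechAvailable is false or show a hint about repeated permission prompts when permissionPersists is false.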
Web Speech is not 100% accurate, although it worked reasonably well for me during my experiments with it. And it is only going to improve.