Easiest way to add speech recognition to your web site

Following my last article on Chrome’s Web Speech APIs, I stumbled across a much easier way to add speech recognition to a web application or site. “annyang” is a JavaScript library for adding voice commands to your site. It uses only the Web Speech APIs, but adds a nice thin layer on top that makes them much more usable. Not to mention, it is super easy to get started.

A tiny JavaScript Speech Recognition library that lets your users control your site with voice commands.

annyang!

annyang is developed by Tal Ater, the same fella who reported how a bug in Chrome could be exploited to listen to users’ conversations even after they had left the site. Incidentally, he discovered the bug while he was working on annyang itself.

annyang adds just 2 KB to your site, yet it makes the Web Speech APIs extremely usable. The basic premise is that developers define voice commands, along with the action to take when a user says each command on their site or application. Simply include the annyang.js library and start adding commands. The sample below shows all the code you need to show a “Hello World!” alert to a user who says “hello” on your site. Awesome, isn’t it?

[sourcecode language="javascript"]
<!-- Include annyang from a CDN service or from your own server. -->
<script src="//cdnjs.cloudflare.com/ajax/libs/annyang/1.4.0/annyang.min.js"></script>

<script>
// annyang is null when the browser does not support speech recognition
if (annyang) {

  // Define your commands: the phrase to listen for and the function to run
  var commands = {
    'hello': function() { alert('Hello World!!!'); }
  };

  // Use the addCommands API to add commands to annyang
  annyang.addCommands(commands);

  // Start listening. You can call this right here, or later on some event such as a button click
  annyang.start();

}
</script>
[/sourcecode]
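The comment in the sample mentions starting recognition later, on an event such as a button click. Here is a minimal sketch of that, assuming a button already exists on the page; the helper name `startOnClick` and the button id `start-listening` are hypothetical and not part of annyang. Only `annyang.start()`, already used above, is called on the library.

[sourcecode language="javascript"]
// Hypothetical helper (not part of annyang): start listening only after
// the user clicks a button, instead of immediately on page load.
// `button` is a DOM element; `speech` is the annyang object, which is
// null when the browser does not support speech recognition.
function startOnClick(button, speech) {
  if (!speech) {
    // No SpeechRecognition support: hide the button and do nothing else.
    button.hidden = true;
    return false;
  }
  // { once: true } removes the listener after the first click,
  // so start() is not called again on later clicks.
  button.addEventListener('click', function () {
    speech.start();
  }, { once: true });
  return true;
}
[/sourcecode]

In the page, after adding your commands, you would call something like `startOnClick(document.getElementById('start-listening'), annyang);`. Deferring the start this way means the microphone prompt appears only when the user asks for voice input.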

Apart from single-word commands like the one above, annyang supports more complex commands with named variables, splats or optional words.

A named variable, written with a leading colon, captures a single word and can appear anywhere in the command text, e.g. ‘show :month articles’.

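[sourcecode language="javascript"]
<script>
// Define the handler first: with 'var', showArticles would still be
// undefined when the commands object below is created.
var showArticles = function(month) {
  // Code to open articles for the spoken 'month'
};

var commands = {
  // saying 'show December articles' will call 'showArticles' with 'December' as a parameter
  'show :month articles': showArticles
};

if (annyang) {
  annyang.addCommands(commands);
}
</script>
[/sourcecode]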

A splat, written with a leading *, captures everything spoken from that point on (possibly several words) and passes it to your function.

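[sourcecode language="javascript"]
<script>
// Define the handler first: with 'var', performSearch would still be
// undefined when the commands object below is created.
var performSearch = function(term) {
  // Code to perform a search for the spoken 'term'
};

var commands = {
  // saying 'search chrome bookmarks' will call 'performSearch' with 'chrome bookmarks' as a parameter
  'search *term': performSearch
};

if (annyang) {
  annyang.addCommands(commands);
}
</script>
[/sourcecode]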

You may also include optional words in your command; just wrap them in parentheses and you’re good to go. For example, the command ‘show (open) :month articles’ will respond to both ‘show December articles’ and ‘show open December articles’.
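To make the optional-word behaviour concrete, the sketch below registers that command with annyang and adds a hand-written regular expression that approximates how such a command is matched. The regex is an illustration written for this post, not annyang’s internal code, and `showArticles` and `matchMonth` are placeholder names.

[sourcecode language="javascript"]
// Placeholder handler: `month` is the word spoken in place of :month.
var showArticles = function (month) {
  return 'Showing articles for ' + month;
};

var commands = {
  // "(open)" is optional: "show December articles" and
  // "show open December articles" both call showArticles('December').
  'show (open) :month articles': showArticles
};

// Register the command only where annyang loaded and is supported.
if (typeof annyang !== 'undefined' && annyang) {
  annyang.addCommands(commands);
}

// Illustrative approximation (not annyang's own code): the optional
// word may be absent, and :month captures exactly one word.
var approxPattern = /^show\s*(?:open)?\s*([^\s]+) articles$/i;

function matchMonth(phrase) {
  var match = approxPattern.exec(phrase);
  return match ? match[1] : null;
}
[/sourcecode]

With this approximation, `matchMonth('show December articles')` and `matchMonth('show open December articles')` both return 'December', while `matchMonth('open December articles')` returns null: an optional word may be left out, but it does not replace the word ‘show’.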

The library works in browsers that support SpeechRecognition (only Chrome at the time of writing) and leaves all other users unaffected.

Do check out this nice video of a to-do application built with AngularJS and annyang.

Quick Tip


If you are developing locally over HTTP and are bothered by Chrome repeatedly asking for permission to use the microphone, consider switching to HTTPS. Not only does HTTPS return speech results faster, it also improves the user experience and adds security. For testing, you can easily and securely expose your localhost to the internet using awesome tools like ngrok.

Add Speech Recognition and Synthesis to your web apps with Web Speech API

Recently, for a hackathon, I was toying with the idea of implementing a voice assistant along the lines of Siri, Google Now or Cortana for one of our popular web applications. The basic building blocks needed were speech recognition and speech synthesis, i.e. engines to convert speech to text and text to speech.

I knew that Google Chrome had introduced an HTML attribute called x-webkit-speech in 2011. The attribute worked with the input tag, adding a microphone icon to the field as shown below. It was soon deprecated because of flexibility and control issues (along with the widely reported security loophole that allowed websites to eavesdrop on users without consent or any indication) and replaced by the Web Speech APIs.

x-webkit-speech in action
Web Speech API

The Web Speech APIs are easy to use, and you can create wonderful interactive experiences with them.


On the flip side, the API is available only as an experimental implementation in Chrome. But given Chrome’s growing popularity, it would not be out of place to offer it as an additional feature or input method for your web application or blog. The Web Speech API offers a Speech Recognition (Speech To Text, or STT) interface as well as a Speech Synthesis (Text To Speech, or TTS) interface.
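Since support is limited to Chrome, where recognition is exposed with a `webkit` prefix, it is worth feature-detecting both interfaces before offering voice input or output. Below is a minimal sketch, assuming a browser-like global object is passed in (`window` in a page); the helper names `detectWebSpeech` and `speak` and the `en-US` language are assumptions for illustration. `speechSynthesis.speak()` and `SpeechSynthesisUtterance` are the standard synthesis interfaces.

[sourcecode language="javascript"]
// Hypothetical helper: report which Web Speech interfaces exist on `scope`.
// Chrome exposes recognition as webkitSpeechRecognition, so both the
// unprefixed and the prefixed names are checked.
function detectWebSpeech(scope) {
  var Recognition = scope.SpeechRecognition || scope.webkitSpeechRecognition;
  return {
    recognition: typeof Recognition === 'function',
    synthesis: 'speechSynthesis' in scope &&
      typeof scope.SpeechSynthesisUtterance === 'function'
  };
}

// Hypothetical helper: speak `text` aloud (Text To Speech).
// Returns false when speech synthesis is unavailable.
function speak(scope, text) {
  if (!detectWebSpeech(scope).synthesis) {
    return false;
  }
  var utterance = new scope.SpeechSynthesisUtterance(text);
  utterance.lang = 'en-US'; // assumed language; change as needed
  utterance.rate = 1;       // 1 is the normal speaking rate
  scope.speechSynthesis.speak(utterance);
  return true;
}
[/sourcecode]

In a page you might call `if (detectWebSpeech(window).recognition) { /* show a microphone button */ }` and `speak(window, 'Hello World!');`. Passing the global object in, rather than reading `window` directly, keeps the helpers easy to test outside the browser.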

The Web Speech APIs were clearly designed to eliminate the drawbacks of the earlier approach by allowing complete control and flexibility via JavaScript. Apart from the permission to use the microphone, which is (rightly) handled by the browser, the developer has full control. Below is a slide from a Google Developers presentation that highlights the features offered. I recommend going through the complete presentation for details and exploring further with the examples and downloads provided.
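As an illustration of that control, here is a sketch that configures a recognizer and handles its results and errors in JavaScript. The `lang`, `continuous` and `interimResults` properties and the `onresult`/`onerror` handlers belong to the SpeechRecognition interface; the helper name `createRecognizer` and the chosen settings are assumptions for this example.

[sourcecode language="javascript"]
// Hypothetical helper: build a configured SpeechRecognition object, or
// return null when the browser does not support it. `scope` is `window`
// in a page; `onTranscript(text, confidence)` receives each final result.
function createRecognizer(scope, onTranscript) {
  var Recognition = scope.SpeechRecognition || scope.webkitSpeechRecognition;
  if (!Recognition) {
    return null;
  }
  var recognition = new Recognition();
  recognition.lang = 'en-US';         // language to recognize (assumed)
  recognition.continuous = false;     // stop after a single phrase
  recognition.interimResults = false; // deliver only final results

  recognition.onresult = function (event) {
    // event.resultIndex is the first result that changed in this event.
    for (var i = event.resultIndex; i < event.results.length; i++) {
      var best = event.results[i][0]; // most likely alternative
      onTranscript(best.transcript, best.confidence);
    }
  };

  recognition.onerror = function (event) {
    // e.g. 'not-allowed' when the user denies microphone access
    console.error('Speech recognition error: ' + event.error);
  };

  return recognition;
}
[/sourcecode]

A page would then do something like `var r = createRecognizer(window, function (text) { console.log(text); }); if (r) { r.start(); }`. Calling `start()` is what triggers the browser’s microphone permission prompt.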

Slide from Google Developers Presentation

You can also download an easy-to-follow code lab from Google Developers and learn how to add the API to your web application.

Note that you will need access to a web server to host your files: the Web Speech API only works for pages served over https:// or http://, not file:///. For quick testing, you can host your web page on Google Drive or Dropbox. Also, if you serve the page over SSL (https://), the microphone permission persists, so users won’t have to grant or deny it every time. That is why it is advisable to host such applications over SSL for a better user experience. If you are wondering where exactly Chrome stores this permission, open chrome://settings/contentExceptions#media-stream to view or change existing permissions.
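During development, a small guard along these lines can flag a page opened from file:/// or served over plain HTTP. It simply encodes the hosting behaviour described above for Chrome at the time of writing; the helper name `checkSpeechHosting` and its return codes are hypothetical.

[sourcecode language="javascript"]
// Hypothetical helper: classify a protocol string (location.protocol in a
// browser) according to the hosting notes above.
function checkSpeechHosting(protocol) {
  if (protocol === 'https:') {
    return 'persistent-permission'; // microphone permission is remembered
  }
  if (protocol === 'http:') {
    return 'prompt-each-time';      // works, but asks for permission again
  }
  return 'unsupported';             // e.g. 'file:' pages
}

// In a page, during development:
// if (checkSpeechHosting(location.protocol) === 'unsupported') {
//   console.warn('Serve this page over http:// or https:// to use Web Speech.');
// }
[/sourcecode]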

Web Speech is not 100% accurate, although it worked reasonably well for me during my experiments. And it is only going to get better.