I have recently posted about CENode [1] and how it might be used in IoT systems [2].
Since CENode is partially designed to communicate directly with humans (particularly those out and about or "in the field"), it makes sense for inputs and queries to be provided via voice in addition to, or instead of, a text interface. Whilst this has been explored in the browser (including in the previous Philips Hue control demo [3]), it made sense to also try to leverage the Alexa Voice Service to interact with a CENode instance.
=> 3
The Alexa Voice Service [4] and Alexa Skills Kit [5] are great to work with, and it was relatively straightforward to create a skill to communicate with CENode's RESTful API [6].
This short video [7] demonstrates this by using an Amazon Echo to interact with a standard, unmodified CENode instance running on CENode Explorer [8] that is partly pre-loaded with the "space" scenario used in our main CENode demo [9]. The rest of the post discusses the implementation and challenges.
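As background for the implementation, this is roughly what sending an input sentence to a CENode instance over HTTP might look like. Note that the instance URL, endpoint path and payload format below are assumptions made for illustration only, not CENode's documented API:

```typescript
// Rough sketch of sending an input sentence to a CENode instance over HTTP.
// The instance URL, endpoint path and payload format are assumptions made
// for illustration; check the CENode server's documentation for its real API.
const CENODE_URL = 'https://explorer.cenode.io/my-instance'; // hypothetical instance

async function tellCENode(sentence: string): Promise<string> {
  const res = await fetch(`${CENODE_URL}/sentences`, { // assumed endpoint
    method: 'POST',
    headers: { 'Content-Type': 'text/plain' },
    body: sentence,
  });
  return res.text(); // CENode's textual reply (e.g. the answer to a query)
}

// e.g. tellCENode('what is an apple?').then(console.log);
```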
Typical Alexa skills are split into "intents" [10], which describe the individual ways people might interact with the service. For example, the questions "what is the weather like today?" and "is it going to rain today?" may be two intents of a single weather skill.
=> 10
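As a sketch, the language model for such a skill might declare those two intents along these lines (the intent names are made up for illustration):

```typescript
// Hypothetical intents for the weather example above; names are illustrative.
const weatherIntents = [
  { name: 'GetWeatherIntent' },      // "what is the weather like today?"
  { name: 'GetRainForecastIntent' }, // "is it going to rain today?"
];
```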
The skill logic is handled by AWS Lambda [11], which is used to associate each intent with an action. When someone gives a voice command, the Alexa Voice Service (AVS) determines which intent is being called for which service, and then passes control to the appropriate handler in the Lambda function. The function returns a response to the AVS, which is read back out to the user.
=> 11
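A minimal sketch of that dispatch, using the raw Alexa request/response JSON format rather than a particular SDK (the intent names follow the hypothetical weather example above):

```typescript
// Minimal sketch of the skill's Lambda function, using the raw Alexa
// request/response JSON format rather than an SDK. Intent names are
// assumptions carried over from the weather example.
export const handler = async (event: any) => {
  let speech = "Sorry, I didn't understand that.";

  if (event.request?.type === 'IntentRequest') {
    const intentName = event.request.intent.name;
    if (intentName === 'GetWeatherIntent') {
      speech = 'It looks sunny today.';      // placeholder for a real lookup
    } else if (intentName === 'GetRainForecastIntent') {
      speech = 'No rain is forecast today.'; // placeholder for a real lookup
    }
  }

  // The AVS reads the outputSpeech text back out to the user.
  return {
    version: '1.0',
    response: {
      outputSpeech: { type: 'PlainText', text: speech },
      shouldEndSession: true,
    },
  };
};
```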
The strength of Alexa's ability to recognise speech is largely dependent on the information given to build each intent. For example, the intent "what is the weather like in {cityName}?", where cityName is a variable with several different possibilities generated during the build, will accurately recognise speech initiating this intent because the sentence structure is so well defined. A single intent may have several ways of calling it - "what's the weather like in...", "tell me what the weather is in...", "what's the weather forecast for...", etc. - which can be bundled into the model to further improve the accuracy even in noisy environments or when spoken by people with strong accents.
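Expressed as part of a skill's language model, that intent might look something like the following sketch; the slot type and exact schema details are assumptions for illustration:

```typescript
// Hypothetical fragment of the skill's language model: one intent, one slot,
// and several sample utterances that all resolve to the same intent.
const getWeatherIntent = {
  name: 'GetWeatherIntent',
  slots: [{ name: 'cityName', type: 'CITY_NAMES' }], // CITY_NAMES: assumed custom slot type listing cities
  samples: [
    "what's the weather like in {cityName}",
    'tell me what the weather is in {cityName}',
    "what's the weather forecast for {cityName}",
  ],
};
```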
CENode, however, is designed to work with an entire input string, so the voice-to-text accuracy is much lower and determining the intent and its arguments is harder. Since we need CENode to handle the entire input, our demo has only a single intent with two invocation patterns, each of which captures the whole utterance in a single {sentence} slot:
ask Sherlock {sentence}
tell Sherlock {sentence}
Since 'Sherlock' is also the invocation name for the service, both patterns implicitly indicate both the service and the single intent to work with. I used 'Sherlock' as the name for the skill as it's a name we've used before for CENode-related apps, and it is an easy word for Alexa to understand!
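Putting the pieces together, the Lambda handler for that single intent can simply pull the recognised sentence out of the slot, forward it to CENode, and read CENode's reply back to the user. As before, this is a sketch under assumed names (the slot handling and CENode helper are illustrative, not the exact code behind the demo):

```typescript
// Sketch of the Sherlock skill's Lambda handler: hand the whole recognised
// sentence to CENode and speak its reply. Names are assumptions.
export const handler = async (event: any) => {
  const sentence: string =
    event.request?.intent?.slots?.sentence?.value ?? '';

  // tellCENode is the hypothetical HTTP helper sketched earlier in the post.
  const reply = sentence ? await tellCENode(sentence) : '';

  return {
    version: '1.0',
    response: {
      outputSpeech: {
        type: 'PlainText',
        text: reply || "Sorry, Sherlock didn't catch that.",
      },
      shouldEndSession: true,
    },
  };
};
```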