Alexa Skill Directives and Managing Different Responses

Written by Sanjay Rohila | Nov 14, 2018 8:00:00 AM

While there's a lot of documentation around how to create a custom Alexa skill, let's talk a bit about the implementation side of it. We'll focus on how to manage Alexa skill directives, the different response formats, and the code itself.

Amazon has a range of devices that support skills: Echo, Echo Dot, Echo Show, Fire TV, etc. But they do not all support the same kinds of response templates. So let's dive a little deeper into the different types:

Echo (Voice only)

The bare minimum response we can send to the Echo is:

{
  "version": "1.0",
  "response": {
    "outputSpeech": {
      "type": "PlainText",
      "text": "Hey there!"
    }
  }
}

This is what the device will receive and play back as a voice response to the user. The immediate next step we can take is to add more control over the voice response by using SSML. We can have SSML in outputSpeech, which allows us to add emphasis, control the rate and pitch, or add a pause between lines.

{
  "version": "1.0",
  "response": {
    "outputSpeech": {
      "type": "SSML",
      "ssml": "<speak>Hi, I am <emphasis level='strong'>Virtual Assistant.</emphasis></speak>"
    }
  }
}

For more information, you can also read in detail about SSML and the markup tags it supports.
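For instance, here's a quick sketch combining two of those tags, break (to insert a pause) and prosody (to adjust rate and pitch); the spoken text itself is just illustrative:

{
  "version": "1.0",
  "response": {
    "outputSpeech": {
      "type": "SSML",
      "ssml": "<speak>Your order is placed. <break time='500ms'/> <prosody rate='slow' pitch='low'>Thank you!</prosody></speak>"
    }
  }
}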

Like Lex, Alexa also has a dialog state, but it comes in a different format: we have to use Alexa skill directives for it. Directives have a 'type' property, which plays a role similar to dialogState in Lex: elicit slot, confirm, delegate, etc. Below is an elicit-slot response from Alexa:

{
  "version": "1.0",
  "response": {
    "outputSpeech": {
      "type": "PlainText",
      "text": "What kind of pizza would you like?"
    },
    "directives": [{
      "type": "Dialog.ElicitSlot",
      "slotToElicit": "pizza_type",
      "updatedIntent": {
        "name": "order_pizza",
        "confirmationStatus": "NONE",
        "slots": {
          "pizza_type": {
            "name": "pizza_type",
            "confirmationStatus": "NONE"
          }
        }
      }
    }]
  }
}

If the intents, slots, and response content appear identical across the two, we should have a centralized place to manage the content, which can then be served to both Lex and Alexa.
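For comparison, here is a sketch of what the same elicit-slot step looks like in the Lex (V1) Lambda response format, reusing the same order_pizza intent and pizza_type slot from above:

{
  "dialogAction": {
    "type": "ElicitSlot",
    "intentName": "order_pizza",
    "slotToElicit": "pizza_type",
    "slots": {
      "pizza_type": null
    },
    "message": {
      "contentType": "PlainText",
      "content": "What kind of pizza would you like?"
    }
  }
}

The prompt content is identical; only the envelope around it differs, which is what makes a shared content store practical.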

Echo Show (Voice and Screen)

Echo Show has a display screen in addition to voice assistance. However, it comes with a lot of limitations. As it is a display device, it has a 'Display.RenderTemplate' directive, which is awesome but has very few usable pre-defined templates.

Since we cannot send an elicit-slot directive together with the 'Display.RenderTemplate' directive, we can't elicit the next slot on the Echo Show. Hence we have to provide all inputs as intents. Below is an example of a basic Echo Show response:

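A minimal sketch, assuming the BodyTemplate1 template from the Display interface (the token, title, and text values here are illustrative):

{
  "version": "1.0",
  "response": {
    "outputSpeech": {
      "type": "PlainText",
      "text": "Here are our pizzas."
    },
    "directives": [{
      "type": "Display.RenderTemplate",
      "template": {
        "type": "BodyTemplate1",
        "token": "pizza_menu",
        "title": "Our Pizzas",
        "textContent": {
          "primaryText": {
            "type": "RichText",
            "text": "We have <b>Margherita</b> and <b>Pepperoni</b>."
          }
        }
      }
    }]
  }
}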

This will show the title and body content on the screen, along with the voice response. Text content can be rich text with limited markup (bold, italics, line breaks, and the like).

Srijan is now an AWS Advanced Consulting Partner. Drop us a line if you want to get in touch!