Using Smart Speakers for Accessibility

Talk by Chip Edwards

Smart Speakers (like Amazon Alexa and Google Home) are now in every room of the house. And the same Voice Assistant technology is found in smartphones, cars, and now wearables. When someone asks for your content, how can you make sure that they hear it? Let’s look at how to use this new platform to deliver your content to your audience.

View Chip’s Presentation

Transcript

David Vogelpohl: Hello, everyone, and welcome back to WordPress Accessibility Day. My name is David Vogelpohl from WP Engine, and I’m proud to be your host for the session “Using Smart Speakers for Accessibility,” with Chip Edwards. I’m honored to introduce Chip and his session. Chip is a partner at CreateMyVoice.com. He helps content producers engage audiences on smart speakers like Amazon Alexa and Google Home. Chip is a teacher–hopefully he’s a good one today–a speaker, and a technologist who is passionate about connecting WordPress content to the new voice technology.

Chip, welcome, and glad to have you here.

Chip Edwards: Thanks, glad to be here and be able to talk with you all about this session. This is good stuff.

David: Looking forward to it.

Chip: Thanks, David. I appreciate it. Ready for me to share my screen?

David: Yes, sir, ready to go. Let’s get it goin’.

Chip: Okay, let’s see.

David: I love your background, by the way.

Chip: Oh, thank you, heh-heh. Let me share computer sound, ’cause we’re gonna play some audio today. Let’s see. Hopefully that is up and runnin’. Okay, so we all care about accessibility, including making our websites accessible for the visually impaired. Stuff like using header tags properly, makin’ sure that the alt text has meaningful information. These are important so that screen readers can turn our text into audio.

But today I wanna focus on how the adoption of smart speakers is changing how we think about design. If you think about it, smart speakers are kind of like screen readers but more so because the primary interaction with our content is verbal instead of visual. When thinking about voice-first design, our mind-sets shift because now we aren’t talking about doing something special for the visually impaired.

With smart speakers, if your content doesn’t work well for the visually impaired, it doesn’t work well for anybody on the platform. And before we get goin’, I want to acknowledge that smart-speaker platforms are not necessarily platforms for accessibility; however, I wanna show you how smart speakers are causing a significant shift in our UI/UX design thinking. Smart speakers are shifting how we access compute resources. And I’m gonna be using some smart speakers today.

I’m actually recording some of ’em, but it kind of doesn’t matter on your end. So now would be a good time to un-mute your Amazon or your Google device. Okay, there’s three points that I wanna cover in the next few minutes. First, let’s talk about how smart speakers or voice assistants make tasks easier. The goal is to reduce the friction to getting stuff done, to make it easier to find things, to turn things on or off, to get directions, to hear content, in essence, to make our lives more convenient and to make it easier to get computers to do things for us.

Next, I wanna demonstrate what’s currently happening with voice assistants, and then lastly, I wanna give you some bonus material around verbal branding and why you may want to secure your verbal domain name sooner as opposed to later. So let’s get started. Smart speakers come in many shapes and sizes and from a number of manufacturers. We’re most familiar with devices from Amazon and Google.

They’ve got, like, 85% of the smart speakers out there, I think, that they’re producing, but there’s many traditional manufacturers of speakers as well, and they’re now making their speakers smart by adding microphones to them and connecting ’em up to one or more of the digital voice assistants–almost always Amazon’s or Google’s voice assistants, but there are a few others out there. In a recent article on Voicebot.ai, they reported on where people are putting these smart speakers.

While it used to be the living room and the kitchen, now the bedroom is the most popular location at over 45%. The living room’s got about 43%, and the kitchen 41%. The numbers add up to more than 100% because, once people start using these devices, they’re finding them really convenient, and so they’re actually purchasing multiple smart speakers and putting ’em in a number of locations. By the way, now more than 13% are putting ’em in their bathroom as well.

In an article titled “The Future of Voice and the Implications for News,” published for the Reuters Institute, Nic Newman says, “Voice goes way beyond smart speakers to become embedded in every device”, because the tech companies view these voice-assistant devices as a way to support and anticipate user needs. These tech companies want to have their voice assistants everywhere so they can help us anywhere we are. In 2018, “The Wall Street Journal” reported on Amazon’s direction for their smart speakers. “Amazon had more than 10,000 employees working on its Amazon Alexa virtual assistant at the time, and they wanted to expand Alexa in the offices, cars, and even hotel rooms,” and they’ve got more locations that they’re actually working on right now.

Matter of fact, have any of you seen the latest Buick commercial where they’re arguing over whether it’s an SUV or an Alexa? This idea of our voice assistants being with us everywhere is actually starting to materialize. Last year, Sundar Pichai, the CEO of Google, said that he was moving Google “from a company that helps you find answers to a company that helps you get things done.” Subtle shift, but they’re changing that, and as you look at even the voice-search results, you see how they’re actually making some of those shifts.

Well, it didn’t start off this way.

Now the same voice assistant that’s found in your smart speaker is also in your smartphone and in your car, and the latest is in wearables, like glasses frames or earbuds. My favorite place is my earbuds, actually. Now my voice assistant can be with me anywhere: places like the gym, where I like to listen to blogs and podcasts, or the grocery store, where I’m shopping and can just ask, “Hey, what’s on my shopping list?” I’ll be glad when they have these voice assistants in every elevator.

There’s already a computer waiting for somebody to touch the button. Why do I have to touch the button? Why doesn’t the computer just respond when I say, “Third floor, please”? I’m predicting that voice technology will continue to get better, and soon, we’ll start relying on these devices because they make tasks easy, and then, just like our smartphone, we won’t know how we ever lived without the voice assistant. So now I’d like to demonstrate how companies are currently using voice technology.

Most people are familiar with using their voice assistant for things like turning on the lights, adding items to shopping lists, getting the news or the weather, or navigation, like letting Siri or Google find the best route home. But Google and Amazon have made a way for companies to make these voice assistants smarter by building a voice app. Amazon calls these voice apps “skills.” You may have heard that. Google calls these voice apps “actions.” I’ll use the generic term “voice app” through the rest of this.

But instead of me just telling you what companies are doing, I figured it would be better for me to show you. I just mentioned using a voice app to add capabilities to these voice-assistant devices. There’s one new term that you need to know. It’s called an “invocation name,” sometimes referred to as a “verbal domain name.” So just like your website has a domain name, your voice app has an invocation name. The invocation name is how Google and Amazon know that the user wants to talk to your voice app. You’ll want to own your invocation name.
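To make that concrete for developers (this sketch is not from Chip’s talk): in an Alexa skill, the invocation name is declared in the skill’s interaction model JSON. Here’s a minimal, hypothetical model; the LatestPostIntent and its sample utterances are made up for illustration:

```json
{
  "interactionModel": {
    "languageModel": {
      "invocationName": "create my voice",
      "intents": [
        { "name": "AMAZON.StopIntent", "samples": [] },
        {
          "name": "LatestPostIntent",
          "samples": ["read the latest blog post", "what is new"]
        }
      ]
    }
  }
}
```

When a user says, “Alexa, ask Create My Voice to read the latest blog post,” the invocation name routes the request to this skill, and the sample utterances map the rest of the phrase to an intent.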

Well, let’s look at an example of a voice app created by Capital One. Here’s a screenshot from the Capital One website. Notice they’re promoting the Capital One voice app to their customers. I actually got a Capital One credit card just to be able to use their voice app. Now, by using the Capital One voice app, I don’t need to sit at my computer and log into their website. What if I could just ask for what I need and let the computer do all the work for me?

Check this out: Alexa, ask Capital One what my credit card balance is.

Alexa: Your current Quicksilver card balance is $1,227.44. Do you want to know when this bill is due?

Chip: Yes, please.

Alexa: The payment of your Quicksilver card is due March 17, 2020. Do you want to pay now?

Chip: No, thank you.

Alexa: No problem.

Chip: You probably noticed that the Capital One voice app not only answered my question about my credit card balance, but it assumed that I probably wanted to know when my balance was due, so it asked me, and it even asked if I wanted to pay my credit card bill, all just by using my voice.

But what if I’m looking for something specific like recent transactions or how much I spent at a particular merchant? Well, all I have to do is ask. Alexa, ask Capital One what my recent transactions are.

Alexa: Between April 30 and May 30, you have 17 transactions for a total of $535.63. It looks like you also had a credit of $218.49; $72.15 at Harris Teeter on May 27, followed by $19.28 at Circle K on May 27, and $4.88 at Harris Teeter on May 26. Would you like to hear more?

Chip: That’s all.

Alexa: Good talking to you today.

Chip: Okay, or how about this one. Alexa, ask Capital One how much I paid to CVS last month.

Alexa: Between April 1 and April 30, you spent a total of $187.85.

Chip: Okay, now voice technology is still advancing. It’s still in its early phases, but the capabilities are growing exponentially right now. So while the voice app is the presentation layer, obviously the information is coming from the Capital One systems. So I’m not sure what time zone you’re in, but in North Carolina, it’s supper time, so let’s order pizza. Pepperoni, jalapeño, and extra cheese are my favorite toppings.

You might not know this about me, but I live a little bit outside the city, in the country on a little farm. It’s close enough to the city to be able to commute in, but it’s far enough that I’m outside the delivery zone for Domino’s, but that doesn’t stop me from ordering a pizza. I just have to go pick it up. Since I have an Alexa wearable and Alexa auto–and actually, I’ve got Google and stuff like that too, but, anyway, I can actually start driving to Domino’s and say, “Alexa, I’d like to order a pizza from Domino’s.”

Alexa: What type of crust do you want for your pizza? You can choose hand-tossed, gluten-free crust, Brooklyn style, hand-made pan, or crunchy-thin crust.

Chip: Crunchy-thin crust.

Alexa: What size would you like for your crunchy-thin crust pizza? You can choose medium or large.

Chip: Large.

Alexa: I’ve got a large, thin, cheese pizza. What toppings would you like with that?

Chip: Pepperoni and jalapeños.

Alexa: Okay, pepperoni and jalapeño peppers. Would you like any more toppings on that?

Chip: Extra cheese, please.

Alexa: Okay, I’ve added a large, thin pizza with pepperoni, jalapeño peppers, and extra cheese to your order. What would you like next?

Chip: I’d like to check out now.

Alexa: Would you like to complete your order by adding a 16-piece Parmesan Bread Bites for $3.99?

Chip: No, thank you.

Alexa: Your order contains one pizza. The total is $18.24, after taxes. Would you like to review your order details, or are you ready to check out?

Chip: I’m ready to check out.

Alexa: Your order should be ready for pickup in about 12 to 22 minutes. Do you want me to place the order?

Chip: Yes.

Alexa: Okay, I’ve placed your order. Your day just got cheesier.

Chip: Notice how they walk me through the entire process of ordering a pizza. They even tried to up-sell me for the cheesy sticks. Anyway, I think it was actually quicker than pulling out my cell phone and finding the number for Domino’s and talking to a real person. And since it takes me about 15 minutes to get to Domino’s, perfect timing. I can get in my car and head out there, and it was gonna be ready when I got there.

So many of you listen to podcasts, or maybe you’re a podcaster. How easy is it for you to listen to an old podcast episode, or to let people know, “Here’s how to listen to this podcast that I did in the past that’s got the information that you’re looking for”? In this “Audio Branding” podcast by Jodi Krangle, she has a lot of great information, but episodes 35 and 36 are actually her most important episodes, and you’ll see why in just a second. Jodi’s invocation name is “Audio Branding.” What if you could listen to any past episode just by saying something like, “Okay, Google, ask ‘Audio Branding’ to play episode 35”?

Google Assistant: Got it. Getting “Audio Branding.” ♪♪♪

Jodi Krangle: Welcome to “Audio Branding,” the hidden gem of marketing. I’m your host, Jodi Krangle. In this podcast, we’ll discuss just how sound influences our behavior. I generally talk about this in the context of advertising and marketing, but there are other places this is important too. I really feel that it plays a much more important role in our lives than maybe we realize, so let’s delve a little deeper. This is the first part of my interview with Chip Edwards. Whether you like it or not, smart speakers like Amazon Alexa are used by millions of people. The same voice-assistant technology found in smart speakers is also used in smartphones, cars, wearables, and more. Voice assistants are everywhere, and they’re changing how people consume content.

Amazon and Google have made a way for you to register your brand on these voice assistants. My guest today is technologist and teacher Chip Edwards, and he’s spoken across the country about the opportunities available on this new platform and how you can own and manage your verbal brand on these new devices. The future is here, and we’re gonna talk about it. Thanks for joining me today, Chip. I really appreciate it.

Chip: Okay, so now you know why episodes 35 and 36 are her most important episodes. They just happen to be the ones that I was on. So audio is already relatively easy to put onto smart speakers and digital voice assistants, but let’s look at this one here. Many of you actually know Aida Correa. In addition to being a fantastic speaker, she’s got a wonderful blog. So what if you could listen to her blog posts while you were commuting to work? Aida’s voice app connects to her WordPress website and makes her blog posts available. Let’s start her voice app by using her invocation name, “Love Built Life.” Alexa, start “Love Built Life.”

Alexa: Welcome to “Love Built Life.” There is a new blog post available, titled “The Joy of Overcoming: A Decade in Retrospect, Part Two.” Would you like to hear the new blog post?

Chip: Yes.

Alexa: “I didn’t bring you on this journey of the last ten years of my life to give all the gloom and doom. No, sir, because you’ve seen the last few years of this decade that has been a turnaround in my situation and my friend’s. She met a nice young man at work, and they are now engaged to be married, yay.”

Chip: And the blog post goes on–it would’ve read it all for you; I’m just doin’ the first piece of each of these pieces of content. So the point is that she can continue to write blog posts like she likes to do, but now anyone can listen to them just by asking. Now, William Jackson has been a WordCamp speaker and supporter all over the place–I’ve run into him at I don’t know how many different WordCamps. He’s also a prolific blogger with content that everybody should hear. Read, hear? Both, actually. William wanted to add some character to his blog posts, though, so not only does his voice app connect up with his WordPress website, but instead of having the default smart-speaker voice read his blog posts, his sounds different. William’s blog and invocation name is “My Quest to Teach.” So let’s ask for it. Alexa, ask “My Quest to Teach” to play the latest blog post.

Alexa: This is blog post 77, and it’s titled “Using WordPress to Share Your Voice and Address Social Issues.” Brian: “‘Using WordPress to Share Your Voice and Address Social Issues,’ instructor, William Jackson, M.Ed. Intro: there is a growing need for content creators from diverse backgrounds to tell their stories, to share their experiences and to encourage the need and acceptance for diversity in using the ability to speak, write, teach, and deliver transformative content.”

Chip: So he had a different voice, actually, read his blog post, and this is all automatic. Matter of fact, these audio clips were done earlier in the year, and both Google and Amazon just upgraded their text-to-speech synthesis process, so the voices are actually getting significantly better. Anyway, let’s look at one more. So I write a blog as well. I use my blog posts not just to write about voice technology but to demonstrate some of the possibilities. So my invocation name is “Create My Voice,” and what you’re about to hear is me asking my smart speaker to play one of my past blog posts.

And just to prep you, you’re going to hear the voice assistant whisper, use different English-speaking voices, and even use voices from other languages, all from within my blog post. And in a minute, I’m gonna show you, kind of, how some of that is done. And so I’m gonna actually have it read my entire blog post. It takes two or three minutes, somethin’ like that, so we’ve got a few minutes for it. So let’s listen to my blog post get read by a smart speaker. Alexa, ask Create My Voice to read blog post 27.

Alexa: Let’s get started. This blog post is titled, “Bringing Life to Your Content.” Greetings, I’m Alexa, Amazon’s smart speaker voice assistant. With a little help from Create My Voice, I can read your blog posts aloud. Now your audience can listen to your blog posts while making breakfast, working out, or driving to work. It’s really easy. The Create My Voice team connects me up to your blog posts using your RSS feed and a custom voice application. This means that you can keep blogging like you always have and take advantage of the new smart-speaker platform. Your audience will ask for your blog by name, and I’ll read your latest post.

I can read your older posts as well, and I’ve heard this is available on Google devices, including every Android device on the planet. But since I’m an Amazon Alexa, you didn’t hear that from me. In addition to just reading your blog, I can help you bring life to your blog. If I was going to tell you a secret– [whispering] I would whisper it to you. [speaking at normal volume] But even cooler than that, I can change my voice. This might be difficult to describe, so let me demonstrate. I have several friends.

I’ll have them introduce themselves. [speaking in British accent] Brian: My name is Brian. If it’s not clear, I’m from across the pond. [speaking in American accent] Joanna: Hi, my name is Joanna. I’m really glad to meet you here. [speaking in Indian accent] Raveena: Greetings, my name is Raveena. I’m also available to read all or part of your blog posts.

Alexa: I told you it was pretty cool. Being able to use multiple voices in your blog posts can bring your blog posts to life in a world where your audience wants to listen instead of read. In addition to multiple English voices, I can also speak using voices from other languages.

I’ll show you by greeting you in French, Spanish, and German. First, a male French voice. Mathieu: Bonjour. [speaking French]

Alexa: Next, a female Spanish voice. Conchita: [speaking Spanish]

Alexa: And, finally, a male German voice. Hans: [speaking German]

Alexa: Pretty cool, don’t you think? Like the additional English voices, these are available to bloggers to expand how you communicate through your blog posts. This concludes the post titled “Bringing Life to Your Content.” Would you like to hear the next blog post titled “How Does Voice Technology Work, An Introduction”?

Chip: No, that’s all.

Alexa: Thank you.
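For developers following along: the “RSS feed and a custom voice application” setup Alexa just described could be sketched roughly like this. It’s a hypothetical illustration, not Create My Voice’s actual code; it assumes a standard WordPress feed at /feed/ and the third-party feedparser library:

```python
# Sketch: fetch the newest post from a WordPress RSS feed so a voice app
# can read it aloud. Install the feed parser with: pip install feedparser
import feedparser

def latest_post(feed_url: str) -> tuple[str, str]:
    """Return (title, body) of the newest entry in a WordPress RSS feed."""
    feed = feedparser.parse(feed_url)  # WordPress publishes RSS at /feed/
    entry = feed.entries[0]            # entries come newest-first
    return entry.title, entry.summary  # summary carries the post excerpt/body

title, body = latest_post("https://example.com/feed/")
speech = f"There is a new blog post available, titled {title}. {body}"
# A skill handler would return `speech` as the text the assistant speaks.
```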

Chip: So what I wanted to demonstrate there is that with this idea of being able to take our content and make it available, or present our brand–there’s a lot of stuff that could happen in that space. So while you can go to my website and read my blog posts, sometimes it’s more informative or fun to actually listen to them. But you’re probably wondering how that magic happens. Similar to HTML tags, there’s a standard set of tags that provide directions to the text-to-speech process. It’s called SSML, or Speech Synthesis Markup Language.

Let me show you one of the tags. So here’s an excerpt out of one of my blog posts. Looks pretty standard, but let me pull up the source. So you’ll notice there’s a new tag that I’ve embedded into the blog post, a voice tag. There’s that one, and then there’s that one right there. So let’s listen to how Amazon Alexa turns this text into speech.

Alexa: “‘Alexa and the Elderly.’ Comments from participants. This first commenter shares how Alexa opened her world.”

Raveena: “I thought Alexa was just for things like turning on lights, but it opened a whole new world. You can do anything that you want to do.”

Alexa: “And this last comment references how using Alexa helps someone with physical difficulties.”

Brian: “I have a genetic tremor, so entering data is a pain. The ability to speak a command and get something to happen is a wonderful thing.”
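In text form, markup like the excerpt Alexa just read could look something like the sketch below. This is a hedged reconstruction using documented Alexa SSML tags–the voice names are the Amazon Polly voices heard in the demos–not the literal source of Chip’s post:

```xml
<speak>
  Alexa and the Elderly. Comments from participants.
  This first commenter shares how Alexa opened her world.
  <voice name="Raveena">
    I thought Alexa was just for things like turning on lights,
    but it opened a whole new world.
  </voice>
  And this last comment references how using Alexa helps someone
  with physical difficulties.
  <voice name="Brian">
    I have a genetic tremor, so entering data is a pain.
  </voice>
  <!-- The whisper heard earlier uses an Amazon-specific effect tag: -->
  <amazon:effect name="whispered">I would whisper it to you.</amazon:effect>
</speak>
```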

Chip: So notice that my blog post looks like any blog post, or an excerpt out of my blog post here, but when it gets turned into speech, we can bring life to our blog posts, or to any of our written content, as we use these voice assistants to communicate with our audience. So normally, that’s where I would stop, but I’ve got a few more minutes, and so I’ve got some bonus material for you. I’ve used that term “invocation name” a number of times, so I wanna dig into it just a little bit because there are some interesting nuances with this idea of an invocation name.

So with our websites, one of the first things that we think about is choosing a good domain name that represents our brand. Now, in 1995, we could just use our brand name as our domain name, but it’s not quite that simple anymore. Matter of fact, sometimes we pick our domain name first, and then that’s how we decide what our brand name is, but getting a good domain name now is a little bit more challenging.

But when we think about the voice-assistant world, we have to think about sounds, not sights.

So let me explain what I mean. Let’s say that I’m really good at giving my two cents out to everyone I meet–and I am–so, obviously, I would want to build a website and get the domain name of “TwoCents,” right? So let’s say I could get the domain name “TwoCents.” Okay, I’m good. But what about this domain name of “Two-Cents.com”? Or “2Cents.com,” with the number 2? Or “ToCents.com,” spelled T-O instead of T-W-O, or “TooCents.com,” T-O-O?

And that’s not even addressing the idea of, you know, the “dot org,” the “dot net,” “dot ai”–all the other top-level domains out there. So when we think about branding, and we think “I’m building a website, I need a domain name,” we think kind of visually, but when we look at the verbal side of it, how many different ways are there to say “two cents”? Well, there’s only one way of making the sounds of “two cents,” and so far this was just the “Two Cents” with “Cents” spelled C-E-N-T-S. But what about the “two cents” where we use a different form of “cents,” like “sense”? Or the other form, “scents”?

And so, in other words, when we look at this, we have this idea of this collision that happens because sometimes in the visual world, we think that “Here’s my brand,” but when we go to the audio world, sometimes we have these collisions, and so who’s gonna own the voice app that’s connected with the invocation name of “Two Cents”? So how that works is, with Amazon, what they did is they said, “Okay, we’re gonna let multiple voice apps have the same domain name or the same invocation name, same verbal domain name.”

It causes some confusion because now the user has to figure out which version of “Two Cents” they wanted, because there’s multiple of ’em out there. Google went the other way; they did the same thing domain names do. They said the first person that builds that Google action, the voice app in the Google voice-assistant world, is the one that now owns the verbal version of the sounds “two cents.”

So, in that case, it’s kind of like 1995 right now. You can get your invocation name, but the first person that gets it is gonna own it, and they’re gonna be the ones that decide what dialogue happens whenever somebody invokes that invocation name of “Two Cents,” or whatever your brand name is. So that’s one of the things that you wanna think about in the idea of verbal branding: your verbal domain name is somethin’ that you’re gonna wanna look at, I would suggest, sooner as opposed to later. Before, we were usin’ the invocation name of Capital One. Let’s look at that one, even.

So if I’m thinkin’ about Capital One, CapitalOne.com, right, and they probably own that name, but there’s multiple ways of spelling “Capital,” and there’s multiple ways of spelling “One,” and even words that sound like “One” or close enough that you might not be able to–the voice assistant might not be able to pick up the difference between the sounds, and so, when you register for an invocation name, you’re actually registering for the sounds that happen, not the spelling that happens. Okay, so let’s start to finish up with a quote here.

Britt Armour from Clearbridge Mobile wrote that “voice is the future for how brands are gonna interact with their customers.” So let me do a quick recap before we get to questions. Voice-assistant technology means that the visually impaired don’t have to figure out how to navigate a sighted world, because we’re using computers to shift the cognitive load from the user to the computer. This is a transformational mind-set shift, and it’s changing how we think about user-interface designs. We’re changing our UI/UX from “How can we make computers easy to use? How can I put together a good user-interface design so the user can figure out how to find what he wants or how to do what he wants?” to “How can we use computers to make it easy to get stuff done?”

So we’re shifting the cognitive load from the user having to figure out how to make something happen, to the user just saying what he wants; the computer has to figure out what that user’s intention is, or what that user wants to happen, and now the computer performs the action. And so we’re shiftin’ the cognitive load off of the user and onto the computers, and that’s a change in how we actually design systems.

And you’ll see that change not just in voice assistants but in chatbots and stuff like that, where a good chatbot or a good voice assistant is movin’ that cognitive load off of the user, into the compute space, so the computer does more of the work, which is a good thing, but it’s a mental shift for us developers and designers.

And then there’s one final tip that I wanna leave you with: I would suggest that you consider getting your brand’s invocation name sooner as opposed to later, while it’s still relatively easy to get. There’s not a whole lot of voice apps out there right now, but there’s a lot of companies actually workin’ to build their presence on these voice devices.

Okay, so here’s my contact information. I love talkin’ about voice technology. I’m available for strategy sessions and special engagements, and I can build a voice app with your invocation name so that you can start usin’ this new platform as well. That’s what I have for you today; let’s see if there’s any questions.

David: There totally are questions, Chip. Thank you so much. That was awesome. We’ll go ahead and start with the first one. Get my video going there. Hello, everyone. All right, first question: “What practical applications in the accessibility space will we see first in terms of this smart-speaker technology, or are already in place?” So, what practical applications in the accessibility space will we see first in terms of smart-speaker technology?

Chip: So there’s two ways that this can happen. What I was tryin’ to demonstrate is the idea that now people can just ask for what they want, and as the voice apps are built out more, the computers are actually figuring that out and responding to them. So instead of having a screen reader go through a website and try to find different pieces of information, and leaving all that work on the visually impaired to figure out “How can I find what I want, and how can I make things happen?”–very, very difficult. I mean, they’re good at it. I’ve seen some of the visually impaired actually use some of those screen readers, and it’s amazing, actually. But we’re still putting the cognitive load on them. We try to remind people, as they’re building their website interfaces, to do a good job and make it relatively easy for the screen readers to get the information, but we’re still leavin’ the cognitive load on the user, on the visually impaired, to do that work. Voice technology turns all that on its head because now they can just ask for what they want and get it. We’re moving that cognitive load into the compute space.

And it also does the other thing. I was actually at a conference earlier this year, and somebody did a presentation, and she wasn’t able to speak, but she was able to type. So what she did is she put her whole presentation in written form and then used voice technology to turn that written form into audio so that we could all listen to what she was trying to communicate. And with this new technology and the different tools, like SSML, she can actually bring a lot of life into what she’s communicating. So I see this voice technology happening on both sides and really helping those with accessibility needs–it’s now their world that we’re playing in, as opposed to them havin’ to play in the sighted world, or the world of those of us who don’t have those needs.

David: That was really–it struck me–it stood out for me, as you were playing those Alexa examples, how different and more purpose-built the content sounded than what you would hear from a screen reader reading a web page. So my next question is, you know, how should people think about creating their content for voice? Should it be specific? I know you kind of showed the example of someone just, kind of, feeding the blog through, but I’m guessing you’re thinking about the context of the content related to, you know, voice, specifically, when you’re leveraging it. Is that true, or how do you think of that?

Chip: That’s actually a really good point, and actually a really big area to talk about, because there’s two sides to that. One of ’em is “How can I make sure my content is in a form that represents my brand, so that when it gets converted into audio and people listen to it, it represents my brand well? How do I write well? How do I make sure my heading tags are right?” Because I can make something look good on my website that doesn’t turn into audio very well at all.

But the other side of that is, lots of times, on our websites, we think of monolithic text in long form, and we’re expecting somebody–especially somebody using a screen reader–to have to jump through all that to find what they’re looking for. When we think about actually preparing stuff for voice assistants, we now need to think about how to put this content in a capsulized form, so that people can ask for the piece that they want, as opposed to the entire long-form piece, and having to listen to the whole thing to figure out what they want. And so it does change how you actually think about putting your content together, and where you store it and how you store it and stuff like that, so, yes.
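One hypothetical way to picture that “capsulized” form (an illustration, not a Create My Voice feature): index a post by its headings so a voice intent can speak just the section a listener asks for.

```python
# Sketch: split a post's HTML into sections keyed by <h2> heading so a
# voice intent like "tell me about pricing" speaks only that section.
import re

def split_by_headings(html: str) -> dict[str, str]:
    """Map each <h2> heading to the markup that follows it."""
    parts = re.split(r"<h2[^>]*>(.*?)</h2>", html)
    # With one capture group, re.split yields:
    # [preamble, heading1, body1, heading2, body2, ...]
    return {parts[i].strip(): parts[i + 1].strip()
            for i in range(1, len(parts) - 1, 2)}

post = "<h2>Hours</h2><p>Open 9 to 5.</p><h2>Pricing</h2><p>Plans start at $10.</p>"
print(split_by_headings(post)["Pricing"])  # -> <p>Plans start at $10.</p>
```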

David: Awesome. Thank you.

Chip: Good stuff to think about.

David: Yeah, you showed some examples of Alexa reading the content in different accents and with different genders. I know certainly a lot of people here at WordPress Accessibility Day strive to be inclusive of those that represent their brand. How do you think about the choice of voice? Do brands typically choose a voice, or will they vary it based on the author? Or, I don’t know if you’ve given much thought to diversity as it relates to the voice that you choose for your content?

Chip: So that’s a good question, and from a brand perspective, there’s a wide range, because you’ve got the simple approach of “I’m just using the default reader,” whatever that is. And in some cases–I think Google is pretty good at this–the user can choose what voice they want to hear their content in, so that’s a good thing.

But then you think about the ability to use those voice tags to add some differentiation in your content, so there’s some opportunities there. What I tried to do is say, okay, “Here, we’re gonna read back and forth, and we’re gonna use different voices to read content, to add some additional diversity and interest into a blog post.” From a branding perspective, you can even go as far as–I think it was KFC that actually recorded their own voice, Colonel Sanders’s voice, and they have a whole set of voice models specific to their brand. Now they can do the same thing: they can put in their text, and it gets converted into Colonel Sanders’s voice using that voice model, and it can be very brand specific.

So there’s a wide range, but definitely something to think about as you’re looking at diversity: making sure that you’re not just saying, “Hey, I want this US male to read all of my content.” It’s something that you can think about, ’cause there’s multiple US male voices and female voices and other English voices, and then other-language voices as well. I mean, there’s lots of opportunities there to demonstrate diversity in your content.

David: Yeah, I’m sure it gets tricky when you start thinking about the voice representing the author as well.

Chip: Right.

David: Thank you for that. All right, next question: “What types of smart-speaker automations will most benefit the a11y community that you can share with us?” So what is, you know, most exciting for you, for how it will support the a11y community, accessibility?

Chip: So, for me, the biggest excitement in this space is this idea of design-thinking shifts, because a lot of us, when we think of UI/UX, we’re really thinking about “How can I build this for a sighted person to be able to navigate my site well?” And lots of times, it’s a secondary thought of, “Oh, wait a minute. How can I support somebody that’s visually impaired in some way? And how much effort am I willing to put in to accommodate that?” Because my sighted design is really important to me. I like my sliding windows and all that kind of stuff that sometimes gives grief to, you know, those who are visually impaired, and so it’s a balance of “Which do I want more?”

I think the opportunities in the voice space turn that on its head, from the perspective of “I’m designing for all of my audience the same now, because if I have to build this system that nobody is seeing, the sighted part is secondary, and all of it is audio up front. How can I accommodate everybody, not just a sighted person?” Matter of fact, the first time I did one of my talks, somebody came up to me afterwards, and he said, “You just did this fancy new-technology, smart-speaker-world talk, but really, it was a diversity talk in disguise, wasn’t it?” Because we talked about this idea of “How do I present my information in such a way that everybody is gonna get the same experience out of it?”

David: I think that’s a great observation, that paradigm shift towards voice and kind of being assistive in that sense of getting brands on board with accessibility. Next question–

Chip: Now, just one more comment.

David: Oh, yeah, please.

Chip: Now–and I want to say it again–voice technology doesn’t solve all accessibility issues. It does solve some accessibility issues, and it makes us think about how we can deliver our content or engage with our audience in a way that supports accessibility, but it doesn’t solve all of our problems. It’s just a tool, a platform that helps us get out of the mind-set of “I’m supporting this set of people, and now, wait a minute, I should also help these people too. How can I do that without messing up my cool stuff?” It makes it so that, no, you’re building cool stuff that happens to be for everybody, not just for sighted people.

David: That’s a great point of distinction. Thank you so much. Next question is from Amber Hinds. Hey, Amber. I know you from around the way. “How do voice tags impact the experience for users on screen readers like NVDA, JAWS, and VoiceOver? Do they read those tags or just ignore them?”

Chip: So I don’t know the answer to that question. I do know that those voice tags and some of the other SSML tags are a set standard, so it’s not like, “Oh, this is only done for smart speakers.” It’s a set of standard tags, and there’s actually quite a few tags that you can do a lot of different things with. So my assumption is that a good, well-rounded screen reader would use those tags in some way.

I do know that the voice assistants accommodate some or a lot of the SSML tags, to varying degrees. Like, I did the Amazon examples on purpose because Amazon supports almost all of the SSML tags, and so I was able to do the voice switching and stuff like that. Google doesn’t support the voice tag yet–they’re still building out that tag–but they support most of the other SSML tags. So my assumption is yes, they would either ignore them or be able to handle them, but I don’t know enough. I haven’t experimented with them to be able to say for sure.
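For reference, the cross-platform tags Chip alludes to come from the W3C SSML specification. A hedged sample of markup that both Alexa and Google Assistant document support for (exact rendering varies by platform):

```xml
<speak>
  Standard SSML controls pacing and pronunciation on both platforms.
  <break time="500ms"/>
  <prosody rate="slow" pitch="low">This sentence is read slowly and low.</prosody>
  It is spelled <say-as interpret-as="characters">SSML</say-as>.
  <!-- break, prosody, and say-as are core SSML, unlike Amazon's voice tag. -->
</speak>
```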

David: Thank you, thank you; if you know the answer, maybe you can post it in the YouTube chat and–

Chip: There you go. That’d be good.

David: List an answer there. Chip, I think you probably spurred an invocation-name rush, like the domain rushes when they would release the new TLDs and everyone would go register all these domains. I feel like that last slide of yours probably got some people off their rear and, you know, Googling how to figure that out. You mentioned how there’s not a lot of people leveraging invocation names yet, but have you experienced a case where a brand’s invocation name is already claimed, and that kind of fight that we see over the domain-name space?

Chip: So, how do I answer that right? What I can say is that to get an invocation name, you have to build a voice app and have it certified by the respective party–Amazon or Google or Samsung–and that certification process is when they say, “Okay, yes, your voice app is certified, and that invocation name works for your voice app.” And in Google’s case, it’s like, “Okay, you now own that invocation name.”

What I have seen is–matter of fact, I had one of my clients call me up the other day. He’s a podcaster, and when people were asking for his podcast, which is “What’s Your Excuse”–there’s multiple “What’s Your Excuse” podcasts out there–Amazon was choosing somebody else’s “What’s Your Excuse” podcast by default. And so he said, “Hey, what can we do about that?” Well, we can build a voice app, and we can use the invocation name of “What’s Your Excuse,” and so now when people say, “Play ‘What’s Your Excuse,’” Amazon says, “Oh, hey, there’s a voice app out there with the invocation name ‘What’s Your Excuse.’ We’ll use that to decide what content should be delivered.” And so now, when you say, “Play ‘What’s Your Excuse’” on the Amazon devices, his podcast gets played instead of whichever one Amazon decides to play.

So the answer is yes, the invocation name is gonna kind of turn things on their head. Very, very few people know that this invocation name is gonna be a really big deal in the voice-assistant space. And so that’s usually at the end of my talk; that’s one of the big things. It’s like, “Okay, wait a minute, so what do I do? Because I got a brand, and I wanna manage that,” so–

David: It doesn’t sound like SEO is dying, Chip.

Chip: No, no.

David: Feel like SEO is here to stay. Well, this was amazing. I learned so much from this presentation, thank you so much.

Chip: You’re absolutely welcome. Thank you for havin’ me. This was–it was great to talk, and good questions. I appreciate ’em.

David: Yeah, thank you, thank you. I have a little history with voice, so I had wanted to nerd all out on it, but I wish we had more time. Thank you all for attending this session with Chip Edwards and myself.

Additional Resources

WP Accessibility Day has not assessed speaker-provided presentation resources for accessibility.

View presentation slides

Questions on “Using Smart Speakers for Accessibility”

  1. How do the voice tags impact the experience for users on screen readers like NVDA, JAWS, and VoiceOver? Do they read those tags or just ignore them?