
Preston So is an expert in both omnichannel strategy and voice design, as well as a number of other digital business and design practices.
As communications channels proliferate and the variety of digital devices grows, we need strategies to give our customers and users a consistent experience, no matter where they are or how they are consuming our content.
Preston weaves together elements of omnichannel strategy, voice usability, and other modern digital practices into an “immersive content strategy” that can help you craft content programs that address these new challenges.
We talked about:
- his new book, Voice Content and Usability, his product work at Oracle, and Decoupled Days, an event he organizes
- “immersive content strategy” – a way to deal with both channel explosion and the need for a central content repository to execute your omnichannel strategy
- the proliferation of devices and the implications for omnichannel strategy
- the importance of providing a consistent content experience across all devices and channels
- a pragmatic approach to single-sourcing content that arose in a project he did with the US state of Georgia
- the difference in mental models between content that is presented on a website vs. content that is delivered via a voice interface
- the implications of omnichannel delivery for your information architecture
- the benefits to analytics, benchmarking, and metrics of sourcing all of your content from a single CMS
- the use of “dialogue traversal testing” (DTT) in conversational design
- one of the huge differences between web and voice navigation: the lack of menus in voice interfaces
- the importance of a comprehensive omnichannel content audit that evaluates all of the possible contexts in which your content may be presented, and that also considers both the discoverability and the legibility of the content in each
- the generic navigational benefits of voice interfaces over web interfaces
- the opportunity that voice interfaces give us to return to more natural human communications methods
- the importance of “letting our users see themselves in the voice interfaces that we build and the content that we deliver to them”
Preston’s bio
Preston So (he/him) is a product architect and strategist, digital experience futurist, innovation lead, designer/developer advocate, three-time SXSW speaker, and author of Voice Content and Usability (A Book Apart, 2021), Gatsby: The Definitive Guide (O’Reilly, 2021), and Decoupled Drupal in Practice (Apress, 2018). He has been a programmer since 1999, a web developer and designer since 2001, a creative professional since 2004, a CMS architect since 2007, and a voice designer since 2016.
A product leader at Oracle, Preston has led product, design, engineering, and innovation teams since 2015 at Acquia, Time Inc., and Gatsby. Preston is an editor at A List Apart, a columnist at CMSWire, and a contributor to Smashing Magazine and has delivered keynotes around the world in three languages. He is based in New York City, where he can often be found immersing himself in languages that are endangered or underserved.
Follow Preston online
- Preston.So
- email: preston dot so atsign oracle dot com
Video
Here’s the video version of our conversation:
Podcast intro transcript
This is the Content Strategy Insights podcast, episode number 105. Our customers and users need content in many different settings, and they consume it on a constantly growing number of devices. Omnichannel strategy is the new business method for dealing with this growth of communications channels and content-consumption modes. Preston So is an expert in both omnichannel strategy and in voice interaction design, one of the new user experience practices that has arisen to address these content strategy challenges.
Interview transcript
Larry:
Hi, everyone. Welcome to episode number 105 of the Content Strategy Insights podcast. I’m really happy today to have with us Preston So. Preston is the Senior Director of Product Strategy at Oracle, a big software company you may have heard of. And, perhaps more germane to this conversation today, he’s also the author of the new book, Voice Content and Usability, from A Book Apart. So welcome, Preston. Tell the folks a little bit more about what you’re up to these days.
Preston:
Hey, Larry, thanks so much for having me on the show today. It’s a real pleasure to be here. As you mentioned, on June 22nd my new book, Voice Content and Usability, my third book, actually just came out. It’s the first ever book on voice content, specifically voice content strategy and content for voice experiences, but it’s also A Book Apart’s first ever book on voice as well. So it’s a really exciting book for us here in the content strategy world and also in the voice world. It’s really a fusion of two very distinct realms that have been a little bit separate from each other for a long time. But just to be clear, I don’t do just voice. I don’t just do content. I do a lot of other things too. I work at Oracle as a product leader for our content management system over at Oracle. I also run an event that I know you were part of, Larry, Decoupled Days, which is the only nonprofit headless CMS, omnichannel CMS conference in the world. So I get up to a lot of stuff, and I’m really excited to talk through some of these ideas around voice and content and immersive experiences today.
Larry:
Cool. And I’m equally excited, if not more so. But so the thing, all this stuff you were just talking about, a lot of that lumps under what we would now call omnichannel strategy. Well, it’s not just websites anymore. It’s not just mobile apps. We’ve got game consoles and voice interactions and chatbots everywhere, and that has implications for content strategy. And then how we organize the whole scheme. And one of the things you’ve talked about, I read a post of yours about “immersive content strategy.” Can you talk a little bit about that?
Preston:
Yeah, I think one of the real paradoxes of content strategy today, and especially when you think about the ways in which digital experiences are shifting, the ways in which these experiences are moving off of these screen-bound, window-bound experiences, is the fact that we have these two opposing forces that are pulling in a form of tug of war on each other. And the first is that we have to increasingly serve content and information and media to a vast variety of different devices nowadays. That could be an augmented reality headset or a wearable. It could be a virtual reality headset, could be digital signage, could be a voice interface, like a smart speaker, could also be a smart home system, could be a Samsung TV, could be your mobile phone, your smartwatch. So a lot of the challenges that we face today are rooted in what I call the channel explosion, which has really been becoming a big concern of a lot of marketing and content organizations over the past several years, because of the notion that you can’t just rely on the web as your sole conduit for delivering information and delivering content anymore.
Preston:
But then you think about this diametrically opposed idea of omnichannel content. And I was just at OmnichannelX, which I know you were at as well, a conference that is really focused around content strategy and content design, not just for the web, but also beyond the web. And I think one of the big concerns that a lot of us have is how do we avoid some of the silos that could be intrinsic to and inherent in the delineation of content as being voice-ready, or being immersive-ready, or being chatbot-ready, or being web-ready, when, frankly, a lot of us still haven’t gotten our heads around the notion of a really solid web content strategy yet? And also given the fact that a lot of our content today is managed in a way that really privileges the web over other media?
And also it’s something that really should be managed in a single place. I think a lot of us talk today about omnichannel content strategy, immersive content strategy, voice content strategy. That’s all fundamentally the notion of how do we not only manage and store and schematize content within cohesive content models and within a single repository of content, but also deliver that content in differentiated ways that don’t privilege one particular channel over another? So these two opposing forces, where you’ve got multiple channels, channels that are proliferating, unprecedented new devices that are out there, plus the notion of keeping all that content in one place so it actually stays up to date and manageable and maintainable. Those two things have really come to a head, I think, over the last few years.
Preston:
And my article, Immersive Content Strategy on A List Apart really toys with some of these notions of, “Well, you’ve got to think about some of the environments your content will end up in, but that shouldn’t necessarily jeopardize your shared content strategy, that really should be looking beyond the web, in addition to serving your websites as your first-class citizen as well.”
Larry:
Right. One of the things you talked about in that article was, how did you put it? Single channel versus cross channel, I guess, and how to cope with that. And that gets at, I think, a third dynamic in what you were just talking about, besides the channels. There’s also the notion of departmental silos in organizations that you need to span. Do you have any success stories about how you unite all this stuff in a well-executed omnichannel strategy?
Preston:
Sure. Well, first and foremost, I’ll answer this in two parts. The first part is that that’s absolutely true. And I think one of the things that’s really clear about seeing immersive content, and seeing the ways in which content manifests in augmented reality or virtual reality and also in digital signage and some of these physical or locational experiences, is that a lot of us are no longer using a single device anymore in our day-to-day interactions with things. And I actually write about this in my book, Voice Content and Usability. I mention that there’s a very clear contingent of folks out there who will whip out their phone and whip out their iPad at the same time that they’re having a conversation with Alexa or with a Google Home device. By the same token, somebody who’s walking through a subway station, looking at content that’s being delivered through digital signage, might also have their phone open to the website of that subway system or that transit authority.
Preston:
So one of the really pressing concerns, I think, for content strategists, and especially for some of these inter-departmental considerations, especially when you have these different experiences managed by different teams, is how do you make sure all the content stays up to date and stays together, so that somebody who’s looking at your website at the same time they’re interacting with your Alexa device or interacting with their digital signage isn’t caught up in this really nightmarish situation where you have two different versions of content that are telling you two different things? That could have a really, really big impact on the ways in which our users and our consumers are able to actually trust that information. The case study that I’ll mention, though, is something I talk about at length in my book; it really undergirds the entire book, Voice Content and Usability. It’s Ask GeorgiaGov. And Ask GeorgiaGov was the first ever voice interface, conversational interface, actually, for residents of the state of Georgia.
Preston:
And it was also among the very first ever content-driven voice interfaces. One of the things that was very rare in 2016 and 2017, when we worked on this project, was the notion of having a voice interface that could do more than just perform transactions, one that could actually deliver information in the form of content. What’s really interesting, though, is that sometimes the limitations that projects have will illustrate some of the ways in which best practices should actually be formulated around these ideas. Digital Services Georgia is a great team and a wonderful organization. They do a lot of amazing work to provide compelling, important, and essential information to the residents of the state of Georgia, especially right now during the COVID-19 pandemic. But when we went to them, they really had a lot of limitations. As we know, for those of us who are in the United States, a lot of state and local governments are very cash strapped. They’re very limited in terms of their budgets.
Preston:
And they told us, at the very beginning, they said, “We can only manage one version of our content. So whatever content we want to deliver, whatever information we want to deliver through the voice interface that we’re going to build on Alexa, must be managed in the exact same place. It has to be one and the same as the content that’s managed for our website.” And this posed a really vexing challenge for us, a really interesting dilemma, because how do you actually consider the foibles and the nuances of how voice content works versus how web content works, when these two realms of content are very, very different from each other? Sometimes you have content that’s really been primarily authored for a website. And there’s also the notion that these have to be managed in a single place.
Preston:
So what we did is we looked at, “Well, how can we flatten out some of these issues, flatten out some of these problem areas that we see that could jeopardize the content, whether it’s being consumed on a web interface or a voice device?” And where we started out was looking at, okay, certain things like links that have calls to action in them, and certain assumptions about the page, like navigation bars or site maps, are really things that you can’t assume people understand from an aural or sonic perspective within a voice interface. So we really tamped down a lot of those problem areas that surfaced. And this is why, for example, at Oracle, we talk a lot about pageless user experiences now. And I think, especially nowadays, given the fact that a lot of organizations are still rooted in this notion of pages as being the main atomic unit for content, given our background in the web and in print over the last few millennia, as a matter of fact, we’re really not ready for this notion of decoupling content from the visual strictures in which it is situated.
And I think that’s one of the ways in which voice content, the Ask GeorgiaGov case study and the immersive content really challenged these preconceived notions. And there’s a lot more about this in my book as well.
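[Editor’s note: the “flattening” Preston describes, stripping web-only artifacts out of single-sourced content before a voice interface reads it aloud, can be sketched in code. This is a hypothetical illustration, not the actual Ask GeorgiaGov pipeline; the function name and the call-to-action rewording are assumptions.]

```python
import re

def flatten_for_voice(html: str) -> str:
    """Strip web-only artifacts from a content item so the same
    single-sourced text can be read aloud by a voice interface.
    A rough sketch: a real pipeline would work on structured CMS
    fields, not raw HTML."""
    # Keep the link text but drop the hyperlink itself: "click" has
    # no aural meaning.
    text = re.sub(r"<a\b[^>]*>(.*?)</a>", r"\1", html)
    # Drop any remaining markup (paragraphs, emphasis, etc.).
    text = re.sub(r"<[^>]+>", " ", text)
    # Reword the most common visual calls to action (assumed phrasing).
    text = re.sub(r"\bclick (here|below|the link)\b", "ask for more detail",
                  text, flags=re.IGNORECASE)
    # Normalize whitespace left behind by removed tags.
    return re.sub(r"\s+", " ", text).strip()

print(flatten_for_voice(
    '<p>Renew online. <a href="/renew">Click here</a> to begin.</p>'
))
```

The same cleaned text can then serve both channels, which is the single-repository constraint the Georgia team imposed.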
Larry:
And the way you just said that, it’s like, “I like our odds for employment for the next 10 or 20 years,” because you just set out a lot of work there. You made a super strong case for why omnichannel is so crucial, but also why it’s so hard to do. And I love the georgia.gov example, because typically you just think, “Oh, we just throw a lot of resources at it and that’s how you fix it,” but in that case it was a scrappy, pragmatic project that you just had to figure out. Any big take-home lessons from that project? I’m thinking in particular about information architecture. What did you have to do on the back end to accommodate the information architecture: breadcrumb trails and navigation on the web versus how it works in voice and other conversational interfaces?
Preston:
So the biggest thing to remember about conversational interfaces, about voice interfaces especially, and I think this is especially true of pure voice interfaces, those interfaces that have only a voice component and no visual component, is that you can’t rely on space. You can’t rely on the visual medium. You have to really rely on crafting a mental model of your interface and your content that will work for that user. And it’s a really different world in voice interfaces. One way to think about this is to think about a website: you’ve got all these different hubs and spokes, you’ve got a site map, it’s a very networked architecture. You can click on this link, it takes you over there, but then that’s part of a category of pages, that’s part of this single page.
Preston:
So one of the biggest issues that we have with websites is that websites are fundamentally rooted in the notion of benefiting from visual context. You’ve got nav bars, you’ve got breadcrumbs, like Hansel and Gretel, you’ve got site maps, you’ve got all these things that a voice user doesn’t have the luxury of having. And I think one of the best ways to consider this is that most voice interfaces, not just the ones that we see today, but also those that were created back in the early ’90s and the late ’80s, during the era of IVR systems, or interactive voice response systems, those phone hotline systems that speak in canned statements, operate along a very linearized and unidirectional information architecture. What that means is that you have a single entry point, which is of course the most important aspect of a voice interface, and which doesn’t necessarily reflect how websites work, because on the web you can come in from anywhere.
Preston:
You also have to make sure that the user understands exactly where you’re taking them step by step, because there is no visual way to have a train, for example, with different stops on it that says, “Hey, you’re on this page number one, this page number two.” So one of the things that I think is very important first and foremost is that the georgia.gov team was already very much ahead of their time, because they understood the value of structured content and the importance of semantically structured content for omnichannel use cases that went beyond the web. So they were already looking ahead to some of these conversational experiences. They were already looking ahead to some of the ways that they could serve content beyond their website and focusing on a very clear hierarchy, a very clear way to manage your content. They were using the Drupal content management system, for example. That’s a very important focus for every organization to take. It’s no longer okay to just have your content in this flat-file or flat-page mechanism that doesn’t really give it a lot more richness in terms of that understandable structure for other users.
Preston:
Because one of the things that we learned from this project is that the way progressive disclosure takes place on a frequently asked questions (FAQ) page, for example, really mirrors the way that a lot of these revelations of content happen during a conversation with a voice interface. What that means is, hey, if I ask the broadest question on an FAQ page, that can naturally lead me to ask more specific questions. So an FAQ page, or a conversational cadence in a page, is already a very, very good way to start to bring that structure into voice interfaces. Now, there’s one big benefit that I was going to mention before that’s really key to the experience the Digital Services Georgia team had. And that is that because we managed all of our structured content within a single omnichannel content strategy, a single content management system that served as the overarching repository for all the content, one of the big benefits was actually not just in the user experience. It was also in the analytics and the benchmarking and the measurements that the team could do.
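[Editor’s note: the FAQ-to-conversation mapping Preston describes can be modeled as a small tree, where a broad question discloses narrower follow-ups in a fixed, linear order. A minimal sketch; the questions, answers, and node structure here are invented examples, not Ask GeorgiaGov content.]

```python
from dataclasses import dataclass, field

@dataclass
class FaqNode:
    """One FAQ item: a question, its answer, and more specific follow-ups."""
    question: str
    answer: str
    followups: list = field(default_factory=list)

# Hypothetical content, broad to specific, as on an FAQ page.
faq = FaqNode(
    "How do I renew my driver's license?",
    "You can renew online, by mail, or in person.",
    followups=[
        FaqNode("Can I renew online?", "Yes, in many cases."),
        FaqNode("What does renewal cost?", "Fees vary by license type."),
    ],
)

def traversal(node: FaqNode) -> list:
    """Linearize the tree broad-to-specific: the unidirectional order
    in which a voice interface would walk a caller through it."""
    order = [node.question]
    for f in node.followups:
        order.extend(traversal(f))
    return order

for q in traversal(faq):
    print(q)
```

The point of the sketch is that the progressive-disclosure structure already present in a well-structured FAQ page is exactly the linear flow a voice interface needs.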
Preston:
Because we had all the content structured in one place, and because both the voice interface and the website were conducting searches, or conducting requests, across the entirety of the content, we could actually put up a single, shared dashboard with side-by-side, parallel sections of the page: here’s how your pages are performing on the website, and here’s how your content items are performing on the voice interface. That allowed the team to make really educated decisions about, “Hey, well, this page is performing better on the web than it is on voice. Maybe there’s a reason why voice users can’t find it.” Or, “Maybe there’s a reason why voice users are asking these questions as opposed to the questions that others might ask on the website.”
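[Editor’s note: the side-by-side comparison Preston describes can be sketched as a per-item metrics table that flags content underperforming on one channel. This is a hypothetical illustration of the idea, not the actual Ask GeorgiaGov reporting stack; the item names, numbers, and threshold are invented.]

```python
# Per-channel metrics for the same single-sourced content items
# (invented example data).
metrics = {
    "renew-license": {"web_views": 1200, "voice_requests": 340},
    "vehicle-tax":   {"web_views": 800,  "voice_requests": 15},
}

def flag_gaps(metrics: dict, threshold: float = 0.05) -> list:
    """Return content items whose voice traffic is disproportionately
    low relative to web traffic: candidates for a discoverability
    problem on the voice channel."""
    return [
        item for item, m in metrics.items()
        if m["voice_requests"] / m["web_views"] < threshold
    ]

print(flag_gaps(metrics))  # → ['vehicle-tax']
```

Because both channels draw from one repository, the comparison is apples to apples: the same content item, measured per channel.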
Preston:
And this was really illustrated through the fact that the demographics were very different. Voice interfaces, like Amazon Alexa, for example, target a very different population. One of the things that I think is very clear over the past few years that we’ve seen is that a lot of elderly Georgians, a lot of disabled Georgians, actually found using a voice interface to be a much more pleasant experience than either navigating using a screen reader on the website or having to use technology that potentially might be very challenging for an elderly Georgian to use. So a lot of these decisions that we made really were rooted in not just the user experience and accessibility, but also how to really foment and furnish this omnichannel content strategy that would enable long-term maintenance for, of course, the stakeholders in the room who are also important, those who are going to be actually managing this content.
Larry:
I love that you’re comparing the performance of the content in different channels. Erica Jorgensen from Microsoft spoke at our content strategy meetup in Seattle a couple months ago, and she was talking about this looming need to measure content independent of its channel. So I’m wondering if there’s both that, sort of at the usability and accessibility level, evaluating the end use of it and how it’s performing in different channels. But do you also evaluate, for example, task accomplishment as just a generic thing for each hunk of content?
Preston:
Yeah. The way that we did that was to look at the search and to make sure that we had a successful way to really allow for content to be navigated to on the voice interface. A stand-in for that was a successful search: returned results from a query within the content management system. But there definitely are various layers to this. And I think if you look at chapter five of my book, Voice Content and Usability, I talk at length about the various dimensions and the various layers at which you should be looking at and evaluating this sort of content, both from the holistic perspective that Erica Jorgensen mentions and also on a per-channel basis. There are various ways to measure this.
Preston:
The first is obviously within the voice interface itself. You can use Alexa, you can program in certain ways of having tracking, for example, making sure that errors are tracked, that errors are recovered from, having an understanding of all those things. That’s a very important facet, I think, an aspect of building voice interfaces at large. We did two rounds of usability tests, actually, for our voice interface before we even launched it, which allowed us to really hone in on some of those areas that were potentially pigeonholing users into a particular pattern of utterances or not really making sense structure-wise for those users. There’s also, in the voice interface world and conversational interface world, a form of testing called dialogue traversal testing, or DTT.
Preston:
And that’s a very important kind of testing to do, because one of the things that’s nearly impossible for us as web users to understand sometimes is that when you go into a conversation with a voice interface, you’re really opening the first page of a choose-your-own-adventure book. And you have to navigate all of those different page hops, all those different pointers across all of the different pages of the book, to understand how your dialogues are performing and how your interface is performing across the entirety of the experience. So those two things. But then also, if you’re actually consuming content, and this is where things get very interesting, if you’re consuming content from a content management system, and you’re doing this in a headless fashion, which is generally the case when you’re using a voice interface with a CMS, you have the challenge of, “Okay, now I also want to track things like: what sorts of content are being returned? What sorts of content are actually returning errors, from the CMS perspective? How is the structure of the content actually revealing itself to the voice interface?”
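[Editor’s note: dialogue traversal testing, as Preston describes it, amounts to walking every branch of the dialogue, every path through the “choose-your-own-adventure book.” Here is a minimal sketch of that idea as exhaustive path enumeration over a dialogue graph; the graph, state names, and representation are assumptions for illustration, not a real Alexa testing API.]

```python
def all_paths(flows: dict, state: str, path=None) -> list:
    """Enumerate every complete dialogue path from `state`.
    DTT in miniature: each returned path is one conversation a
    tester (or test harness) must traverse end to end."""
    path = (path or []) + [state]
    nexts = flows.get(state, [])
    if not nexts:
        return [path]  # a leaf state: one complete dialogue
    paths = []
    for n in nexts:
        paths.extend(all_paths(flows, n, path))
    return paths

# A hypothetical dialogue graph for a state-services assistant.
flows = {
    "welcome": ["ask_topic"],
    "ask_topic": ["licenses", "taxes"],
    "licenses": ["renewal", "replacement"],
    "taxes": [],
    "renewal": [],
    "replacement": [],
}

for p in all_paths(flows, "welcome"):
    print(" -> ".join(p))
```

In practice the graph is far larger and includes error and repair states, which is why exhaustive traversal is worth automating rather than testing by hand.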
Preston:
And a lot of those things are as yet very much in their infancy, because a lot of CMSs simply aren’t equipped with the tools to do that sort of voice-interface-style tracking, or that sort of granular tracking of how their content manifests in all these different scenarios. So I think, as you said earlier, Larry, it’s a very good time to be in this industry. It very much is, because of the amount of work that still remains, not just for voice interfaces, but also for CMSs as we move into this omnichannel, but not so channel-agnostic, world: how to really deal with a lot of these issues that come about in performance tracking, this notion of analytics, error handling. That’s just the beginning of a very large Pandora’s box that we’re opening.
Larry:
You’ve said a couple of things I want to just run by you. Are you familiar with Are Halland’s core model? It’s a content strategy tool that a lot of people use. And you don’t even need to be, but basically the way the tool works is you have this place, he originally talked about web pages, but I think it could work for any of these modes. It’s this destination where you land: the page, the screen, the voice interaction, whatever it is. And the point is, how do you get there? And where can you go from there? A couple of things you said make me realize that that’s a little different situation: we all know how to navigate the web, but we’re still learning the voice stuff. Have you thought about that, the difference between arrivals and departures from your digital destinations?
Preston:
Yeah. As I mentioned earlier, the information architecture for a voice interface, by necessity of its aural rather than visual nature, has to be rooted in linear flows, these guided, unidirectional flows. And when we think about, for example, flow journeys on a website, it’s very understandable that you’re going to have 50 different decisions you could make from a single page. You can click on any of these links. You can go to any of these forms in the sidebar, you can go to this place over here. And one of the things that I think really makes that clear is that if you click on the site map link, for example, suddenly you’re thrust into a world where you’ve got 50 different places you can go. With a voice interface, you can’t really do that. And a lot of ink has been spilled over the years about the challenges of actually making sure that you can create menus that are navigable in a voice interface.
Preston:
The whole notion of spoken menus is a bit of an oxymoron, because who wants to sit there waiting for hours on end while somebody recites something like, for example in an IVR system, “press one for reservations” all the way out to “press 99 to speak with a customer agent”? Nobody actually wants to sit through that sort of an utterance. So the challenge that is unique to voice interfaces, and to a lot of these zero user interfaces, user interfaces that don’t have a physical or visual component whatsoever, is how do we actually give people these options without getting into choice fatigue, which is a much bigger consideration, a much bigger concern, a much bigger issue on voice interfaces? Because you can’t simply present… For example, on the web, we know that it’s five to seven links in the top navigation bar; otherwise it becomes very overwhelming for the web user.
Preston:
Well, on a voice interface, you really can’t even get up to that number. So a lot of this really suggests that, okay, the content strategy, the content, might be able to stay the same, and the content should stay the same, but how we actually navigate that content is very much up for debate. Because you can’t just forklift your nav bar, your breadcrumbs, or your site map into an aurally rooted voice interface and expect that to make a lot of sense. And this is why I talk a lot, in chapter two of my book, Voice Content and Usability, about the notion of performing an omnichannel or a voice content audit that not only gets at the root of the content itself, internally, within the context of the content, but also externally looks at all of the different touchpoints that that content item or that category of content has, and asks, okay, how is this going to work when we have a very limited means of providing options and menus for these users who need to be able to get to their content?
Preston:
And this is where I think a lot of the differences between voice interfaces and visual interfaces really come into the limelight. And one of those is obviously that, just as websites can have many more options than voice interfaces, voice interfaces, by pure necessity of the way that they work, have to allow for many different kinds of interactions to be able to take place. And actually the number of interactions, or the number of steps it takes to get to a certain location, oftentimes can be very, very long. And so one of the things that I think is very important when you’re looking at an omnichannel content strategy, as I say in chapter two, is that you’re not just thinking about the legibility of that content internally, but also the discoverability of that content externally.
Larry:
So you have thought about that. Thank you, obviously. Yeah. A couple of things you just said reminded me of a thing that I read of yours. You’ve talked a couple of times about crummy old IVR experiences, like “Choose one for this, two for this, five for that.” Another famous usability and accessibility horror story is screen readers for blind users or sight-impaired folks. And often some of those stories have a happy ending, in the sense that by addressing those concerns, like the crummy IVR experience or the bad screen reader experience, everyone benefits. Are there any stories like that coming out of this world?
Preston:
I think the really great thing about screen readers is that obviously they provide a significantly better experience than IVR systems did back then, and they’re what many people have had the ability to use in the past decades that we’ve really been focused on accessibility. The problem, though, I think, is that in many ways voice interfaces today have, not necessarily outpaced screen readers, but certainly provided a new dimension of interaction that really differs from them. Chris Maury, who’s a blind technologist, a blind voice interface designer, and an accessibility expert, writes on this in Wired magazine, asking, “Why is it that screen readers are designed with the foundation of a visual medium? A screen reader cannot exist if it doesn’t have a visually written webpage to interact with.” The problem, of course, with these visual web pages is that you’ve got these pronouncements and these really unwieldy forms of navigation: “skip to main content,” announcements of all the media, announcements of all the links.
Preston:
And this can become a very cognitively exhausting experience for blind users in the web world. Chris Maury argues, and he defends it very well, that voice interfaces can actually spirit users much more quickly to their destination, without having to navigate through the visually rooted structure of a screen reader, by instead using something that’s actually designed for the purpose of an aural and verbal interaction: the voice interface. So there’s definitely a lot of interesting work happening in this regard. Now, I do want to say that there are certain limitations and stipulations. The first, of course, is that voice interfaces are not the be-all and end-all solution for accessibility. They are just one facet in a milieu of multimodal accessibility. And voice interfaces, especially pure voice interfaces, are not accessible to deaf or deaf-blind individuals, who need to be able to access content as well.
Preston:
So a lot of this is, of course, couched in the notion that, okay, screen readers provided this one facet, this one access point, this one dimension of a compelling experience for consuming content. They’re also just one part of the solution, in the same way that refreshable braille displays are just one part of the overarching solution that we must constantly be providing for disabled folks. Now, for screen readers, I think it’s really important to recognize that it really depends on the kind of content you’re trying to consume. Voice interfaces might be more efficient, or they might not be, because the kinds of content you’re delivering through a voice interface might be only a subset, or potentially a superset, of the content you’re delivering on a website.
Preston:
Nonetheless, and you’ve seen this in my writing as well, I do think that voice interfaces could be a very interesting partner to and complement of the screen readers we use on a daily basis. And, of course, if you aren’t using screen readers to test your own content and your own websites, you should be, starting today. I think voice interfaces really do give us a very interesting look at what this modality can offer. Of course, when it gets into the realm of immersive content and some of the other channels or dimensions for content, things get a little bit more complicated. But for certain types of content, certainly, I think you do have a lot more options today.
Larry:
That’s a nice little bow on this conversation actually, because it circles right back to the whole omnichannel nature of it and that any specific example down at the usability or accessibility level just reminds you of the whole context. But hey, Preston, we’re running out of time. But before we wrap up, I want to give you one last chance. Is there anything that’s just on your mind about voice or interaction design or voice usability, anything like that, or just anything that we haven’t wrapped up in the conversation that you want to make sure we get to?
Preston:
Sure. I think one of the things we’ve touched on quite a bit over the course of this conversation, but haven’t really talked about philosophically speaking, is the fact that voice interfaces are really an example of our humanity extending into user experiences and these artificial devices in ways that it never has before. When we think about how we use our keyboards, or our computer mice, or our video game controllers, or our VR headsets, these are all artificial, manufactured behaviors. These are things that humans millennia ago did not use. If you take a keyboard or a computer mouse back to the ancient Romans, they’re not going to understand anything about what it is you’re using. But if you go back and speak classical Latin, or in some places Attic Greek, or other languages that were very much part of the Roman Empire, people will understand you and you can have a conversation.
Preston:
And this is one of those interesting areas where the tables have turned a little bit. Because back in the early decades of computing, it was really about these devices inducting us into very unnatural behaviors: waving around a mouse, typing on a keyboard, getting those words printed up there. But now it’s the machines that have to be talking to us on our own terms. That comes with really great advantages, but also disadvantages. And one of the big challenges I talk about in chapter six of my book is that because these conversational interfaces, especially voice interfaces, are construed, for better or worse, by our brains as human, and personified that way, not only in the marketing we do around voice interfaces but also in how we interact with them, that raises some very interesting questions. We don’t usually personify a website or attach a human identity to it, unless, of course, it’s Ask Jeeves or something like that.
Preston:
But when it comes to a voice interface, we necessarily picture somebody in our minds. We hear somebody talking. We know that, neurologically speaking, that mimics very, very closely the experience of talking to a human being. So there’s not just the whole debate about how natural these devices can sound. There’s also the very interesting debate of how that reflects back on us, our humanity, and our inclusion or our equity within the world of user experience. One illustration of this: if you look at Amazon Alexa, Google Home, Apple Siri, or Microsoft Cortana right now, when you talk to these interfaces, who’s the kind of person you’re picturing in your mind? It’s, generally speaking, a cisgender, heterosexual, white woman who speaks with a general American dialect or a middle American dialect. But there’s not much space for, or seemingly much addressing of, the fact that across our world today we have so many dialects, so many examples of code switching, so many examples of languages that are underrepresented and marginalized, that people simply cannot use with their voice interfaces.
Preston:
And so, just as we see right now a lot of the debates revolving around algorithmic racism and automated oppression, especially with regard to disinformation and the credibility of social media platforms, one of the things that really worries me is this: just as those forces have fostered monocultures and echo chambers that actually serve to worsen the overarching experience for our users, how do we avoid that same outcome in voice interfaces? That starts with a lot of the things we do as designers and content strategists, but also from the very top, when it comes to building these speech synthesizers and how we actually get equitable user experiences in voice. Because that’s really the root of why we got into content strategy and design in the first place: letting our users see themselves in the voice interfaces that we build and the content that we deliver to them.
Larry:
And, boy, like I was asking before, I wonder if there are any global lessons we can draw from this. Returning to a truly human-centered design practice, that seems like a good outcome. So, well, thanks so much. Oh, one last thing, Preston. What’s the best way for folks to stay in touch with you if they want to follow you on social media or just connect?
Preston:
As I mentioned earlier, my book, Voice Content and Usability, is out right now at abookapart.com. It was actually on sale until yesterday, I believe, but you can now get it in ebook or print format. I’m very happy with a lot of the feedback I’ve received so far from many different countries. The way you can reach me is to email me at preston.so@oracle.com, or use the contact form at my website, preston.so, where I also do a lot of writing. I also write for A List Apart and Smashing Magazine, and I’m a columnist for CMSWire. If you want to learn more about what I’m up to and get in touch, you can also find me on Twitter @prestonso and on LinkedIn at Preston So. And, of course, find out more about my new book coming out later this year, Gatsby: The Definitive Guide, as well as the sequel to Voice Content and Usability, Immersive Content and Usability, coming out with A Book Apart next year.
Larry:
Do you ever sleep? Anyhow, I’ll put all those links in the show notes as well. Well, thanks so much, Preston. Really enjoyed the conversation.
Preston:
Thanks for having me, Larry.