
Phillip Hunter has been designing digital conversations for 20 years. He started with touch-tone phone applications. More recently, he has worked on Alexa and other voice-based technologies.
Phillip and I talked about:
- how fun it is to “talk about talking”
- his current work at [24]7
- his early work developing touch-tone applications back in the 90s
- his evolution from developer to product designer and product manager
- the history of voice interfaces, from trying to mimic the human mind to the current ability to augment human intelligence with massive computing power
- the evolution of voice technology from simple recognition and basic synthesis to the convergence of processor power that enabled desktop computers to do speech recognition
- the origins of voice business applications in the customer-service space
- how cloud computing enabled the mass adoption of voice technologies
- the rise of machine learning and its impact on voice
- how we are now at a point where “machines are about as performant as humans” in terms of speech recognition but still pretty far behind human communication capabilities
- “the average four-year-old is still learning much faster than any of our machine learning processors”
- how the combination of voice and cloud technologies and broadband connections that connect them enable systems like Amazon’s Alexa service
- the importance of the emergence of language: “once particular sounds started being associated with one or more particular meanings, then we had a tool that enabled us to do everything that we’ve done in humanity up to this point”
- how much of voice tech is still human augmented – there are still checkpoints where a human will evaluate a voice interaction and determine whether the translation or transcription is right or wrong
- “just-in-time content”
- the differences between a stable-context, variable-data situation, like reporting a bank balance, and a more dynamic situation in which a more robust content strategy is needed
- the hypothesis that “the machines are teaching us to talk like them”
- how “part of what we don’t have yet in this technology is a deep appreciation and understanding for what conversation really is” – and current approaches to bridging that gap
- conversational structures and tools
- the role of signals like gestures and facial expressions in human communication
- the importance of being able to predict the track of a conversation and check in on its course and how that’s the next step in machine conversations
- an analogy between tech-based conversations and adjusting conversational expectations based on a child’s age
- the difficulties of creating truly personalized conversational content
- “language as an affordance” as a concept – and its implications for folks who want to get into this field
- the propensity of experience designers to “use language as a system-centered form filling vehicle” and ensuing limitations
- the importance, for UX designers who want to create conversational experiences, of really studying language
- some books that he recommends
- his new conference launching in October, Points Made
Phillip’s Bio
A technology product design leader for 20 years, Phillip Hunter works at the edges where new experiences are created. A veteran of small and large companies, he is passionate about what people need and want, how teams work, how to understand and influence complex systems, and the little details of product design that make big differences.
Phillip loves living, creating, and working in Seattle. He is an active member of local and international design communities. He also fancies himself a songwriter and photographer, and a contagion of groan-inducing puns.
Video
Here’s the video version of our conversation:
Podcast Intro Transcript
We spend a lot of time talking to our computers and other gadgets nowadays. Siri, Alexa, and Cortana are starting to feel like a part of everyone’s family. The rise of these voice interactions has created a whole new field that content strategists and other interaction designers need to understand. In this episode, I talk with Phillip Hunter. Phillip has been creating these conversational interactions for 20 years. We talk a bit about the history of voice technology. And then we dive deep into the details of designing and implementing modern voice systems.
Interview Transcript
Larry:
Hi, everyone. Welcome to episode number 48 of the Content Strategy Insights podcast.
Larry:
I’m really happy today to have with us Phillip Hunter. Phillip is currently a Senior Principal at [24]7, which is one of those companies that kind of helps people with customer experience across the board, right?
Phillip:
That’s right. Yeah.
Larry:
Yeah. Before that, you were at AWS, and Alexa, at Amazon. And before that, you were with… After that, you were with Pulse Labs-
Phillip:
Pulse Labs, yeah, a startup.
Larry:
Right, yeah. Well, so tell me a little bit more about how does one… So we’re in the midst of this crazy new conversational era with Alexa and Siri all over the place. And you’re one of the people who built it. So tell me a little bit about your background, how you got here –
Phillip:
Sure, yeah. Sure. So, first of all, thanks a lot for having me. It’s a lot of fun to participate in this. It’s a lot of fun to talk about where things are and talk about talking as an interface, as a next generation of where computing is going, and user-interface and user-experience, and there’s a lot of exciting things to think about. There’s a lot of misconceptions out there, too, so we can touch on that.
Phillip:
I’m at [24]7, which is in the customer experience contact center space, primarily building systems that help people with customer service inquiries, whether they want to call over the phone, or they use a chat bot, or they connect directly to a human through a phone or through a chat interface online, and it’s a field I’ve been in and out of over my career.
Phillip:
I started as a developer, building touch-tone applications, the button-pressing ones, way back in the early 90s. My career trajectory into both UX and design, and voice, happened around that time, in the late 90s. I discovered that I actually really hate developing. I understand it, and I appreciate how much of a skill it is to be a great developer, but I really don’t like doing that part, and a lot of it is because I tend to think about the problem being solved at a higher level, not the immediate code problem or the technology problem. I think about the human problem.
Phillip:
Now that I’m at [24]7, I get to focus on both how the company is embracing and moving into this conversational age, and so how design gets done in that realm, but I also work at a more strategic level around different programs that we have going on to help customers, help clients, get the most out of our services.
Phillip:
So I spend a lot of time with data, spend a lot of time with technical teams, understanding how we make certain things happen for the customer experience, and over the past few years I’ve been moving into more product-management-type activities as well. So I tend to put several hats on when I go to work in the morning, which I enjoy.
Phillip:
I was at a startup, Pulse Labs, like you mentioned before, as VP of Product. One of the things I enjoy about startups, and it’s my fourth time doing one, is just the ability to work on what’s needed that day, whether it’s product, or solving a personnel problem, or designing the future, or whatever. So it’s a lot of fun to be able to move between things.
Phillip:
So, history. That’s kind of where I’m at. How did we get here? I don’t want to get too esoteric or philosophical, but one of the quests for computing has always been to mimic the human brain to some extent. So even the calculation machines and the engines and, going all the way back to the very beginnings where computers were more mechanical than digital, you see people reaching for, “How do we mimic the human brain, and how do we mimic the things we can do as humans?” And then, of course, we quickly learned that we could do that at certain levels, and then we could do it more powerfully. So now our computers do amazing things that mentally, intellectually, we are capable of creating and making it happen, but we could never do that processing on our own. None of us could, we can’t archive a billion objects in our head and find them in milliseconds, or find one in particular in milliseconds.
Phillip:
So, voice has been very much a part of that for decades and decades now. The research on recognizing voice and synthesizing voice goes back to the 50s and 60s, to Bell Labs and other research facilities around the world. For a long time, like many other parts of computing, this was a very intensive task that took large computers, at the time, to process. So you had a room full of processors that were just trying to decode a few seconds’ worth of speech, and things like that. And it wasn’t until several things started coming together in the late 80s and 90s: processor power was increasingly coming in a smaller package, storage was being condensed as well, so you had this whole miniaturization effect going on through the 80s and 90s. At some point, there was a convergence where the amount of voice processing that could happen on a relatively small, PC-sized computer was acceptable enough to do commercial applications.
Phillip:
That’s when it finally moved out of the lab, and out of theory into practice. In the mid-90s, you had several startups that were focused on this, and you had some products, like Dragon Systems’ NaturallySpeaking, where your computer could do speech recognition for you. But that was a very different thing than a widespread multi-user commercial application.
Phillip:
In the mid-90s you started seeing these startups taking advantage of this convergence, which is the startup story. Whenever multiple things collide, or intersect, all of a sudden you have new opportunity. So you had several startups who said, “Hey, let’s build processing packages that’ll run on a PC, and then we can connect other PC enterprise-grade applications to that,” which started, obviously in hindsight, in the customer service space, where there were already billions of phone calls a year going into companies, into your bank, your airline, your insurance company, whatever. And they wanted to be able to automate some of those, which was done poorly in touch-tone, and so we started to move over to speech recognition.
Phillip:
Now, it became pretty ubiquitous, and the technology has actually stayed at a fairly level plateau since the late 90s, in terms of what it’s capable of. There have been moderate gains, but we haven’t yet hit the next real inflection point.
Larry:
That’s interesting though, because that’s about when the internet comes along. So does that then become the mechanism by which things throw fuel on that fire?
Phillip:
Yeah, yeah. So to go back to the theme of intersecting technological advances, the internet comes along and one of the first things it started getting used for was software as a service, what we now call the cloud. You are able to use an application, but the application is being served somewhere else. You might have an interface to it right on your desktop, but you’re not installing anything. You’re using it from somewhere else. So one big thing started happening there.
Phillip:
AWS was born out of that in 2007. It was happening already, but AWS really made a business out of it. The other thing that started happening was, because applications were being centralized as a service, now you had the capability to collect massive amounts of data in one location. For speech recognition that’s really crucial, it’s something we call a non-deterministic technology, which means at any given moment the processing is taking a best guess about what’s happening. If you’re familiar with machine learning, that’s essentially what machine learning is doing, too. It’s a very, very sophisticated pattern matching system.
Phillip:
The speech recognition technology is essentially trying to match patterns in the audio signal that it detects against past patterns it has stored and amalgamated over time. So machine learning, then, as it developed, became a second very powerful thing here as well, because the more data that we can collect and throw at things like speech recognition, the better we can get at it.
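To make that “best guess” idea concrete, here is a toy Python sketch (not Phillip’s system, and with entirely invented candidate transcriptions and scores) of a recognizer choosing among competing hypotheses by combining an acoustic score and a language score:

```python
# Toy illustration of non-deterministic "best guess" recognition: each
# candidate transcription carries an acoustic score (how well it matches the
# detected audio pattern) and a language score (how plausible the word
# sequence is). The recognizer simply returns the highest combined score.
# All candidates and numbers below are made up for illustration.

def pick_best_hypothesis(candidates):
    """Return the candidate transcription with the highest combined score."""
    return max(candidates, key=lambda c: c["acoustic_score"] + c["language_score"])

if __name__ == "__main__":
    hypotheses = [
        {"text": "what is my balance", "acoustic_score": -12.4, "language_score": -3.1},
        {"text": "what is my ballast", "acoustic_score": -11.9, "language_score": -7.8},
        {"text": "watt is my balance", "acoustic_score": -13.2, "language_score": -9.0},
    ]
    print(pick_best_hypothesis(hypotheses)["text"])  # -> what is my balance
```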
Phillip:
There’s still a noticeable gap between a speech processor and a human. People might chuckle at that, when they hear me say that. The interesting thing is, it’s-
Larry:
But you know it when you see it. When a machine makes some obvious machine-level-
Phillip:
Right. Oh, right. Well, that’s what I’m saying. The chuckle might be because some people will say, “No, there’s a huge gap, Phillip. What are you talking about? There’s a noticeable gap.” You’re not wrong if you think that, but the interesting thing is … So there’s a lot of terminology that gets thrown around, but from a pure accuracy standpoint, machines are about as performant as humans. Meaning, there are just a few percentage points that distinguish the machine and us. Now, the thing is, though, there’s so much more than pure recognition that goes into being able to process language.
Larry:
Right, that’s what I want to get at. How do you… You’ve got something that can pretty much hear the same as you, but is it listening as well?
Phillip:
Right, yeah.
Larry:
And can it create, craft a conversation on the fly? That’s kind of …
Phillip:
Right, and we’re nowhere near some of those things, excuse me. That’s the reality: the level of sophistication that we have with speech recognition now is astonishing compared to when I started 20 years ago, but it doesn’t mean it’s anywhere near human capabilities. So there are various characterizations of it you might hear, like a four-year-old who has memorized the encyclopedia, or a four-year-old who knows how to use the internet.
Phillip:
So what’s being gotten at there is that four-year-olds have a very limited amount of domain knowledge; they’re still in rapid learning mode. I would say, though, that the average four-year-old is still learning much faster than any of our machine learning processors. And there are various reasons for that, some commercial, some technological. But where we are today is a function of the internet coming along, and high-speed bandwidth making it possible to take high-quality recordings out of an Alexa, or any other speech recognition device, pass them up to the processors in the cloud very quickly, do that processing, and ship it right back down.
Phillip:
Where things have gotten a lot better is processing speed and coverage of basic things, cross-context, multi-context. When I was first doing this, we were deploying applications with a very narrow context, and if you said anything outside that context it would either bring about a false positive, where it would think you said something that you didn’t say because it was trying really hard to make a match, or it would just reject it and say, “I don’t understand you.” You can still see these things happen with Alexa and Google Assistant; you can say things to them that are so particular to you, or so particular to your context, that they’re just not going to get it.
Phillip:
But at the same time, you can ask about everything from what movies are out right now to facts and figures that you might find on Wikipedia. There’s a fairly broad range of coverage, but it’s the putting it all together.
Larry:
Yeah that’s what I’m curious… Because I get that it gets the vocabulary, and it’s like this really precocious four year old learning like crazy, and putting it all together. But going from that to a coherent kind of adult-level conversation, what’s the secret sauce, this magic that’s happening there? Because we do it all the time.
Phillip:
As you said, as we were talking a little bit earlier, sharing language with each other is our first sophisticated interface. Now, of course our ancestors were drawing figures and gesturing, and making noises but once language, once particular sounds started being associated with one or more particular meanings, then we had a tool that enabled us to do everything that we’ve done in humanity up to this point. Right?
Phillip:
There is nothing that has gotten done without language. If we had no language, we’d still be communicating by bopping each other on the head or grunting, or things like that. Which is still a form of language but it’s not the sophistication that we need to accomplish what we’ve done as people. And so where we’re trying to get to in the technology is to start to do that at a very basic level, and a lot of that is very manual work right now.
Larry:
I’ve heard a lot, many conversations and things I’ve read, about what percentage of these virtual, or these online interactions are actually AI machine learning driven things that just the machine is doing, versus human augmented stuff. It’s a pretty high percentage of still being humanly augmented, or?
Phillip:
Definitely augmented. Now that doesn’t mean that a human is actively listening in and doing something during the course of your particular interaction but there are a number of pieces.
Phillip:
So, for example, in the news there’s been, “Alexa is listening”, and “Amazon has your recordings”. Well, that’s always been true, and it has to have been true for us to do this work at all. We have to take the audio signals and just actually do a verification. It’s a very simple verification. Someone listens to the audio, they write down the words that are in it, and then we match that against what the machine thought those words were. So you do a text on text matching, and you basically form a judgment about was it right or was it wrong, and then what’s the delta between those?
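Here is a minimal Python sketch of the “text on text” comparison Phillip describes: the human transcription is treated as the reference, the machine output as the hypothesis, and the delta is reported as a word error rate. The phrases are invented, and this is not any vendor’s actual tooling, just the standard word-level edit distance:

```python
# Compare a human transcription (reference) against the machine's output
# (hypothesis) and report the delta as a word error rate, using word-level
# Levenshtein distance. Illustrative only.

def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

if __name__ == "__main__":
    human_transcription = "play the latest album by the national"
    machine_output = "play the latest album by the nationals"
    print(f"WER: {word_error_rate(human_transcription, machine_output):.2f}")
    # one substitution out of seven words -> 0.14
```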
Larry:
So it’s that simple? That’s so interesting.
Phillip:
Oh yeah. And then from there, data scientists will use sophisticated tagging mechanisms to understand, if it was different, if the machine got it wrong, what was the difference, and what should we be learning from it?
Phillip:
But those have to be used in order to feed back into the machine learning. Now, this is true in any machine learning. You and I were at the recent Ignite, and there was a talk about using machine learning to detect a cat approaching a cat window.
Phillip:
He used the same exact process. He took a bunch of pictures, or he had his setup take a bunch of pictures, and then he went through and analyzed, “Which ones did the algorithm properly understand as a cat approaching the cat window?” And which ones were false positives or, in some cases, false negatives where the cat was approaching but it didn’t detect it.
Phillip:
It’s the same thing. Without getting into the ethical concerns, of which there are many, the raw process depends on that audio signal being translated or transcribed into text, then compared to what the machine did, and telling the machine, “You were right” or “You were wrong.” And, depending on what it is, how can you do better next time?
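For the cat-window example, that analysis boils down to tallying where the detector and the human labels agree and disagree. Here is a small sketch with invented labels; the conference demo’s actual code surely looked different:

```python
# Tally true/false positives and negatives for a binary detector by
# comparing its calls against human-labeled ground truth. Labels are invented.

from collections import Counter

def tally(predictions, ground_truth):
    counts = Counter()
    for predicted, actual in zip(predictions, ground_truth):
        if predicted and actual:
            counts["true_positive"] += 1
        elif predicted and not actual:
            counts["false_positive"] += 1   # detector fired, but no cat
        elif not predicted and actual:
            counts["false_negative"] += 1   # cat approached, detector missed it
        else:
            counts["true_negative"] += 1
    return counts

if __name__ == "__main__":
    detector_said_cat = [True, True, False, True, False, False]
    human_said_cat    = [True, False, True, True, False, False]
    print(dict(tally(detector_said_cat, human_said_cat)))
    # -> {'true_positive': 2, 'false_positive': 1, 'false_negative': 1, 'true_negative': 2}
```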
Larry:
Right. The flip side of that is what would be of most interest to my content strategy peeps. How do you then harvest that or, you were talking about tagging it, and I assume there’s other taxonomic things you’re doing to kind of clump it together and make it into useful things. That turnaround, when I was talking to you at Ignite you used the term “just-in-time content.”
Phillip:
Right.
Larry:
And I was like, “Bing, that’s what we’re talking about here. That’s what a conversation entails.” So tell me about the other side of that, the content in your answer. Where does that come from?
Phillip:
There are a couple of different things going on there. When we used to do these very narrow context applications, they were mostly static, though we would have some dynamic pieces: we could read a piece of data, or a set of data, and know that we would extract certain values from it and plop them into a construct we had already defined.
Phillip:
So, reading your bank balance: “You have an available balance of $162.48.” The value is dynamic, but the context is very static. What we do now, though, and what we’re able to do with things like Google Assistant and Alexa and Cortana, is start to mimic a much more dynamic set of information. The reality is that it’s not all dynamic; there are a lot of constructs that are prebuilt before anyone ever interacts with them, so the machine is not making it up on the fly.
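The “static context, dynamic value” pattern Phillip describes is essentially a prebuilt sentence frame with a data slot filled at runtime. A tiny sketch, with a made-up account lookup standing in for a real banking system:

```python
# The prompt frame is authored ahead of time; only the balance value changes
# at runtime. The account lookup below is a stand-in, not a real banking API.

BALANCE_PROMPT = "You have an available balance of ${balance:,.2f}."

def get_available_balance(account_id: str) -> float:
    # Hypothetical lookup; a real system would call a core-banking service.
    return {"12345": 162.48}.get(account_id, 0.0)

def render_balance_prompt(account_id: str) -> str:
    return BALANCE_PROMPT.format(balance=get_available_balance(account_id))

if __name__ == "__main__":
    print(render_balance_prompt("12345"))
    # -> You have an available balance of $162.48.
```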
Larry:
Got it.
Phillip:
But it is, in some cases, pulling things that it doesn’t have in memory. So, for example, Alexa will do a Wikipedia search for you. Now, they have an algorithm to pull some fairly small set of information from Wikipedia and give that to you. It’s not going to read you a five-page, multi-sectional Wikipedia article. It’s just going to give you a paragraph or two. And that’s useful, and there is certainly a content strategy around that, of course, to say, “What are we going to pull, and how much of it will we read?” and things like that. But it doesn’t rise to the level of sophistication of really directly answering a question.
Phillip:
So, for example, we play around at home sometimes, just asking questions. My partner worked at Alexa as well, so we have several devices, and every once in a while we’ll just see how it’s doing. We’ll ask things like… Before the last Avengers movie came out, when everybody was predicting a blockbuster, we were wondering, “What are the records? What are the top grossing films of all time?”
Phillip:
We just kept asking all of these different questions, and it was really interesting. It would technically get the answers right, but it wasn’t engaged in the conversation.
Larry:
So this is interesting. You’re kind of learning how to structure queries that get… Could you get like, “Oh, if I ask it this way it’ll return me a top-ten list” or something like that, or-
Phillip:
Well, yes. Yes. And that’s a fascinating tangent, because there are hypotheses out there that the machines are teaching us to talk like them, as if the machines have some sort of actor ability. Yes, they are actors, in the sense of they carry out motions and commands and things, but the machines don’t have a will. You can debate that offline, but the machines don’t have a will to make us talk like them.
Phillip:
What’s happening is, for all of us UX-oriented people, it’s the way the machines are designed, the way the automated systems are designed: they’re not designed to allow, or really enable, human conversations.
Phillip:
So, if you and I were having a discussion about top grossing films, we would probably also get into which ones we liked, and which ones we thought were better movies than others, and which ones should have done well, or should have done better than others. You know, we would just sort of jump around in that area. And if you happen to be a movie buff, or somebody who memorizes a lot of entertainment statistics, then you might be able to roll off a few things.
Phillip:
But if I said, “Do you know that Avatar is still the number one grossing movie?” You might say, “I didn’t even know that would be the top one. I would have thought it was such-and-such” or something, and that’s a very meaningful conversation for us. We might touch on all the same facts that we would with Alexa, but we don’t just stay there.
Phillip:
And to go back to your question about where do we really need to go is… Part of what we don’t have yet in this technology is a deep appreciation and understanding for what conversation really is. We’re using this term of conversational, whether it’s chat bot, or Alexa, or a system that you call over the telephone –
Larry:
Exactly.
Phillip:
We’re using that term conversational simply because there’s an exchange of words and sentences, but conversation is so much richer than that. There’s several things that we aren’t yet doing to really approach that richness. One of them is that variability, and the fact that a conversation really can float between various contexts and we’re really, really good at managing that in our heads. We don’t even think about it, we just… If we had to stop and think every time we were going to say more than five words, I mean really think, I don’t mean thoughtless speech, but I mean really think, all of our interactions would grind to a halt.
Larry:
Well, just by coincidence when I ran into you at Ignite, I was reading Erika Hall’s book on conversational design. She goes back to how human conversation was, kind of what we can do now, and how you can’t just mimic how people, like you were saying, that’s not going to be possible anytime soon. So you kind of have to do it in a way that makes sense in the medium, in the interaction, and that’s a whole other conversation about the design of this whole –
Phillip:
And that’s what I’m getting to, is that there are conversational structures and tools. We don’t tend to think of… Because we talk so naturally, those of us who are fortunate enough to be able to do so, talk and listen so naturally, we don’t typically stop and think that there are many, many tools and practices, and other things that we are doing as we are talking and listening that are fairly well structured.
Phillip:
There are some fantastic books out there on mechanical and conceptual conversation analysis, and how we construct, and box up, and process meaning. Everything from word choice to how do we tell that the other person is in sync with us? What mechanism do I have that-
Larry:
Right, and now I’m thinking about that truism that 90% of human conversation and understanding is non-verbal, and then it’s…
Phillip:
Well, I mean it’s a funny thing. So I’m familiar with a number of those truisms around that, and like most truisms, they’re all true enough to be useful that the exact statistics don’t really matter.
Phillip:
A lot of the things that we do with gesture and facial expressions are about the signals we’re giving each other. Are we in sync? So there’s some concepts around the collaborative nature, and the cooperative nature, of conversation. You and I have to negotiate fairly constantly while we’re talking.
Phillip:
In other words, if I, in the middle of this, started talking about, even something relevant like the Women’s World Cup, or the recent spate of Supreme Court decisions, you would know what I’m talking about but you would not prefer that I go down that direction. You would use, even without knowing, you might use some visual mechanisms to indicate, “What are we doing here? I thought we were on this other topic.” And I would, being a fairly sensitive enough person, I would pick up on those and say, “Oh wait, Larry is indicating that he is not in sync with where I’m taking this conversation anymore. I better check in with him.” And we do that pretty often.
Phillip:
We explain something to someone, and if we’re looking at them or if you can see them somehow, and they get a look on their face, and then you’re like, “Oh, was what I just said clear, or do you want me to try to say it in a different way?” And they didn’t say anything to that effect, but it’s a very rich communication that they’ve given us to say, they’re saying, “I might need a little help following where you just went.”
Larry:
I’m wondering now, too… You’ve got me thinking about some of the online interactions I’ve had. Can you sort of build in check points, or something like that, “Hey, are we on track here?”
Phillip:
Well, that is one of the things that I think we’re not using very well, and honestly, it’s really complex. Because there’s a level of very implicit trust and understanding we have when we’re talking to another person, that we are not able yet to give to a machine, because we don’t know.
Phillip:
And the number one problem I see is hesitancy among people who are using these devices, because they don’t know what to expect. When we don’t have a way to predict something reasonably, then we get very hesitant, it’s just human nature. You can know this feeling anytime you do something new for the first time. If you pick up a musical instrument you’ve never played, or if you like to cook and you start to work with a food, or a set of ingredients that you’ve never worked with before, or maybe you get a new tool for your woodworking shop.
Phillip:
There’s this time period where you’re… You know, we say all kinds of things: “I’m getting used to it” or “I’m still picking it up” or “I’ll come up to speed soon.” All of those have to do with, not just your level of comfort using it, but it’s also the ability to predict what will happen when you do use it, or when you do try to engage that.
Phillip:
When we are talking to a machine right now, we don’t have a level of predictability that we’re comfortable with for general conversation. Now, you can learn, and this is why with Alexa the stats are through the roof on music and weather and news, and other stuff that has been predictable for a long time. Those are sort of table stakes.
Phillip:
But if it’s trying to… You wouldn’t say, “Alexa, give me a synopsis and analysis of the current Washington D.C. political situation.” But you might have a good friend to whom you would say that sort of sentence, and be able to learn things, but with Alexa, there are a number of layers there that Alexa, and any of the others, just aren’t capable of.
Phillip:
Saying that kind of thing to Alexa, you’re actually more predicting that some sort of hilarity will ensue, because it will just not know what to do with it. But if you said that to a good friend that you trusted, you would expect that person to come back with, “Well, here’s what I think is going on,” and you would be very comfortable waiting for that answer, and accepting of it.
Phillip:
That level of predictability, and the ability to correct, the ability to check, do those check ins, it really gets down to this ability to have a predictable, cooperative, conversation that we’re so used to with other human beings. That’s really the next step, is when will our machines be able to do that in a way that feels good to us?
Larry:
And it won’t necessarily be mimicking what we’re doing right now, it’ll be something appropriate for this new medium.
Phillip:
Yeah, yeah. Exactly.
Phillip:
Thinking about it from the analogy of the child again. It’s not too far from that, in terms of, if you’re a parent or you’ve spent a certain amount of time around children of different ages, you start to get used to tailoring your interaction with them according to what you think they’re capable of. And so you talk to a three year old very differently than you would talk to a ten year old very differently than you’d talk to a 16 year old differently than you’d talk to a college age kid. And this goes back to those expectations. You’d expect that a kid at 16 has been through a certain amount of school, has been through a certain amount of life experiences, is capable of higher order level of thought that is more symbolic and logical versus super concrete.
Phillip:
But we can’t yet put those projections on the machine. And, in fact, we know that they don’t have lived and learned experiences similar to a human. So right off the bat there’s this big X through a set of things that we usually depend on knowing, and we don’t have anything to replace it with.
Larry:
You’re kind of reminding me of… I think an attempt to replace that is the notion of personalization, like to have an awareness of demographics like age and context, and stuff like that, and to personalize the conversation, or the content, based on that. Is that-
Phillip:
Well, it is, and it isn’t. There’s a certain set of things that… It is, in a sense that, if there is a team behind that, then they can structure a fairly large set of interactions that could go along with things that are personalized to you, but stepping outside of that…
Phillip:
I’m just going to make up a silly example. Now I’ve got college kids on my mind. Cold pizza. Many of us were in college, and we got used to the idea of cold pizza the next morning. Maybe it was on the floor. Maybe it was in the fridge, who knows. It doesn’t really matter. We ate it because we were stupid, and we were in college.
Phillip:
Some of us still have a taste for cold pizza. But most of us, as adults, prefer certain types of meals hot. Now, one of things that we think about when someone really gets to know us is, how well do they know these little bitty things about us? Our tiny little preferences. There’s no database on earth that is necessarily going to register these sorts of things because they don’t show up in our buying behavior or our browsing behavior.
Phillip:
In other words, we’re not looking up, “Ten best ways to serve cold pizza.” Now, we might get some pizza recipes, or we might get a bunch of offers for delivery, or things like that from that, but there’s no algorithm out there that’s going to pick that up and say, “Well, Larry’s one of those interesting people that never grew out of that taste for cold pizza from college, and that’s why I like him so much, doggone it.”
Phillip:
There’s personalization, in the sense of, we can know a bunch of facts about you that are super contextual, and we can put those together in some useful ways to make your browsing experience better, or your shopping faster. Or even just do things like remind you that it’s your significant other’s birthday coming up, and by the way, here are the ten things they asked for, and you didn’t know they wanted.
Phillip:
But there’s a huge difference between that, and seeing that old friend that you haven’t seen in 20 years who says, “Hey, I remember that you used to really like to put ketchup on your cold pizza in the morning. Do you still do anything crazy like that?” And you’re like, “Aw, that’s so funny.”
Larry:
And that’s the kind of thing that would be really creepy if your-
Phillip:
Oh absolutely, right. Right.
Larry:
We’re coming up on time, I always like, before I wrap up, to give my guests an opportunity, is there anything last, anything that hasn’t come up, or just on your mind about conversational design or chat bots or voice interfaces, anything that we haven’t got to yet?
Phillip:
Well, there’s so much to say and we could go on and on, on a number of topics. I think one of the biggest things that I think about a lot is the fact that language is a tool, it’s an interface, it’s a set of affordances even. Language as an affordance is a concept that I think about, both for people who are practicing this now and for people who want to get into this field. What do they need to think about? So one of the biggest problems that we introduce, as technologists, is to use language as a system-centered form-filling vehicle.
Phillip:
In other words, just answering, making the human answer a bunch of questions, or memorize a bunch of commands. We’ve gotten used to these things through mobile, through the web, through the ubiquity of computing devices now. Most of us carry one around, and we work on one or two or three, and we have multiple devices now in our homes, in our cars, that require some kind of computational interaction.
Phillip:
But language transcends all that, and when we think about the things that we use for UX on our laptops or web or mobile, we are taking a very, very small slice of the language capabilities and narrowing them way, way, way, way down for those contexts.
Phillip:
If you look at even the most sophisticated interface, let’s say for enterprise applications, you’ve still got this boiled down to, “There are some things I can move around on the screen. There are some things I can click, which is a metaphor, we’re not pushing any physical buttons. There are lists, chooser options, text entry.”
Phillip:
These things are a fraction of the things that we use language for. One of the biggest ways I can encourage people who want to get into this field, to start opening their mind to this is, start to study language. Start to study how people interact using language, because that’s where it’s got to go. This is not about making it easier for computers to understand us, it’s about making computers work harder to facilitate the things that we already do really, really well.
Phillip:
I’ll put in a little plug if that’s okay, otherwise you can cut it out.
Larry:
No, go ahead.
Phillip:
An associate of mine and I are starting a new conference called Points Made. It’s about addressing these sorts of issues in the conversational space. What would it really mean if a computer could support things that really felt conversational? A lot of that hinges on the natural evolution of language that’s occurred over thousands and thousands of years, which we’re still decoding, by the way. Even just to understand it between two people, it’s still fascinating and there’s new literature on topics that is very significant. So I would say, start there. Don’t think about, “How do I turn a mobile site into a voice application?” Start to think about, “How would two people talk about that?”
Larry:
And that’s what the Points Made conference will be all about?
Phillip:
Yeah.
Larry:
Have you planned the first one yet?
Phillip:
Yeah, yeah. Today’s our launch day, serendipitously.
Larry:
Oh nice.
Phillip:
Yes, we just launched the website.
Larry:
Well, send me a link, I’ll make sure we put it in the show.
Phillip:
I will. Oh, awesome, I appreciate that.
Larry:
All right, cool. We should wrap up here. Well, thanks so much. This has been a great conversation.
Phillip:
Yes. Oh I really appreciate it. Thank you, Larry.