David Beisel is a co-founder and partner at NextView Ventures. Voicebot connected with him recently to discuss his Personal Voice Computing Map concept.
We are going to talk about the Personal Voice Computing Map that you recently published on the NextView blog, but I want to go back to an earlier post you wrote on the ubiquity of voice. You talk about two primary concepts: making voice a primary input and putting voice as close to you as possible, or immediately accessible. Why is that so important?
David Beisel: The contrarian argument is that we already have ubiquitous computing at our fingertips all the time with a mobile device in our pockets. Why would you need a different input mechanism? There are a couple of reasons. First, there is the friction of reaching into your pocket and interacting with a screen. Second, voice is the most natural way to communicate. That is how people communicate with each other. It is the direct access or deep link into things. The music example is telling Alexa to turn down the volume. Some functionality on a screen interface has to be navigated through a couple of different clicks, but on voice it is right there.
On the "closer" point, the Amazon approach is having a microphone in every room. You don’t have to think about having a computing device there; you will just assume that you do. The idea of AirPods in your ear is a different route, but it is the same thing. You could have voice AR instead of visual. It’s a different input.
You talk about this being a two-horse race between Amazon, because of the success of Echo in the home, and Apple, because of the voice interface on the watch. The second point about Apple is a bit contrarian. Most people see Apple as woefully behind Amazon, Google and Microsoft, and the Apple Watch has been a disappointment. The initial consumer response to AirPods has been skepticism. Despite these setbacks, you disagree and think Apple is well poised for success. Why?
Beisel: The current media narrative is that Amazon is ahead and has already won the game. I just don’t think that is true. They are certainly ahead and winning the game, but if voice computing is the next platform, then all of the platform players are going to devote real resources to the space, and I wouldn’t count anyone out just yet. Each company brings a different set of context. Amazon knows everything you buy. Facebook knows who all of your friends are. Google knows your calendar and everything you are. Apple launched Siri five years ago, and they have been thinking about this for a while now. Their approach is ubiquitous access to voice input [through AirPods and the iPhone].
You also mention that voice applications are bound to have their Pokemon moment, presumably a viral hit that increases user engagement and becomes a cultural phenomenon. What type of voice application do you think will be the catalyst?
Beisel: If I knew it, I would go out and create it. Amazon touts that it has seven thousand skills, but for the most part, they are toys. What is missing? In order for the platform to have serious applications built on top of it you need serious developers spending serious time. In order for that to happen you need two things. First, a way for developers to make money. The second is around distribution and discovery. How do you find out about skills? A roundup blog post on the best dozen skills? You need the discovery and monetization pieces before serious development time will be devoted.
Domino’s Pizza and Uber are the best examples of skills. Uber, for example, monetizes it in a different way and can promote it through its existing app user base. Who would have predicted that a killer app for mobile phones would be Uber? It wasn’t an obvious app for mobile. In the same way, we are likely to see something orthogonal for voice that naturally leverages all of the strengths of the voice platform.
You mention in another blog that we haven’t seen a viral Alexa hit because there is no integrated social functionality. How do you expect that to come about?
Beisel: I think it is an outstanding question of whether or not Google and Amazon will allow or easily enable users to connect to the social graph like Facebook or LinkedIn when both are owned by other platform players. Microsoft with Cortana plays well to LinkedIn integration. Will Amazon try to enable that or try to recreate it? That’s kind of tough. I don’t know how that will play out.
Does this mean that Facebook, although behind, could catch up quickly by simply helping introduce social functionality to voice and be the first to capitalize on a Pokemon moment?
Beisel: I think that it is possible. But they don’t have distribution right now for their platform. What Oculus-like thing could they acquire? Mycroft or Jibo? You could potentially bolt onto something to get a jumpstart.
You talk about three types of voice applications: 1) mobile replacement; 2) new distribution; 3) new native voice applications. Could you differentiate the first two categories and then unpack your thoughts about voice-first and potentially voice-only applications?
Beisel: You can use mobile as an analogy. There was a set of applications that did exactly what they did on the previous platform, just on the new one. The functionality is translated and replicated. A lot of the skills put out there to date by larger companies are just that. You can order an Uber on your phone, and now you can order it via voice.
The second category of applications we haven’t seen yet on voice because the platforms are lacking discovery and distribution. When Facebook opened its platform, you had games like those by Zynga, which then built its business on Facebook. The replication is just, “I want to order a Domino’s pizza.” The new distribution is, “I want to order a pizza.” [The voice assistant] then procures from whatever it thinks is the best option. There could be a word and domain name rush for people who want different domains and name spaces for activation.
Then there is a new category of skills that are truly native to voice. They don’t make sense on a different platform because they truly take advantage of the unique characteristics of the new platform. The Warner Brothers Wayne Investigation skill is a good example. It has personas and native voice interaction.
The third category is the most potentially exciting but also the hardest to describe because it is the furthest out there. Who would have known when the iPhone was released that one of the killer apps would be requesting transportation? It facilitates the exchange between a driver and a passenger in a way that you couldn’t translate to desktop. We are talking about applications that are not backward compatible with the old platform.
Are people too focused on the voice versus messaging UI when the real value is created in the AI?
Beisel: They are intertwined. Some people talk about a voice assistant, which implies human qualities. AI and conversational bots create an expectation among consumers that there is a back and forth and that context is maintained. The deeper you go, the more important that is. There is power in voice without AI. I can tell Alexa to turn on the lights without AI, and there is no expectation of a back-and-forth response.
What prompted you to create the Personal Voice Computing Map?
Beisel: NextView is a seed stage venture capital firm investing in companies across the U.S. In our home we have an Echo, and at the office we have an Echo and a Google Home. On the one hand, we have a proliferation of speakers. You have a ubiquitous computing access layer now, which is how we communicate most naturally. We have microphones in every room, and we can immediately start talking to the cloud.
When I saw Amazon and Google both investing in this, I started this thematic pursuit about where the white space is going to be for startups. Is it in the enabling layer technology? The analytics around building these skills, especially cross-platform? Is it a skill itself? Part of my motivation for building out the Personal Voice Computing Map was to understand it better.
How have the recent CES announcements influenced your thinking?
Beisel: Amazon in particular is picking up a lot of places in the stack. They had already indicated that you don’t have to buy an Amazon device to access Alexa, but CES demonstrated that an Alexa-everywhere strategy is already here. Amazon’s vast presence at CES without being [officially present] at the event itself was the thing that struck me the most.
There are two lines of thought about AI-enabled voice applications. One suggests the market will develop along the same pattern as mobile. The other suggests that the World Wide Web is a better analogy. What do you think?
Beisel: It is going to be different from both, but from what I’ve seen so far, it rhymes more with mobile. It seems so much more proprietary. In the ’90s, things were truly interchangeable. Hence, the browser wars. There is a little more potential here for lock-in — I’m an Alexa, Cortana or Home Assistant user — as opposed to having one PC in the house. You are going to have many different access points that are going to have a consistent experience across the voice cloud.