Tell me about your background.
Mike Elgan: I started as a newspaper journalist out of college. After a few years I fell in love with newspapers and computers. It started when I was transitioning a newspaper, South Coast Community newspapers, to a digital edition.
Before that, we were using a Mac and a printer to print out the articles and then cutting them out with razor blades, fixing them to a board and photographing them. I was transforming that system into one that was all digital. The electronic layout went straight to the printer. It made me love computers and computing as a subject matter more than local water politics and baseball that we were reporting on most frequently.
How did that interest in technology translate into your interest today in voice assistants?
Elgan: I’ve always been obsessed with voice command. In the last five years, I’ve been obsessed with virtual assistants.
I’m a writer so my first thought was that I could use Dragon by Nuance to write faster. I imagined I could speak and it would transcribe it. I tried a lot of different things. None of them worked for writing. I discovered that the writing process was not conducive to speaking. Writing is a lot slower than talking and the quality of the writing was lower if I started with speech.
I recall later having an epiphany because I was following the progress of speech recognition. At some point, I became obsessed with the idea of a large-screen desktop, something akin to the Microsoft Surface Studio but much bigger. Every time I wrote about it and predicted it, everyone said they could never live without their mouse and keyboard. What they overlook is that so much of our computing will be done with voice that we won’t be so reliant on a keyboard or mouse.
Voice is the key to radically democratizing computing. It will radically change our assumptions about hardware. When you are talking to a virtual agent that understands what you are saying then that changes absolutely everything.
How do you think the changes brought about by voice computing compare to other technology trends such as the advent of mobile or the web?
Elgan: I think they are more different than similar. The AI part and the voice part are separable. Imagine you have this massive supercomputer data center that does artificial intelligence. I learned this at Microsoft Research in the 1990s. One of the fundamental problems of AI is that humans understand things that computers do not. If I say, “I saw the bird with a telescope,” the AI doesn’t know if I had the telescope or the bird did. If you are going to use AI to simulate human intelligence there is no end to the variation. They knew even then that for AI to do anything, the AI actually has to know about the world. It has to have knowledge.
What we are getting at is AI that understands the world. That is in this data center out there in the cloud. Part of what it knows about the world it knows about me specifically. A way to access that AI is through a voice virtual assistant. The computer does voice analysis to figure out what I said and feeds it into the AI in text and it tells me the answer and it happens in a second. That is an interface that has incredible implications for computers.
I figure there is not enough emphasis on one thing: the inevitability of virtual assistant software integrated into glasses. One company doing this is Vue Glasses. They have ordinary glasses, except in the back they are thick with the battery on one side and the electronics on the other side. They use bone conduction to hear and recognize voice. I believe it is a 100% certainty that glasses that enable you to speak with a virtual assistant will be common in three to five years. In every moment in your day you will be able to talk to your virtual assistant from the time you wake up. There is nothing embarrassing about it.
The other one is a high school kid who is making something called Kai [Mike interviewed the founder, Dylan Rose, on his Food and Technology podcast]. He is using the Houndify platform. Kai clips onto your glasses and uses a similar approach.
You mentioned that voice is going to democratize computing. How?
Elgan: Once we get used to relying on virtual assistants and they have agency and understanding, it will change things in so many different ways. One is democratization of technology. Even technology Luddites can already use voice technology. They know how to talk. Someone can say, “Please make an appointment with my doctor sometime tomorrow afternoon.” It will come back in seconds and say, “I made an appointment tomorrow at 2:30.” At 1:00 pm the next day it will say, “You need to leave for the doctor in 15 minutes because traffic is heavy.” This is something everyone can use.
Imagine a world where anyone can have 24-hour access to an assistant. It’s not about computing anymore. It is about augmented humanity.
The other big implication of this is that people are concerned that robots will take over our jobs. I don’t think so. Virtual assistants still give us direct access to the same AI. Therefore, a person with AI will be better than AI alone in almost every case. AI is centuries away from thinking and acting like a human. The killer combination is a human being with AI access.
So you don’t think AI will have an impact on jobs or displace workers. What about the Japanese insurer that replaced a bunch of claims adjusters with IBM Watson?
Elgan: AI is centuries away from thinking and acting like a human. The killer combination is a human being with AI access.
Is voice really the key here or is it the AI?
Elgan: Voice is an interface. Think about a smartphone. It has a touch interface that has certain characteristics. Voice is a different interface. The application and compute power behind it are different. What is transformative is access to AI and virtual assistants. The voice assistant’s job is to understand what you want, contextualize it in the world and your world personally, and return with the right response that is a human-like response.
There are two parts to this. There is the voice part, the technology that listens to the sounds and figures out the words. Then there is the virtual assistant part, an artificial human that understands, contextualizes and responds in a human-like way with variation in tone and humor. Everything behind the scenes, all of those applications, are not part of the revolution. The same software goes to the same servers and same databases as with other computing interfaces.
The revolution is not the voice part and not the computing part. It is the voice assistant part. That is the really exciting part.
Do you believe voice assistants will change people’s lives more than mobile or web?
Elgan: Absolutely. The ability to take advantage of an iPhone is dependent on the skill and knowledge of the device user. I know people who spend time studying how to do something amazing with a smartphone. The average person is a 3 out of 10. A smartphone can do a million things, but the average user is doing five things. If an AI-based voice assistant can do a million things then the average user can use a million things. It turns every user into a power user because with the right kind of agency a voice assistant can make decisions. You don’t have to study and learn how to use the technology. It is thinking about these things for you. That kind of pre-emptive agency turns every user into a power user.
Let me give you a practical example. War Dogs is about a guy who figured out a system for winning government contracts. He figured out that by finding certain types of low-level acquisition requests, he could game the system and make a fortune. He had a unique insight that enabled him to game the system because of the new rules. He put in enormous work, going through a mind-numbing chore to look for things he had learned how to look for. He made a very good living because he developed a unique perspective, had unique knowledge and made a unique effort. The way virtual assistant technology should function is that you start looking at military stuff, and it should notice you are trying to get into the arms dealing business and figure out that system. It should go to that effort for you automatically.