A Variety report last week suggested that recent hiring at Roku points toward a new smart speaker from the over-the-top (OTT) streaming company.
Roku recently posted multiple job listings for audio-centric roles, including a “Sr. Software Engineer, Audio” as well as a “Sr. Software Engineer, New Products, Audio (Expert).”
At the same time, the company has been looking to hire multiple team members to build out its voice control capabilities, including a listing for a “Sr. Interaction Designer, Voice,” as well as a “Voice User Interface Designer” who is supposed to become Roku’s “expert on all things voice related.”
But that’s not all. Current LinkedIn profiles for Roku employees Tyler Bell (natural language understanding and automated speech recognition) and Hari Ramakrishnan (far-field voice and audio engineering) both list voice technologies among their current responsibilities. So, is this a sign that Roku is about to launch a smart speaker?
Why Roku Needs Voice Expertise
We have seen that voice is becoming a must-have feature for television services. Cable provider Xfinity and OTT services such as Nvidia Shield and Apple TV already have this capability. Roku already provides limited voice search capabilities with some remotes and through its iOS and Android mobile apps, but its support forums point out limitations:
Note: When using Voice Search, simply speak the title or actor/director name, for example “Finding Nemo.” Voice Search will not work with full sentence commands like “Show me movies with Tom Hanks.”
This won’t be good enough for long. Expertise in natural language understanding (NLU) and automated speech recognition (ASR) will be important for extending these capabilities to natural language search and system commands. Far-field voice recognition will be important to ensure you can use voice from across the room and not just through your remote, which is optimized for close-range speech. These hires and job responsibilities suggest that Roku intends to support natural language voice interaction from the controller box in addition to the remote. That matters too: you could then use voice to control your television watching even if “someone” has misplaced your remote.
Voice Capabilities Don’t Mean Smart Speaker or Voice Assistant
It is important to differentiate between voice capabilities, smart speakers and voice assistants. Roku intends to be an entertainment hub for the household. You could imagine the company adding access to Spotify, Pandora and live radio streams to its offering. It might even add basic features such as weather and timers. However, it is a bigger stretch to see Roku building a robust developer community that would build thousands of third-party apps for the ecosystem. If you think of it as a smart speaker, the intelligence would be very limited.
A true voice assistant initiative is even more far-fetched. The investment required to build a general-purpose voice assistant runs from hundreds of millions to billions of dollars. More likely, Roku intends to build a deep domain model that it can integrate with an existing voice assistant. That deep domain model would offer expertise in entertainment while leaving the general-purpose functions to the established voice assistants. That could be a partner such as Alexa or Google Assistant, as they open up to more third-party NLU domains, or it could be Hound today. Hound would also allow Roku to keep all of its data, which must be a consideration given that Amazon Fire TV and Google Chromecast are competitors.
Not Every Voice Engineer Will Work on a Full Platform
The key point to keep in mind is that not every voice engineer will be working toward building a full voice platform. Amazon, Apple, Google, Microsoft, Samsung, SoundHound and a few others will invest in general-purpose voice assistant capabilities. Many other companies will focus on narrow domains and become niche experts that the voice assistants tap into to fulfill user intents.