When Google wowed the tech world with its demo of Duplex (the technology that allows its digital assistant to make phone calls to perform mundane tasks like booking haircuts or making restaurant reservations), Microsoft’s Cortana chief was impressed, but not worried.
“The technologist in me had no choice but to feel impressed,” Javier Soltero, Microsoft corporate vice president of Cortana, told me in a far-ranging discussion about voice technology for Mashable’s MashTalk podcast. “The idea that a computer can generate a voice with the right processes, right inflection, all of the right things to mimic humans, is amazing to see in practice, but not entirely surprising.”
But Soltero didn’t immediately think, “We need to do something similar with Cortana so we can catch up to what they’re doing.” In fact, the Google Duplex demo emphasized just how different the two companies’ approaches to voice technology are.
Whereas Google is clearly putting consumer-friendly features that automate mundane tasks front and center, Microsoft is looking to make Cortana into a more symbiotic tool: something that works in conjunction with a human, not necessarily in the person’s stead. Think of the scene in the first Iron Man movie where Tony Stark has a continuous conversation with his digital assistant, JARVIS, as he designs the second generation of his armor.
“It was clear that that was not where we were headed,” Soltero said of Duplex. “We are interested in not that level of having the computer do stuff for you. We’re more trying to enable you to do more things yourself.”
Cortana by way of Outlook
Soltero became master of Cortana in March 2018 after playing a big role in Microsoft’s Outlook and Office apps. He first came to the company by way of Acompli, a well-regarded email app that Microsoft acquired in 2014.
After successfully turning Outlook into one of the best email apps on mobile, he’s now putting his expertise to work to push forward Microsoft’s voice assistant. And it definitely needs a push — the mindshare in the voice-assistant space is dominated by Amazon, Google, and, to a lesser extent, Apple. Even Samsung seems to have gotten more buzz.
It’s not like Cortana has been stagnant. It’s made progress by migrating from phones to PCs (although, considering the fate of Windows Phone/Mobile, it was more like abandoning ship), and the first Cortana-enabled smart speakers, starting with the Harman Kardon Invoke, arrived on the market last year.
Still, the Invoke isn’t a Microsoft product. Given the ever-expanding number of competing products powered by Alexa or Google Assistant, everybody’s wondering when Microsoft will flex its growing hardware muscles (its line of Surface tablets and laptops has been a solid success for the company) to build its own smart speaker.
“We have lots and lots of ambitious plans that I can’t discuss, but you will be learning more as the course of the year plays out,” Soltero said. “And as you’ve probably seen, we’re working closely with Amazon to integrate Cortana into the Alexa and Echo experience as well as having Alexa integrate into Cortana. It’s ultimately less about the device and more about where the effect of the assistant is felt.”
The Tao of Cortana
Everything in tech is said to be in its “early days,” but when it comes to voice assistants, that’s not just a line. Here, science fiction has set the standard for success in the field: an interactive, almost telepathic computer that we can talk to just as we would a human. Star Trek, Her, and other movies have all shown the dream, but it’s still a long way off.
Getting there will mean taking many, many baby steps, but Microsoft thinks it’s on the right path after its acquisition of Semantic Machines. Besides technology, Semantic Machines brings a keen philosophy to voice interactions: there should be no “dead ends.” That is, when you make a query to one of these assistants and it can’t complete the query for whatever reason, it shouldn’t just give up and tell you it doesn’t understand. Instead, it should keep asking follow-up questions until it gets you to what you want, and if it still can’t do that, it should give you an opportunity to teach the AI what the right answer is.
“The Semantic Machines team is hard at work with the Cortana folks to bring those capabilities into Cortana,” Soltero said. “I’m still, along with my team, really focused on this problem… how do we go from, ‘Hey, Alexa, turn on the lights,’ to what is a much more complicated and natural-sounding articulation.”
Getting to those natural interactions, tailored to each user, will be a long process with many steps, but Soltero is clear about what he thinks the next one is.
“The wake word is the first thing to go.”