Restaurant Technology Guys Podcast Ep. 183 – Voice-Enabled AI to Help with Phone Orders, Kiosks and Drive-Thru with Innovator SoundHound

SoundHound offers an independent voice AI platform that enables businesses across industries to deliver conversational experiences to their customers. Built on proprietary Speech-to-Meaning® and Deep Meaning Understanding® technologies, SoundHound’s advanced voice AI platform provides exceptional speed and accuracy and enables humans to interact with products and services like they interact with each other—by speaking naturally.


What differentiates SoundHound from competitors in the restaurant space?

  • While many voice AI companies optimize for text, our advanced system uses patented voice AI to optimize for the speech itself. This allows our technology to work swiftly and accurately, deriving meaning from the speaker as they’re talking.
  • Because our AI can understand natural human speech, customers don’t have to modify their language to interact with software; they can just speak as they would to another person.
  • Our voice technology can be deployed by any restaurant to take food and drink orders over the phone, via menu kiosks or at the drive-thru. It learns the menu of each business and can answer questions, accept modifications and even upsell, helping the restaurant process more orders quickly and efficiently.
  • Last year, we also announced Dynamic Interaction, a game-changing new interface. Where existing voice technology requires wake words and relies on turn-taking with awkward pauses to process requests, Dynamic Interaction combines two technologies: fragment parsing, which breaks speech down into partial utterances and processes them in real time, and full-duplex audio-visual integration. Together they create an instantaneous, next-generation experience.
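
To make the idea of fragment parsing more concrete, here is a minimal Python sketch of processing partial utterances as they arrive. It is only a toy illustration under stated assumptions, not SoundHound's actual implementation: the menu, the quantity words, and the names `IncrementalOrderParser`, `feed`, and `parse_stream` are all hypothetical, and real speech input is replaced with a pre-split list of words.

```python
from dataclasses import dataclass, field

# Hypothetical menu and quantity words, for illustration only.
MENU = {"burger", "fries", "cola"}
QUANTITIES = {"a": 1, "an": 1, "one": 1, "two": 2, "three": 3}


@dataclass
class IncrementalOrderParser:
    """Toy fragment parser: the order is updated after every word,
    rather than after the speaker has finished the whole sentence."""
    order: dict = field(default_factory=dict)
    pending_qty: int = 1

    def feed(self, word: str) -> dict:
        """Consume one word and return a snapshot of the order so far."""
        token = word.lower().strip(",.!?")
        if token in QUANTITIES:
            # Remember the quantity until the next menu item is heard.
            self.pending_qty = QUANTITIES[token]
            return dict(self.order)
        # Naive plural handling: "burgers" -> "burger" (assumption).
        if token.endswith("s") and token[:-1] in MENU:
            token = token[:-1]
        if token in MENU:
            self.order[token] = self.order.get(token, 0) + self.pending_qty
            self.pending_qty = 1
        return dict(self.order)


def parse_stream(words):
    """Feed words one at a time and collect a snapshot after each one."""
    parser = IncrementalOrderParser()
    return [parser.feed(w) for w in words]


snapshots = parse_stream("I'd like two burgers and a cola".split())
for snap in snapshots:
    print(snap)
```

In this sketch the order already contains two burgers by the fourth word, before the sentence has ended, which is the property that removes the need to wait for a pause.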

Can you expand on how this technology is more sophisticated than others on the market?

  • In SoundHound’s 17-year history of developing cutting-edge voice AI, this is perhaps the biggest technical leap forward.
  • The technology:
  • Instantly follows and captures fluent speech in real time, with no awkward pauses or “turn-taking” as with some other assistants.
  • Completely ignores off-topic speech, responding only to domain-specific topics such as the items on a menu.
  • Confirms requests through continuous, multimodal feedback, using audio and visuals “live” as the customer engages with a device or service. This gives firm reassurance that an order or request has been understood accurately.
  • Allows users to change, adapt, and delete requests in real time, so food orders can be customized and changed using natural human speech.
  • Makes proactive suggestions based on a real-time interpretation of the user’s speech, like a dessert menu popping up onscreen when the customer’s sentence begins “for dessert I’ll have…”
  • Lets users input information by voice and touch, interchangeably and simultaneously.
  • Responds with audio and visual output, and intelligently decides when to speak to the user versus simply updating the visual output.
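
Several of the behaviors listed above (ignoring off-topic speech, changing an order mid-conversation, proactive suggestions, and choosing between speaking and updating the screen) can be sketched in a few lines of Python. This is a simplified illustration only: the menu, the trigger words, the `OrderSession` class, and the rule "speak only when something needs clarifying" are all assumptions made for this example, not a description of how SoundHound's product works.

```python
from dataclasses import dataclass, field

# Hypothetical data, for illustration only.
MENU = {"burger": "main", "fries": "side", "sundae": "dessert"}
SUGGEST_TRIGGERS = {"dessert": "dessert"}  # spoken word -> category to show
REMOVE_WORDS = {"remove", "delete", "cancel"}


@dataclass
class OrderSession:
    items: list = field(default_factory=list)
    screen: list = field(default_factory=list)  # visual-only updates
    spoken: list = field(default_factory=list)  # audio responses

    def handle(self, utterance: str) -> None:
        words = [w.lower().strip(",.!?") for w in utterance.split()]
        mentioned = [w for w in words if w in MENU]
        triggers = [SUGGEST_TRIGGERS[w] for w in words if w in SUGGEST_TRIGGERS]

        # Off-topic speech: nothing menu-related was said, so do nothing.
        if not mentioned and not triggers:
            return

        # Proactive suggestion: a category was named but no item yet.
        if triggers and not mentioned:
            category = triggers[0]
            options = sorted(n for n, c in MENU.items() if c == category)
            self.screen.append(f"showing {category} menu: {', '.join(options)}")
            return

        removing = any(w in REMOVE_WORDS for w in words)
        for item in mentioned:
            if not removing:
                self.items.append(item)
                self.screen.append(f"added {item}")
            elif item in self.items:
                self.items.remove(item)
                self.screen.append(f"removed {item}")
            else:
                # Assumed policy: speak only when clarification is needed.
                self.spoken.append(f"There is no {item} in your order.")


session = OrderSession()
session.handle("What's the weather like today?")   # ignored as off-topic
session.handle("I'll have a burger and fries")
session.handle("Actually, remove the fries")
session.handle("For dessert I'll have")
print(session.items, session.screen, session.spoken)
```

Routine additions and removals here only update the screen, while the assistant speaks up only for a request it cannot carry out; a production system would presumably use far richer signals to make that decision.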

