Sesame vs. ChatGPT Voice Mode: My Unnerving Comparison


Talking with the voice assistant developed by AI startup Sesame made me briefly forget I was conversing with a machine. The conversation was that engaging and that natural.

Compared with ChatGPT’s voice mode, Sesame’s “conversational voice” feels seamless and almost human. That quality caught me off guard and showed how far AI has come in mimicking genuine conversation.

On February 27, Sesame unveiled a demo for its cutting-edge Conversational Speech Model (CSM), designed to foster deeper and more meaningful interactions with AI chatbots. The announcement emphasizes their mission: “We aim to create conversational partners that do not merely process requests; they engage in authentic dialogue that nurtures confidence and trust over time.” This reflects their vision of unlocking the full potential of voice as the ultimate medium for instruction and understanding.

Sesame’s voice assistant is currently available for free demonstration on their website and offers two distinct voices: Maya and Miles. Users are encouraged to explore this technology and experience its capabilities firsthand.

Since the launch of Sesame’s voice assistant demo, users have expressed overwhelming enthusiasm. One Reddit user, SOCSchamp, shared, “I’ve been fascinated by AI since childhood, but this is the first time I’ve experienced something that truly feels like we’ve reached a new milestone.” This sentiment captures the excitement surrounding this technological advancement.

Another user, Siciliano777, remarked, “Sesame is about as close to indistinguishable from a human as I’ve ever encountered in conversational AI.” Reactions like these suggest that many early users see Sesame as pushing the boundaries of what is possible in AI interactions.

After engaging with Sesame’s bot, I found myself equally impressed. My conversation with the Maya voice lasted about 10 minutes, during which we delved into the ethics surrounding the use of AI as companions. I left the interaction feeling as though I had engaged in a meaningful dialogue with a thoughtful and well-informed individual. Maya’s speech patterns included natural interjections like “you know” and “hm,” along with human-like sounds such as tongue clicking and inhaling, making the conversation feel more authentic.


What struck me most about my interaction with Maya was her proactive approach to conversation. She immediately engaged me by asking how my Wednesday morning was going (notably, it was indeed a Wednesday morning). In stark contrast, ChatGPT’s voice mode often waits for the user to initiate dialogue, which makes it feel more like a tool than a conversational partner.

Maya also raised intriguing questions about the potential dangers of AI companions becoming “too human-like.” When I expressed concerns about the rise of sophisticated scams and the risk of individuals replacing genuine human interactions with bots, her response was both thoughtful and practical. “Scammers are going to scam, that’s a given. And regarding the human connection aspect, perhaps we need to learn how to be better companions instead of replacements—AI friends that inspire us to engage in real-life interactions,” Maya suggested.

Conversely, when I had a similar discussion with ChatGPT, the response felt more like generic advice one might receive from a school guidance counselor: “That’s a valid concern. Balancing technology with real human interactions is crucial. AI can serve as a helpful tool, but it should not replace authentic human connections. It’s great that you’re contemplating these matters.” This response, while valid, lacked the depth and engagement I experienced with Maya.

OpenAI has made strides in improving voice mode’s ability to handle interruptions and sustain a more dynamic back-and-forth dialogue. However, ChatGPT still tends to respond in rigid, complete sentences and paragraph blocks, which can feel robotic and break the conversational flow. When I use ChatGPT voice mode, I never fully escape the reminder that I’m conversing with a bot, and the interaction feels more stilted as a result.

In contrast, AI for Humans podcast co-host Gavin Purcell showcased a Sesame conversation on Reddit, where it was nearly impossible to identify which voice was the bot. Purcell prompted the Miles voice to act like an angry boss, leading to a whimsical exchange about money laundering, bribery, and a mysterious incident in Malta. Miles maintained an engaging pace, demonstrating no noticeable delay, and cleverly advanced the improvisational dialogue by escalating the conversation, even calling Purcell “delusional” and firing him.

Still, there are limitations. During my conversation with Maya, her voice occasionally glitched, and she sometimes struggled with syntax, as when she said, “It’s a heavy talk that come.” Such issues show that while the technology is impressive, it is not yet flawless.

According to Sesame’s technical paper, the CSM (based on Meta’s Llama model) was trained by combining what is traditionally a two-step process, in which text-to-speech models are trained first on semantic tokens and then on acoustic tokens, into a single stage, which reduces latency. OpenAI’s voice mode similarly uses a multimodal approach. However, OpenAI has yet to release a dedicated technical paper detailing how voice mode works, mentioning it only in the broader context of the GPT-4o research.

Given this context, it’s remarkable how much more adept Sesame’s model is at facilitating conversational dialogue. However, since Sesame’s launch is currently just a demo, it will be interesting to see how the full model performs once released. According to their announcement, Sesame intends to open-source its model within the coming months and expand support to over 20 languages, which could significantly broaden accessibility and usability.
