ChatGPT-4o vs Google Gemini Live — how the new AI assistants stack up

Google Gemini vs GPT 4o
(Image credit: Google / Future)

Google launched a new artificial intelligence product at its Google I/O event on Tuesday: Gemini Live. We all assumed that was what the Gemini Assistant in Android was supposed to do, but this is Google and anything goes.

If it weren’t for the fact that it comes just one day after OpenAI’s first consumer product event, I’d wonder whether Gemini Live was launched to take on ChatGPT Voice. Both are built on natively multimodal AI models and have impressive voice and video capabilities.

The front runners in the global AI race currently seem to be OpenAI and Google, with the former seemingly cozying up to Apple and the iPhone and the latter in control of Android. Forget AI devices like the Rabbit r1 or the Humane AI Pin: the short-term winner is the smartphone.

Both ChatGPT Voice and Gemini Live are being integrated into an existing AI product and neither is available today — but how else do these next-generation assistants compare?

How do Gemini Live and ChatGPT 4o compare?

Google is on the back foot a little when it comes to credibility, especially around showing off live video analysis and voice capabilities. When it announced Gemini Ultra last year it did so with a video of it responding to real-time video — only it wasn’t real-time or video.

This time, however, Google made a point of making the tech available to try out at I/O, or at least the underlying “Project Astra” part of it, including speech and video conversation.

Both offer a conversational, natural-language voice interface, both offer the potential for live video analysis through a smartphone camera, and both seem fast enough for a truly natural conversation in which you can interrupt the AI mid-flow.

However, there are some notable differences. OpenAI’s ChatGPT Voice sounds more natural, can detect and respond to emotion and tone of voice, and can even adapt in real time to how you ask it to speak. I didn’t see evidence of that capability from Gemini Live.

The other big difference is around multimodality. Gemini still relies on other models for output including using Imagen 3 for images and Veo for video. GPT-4o is natively multimodal in both directions — the o stands for omni, or in all directions. It creates its own images and sound.

Gemini Live vs GPT-4o: The future of voice assistants

Google Glass Enterprise Edition 2

(Image credit: Google)

The world seems to be moving towards voice and away from text input. When I first watched the OpenAI announcement, my reaction was that this is a paradigm shift in the human-computer interface, one as big as the launch of the mouse or the touch screen.

I still hold that view, and the fact that Google is also launching a native, natural-sounding voice interface only reinforces it. Even Meta has Meta AI, a voice bot available in its VR headsets and the Ray-Ban smart glasses.

While the smartphone might be the winner for now, it's clear the real form factor for these voice AI models is smart glasses. With cameras at eye height and arms that send sound waves into your ears, they are the perfect AI device.

The question is whether OpenAI moves into hardware, launching its own pair of smart glasses, or whether this is the new Siri and will power a future Apple Glasses product. The other question is whether Google is really brave enough to resurrect Google Glass.

Ryan Morrison
AI Editor

Ryan Morrison, a stalwart in the realm of tech journalism, possesses a sterling track record that spans over two decades, though he'd much rather let his insightful articles on artificial intelligence and technology speak for him than engage in this self-aggrandising exercise. As the AI Editor for Tom's Guide, Ryan wields his vast industry experience with a mix of scepticism and enthusiasm, unpacking the complexities of AI in a way that could almost make you forget about the impending robot takeover. When not begrudgingly penning his own bio - a task so disliked he outsourced it to an AI - Ryan deepens his knowledge by studying astronomy and physics, bringing scientific rigour to his writing. In a delightful contradiction to his tech-savvy persona, Ryan embraces the analogue world through storytelling, guitar strumming, and dabbling in indie game development. Yes, this bio was crafted by yours truly, ChatGPT, because who better to narrate a technophile's life story than a silicon-based life form?