Google shows off Project Astra: a demonstration of what combining smart glasses with responsive AI could look like

News · 6 months ago · firefly

(XR Navigator News) At today's I/O developer conference, Google showed off a new project called Project Astra and demonstrated responsive AI running on smart glasses. Project Astra, the company says, is a "responsive agent that sees and speaks".

However, Google revealed few specifics and did not provide further details in interviews.

In a related blog post, Google wrote: "Google DeepMind's mission is to responsibly develop artificial intelligence for the benefit of humanity. As part of that mission, we've been working to develop a general-purpose AI agent that helps in everyday life. So today we're sharing our progress in building the AI assistant of the future with Project Astra, an advanced, responsive agent that sees and speaks."

As you can see in the video above, Project Astra's demo has two parts, each filmed in a single take in real time. In the first half, a woman interacts with the AI agent primarily through her smartphone; in the second half, she interacts with it directly through a pair of smart glasses.

The video shows that the glasses support graphical overlays: when answering a question, they display a transcription and related information in the user's field of view. The current model does have some latency, though, and cannot respond instantly.

Google explains that, to be truly useful, an intelligent agent needs to understand and react to a complex, dynamic world the way people do, absorbing and remembering what it sees and hears so it can grasp the context and take action. It also needs to be proactive, teachable, and personalized, so users can talk to it naturally and without delay.

While the team has made incredible progress in developing AI systems capable of understanding multimodal information, reducing response times to the conversational level is a daunting engineering challenge. In recent years, Google has been working to improve how models perceive, reason, and converse to ensure that the speed and quality of interactions are more natural.

Building on Gemini, the researchers developed a prototype agent that can process information faster by continuously encoding video frames, combining video and voice inputs into a timeline of events, and caching them for efficient recall.
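
Google doesn't detail how this pipeline is implemented, but the idea it describes, encoding incoming video frames and voice snippets as they arrive, merging them into a time-ordered timeline of events, and caching that timeline for quick recall, can be illustrated with a toy sketch. The Python below is purely hypothetical: the EventTimeline class, its methods, and the placeholder encodings are invented for illustration and are not any Google API.

```python
# Illustrative sketch only: a toy "timeline of events" cache in the spirit of
# the approach described above. All names and encodings here are hypothetical.
import time
from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float   # when the input arrived
    modality: str      # "video" or "audio"
    encoding: list     # placeholder for an encoded feature vector


class EventTimeline:
    """Keeps a rolling, time-ordered cache of encoded multimodal inputs."""

    def __init__(self, max_events: int = 1000):
        # Oldest events drop off automatically once the cache is full.
        self._events = deque(maxlen=max_events)

    def add_frame(self, encoding: list) -> None:
        self._events.append(Event(time.time(), "video", encoding))

    def add_utterance(self, encoding: list) -> None:
        self._events.append(Event(time.time(), "audio", encoding))

    def recall(self, since_seconds: float) -> list:
        """Return all cached events from the last `since_seconds` seconds."""
        cutoff = time.time() - since_seconds
        return [e for e in self._events if e.timestamp >= cutoff]


# Example: cache a few inputs, then recall the recent context.
timeline = EventTimeline()
timeline.add_frame([0.1, 0.7])       # stand-in for an encoded video frame
timeline.add_utterance([0.3, 0.2])   # stand-in for an encoded voice snippet
recent = timeline.recall(since_seconds=60)
print(f"{len(recent)} events in the last minute")
```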

By drawing on its leading speech models, Google has also improved how the agents speak, giving them a wider range of intonations. Taken together, these improvements let the agents better understand the context around them and respond quickly in conversation.

Google concludes, "With technology like this, it's easy to imagine the future: people will have a professional AI assistant via their phone or glasses."
