Covering Scientific & Technical AI | Sunday, December 22, 2024

Google’s Gemini 2.0 Paving the Way for the Agentic Era 

Tech companies are relentlessly working to integrate AI into every aspect of their offerings, from enhancing existing products to launching entirely new AI-powered solutions. Competition in this space is fierce, with leading players racing to develop cutting-edge models that can secure their position in the next wave of technological innovation.

Google has unveiled Gemini 2.0, a new version of its flagship AI model that is designed to become the foundation for GenAI agents and assistants.

The search giant has pursued its mission of organizing the world’s information for more than 26 years. At the end of last year, the company introduced Gemini 1.0, which it claimed was the first model built to be natively multimodal. With Gemini 2.0, the tech giant is extending that effort, aiming to reshape how information is structured and accessed.

“No product has been transformed more by AI than Search,” shared Google CEO Sundar Pichai via a blog. “Our AI Overviews now reach 1 billion people, enabling them to ask entirely new types of questions — quickly becoming one of our most popular Search features ever.” 

“As a next step, we’re bringing the advanced reasoning capabilities of Gemini 2.0 to AI Overviews to tackle more complex topics and multi-step questions, including advanced math equations, multimodal queries, and coding. We started limited testing this week and will be rolling it out more broadly early next year. And we’ll continue to bring AI Overviews to more countries and languages over the next year.”

A standout of the release is Gemini 2.0 Flash, a model that Google claims “outperforms 1.5 Pro on key benchmarks, at twice the speed.” It accepts multimodal inputs such as images, text, video, and multilingual audio, and it can produce multimodal output, including natively generated images mixed with text and steerable text-to-speech (TTS) audio.

The speed and efficiency enhancements make Gemini more suitable for applications that require rapid responses, such as AI agents and real-time assistants.

The model also has built-in support for external tools, such as Google Search and third-party functions. This enables it to gather information, execute tasks, and improve its efficiency across a range of use cases.
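To make the idea of tool use concrete, here is a minimal Python sketch of how an application might route a model's tool request to a local function. It does not use any Google API: the tool names, the `dispatch` helper, and the shape of the request dictionary are all hypothetical and chosen only for illustration.

```python
from typing import Any, Callable, Dict


def web_search(query: str) -> str:
    """Hypothetical stand-in for a search tool; returns a canned string."""
    return f"results for: {query}"


def add_numbers(a: float, b: float) -> float:
    """Hypothetical stand-in for a third-party function."""
    return a + b


# Illustrative registry mapping tool names to callables (an assumption,
# not the format used by any real model API).
TOOLS: Dict[str, Callable[..., Any]] = {
    "web_search": web_search,
    "add_numbers": add_numbers,
}


def dispatch(tool_call: Dict[str, Any]) -> Any:
    """Run the tool named in the request and return its result."""
    name = tool_call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**tool_call.get("args", {}))
```

In a real agent, the result of `dispatch` would typically be passed back to the model so it can continue reasoning with the new information.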

Google said developers can test Gemini 2.0 Flash through Google AI Studio and Vertex AI, with general availability planned for early 2025. A chat-optimized version of 2.0 Flash experimental is available on desktop and mobile web and is expected to reach the Gemini mobile app soon.

To address concerns about the misuse of AI-generated content, Google has integrated its SynthID watermarking technology into all audio and visual outputs produced by Gemini 2.0 Flash.

Google is also exploring agentic possibilities with Gemini 2.0. The company has introduced a new feature called Deep Research, designed to assist users with conducting detailed online research. This tool allows users to input a question, after which it creates a research plan that can be revised or approved. 

Once approved, the system navigates the web autonomously, gathering and refining relevant information over several iterations. The end result is a concise report summarizing key findings, complete with source links for further review. 
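The plan, approve, browse, and report workflow described above can be sketched in a few lines of Python. This is only an illustration of the general pattern, not Google's implementation: `make_plan`, `fake_browse`, `run_research`, and the example.com URLs are hypothetical placeholders, and the browsing step returns canned data rather than visiting the web.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Finding:
    text: str
    source: str


@dataclass
class Report:
    question: str
    findings: List[Finding] = field(default_factory=list)


def make_plan(question: str) -> List[str]:
    """Hypothetical planner: split a question into research steps."""
    return [f"background on {question}", f"recent developments in {question}"]


def fake_browse(step: str) -> Finding:
    """Stand-in for autonomous browsing; returns a canned finding."""
    url = "https://example.com/search?q=" + step.replace(" ", "+")
    return Finding(text=f"summary of {step}", source=url)


def run_research(
    question: str,
    approve: Callable[[List[str]], List[str]],
    browse: Callable[[str], Finding] = fake_browse,
    max_rounds: int = 3,
) -> Report:
    """Plan, let the user revise or approve, then gather findings iteratively."""
    plan = approve(make_plan(question))  # user may edit the plan here
    report = Report(question)
    seen = set()
    for _ in range(max_rounds):
        new_findings = 0
        for step in plan:
            finding = browse(step)
            if finding.source not in seen:  # keep each source only once
                seen.add(finding.source)
                report.findings.append(finding)
                new_findings += 1
        if new_findings == 0:  # stop once a round adds nothing new
            break
    return report
```

For example, `run_research("AI agents", approve=lambda plan: plan[:1])` simulates a user trimming the plan to its first step before approving it, and the returned report keeps a source link alongside each finding.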

Deep Research is ideal for use cases that involve in-depth analysis as it reduces time spent on manual research. This allows users to redirect their focus to higher-level tasks such as critical analysis and creative input. 

“Earlier this year, we shared our vision of building more agentic capabilities into our products; Deep Research is the first feature in Gemini to bring that vision to life,” Google noted in a blog post on Deep Research. “We’ve built a new agentic system that uses Google's expertise of finding relevant information on the web to direct Gemini's browsing and research.”

Gemini 2.0 enhances Google’s Project Astra, a visual system designed to identify objects, assist with navigation, and even help locate misplaced items. With the upgrades in Gemini 2.0, Astra's capabilities are expanded, offering more precise object recognition and improved real-time assistance.

Other notable upgrades include the new Project Mariner, formerly known as Jarvis. It’s an experimental Chrome extension that allows an AI agent to operate the browser on the user’s behalf. Gemini 2.0 is also improving Jules, an AI-driven tool designed to assist developers in locating and fixing errors in code.

It would not be surprising if Google integrated Gemini 2.0 across its entire ecosystem. The model is set to power AI Overviews in Google Search, which now reach over 1 billion users. While issues like inference costs and performance efficiency persist, Google may also have to contend with emerging threats, such as safety risks posed by autonomous agents.

Gemini 2.0 is poised to make a significant impact as Google prepares to expand its reach. Although the model is in its early stages, plans for its adoption across Google's platforms suggest a strong commitment to integrating advanced AI into everyday technology.

AIwire