Google has unveiled Gemini 1.5, its latest next-generation AI model. According to the company, this multimodal large language model (MLLM) shows “dramatic improvements” across a variety of areas. Google says the new model can achieve quality comparable to Gemini Ultra 1.0, currently its most sophisticated AI model, while using less computation. The Pro version of Gemini 1.5 is the first model the company is making available for early testing: the mid-size multimodal Gemini 1.5 Pro will be offered in private preview through Vertex AI and AI Studio to a limited group of commercial clients and developers.
In a recent blog post, Google CEO Sundar Pichai unveiled the enhanced capabilities of the Gemini 1.5 Pro model. Pichai highlighted the remarkable advancement in information processing, with the model now capable of consistently handling up to 1 million tokens, surpassing its predecessor’s capabilities.
The Mixture-of-Experts (MoE) architecture is the foundation of the Google Gemini 1.5 model. In MoE-based models, the network is divided into smaller “expert” sub-networks that specialize in particular kinds of computation, as opposed to traditional Transformer architectures that function as a single, huge neural network. Depending on the type of input received, the model selectively activates only the most relevant experts to complete the job. This approach improves both the model’s efficiency and the quality of its output. The MoE architecture also allows the model to be trained on increasingly difficult tasks.
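To make the routing idea concrete, here is a minimal, simplified sketch of top-1 expert routing. It is not Google’s implementation; the expert count, gating function, and layer sizes are illustrative assumptions only.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
NUM_EXPERTS = 4
HIDDEN_DIM = 8

rng = np.random.default_rng(0)

# Each "expert" is a tiny feed-forward weight matrix.
experts = [rng.standard_normal((HIDDEN_DIM, HIDDEN_DIM)) for _ in range(NUM_EXPERTS)]

# The router (gating network) scores how relevant each expert is to the input.
router_weights = rng.standard_normal((HIDDEN_DIM, NUM_EXPERTS))


def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route the input token vector to the single highest-scoring expert."""
    scores = x @ router_weights          # one relevance score per expert
    chosen = int(np.argmax(scores))      # top-1 routing: pick the best expert
    return x @ experts[chosen]           # only that expert's weights are used


# Example: one token embedding passes through the MoE layer.
token_embedding = rng.standard_normal(HIDDEN_DIM)
output = moe_layer(token_embedding)
print(output.shape)  # (8,)
```

The key point the sketch illustrates is that only one expert’s parameters are computed per input, which is why MoE models can grow large in total size while keeping per-token computation low.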
Google said that the “context window” on the Gemini 1.5 model is larger. The context window is measured in tokens, which can represent chunks of words, images, video, audio, or code. The larger a model’s context window, the more input it can take in at once. Google has expanded the context window from 32,000 tokens in the Gemini 1.0 model to 1 million tokens with Gemini 1.5 Pro, which is currently under testing. According to Google, the new model can process more than 700,000 words, 11 hours of audio, or an hour of video in one go!
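As a rough illustration of what a fixed context window means in practice, the snippet below trims an input to a token budget before sending it to a model. The whitespace tokenizer and the 1,000,000-token limit are stand-ins for whatever tokenizer and limit a real API applies, not Gemini’s actual tokenization.

```python
# Illustrative only: real models use subword tokenizers, not whitespace splitting.
CONTEXT_WINDOW = 1_000_000  # Gemini 1.5 Pro's advertised token limit


def fit_to_context(text: str, limit: int = CONTEXT_WINDOW) -> str:
    """Keep only as many (whitespace-delimited) tokens as the context window allows."""
    tokens = text.split()
    if len(tokens) <= limit:
        return text
    return " ".join(tokens[:limit])


long_document = "word " * 1_200_000
trimmed = fit_to_context(long_document)
print(len(trimmed.split()))  # 1000000
```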