What Is Google Gemini?

Learn about the key features and functionalities of Google Gemini.

Google Gemini refers to two things: a family of large language models and a chatbot powered by a Gemini LLM. There are two major versions, 1.0 and its successor, 1.5. Let’s break it down.

Gemini AI model

Launched in December 2023, the Gemini AI models are Google’s next step forward after PaLMPathways Language Model and LaMDALanguage Model for Dialogue Applications. Gemini models are built on top of transformer decoders, enhanced with architectural improvements and model optimization. Gemini 1.0 models are trained to support a context length of 32K tokens, with Gemini 1.5 extending the context window to an impressive 2 million tokens.

Press + to interact

Gemini models are designed to be multimodal, meaning they can handle a variety of input formats beyond just text. Text is a natural starting point, but Gemini can also work with images, charts, PDFs, videos, and audio. For videos, it breaks them down into a sequence of frames. This allows these frames, or any images, to seamlessly interweave with text or audio inputs. Gemini can also directly process audio without requiring any text conversion beforehand. For outputs, Gemini models can generate text and images natively. Native image generation forgoes the need to have an intermediate natural language description before an image can be generated.

Press + to interact
An overview of the Gemini model
An overview of the Gemini model

Gemini is offered in four different versions:

  • Gemini Ultra is the most powerful and feature-rich version of Gemini. It excels at complex tasks and can handle massive amounts of data and context. It is ideal for research labs and large companies working on cutting-edge AI applications. Access to Gemini Ultra is limited at the time of writing, and approvals are at Google’s discretion.

  • Gemini Pro is the best all-rounder version of Gemini. It offers a context window of up to 2 million tokens and has been crushing benchmarks. It can handle tasks such as generating different text, code, and images. The latest version of Gemini 1.5 Pro has scored better in most benchmarks than the Gemini 1.0 Ultra.

  • Gemini Flash is a lightweight and cost-efficient model that offers a good mix of speed and efficiency. It offers performance similar to Gemini Pro but at a much lower cost. It offers a context window of up to 1 million tokens.

  • Gemini Nano is the most lightweight and efficient version of Gemini. While it is not as powerful as the other versions, it can still perform many tasks well. This makes it a good choice for enabling AI applications on mobile devices or devices with lower processing power.

The Google Pixel 8 Pro and Samsung S24 Series use Google’s Gemini Nano model on-device for quick and efficient processing.

Gemini chatbot

The Gemini chabot was formally known as Bard. Google announced Bard in February 2023 as a GenAI chatbot powered by LaMDA. Since then, the chatbot has received multiple updates and features. The chatbot was switched to use the newer PaLM model before finally switching to the Gemini model and merging under the same Gemini brand. The chatbot uses a tuned version of a Gemini’s 1.0 Pro model by default. The chatbot allows for image upload and voice input as well. A paid subscription, named Gemini Advanced, provides access to the Gemini 1.5 Pro with a 1 million token context window and the ability to upload files.

Press + to interact
Gemini chatbot web interface
Gemini chatbot web interface

Gemini 1.5

Gemini is still a work in progress. New features and updates drop regularly. In February 2024, Gemini 1.5 was revealed. This was a generational leap forward and allowed Gemini to be more capable than ever. Compared to earlier versions, Gemini 1.5 achieves similar performance while requiring less training data and being more efficient to run. The context window of Gemini was increased from 32K tokens to 1 million tokens. In May 2024, Gemini 1.5 was updated again, with Gemini 1.5 Pro now offering a context window of 2 million tokens. To put that into perspective, Gemini 1.5 Pro can process up to 2 hours of video, 22 hours of audio, 60,000 lines of code, or 1.4 million words of text.

Press + to interact
What 2 million tokens can enable
What 2 million tokens can enable

Even more impressive is that Gemini 1.5 was tested during research with a context window of up to 10 million tokens and showed promising results. The future truly is brimming with possibilities!

Gemini and the Google ecosystem

While OpenAI’s ChatGPT broke records for user adoption, Gemini has also been growing rapidly. Google’s ecosystem of applications and services stands to gain the most from AI integrations. Google has been releasing smart features in its workspace suite of applications, and Android has also been getting smarter thanks to Gemini.

Press + to interact
The Gemini app on Andriod
The Gemini app on Andriod

Let’s look at a few key experiences powered by Gemini:

  • that is The Gemini app has been released for both iOS and Android. The Gemini mobile app is well-integrated with Google apps such as YouTube and Maps. For Android devices with the application installed, Gemini is now the app opened by default using the “Hey Google” key phrase.

  • Gemini Nano powers the summary transcription features in the Google Recorder app and enables more contextual auto-replies called “Smart Replies” in a few applications. Google’s screen reading accessibility feature now uses Gemini to visualize and understand the images on the screen for users with low vision.

  • Google announced Gemini Nano with multimodality during Google’s I/O event in 2024. The highlight feature was real-time “Scam Detection.” Gemini would be able to listen to phone calls and alert you if it detects that the caller might be trying to scam you.