Curious about Google’s latest AI innovation?

June 16, 2024

Check out the lowdown on Google Gemini and how it’s revolutionizing the tech world!

Artificial intelligence has become this year’s standout technology, but with various versions from numerous companies, it can be quite confusing. Besides OpenAI’s ChatGPT, major players like Google, Apple, and Microsoft are developing their own AI solutions. Google’s latest offering is Gemini, which is just as perplexing as the rest. When I first looked into Gemini, I searched for “versions of Google Gemini” and found an AI-generated summary:

“Google Gemini has three versions: Ultra, Pro, and Nano. Ultra is the largest model designed for complex tasks, Pro is optimal for scaling across various tasks, and Nano is the most efficient for on-device tasks.”

This summary is helpful, but it doesn’t tell the whole story.

What is Gemini?

Gemini is the third astrological sign, symbolized by the twins Castor and Pollux from Greek mythology.

Okay, sorry about that. Actually, Gemini is a chatbot developed by Google, replacing its previous chatbot Bard. It is based on a large language model (LLM) developed by DeepMind, a division of Google.

So Gemini is the name of both the chatbot and the large language model (LLM) behind it.

How many versions of Gemini are there?

There are several, though we’ll focus on the most relevant ones. When Gemini launched in December 2023, it came in three models: Nano for lightweight on-device use on Android, Pro for everyday applications, and Ultra for demanding business and enterprise needs.

At Google’s I/O 2024 event on May 14th, the company showcased Gemini 1.5 Pro, which it describes as a “mid-sized multimodal model.” This newer Pro version is said to match the previous Ultra model in power, and it is designed to enhance existing apps and help create new ones for daily use.

Wait, what? Multimodal?

Multimodal means that Gemini 1.5 Pro can handle prompts in different modes: text, images, audio, and video.

So that’s it for the models, right?

Not quite. There’s also Gemini 1.5 Flash, a faster model aimed at developers building specific applications. It’s less relevant for general users.

To summarize, there are now four Gemini models that developers can use: Ultra, Pro, Flash, and Nano. (We’ll explain how you can experiment with them shortly.)

During the Google event I watched, they frequently mentioned “1 million tokens” and “2 million tokens.” What exactly does that mean?

Tokens are the small chunks of text (whole words or pieces of words) that a model like Gemini breaks its input into. The “1 million” and “2 million” figures refer to the context window: how many tokens the model can take in at once. The bigger the window, the more material, such as long documents or lengthy videos, the model can consider in a single prompt.
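
To make those numbers a little more concrete, here is a minimal Python sketch that estimates how many tokens a piece of text might use and whether it would fit within a 1 million or 2 million token limit. It assumes a rough rule of thumb of about four characters per token for English text; real counts depend on the model’s own tokenizer, and the function names here are made up for illustration rather than taken from any Google tool.

```python
# Illustrative only: real token counts come from the model's tokenizer.
CHARS_PER_TOKEN = 4  # assumption: rough rule of thumb for English text

# The two limits mentioned at the Google event.
TOKEN_LIMITS = {
    "1 million tokens": 1_000_000,
    "2 million tokens": 2_000_000,
}


def estimate_tokens(text: str) -> int:
    """Roughly estimate the token count of a text (hypothetical helper)."""
    if not text:
        return 0
    # Any non-empty text counts as at least one token.
    return max(1, len(text) // CHARS_PER_TOKEN)


def fits_in_limit(token_count: int, limit: int) -> bool:
    """Return True if token_count tokens fit within the given limit."""
    return token_count <= limit


if __name__ == "__main__":
    prompt = "Find photos of my grandmother working on carpentry projects."
    print("Estimated tokens in prompt:", estimate_tokens(prompt))

    # A hypothetical 6-million-character document, about 1.5 million tokens
    # under the four-characters-per-token assumption.
    big_document_tokens = 6_000_000 // CHARS_PER_TOKEN
    for label, limit in TOKEN_LIMITS.items():
        print(label, "->", fits_in_limit(big_document_tokens, limit))
```

Under that assumption, a short prompt uses only a handful of tokens, while the hypothetical 6-million-character document would be too big for a 1 million token limit but would fit within 2 million.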

Okay, back to Gemini 1.5 Pro. What can I do with it?

For developers, it’s a way to enhance existing apps or build new ones. Google, for its part, plans to integrate it into many of its existing apps and to develop new features with it.

Like what?

For instance, in Google Photos, an upcoming feature called Ask Photos will handle complex queries, such as finding photos through the years of a grandmother working on carpentry projects.

In addition, Google Lens will add video-based searches to its existing text and photo searches. Gemini will also tie together Google apps such as Docs, Sheets, Slides, Drive, and Gmail, making it easier to work across them. That feature is due to reach subscribers starting next month.

The Google search page with an AI Overview at top.

AI Overviews explaining AI Overviews. Screenshot: Google

Even Google’s standard search has been impacted: AI Overviews now appear first in search results, providing an AI-generated summary of what Google believes you’re seeking. However, this change has faced significant criticism, with many users seeking to disable it.

Those are apps that already exist. What about new ones?

There are numerous new applications currently in development:

Project Astra, resembling Google Assistant but with added capabilities to interact via spoken language and visual cues through your phone’s camera, is still in early stages of development.
LearnLM, designed to assist students in finding educational answers from various sources, is already integrated into some products and being introduced to educators.
Veo, a generative AI video model, can create 1080p videos based on user requests, such as a cat in a nightgown and top hat jumping over the Moon. Like Project Astra, Veo is undergoing testing and not yet available to the general public.

This sounds intriguing. How do I get started? And what are the costs involved?

If you’re interested in signing up, you can begin using the Gemini 1.0 chatbot immediately. For access to Gemini 1.5 Pro, which offers faster performance and enhanced capabilities, you’ll need to subscribe to Gemini Advanced. The subscription costs $20 per month after a two-month trial and is part of the Google One subscription, which includes 2TB of data storage and other benefits.

Businesses using Google Workspace can also explore advanced AI levels starting at $20 per month. More details can be found on their website.

Is there anything else important I should be aware of?

A word of caution: like all AI applications, Gemini can sometimes give inaccurate or unreliable responses. It’s still an emerging technology, so while it can be useful, it’s wise to verify any information you get from it. Incorrect information generated by AI models is known as “hallucinations.”

Gemini reply about woodpeckers with overlay suggesting a double check.

It’s not a bad idea to be cautious about Gemini’s answers. Screenshot: Google

With that said, it seems AI will remain part of daily life for the foreseeable future, so it’s worth getting some hands-on experience to understand how it works. Besides ChatGPT and Gemini, there are Microsoft’s upcoming Copilot Plus PCs, which include AI-capable hardware, and Apple’s newly announced suite of features called Apple Intelligence. Depending on your preferred operating system and how curious you are, you can explore AI chatbots, advanced applications, and additional features.