Gemini — Google's Multimodal LLM Lineup
Gemini is the model series Google DeepMind released in late 2023. Its most-cited features are multimodal input — handling images · audio · video · code alongside text — and the very long context window introduced with the 1.5 generation.
1. About Gemini
Google DeepMind released Gemini 1.0 on December 6, 2023. The chatbot previously offered as Bard was folded into the Gemini brand, and a Nano variant shipped on-device on phones like the Pixel 8 Pro, spreading the lineup across desktop · mobile · server.
| Time | Model | Note |
|---|---|---|
| 2023-12 | Gemini 1.0 (Ultra · Pro · Nano) | First release. |
| 2024-02 | Gemini 1.5 Pro | 1M token context. |
| 2024-05 | Gemini 1.5 Flash | Fast and cheap variant. |
| 2024-12 | Gemini 2.0 (Flash etc.) | Reinforced multimodal output·tool use. |
| 2025 | Gemini 2.5 Pro · Flash | Reasoning-reinforced variants. |
Position as generations pass:
- Pro · Ultra — Greatest capability. Higher cost · latency.
- Flash — Light variant. Throughput-oriented.
- Nano — On-device built-in small variant.
Exact model names and availability change often by generation · date, so check the model card in official docs each time.
2. The 1M token context
Gemini 1.5 Pro reached general availability with a standard 1M-token context (research announcements went as high as 2M). Very long context enables new usage patterns: putting an entire book · video · code base in at once.
Position effects like "lost in the middle" are still observed, so large context isn't always the answer.
3. Two API entry points
- Google AI Studio (ai.google.dev) — Individual developers · experiments. Start with one API key.
- Vertex AI (Google Cloud) — Enterprise entry point integrated with GCP project · IAM · logging · billing. Controls like data residency (region) · VPC-SC.
The same models, but auth · billing · feature availability · SLA can differ.
4. Call shape
```python
from google import genai

client = genai.Client(api_key="...")

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Please summarize in one paragraph in Korean.",
)
print(response.text)
```
The REST API follows the same shape. Non-text inputs like images · PDFs · audio · video are expressed as Part units inside contents.
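As a sketch of that Part structure, a mixed text-plus-image request body can be assembled with the standard library alone. Hedged: the field names mirror the public REST request shape for generateContent, and the image bytes here are a placeholder, not a real file:

```python
import base64
import json

# Hypothetical raw image bytes; a real request would read an actual PNG file.
image_bytes = b"\x89PNG..."

# One user turn containing a text Part and an inline-data Part,
# mirroring the REST request body for generateContent.
request_body = {
    "contents": [
        {
            "role": "user",
            "parts": [
                {"text": "Describe this image in one sentence."},
                {
                    "inline_data": {
                        "mime_type": "image/png",
                        "data": base64.b64encode(image_bytes).decode("ascii"),
                    }
                },
            ],
        }
    ]
}

print(json.dumps(request_body)[:80])
```

The same list-of-Parts shape is what the SDK builds for you under the hood.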
5. Multimodal input
| Input | Note |
|---|---|
| Image | PNG · JPEG · WEBP · HEIC. |
| Audio | Voice · music. Captions · summary · analysis. |
| Video | MP4. Frame-based or timestamp-based. |
| PDF | Mixed pages · images · text documents. |
Upload limits · supported formats vary by model · generation.
6. Function calling · JSON mode
- Function calling — Pass function signatures to the model; the model produces and returns the call parameters (JSON). The actual call is made by the caller.
- JSON mode · response schema — Force output format to JSON. Schema via JSON Schema or Pydantic.
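The caller-side half of function calling can be sketched as follows. Hedged: get_weather and the returned-arguments JSON are made up for illustration, and the declaration is JSON-Schema style rather than an exact SDK type; the point is that the model only produces the call, while execution stays with the caller:

```python
import json

# Function exposed to the model, described as a JSON-Schema-style declaration.
get_weather_decl = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# Local implementation; the model never runs this itself.
def get_weather(city: str) -> dict:
    return {"city": city, "temp_c": 21}  # stubbed result

DISPATCH = {"get_weather": get_weather}

# Pretend the model returned this function call (name + JSON args).
model_call = {"name": "get_weather", "args": json.dumps({"city": "Seoul"})}

# The caller parses the arguments, runs the function, and would send
# the result back to the model as the next turn.
result = DISPATCH[model_call["name"]](**json.loads(model_call["args"]))
print(result)  # {'city': 'Seoul', 'temp_c': 21}
```

Keeping execution on the caller's side is what makes the pattern safe: the model proposes, your code decides.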
7. Objective comparison with other models
| Model family | Provider | Release | Trait |
|---|---|---|---|
| Gemini | Google DeepMind | 2023-12 | Multimodal breadth · very long context · GCP integration. |
| GPT (4 · 4o · o1 · o3) | OpenAI | 2022-11 | Tool ecosystem · broad adoption · reasoning model line. |
| Claude (3 · 3.5 · 4) | Anthropic | 2023-03 | Long context · strong writing·coding. |
| Mistral · Codestral | Mistral AI | 2023 | Europe-based · open-weight variants. |
| Llama (3 · 3.1 · 3.2) | Meta | 2023~ | Open weights (license separately). |
| Qwen | Alibaba | 2023~ | Open weights · multilingual. |
Strengths and weaknesses shift quickly by generation · time. Your own domain evaluation is more reliable than a single benchmark.
8. Pricing · context caching
Pricing — Per-token billing (input and output metered separately; context caching billed at its own rate). Free tiers exist in some places, with differing quotas · constraints. Vertex AI is bundled into general GCP billing, so adjacent service costs (storage · logging · network) come along.
Context caching — A feature to cache large system prompts · documents on the server so they don't have to be sent every time, introduced from the 1.5 generation. Anthropic · OpenAI also have similar cache features, with differing pricing · TTL · key definitions per provider.
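To make the billing split concrete, here is a toy cost estimator. Hedged: the per-million-token prices below are invented placeholders, not Gemini's actual rates, which change per model and date; only the structure (separate input, output, and cached-input meters) reflects the scheme described above:

```python
# Hypothetical prices in USD per 1M tokens; real rates differ per model/date.
PRICE_INPUT = 0.10
PRICE_OUTPUT = 0.40
PRICE_CACHED_INPUT = 0.025  # cached input is typically discounted

def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Input and output are metered separately; cached input bills at its own rate."""
    fresh_input = input_tokens - cached_tokens
    return (
        fresh_input * PRICE_INPUT
        + cached_tokens * PRICE_CACHED_INPUT
        + output_tokens * PRICE_OUTPUT
    ) / 1_000_000

# A 900k-token document mostly served from cache, with a short answer out.
print(round(estimate_cost(900_000, 2_000, cached_tokens=880_000), 4))  # 0.0248
```

The cached-vs-fresh split is exactly why caching a large system prompt pays off on repeated calls.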
9. Safety settings · environment variables
The Gemini API allows setting category-level safety classifier thresholds (violence · sexual · harassment · dangerous). Verify the difference between defaults and changed values with your own data.
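As a sketch, those category-level thresholds travel in the request as a safetySettings list. The category and threshold enum strings below match the publicly documented values; the particular combination is just an example, not a recommendation:

```python
import json

# Per-category thresholds; defaults vary, so set them explicitly when it matters.
safety_settings = [
    {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_ONLY_HIGH"},
    {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
    {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_LOW_AND_ABOVE"},
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_ONLY_HIGH"},
]

print(json.dumps({"safetySettings": safety_settings}, indent=2)[:60])
```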
```shell
export GOOGLE_API_KEY=...        # macOS · Linux
$env:GOOGLE_API_KEY = "..."      # Windows PowerShell
```
Vertex AI auth is usually via ADC (Application Default Credentials) obtained with gcloud auth application-default login or a service account key file.
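A small sketch of reading the key from the environment instead of hard-coding it. Hedged: GOOGLE_API_KEY is the variable name used above, but the fail-loudly fallback is this example's choice, not SDK behavior:

```python
import os

def resolve_api_key() -> str:
    """Prefer the environment variable; fail loudly rather than embedding keys in code."""
    key = os.environ.get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("Set GOOGLE_API_KEY before creating the client.")
    return key

os.environ["GOOGLE_API_KEY"] = "demo-key"  # simulated for this sketch
print(resolve_api_key())  # demo-key
```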
10. Spots where you often get stuck
Model name volatility — Aliases like gemini-1.5-pro-latest and date pins (gemini-1.5-pro-002) mean different things. Pinning is safer in operations.
Region constraints — Some models · features are limited to specific regions. Watch the location setting on Vertex AI.
Context limit vs actual limit — Even when 1M tokens is advertised, input·output total and per-model limits are defined separately. Output tokens usually have a separate, smaller cap.
Image · video token conversion — Non-text inputs are internally converted to tokens. Estimating cost from text tokens alone undercounts.
Blocking · filtering — Cases where safety classification blocks input · output. Check the reason · category code in the response.
Response length limit — If you set max_output_tokens small and forget about it, responses get truncated.
AI Studio vs Vertex AI difference — The same code works on one side and needs additional permissions · settings on the other.
Data usage policy — There's notice that data-training-use policies differ between AI Studio free key and Vertex AI. Check the terms.
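The model-name volatility point above can be checked mechanically. A sketch, hedged: the naming patterns are inferred from the examples in this section (gemini-1.5-pro-002 vs gemini-1.5-pro-latest) and are not a guaranteed naming contract:

```python
import re

# Date/version-pinned names end in a numeric suffix like -002;
# floating aliases end in -latest (or carry no suffix at all).
PINNED = re.compile(r"-\d{3}$")

def is_pinned(model_name: str) -> bool:
    """True if the model name looks pinned to a specific version."""
    return bool(PINNED.search(model_name))

print(is_pinned("gemini-1.5-pro-002"))     # True
print(is_pinned("gemini-1.5-pro-latest"))  # False
```

A guard like this in CI can catch a floating alias sneaking into production config.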
Closing thoughts
Gemini's appeal is multimodal breadth and very long context. However, model names · prices · limits change often, so for operations: pin the model, maintain your own domain evaluation set, and verify in dev with external dependencies stubbed out (e.g., WireMock) — that's the safe path.
Next
- embeddings-deep
- agents-overview
References: Google AI for Developers · Vertex AI Generative AI · Gemini API Models · Google DeepMind Gemini · Gemini 1.5 Report · LMArena · LiveBench.