cheat sheet

google-genai

Package-level reference for google-genai (the current Gemini SDK) and its predecessor google-generativeai — install, auth, versioning, and alternatives.

Renamed from google-generativeai. This article previously lived at /sections/packages-pip/pip-google-generativeai. The legacy SDK is in maintenance mode; google-genai (imported as from google import genai) is the active SDK unifying Google AI Studio and Vertex AI access. Both packages coexist on PyPI.

What it is

google-genai is Google's current Python SDK for the Gemini family of models, unifying access to both the Google AI Studio API and Vertex AI. It supports text, multimodal input (images, video, audio, PDF), chat sessions, function calling, embeddings, context caching, and the File API for large uploads. Import as from google import genai.

The predecessor google-generativeai was Google's first-generation SDK, for AI Studio only. It receives only maintenance updates, and new features land on google-genai. Start new projects on google-genai; existing google-generativeai code continues to run, but plan a migration.

Install

bash
pip install google-generativeai

Output: installs the legacy SDK — import google.generativeai as genai

bash
pip install google-genai

Output: installs the newer unified SDK — from google import genai

bash
uv add google-generativeai

Output: dependency resolved + added to pyproject.toml

bash
poetry add google-genai

Output: updated lockfile + virtualenv install for the successor SDK

Versioning & Python support

  • google-generativeai is on the 0.x line and in maintenance mode: bug fixes and minor model-id additions, but no major new features. Pin it to avoid surprises.
  • google-genai is the actively developed SDK. It has passed 1.0, but releases are frequent and minor API changes still occur; pin the version for production.
  • Both support Python 3.9+.
  • The Gemini model IDs evolve faster than either SDK — gemini-1.5-pro, gemini-1.5-flash, gemini-2.0-flash, gemini-2.5-*. Sometimes a model is only callable via the newer SDK or via Vertex AI; check model availability per-SDK before betting on a feature.
  • google.ai.generativelanguage is the low-level generated client underneath the legacy SDK, not an older name for it; ignore it unless you're writing a custom transport.

Package metadata

  • Maintainer: Google (the google-gemini and googleapis GitHub orgs)
  • Project home (legacy): github.com/google-gemini/generative-ai-python
  • Project home (successor): github.com/googleapis/python-genai
  • Docs: ai.google.dev
  • PyPI: pypi.org/project/google-generativeai + pypi.org/project/google-genai
  • License: Apache-2.0
  • Governance: vendor-maintained, open development on GitHub
  • First released: google-generativeai in 2023; google-genai in late 2024

Optional dependencies & extras

Both SDKs are relatively self-contained Python packages over Google's gRPC / REST APIs. Core deps include:

  • google-api-core
  • google-auth — Application Default Credentials (ADC) flow
  • protobuf
  • pydantic (newer SDK)
  • httpx (newer SDK; the legacy SDK leans on requests)
  • tqdm (legacy SDK) for progress bars

Notable extras and companions:

  • For Vertex AI authentication and quota, the legacy SDK requires google-cloud-aiplatform separately; the new google-genai SDK has a vertexai=True client flag and bundles what it needs.
  • File uploads (PDFs, videos) use the Gemini File API — both SDKs handle it natively, no extra package.
  • Async clients exist in both (genai.GenerativeModel(...).generate_content_async() legacy; client.aio in the successor).

Alternatives

| Package | Trade-off |
| --- | --- |
| openai | The dominant LLM SDK; comparable feature set; different model family. |
| anthropic | Claude SDK; cleanest tool-use API; no native multimodal video yet. |
| mistralai | Open-weight + hosted Mistral models. Smaller multimodal surface. |
| cohere | Embeddings + reranker focus, plus chat. |
| google-cloud-aiplatform | Vertex AI SDK: broader Google Cloud surface (deployment, batch prediction, tuning). Heavier. |
| litellm | Provider-agnostic OpenAI-compatible wrapper over many SDKs, Gemini included. |

Common gotchas

  1. Package rename / dual existence. google-generativeai (legacy, import google.generativeai as genai) and google-genai (new, from google import genai) are separate packages that both live under the google namespace and are both conventionally bound to the name genai. With both in one project it is easy to call one SDK's API on the other SDK's module. Pick one.
  2. Auth flows differ. AI Studio key (GOOGLE_API_KEY env) is one path; Application Default Credentials via gcloud auth application-default login is another (Vertex AI). The legacy SDK is AI-Studio-only; the new SDK supports both with a vertexai=True flag.
  3. Safety-setting JSON shapes differ between SDKs and over time. The enum names (HARM_CATEGORY_HARASSMENT, etc.) and threshold strings have shifted; safety config copy-pasted across SDK versions often fails silently and is never applied.
  4. Quota limits per model. Free-tier AI Studio keys have low per-minute limits and rate-limit with ResourceExhausted — wrap calls with tenacity/backoff.
  5. Function-calling response shapes differ. Legacy returns a parts list with function_call items; new SDK returns a function_calls accessor. Tutorials targeting one don't run on the other.
  6. File API uploads have a TTL. Files uploaded for context expire after ~48 hours. Re-upload for long-lived sessions.
  7. Streaming behaviour. Both SDKs stream chunks, but the chunk granularity depends on the model and region — don't rely on chunk boundaries being words or sentences.
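
For gotcha 7, one way to stop depending on chunk boundaries is to buffer the stream and re-split it yourself. The sketch below assumes each chunk is a plain string (as with chunk.text in the streaming recipe); iter_sentences is a hypothetical helper, not part of either SDK, and its sentence regex is deliberately naive.

```python
import re
from typing import Iterable, Iterator

# Naive sentence boundary: whitespace that follows ., ! or ?
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def iter_sentences(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences regardless of where chunk boundaries fall."""
    buffer = ""
    for chunk in chunks:
        if not chunk:              # skip None or empty fragments
            continue
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last one is a finished sentence.
        for sentence in parts[:-1]:
            yield sentence
        buffer = parts[-1]         # keep the unfinished tail for the next chunk
    if buffer.strip():
        yield buffer.strip()

# Simulated stream whose boundaries fall mid-word and mid-sentence.
fake_stream = [
    "Retrieval augm",
    "ented generation works. It fe",
    "tches context first! Then",
    " it answers.",
]
sentences = list(iter_sentences(fake_stream))
for s in sentences:
    print(s)
```

In real use the input would be something like (chunk.text for chunk in chat.send_message_stream(...)).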

Evaluation & observability

Gemini is a hosted model — evaluation patterns are the same as for any LLM API: regression suites, A/B tests, and trace-based observability.

  • langsmith traces with @traceable decorators capture every call, latency, and token count.
  • langchain.evaluation / ragas / deepeval all support Gemini via the LangChain ChatGoogleGenerativeAI wrapper.
  • Vertex AI Model Garden ships its own evaluation tooling for fine-tuned Gemini variants.
  • Per-region latency. Different regions have different p99 latency profiles; benchmark before locking in.
  • Token counts. Gemini's count_tokens API is the authoritative source for budget tracking; tokenizer guesses from other libraries diverge.
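
The per-region latency advice above can be made concrete with a small timing harness. This is a minimal sketch using only the standard library; latency_profile and the fake clock are hypothetical names for illustration, and in practice call would wrap a real generate_content request against each candidate region.

```python
import statistics
import time
from typing import Callable, Dict

def latency_profile(call: Callable[[], object], runs: int = 50,
                    clock: Callable[[], float] = time.perf_counter) -> Dict[str, float]:
    """Time `call` repeatedly and summarise p50 / p99 latency in seconds."""
    samples = []
    for _ in range(runs):
        start = clock()
        call()
        samples.append(clock() - start)
    # quantiles(n=100) returns 99 cut points; index 49 is p50, index 98 is p99.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {"p50": cuts[49], "p99": cuts[98]}

# Offline demo with a fake clock: 49 fast calls and one slow outlier.
durations = [0.1] * 49 + [1.0]
ticks = []
t = 0.0
for d in durations:
    ticks += [t, t + d]            # one start and one end timestamp per call
    t += d
fake_clock = iter(ticks).__next__

profile = latency_profile(lambda: None, runs=50, clock=fake_clock)
print(profile)
```

The single outlier barely moves p50 but pulls p99 far above it, which is why the tail percentile is the number to compare across regions.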

Real-world recipes

The recipes below cover the most common patterns — function calling, multimodal input, embeddings, and streaming — using the newer google-genai SDK where possible.

Recipe: function calling with the new SDK

python
from google import genai
from google.genai import types

client = genai.Client(api_key="...")

tools = [types.Tool(function_declarations=[types.FunctionDeclaration(
    name="get_weather",
    description="Get current weather for a city.",
    parameters=types.Schema(
        type=types.Type.OBJECT,
        properties={"city": types.Schema(type=types.Type.STRING)},
        required=["city"],
    ),
)])]

resp = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What's the weather in Berlin?",
    config=types.GenerateContentConfig(tools=tools, temperature=0),
)
# function_calls is None when the model answers in text instead of calling a tool
for call in resp.function_calls or []:
    print(call.name, dict(call.args))

Output: get_weather {'city': 'Berlin'} — the model picked the tool and filled the schema.

Recipe: multimodal — image + text

python
from google import genai
from pathlib import Path

client = genai.Client(api_key="...")

img = client.files.upload(file=Path("./diagram.png"))
resp = client.models.generate_content(
    model="gemini-2.5-pro",
    contents=["Describe this architecture diagram in three bullets.", img],
)
print(resp.text)

Output: descriptive bullets. The File API handles uploads >20 MB; small images can be passed inline as bytes.

Recipe: streaming chat

python
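from google import genai

client = genai.Client(api_key="...")
chat = client.chats.create(model="gemini-2.5-flash")

for chunk in chat.send_message_stream("Explain RAG in two paragraphs."):
    # chunk.text can be None for chunks that carry no text part
    print(chunk.text or "", end="", flush=True)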

Output: text chunks stream to stdout; the chat retains history for follow-ups.

Recipe: structured output via response schema

python
from google import genai
from pydantic import BaseModel

client = genai.Client(api_key="...")

class Movie(BaseModel):
    title: str
    year: int
    rating: float

resp = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Give me three classic sci-fi films as JSON.",
    config={"response_mime_type": "application/json", "response_schema": list[Movie]},
)
# resp.parsed already holds Movie instances; it is None if the reply did not match the schema
movies = resp.parsed or []

Output: parsed Pydantic instances — no manual JSON extraction.

Recipe: vision QA over a PDF

Gemini's File API accepts PDFs directly — the model reads layout and text together, no separate extraction step required.

python
import time
from pathlib import Path
from google import genai

client = genai.Client(api_key="...")

pdf = client.files.upload(file=Path("./contract.pdf"))

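# Poll while the file is processing; small files are usually ACTIVE immediately
while pdf.state.name == "PROCESSING":
    time.sleep(1)
    pdf = client.files.get(name=pdf.name)
# Stop on FAILED (or any other state) instead of polling forever
if pdf.state.name != "ACTIVE":
    raise RuntimeError(f"File processing ended in state {pdf.state.name}")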

resp = client.models.generate_content(
    model="gemini-2.5-pro",
    contents=[
        "Summarise this contract, listing each party's obligations.",
        pdf,
    ],
)
print(resp.text)

Output: structured summary with party-by-party obligations; cheaper than running OCR + summarisation separately.

Recipe: long-context with cached content

For prompts with a large stable prefix (system instructions, retrieved corpus), context caching reduces both cost and latency.

python
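from pathlib import Path
from google import genai
from google.genai import types

client = genai.Client(api_key="...")
# Hypothetical file name; any large, stable text works here
handbook_content = Path("./handbook.txt").read_text()

# Create cache once
cache = client.caches.create(
    model="gemini-2.5-flash",
    config=types.CreateCachedContentConfig(
        display_name="company-handbook",
        contents=[handbook_content],   # large stable text
        system_instruction="Answer using only the cached handbook.",
        ttl="3600s",                   # cache lifetime: one hour
    ),
)

# Reuse across many calls by passing the cache name
resp = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What is our PTO policy?",
    config=types.GenerateContentConfig(cached_content=cache.name),
)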

Output: subsequent calls reuse the cached prefix, which is billed at a reduced rate rather than re-sent at full price, so repeated chat over the same corpus costs significantly less.

Recipe: embedding pipeline

python
import numpy as np
from google import genai

client = genai.Client(api_key="...")

texts = ["Linux pipes", "Python decorators", "TLS handshake"]
result = client.models.embed_content(
    model="text-embedding-004",
    contents=texts,
)
# One embedding per input text, in input order
vectors = np.array([e.values for e in result.embeddings])
print(vectors.shape)

Output: shape (3, 768) — standard semantic-search vectors.

Cost & rate-limit management

Gemini's cost story is generally lower than peer providers, but free-tier limits are aggressive. Production traffic should use a paid tier with explicit quota management.

  • Choose the right Gemini tier. Flash models are ~10-25× cheaper than Pro; Nano runs on-device. Pick the cheapest model that meets quality.
  • Free-tier quotas. AI Studio free keys impose tight per-minute and per-day limits. Migrate to a billed key for anything beyond prototyping.
  • tenacity / backoff for retries. ResourceExhausted errors are normal under load; retry with jittered exponential backoff (3-5 attempts).
  • Context caching. Long stable prefixes can be cached via the cachedContents API — significant savings for chat with a fixed system prompt.
  • Batch endpoints. Gemini supports asynchronous batch generation at ~50% discount for non-latency-sensitive workloads; use when feasible.
  • File API TTL. Uploaded files expire (~48 hours). Don't re-upload the same file per request — cache the file reference and refresh on TTL expiry.
  • Streaming reduces perceived latency but does not reduce total tokens; for cost control, set max_output_tokens.
  • Per-region pricing. Vertex AI charges differ by region; AI Studio is flat-priced.
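
The File API TTL bullet above suggests caching the file reference and refreshing it on expiry. A minimal sketch of that idea follows; FileRefCache is a hypothetical helper, the 47-hour default is an assumed safety margin under the ~48-hour TTL, and the uploader is injected so the demo runs offline.

```python
import time
from typing import Callable, Dict, Tuple

class FileRefCache:
    """Reuse an uploaded-file reference until shortly before it expires."""

    def __init__(self, upload: Callable[[str], str],
                 ttl_seconds: float = 47 * 3600,
                 clock: Callable[[], float] = time.monotonic):
        self._upload = upload      # e.g. lambda p: client.files.upload(file=p).name
        self._ttl = ttl_seconds    # assumed margin under the ~48 h server-side TTL
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, path: str) -> str:
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and now < entry[1]:
            return entry[0]        # still valid: no re-upload
        ref = self._upload(path)   # missing or expired: upload again
        self._entries[path] = (ref, now + self._ttl)
        return ref

# Demo with a fake uploader and a fake clock.
uploads = []
fake_now = [0.0]

def fake_upload(path: str) -> str:
    uploads.append(path)
    return f"files/{len(uploads)}"

cache = FileRefCache(fake_upload, ttl_seconds=100, clock=lambda: fake_now[0])
first = cache.get("contract.pdf")
second = cache.get("contract.pdf")   # served from the cache
fake_now[0] = 150.0                  # jump past the TTL
third = cache.get("contract.pdf")    # triggers a re-upload
print(first, second, third, len(uploads))
```

The cache is in-process only; a multi-worker deployment would need a shared store for the references.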

Version migration guide

The big migration is google-generativeai → google-genai. Both SDKs coexist on PyPI; new projects should default to google-genai.

| Aspect | google-generativeai (legacy) | google-genai (successor) |
| --- | --- | --- |
| Import | import google.generativeai as genai | from google import genai |
| Construction | genai.configure(api_key=...) + genai.GenerativeModel(...) | client = genai.Client(api_key=...) |
| Generation | model.generate_content(...) | client.models.generate_content(model=..., contents=...) |
| Chat | model.start_chat() | client.chats.create(model=...) |
| Vertex AI | Requires google-cloud-aiplatform separately | genai.Client(vertexai=True, project=..., location=...) |
| Function-calling return | parts[].function_call list | response.function_calls accessor |
| Async | model.generate_content_async(...) | client.aio.models.generate_content(...) |

Migration discipline:

  1. Don't run both in the same project — namespace collisions are confusing.
  2. Update safety_settings shape when migrating — enum names and threshold strings have changed across both SDKs.
  3. Model IDs are SDK-agnostic but availability isn't — verify model availability in your target SDK before assuming.
  4. Hedge: minor version churn in google-genai is ongoing; pin both google-genai and the Gemini model IDs in production.
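
During a staged migration, code that reads function calls may see either response shape from the table above. The sketch below normalises both with duck typing. It assumes the attribute names listed in the table (function_calls on the new response, parts with function_call on the legacy one) and that a legacy part without a call has a falsy function_call; extract_calls is a hypothetical helper, and the SimpleNamespace objects are stand-ins for real SDK responses.

```python
from types import SimpleNamespace
from typing import Any, List, Tuple

def extract_calls(response: Any) -> List[Tuple[str, dict]]:
    """Return (name, args) pairs from either response shape."""
    calls = getattr(response, "function_calls", None)
    if calls is None:
        # Legacy shape (assumed): keep only the parts that carry a function_call.
        parts = getattr(response, "parts", None) or []
        calls = [p.function_call for p in parts if getattr(p, "function_call", None)]
    return [(c.name, dict(c.args)) for c in calls]

# Stand-ins for the two SDKs' response objects.
call = SimpleNamespace(name="get_weather", args={"city": "Berlin"})
new_style = SimpleNamespace(function_calls=[call])
legacy_style = SimpleNamespace(parts=[
    SimpleNamespace(text="Let me check.", function_call=None),
    SimpleNamespace(function_call=call),
])

print(extract_calls(new_style))
print(extract_calls(legacy_style))
```

A response with no tool call yields an empty list in both cases, so callers can branch on truthiness.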

Troubleshooting common errors

  • ImportError: cannot import name 'genai' — installed the wrong package. from google import genai requires google-genai; import google.generativeai as genai requires google-generativeai.
  • ResourceExhausted: 429 — quota / rate-limit. Add exponential backoff, lower QPS, or upgrade to a billed key.
  • PermissionDenied: 403 — API key invalid, or Vertex AI path with missing ADC. Run gcloud auth application-default login.
  • InvalidArgument: 400 on multimodal calls — file format unsupported or file exceeded inline-bytes limit (~20 MB). Upload via the File API instead.
  • Function calling returns text, not a call. Model decided not to invoke; check the prompt for ambiguity or force a tool via tool_config.
  • Safety filters block legitimate content. Adjust safety_settings thresholds — but understand what you're loosening; defaults are conservative for a reason.
  • File API references expire. Files have a TTL; re-upload after expiry.
  • Streaming chunks arrive coarse. Chunk granularity depends on region and load; do not rely on chunk == word boundaries.
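
For the ImportError case above, a quick diagnostic is to ask the environment which of the two distributions is actually installed. This sketch uses only importlib.metadata from the standard library; installed_gemini_sdks is a hypothetical helper name.

```python
from importlib import metadata
from typing import Dict, Optional

def installed_gemini_sdks() -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for dist in ("google-genai", "google-generativeai"):
        try:
            versions[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            versions[dist] = None
    return versions

report = installed_gemini_sdks()
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

If both show a version, the project is in the dual-install state that the gotchas section warns about.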

Security considerations

  • API key storage. Treat Gemini API keys like any other production secret — environment vars, secret managers, never in source.
  • ADC vs API key. For Vertex AI, ADC (Application Default Credentials) is the right path in GCP-hosted environments — no long-lived keys.
  • Safety settings are not content filters for your app. They block obvious abuse categories at the API; your application still needs output review for domain-specific risks.
  • PII in prompts. Hosted inference means PII transits Google's infrastructure. Check your regulatory posture; consider Vertex AI's regional endpoints for data residency.
  • Function-call abuse. Tool descriptions are part of the prompt; an attacker controlling user input can prompt-inject the model into calling tools they shouldn't. Validate tool args before execution.
  • File API uploads. Uploaded files are accessible by the project's API key for the file's TTL; treat the API key as having read access to all uploaded files.
  • Key rotation. Rotate on staff offboarding; revoke from the AI Studio / Cloud Console UI.
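
The function-call abuse bullet above recommends validating tool arguments before execution. One possible shape for that check is an allowlist of tools with a validator per argument, sketched below. TOOLS, dispatch, and the city rule are hypothetical and intentionally strict; real validators depend on what each tool does.

```python
from typing import Any, Callable, Dict, Tuple

def get_weather(city: str) -> str:
    return f"weather for {city}"

def _valid_city(value: Any) -> bool:
    # Letters, spaces and hyphens only, with a length cap.
    return (isinstance(value, str) and 0 < len(value) <= 80
            and value.replace(" ", "").replace("-", "").isalpha())

# Hypothetical registry: tool name -> (handler, per-argument validators).
TOOLS: Dict[str, Tuple[Callable[..., Any], Dict[str, Callable[[Any], bool]]]] = {
    "get_weather": (get_weather, {"city": _valid_city}),
}

def dispatch(name: str, args: Dict[str, Any]) -> Any:
    """Run a model-requested tool only if the name and every argument pass checks."""
    if name not in TOOLS:
        raise PermissionError(f"tool not allowed: {name}")
    handler, validators = TOOLS[name]
    if set(args) != set(validators):
        raise ValueError(f"unexpected or missing arguments: {sorted(args)}")
    for key, check in validators.items():
        if not check(args[key]):
            raise ValueError(f"invalid value for {key!r}")
    return handler(**args)

print(dispatch("get_weather", {"city": "Berlin"}))
```

The point of the design is that the model's output is treated as untrusted input: nothing runs unless the tool is registered and every argument passes its own check.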

Production deployment

google-genai is a thin Python wrapper over Google's gRPC / REST APIs — production deployment is mostly secret management and concurrency control.

  • Client reuse. genai.Client(...) is thread-safe; create once per process and reuse. Repeated construction has connection overhead.
  • Concurrency. The client uses httpx (async) or requests (sync) under the hood — set sensible concurrency limits to respect provider QPS quotas.
  • Async paths. Use client.aio.* from async code (FastAPI, aiohttp). Calling the sync client from inside an async handler blocks the event loop.
  • Region selection (Vertex AI). Set the location parameter explicitly; default may not match your data-residency requirements.
  • Retry policy. Wrap calls with tenacity — retry on ResourceExhausted, Internal, Unavailable. Do not retry on InvalidArgument.
  • Health checks. A /health endpoint that runs a trivial 1-token generation catches API-key revocation faster than process-level checks.
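
The retry policy above can be sketched without dependencies. This version is hand-rolled rather than built on tenacity so that it runs as-is. ApiError is a hypothetical stand-in for an SDK exception that carries an HTTP status code, with 429, 500 and 503 assumed to correspond to ResourceExhausted, Internal and Unavailable, and 400 to InvalidArgument.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

class ApiError(Exception):
    """Hypothetical stand-in for an SDK error carrying an HTTP status code."""
    def __init__(self, code: int, message: str = ""):
        super().__init__(f"{code} {message}".strip())
        self.code = code

RETRYABLE = {429, 500, 503}   # ResourceExhausted, Internal, Unavailable

def call_with_retry(fn: Callable[[], T], attempts: int = 4, base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying transient errors with jittered exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except ApiError as err:
            last_try = attempt == attempts - 1
            if err.code not in RETRYABLE or last_try:
                raise                          # e.g. 400 InvalidArgument: fail fast
            delay = base_delay * (2 ** attempt)
            sleep(random.uniform(0, delay))    # full jitter spreads out retries
    raise RuntimeError("attempts must be at least 1")

# Offline demo: fail twice with 429, then succeed.
outcomes = [ApiError(429, "quota"), ApiError(429, "quota"), "ok"]
slept = []

def flaky() -> str:
    outcome = outcomes.pop(0)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome

result = call_with_retry(flaky, sleep=slept.append)
print(result, len(slept))
```

With a real SDK, the except clause would catch that SDK's own exception types instead of ApiError.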

Performance tuning

Gemini's models cover a wide latency/quality range; performance tuning is largely model selection plus standard API-client hygiene.

  • Model selection. Flash variants are roughly 5-10× faster than Pro at a modest quality cost; Nano runs on-device with negligible cost but limited reasoning depth.
  • max_output_tokens everywhere. Unbounded output means longer p99 latency and larger bills.
  • Streaming for interactive UX. First-token latency matters more than total — stream where possible.
  • Connection reuse. A single genai.Client instance shares an HTTP session; constructing per-request adds connection setup latency.
  • Async client (client.aio.*) under high concurrency. Sync clients block the event loop under FastAPI; async clients don't.
  • Batch endpoints. For non-latency-sensitive workloads, the batch API at half-price beats real-time calls.
  • Context caching for stable prompts. Long stable prefixes (system prompts, retrieved corpora) cache server-side; reuse them.
  • Inline vs File API. Small images inline avoid an extra round-trip; large or repeated files belong in the File API.
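
For the async bullet above, high concurrency still has to stay under the provider's QPS quota. A common pattern is to cap in-flight requests with asyncio.Semaphore, sketched below. bounded_gather is a hypothetical helper, and fake_generate stands in for a coroutine that would wrap client.aio.models.generate_content.

```python
import asyncio
from typing import Awaitable, Callable, List

async def bounded_gather(prompts: List[str],
                         generate: Callable[[str], Awaitable[str]],
                         limit: int = 5) -> List[str]:
    """Run generate() for every prompt with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def one(prompt: str) -> str:
        async with semaphore:      # wait here when `limit` calls are running
            return await generate(prompt)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(one(p) for p in prompts))

# Offline demo: a fake generate() that records peak concurrency.
stats = {"in_flight": 0, "peak": 0}

async def fake_generate(prompt: str) -> str:
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(0.01)      # simulated network latency
    stats["in_flight"] -= 1
    return prompt.upper()

results = asyncio.run(
    bounded_gather([f"q{i}" for i in range(20)], fake_generate, limit=3)
)
print(results[:3], stats["peak"])
```

The limit is a tuning knob: pick it from your quota and typical request latency rather than from the number of incoming requests.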

Multi-provider patterns

When Gemini is one of several models in your stack, the integration patterns are similar to other providers.

  • LiteLLM routes between Gemini, OpenAI, Anthropic, and others behind one OpenAI-compatible HTTP API. Useful for centralised quota and key management.
  • LangChain init_chat_model("gemini-2.5-flash", model_provider="google_genai") abstracts Gemini behind the same ChatModel interface as other providers.
  • Provider failover. model.with_fallbacks([backup]) in LangChain (or hand-rolled try/except) routes around Gemini outages by falling through to OpenAI / Anthropic.
  • Tokenizer parity. Gemini's tokenizer isn't tiktoken-compatible — for cross-provider token budgeting, use the provider-native count_tokens API.
  • Vertex AI vs AI Studio. Same models, different billing / quota / auth. google-genai lets you switch with a constructor flag.
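
The hand-rolled try/except variant of provider failover mentioned above might look like the following sketch. first_successful and the two provider functions are hypothetical; each callable would wrap one provider's SDK call in real code.

```python
from typing import Callable, List, Sequence, Tuple

def first_successful(prompt: str,
                     providers: Sequence[Tuple[str, Callable[[str], str]]]) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, answer) from the first that works."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as err:   # broad on purpose in this sketch; narrow it in real code
            errors.append(f"{name}: {err}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))

# Stand-ins: the primary is down, the backup answers.
def gemini_down(prompt: str) -> str:
    raise ConnectionError("503 unavailable")

def backup(prompt: str) -> str:
    return f"backup answer to: {prompt}"

winner, answer = first_successful("ping", [("gemini", gemini_down), ("backup", backup)])
print(winner, answer)
```

Failing over on every exception would also mask bugs such as malformed requests, so production code should fall through only on transient errors.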

When to choose which Gemini SDK

The dual-SDK situation is real and persistent — picking the right one matters.

| Need | Pick |
| --- | --- |
| New project, AI Studio API key | google-genai |
| New project, Vertex AI auth | google-genai (with vertexai=True) |
| Existing google.generativeai-based code that works | Stay on google-generativeai for now; migration is not urgent |
| Tutorials and Quickstarts from 2024 | Most still use google-generativeai; modern blog posts increasingly use google-genai |
| Vertex AI tuning jobs, batch prediction | google-cloud-aiplatform: the older Vertex SDK has more deployment surface than google-genai |
| Strict pin / minimum churn | google-generativeai is in maintenance, so fewer surprise API changes |
| Latest features (model GA, tools) | google-genai gets new features first |

Practical tip: check the model availability page before assuming a model works in both SDKs — sometimes a new model is gated to one path.

When NOT to use this

  • You only need OpenAI-compatible HTTP. LiteLLM exposes Gemini via OpenAI-compatible endpoints — skip the SDK entirely.
  • You need deep GCP integration. google-cloud-aiplatform (Vertex SDK) exposes broader Vertex features (tuning jobs, batch prediction, model deployment) that google-genai does not.
  • You're building strictly for Anthropic/OpenAI. Provider abstraction with a one-provider stack is just dead code.
  • Single-call utilities. curl + the REST API is sometimes simpler than the SDK for one-off scripts.
  • Air-gapped environments. Gemini is a hosted API. For on-prem, use a local model via transformers or vLLM instead.

Ecosystem integrations

| Layer | Integration |
| --- | --- |
| Frameworks | langchain-google-genai and langchain-google-vertexai wrap both SDKs. LlamaIndex has a first-party Gemini integration. |
| Multi-provider | litellm exposes Gemini via OpenAI-compatible HTTP. |
| Observability | langsmith, OpenTelemetry (OpenInference Gemini instrumentation) trace calls. |
| Authentication | google-auth for ADC flows; AI Studio API key for simpler setups. |
| File / multimodal | The Gemini File API is built into both SDKs; no extra package. PDFs, images, video, audio supported. |
| Vertex extensions | google-cloud-aiplatform adds tuning jobs, batch prediction, model deployment to Vertex AI. |
| Tools | Function calling integrates with Pydantic schemas; LangChain and LlamaIndex tool abstractions both work. |
