cheat sheet
google-genai
Package-level reference for google-genai (the current Gemini SDK) and its predecessor google-generativeai — install, auth, versioning, and alternatives.
Renamed from google-generativeai. This article previously lived at /sections/packages-pip/pip-google-generativeai. The legacy SDK is in maintenance mode; google-genai (imported as from google import genai) is the active SDK unifying Google AI Studio and Vertex AI access. Both packages coexist on PyPI.
What it is
google-genai is Google's current Python SDK for the Gemini family of models, unifying access to both the Google AI Studio API and Vertex AI. It supports text, multimodal input (images, video, audio, PDF), chat sessions, function calling, embeddings, context caching, and the File API for large uploads. Import as from google import genai.
The predecessor google-generativeai was Google's first-generation SDK for AI Studio only. It still works for existing code but receives only maintenance updates; new features land on google-genai. Start new projects on google-genai; existing google-generativeai code keeps running, but migrating it is encouraged.
Install
pip install google-generativeai
Output: installs the legacy SDK — import google.generativeai as genai
pip install google-genai
Output: installs the newer unified SDK — from google import genai
uv add google-generativeai
Output: dependency resolved + added to pyproject.toml
poetry add google-genai
Output: updated lockfile + virtualenv install for the successor SDK
Versioning & Python support
- google-generativeai is on the 0.x line and entering maintenance: bug fixes and minor model-id additions, but no major new features. Pin it to avoid surprises.
- google-genai is the actively developed SDK. It has shipped 1.x releases, but minor releases are frequent and can still change API details, so pin the version in production.
- Both support Python 3.9+.
- The Gemini model IDs evolve faster than either SDK: gemini-1.5-pro, gemini-1.5-flash, gemini-2.0-flash, gemini-2.5-*. Sometimes a model is only callable via the newer SDK or via Vertex AI; check model availability per-SDK before betting on a feature.
- Old name google.ai.generativelanguage is the underlying gRPC client; ignore it unless you're writing a custom transport.
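Because both distributions can be installed side by side, it helps to check what an environment actually contains before pinning. A minimal sketch using only the standard library; the helper name installed_gemini_sdks is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional


def installed_gemini_sdks() -> Dict[str, Optional[str]]:
    """Return the installed version of each Gemini SDK, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for dist in ("google-genai", "google-generativeai"):
        try:
            found[dist] = version(dist)
        except PackageNotFoundError:
            found[dist] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_gemini_sdks().items():
        print(f"{name}: {ver or 'not installed'}")
```

If both entries report a version, the environment mixes the two SDKs, which is the situation the gotchas section warns about.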
Package metadata
- Maintainer: Google (the google-gemini and googleapis GitHub orgs)
- Project home (legacy): github.com/google-gemini/generative-ai-python
- Project home (successor): github.com/googleapis/python-genai
- Docs: ai.google.dev
- PyPI: pypi.org/project/google-generativeai + pypi.org/project/google-genai
- License: Apache-2.0
- Governance: vendor-maintained, open development on GitHub
- First released: google-generativeai in 2023; google-genai in late 2024
Optional dependencies & extras
Both SDKs are relatively self-contained Python packages over Google's gRPC / REST APIs. Core deps include:
- google-api-core
- google-auth: Application Default Credentials (ADC) flow
- protobuf
- pydantic (newer SDK)
- httpx (newer SDK; the legacy SDK leans on requests)
- tqdm for streaming progress
Notable extras and companions:
- For Vertex AI authentication and quota, the legacy SDK requires google-cloud-aiplatform separately; the new google-genai SDK has a vertexai=True client flag and bundles what it needs.
- File uploads (PDFs, videos) use the Gemini File API; both SDKs handle it natively, no extra package.
- Async clients exist in both (genai.GenerativeModel(...).generate_content_async() legacy; client.aio in the successor).
Alternatives
| Package | Trade-off |
|---|---|
| openai | The dominant LLM SDK; comparable feature set; different model family. |
| anthropic | Claude SDK; cleanest tool-use API; no native multimodal video yet. |
| mistralai | Open-weight + hosted Mistral models. Smaller multimodal surface. |
| cohere | Embeddings + reranker focus, plus chat. |
| google-cloud-aiplatform | Vertex AI SDK with a broader Google Cloud surface (deployment, batch prediction, tuning). Heavier. |
| litellm | Provider-agnostic OpenAI-compatible wrapper over many SDKs, Gemini included. |
Common gotchas
- Package rename / dual existence. google-generativeai (legacy, import google.generativeai as genai) and google-genai (new, from google import genai) are different packages with overlapping namespaces. Mixing both in the same project causes confusing import shadows. Pick one.
- Auth flows differ. An AI Studio key (GOOGLE_API_KEY env) is one path; Application Default Credentials via gcloud auth application-default login is another (Vertex AI). The legacy SDK is AI-Studio-only; the new SDK supports both with a vertexai=True flag.
- Safety-setting JSON shapes differ between SDKs and over time. The enum names (HARM_CATEGORY_HARASSMENT, etc.) and threshold strings have shifted; safety config copy-pasted across SDK versions often fails silently.
- Quota limits per model. Free-tier AI Studio keys have low per-minute limits and rate-limit with ResourceExhausted; wrap calls with tenacity / backoff.
- Function-calling response shapes differ. Legacy returns a parts list with function_call items; the new SDK returns a function_calls accessor. Tutorials targeting one don't run on the other.
- File API uploads have a TTL. Files uploaded for context expire after ~48 hours. Re-upload for long-lived sessions.
- Streaming behaviour. Both SDKs stream chunks, but the chunk granularity depends on the model and region, so don't rely on chunk boundaries being words or sentences.
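The two auth paths can be kept out of application code by deriving the client arguments from the environment. A minimal sketch: GOOGLE_API_KEY is the variable named above, while GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION are assumed names for the Vertex AI settings, and client_kwargs is a hypothetical helper.

```python
import os
from typing import Mapping, Optional


def client_kwargs(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build keyword arguments for genai.Client from environment variables."""
    env = os.environ if env is None else env
    api_key = env.get("GOOGLE_API_KEY")
    if api_key:
        return {"api_key": api_key}  # AI Studio path
    # Assumed variable names for the Vertex AI path; credentials come from ADC.
    project = env.get("GOOGLE_CLOUD_PROJECT")
    location = env.get("GOOGLE_CLOUD_LOCATION")
    if project and location:
        return {"vertexai": True, "project": project, "location": location}
    raise RuntimeError(
        "Set GOOGLE_API_KEY, or GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION"
    )
```

With the successor SDK installed, the result would be passed as genai.Client(**client_kwargs()). The location is required rather than defaulted so that the region is always an explicit choice.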
Evaluation & observability
Gemini is a hosted model — evaluation patterns are the same as for any LLM API: regression suites, A/B tests, and trace-based observability.
- langsmith traces with @traceable decorators capture every call, latency, and token count.
- langchain.evaluation / ragas / deepeval all support Gemini via the LangChain ChatGoogleGenerativeAI wrapper.
- Vertex AI Model Garden ships its own evaluation tooling for fine-tuned Gemini variants.
- Per-region latency. Different regions have different p99 latency profiles; benchmark before locking in.
- Token counts. Gemini's count_tokens API is the authoritative source for budget tracking; tokenizer guesses from other libraries diverge.
Real-world recipes
The recipes below cover the most common patterns — function calling, multimodal input, embeddings, and streaming — using the newer google-genai SDK where possible.
Recipe: function calling with the new SDK
from google import genai
from google.genai import types

client = genai.Client(api_key="...")

# Declare one tool the model may call.
tools = [types.Tool(function_declarations=[types.FunctionDeclaration(
    name="get_weather",
    description="Get current weather for a city.",
    parameters=types.Schema(
        type=types.Type.OBJECT,
        properties={"city": types.Schema(type=types.Type.STRING)},
        required=["city"],
    ),
)])]

resp = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What's the weather in Berlin?",
    config=types.GenerateContentConfig(tools=tools, temperature=0),
)

# function_calls can be empty when the model answers in text instead.
for call in resp.function_calls or []:
    print(call.name, dict(call.args))
Output: get_weather {'city': 'Berlin'} — the model picked the tool and filled the schema.
Recipe: multimodal — image + text
from google import genai
from pathlib import Path
client = genai.Client(api_key="...")
img = client.files.upload(file=Path("./diagram.png"))
resp = client.models.generate_content(
    model="gemini-2.5-pro",
    contents=["Describe this architecture diagram in three bullets.", img],
)
print(resp.text)
Output: descriptive bullets. The File API handles uploads >20 MB; small images can be passed inline as bytes.
Recipe: streaming chat
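# Reuses the client created in the recipes above.
chat = client.chats.create(model="gemini-2.5-flash")
for chunk in chat.send_message_stream("Explain RAG in two paragraphs."):
    # Guard against chunks without text so "None" is never printed.
    print(chunk.text or "", end="", flush=True)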
Output: tokens stream in to stdout; the chat retains history for follow-ups.
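When the full reply is needed after streaming (for logging or caching), the chunks can be joined without assuming anything about their boundaries. A minimal sketch with stand-in chunk objects; collect_stream is a hypothetical helper, and treating a chunk without text as skippable is an assumption.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable, echo: bool = False) -> str:
    """Join streamed chunks into the full reply, ignoring chunk boundaries."""
    parts = []
    for chunk in chunks:
        text = getattr(chunk, "text", None)
        if not text:  # assumption: some chunks may carry no text
            continue
        if echo:
            print(text, end="", flush=True)
        parts.append(text)
    return "".join(parts)


# Stand-in chunks; a real call would pass chat.send_message_stream(...).
fake = [
    SimpleNamespace(text="Retrieval-aug"),
    SimpleNamespace(text=None),
    SimpleNamespace(text="mented generation"),
]
print(collect_stream(fake))  # Retrieval-augmented generation
```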
Recipe: structured output via response schema
from pydantic import BaseModel

class Movie(BaseModel):
    title: str
    year: int
    rating: float

resp = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Give me three classic sci-fi films as JSON.",
    config={"response_mime_type": "application/json", "response_schema": list[Movie]},
)
movies = [Movie.model_validate(m) for m in resp.parsed]
Output: parsed Pydantic instances — no manual JSON extraction.
Recipe: vision QA over a PDF
Gemini's File API accepts PDFs directly — the model reads layout and text together, no separate extraction step required.
import time
from pathlib import Path
from google import genai

client = genai.Client(api_key="...")
pdf = client.files.upload(file=Path("./contract.pdf"))

# Small files are usually ACTIVE right away; large files queue as PROCESSING.
while pdf.state.name == "PROCESSING":
    time.sleep(1)
    pdf = client.files.get(name=pdf.name)
# Stop instead of looping forever if processing did not succeed.
if pdf.state.name != "ACTIVE":
    raise RuntimeError(f"File processing ended in state {pdf.state.name}")

resp = client.models.generate_content(
    model="gemini-2.5-pro",
    contents=[
        "Summarise this contract, listing each party's obligations.",
        pdf,
    ],
)
print(resp.text)
Output: structured summary with party-by-party obligations; cheaper than running OCR + summarisation separately.
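In a service, the wait for file processing is usually bounded by a timeout. A minimal sketch of such a poller, written against a plain callable so it can be tested without the SDK; wait_for_active and its parameters are hypothetical, and the state names follow the recipe above.

```python
import time
from typing import Callable


def wait_for_active(get_state: Callable[[], str],
                    timeout_s: float = 120.0,
                    poll_s: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> None:
    """Poll until the state is ACTIVE; fail on FAILED or on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state == "ACTIVE":
            return
        if state == "FAILED":
            raise RuntimeError("file processing failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"file still {state} after {timeout_s}s")
        sleep(poll_s)
```

With a real client this might be called as wait_for_active(lambda: client.files.get(name=pdf.name).state.name).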
Recipe: long-context with cached content
For prompts with a large stable prefix (system instructions, retrieved corpus), context caching reduces both cost and latency.
from google.genai import types

# Create cache once
cache = client.caches.create(
    model="gemini-2.5-flash",
    config=types.CreateCachedContentConfig(
        display_name="company-handbook",
        contents=[handbook_content],  # large stable text
        system_instruction="Answer using only the cached handbook.",
        ttl="3600s",
    ),
)

# Reuse across many calls
resp = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What is our PTO policy?",
    config=types.GenerateContentConfig(cached_content=cache.name),
)
Output: subsequent calls skip the cached prefix tokens — significant cost savings on repeated chat over the same corpus.
Recipe: embedding pipeline
import numpy as np

texts = ["Linux pipes", "Python decorators", "TLS handshake"]
result = client.models.embed_content(
    model="text-embedding-004",
    contents=texts,
)
vectors = np.array([e.values for e in result.embeddings])
print(vectors.shape)
Output: shape (3, 768) — standard semantic-search vectors.
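Once the vectors exist, semantic search reduces to ranking by cosine similarity. A minimal pure-Python sketch, using toy 3-dimensional vectors in place of real embeddings; cosine and rank are hypothetical helper names.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query: Sequence[float],
         docs: Sequence[Sequence[float]]) -> List[Tuple[int, float]]:
    """Return (doc index, similarity) pairs, most similar first."""
    scored = [(i, cosine(query, d)) for i, d in enumerate(docs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors stand in for the 768-dimensional embeddings.
docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
print(rank([1.0, 0.1, 0.0], docs)[0][0])  # 0
```

For larger corpora the same ranking is normally done with a NumPy matrix product or a vector index rather than a Python loop.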
Cost & rate-limit management
Gemini's cost story is generally lower than peer providers, but free-tier limits are aggressive. Production traffic should use a paid tier with explicit quota management.
- Choose the right Gemini tier. Flash models are ~10-25× cheaper than Pro; Nano runs on-device. Pick the cheapest model that meets quality.
- Free-tier quotas. AI Studio free keys impose tight per-minute and per-day limits. Migrate to a billed key for anything beyond prototyping.
- tenacity / backoff for retries. ResourceExhausted errors are normal under load; retry with jittered exponential backoff (3-5 attempts).
- Context caching. Long stable prefixes can be cached via the cachedContents API, with significant savings for chat with a fixed system prompt.
- Batch endpoints. Gemini supports asynchronous batch generation at ~50% discount for non-latency-sensitive workloads; use when feasible.
- File API TTL. Uploaded files expire (~48 hours). Don't re-upload the same file per request; cache the file reference and refresh on TTL expiry.
- Streaming reduces perceived latency but does not reduce total tokens; for cost control, set max_output_tokens.
- Per-region pricing. Vertex AI charges differ by region; AI Studio is flat-priced.
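The retry advice above can be sketched without any third-party library. This is a hand-rolled stand-in for tenacity or backoff, not their API: with_backoff and its parameters are hypothetical, and the caller passes whichever exception class its SDK raises for rate limits.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_backoff(call: Callable[[], T],
                 retry_on: Tuple[Type[BaseException], ...],
                 attempts: int = 4,
                 base_s: float = 1.0,
                 cap_s: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying the listed exceptions with full-jitter backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: sleep a random time up to the exponential ceiling.
            sleep(random.uniform(0, min(cap_s, base_s * 2 ** attempt)))
    raise AssertionError("unreachable")
```

A call site would wrap the request in a lambda, for example with_backoff(lambda: client.models.generate_content(...), retry_on=(SomeRateLimitError,)), where SomeRateLimitError is a placeholder for the SDK's exception.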
Version migration guide
The big migration is google-generativeai → google-genai. Both SDKs coexist on PyPI; new projects should default to google-genai.
| Aspect | google-generativeai (legacy) | google-genai (successor) |
|---|---|---|
| Import | import google.generativeai as genai | from google import genai |
| Construction | genai.configure(api_key=...) + genai.GenerativeModel(...) | client = genai.Client(api_key=...) |
| Generation | model.generate_content(...) | client.models.generate_content(model=..., contents=...) |
| Chat | model.start_chat() | client.chats.create(model=...) |
| Vertex AI | Requires google-cloud-aiplatform separately | genai.Client(vertexai=True, project=..., location=...) |
| Function-calling return | parts[].function_call list | response.function_calls accessor |
| Async | model.generate_content_async(...) | client.aio.models.generate_content(...) |
Migration discipline:
- Don't run both in the same project — namespace collisions are confusing.
- Update safety_settings shape when migrating; enum names and threshold strings have changed across both SDKs.
- Model IDs are SDK-agnostic but availability isn't; verify model availability in your target SDK before assuming.
- Hedge: minor version churn in google-genai is ongoing; pin both google-genai and the Gemini model IDs in production.
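To find out whether a codebase mixes the two SDKs, the import statements can be scanned with the standard ast module. A minimal sketch; find_sdk_imports is a hypothetical helper and only looks at static import statements.

```python
import ast
from typing import List, Tuple

LEGACY, SUCCESSOR = "google.generativeai", "google.genai"


def _within(module: str, package: str) -> bool:
    """True if module is the package itself or one of its submodules."""
    return module == package or module.startswith(package + ".")


def find_sdk_imports(source: str) -> List[Tuple[int, str]]:
    """Return (line, 'legacy' | 'successor') for each Gemini SDK import."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            # "from google import genai" names the SDK in the imported alias.
            modules = [node.module] + [
                f"{node.module}.{alias.name}" for alias in node.names
            ]
        else:
            continue
        if any(_within(m, LEGACY) for m in modules):
            hits.append((node.lineno, "legacy"))
        elif any(_within(m, SUCCESSOR) for m in modules):
            hits.append((node.lineno, "successor"))
    return sorted(hits)
```

A file that yields both labels is a candidate for migration cleanup.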
Troubleshooting common errors
- ImportError: cannot import name 'genai'. The wrong package is installed. from google import genai requires google-genai; import google.generativeai as genai requires google-generativeai.
- ResourceExhausted: 429. Quota or rate limit hit. Add exponential backoff, lower QPS, or upgrade to a billed key.
- PermissionDenied: 403. API key invalid, or Vertex AI path with missing ADC. Run gcloud auth application-default login.
- InvalidArgument: 400 on multimodal calls. File format unsupported, or file exceeded the inline-bytes limit (~20 MB). Upload via the File API instead.
- Function calling returns text, not a call. The model decided not to invoke a tool; check the prompt for ambiguity or force a tool via tool_config.
- Safety filters block legitimate content. Adjust safety_settings thresholds, but understand what you're loosening; defaults are conservative for a reason.
- File API references expire. Files have a TTL; re-upload after expiry.
- Streaming chunks arrive coarse. Chunk granularity depends on region and load; do not rely on chunks aligning with word boundaries.
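Expired file references are easier to avoid than to debug. A minimal in-process sketch that reuses an uploaded reference and re-uploads shortly before the ~48 hour expiry; FileRefCache is a hypothetical class, and the upload callable stands in for something like client.files.upload.

```python
import time
from typing import Callable, Dict, Tuple


class FileRefCache:
    """Reuse uploaded-file references until shortly before they expire."""

    def __init__(self,
                 upload: Callable[[str], object],
                 ttl_s: float = 47 * 3600,  # refresh ahead of the ~48 h expiry
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._upload = upload
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries: Dict[str, Tuple[float, object]] = {}

    def get(self, path: str) -> object:
        """Return a live reference for path, uploading only when needed."""
        now = self._clock()
        entry = self._entries.get(path)
        if entry is None or now - entry[0] >= self._ttl_s:
            entry = (now, self._upload(path))  # first use or expired
            self._entries[path] = entry
        return entry[1]
```

The cache lives in one process only; a multi-worker deployment would need a shared store for the references.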
Security considerations
- API key storage. Treat Gemini API keys like any other production secret — environment vars, secret managers, never in source.
- ADC vs API key. For Vertex AI, ADC (Application Default Credentials) is the right path in GCP-hosted environments — no long-lived keys.
- Safety settings are not content filters for your app. They block obvious abuse categories at the API; your application still needs output review for domain-specific risks.
- PII in prompts. Hosted inference means PII transits Google's infrastructure. Check your regulatory posture; consider Vertex AI's regional endpoints for data residency.
- Function-call abuse. Tool descriptions are part of the prompt; an attacker controlling user input can prompt-inject the model into calling tools they shouldn't. Validate tool args before execution.
- File API uploads. Uploaded files are accessible by the project's API key for the file's TTL; treat the API key as having read access to all uploaded files.
- Key rotation. Rotate on staff offboarding; revoke from the AI Studio / Cloud Console UI.
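Validating tool arguments before execution can be as simple as an allow-list with expected types. A minimal sketch: the registry, run_tool, and the stub get_weather handler are all hypothetical, and the name and args would come from the model's function call.

```python
from typing import Any, Callable, Dict, Mapping, Tuple


def get_weather(city: str) -> str:
    return f"(stub) weather for {city}"  # placeholder implementation


# Allow-list: tool name -> (handler, expected argument types).
TOOLS: Dict[str, Tuple[Callable[..., Any], Dict[str, type]]] = {
    "get_weather": (get_weather, {"city": str}),
}


def run_tool(name: str, args: Mapping[str, Any]) -> Any:
    """Execute a model-requested tool only if its name and args check out."""
    if name not in TOOLS:
        raise PermissionError(f"tool not allowed: {name}")
    handler, schema = TOOLS[name]
    if set(args) != set(schema):
        raise ValueError(f"unexpected arguments: {sorted(args)}")
    for key, expected in schema.items():
        if not isinstance(args[key], expected):
            raise TypeError(f"{key} must be {expected.__name__}")
    return handler(**args)


print(run_tool("get_weather", {"city": "Berlin"}))  # (stub) weather for Berlin
```

Type checks are only the first layer; tools with side effects usually also need value-level checks (allowed IDs, path prefixes, amount limits).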
Production deployment
google-genai is a thin Python wrapper over Google's gRPC / REST APIs — production deployment is mostly secret management and concurrency control.
- Client reuse. genai.Client(...) is thread-safe; create once per process and reuse. Repeated construction has connection overhead.
- Concurrency. The client uses httpx (async) or requests (sync) under the hood; set sensible concurrency limits to respect provider QPS quotas.
- Async paths. Use client.aio.* from async code (FastAPI, aiohttp). Calling the sync client inside an async handler blocks the event loop.
- Region selection (Vertex AI). Set the location parameter explicitly; the default may not match your data-residency requirements.
- Retry policy. Wrap calls with tenacity and retry on ResourceExhausted, Internal, Unavailable. Do not retry on InvalidArgument.
- Health checks. A /health endpoint that runs a trivial 1-token generation catches API-key revocation faster than process-level checks.
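A concurrency limit for the async path can be sketched with asyncio.Semaphore. The generate callable is a stand-in for a coroutine that wraps client.aio.models.generate_content; gather_limited and max_in_flight are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def gather_limited(prompts: Sequence[str],
                         generate: Callable[[str], Awaitable[str]],
                         max_in_flight: int = 8) -> List[str]:
    """Run generate() for every prompt, at most max_in_flight at a time."""
    gate = asyncio.Semaphore(max_in_flight)

    async def one(prompt: str) -> str:
        async with gate:  # waits here once the limit is reached
            return await generate(prompt)

    # gather preserves input order in its result list.
    return list(await asyncio.gather(*(one(p) for p in prompts)))
```

The limit caps in-flight requests per process, so the effective rate against the quota is this value multiplied by the number of workers.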
Performance tuning
Gemini's models cover a wide latency/quality range; performance tuning is largely model selection plus standard API-client hygiene.
- Model selection. Flash variants are 5-10× faster than Pro at smaller quality cost; Nano runs on-device with negligible cost but limited reasoning depth.
- max_output_tokens everywhere. Unbounded output means longer p99 latency and larger bills.
- Streaming for interactive UX. First-token latency matters more than total; stream where possible.
- Connection reuse. A single genai.Client instance shares an HTTP session; constructing per-request adds connection setup latency.
- Async client (client.aio.*) under high concurrency. Sync clients block the event loop under FastAPI; async clients don't.
- Batch endpoints. For non-latency-sensitive workloads, the batch API at half-price beats real-time calls.
- Context caching for stable prompts. Long stable prefixes (system prompts, retrieved corpora) cache server-side; reuse them.
- Inline vs File API. Small images inline avoid an extra round-trip; large or repeated files belong in the File API.
Multi-provider patterns
When Gemini is one of several models in your stack, the integration patterns are similar to other providers.
- LiteLLM routes between Gemini, OpenAI, Anthropic, and others behind one OpenAI-compatible HTTP API. Useful for centralised quota and key management.
- LangChain init_chat_model("gemini-2.5-flash", model_provider="google_genai") abstracts Gemini behind the same ChatModel interface as other providers.
- Provider failover. model.with_fallbacks([backup]) in LangChain (or hand-rolled try/except) routes around Gemini outages by falling through to OpenAI / Anthropic.
- Tokenizer parity. Gemini's tokenizer isn't tiktoken-compatible; for cross-provider token budgeting, use the provider-native count_tokens API.
- Vertex AI vs AI Studio. Same models, different billing / quota / auth. google-genai lets you switch with a constructor flag.
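The hand-rolled failover mentioned above fits in a few lines. A minimal sketch in which each provider is a plain callable taking a prompt and returning text; generate_with_failover is a hypothetical helper, and catching bare Exception is a simplification.

```python
from typing import Callable, Sequence, Tuple

Provider = Tuple[str, Callable[[str], str]]


def generate_with_failover(prompt: str,
                           providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try each (name, call) in order; return (name, reply) on first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # real code should catch transport errors only
            errors.append(f"{name}: {exc!r}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Returning the provider name alongside the reply keeps it visible in logs which backend actually answered.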
When to choose which Gemini SDK
The dual-SDK situation is real and persistent — picking the right one matters.
| Need | Pick |
|---|---|
| New project, AI Studio API key | google-genai |
| New project, Vertex AI auth | google-genai (with vertexai=True) |
| Existing google.generativeai-based code that works | Stay on google-generativeai for now; migration is not urgent |
| Tutorials and Quickstarts from 2024 | Most still use google-generativeai; modern blog posts increasingly use google-genai |
| Vertex AI tuning jobs, batch prediction | google-cloud-aiplatform — the older Vertex SDK has more deployment surface than google-genai |
| Strict pin / minimum churn | google-generativeai is in maintenance — fewer surprise API changes |
| Latest features (model GA, tools) | google-genai gets new features first |
Practical tip: check the model availability page before assuming a model works in both SDKs — sometimes a new model is gated to one path.
When NOT to use this
- You only need OpenAI-compatible HTTP. LiteLLM exposes Gemini via OpenAI-compatible endpoints — skip the SDK entirely.
- You need deep GCP integration. google-cloud-aiplatform (Vertex SDK) exposes broader Vertex features (tuning jobs, batch prediction, model deployment) that google-genai does not.
- You're building strictly for Anthropic/OpenAI. Provider abstraction with a one-provider stack is just dead code.
- Single-call utilities. curl + the REST API is sometimes simpler than the SDK for one-off scripts.
- Air-gapped environments. Gemini is a hosted API. For on-prem, use a local model via transformers or vLLM instead.
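For the one-off-script case, a request against the REST API can be built with the standard library alone. A sketch under assumptions: the endpoint path, the x-goog-api-key header, and the request body shape reflect the public Gemini REST API as commonly documented and should be checked against the current reference; build_generate_request is a hypothetical helper that only constructs the request and does not send it.

```python
import json
import urllib.request


def build_generate_request(api_key: str, prompt: str,
                           model: str = "gemini-2.5-flash") -> urllib.request.Request:
    """Build (but do not send) a generateContent POST request."""
    # Assumed endpoint layout for the AI Studio REST API.
    url = ("https://generativelanguage.googleapis.com/v1beta/"
           f"models/{model}:generateContent")
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )
```

Sending it would be urllib.request.urlopen(build_generate_request(key, "Hello")), followed by parsing the JSON response body.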
Ecosystem integrations
| Layer | Integration |
|---|---|
| Frameworks | langchain-google-genai and langchain-google-vertexai wrap both SDKs. LlamaIndex has a first-party Gemini integration. |
| Multi-provider | litellm exposes Gemini via OpenAI-compatible HTTP. |
| Observability | langsmith, OpenTelemetry (OpenInference Gemini instrumentation) trace calls. |
| Authentication | google-auth for ADC flows; AI Studio API key for simpler setups. |
| File / multimodal | The Gemini File API is built into both SDKs; no extra package. PDFs, images, video, audio supported. |
| Vertex extensions | google-cloud-aiplatform adds tuning jobs, batch prediction, model deployment to Vertex AI. |
| Tools | Function calling integrates with Pydantic schemas; LangChain and LlamaIndex tool abstractions both work. |
See also
- AI: google-generativeai — generation, multimodal, chat, function calling
- Concept: api — client SDK design
- Concept: http — underlying transport