Call Google's Gemini models from Python for text, multimodal, streaming, chat, function calling, and embeddings. Covers the genai SDK, safety settings, file API, and async usage.
google-generativeai — Gemini SDK
What it is
google-generativeai is Google's official Python SDK for the Gemini family of models — Gemini 1.5 Pro, Gemini 1.5 Flash, Gemini 2.0, and more. It provides a synchronous and asynchronous client for text generation, multimodal input (text, images, video, audio, PDF), chat sessions, function calling, embeddings, and the File API for uploading large assets. The SDK handles authentication, retries, and response streaming.
Install
pip install google-generativeai
Output: pip's install log (exits 0 on success)
Quick example
import google.generativeai as genai
import os
genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content("Explain gradient descent in two sentences.")
print(response.text)
Output:
Gradient descent is an optimization algorithm that iteratively adjusts model parameters
by moving them in the direction that reduces the loss function. Each step is proportional
to the negative gradient of the loss with respect to the parameters.
When / why to use it
- Accessing Gemini models directly without a framework layer (LangChain, etc.).
- Multimodal tasks: analysing images, PDFs, or videos with a single API call.
- Long-context tasks — Gemini 1.5 Pro supports up to 2M tokens.
- Streaming responses for chat interfaces.
- Embedding text for semantic search or clustering.
- Function calling / tool use to connect the model to external APIs.
Common pitfalls
API key exposure: never hard-code api_key= in source files. Always load the key from an environment variable or a secrets manager. GEMINI_API_KEY is the conventional variable name.
Safety blocks: when content is filtered, the SDK may raise BlockedPromptException (for example from a chat session) or return a response with prompt_feedback.block_reason set. Always check response.prompt_feedback before accessing response.text.
response.text raises if there are no candidates: if all candidates are blocked or empty, accessing response.text raises ValueError. Use response.candidates to inspect individual results.
File API files expire: files uploaded via genai.upload_file() are deleted after 48 hours. Store the file name or URI if you need to reuse a file across sessions within that window.
Use gemini-1.5-flash for low-latency tasks and gemini-1.5-pro for complex reasoning or very long contexts. flash is substantially cheaper and faster; pro handles nuance better.
Set generation_config={"response_mime_type": "application/json"} to force structured JSON output without a schema, which is typically faster than function calling for simple extraction.
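JSON mode without a schema returns a JSON string, but nothing guarantees which keys it contains. The following is a minimal sketch of validating that string before use. parse_json_object is a hypothetical helper, not part of the SDK, and it takes the text directly (for example response.text) so it can be tried without an API call.

```python
import json


def parse_json_object(text: str, required_keys: tuple = ()) -> dict:
    """Parse JSON-mode output and check that the expected keys are present.

    Hypothetical helper, not part of google-generativeai.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object, got {type(data).__name__}")
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


# Stand-in for response.text from a JSON-mode call.
sample = '{"name": "Ada", "year": 1843}'
print(parse_json_object(sample, required_keys=("name", "year")))
```

Raising ValueError for every failure mode keeps the caller's error handling to a single except clause.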
Richer example — multimodal image analysis
import google.generativeai as genai
import PIL.Image
import os
genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")
image = PIL.Image.open("chart.png")
response = model.generate_content([
    image,
    "Describe the key trends visible in this chart in bullet points.",
])
print(response.text)
Output:
• Revenue grew 42% YoY from Q1 2025 to Q1 2026.
• North America remains the largest region at 58% of total.
• Asia-Pacific showed the steepest growth trajectory at +67% YoY.
• Operating margins compressed slightly from 31% to 28%.
configure() and model selection
genai.configure sets the API key (and optionally a custom transport or proxy) for all subsequent calls in the process. Models are instantiated per-task — Gemini 1.5 Flash is the default workhorse.
import google.generativeai as genai
import os
genai.configure(api_key=os.environ["GEMINI_API_KEY"])
# List available models
for m in genai.list_models():
    if "generateContent" in m.supported_generation_methods:
        print(m.name)
Output (sample):
models/gemini-1.5-flash
models/gemini-1.5-flash-8b
models/gemini-1.5-pro
models/gemini-2.0-flash
models/gemini-2.0-pro
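Because genai.configure reads the key once per process, it can help to fail early with a clear message when the variable is missing. This is a minimal sketch under that assumption. load_api_key is a hypothetical helper, not an SDK function, and the env parameter exists only so the function can be exercised without touching the real environment.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None,
                 name: str = "GEMINI_API_KEY") -> str:
    """Return the API key from the environment or fail with a clear message.

    Hypothetical helper, not part of google-generativeai.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before calling genai.configure()")
    return key


# Demo with a fake environment; surrounding whitespace is stripped.
key = load_api_key({"GEMINI_API_KEY": " demo-key "})
print(key == "demo-key")
```

In real code the call would look like genai.configure(api_key=load_api_key()).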
generation_config — controlling output
GenerationConfig wraps all sampling parameters and is passed at model construction or per-call.
import google.generativeai as genai
import os
genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel(
    "gemini-1.5-flash",
    generation_config=genai.GenerationConfig(
        temperature=0.4,
        top_p=0.95,
        top_k=40,
        max_output_tokens=512,
        stop_sequences=["END"],
    ),
)
response = model.generate_content("Write a haiku about distributed systems.")
print(response.text)
Output:
Nodes share a secret,
Latency hides in the gap—
Consensus is slow.
Streaming
Stream responses chunk by chunk using stream=True. Iterating over the response yields GenerateContentResponse chunks as they arrive, each carrying a piece of the text.
import google.generativeai as genai
import os
genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content(
    "Explain the CAP theorem in detail.",
    stream=True,
)
for chunk in response:
    print(chunk.text, end="", flush=True)
print() # final newline
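A chat interface usually needs the full text once streaming finishes, not only the echoed chunks. The following sketch accumulates chunks while printing them. collect_stream is a hypothetical helper, and the SimpleNamespace objects are stand-ins for real chunks so the example runs offline. Skipping a chunk whose .text raises ValueError is an assumption modelled on how response.text behaves.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable, echo: bool = True) -> str:
    """Print each chunk as it arrives and return the concatenated text."""
    pieces = []
    for chunk in chunks:
        try:
            text = chunk.text
        except ValueError:
            # Assumption: a chunk without text parts raises like response.text does.
            continue
        if echo:
            print(text, end="", flush=True)
        pieces.append(text)
    if echo:
        print()  # final newline
    return "".join(pieces)


# Stand-in chunks; with the SDK, pass the iterable returned by
# model.generate_content(..., stream=True) instead.
fake_chunks = [
    SimpleNamespace(text="CAP stands for "),
    SimpleNamespace(text="consistency, availability, partition tolerance."),
]
full = collect_stream(fake_chunks)
```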
Chat sessions — start_chat
start_chat creates a stateful session that accumulates conversation history automatically. Each call to send_message appends the exchange to the session's history.
import google.generativeai as genai
import os
genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")
chat = model.start_chat(history=[
    {"role": "user", "parts": ["You are a terse database expert."]},
    {"role": "model", "parts": ["Understood. Ask me anything about databases."]},
])
response = chat.send_message("What is a covering index?")
print(response.text)
response = chat.send_message("When should I avoid one?")
print(response.text)
# Inspect the full history
for msg in chat.history:
    print(f"{msg.role}: {msg.parts[0].text[:60]}...")
Output:
A covering index includes all columns a query needs, eliminating the table lookup entirely.
Avoid them when the index size is large and write throughput matters—covering indexes slow INSERT/UPDATE.
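user: You are a terse database expert....
model: Understood. Ask me anything about databases....
user: What is a covering index?...
model: A covering index includes all columns a query needs, elimina...
user: When should I avoid one?...
model: Avoid them when the index size is large and write throughput...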
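Since the session history lives only in memory, persisting a conversation means converting it to plain data first. The sketch below turns history entries into the same {"role", "parts"} dicts that start_chat(history=...) accepts above. history_to_dicts is a hypothetical helper. It assumes text-only conversations and drops non-text parts. SimpleNamespace objects stand in for real history entries so the example runs offline.

```python
import json
from types import SimpleNamespace


def history_to_dicts(history) -> list:
    """Convert chat history entries into plain {"role", "parts"} dicts.

    Hypothetical helper; keeps text parts only.
    """
    out = []
    for msg in history:
        texts = [p.text for p in msg.parts if getattr(p, "text", "")]
        if texts:  # skip entries that carried no text at all
            out.append({"role": msg.role, "parts": texts})
    return out


# Stand-in for chat.history.
fake_history = [
    SimpleNamespace(role="user", parts=[SimpleNamespace(text="What is a covering index?")]),
    SimpleNamespace(role="model", parts=[SimpleNamespace(text="An index that satisfies a query alone.")]),
]

saved = json.dumps(history_to_dicts(fake_history))  # write this string to disk or a database
restored = json.loads(saved)                        # later: model.start_chat(history=restored)
print(restored[0]["role"], len(restored))
```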
Function calling
Gemini's function calling lets the model request calls to your Python functions by emitting structured arguments. Define tools as Python functions with type annotations and docstrings, pass them to the model, and execute the calls the model requests.
import google.generativeai as genai
import os
genai.configure(api_key=os.environ["GEMINI_API_KEY"])
def get_current_temperature(city: str) -> dict:
    """Return the current temperature for the given city."""
    # Stub: replace with a real weather API call
    data = {"London": 14, "Tokyo": 22, "New York": 18}
    return {"city": city, "temperature_celsius": data.get(city, 20)}
model = genai.GenerativeModel(
    "gemini-1.5-flash",
    tools=[get_current_temperature],
)
response = model.generate_content("What's the temperature in London right now?")
# The model may request a function call
for part in response.parts:
    if fn := part.function_call:
        result = get_current_temperature(**dict(fn.args))
        print("Called:", fn.name, dict(fn.args))
        print("Result:", result)
Output:
Called: get_current_temperature {'city': 'London'}
Result: {'city': 'London', 'temperature_celsius': 14}
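With more than one tool, the loop above needs to map the requested name to the right function. One way to do that, sketched here, is an explicit registry, so that only functions you have listed can ever be executed. run_function_calls and TOOLS are hypothetical names. The SimpleNamespace parts mimic the function_call.name and function_call.args attributes used above so the example runs offline.

```python
from types import SimpleNamespace


def get_current_temperature(city: str) -> dict:
    """Stub weather lookup, same as in the example above."""
    data = {"London": 14, "Tokyo": 22, "New York": 18}
    return {"city": city, "temperature_celsius": data.get(city, 20)}


# Explicit allow-list: tool name -> callable.
TOOLS = {"get_current_temperature": get_current_temperature}


def run_function_calls(parts, registry) -> list:
    """Execute each requested function call and return (name, result) pairs."""
    results = []
    for part in parts:
        fn = getattr(part, "function_call", None)
        if not fn:
            continue  # plain text part, nothing to execute
        if fn.name not in registry:
            raise KeyError(f"model requested unknown tool: {fn.name}")
        results.append((fn.name, registry[fn.name](**dict(fn.args))))
    return results


# Stand-ins for response.parts: one text part, one function call.
fake_parts = [
    SimpleNamespace(function_call=None, text="Let me check."),
    SimpleNamespace(function_call=SimpleNamespace(
        name="get_current_temperature", args={"city": "Tokyo"})),
]
print(run_function_calls(fake_parts, TOOLS))
```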
Safety settings
Safety settings control how aggressively the model filters content across four harm categories. Each category accepts a threshold from BLOCK_NONE to BLOCK_LOW_AND_ABOVE.
import google.generativeai as genai
from google.generativeai.types import HarmCategory, HarmBlockThreshold
import os
genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel(
    "gemini-1.5-flash",
    safety_settings={
        HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_ONLY_HIGH,
        HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_ONLY_HIGH,
        HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
        HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    },
)
response = model.generate_content("Explain how vaccines work.")
print(response.prompt_feedback) # shows block_reason if filtered
print(response.text)
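The last two lines above print response.text unconditionally, which raises if the request was filtered. Here is a minimal sketch of the guarded access recommended under Common pitfalls. safe_text is a hypothetical helper. It assumes an unblocked response has a falsy block_reason, and it uses SimpleNamespace stand-ins so it runs offline.

```python
from types import SimpleNamespace
from typing import Optional


def safe_text(response) -> Optional[str]:
    """Return response.text, or None if the prompt or all candidates were blocked."""
    feedback = getattr(response, "prompt_feedback", None)
    # Assumption: block_reason is falsy when the prompt was not blocked.
    if feedback is not None and getattr(feedback, "block_reason", None):
        print(f"Prompt blocked: {feedback.block_reason}")
        return None
    if not getattr(response, "candidates", None):
        print("No candidates returned")
        return None
    try:
        return response.text
    except ValueError as exc:
        print(f"No usable text: {exc}")
        return None


# Stand-ins for a blocked and a normal response.
blocked = SimpleNamespace(
    prompt_feedback=SimpleNamespace(block_reason="SAFETY"), candidates=[], text="")
ok = SimpleNamespace(
    prompt_feedback=SimpleNamespace(block_reason=None),
    candidates=[object()],
    text="Vaccines train the immune system.",
)
print(safe_text(blocked))
print(safe_text(ok))
```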
Embeddings
embed_content converts text to a dense vector for semantic search, clustering, or retrieval-augmented generation.
import google.generativeai as genai
import os
genai.configure(api_key=os.environ["GEMINI_API_KEY"])
texts = [
    "The capital of France is Paris.",
    "Python is a high-level programming language.",
    "The Eiffel Tower is located in Paris.",
]
result = genai.embed_content(
    model="models/text-embedding-004",
    content=texts,
    task_type="retrieval_document",
)
embeddings = result["embedding"]
print(f"Vectors: {len(embeddings)}, dimensions: {len(embeddings[0])}")
# Cosine similarity between first and third (both about Paris)
import numpy as np
def cosine(a, b):
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"Paris ↔ Eiffel: {cosine(embeddings[0], embeddings[2]):.4f}")
print(f"Paris ↔ Python: {cosine(embeddings[0], embeddings[1]):.4f}")
Output:
Vectors: 3, dimensions: 768
Paris ↔ Eiffel: 0.9231
Paris ↔ Python: 0.4102
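For semantic search, the same similarity measure is applied between one query vector and every document vector, then sorted. The sketch below shows that ranking step in pure Python. The 3-dimensional vectors are made-up toy values, not real embeddings, and rank is a hypothetical helper. In practice the query would be embedded with embed_content using task_type="retrieval_query".

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec, doc_vecs, texts, top_k=2):
    """Return the top_k (score, text) pairs, most similar first."""
    scored = sorted(
        ((cosine(query_vec, vec), text) for vec, text in zip(doc_vecs, texts)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return scored[:top_k]


texts = [
    "The capital of France is Paris.",
    "Python is a high-level programming language.",
    "The Eiffel Tower is located in Paris.",
]
# Toy vectors for illustration only; real embeddings have 768 dimensions here.
doc_vecs = [[0.9, 0.1, 0.2], [0.1, 0.9, 0.1], [0.8, 0.1, 0.5]]
query_vec = [0.7, 0.0, 0.6]  # pretend embedding of "Where is the Eiffel Tower?"

for score, text in rank(query_vec, doc_vecs, texts):
    print(text)
```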
File API — uploading large assets
The File API lets you upload files (images, video, audio, PDF) to Google's servers and reference them in prompts by URI. Files are automatically deleted after 48 hours.
import google.generativeai as genai
import os
genai.configure(api_key=os.environ["GEMINI_API_KEY"])
# Upload a PDF
file = genai.upload_file("report.pdf", mime_type="application/pdf")
print(f"Uploaded: {file.uri}")
model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content([
    file,
    "Summarise the key findings from this report in three bullet points.",
])
print(response.text)
# List uploaded files
for f in genai.list_files():
    print(f.name, f.uri, f.expiration_time)
# Delete when done
genai.delete_file(file.name)
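Because uploads disappear after 48 hours, a long-running job may want to check the expiry before reusing a stored file. A minimal sketch follows. needs_reupload is a hypothetical helper, and it assumes the expiration_time printed above is a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone


def needs_reupload(expiration_time: datetime,
                   now: datetime = None,
                   margin: timedelta = timedelta(hours=1)) -> bool:
    """True if the file has expired or will expire within the safety margin.

    Hypothetical helper; assumes expiration_time is timezone-aware.
    """
    now = now or datetime.now(timezone.utc)
    return expiration_time - now <= margin


# Demo with fixed timestamps: a file uploaded at midnight UTC expires 48 hours later.
uploaded = datetime(2025, 1, 1, tzinfo=timezone.utc)
expires = uploaded + timedelta(hours=48)
print(needs_reupload(expires, now=uploaded + timedelta(hours=10)))
print(needs_reupload(expires, now=uploaded + timedelta(hours=47, minutes=30)))
```

The margin leaves room for a request that starts just before the deadline and would otherwise fail mid-flight.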
Async client
GenerativeModel exposes awaitable counterparts of its methods, such as generate_content_async, for async applications (FastAPI, asyncio pipelines).
import asyncio
import google.generativeai as genai
import os
genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")
async def ask(prompt: str) -> str:
    response = await model.generate_content_async(prompt)
    return response.text
async def main():
    prompts = [
        "What is a transformer architecture?",
        "What is attention mechanism?",
        "What is tokenisation?",
    ]
    results = await asyncio.gather(*[ask(p) for p in prompts])
    for prompt, result in zip(prompts, results):
        print(f"Q: {prompt}\nA: {result[:80]}...\n")
asyncio.run(main())
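asyncio.gather launches every request at once, which can run into rate limits when the prompt list is long. One common remedy, sketched here, is to cap the number of in-flight calls with asyncio.Semaphore. gather_limited is a hypothetical helper, and fake_ask stands in for the ask() coroutine above so the example runs offline.

```python
import asyncio


async def gather_limited(ask, prompts, limit: int = 2) -> list:
    """Run ask(prompt) for every prompt with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(prompt):
        async with semaphore:
            return await ask(prompt)

    # gather preserves input order regardless of completion order.
    return await asyncio.gather(*(guarded(p) for p in prompts))


async def fake_ask(prompt: str) -> str:
    # Stand-in for a real model call.
    await asyncio.sleep(0.01)
    return f"answer to: {prompt}"


results = asyncio.run(gather_limited(fake_ask, ["a", "b", "c"], limit=2))
print(results)
```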
Quick reference
| Task | Code |
|---|---|
| Configure | genai.configure(api_key=os.environ["GEMINI_API_KEY"]) |
| Text generation | model.generate_content("prompt") |
| Streaming | model.generate_content("prompt", stream=True) → iterate |
| Chat session | chat = model.start_chat() then chat.send_message("msg") |
| Multimodal | model.generate_content([image, "describe this"]) |
| Upload file | genai.upload_file("path", mime_type="...") |
| Function calling | pass tools=[fn] to GenerativeModel |
| Embeddings | genai.embed_content(model="...", content=texts) |
| JSON output | generation_config={"response_mime_type": "application/json"} |
| Async | await model.generate_content_async("prompt") |
| List models | genai.list_models() |
| List files | genai.list_files() |
| Safety settings | safety_settings={HarmCategory.X: HarmBlockThreshold.Y} |