cheat sheet

google-generativeai

Call Google's Gemini models from Python for text, multimodal, streaming, chat, function calling, and embeddings. Covers model configuration, safety settings, the File API, and async usage.

updated 04-27-2026

google-generativeai — Gemini SDK

What it is

google-generativeai is Google's original official Python SDK for the Gemini family of models (Gemini 1.5 Pro, Gemini 1.5 Flash, Gemini 2.0, and others). It provides synchronous and asynchronous methods for text generation, multimodal input (text, images, video, audio, PDF), chat sessions, function calling, embeddings, and the File API for uploading large assets. The SDK handles authentication, retries, and response streaming. Google has since published a successor package, google-genai, which it recommends for new projects; this sheet covers the google-generativeai API.

Install

bash

pip install google-generativeai

Output: pip's normal install log; the command exits 0 on success.

Quick example

python

import google.generativeai as genai
import os

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content("Explain gradient descent in two sentences.")
print(response.text)

Output:

text

Gradient descent is an optimization algorithm that iteratively adjusts model parameters
by moving them in the direction that reduces the loss function. Each step is proportional
to the negative gradient of the loss with respect to the parameters.

When / why to use it

Accessing Gemini models directly without a framework layer (LangChain, etc.).
Multimodal tasks: analysing images, PDFs, or videos with a single API call.
Long-context tasks — Gemini 1.5 Pro supports up to 2M tokens.
Streaming responses for chat interfaces.
Embedding text for semantic search or clustering.
Function calling / tool use to connect the model to external APIs.

Common pitfalls

API key exposure — never hard-code api_key= in source files. Always load from environment variables or a secrets manager. The GEMINI_API_KEY variable is the conventional name.

Safety blocks: when the prompt itself is filtered, generate_content returns a response whose prompt_feedback.block_reason is set (ChatSession.send_message raises BlockedPromptException instead). Check response.prompt_feedback before accessing response.text.

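response.text raises when there is no text: if the prompt was blocked, or the candidate stopped without producing text parts (for example with finish_reason SAFETY), accessing response.text raises ValueError. Inspect response.candidates and each candidate's finish_reason instead.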
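The following is a minimal defensive accessor that returns None instead of raising when nothing usable came back. It assumes only the attributes read here exist on the response: prompt_feedback.block_reason, candidates, content.parts, and part.text. `safe_text` is a hypothetical helper name and is not part of the SDK.

```python
from typing import Optional


def safe_text(response) -> Optional[str]:
    """Return the text of the first candidate that has any, or None.

    Hypothetical helper: avoids the ValueError that response.text raises
    when the prompt was blocked or no candidate produced text parts.
    """
    feedback = getattr(response, "prompt_feedback", None)
    # block_reason is an enum whose "unspecified" value is 0, which is falsy
    if feedback is not None and feedback.block_reason:
        return None
    for candidate in response.candidates:
        # Keep only parts that actually carry text (skips e.g. function calls)
        texts = [part.text for part in candidate.content.parts if getattr(part, "text", "")]
        if texts:
            return "".join(texts)
    return None
```

Call it as `text = safe_text(model.generate_content(prompt))`. When it returns None, look at response.prompt_feedback and response.candidates[0].finish_reason for the reason.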

File API files expire: files uploaded via genai.upload_file() are deleted after 48 hours. Store the file's name or URI to reuse it across sessions within that window, and re-upload it once it has expired.

Use gemini-1.5-flash for low-latency tasks and gemini-1.5-pro for complex reasoning or very long contexts. Flash is roughly an order of magnitude cheaper per token and faster; Pro handles nuance better.

Set generation_config={"response_mime_type": "application/json"} to have the model return JSON text without defining a schema, then parse it with json.loads(response.text). For simple extraction this is usually lighter-weight than function calling.
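The sketch below shows JSON mode using the config key above. Without a schema, JSON mode aims for syntactically valid JSON but does not enforce particular keys, so the parser also checks that the requested keys are present. `parse_json_reply`, `extract_invoice_fields`, and the vendor/total keys are hypothetical names for illustration. The model call is only defined here, and the SDK is imported lazily so the parser can be used offline.

```python
import json

# Generation config enabling JSON output (no schema); temperature 0 for steadier extraction
JSON_CONFIG = {"response_mime_type": "application/json", "temperature": 0.0}


def parse_json_reply(text: str, required_keys=()) -> dict:
    """Parse a JSON-mode reply and check that the expected top-level keys exist."""
    data = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object, got {type(data).__name__}")
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"reply is missing keys: {missing}")
    return data


def extract_invoice_fields(document_text: str) -> dict:
    """Hypothetical extraction call; assumes genai.configure() has already run."""
    import google.generativeai as genai  # lazy import so the parser works offline

    model = genai.GenerativeModel("gemini-1.5-flash", generation_config=JSON_CONFIG)
    reply = model.generate_content(
        "Return a JSON object with keys 'vendor' and 'total' for this invoice:\n"
        + document_text
    )
    return parse_json_reply(reply.text, required_keys=("vendor", "total"))
```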

Richer example — multimodal image analysis

python

import google.generativeai as genai
import PIL.Image
import os

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

model = genai.GenerativeModel("gemini-1.5-flash")
image = PIL.Image.open("chart.png")

response = model.generate_content([
    image,
    "Describe the key trends visible in this chart in bullet points.",
])
print(response.text)

Output:

text

• Revenue grew 42% YoY from Q1 2025 to Q1 2026.
• North America remains the largest region at 58% of total.
• Asia-Pacific showed the steepest growth trajectory at +67% YoY.
• Operating margins compressed slightly from 31% to 28%.
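PIL images are one input form. generate_content also accepts inline data as a dict with mime_type and data keys. The sketch below assumes that dict form. `image_part` is a hypothetical helper that infers the MIME type from the file extension, so several images can be sent in one prompt without PIL.

```python
import mimetypes
from pathlib import Path


def image_part(path) -> dict:
    """Build an inline-data part from an image file on disk (hypothetical helper)."""
    mime_type, _ = mimetypes.guess_type(str(path))
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type for {path!r}")
    return {"mime_type": mime_type, "data": Path(path).read_bytes()}


# Usage (assumes genai.configure() has run and both files exist):
# model = genai.GenerativeModel("gemini-1.5-flash")
# response = model.generate_content([
#     image_part("q1_chart.png"),
#     image_part("q2_chart.png"),
#     "Compare these two charts and list what changed.",
# ])
```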

configure() and model selection

genai.configure sets the API key (and optional connection settings such as transport="rest" or client_options) for all subsequent calls in the process. Models are instantiated per task; Gemini 1.5 Flash is the default workhorse.

python

import google.generativeai as genai
import os

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

# List available models
for m in genai.list_models():
    if "generateContent" in m.supported_generation_methods:
        print(m.name)

Output (sample):

text

models/gemini-1.5-flash
models/gemini-1.5-flash-8b
models/gemini-1.5-pro
models/gemini-2.0-flash
models/gemini-2.0-pro

generation_config — controlling output

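GenerationConfig wraps the sampling and output-length parameters. Pass it at model construction, as below, or per call via generate_content(..., generation_config=...); a plain dict with the same keys also works.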

python

import google.generativeai as genai
import os

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

model = genai.GenerativeModel(
    "gemini-1.5-flash",
    generation_config=genai.GenerationConfig(
        temperature=0.4,
        top_p=0.95,
        top_k=40,
        max_output_tokens=512,
        stop_sequences=["END"],
    ),
)

response = model.generate_content("Write a haiku about distributed systems.")
print(response.text)

Output:

text

Nodes share a secret,
Latency hides in the gap—
Consensus is slow.

Streaming

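Pass stream=True to receive the response incrementally. Iterating over the returned object yields partial GenerateContentResponse chunks as they arrive; a chunk can hold several tokens, so don't assume one token per chunk.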

python

import google.generativeai as genai
import os

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")

response = model.generate_content(
    "Explain the CAP theorem in detail.",
    stream=True,
)

for chunk in response:
    print(chunk.text, end="", flush=True)
print()  # final newline
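Some chunks may carry no text, for example a blocked or metadata-only chunk. Reading chunk.text on such a chunk raises ValueError, which would stop the loop above. The sketch below skips those chunks and returns the assembled text. `stream_text` is a hypothetical helper name; the write callback defaults to printing without a newline, as in the loop above.

```python
def _print_piece(piece: str) -> None:
    print(piece, end="", flush=True)


def stream_text(chunks, write=_print_piece) -> str:
    """Write each text chunk as it arrives and return the full text (hypothetical helper)."""
    pieces = []
    for chunk in chunks:
        try:
            piece = chunk.text
        except ValueError:
            # No text parts in this chunk (e.g. blocked or metadata-only); skip it
            continue
        pieces.append(piece)
        write(piece)
    return "".join(pieces)


# Usage with the streaming call above:
# full = stream_text(model.generate_content("Explain the CAP theorem in detail.", stream=True))
```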

Chat sessions — start_chat

start_chat creates a stateful session that accumulates conversation history automatically. Each call to send_message appends the exchange to the session's history.

python

import google.generativeai as genai
import os

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")

chat = model.start_chat(history=[
    {"role": "user",  "parts": ["You are a terse database expert."]},
    {"role": "model", "parts": ["Understood. Ask me anything about databases."]},
])

response = chat.send_message("What is a covering index?")
print(response.text)

response = chat.send_message("When should I avoid one?")
print(response.text)

# Inspect the full history
for msg in chat.history:
    print(f"{msg.role}: {msg.parts[0].text[:60]}...")

Output:

text

A covering index includes all columns a query needs, eliminating the table lookup entirely.

Avoid them when the index size is large and write throughput matters—covering indexes slow INSERT/UPDATE.

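user: You are a terse database expert....
model: Understood. Ask me anything about databases....
user: What is a covering index?...
model: A covering index includes all columns a query needs, elimina...
user: When should I avoid one?...
model: Avoid them when the index size is large and write throughput...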
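start_chat(history=...) accepts the same role/parts dictionaries used above, so a session's text can be saved and restored between runs. The sketch below assumes that. The helper names are hypothetical, and non-text parts such as images and function calls are dropped, so it suits plain text conversations only.

```python
import json


def history_to_dicts(history) -> list:
    """Convert chat.history entries to {"role", "parts"} dicts holding text only."""
    turns = []
    for content in history:
        texts = [part.text for part in content.parts if getattr(part, "text", "")]
        if texts:  # skip turns with no text, e.g. pure function-call turns
            turns.append({"role": content.role, "parts": texts})
    return turns


def save_history(history, path: str) -> None:
    with open(path, "w", encoding="utf-8") as f:
        json.dump(history_to_dicts(history), f, ensure_ascii=False, indent=2)


def load_history(path: str) -> list:
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Usage:
# save_history(chat.history, "chat.json")
# chat = model.start_chat(history=load_history("chat.json"))
```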

Function calling

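Gemini's function calling lets the model request calls to your Python functions by emitting a structured function call (a name plus arguments). Define tools as Python functions with type-annotated parameters and a docstring, which the SDK turns into function declarations. Pass them to the model via tools=, then execute the calls the model requests.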

python

import google.generativeai as genai
import os

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

def get_current_temperature(city: str) -> dict:
    """Return the current temperature for the given city."""
    # Stub — replace with a real weather API call
    data = {"London": 14, "Tokyo": 22, "New York": 18}
    return {"city": city, "temperature_celsius": data.get(city, 20)}

model = genai.GenerativeModel(
    "gemini-1.5-flash",
    tools=[get_current_temperature],
)

response = model.generate_content("What's the temperature in London right now?")

# The model may request a function call
for part in response.parts:
    if fn := part.function_call:
        result = get_current_temperature(**dict(fn.args))
        print("Called:", fn.name, dict(fn.args))
        print("Result:", result)

Output:

text

Called: get_current_temperature {'city': 'London'}
Result: {'city': 'London', 'temperature_celsius': 14}
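The loop above calls get_current_temperature no matter which function the model named. With several tools it is safer to dispatch on fn.name. The sketch below uses a hypothetical registry (`TOOLS`) and helper (`run_function_calls`); unknown names are reported rather than executed.

```python
def get_current_temperature(city: str) -> dict:
    """Return the current temperature for the given city."""
    # Stub data, as in the example above
    data = {"London": 14, "Tokyo": 22, "New York": 18}
    return {"city": city, "temperature_celsius": data.get(city, 20)}


# Hypothetical registry: function name -> Python callable
TOOLS = {fn.__name__: fn for fn in [get_current_temperature]}


def run_function_calls(parts, registry: dict) -> list:
    """Execute every function call found in response parts via the registry."""
    results = []
    for part in parts:
        fn = getattr(part, "function_call", None)
        if not fn:  # text parts have no (or an empty) function_call
            continue
        impl = registry.get(fn.name)
        if impl is None:
            results.append({"name": fn.name, "error": f"unknown function {fn.name!r}"})
            continue
        results.append({"name": fn.name, "response": impl(**dict(fn.args))})
    return results


# Usage:
# model = genai.GenerativeModel("gemini-1.5-flash", tools=list(TOOLS.values()))
# response = model.generate_content("What's the temperature in Tokyo?")
# print(run_function_calls(response.parts, TOOLS))
```

To turn these results into a natural-language answer, they have to be sent back to the model as function responses. google-generativeai can run that loop for you via model.start_chat(enable_automatic_function_calling=True), which calls the functions passed in tools= and feeds their results back.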

Safety settings

Safety settings control how aggressively the model filters content in the four harm categories shown below. Each category takes a HarmBlockThreshold, from most to least permissive: BLOCK_NONE, BLOCK_ONLY_HIGH, BLOCK_MEDIUM_AND_ABOVE, BLOCK_LOW_AND_ABOVE.

python

import google.generativeai as genai
from google.generativeai.types import HarmCategory, HarmBlockThreshold
import os

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

model = genai.GenerativeModel(
    "gemini-1.5-flash",
    safety_settings={
        HarmCategory.HARM_CATEGORY_HATE_SPEECH:      HarmBlockThreshold.BLOCK_ONLY_HIGH,
        HarmCategory.HARM_CATEGORY_HARASSMENT:       HarmBlockThreshold.BLOCK_ONLY_HIGH,
        HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
        HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    },
)

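response = model.generate_content("Explain how vaccines work.")
print(response.prompt_feedback)  # block_reason is set if the prompt was filtered

# response.text raises ValueError if the prompt or the candidate was blocked
if response.candidates and response.candidates[0].content.parts:
    print(response.text)
else:
    print("No text returned; finish_reason:",
          response.candidates[0].finish_reason if response.candidates else None)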
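Besides block_reason, each candidate carries safety_ratings. Each rating has a harm category and a probability (NEGLIGIBLE, LOW, MEDIUM, HIGH). The sketch below logs which categories reached a given level, assuming those enum member names. `flagged_categories` is a hypothetical helper.

```python
PROBABILITY_ORDER = ["NEGLIGIBLE", "LOW", "MEDIUM", "HIGH"]


def flagged_categories(safety_ratings, min_level: str = "MEDIUM") -> list:
    """Return (category, probability) name pairs at or above min_level."""
    threshold = PROBABILITY_ORDER.index(min_level)
    flagged = []
    for rating in safety_ratings:
        level = rating.probability.name
        # Ignore HARM_PROBABILITY_UNSPECIFIED and any values not listed above
        if level in PROBABILITY_ORDER and PROBABILITY_ORDER.index(level) >= threshold:
            flagged.append((rating.category.name, level))
    return flagged


# Usage:
# if response.candidates:
#     print(flagged_categories(response.candidates[0].safety_ratings))
```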

Embeddings

embed_content converts text to a dense vector for semantic search, clustering, or retrieval-augmented generation.

python

import google.generativeai as genai
import numpy as np
import os

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

texts = [
    "The capital of France is Paris.",
    "Python is a high-level programming language.",
    "The Eiffel Tower is located in Paris.",
]

# Passing a list embeds every string; result["embedding"] holds one vector per input
result = genai.embed_content(
    model="models/text-embedding-004",
    content=texts,
    task_type="retrieval_document",
)

embeddings = result["embedding"]
print(f"Vectors: {len(embeddings)}, dimensions: {len(embeddings[0])}")

def cosine(a, b):
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# First sentence vs. the third (both about Paris) and vs. the second (Python)
print(f"Paris ↔ Eiffel: {cosine(embeddings[0], embeddings[2]):.4f}")
print(f"Paris ↔ Python: {cosine(embeddings[0], embeddings[1]):.4f}")

Output:

text

Vectors: 3, dimensions: 768
Paris ↔ Eiffel: 0.9231
Paris ↔ Python: 0.4102
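For retrieval, queries are typically embedded with task_type="retrieval_query" and compared against the retrieval_document vectors. The sketch below ranks documents in plain Python, without NumPy. `embed_query` and `rank_documents` are hypothetical helpers, and the model name is the one used above.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_documents(query_vec, doc_vecs, top_k: int = 3) -> list:
    """Return (doc_index, score) pairs, best match first."""
    scores = [cosine(query_vec, vec) for vec in doc_vecs]
    order = sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:top_k]]


def embed_query(text: str) -> list:
    """Embed a search query; assumes genai.configure() has already run."""
    import google.generativeai as genai  # lazy import so ranking works offline

    result = genai.embed_content(
        model="models/text-embedding-004",
        content=text,
        task_type="retrieval_query",
    )
    return result["embedding"]


# Usage with the document embeddings from the example above:
# for i, score in rank_documents(embed_query("Where is the Eiffel Tower?"), embeddings):
#     print(f"{score:.3f}  {texts[i]}")
```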

File API — uploading large assets

The File API lets you upload files (images, video, audio, PDF) to Google's servers and reference them in prompts by URI. Files are automatically deleted after 48 hours.

python

import google.generativeai as genai
import os

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

# Upload a PDF
file = genai.upload_file("report.pdf", mime_type="application/pdf")
print(f"Uploaded: {file.uri}")

model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content([
    file,
    "Summarise the key findings from this report in three bullet points.",
])
print(response.text)

# List uploaded files
for f in genai.list_files():
    print(f.name, f.uri, f.expiration_time)

# Delete when done
genai.delete_file(file.name)
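Large uploads, video in particular, are processed asynchronously. The returned file starts in the PROCESSING state and should be ACTIVE before it is used in a prompt. The polling sketch below assumes the file object exposes state.name and name, and that genai.get_file(name) re-fetches it. `wait_until_active` is a hypothetical helper; get_file and sleep are injectable so it can be exercised offline.

```python
import time


def wait_until_active(file, get_file, poll_seconds: float = 5.0,
                      timeout_seconds: float = 600.0, sleep=time.sleep):
    """Poll an uploaded file until it leaves PROCESSING; return it once ACTIVE."""
    waited = 0.0
    while file.state.name == "PROCESSING":
        if waited >= timeout_seconds:
            raise TimeoutError(f"{file.name} still processing after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
        file = get_file(file.name)  # re-fetch to see the updated state
    if file.state.name != "ACTIVE":
        raise RuntimeError(f"{file.name} ended in state {file.state.name}")
    return file


# Usage (assumes genai.configure() has run):
# video = wait_until_active(genai.upload_file("talk.mp4"), genai.get_file)
# response = genai.GenerativeModel("gemini-1.5-pro").generate_content(
#     [video, "Summarise this talk."])
```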

Async client

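GenerativeModel also provides async variants of its calls, such as generate_content_async (and send_message_async on chat sessions). Await them in async applications (FastAPI, asyncio pipelines).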

python

import asyncio
import google.generativeai as genai
import os

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")

async def ask(prompt: str) -> str:
    response = await model.generate_content_async(prompt)
    return response.text

async def main():
    prompts = [
        "What is a transformer architecture?",
        "What is attention mechanism?",
        "What is tokenisation?",
    ]
    results = await asyncio.gather(*[ask(p) for p in prompts])
    for prompt, result in zip(prompts, results):
        print(f"Q: {prompt}\nA: {result[:80]}...\n")

asyncio.run(main())
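asyncio.gather fires every request at once, which can hit per-minute rate limits on long prompt lists. The sketch below caps in-flight requests with asyncio.Semaphore. `gather_limited` is a hypothetical helper, and the default limit of 4 is an arbitrary choice, not an API quota.

```python
import asyncio


async def gather_limited(prompts, ask, limit: int = 4) -> list:
    """Run ask(prompt) for every prompt with at most `limit` calls in flight.

    Results come back in the same order as `prompts`, like asyncio.gather.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(prompt):
        async with semaphore:
            return await ask(prompt)

    return await asyncio.gather(*(guarded(p) for p in prompts))


# Usage inside main() from the example above:
# results = await gather_limited(prompts, ask, limit=4)
```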

Quick reference

Task	Code
Configure	`genai.configure(api_key=os.environ["GEMINI_API_KEY"])`
Text generation	`model.generate_content("prompt")`
Streaming	`model.generate_content("prompt", stream=True)` → iterate
Chat session	`chat = model.start_chat()` then `chat.send_message("msg")`
Multimodal	`model.generate_content([image, "describe this"])`
Upload file	`genai.upload_file("path", mime_type="...")`
Function calling	pass `tools=[fn]` to `GenerativeModel`
Embeddings	`genai.embed_content(model="...", content=texts)`
JSON output	`generation_config={"response_mime_type": "application/json"}`
Async	`await model.generate_content_async("prompt")`
List models	`genai.list_models()`
List files	`genai.list_files()`
Safety settings	`safety_settings={HarmCategory.X: HarmBlockThreshold.Y}`