cheat sheet

autogen-agentchat

Package-level reference for the autogen-agentchat / autogen-core / autogen-ext family on PyPI plus the legacy pyautogen — install, rename history, versioning, and alternatives.

updated 05-31-2026

What it is

autogen-agentchat is the high-level multi-agent SDK from the AutoGen v0.4 redesign — Microsoft's framework for building systems where multiple AI agents converse to complete tasks. In v0.4, AutoGen was split into a layered family of packages: a low-level message-passing runtime, a high-level agent/team SDK, and an extensions package for model-client and tool integrations.

A note on naming: the v0.2 monolith was published as pyautogen on PyPI. The v0.4 rewrite uses autogen-agentchat, autogen-core, and autogen-ext — the bare autogen name on PyPI has a complicated history and is not the canonical v0.4 install. Use the split packages.

Install

bash

pip install autogen-agentchat "autogen-ext[openai]"

Output: the standard v0.4 stack with OpenAI model client

bash

pip install autogen-agentchat "autogen-ext[anthropic]"

Output: v0.4 with Anthropic Claude support

bash

pip install autogen-agentchat "autogen-ext[azure]"

Output: v0.4 with Azure OpenAI

bash

uv add autogen-agentchat "autogen-ext[openai]"

Output: dependencies resolved + added to pyproject.toml

bash

pip install pyautogen     # ← legacy v0.2 only; do not mix with v0.4

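Output: installs the older monolith API, which is entirely different from v0.4. Recent pyautogen releases on PyPI are published by the AG2 fork, and Microsoft's own 0.2 line is also installable as autogen-agentchat~=0.2, so pin the version you actually mean.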

Versioning & Python support

v0.2 (pyautogen) — original AutoGen, single package, 0.2.x line. Still receives bug fixes but feature work has moved on. Tutorials from 2023–early-2024 target this API.
v0.4+ (autogen-agentchat family): the current redesigned SDK, starting at 0.4.x and continuing through later 0.x releases. Different abstractions: AssistantAgent, UserProxyAgent, RoundRobinGroupChat, SelectorGroupChat, Swarm, all running on the async, event-driven runtime in autogen-core.
The two APIs are NOT compatible — code written for pyautogen does not run on autogen-agentchat. The framework redesign was a clean break.
Python 3.10+ for v0.4 (the runtime relies on modern asyncio features). v0.2 supported 3.8+.
Pre-1.0 on both lines — pin tightly.
Microsoft's mainline lives at microsoft/autogen on GitHub. A community fork lives at ag2ai/ag2 and is released as ag2 on PyPI. Both originated from the same v0.2 codebase and have since diverged. If a tutorial references ag2, it targets the community fork, not the Microsoft mainline.
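
Because the legacy package, the fork, and the v0.4 family should not be mixed, a quick environment check can catch a bad install early. The following is a minimal sketch using only the standard library; the grouping into LEGACY and MODERN sets and the helper names are assumptions made for illustration, not part of any AutoGen package.

```python
from importlib import metadata

# Distribution names come from the text above; treating these two groups as
# mutually exclusive is this sketch's assumption.
LEGACY = {"pyautogen", "ag2", "autogen"}
MODERN = {"autogen-agentchat", "autogen-core", "autogen-ext"}


def installed_versions(names):
    """Return {name: version} for the distributions that are actually installed."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return found


def find_conflicts(installed):
    """Report combinations the text warns against, given a {name: version} mapping."""
    problems = []
    legacy = sorted(LEGACY.intersection(installed))
    modern = sorted(MODERN.intersection(installed))
    if legacy and modern:
        problems.append(f"legacy {legacy} installed next to v0.4 packages {modern}")
    if len(legacy) > 1:
        problems.append(f"more than one legacy or fork distribution installed: {legacy}")
    return problems


if __name__ == "__main__":
    report = find_conflicts(installed_versions(LEGACY | MODERN))
    for line in report or ["no conflicting AutoGen distributions found"]:
        print(line)
```

Running this in CI before the test suite is one way to make "pin tightly" enforceable rather than advisory.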

Package metadata

Maintainer: Microsoft (the microsoft/autogen repo)
Project home: github.com/microsoft/autogen
Docs: microsoft.github.io/autogen
PyPI: pypi.org/project/autogen-agentchat
License: MIT (Apache-2.0 / CC-BY-4.0 in places — check each subpackage)
Governance: Microsoft Research + open contribution; the ag2 fork is community-led
First released: v0.2 in 2023; v0.4 family in late 2024
Downloads: millions per month across the family

Optional dependencies & extras

The v0.4 family is layered. Pick packages from the appropriate layer:

Package	Layer	Purpose
`autogen-core`	Foundation	Async message-passing runtime, agent base classes. Used by everything else.
`autogen-agentchat`	High-level	The "AutoGen API" most users want — assistants, group chats, teams.
`autogen-ext[openai]`	Extensions	OpenAI / Azure OpenAI `ChatCompletionClient`
`autogen-ext[anthropic]`	Extensions	Anthropic Claude model client
`autogen-ext[azure]`	Extensions	Azure-specific clients
`autogen-ext[docker]`	Extensions	Docker-backed code execution sandbox
`autogen-ext[web-surfer]`	Extensions	Headless-browser web-browsing tool
`autogen-ext[file-surfer]`	Extensions	File-system browsing tool
`autogen-studio`	UI	Optional graphical workbench for building/debugging teams

The autogen-ext package is the catch-all for tool and model-client integrations — install with one or more extras corresponding to what you need.

Alternatives

Package	Trade-off
`crewai`	Roles + tasks + crews abstraction; YAML-driven; more opinionated. Pythonic alternative.
`langgraph`	Stateful graph-based agents from the LangChain team. Lower-level than agentchat; finer control.
`llama-index` (agents)	Agent abstractions on top of LlamaIndex's retrieval stack. RAG-first orientation.
`swarm` (OpenAI)	OpenAI's experimental lightweight multi-agent library. Smaller, OpenAI-only.
`openai-agents`	OpenAI's newer Agents SDK. Tight OpenAI integration.
Custom orchestration	A loop + provider SDK is often enough for simple 2-agent setups; frameworks earn their keep at 4+ agents.

Common gotchas

v0.2 vs v0.4 are different products. Code from a 2023 AutoGen tutorial does not run on autogen-agentchat. Check the import line — from autogen import ... is v0.2 (pyautogen); from autogen_agentchat.agents import ... is v0.4.
Multi-agent message passing needs the model_client interface in v0.4. You don't pass llm_config dicts anymore — you instantiate a ChatCompletionClient from autogen-ext and hand it to the agents.
Microsoft research repos vs Microsoft GitHub mainline. Earlier AutoGen experiments lived under microsoft/autogen research branches; the current mainline is the v0.4 rewrite. Tutorials and blog posts often reference an older branch state — pin to the latest stable.
ag2 fork on PyPI. The ag2 package is a community fork. Mostly compatible with Microsoft mainline but diverges over time; don't install both.
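Async-first runtime. v0.4 agents run on asyncio. Both run() (a coroutine) and run_stream() (an async generator) need a running event loop: calling run() without await produces "coroutine was never awaited" warnings, and blocking calls inside a tool or handler can stall every agent on the loop.
Code-execution sandbox is opt-in. autogen-ext[docker] installs the Docker-backed executor, but Docker must actually be running on the host. The alternative, LocalCommandLineCodeExecutor, runs generated code directly on the host and is less safe, so choose the executor explicitly rather than relying on whatever a copied snippet happens to use.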
Tool registration shape changed. v0.2 tools were registered via decorators on a UserProxyAgent; v0.4 uses Python callables passed into AssistantAgent(tools=[...]). Don't paste v0.2 tool snippets into v0.4 agents.
AutoGen Studio is a separate UI, not a runtime requirement. Skip it for CI/headless deployments.

Performance tuning

Multi-agent throughput is dominated by total LLM-call latency × turn count. The levers are:

Reduce turn count. A well-prompted system message and a clear termination condition cut turns dramatically. A max_turns setting (for example max_turns=20) is a ceiling, not a target.
Parallel tool calls. If multiple tool calls in one turn are independent, batch them — model providers increasingly support parallel function calls.
Lighter model for routing. A SelectorGroupChat dispatcher running on a small, fast model adds far less latency per turn than routing with the flagship model used by the workers.
Streaming for time-to-first-token. Use run_stream and surface partial messages to users; perceived latency improves even when total time doesn't.
Cold-start vs warm-start. Connection pool warm-up for the model client matters — keep clients alive across requests.
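Tool latency. Slow tools (web scraping, database queries) dominate. Cache results aggressively and cap result sizes.
Cancellation discipline. run_stream returns an async generator, so the consumer can stop reading early when an agent emits a clear "done" signal; a termination condition or cancellation token is the cleaner way to stop the team itself.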
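
The tool-latency advice (cache aggressively, cap result sizes) can be sketched as a decorator around an async tool. This is an illustrative standard-library sketch, not an AutoGen feature: cached_tool, slow_lookup, and the stats counter are hypothetical names, and it assumes tool arguments are hashable and results are strings.

```python
import asyncio
import functools
import time


def cached_tool(ttl_seconds=300.0, max_chars=2000):
    """Wrap an async tool so repeated calls reuse results and long results are truncated."""
    def decorator(func):
        cache = {}  # maps call arguments -> (expiry time, truncated result)

        @functools.wraps(func)  # keeps the name, docstring and annotations visible
        async def wrapper(*args, **kwargs):
            key = (args, tuple(sorted(kwargs.items())))
            now = time.monotonic()
            hit = cache.get(key)
            if hit is not None and hit[0] > now:
                return hit[1]
            result = (await func(*args, **kwargs))[:max_chars]
            cache[key] = (now + ttl_seconds, result)
            return result

        return wrapper
    return decorator


stats = {"calls": 0}  # counts how often the underlying tool really runs


@cached_tool(ttl_seconds=60, max_chars=20)
async def slow_lookup(query: str) -> str:
    """Hypothetical slow tool standing in for a scraper or database query."""
    stats["calls"] += 1
    await asyncio.sleep(0.01)
    return f"result for {query} " + "x" * 100


async def demo():
    first = await slow_lookup(query="tokyo")
    second = await slow_lookup(query="tokyo")  # served from the cache
    return first, second


first, second = asyncio.run(demo())
```

The truncation happens before caching, so a cached entry never costs more prompt tokens than a fresh one.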

Evaluation & observability

Multi-agent runs are nearly impossible to debug without traces. Set up observability before you build the second agent.

Trace every model call + tool call. langsmith and OpenTelemetry both work — pick one and instrument the model client + tool wrappers.
Per-run metrics: total turns, total tokens, total cost, success / failure / abort outcome.
Per-agent metrics: turn count, average tokens per turn, time per turn — identifies the agent eating the budget.
Trajectory replay. Save the full message history for failed runs; replay locally to reproduce failures.
A/B on team composition. Swap agents in / out, run the same task suite, compare outcomes.
Custom evaluators. Did the final answer match a gold output? Did the team terminate on the expected condition? Standard LLM eval frameworks (DeepEval, ragas) work for end-of-run scoring; track per-turn metrics separately.
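
The per-run and per-agent metrics listed above reduce to a small aggregation over recorded turns. A minimal sketch, assuming the tracing layer can emit one record per agent turn; the TurnRecord shape and summarize function are hypothetical and not tied to any tracing product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TurnRecord:
    """One agent turn as captured by a tracing layer (hypothetical shape)."""
    agent: str
    prompt_tokens: int
    completion_tokens: int
    seconds: float


def summarize(turns):
    """Aggregate per-run totals and per-agent averages from a list of TurnRecord."""
    per_agent = defaultdict(lambda: {"turns": 0, "tokens": 0, "seconds": 0.0})
    for t in turns:
        row = per_agent[t.agent]
        row["turns"] += 1
        row["tokens"] += t.prompt_tokens + t.completion_tokens
        row["seconds"] += t.seconds
    for row in per_agent.values():
        row["avg_tokens_per_turn"] = row["tokens"] / row["turns"]
    return {
        "total_turns": len(turns),
        "total_tokens": sum(r["tokens"] for r in per_agent.values()),
        "per_agent": dict(per_agent),
    }


sample = [
    TurnRecord("writer", 100, 50, 1.2),
    TurnRecord("critic", 300, 20, 0.8),
    TurnRecord("writer", 200, 50, 1.5),
]
report = summarize(sample)
```

Sorting report["per_agent"] by tokens is usually enough to spot the agent eating the budget.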

Real-world recipes

The v0.4 architecture organises around AssistantAgent, Tool, and team types (RoundRobinGroupChat, SelectorGroupChat, Swarm). Recipes below assume the modern stack.

Recipe: single assistant with tools

python

import asyncio
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.ui import Console
from autogen_ext.models.openai import OpenAIChatCompletionClient

async def get_weather(city: str) -> str:
    """Get current weather for a city."""
    return f"sunny, 22C in {city}"

async def main():
    model = OpenAIChatCompletionClient(model="gpt-4o-mini")
    agent = AssistantAgent(
        name="weather_bot",
        model_client=model,
        tools=[get_weather],
        system_message="You answer weather questions concisely.",
    )
    await Console(agent.run_stream(task="What's the weather in Tokyo?"))

asyncio.run(main())

Output: the model issues a get_weather call, observes the result, replies with a one-sentence answer.

Recipe: round-robin group chat

python

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.conditions import TextMentionTermination
from autogen_ext.models.openai import OpenAIChatCompletionClient

model = OpenAIChatCompletionClient(model="gpt-4o-mini")

writer = AssistantAgent("writer", model_client=model,
    system_message="You draft short marketing copy.")
critic = AssistantAgent("critic", model_client=model,
    system_message="You critique copy. When satisfied, say 'APPROVE'.")

team = RoundRobinGroupChat([writer, critic],
                           termination_condition=TextMentionTermination("APPROVE"),
                           max_turns=8)

Output: writer drafts, critic critiques, writer revises — terminates when critic says "APPROVE" or hits the max turn count.

Recipe: selector group chat with custom routing

python

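from autogen_agentchat.teams import SelectorGroupChat

# coder, reviewer and tester are AssistantAgent instances and model is a shared
# model client, built the same way as in the round-robin recipe.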

team = SelectorGroupChat(
    [coder, reviewer, tester],
    model_client=model,
    selector_prompt="Select the next agent based on the message. Return one of {participants}.",
    allow_repeated_speaker=False,
    max_turns=12,
)

Output: a model-driven dispatcher decides which agent speaks next based on the conversation state.

Recipe: handing off in a Swarm

Swarm (also from autogen_agentchat.teams) implements OpenAI-style agent hand-off: one agent transfers control to another based on the task. Useful when responsibility is hierarchical rather than collaborative.

python

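from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import Swarm
from autogen_agentchat.conditions import HandoffTermination, MaxMessageTermination

# model is the shared model client from the earlier recipes; it must support tool calls,
# because hand-offs are issued as tool calls.
triage = AssistantAgent(
    "triage", model_client=model,
    system_message="Classify the user request and hand off to billing or technical.",
    handoffs=["billing", "technical"],
)
# The specialists need a "user" hand-off target, otherwise HandoffTermination("user") never fires.
billing   = AssistantAgent("billing",   model_client=model, system_message="Handle billing.",
                           handoffs=["user"])
technical = AssistantAgent("technical", model_client=model, system_message="Handle technical issues.",
                           handoffs=["user"])

team = Swarm(
    [triage, billing, technical],
    termination_condition=HandoffTermination("user") | MaxMessageTermination(10),
)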

Output: triage transfers to the appropriate specialist; the swarm terminates when an agent hands back to "user" or hits the message ceiling.

Recipe: custom termination on token budget

python

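from autogen_agentchat.base import TerminationCondition
from autogen_agentchat.messages import StopMessage

# autogen_agentchat.conditions also ships TokenUsageTermination; this class shows
# the shape of a custom condition.
class BudgetTermination(TerminationCondition):
    def __init__(self, max_tokens: int):
        self.budget = max_tokens
        self.spent = 0

    @property
    def terminated(self) -> bool:
        return self.spent >= self.budget

    async def __call__(self, messages) -> StopMessage | None:
        # Called with the new messages since the last check; add up their usage.
        for m in messages:
            usage = getattr(m, "models_usage", None)
            if usage:
                self.spent += usage.completion_tokens or 0
        if self.terminated:
            # A condition signals "stop" by returning a StopMessage, not a bool.
            return StopMessage(content="Token budget exhausted", source="BudgetTermination")
        return None

    async def reset(self) -> None:
        # Teams reset their condition between runs.
        self.spent = 0

team = RoundRobinGroupChat([writer, critic], termination_condition=BudgetTermination(5000))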

Output: team self-terminates after consuming 5000 completion tokens, even mid-conversation.

Recipe: code-execution sandbox

python

import asyncio
from autogen_agentchat.agents import AssistantAgent, CodeExecutorAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.ui import Console
from autogen_ext.code_executors.docker import DockerCommandLineCodeExecutor

async def main():
    # model is the shared model client from the earlier recipes.
    async with DockerCommandLineCodeExecutor(work_dir="./sandbox") as executor:
        coder = AssistantAgent("coder", model_client=model,
            system_message="You write Python. Wrap code in ```python blocks.")
        runner = CodeExecutorAgent("runner", code_executor=executor)
        team = RoundRobinGroupChat([coder, runner], max_turns=6)
        await Console(team.run_stream(task="Compute the 30th Fibonacci number."))

asyncio.run(main())

Output: coder proposes code, runner executes it in a Docker container, the result feeds back into the conversation.

Recipe: multi-provider agent

python

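from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_ext.models.anthropic import AnthropicChatCompletionClient
from autogen_ext.models.openai import OpenAIChatCompletionClient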

claude = AnthropicChatCompletionClient(model="claude-sonnet-4-6")
gpt    = OpenAIChatCompletionClient(model="gpt-4o-mini")

planner   = AssistantAgent("planner", model_client=claude,
                            system_message="You plan multi-step tasks.")
executor  = AssistantAgent("executor", model_client=gpt,
                            system_message="You execute concrete steps.")
team = RoundRobinGroupChat([planner, executor], max_turns=10)

Output: Claude plans, GPT executes — each agent backed by a different provider.

Cost & rate-limit management

Multi-agent systems multiply costs by the number of turns × agents per turn. Treat the cost ceiling as an explicit design parameter, not an afterthought.

Treat max_turns as mandatory. Without it or another termination condition, agents can ping-pong until something external stops them. Termination conditions exist to bound runtime.
Cheap model for routing, expensive model for reasoning. A SelectorGroupChat dispatcher can run on a small model; the worker agents can use a flagship. Saves significant cost on dispatch.
Custom TerminationCondition. Build conditions on token spend, latency, or message content — short-circuit when the conversation stalls.
Tool-call discipline. Each tool round-trip is a model call. Combine related tools where possible; for example, return a struct rather than a single value per call.
Rate-limit by provider. Multiple agents sharing one provider can trip per-minute quotas. Spread across providers or use a LiteLLM proxy with team quotas.
Streaming run_stream does not reduce cost — but lets you cancel early if the conversation goes off-track.
Observability for spend. Trace every agent call via langsmith or OpenTelemetry; aggregate spend by team / agent.
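
Per-provider rate limiting can be approximated in-process with a shared limiter that every agent's model call passes through. The following is a sketch of a sliding-window limiter built on asyncio; SlidingWindowLimiter is a hypothetical helper, and the numbers in the demo are arbitrary rather than any provider's real quota.

```python
import asyncio
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls acquisitions per period (seconds), shared by all callers."""

    def __init__(self, max_calls, period):
        self.max_calls = max_calls
        self.period = period
        self._stamps = deque()        # start times of calls inside the current window
        self._lock = asyncio.Lock()   # serialises waiters so they queue fairly

    async def acquire(self):
        async with self._lock:
            while True:
                now = time.monotonic()
                # Drop calls that have aged out of the window.
                while self._stamps and now - self._stamps[0] >= self.period:
                    self._stamps.popleft()
                if len(self._stamps) < self.max_calls:
                    self._stamps.append(now)
                    return
                # Window is full: sleep until the oldest call expires, then re-check.
                await asyncio.sleep(self.period - (now - self._stamps[0]))


async def demo():
    limiter = SlidingWindowLimiter(max_calls=2, period=0.1)
    start = time.monotonic()
    for _ in range(3):
        await limiter.acquire()  # call this right before each model request
    return time.monotonic() - start


elapsed = asyncio.run(demo())
```

One limiter instance per provider, shared by every agent that uses that provider, keeps the combined request rate under the quota; a proxy with team quotas does the same job across processes.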

Version migration guide

pyautogen (v0.2) → autogen-agentchat (v0.4+) is a clean break — different package, different API. The framework was redesigned around async message passing.

Aspect	v0.2 (`pyautogen`)	v0.4+ (`autogen-agentchat` family)
Import	`from autogen import AssistantAgent, UserProxyAgent`	`from autogen_agentchat.agents import AssistantAgent`
Sync vs async	Mostly sync (`agent.initiate_chat(...)`)	Async-first (`await team.run(...)`, or `async for` over `team.run_stream(...)`)
LLM config	`llm_config={"model": "gpt-4", ...}` dict	`model_client=OpenAIChatCompletionClient(...)` from `autogen-ext`
Tools	`register_for_llm` / `register_for_execution` decorators on the agents	Python callables passed to `AssistantAgent(tools=[...])`
Group chat	`GroupChat` + `GroupChatManager`	`RoundRobinGroupChat`, `SelectorGroupChat`, `Swarm`
Code execution	Local subprocess by default	Explicit `DockerCommandLineCodeExecutor` or `LocalCommandLineCodeExecutor`
Termination	`is_termination_msg`, `max_consecutive_auto_reply`, `max_turns`	Composable `TerminationCondition` objects
Streaming	Limited	First-class `run_stream` async iterator

Migration discipline:

The two libraries cannot share state — they're separate packages.
Translate llm_config dicts to ChatCompletionClient instances from autogen-ext.
Re-shape tools from v0.2 registration decorators (register_for_llm / register_for_execution) to plain Python callables passed to assistants.
Adopt async — asyncio.run(main()) at the top level becomes the norm.
ag2 is a community fork of mainline; check which one your tutorial references before pasting.
Hedge: specific symbol moves and signature changes within v0.4.x are best confirmed against the project's microsoft/autogen release notes — the redesign era saw multiple minor adjustments.
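
The llm_config translation step is mostly a dictionary reshaping exercise. A minimal sketch, assuming a v0.2-style llm_config with an optional config_list and assuming the target client accepts model, api_key, base_url, temperature, and max_tokens as keyword arguments; client_kwargs_from_llm_config is a hypothetical helper, so check the keyword names against the installed autogen-ext release.

```python
def client_kwargs_from_llm_config(llm_config):
    """Map the first config_list entry (or the flat dict) to model-client keyword arguments."""
    entry = (llm_config.get("config_list") or [llm_config])[0]
    # Connection settings live on the config_list entry in v0.2-style configs.
    kwargs = {k: entry[k] for k in ("model", "api_key", "base_url") if entry.get(k) is not None}
    # Sampling settings usually sit at the top level of llm_config.
    for k in ("temperature", "max_tokens"):
        if llm_config.get(k) is not None:
            kwargs[k] = llm_config[k]
    if "model" not in kwargs:
        raise ValueError("llm_config has no model name")
    return kwargs


legacy_config = {
    "config_list": [{"model": "gpt-4", "api_key": "PLACEHOLDER_KEY"}],
    "temperature": 0,
}
kwargs = client_kwargs_from_llm_config(legacy_config)
```

The result would then be passed on as OpenAIChatCompletionClient(**kwargs). Only the first config_list entry is used, so any v0.2 fallback-across-entries behaviour has to be rebuilt separately.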

Troubleshooting common errors

ImportError: cannot import name 'AssistantAgent' from 'autogen' — that's v0.2 syntax against a v0.4 install (or vice-versa). Check which autogen-* packages are installed.
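RuntimeError: This event loop is already running (or "asyncio.run() cannot be called from a running event loop"): asyncio.run was called from code that already runs inside a loop, typically a Jupyter notebook. In a notebook, await the coroutine directly; in a script, call asyncio.run(main()) once at the top level.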
coroutine was never awaited — calling team.run(...) without await. v0.4 APIs are async-first.
Code execution returns blank. Docker isn't running, or the executor's work_dir is unwritable. Check Docker; pick a writable path.
PermissionDenied: 403 from OpenAIChatCompletionClient — API key not set or scoped wrong; check OPENAI_API_KEY env.
Infinite loop. No termination condition was supplied. Add TextMentionTermination("DONE") or MaxMessageTermination(15).
Tool not being called. Tool signature must be typed (annotations on every parameter); without types, AutoGen can't generate the schema.
ag2 vs autogen confusion. They're forks — don't mix imports.
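
The "tool not being called" case can often be caught before the agent ever runs by checking the tool's signature. A small standard-library sketch; missing_annotations is a hypothetical helper, and treating a missing return annotation as a problem is an assumption rather than a documented requirement.

```python
import inspect


def missing_annotations(func):
    """Return the names of parameters (and 'return') that lack a type annotation."""
    sig = inspect.signature(func)
    missing = [
        name
        for name, param in sig.parameters.items()
        if param.annotation is inspect.Parameter.empty
    ]
    if sig.return_annotation is inspect.Signature.empty:
        missing.append("return")
    return missing


async def good_tool(city: str, units: str = "metric") -> str:
    """Fully annotated, so a parameter schema can be derived from it."""
    return f"{city}:{units}"


async def bad_tool(city, units="metric"):
    """No annotations: a schema generator has nothing to work with."""
    return f"{city}:{units}"


good_report = missing_annotations(good_tool)
bad_report = missing_annotations(bad_tool)
```

Asserting that missing_annotations(tool) is empty for every registered tool makes a cheap unit test.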

Production deployment

AutoGen multi-agent systems are usually deployed as backend workers rather than as user-facing services — the conversation pattern is too long-lived for typical HTTP request lifecycles.

Worker / queue pattern. Receive a task on a queue (SQS, Pub/Sub, Celery), run the team to completion, post the result. Don't run agents inside a synchronous HTTP handler.
Async runtime. autogen-core is async-first; the worker should use asyncio directly, not threads.
Persistent state. Agent state is in-memory by default. For long-running conversations, snapshot to durable storage between turns.
Sandbox code execution. Always use DockerCommandLineCodeExecutor in production; the local-process executor is unsafe with untrusted inputs.
Concurrency. One team per worker process. Spinning up additional teams within one process is possible but couples their failure modes.
Observability. Trace every agent + tool call. The fanout is high — without traces, debugging a failed team run is nearly impossible.
Cost ceilings. Set per-task budgets and abort runs that exceed them.
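
The worker / queue pattern with a per-task ceiling can be sketched with asyncio alone. In this sketch run_team is a hypothetical stand-in for building a fresh team and awaiting its run, the in-memory asyncio.Queue stands in for SQS, Pub/Sub, or Celery, and the timeout plays the role of the per-task budget.

```python
import asyncio


async def run_team(task):
    """Hypothetical stand-in for constructing a team and awaiting its run to completion."""
    await asyncio.sleep(1.0 if task == "slow" else 0.01)
    return f"done: {task}"


async def worker(queue, results, timeout):
    """Pull tasks forever; abort any task that exceeds its time budget."""
    while True:
        task = await queue.get()
        try:
            results[task] = await asyncio.wait_for(run_team(task), timeout)
        except asyncio.TimeoutError:
            results[task] = "aborted: timeout"
        finally:
            queue.task_done()


async def main(tasks, concurrency=1, timeout=0.2):
    queue, results = asyncio.Queue(), {}
    for task in tasks:
        queue.put_nowait(task)
    workers = [asyncio.create_task(worker(queue, results, timeout)) for _ in range(concurrency)]
    await queue.join()  # wait until every queued task is marked done
    for w in workers:
        w.cancel()      # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return results


results = asyncio.run(main(["a", "slow", "b"]))
```

The default concurrency of 1 mirrors the one-team-per-worker advice; scaling out means more worker processes, not a higher number here.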

Security considerations

Code execution is the headline risk. Any CodeExecutorAgent can run arbitrary Python. Always sandbox via Docker; never expose to untrusted users.
Tool registration is eval() for the model. Tools that touch databases, file systems, or external services need allowlists and audit logging.
Prompt injection through messages. A malicious tool result, document, or user message can re-direct subsequent agents. Filter inputs and pin system prompts.
Secrets in messages. Agent conversations get logged everywhere. Never put API keys in system messages or user inputs.
Cross-agent secret leakage. If one agent has access to a secret (DB credential), avoid surfacing it in messages other agents see.
Model-client key handling. OpenAIChatCompletionClient(api_key=...) accepts keys directly — prefer env-based configuration so keys don't end up in tracebacks.
Container escapes. Docker code execution mitigates but does not eliminate escape risk — use rootless Docker or gVisor for hardened environments.
Replay risk. Saved conversation logs contain prompts, completions, and tool args — treat as sensitive.
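
Since conversation logs end up in traces and replay stores, a redaction pass before persisting them is a cheap safeguard. A minimal sketch using regular expressions; the patterns are illustrative assumptions and far from a complete secret taxonomy, and the message shape (dicts with "source" and "content") is hypothetical.

```python
import re

# Illustrative patterns only; extend for the credential formats actually in use.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{16,}"),                            # OpenAI-style keys
    re.compile(r"(?i)(password|api[_-]?key|token)\s*[=:]\s*\S+"),   # key=value leaks
]


def redact(text, replacement="[REDACTED]"):
    """Replace anything matching a known secret pattern."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub(replacement, text)
    return text


def redact_history(messages):
    """Return a copy of a message list that is safer to write to logs or replay storage."""
    return [{**m, "content": redact(str(m.get("content", "")))} for m in messages]


history = [
    {"source": "user", "content": "my key is sk-abcdefghijklmnop1234 please use it"},
    {"source": "db_agent", "content": "connecting with password=hunter2 now"},
]
safe_history = redact_history(history)
```

Redaction complements, and does not replace, keeping secrets out of messages in the first place.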

Multi-provider patterns

The v0.4 design puts model selection behind the ChatCompletionClient interface — any agent can use any provider.

autogen-ext[openai] / [anthropic] / [azure] / [google] ship matching client classes. Same interface; swap construction.
LiteLLM proxy in front of an OpenAIChatCompletionClient pointing at the proxy's base URL gives you any provider behind one client class. Useful for centralised cost/quota control.
Per-agent provider choice. Different agents can use different providers — cheap dispatcher, expensive worker — without architectural changes.
Failover. AutoGen does not ship retry-across-providers natively; wrap the model client in a custom adapter that catches errors and falls back.
Token budgeting across providers. Different providers report tokens differently; aggregate via your observability layer.
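
The failover adapter mentioned above can be sketched as a thin wrapper that tries clients in order. This is an illustration of the pattern only: FailoverClient, FlakyClient, and EchoClient are hypothetical classes, and the single create method is a simplification of a real model-client interface, which has more methods and parameters.

```python
import asyncio


class FailoverClient:
    """Try each wrapped client in order and return the first successful result."""

    def __init__(self, clients, retry_on=(Exception,)):
        self.clients = list(clients)
        self.retry_on = retry_on  # exception types that trigger a fallback

    async def create(self, messages):
        errors = []
        for client in self.clients:
            try:
                return await client.create(messages)
            except self.retry_on as exc:
                errors.append(exc)  # remember the failure and try the next provider
        raise RuntimeError(f"all {len(self.clients)} providers failed: {errors!r}")


class FlakyClient:
    """Hypothetical primary provider that is currently down."""

    async def create(self, messages):
        raise ConnectionError("primary provider unavailable")


class EchoClient:
    """Hypothetical secondary provider that always answers."""

    async def create(self, messages):
        return f"echo: {messages[-1]}"


reply = asyncio.run(FailoverClient([FlakyClient(), EchoClient()]).create(["hello"]))
```

Narrowing retry_on to transport and rate-limit errors avoids silently retrying requests that failed for a reason a second provider would not fix.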

Ecosystem integrations

Layer	Integrations
Model clients	`autogen-ext[openai]`, `[anthropic]`, `[azure]`, `[google]` cover the major hosted providers.
Tools	`autogen-ext[web-surfer]`, `[file-surfer]`, third-party tool repos. Tools are plain Python callables — anything you can wrap in a function works.
Code execution	`DockerCommandLineCodeExecutor`, `LocalCommandLineCodeExecutor`, `JupyterCodeExecutor` (via `autogen-ext`).
UI / Studio	`autogen-studio` — graphical workbench for building and debugging teams.
Observability	`langsmith` and OpenTelemetry instrumentation; OpenInference Agent spans.
Frameworks alongside	Often combined with LangGraph (for sub-agent flows), `instructor` (for structured outputs), `crewai` for higher-level role abstractions in adjacent services.
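Memory	`autogen-core` defines a `Memory` protocol with a simple `ListMemory`, which `AssistantAgent` accepts through its `memory` argument; couple with `mem0` or a custom vector store for long-term recall.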

When NOT to use this

Single-agent linear flows. A single LangGraph node or a direct provider SDK call is simpler.
You want role-playing first. crewai's Agent/Task/Crew abstraction is more opinionated and faster to bootstrap for that style.
Sync code. AutoGen v0.4 is async-first. If your environment can't run an event loop, the friction isn't worth it.
Deterministic pipelines. If the workflow doesn't need dynamic agent dispatch, LangGraph (explicit graph nodes) gives you tighter control.
You need v0.2 stability. v0.4 is the active surface; v0.2 is maintenance-only. If existing v0.2 code works, don't migrate just to migrate.
