cheat sheet
autogen-agentchat
Package-level reference for the autogen-agentchat / autogen-core / autogen-ext family on PyPI plus the legacy pyautogen — install, rename history, versioning, and alternatives.
What it is
autogen-agentchat is the high-level multi-agent SDK from the AutoGen v0.4 redesign — Microsoft's framework for building systems where multiple AI agents converse to complete tasks. In v0.4, AutoGen was split into a layered family of packages: a low-level message-passing runtime, a high-level agent/team SDK, and an extensions package for model-client and tool integrations.
A note on naming: the v0.2 monolith was published as pyautogen on PyPI. The v0.4 rewrite uses autogen-agentchat, autogen-core, and autogen-ext — the bare autogen name on PyPI has a complicated history and is not the canonical v0.4 install. Use the split packages.
Install
pip install autogen-agentchat "autogen-ext[openai]"
Output: the standard v0.4 stack with OpenAI model client
pip install autogen-agentchat "autogen-ext[anthropic]"
Output: v0.4 with Anthropic Claude support
pip install autogen-agentchat "autogen-ext[azure]"
Output: v0.4 with Azure OpenAI
uv add autogen-agentchat "autogen-ext[openai]"
Output: dependencies resolved + added to pyproject.toml
pip install pyautogen # ← legacy v0.2 only; do not mix with v0.4
Output: installs the older monolith — different API entirely
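To see which line an environment actually has, a small standard-library check along these lines may help. It is a sketch only: the helper names are hypothetical, and it assumes v0.2 is installed under the pyautogen name as described above.

```python
from importlib import metadata

V04_PACKAGES = {"autogen-agentchat", "autogen-core", "autogen-ext"}
LEGACY_PACKAGES = {"pyautogen"}


def classify_autogen_install(installed: set[str]) -> str:
    """Classify a set of installed distribution names (hypothetical helper)."""
    has_v04 = bool(installed & V04_PACKAGES)
    has_legacy = bool(installed & LEGACY_PACKAGES)
    if has_v04 and has_legacy:
        return "mixed"  # both lines present: the combination the install notes warn against
    if has_v04:
        return "v0.4"
    if has_legacy:
        return "v0.2"
    return "none"


def installed_distributions() -> set[str]:
    """Return normalized names of every distribution in the current environment."""
    names = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:
            names.add(name.lower().replace("_", "-"))
    return names


print(classify_autogen_install(installed_distributions()))
```

A result of "mixed" is the case worth acting on: uninstall one line before debugging anything else.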
Versioning & Python support
- v0.2 (pyautogen): the original AutoGen, a single package on the 0.2.x line. Still receives bug fixes, but feature work has moved on. Tutorials from 2023 to early 2024 target this API.
- v0.4+ (autogen-agentchat family): the current redesigned SDK, on the 0.4.x and later 0.x lines (as of late 2025). Different abstractions: AssistantAgent, UserProxyAgent, RoundRobinGroupChat, Swarm, and an async team runtime.
- The two APIs are NOT compatible: code written for pyautogen does not run on autogen-agentchat. The framework redesign was a clean break.
- Python 3.10+ for v0.4 (the runtime relies on modern asyncio features). v0.2 supported 3.8+.
- Pre-1.0 on both lines, so pin tightly.
- The Microsoft mainline lives at microsoft/autogen on GitHub; a related fork lives at ag2ai/ag2 and is released as ag2 on PyPI. They originated from the same codebase and have diverged. If a tutorial references ag2, it's the community fork, not the Microsoft mainline.
Package metadata
- Maintainer: Microsoft (the microsoft/autogen repo)
- Project home: github.com/microsoft/autogen
- Docs: microsoft.github.io/autogen
- PyPI: pypi.org/project/autogen-agentchat
- License: MIT (Apache-2.0 / CC-BY-4.0 in places; check each subpackage)
- Governance: Microsoft Research + open contribution; the ag2 fork is community-led
- First released: v0.2 in 2023; v0.4 family in late 2024
- Downloads: millions per month across the family
Optional dependencies & extras
The v0.4 family is layered. Pick packages from the appropriate layer:
| Package | Layer | Purpose |
|---|---|---|
| autogen-core | Foundation | Async message-passing runtime, agent base classes. Used by everything else. |
| autogen-agentchat | High-level | The "AutoGen API" most users want: assistants, group chats, teams. |
| autogen-ext[openai] | Extensions | OpenAI / Azure OpenAI ChatCompletionClient |
| autogen-ext[anthropic] | Extensions | Anthropic Claude model client |
| autogen-ext[azure] | Extensions | Azure-specific clients |
| autogen-ext[docker] | Extensions | Docker-backed code execution sandbox |
| autogen-ext[web-surfer] | Extensions | Headless-browser web-browsing tool |
| autogen-ext[file-surfer] | Extensions | File-system browsing tool |
| autogen-studio | UI | Optional graphical workbench for building/debugging teams |
The autogen-ext package is the catch-all for tool and model-client integrations — install with one or more extras corresponding to what you need.
Alternatives
| Package | Trade-off |
|---|---|
| crewai | Roles + tasks + crews abstraction; YAML-driven; more opinionated. Pythonic alternative. |
| langgraph | Stateful graph-based agents from the LangChain team. Lower-level than agentchat; finer control. |
| llama-index (agents) | Agent abstractions on top of LlamaIndex's retrieval stack. RAG-first orientation. |
| swarm (OpenAI) | OpenAI's experimental lightweight multi-agent library. Smaller, OpenAI-only. |
| openai-agents | OpenAI's newer Agents SDK. Tight OpenAI integration. |
| Custom orchestration | A loop + provider SDK is often enough for simple 2-agent setups; frameworks earn their keep at 4+ agents. |
Common gotchas
- v0.2 vs v0.4 are different products. Code from a 2023 AutoGen tutorial does not run on autogen-agentchat. Check the import line: from autogen import ... is v0.2 (pyautogen); from autogen_agentchat.agents import ... is v0.4.
- Multi-agent message passing needs the model_client interface in v0.4. You don't pass llm_config dicts anymore; you instantiate a ChatCompletionClient from autogen-ext and hand it to the agents.
- Microsoft research repos vs Microsoft GitHub mainline. Earlier AutoGen experiments lived under microsoft/autogen research branches; the current mainline is the v0.4 rewrite. Tutorials and blog posts often reference an older branch state, so pin to the latest stable release.
- ag2 fork on PyPI. The ag2 package is a community fork. It continues the v0.2-style API and diverges from the Microsoft mainline over time; don't install both.
- Async-first runtime. v0.4 agents run on asyncio. Both run() and run_stream() are asynchronous: run() is a coroutine you await, and run_stream() is an async generator you consume with async for. Calling either from synchronous code, or forgetting the await, leads to "coroutine was never awaited" warnings or a run that never starts.
- Code-execution sandbox is opt-in. autogen-ext[docker] installs the Docker-backed executor, but Docker must actually be running on the host. The alternative is a (less safe) local-process executor, so choose explicitly.
- Tool registration shape changed. v0.2 tools were registered via decorators (register_for_llm / register_for_execution) on the agents; v0.4 uses Python callables passed into AssistantAgent(tools=[...]). Don't paste v0.2 tool snippets into v0.4 agents.
- AutoGen Studio is a separate UI, not a runtime requirement. Skip it for CI/headless deployments.
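The async-first gotcha is easier to see with plain asyncio. The sketch below uses stand-in functions named run and run_stream (not the AutoGen methods themselves) to contrast a coroutine, which is awaited, with an async generator, which is iterated.

```python
import asyncio


async def run(task: str) -> str:
    """Stand-in for a coroutine-style API: await it to get the final result."""
    await asyncio.sleep(0)
    return f"final result for {task!r}"


async def run_stream(task: str):
    """Stand-in for an async-generator API: iterate it with `async for`."""
    for step in ("thinking", "tool call", "answer"):
        await asyncio.sleep(0)
        yield f"{task}: {step}"


async def main() -> list[str]:
    seen = []
    result = await run("demo")  # coroutine: must be awaited
    seen.append(result)
    async for event in run_stream("demo"):  # async generator: iterate, do not await the call
        seen.append(event)
    return seen


events = asyncio.run(main())
```

Calling run("demo") without await only creates a coroutine object, which is where the "never awaited" warning comes from.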
Performance tuning
Multi-agent throughput is dominated by total LLM-call latency × turn count. The levers are:
- Reduce turn count. A well-prompted system message and a clear termination condition cut turns dramatically. A setting such as max_turns=20 is a ceiling, not a target.
- Parallel tool calls. If multiple tool calls in one turn are independent, batch them; model providers increasingly support parallel function calls.
- Lighter model for routing. A SelectorGroupChat dispatcher running on a small/fast model adds far less latency than using a flagship model for every selection.
- Streaming for time-to-first-token. Use run_stream and surface partial messages to users; perceived latency improves even when total time doesn't.
- Cold-start vs warm-start. Connection-pool warm-up for the model client matters, so keep clients alive across requests.
- Tool latency. Slow tools (web scraping, database) dominate. Cache results aggressively; cap result sizes.
- Cancellation discipline. run_stream returns an async generator; stop consuming it early when an agent emits a clear "done" signal.
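Two of these levers, caching tool results and capping their size, can be sketched with the standard library alone. The tool slow_lookup, the cache, and the 200-character cap are all illustrative assumptions, not AutoGen APIs.

```python
import asyncio

MAX_RESULT_CHARS = 200  # assumed cap; tune to your context budget

_cache: dict[str, str] = {}
call_count = 0


async def slow_lookup(query: str) -> str:
    """Hypothetical slow tool; stands in for a web scrape or database query."""
    global call_count
    call_count += 1
    await asyncio.sleep(0.01)
    return f"result for {query}: " + "x" * 1000


async def cached_lookup(query: str) -> str:
    """Return a cached, size-capped result so repeat calls skip the slow path."""
    if query not in _cache:
        raw = await slow_lookup(query)
        _cache[query] = raw[:MAX_RESULT_CHARS]
    return _cache[query]


async def demo() -> list[str]:
    # Independent lookups run concurrently; the repeated query is then served from cache.
    first = await asyncio.gather(cached_lookup("a"), cached_lookup("b"))
    again = await cached_lookup("a")
    return [*first, again]


results = asyncio.run(demo())
```

A function like cached_lookup could then be registered as the tool in place of the slow one. A real cache would also need an eviction policy and a freshness rule.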
Evaluation & observability
Multi-agent runs are nearly impossible to debug without traces. Set up observability before you build the second agent.
- Trace every model call + tool call. langsmith and OpenTelemetry both work; pick one and instrument the model client + tool wrappers.
- Per-run metrics: total turns, total tokens, total cost, success / failure / abort outcome.
- Per-agent metrics: turn count, average tokens per turn, time per turn — identifies the agent eating the budget.
- Trajectory replay. Save the full message history for failed runs; replay locally to reproduce failures.
- A/B on team composition. Swap agents in / out, run the same task suite, compare outcomes.
- Custom evaluators. Did the final answer match a gold output? Did the team terminate on the expected condition? Standard LLM eval frameworks (DeepEval, ragas) work for end-of-run scoring; track per-turn metrics separately.
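As a rough illustration of the per-agent metrics above, the following sketch aggregates turn records into a small report. TurnRecord and its fields are illustrative and not an AutoGen type; in practice the values would come from your traces or from message usage data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TurnRecord:
    """One agent turn; field names are illustrative, not an AutoGen type."""
    agent: str
    prompt_tokens: int
    completion_tokens: int
    seconds: float


def summarize(turns: list[TurnRecord]) -> dict[str, dict[str, float]]:
    """Aggregate turn count, tokens, and time per agent."""
    stats: dict[str, dict[str, float]] = defaultdict(
        lambda: {"turns": 0, "tokens": 0, "seconds": 0.0}
    )
    for t in turns:
        s = stats[t.agent]
        s["turns"] += 1
        s["tokens"] += t.prompt_tokens + t.completion_tokens
        s["seconds"] += t.seconds
    for s in stats.values():
        s["avg_tokens_per_turn"] = s["tokens"] / s["turns"]
    return dict(stats)


run_log = [
    TurnRecord("writer", 120, 80, 1.5),
    TurnRecord("critic", 200, 40, 0.9),
    TurnRecord("writer", 300, 100, 2.0),
]
report = summarize(run_log)
```

Sorting the report by tokens or seconds points at the agent consuming most of the budget.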
Real-world recipes
The v0.4 architecture organises around AssistantAgent, Tool, and team types (RoundRobinGroupChat, SelectorGroupChat, Swarm). Recipes below assume the modern stack.
Recipe: single assistant with tools
import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.ui import Console
from autogen_ext.models.openai import OpenAIChatCompletionClient

async def get_weather(city: str) -> str:
    """Get current weather for a city."""
    return f"sunny, 22C in {city}"

async def main():
    model = OpenAIChatCompletionClient(model="gpt-4o-mini")
    agent = AssistantAgent(
        name="weather_bot",
        model_client=model,
        tools=[get_weather],
        system_message="You answer weather questions concisely.",
    )
    await Console(agent.run_stream(task="What's the weather in Tokyo?"))

asyncio.run(main())
Output: the model issues a get_weather call, observes the result, replies with a one-sentence answer.
Recipe: round-robin group chat
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.conditions import TextMentionTermination
from autogen_ext.models.openai import OpenAIChatCompletionClient

model = OpenAIChatCompletionClient(model="gpt-4o-mini")

writer = AssistantAgent("writer", model_client=model,
                        system_message="You draft short marketing copy.")
critic = AssistantAgent("critic", model_client=model,
                        system_message="You critique copy. When satisfied, say 'APPROVE'.")

team = RoundRobinGroupChat([writer, critic],
                           termination_condition=TextMentionTermination("APPROVE"),
                           max_turns=8)
Output: writer drafts, critic critiques, writer revises — terminates when critic says "APPROVE" or hits the max turn count.
Recipe: selector group chat with custom routing
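from autogen_agentchat.teams import SelectorGroupChat

# coder, reviewer, and tester are AssistantAgent instances built like those in the previous recipe;
# model is the shared model client.
team = SelectorGroupChat(
    [coder, reviewer, tester],
    model_client=model,
    selector_prompt="Select the next agent based on the message. Return one of {participants}.",
    allow_repeated_speaker=False,
    max_turns=12,
)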
Output: a model-driven dispatcher decides which agent speaks next based on the conversation state.
Recipe: handing off in a Swarm
Swarm (also from autogen_agentchat.teams) implements OpenAI-style agent hand-off: one agent transfers control to another based on the task. Useful when responsibility is hierarchical rather than collaborative.
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import Swarm
from autogen_agentchat.conditions import HandoffTermination, MaxMessageTermination

triage = AssistantAgent(
    "triage", model_client=model,
    system_message="Classify the user request and hand off to billing or technical.",
    handoffs=["billing", "technical"],
)
# The specialists can hand back to "user"; HandoffTermination("user") only fires if some agent can.
billing = AssistantAgent("billing", model_client=model,
                         system_message="Handle billing.", handoffs=["user"])
technical = AssistantAgent("technical", model_client=model,
                           system_message="Handle technical issues.", handoffs=["user"])

team = Swarm(
    [triage, billing, technical],
    termination_condition=HandoffTermination("user") | MaxMessageTermination(10),
)
Output: triage transfers to the appropriate specialist; the swarm terminates when an agent hands back to "user" or hits the message ceiling.
Recipe: custom termination on token budget
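# Assumes the v0.4 TerminationCondition interface: terminated, __call__, and reset.
from autogen_agentchat.base import TerminationCondition
from autogen_agentchat.messages import StopMessage

class BudgetTermination(TerminationCondition):
    def __init__(self, max_tokens: int):
        self.budget = max_tokens
        self.spent = 0

    @property
    def terminated(self) -> bool:
        return self.spent >= self.budget

    async def __call__(self, messages):
        # Add up completion tokens reported on the new messages.
        for m in messages:
            usage = getattr(m, "models_usage", None)
            if usage:
                self.spent += usage.completion_tokens or 0
        # Return a StopMessage to end the run, or None to continue.
        if self.terminated:
            return StopMessage(content="Token budget exhausted", source="BudgetTermination")
        return None

    async def reset(self) -> None:
        self.spent = 0

team = RoundRobinGroupChat([writer, critic], termination_condition=BudgetTermination(5000))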
Output: team self-terminates after consuming 5000 completion tokens, even mid-conversation.
Recipe: code-execution sandbox
import asyncio
from autogen_ext.code_executors.docker import DockerCommandLineCodeExecutor
from autogen_agentchat.agents import AssistantAgent, CodeExecutorAgent

async def main():
    # model, RoundRobinGroupChat, and Console come from the earlier recipes.
    async with DockerCommandLineCodeExecutor(work_dir="./sandbox") as executor:
        coder = AssistantAgent("coder", model_client=model,
                               system_message="You write Python. Wrap code in ```python blocks.")
        runner = CodeExecutorAgent("runner", code_executor=executor)
        team = RoundRobinGroupChat([coder, runner], max_turns=6)
        await Console(team.run_stream(task="Compute the 30th Fibonacci number."))

asyncio.run(main())
Output: coder proposes code, runner executes it in a Docker container, the result feeds back into the conversation.
Recipe: multi-provider agent
from autogen_ext.models.anthropic import AnthropicChatCompletionClient
from autogen_ext.models.openai import OpenAIChatCompletionClient

claude = AnthropicChatCompletionClient(model="claude-sonnet-4-6")
gpt = OpenAIChatCompletionClient(model="gpt-4o-mini")

planner = AssistantAgent("planner", model_client=claude,
                         system_message="You plan multi-step tasks.")
executor = AssistantAgent("executor", model_client=gpt,
                          system_message="You execute concrete steps.")

team = RoundRobinGroupChat([planner, executor], max_turns=10)
Output: Claude plans, GPT executes — each agent backed by a different provider.
Cost & rate-limit management
Multi-agent systems multiply costs by the number of turns × agents per turn. Treat the cost ceiling as an explicit design parameter, not an afterthought.
- max_turns is mandatory. Without it (or another termination condition), agents can ping-pong forever. The built-in termination conditions exist to bound runtime.
- Cheap model for routing, expensive model for reasoning. A SelectorGroupChat dispatcher can run on a small model; the worker agents can use a flagship. Saves significant cost on dispatch.
- Custom TerminationCondition. Build conditions on token spend, latency, or message content, and short-circuit when the conversation stalls.
- Tool-call discipline. Each tool round-trip is a model call. Combine related tools where possible; e.g., return a struct, not a single value, per call.
- Rate-limit by provider. Multiple agents sharing one provider can trip per-minute quotas. Spread across providers or use a LiteLLM proxy with team quotas.
- Streaming with run_stream does not reduce cost, but it lets you cancel early if the conversation goes off-track.
- Observability for spend. Trace every agent call via langsmith or OpenTelemetry; aggregate spend by team / agent.
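The "turns × agents per turn" multiplier can be turned into a quick back-of-envelope estimate. The sketch below is only that: the function name is hypothetical and the per-1k-token prices are placeholders, not any provider's actual rates.

```python
def estimate_run_cost(
    turns: int,
    calls_per_turn: int,
    prompt_tokens: int,
    completion_tokens: int,
    price_in_per_1k: float,
    price_out_per_1k: float,
) -> float:
    """Estimate the cost of one team run from average per-call token counts."""
    per_call = (prompt_tokens / 1000) * price_in_per_1k \
        + (completion_tokens / 1000) * price_out_per_1k
    return turns * calls_per_turn * per_call


# Placeholder numbers: 8 turns, 2 model calls per turn (e.g., a selector plus a worker).
estimate = estimate_run_cost(
    turns=8,
    calls_per_turn=2,
    prompt_tokens=2000,
    completion_tokens=500,
    price_in_per_1k=0.001,   # placeholder price, not a real rate
    price_out_per_1k=0.002,  # placeholder price, not a real rate
)
over_budget = estimate > 0.05  # compare against an explicit per-task ceiling
```

Because prompt size grows as the conversation history grows, a fixed average understates long runs; treat the result as a floor when setting ceilings.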
Version migration guide
pyautogen (v0.2) → autogen-agentchat (v0.4+) is a clean break — different package, different API. The framework was redesigned around async message passing.
| Aspect | v0.2 (pyautogen) | v0.4+ (autogen-agentchat family) |
|---|---|---|
| Import | from autogen import AssistantAgent, UserProxyAgent | from autogen_agentchat.agents import AssistantAgent |
| Sync vs async | Mostly sync (agent.initiate_chat(...)) | Async-first (await team.run(...), or async for over team.run_stream(...)) |
| LLM config | llm_config={"model": "gpt-4", ...} dict | model_client=OpenAIChatCompletionClient(...) from autogen-ext |
| Tools | register_for_llm / register_for_execution decorators on the agents | Python callables passed to AssistantAgent(tools=[...]) |
| Group chat | GroupChat + GroupChatManager | RoundRobinGroupChat, SelectorGroupChat, Swarm |
| Code execution | Local subprocess by default | Explicit DockerCommandLineCodeExecutor or LocalCommandLineCodeExecutor |
| Termination | max_turns, max_consecutive_auto_reply, or is_termination_msg | Composable TerminationCondition objects |
| Streaming | Limited | First-class run_stream async iterator |
Migration discipline:
- The two libraries cannot share state; they're separate packages.
- Translate llm_config dicts to ChatCompletionClient instances from autogen-ext.
- Re-shape tools from decorator-based registration on the agents to plain Python callables passed to assistants.
- Adopt async: asyncio.run(main()) at the top level becomes the norm.
- ag2 is a community fork of mainline; check which one your tutorial references before pasting.
- Specific symbol moves and signature changes within v0.4.x are best confirmed against the microsoft/autogen release notes; the redesign era saw multiple minor adjustments.
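The llm_config translation step might look roughly like this. The helper is hypothetical and deliberately narrow: it assumes a v0.2-style dict with an optional config_list, reads only the first entry, and carries over a handful of common keys.

```python
from typing import Any


def llm_config_to_client_kwargs(llm_config: dict[str, Any]) -> dict[str, Any]:
    """Map a v0.2-style llm_config dict to keyword arguments for a v0.4 model client.

    Hypothetical helper: only the first config_list entry and a few common keys are handled.
    """
    entries = llm_config.get("config_list") or [llm_config]
    first = entries[0]
    kwargs: dict[str, Any] = {"model": first["model"]}
    for key in ("api_key", "base_url"):
        if key in first:
            kwargs[key] = first[key]
    if "temperature" in llm_config:
        kwargs["temperature"] = llm_config["temperature"]
    return kwargs


legacy = {
    "config_list": [{"model": "gpt-4", "api_key": "sk-placeholder"}],
    "temperature": 0,
}
client_kwargs = llm_config_to_client_kwargs(legacy)
# With autogen-ext[openai] installed, these would be passed as:
#   OpenAIChatCompletionClient(**client_kwargs)
```

Multi-entry config_list fallbacks have no direct equivalent here, so those need a deliberate decision rather than a mechanical translation.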
Troubleshooting common errors
- ImportError: cannot import name 'AssistantAgent' from 'autogen'. That's v0.2 syntax against a v0.4 install (or vice versa). Check which autogen-* packages are installed.
- RuntimeError: This event loop is already running. asyncio.run was called from inside a running loop (common in Jupyter without nest_asyncio). In a notebook, use top-level await; in scripts, call asyncio.run once at the entry point.
- coroutine was never awaited. team.run(...) was called without await. v0.4 APIs are async-first.
- Code execution returns blank. Docker isn't running, or the executor's work_dir is unwritable. Check Docker; pick a writable path.
- PermissionDenied: 403 from OpenAIChatCompletionClient. The API key is not set or is scoped wrong; check the OPENAI_API_KEY environment variable.
- Infinite loop. No termination condition was supplied. Add TextMentionTermination("DONE") or MaxMessageTermination(15).
- Tool not being called. Tool signature must be typed (annotations on every parameter); without types, AutoGen can't generate the schema.
- ag2 vs autogen confusion. They're forks, so don't mix imports.
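For the untyped-tool case, a quick pre-flight check with the standard library can catch the problem before the agent is built. The helper name untyped_parameters is hypothetical; it only inspects annotations and says nothing about how AutoGen builds its schema.

```python
import inspect


def untyped_parameters(func) -> list[str]:
    """Return the names of parameters that lack a type annotation."""
    sig = inspect.signature(func)
    return [
        name
        for name, param in sig.parameters.items()
        if param.annotation is inspect.Parameter.empty
    ]


async def get_weather(city: str, units: str = "metric") -> str:
    """Get current weather for a city."""
    return f"sunny in {city} ({units})"


def bad_tool(city, units="metric"):
    """Same idea, but with no annotations."""
    return "..."


typed_ok = untyped_parameters(get_weather)
missing = untyped_parameters(bad_tool)
```

Running the check over every callable in a tools list at start-up turns a silent "tool never called" into an explicit error.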
Production deployment
AutoGen multi-agent systems are usually deployed as backend workers rather than as user-facing services — the conversation pattern is too long-lived for typical HTTP request lifecycles.
- Worker / queue pattern. Receive a task on a queue (SQS, Pub/Sub, Celery), run the team to completion, post the result. Don't run agents inside a synchronous HTTP handler.
- Async runtime. autogen-core is async-first; the worker should use asyncio directly, not threads.
- Persistent state. Agent state is in-memory by default. For long-running conversations, snapshot to durable storage between turns.
- Sandbox code execution. Always use DockerCommandLineCodeExecutor in production; the local-process executor is unsafe with untrusted inputs.
- Concurrency. One team per worker process. Spinning up additional teams within one process is possible but couples their failure modes.
- Observability. Trace every agent + tool call. The fanout is high; without traces, debugging a failed team run is nearly impossible.
- Cost ceilings. Set per-task budgets and abort runs that exceed them.
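The worker / queue pattern with a per-task ceiling can be sketched with asyncio alone. Here an in-process asyncio.Queue stands in for SQS or Pub/Sub, run_team is a hypothetical stand-in for running a team to completion, and the ceiling is a wall-clock timeout, not a token budget.

```python
import asyncio


async def run_team(task: str) -> str:
    """Stand-in for running an agent team to completion (hypothetical)."""
    await asyncio.sleep(0.01)
    return f"done: {task}"


async def worker(queue: asyncio.Queue, results: list, timeout_s: float) -> None:
    """Pull tasks until a None sentinel arrives; abort any run that exceeds the timeout."""
    while True:
        task = await queue.get()
        if task is None:
            queue.task_done()
            break
        try:
            results.append(await asyncio.wait_for(run_team(task), timeout=timeout_s))
        except asyncio.TimeoutError:
            results.append(f"aborted: {task}")
        finally:
            queue.task_done()


async def main() -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    results: list[str] = []
    for task in ("summarize report", "triage ticket"):
        queue.put_nowait(task)
    queue.put_nowait(None)  # sentinel tells the worker to stop
    await worker(queue, results, timeout_s=1.0)
    return results


results = asyncio.run(main())
```

In a real deployment the results list would be replaced by posting to a results queue or store, and the aborted branch would record the failure for later replay.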
Security considerations
- Code execution is the headline risk. Any CodeExecutorAgent can run arbitrary Python. Always sandbox via Docker; never expose to untrusted users.
- Tool registration is eval() for the model. Tools that touch databases, file systems, or external services need allowlists and audit logging.
- Prompt injection through messages. A malicious tool result, document, or user message can re-direct subsequent agents. Filter inputs and pin system prompts.
- Secrets in messages. Agent conversations get logged everywhere. Never put API keys in system messages or user inputs.
- Cross-agent secret leakage. If one agent has access to a secret (DB credential), avoid surfacing it in messages other agents see.
- Model-client key handling. OpenAIChatCompletionClient(api_key=...) accepts keys directly; prefer env-based configuration so keys don't end up in tracebacks.
- Container escapes. Docker code execution mitigates but does not eliminate escape risk; use rootless Docker or gVisor for hardened environments.
- Replay risk. Saved conversation logs contain prompts, completions, and tool args; treat them as sensitive.
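One way to reduce the logging risk is to redact obvious secrets before messages reach logs or traces. The sketch below is a minimal illustration; the two patterns (keys starting with "sk-" and Bearer tokens) are assumptions and will not catch every credential format.

```python
import re

# Assumed patterns: "sk-" style API keys and "Bearer <token>" headers; extend for your providers.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{8,}"),
    re.compile(r"(?i)bearer\s+[A-Za-z0-9._-]{8,}"),
]


def redact(text: str) -> str:
    """Replace anything matching a known secret pattern before the text is logged."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


safe_line = redact("calling provider with key sk-abcdef123456")
```

Redaction is a backstop, not a substitute for keeping secrets out of messages in the first place.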
Multi-provider patterns
The v0.4 design puts model selection behind the ChatCompletionClient interface — any agent can use any provider.
- autogen-ext[openai] / [anthropic] / [azure] / [google] ship matching client classes (extras names vary by release; confirm against the autogen-ext extras list). Same interface; swap construction.
- An OpenAIChatCompletionClient pointed at a LiteLLM proxy's base URL gives you any provider behind one client class. Useful for centralised cost/quota control.
- Per-agent provider choice. Different agents can use different providers (cheap dispatcher, expensive worker) without architectural changes.
- Failover. AutoGen does not ship retry-across-providers natively; wrap the model client in a custom adapter that catches errors and falls back.
- Token budgeting across providers. Different providers report tokens differently; aggregate via your observability layer.
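The failover adapter idea can be sketched generically. The classes below are stubs: create(prompt) is a simplified, hypothetical interface, not the real ChatCompletionClient signature, and a production adapter would catch provider-specific errors only.

```python
import asyncio


class FlakyClient:
    """Stub model client; `create` mimics an async completion call (hypothetical interface)."""

    def __init__(self, name: str, fail: bool):
        self.name = name
        self.fail = fail

    async def create(self, prompt: str) -> str:
        if self.fail:
            raise ConnectionError(f"{self.name} unavailable")
        return f"{self.name}: reply to {prompt!r}"


class FallbackClient:
    """Try each wrapped client in order and return the first successful reply."""

    def __init__(self, clients):
        self.clients = list(clients)

    async def create(self, prompt: str) -> str:
        errors = []
        for client in self.clients:
            try:
                return await client.create(prompt)
            except Exception as exc:  # narrow this to provider-specific errors in real code
                errors.append(exc)
        raise RuntimeError(f"all providers failed: {errors}")


client = FallbackClient([FlakyClient("primary", fail=True), FlakyClient("backup", fail=False)])
reply = asyncio.run(client.create("hello"))
```

Falling back silently can hide an outage and shift spend to a pricier provider, so it is worth logging every fallback.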
Ecosystem integrations
| Layer | Integrations |
|---|---|
| Model clients | autogen-ext[openai], [anthropic], [azure], [google] cover the major hosted providers. |
| Tools | autogen-ext[web-surfer], [file-surfer], third-party tool repos. Tools are plain Python callables — anything you can wrap in a function works. |
| Code execution | DockerCommandLineCodeExecutor, LocalCommandLineCodeExecutor, JupyterCodeExecutor (via autogen-ext). |
| UI / Studio | autogen-studio — graphical workbench for building and debugging teams. |
| Observability | langsmith and OpenTelemetry instrumentation; OpenInference Agent spans. |
| Frameworks alongside | Often combined with LangGraph (for sub-agent flows), instructor (for structured outputs), crewai for higher-level role abstractions in adjacent services. |
| Memory | Only basic memory building blocks ship with the framework (for example, ListMemory in autogen-core); couple with mem0 or a custom vector store for long-term recall. |
When NOT to use this
- Single-agent linear flows. A single LangGraph node or a direct provider SDK call is simpler.
- You want role-playing first. crewai's Agent/Task/Crew abstraction is more opinionated and faster to bootstrap for that style.
- Sync code. AutoGen v0.4 is async-first. If your environment can't run an event loop, the friction isn't worth it.
- Deterministic pipelines. If the workflow doesn't need dynamic agent dispatch, LangGraph (explicit graph nodes) gives you tighter control.
- You need v0.2 stability. v0.4 is the active surface; v0.2 is maintenance-only. If existing v0.2 code works, don't migrate just to migrate.
See also
- AI: AutoGen — agents, group chats, code execution
- Packages: pip-crewai — agent-framework alternative
- Concept: agents — agent loop fundamentals
- Concept: api — model client + tool registration patterns