cheat sheet

autogen-agentchat

Package-level reference for the autogen-agentchat / autogen-core / autogen-ext family on PyPI plus the legacy pyautogen — install, rename history, versioning, and alternatives.

autogen-agentchat

What it is

autogen-agentchat is the high-level multi-agent SDK from the AutoGen v0.4 redesign — Microsoft's framework for building systems where multiple AI agents converse to complete tasks. In v0.4, AutoGen was split into a layered family of packages: a low-level message-passing runtime, a high-level agent/team SDK, and an extensions package for model-client and tool integrations.

A note on naming: the v0.2 monolith was published as pyautogen on PyPI. The v0.4 rewrite uses autogen-agentchat, autogen-core, and autogen-ext — the bare autogen name on PyPI has a complicated history and is not the canonical v0.4 install. Use the split packages.

Install

bash
pip install autogen-agentchat "autogen-ext[openai]"

Output: the standard v0.4 stack with OpenAI model client

bash
pip install autogen-agentchat "autogen-ext[anthropic]"

Output: v0.4 with Anthropic Claude support

bash
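pip install autogen-agentchat "autogen-ext[openai,azure]"

Output: v0.4 with Azure OpenAI (AzureOpenAIChatCompletionClient ships with the [openai] extra; [azure] adds Azure identity auth and the Azure AI Inference client)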

bash
uv add autogen-agentchat "autogen-ext[openai]"

Output: dependencies resolved + added to pyproject.toml

bash
pip install "pyautogen~=0.2.0"     # ← legacy v0.2 line only; do not mix with v0.4

Output: installs the older 0.2.x monolith, a different API entirely. Unpinned, pyautogen now resolves to releases from the ag2 fork.

Versioning & Python support

  • v0.2 (pyautogen) — original AutoGen, single package, 0.2.x line. Still receives bug fixes but feature work has moved on. Tutorials from 2023–early-2024 target this API.
  • v0.4+ (autogen-agentchat family): the current redesigned SDK, versioned 0.4.0 and up (check PyPI for the latest 0.x release). Different abstractions: AssistantAgent, UserProxyAgent, RoundRobinGroupChat, SelectorGroupChat, Swarm, and the async agent runtime in autogen-core (e.g. SingleThreadedAgentRuntime).
  • The two APIs are NOT compatible — code written for pyautogen does not run on autogen-agentchat. The framework redesign was a clean break.
  • Python 3.10+ for v0.4 (the runtime relies on modern asyncio features). v0.2 supported 3.8+.
  • Pre-1.0 on both lines — pin tightly.
  • The Microsoft mainline lives at microsoft/autogen on GitHub; a community fork lives at ag2ai/ag2 and is released as ag2 on PyPI (it also publishes under the autogen and pyautogen names). They originated from the same codebase and have diverged. If a tutorial references ag2, it's the community fork, not the Microsoft mainline.

Package metadata

  • Maintainer: Microsoft (the microsoft/autogen repo)
  • Project home: github.com/microsoft/autogen
  • Docs: microsoft.github.io/autogen
  • PyPI: pypi.org/project/autogen-agentchat
  • License: MIT for code; documentation under CC-BY-4.0 (check each subpackage's metadata)
  • Governance: Microsoft Research + open contribution; the ag2 fork is community-led
  • First released: v0.2 in 2023; v0.4 family in late 2024
  • Downloads: millions per month across the family

Optional dependencies & extras

The v0.4 family is layered. Pick packages from the appropriate layer:

| Package | Layer | Purpose |
|---|---|---|
| autogen-core | Foundation | Async message-passing runtime, agent base classes. Used by everything else. |
| autogen-agentchat | High-level | The "AutoGen API" most users want: assistants, group chats, teams. |
| autogen-ext[openai] | Extensions | OpenAI / Azure OpenAI ChatCompletionClient implementations |
| autogen-ext[anthropic] | Extensions | Anthropic Claude model client |
| autogen-ext[azure] | Extensions | Azure-specific clients (e.g. Azure AI Inference) and Azure identity auth |
| autogen-ext[docker] | Extensions | Docker-backed code execution sandbox |
| autogen-ext[web-surfer] | Extensions | Headless-browser web-browsing agent (MultimodalWebSurfer) |
| autogen-ext[file-surfer] | Extensions | File-system browsing agent (FileSurfer) |
| autogenstudio | UI | Optional graphical workbench (AutoGen Studio) for building/debugging teams |

The autogen-ext package is the catch-all for tool and model-client integrations — install with one or more extras corresponding to what you need.

Alternatives

| Package | Trade-off |
|---|---|
| crewai | Roles + tasks + crews abstraction; YAML-driven; more opinionated. Pythonic alternative. |
| langgraph | Stateful graph-based agents from the LangChain team. Lower-level than agentchat; finer control. |
| llama-index (agents) | Agent abstractions on top of LlamaIndex's retrieval stack. RAG-first orientation. |
| swarm (OpenAI) | OpenAI's experimental lightweight multi-agent library. Smaller, OpenAI-only. |
| openai-agents | OpenAI's newer Agents SDK. Tight OpenAI integration. |
| Custom orchestration | A loop + provider SDK is often enough for simple 2-agent setups; frameworks earn their keep at 4+ agents. |

Common gotchas

  1. v0.2 vs v0.4 are different products. Code from a 2023 AutoGen tutorial does not run on autogen-agentchat. Check the import line — from autogen import ... is v0.2 (pyautogen); from autogen_agentchat.agents import ... is v0.4.
  2. Model configuration goes through model_client in v0.4. You don't pass llm_config dicts anymore; you instantiate a ChatCompletionClient from autogen-ext and hand it to each agent.
  3. Microsoft research repos vs Microsoft GitHub mainline. Earlier AutoGen experiments lived under microsoft/autogen research branches; the current mainline is the v0.4 rewrite. Tutorials and blog posts often reference an older branch state — pin to the latest stable.
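  4. ag2 fork on PyPI. The ag2 package is a community fork that continues the v0.2-style API (from autogen import ...), so its code does not run on autogen-agentchat, and the two diverge further over time. Don't install both in one environment.
  5. Async-first runtime. v0.4 agents run on asyncio: run() is a coroutine you must await, and run_stream() is an async generator you consume with async for (or hand to Console). Calling either from sync code without a running event loop leads to "coroutine was never awaited" warnings, or to "event loop is already running" errors when you nest asyncio.run().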
  6. Code-execution sandbox is opt-in. autogen-ext[docker] installs the Docker-backed executor, but Docker must actually be running on the host. Without it, code execution falls back to a (less safe) local-process executor — explicitly choose.
  7. Tool registration shape changed. v0.2 tools were registered with register_for_llm / register_for_execution decorators (or register_function), split between the assistant and a UserProxyAgent; v0.4 uses Python callables passed into AssistantAgent(tools=[...]). Don't paste v0.2 tool snippets into v0.4 agents.
  8. AutoGen Studio is a separate UI, not a runtime requirement. Skip it for CI/headless deployments.
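A minimal, stdlib-only sketch of gotcha 5, assuming a hypothetical `run_team` coroutine as a stand-in for an async v0.4 call such as `team.run(...)` (no AutoGen import, so it runs anywhere). Calling a coroutine function without `await` only creates a coroutine object; the work happens when that coroutine is awaited inside a running event loop.

```python
import asyncio

async def run_team(task: str) -> str:
    """Hypothetical stand-in for an async v0.4 call such as `await team.run(task=...)`."""
    await asyncio.sleep(0)  # yield to the event loop, as a real model call would
    return f"done: {task}"

def forgot_await() -> object:
    # Bug pattern: no await, so this only creates a coroutine object and nothing runs.
    coro = run_team("summarise")
    coro.close()  # closed here only to suppress the "never awaited" RuntimeWarning in this demo
    return coro

async def main() -> str:
    # Correct pattern: await inside an async function.
    return await run_team("summarise")

# Scripts: start the event loop once at the entry point.
# Jupyter already runs a loop, so use `await main()` in a cell instead.
result = asyncio.run(main())
print(result)  # done: summarise
```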

Performance tuning

Multi-agent throughput is dominated by total LLM-call latency × turn count. The levers are:

  • Reduce turn count. A well-prompted system message and a clear termination condition cut turns dramatically. max_turns is a ceiling, not a target; on v0.4 teams it defaults to None (no limit), so set it explicitly.
  • Parallel tool calls. If multiple tool calls in one turn are independent, batch them — model providers increasingly support parallel function calls.
  • Lighter model for routing. Running the SelectorGroupChat selector on a small, fast model keeps the extra routing call per turn cheap and quick, compared with using the flagship model everywhere.
  • Streaming for time-to-first-token. Use run_stream and surface partial messages to users; perceived latency improves even when total time doesn't.
  • Cold-start vs warm-start. Connection pool warm-up for the model client matters — keep clients alive across requests.
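  • Tool latency. Slow tools (web scraping, database queries) dominate. Cache results aggressively and cap result sizes.
  • Cancellation discipline. run_stream returns an async generator, so you can stop consuming it as soon as an agent emits a clear "done" signal; to stop the team itself, use a termination condition such as ExternalTermination or pass a CancellationToken.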
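The tool-latency advice above can be sketched as a small decorator that memoises an async tool's results for a fixed time and truncates long results before they go back to the model. The decorator name `cached_tool`, its TTL and character limits, and the stub `fetch_page` tool are all hypothetical. `functools.wraps` keeps the wrapped function's name, docstring, and annotations (and sets `__wrapped__`), which should let schema generation still see the original typed signature.

```python
import asyncio
import functools
import time
from typing import Awaitable, Callable

def cached_tool(ttl_seconds: float = 300.0, max_chars: int = 2000):
    """Hypothetical decorator: memoise an async tool for ttl_seconds and cap result size."""
    def decorator(fn: Callable[..., Awaitable[str]]) -> Callable[..., Awaitable[str]]:
        cache: dict[tuple, tuple[float, str]] = {}

        @functools.wraps(fn)  # copies __name__, __doc__, __annotations__ and sets __wrapped__
        async def wrapper(*args, **kwargs) -> str:
            key = (args, tuple(sorted(kwargs.items())))  # arguments must be hashable
            now = time.monotonic()
            hit = cache.get(key)
            if hit is not None and now - hit[0] < ttl_seconds:
                return hit[1]  # fresh cached result: skip the slow call
            result = (await fn(*args, **kwargs))[:max_chars]  # cap what goes back to the model
            cache[key] = (now, result)
            return result

        return wrapper
    return decorator

calls = 0

@cached_tool(ttl_seconds=60, max_chars=10)
async def fetch_page(url: str) -> str:
    """Fetch a web page (stub that returns a long fake body)."""
    global calls
    calls += 1
    return "x" * 100

async def demo() -> tuple[str, str]:
    first = await fetch_page("https://example.com")
    second = await fetch_page("https://example.com")  # served from the cache
    return first, second

first, second = asyncio.run(demo())
print(calls, len(first), first == second)  # 1 10 True
```

This cache lives in one process; workers that scale horizontally would need a shared store instead.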

Evaluation & observability

Multi-agent runs are nearly impossible to debug without traces. Set up observability before you build the second agent.

  • Trace every model call + tool call. langsmith and OpenTelemetry both work — pick one and instrument the model client + tool wrappers.
  • Per-run metrics: total turns, total tokens, total cost, success / failure / abort outcome.
  • Per-agent metrics: turn count, average tokens per turn, time per turn — identifies the agent eating the budget.
  • Trajectory replay. Save the full message history for failed runs; replay locally to reproduce failures.
  • A/B on team composition. Swap agents in / out, run the same task suite, compare outcomes.
  • Custom evaluators. Did the final answer match a gold output? Did the team terminate on the expected condition? Standard LLM eval frameworks (DeepEval, ragas) work for end-of-run scoring; track per-turn metrics separately.
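Per-agent metrics reduce to a group-by over trace records. A minimal sketch, assuming your tracing layer exports one record per agent turn; the `TurnRecord` shape is hypothetical, though its token fields mirror the `prompt_tokens` / `completion_tokens` pair reported in a message's `models_usage`.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class TurnRecord:
    """Hypothetical trace record: one agent turn as exported by your tracing layer."""
    agent: str
    prompt_tokens: int
    completion_tokens: int
    seconds: float

def per_agent_metrics(records: list[TurnRecord]) -> dict[str, dict[str, float]]:
    grouped: dict[str, list[TurnRecord]] = defaultdict(list)
    for r in records:
        grouped[r.agent].append(r)
    out: dict[str, dict[str, float]] = {}
    for agent, rs in grouped.items():
        turns = len(rs)
        tokens = sum(r.prompt_tokens + r.completion_tokens for r in rs)
        out[agent] = {
            "turns": turns,
            "total_tokens": tokens,
            "avg_tokens_per_turn": tokens / turns,
            "avg_seconds_per_turn": sum(r.seconds for r in rs) / turns,
        }
    return out

trace = [
    TurnRecord("writer", 400, 200, 2.0),
    TurnRecord("critic", 700, 50, 1.0),
    TurnRecord("writer", 900, 300, 3.0),
]
metrics = per_agent_metrics(trace)
print(metrics["writer"])  # 2 turns, 1800 tokens, 900.0 per turn, 2.5 s per turn
```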

Real-world recipes

The v0.4 architecture organises around AssistantAgent, Tool, and team types (RoundRobinGroupChat, SelectorGroupChat, Swarm). Recipes below assume the modern stack.

Recipe: single assistant with tools

python
import asyncio
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.ui import Console
from autogen_ext.models.openai import OpenAIChatCompletionClient

async def get_weather(city: str) -> str:
    """Get current weather for a city."""
    return f"sunny, 22C in {city}"

async def main():
    model = OpenAIChatCompletionClient(model="gpt-4o-mini")
    agent = AssistantAgent(
        name="weather_bot",
        model_client=model,
        tools=[get_weather],
        system_message="You answer weather questions concisely.",
    )
    await Console(agent.run_stream(task="What's the weather in Tokyo?"))

asyncio.run(main())

Output: the model issues a get_weather call, observes the result, replies with a one-sentence answer.

Recipe: round-robin group chat

python
import asyncio
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.conditions import TextMentionTermination
from autogen_agentchat.ui import Console
from autogen_ext.models.openai import OpenAIChatCompletionClient

model = OpenAIChatCompletionClient(model="gpt-4o-mini")

writer = AssistantAgent("writer", model_client=model,
    system_message="You draft short marketing copy.")
critic = AssistantAgent("critic", model_client=model,
    system_message="You critique copy. When satisfied, say 'APPROVE'.")

# Stop when "APPROVE" appears or after 8 turns, whichever comes first.
team = RoundRobinGroupChat([writer, critic],
                           termination_condition=TextMentionTermination("APPROVE"),
                           max_turns=8)

asyncio.run(Console(team.run_stream(task="Write a tagline for a reusable water bottle.")))

Output: writer drafts, critic critiques, writer revises — terminates when critic says "APPROVE" or hits the max turn count.

Recipe: selector group chat with custom routing

python
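from autogen_agentchat.teams import SelectorGroupChat

# Assumes model plus coder, reviewer, tester AssistantAgents defined like the agents in the previous recipes.
team = SelectorGroupChat(
    [coder, reviewer, tester],
    model_client=model,  # model used by the selector; can be smaller than the workers' model
    selector_prompt=(
        # {roles}, {history} and {participants} are filled in by SelectorGroupChat;
        # without {history} the selector never sees the conversation.
        "Roles:\n{roles}\n\nConversation:\n{history}\n\n"
        "Select the next speaker from {participants}. Return only the agent name."
    ),
    allow_repeated_speaker=False,
    max_turns=12,
)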

Output: a model-driven dispatcher decides which agent speaks next based on the conversation state.

Recipe: handing off in a Swarm

Swarm (also from autogen_agentchat.teams) implements OpenAI-style agent hand-off: one agent transfers control to another based on the task. Useful when responsibility is hierarchical rather than collaborative.

python
from autogen_agentchat.teams import Swarm
from autogen_agentchat.conditions import HandoffTermination, MaxMessageTermination

triage = AssistantAgent(
    "triage", model_client=model,
    system_message="Classify the user request and hand off to billing or technical.",
    handoffs=["billing", "technical"],
)
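# Specialists need a "user" handoff, otherwise HandoffTermination("user") can never fire.
billing   = AssistantAgent("billing",   model_client=model, handoffs=["user"],
                           system_message="Handle billing. Hand off to user when done.")
technical = AssistantAgent("technical", model_client=model, handoffs=["user"],
                           system_message="Handle technical issues. Hand off to user when done.")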

team = Swarm(
    [triage, billing, technical],
    termination_condition=HandoffTermination("user") | MaxMessageTermination(10),
)

Output: triage transfers to the appropriate specialist; the swarm terminates when an agent hands back to "user" or hits the message ceiling.

Recipe: custom termination on token budget

python
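from typing import Sequence

from autogen_agentchat.base import TerminatedException, TerminationCondition
from autogen_agentchat.messages import StopMessage

class BudgetTermination(TerminationCondition):
    """Stop the team once cumulative completion tokens reach max_tokens."""

    def __init__(self, max_tokens: int) -> None:
        self.budget = max_tokens
        self.spent = 0

    @property
    def terminated(self) -> bool:
        return self.spent >= self.budget

    async def __call__(self, messages: Sequence) -> StopMessage | None:
        if self.terminated:
            raise TerminatedException("Termination condition has already been reached")
        for m in messages:
            usage = getattr(m, "models_usage", None)  # only model-generated messages carry usage
            if usage is not None:
                self.spent += usage.completion_tokens
        if self.terminated:
            # Returning a StopMessage (not a bool) is what tells the team to stop.
            return StopMessage(content=f"Token budget of {self.budget} reached", source="BudgetTermination")
        return None

    async def reset(self) -> None:
        self.spent = 0

team = RoundRobinGroupChat([writer, critic], termination_condition=BudgetTermination(5000))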

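Output: the team stops at the first check after cumulative completion tokens reach 5000. Conditions are evaluated after each message, so the last response can overshoot the budget. For this exact case, the built-in TokenUsageTermination in autogen_agentchat.conditions may be enough.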

Recipe: code-execution sandbox

python
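import asyncio
from autogen_agentchat.agents import AssistantAgent, CodeExecutorAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.ui import Console
from autogen_ext.code_executors.docker import DockerCommandLineCodeExecutor
from autogen_ext.models.openai import OpenAIChatCompletionClient

async def main() -> None:
    model = OpenAIChatCompletionClient(model="gpt-4o-mini")
    # The async context manager starts the container and stops it on exit (Docker must be running).
    async with DockerCommandLineCodeExecutor(work_dir="./sandbox") as executor:
        coder = AssistantAgent("coder", model_client=model,
            system_message="You write Python. Wrap code in ```python blocks.")
        runner = CodeExecutorAgent("runner", code_executor=executor)
        team = RoundRobinGroupChat([coder, runner], max_turns=6)
        await Console(team.run_stream(task="Compute the 30th Fibonacci number."))

asyncio.run(main())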

Output: coder proposes code, runner executes it in a Docker container, the result feeds back into the conversation.

Recipe: multi-provider agent

python
from autogen_ext.models.anthropic import AnthropicChatCompletionClient

claude = AnthropicChatCompletionClient(model="claude-sonnet-4-6")
gpt    = OpenAIChatCompletionClient(model="gpt-4o-mini")

planner   = AssistantAgent("planner", model_client=claude,
                            system_message="You plan multi-step tasks.")
executor  = AssistantAgent("executor", model_client=gpt,
                            system_message="You execute concrete steps.")
team = RoundRobinGroupChat([planner, executor], max_turns=10)

Output: Claude plans, GPT executes — each agent backed by a different provider.

Cost & rate-limit management

Multi-agent systems multiply costs by the number of turns × agents per turn. Treat the cost ceiling as an explicit design parameter, not an afterthought.

  • Always bound the run. Neither max_turns nor a termination condition is set by default, so without one agents can ping-pong forever; configure at least one.
  • Cheap model for routing, expensive model for reasoning. A SelectorGroupChat dispatcher can run on a small model; the worker agents can use a flagship. Saves significant cost on dispatch.
  • Custom TerminationCondition. Build conditions on token spend, latency, or message content — short-circuit when the conversation stalls.
  • Tool-call discipline. Each tool round-trip is a model call. Combine related tools where possible; e.g., return a struct, not a single value, per call.
  • Rate-limit by provider. Multiple agents sharing one provider can trip per-minute quotas. Spread across providers or use a LiteLLM proxy with team quotas.
  • Streaming run_stream does not reduce cost — but lets you cancel early if the conversation goes off-track.
  • Observability for spend. Trace every agent call via langsmith or OpenTelemetry; aggregate spend by team / agent.
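Treating the ceiling as a design parameter starts with a worst-case estimate: turns × model calls per turn (one per agent reply, plus any tool round-trips or selector calls) × cost per call. A minimal sketch; the prices below are made-up placeholders, not any provider's rates, and real prompt sizes grow as history accumulates, so use a pessimistic average.

```python
def worst_case_cost_usd(
    max_turns: int,
    model_calls_per_turn: int,
    prompt_tokens_per_call: int,
    completion_tokens_per_call: int,
    usd_per_1k_prompt: float,
    usd_per_1k_completion: float,
) -> float:
    """Upper-bound spend for one team run under the stated per-call averages."""
    per_call = (prompt_tokens_per_call * usd_per_1k_prompt
                + completion_tokens_per_call * usd_per_1k_completion) / 1000
    return max_turns * model_calls_per_turn * per_call

# Placeholder numbers: 10 turns, 2 calls per turn (reply + one tool round-trip).
cost = worst_case_cost_usd(
    max_turns=10, model_calls_per_turn=2,
    prompt_tokens_per_call=2000, completion_tokens_per_call=300,
    usd_per_1k_prompt=0.003, usd_per_1k_completion=0.015,
)
budget_usd = 0.25
print(f"worst case ${cost:.2f} vs budget ${budget_usd:.2f}")  # worst case $0.21 vs budget $0.25
if cost > budget_usd:
    raise SystemExit("lower max_turns or use a cheaper model before shipping")
```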

Version migration guide

pyautogen (v0.2) → autogen-agentchat (v0.4+) is a clean break — different package, different API. The framework was redesigned around async message passing.

| Aspect | v0.2 (pyautogen) | v0.4+ (autogen-agentchat family) |
|---|---|---|
| Import | from autogen import AssistantAgent, UserProxyAgent | from autogen_agentchat.agents import AssistantAgent |
| Sync vs async | Mostly sync (user_proxy.initiate_chat(...)) | Async-first (await team.run(...), or async for over team.run_stream(...)) |
| LLM config | llm_config={"config_list": [{"model": "gpt-4", ...}]} dict | model_client=OpenAIChatCompletionClient(...) from autogen-ext |
| Tools | register_for_llm / register_for_execution decorators (or register_function) | Python callables passed to AssistantAgent(tools=[...]) |
| Group chat | GroupChat + GroupChatManager | RoundRobinGroupChat, SelectorGroupChat, Swarm |
| Code execution | code_execution_config dict on the agent (use_docker flag) | Explicit DockerCommandLineCodeExecutor or LocalCommandLineCodeExecutor |
| Termination | max_turns, max_round, max_consecutive_auto_reply, is_termination_msg | Composable TerminationCondition objects |
| Streaming | Limited | First-class run_stream async iterator |

Migration discipline:

  1. The two libraries cannot share state — they're separate packages.
  2. Translate llm_config dicts to ChatCompletionClient instances from autogen-ext.
  3. Re-shape tools from register_for_llm / register_for_execution decorators to plain Python callables passed to assistants.
  4. Adopt async — asyncio.run(main()) at the top level becomes the norm.
  5. ag2 is a community fork of mainline; check which one your tutorial references before pasting.
  6. Confirm specific symbol moves and signature changes within v0.4.x against the release notes in microsoft/autogen; the redesign era saw multiple minor adjustments.
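Step 2 is mostly mechanical. A minimal sketch of a hypothetical helper, `llm_config_to_client_kwargs`, that pulls the first `config_list` entry out of a v0.2 `llm_config` and reports anything it could not map. Which keys carry over one-to-one is an assumption here (`model`, `api_key`, `base_url`, plus common sampling settings); confirm them against the `OpenAIChatCompletionClient` docs for your installed version. v0.2-only settings such as `cache_seed` have no direct equivalent, so they are surfaced rather than silently dropped.

```python
from typing import Any

# Assumed one-to-one mappings onto OpenAIChatCompletionClient keyword arguments.
_ENTRY_KEYS = {"model", "api_key", "base_url"}
_TOP_LEVEL_KEYS = {"temperature", "max_tokens", "top_p", "seed"}

def llm_config_to_client_kwargs(llm_config: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Hypothetical helper: v0.2 llm_config dict -> (client kwargs, unmapped keys)."""
    entry = (llm_config.get("config_list") or [{}])[0]  # v0.2 could list several; take the first
    kwargs = {k: v for k, v in entry.items() if k in _ENTRY_KEYS}
    kwargs.update({k: v for k, v in llm_config.items() if k in _TOP_LEVEL_KEYS})
    unmapped = sorted(
        [f"config_list[0].{k}" for k in entry if k not in _ENTRY_KEYS]
        + [k for k in llm_config if k not in _TOP_LEVEL_KEYS | {"config_list"}]
    )
    return kwargs, unmapped

old = {
    "config_list": [{"model": "gpt-4o-mini", "api_key": "sk-placeholder", "api_type": "openai"}],
    "temperature": 0,
    "cache_seed": 42,
}
kwargs, unmapped = llm_config_to_client_kwargs(old)
print(kwargs)    # {'model': 'gpt-4o-mini', 'api_key': 'sk-placeholder', 'temperature': 0}
print(unmapped)  # ['cache_seed', 'config_list[0].api_type']
```

In v0.4 the result would feed `OpenAIChatCompletionClient(**kwargs)`; prefer reading the key from the environment rather than carrying `api_key` over.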

Troubleshooting common errors

  • ImportError: cannot import name 'AssistantAgent' from 'autogen' — that's v0.2 syntax against a v0.4 install (or vice-versa). Check which autogen-* packages are installed.
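  • RuntimeError: This event loop is already running (or "asyncio.run() cannot be called from a running event loop"): starting a loop where one is already running, typically in Jupyter. In notebooks use top-level await main(); in scripts call asyncio.run(main()) once at the entry point.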
  • coroutine was never awaited — calling team.run(...) without await. v0.4 APIs are async-first.
  • Code execution returns blank. Docker isn't running, or the executor's work_dir is unwritable. Check Docker; pick a writable path.
  • AuthenticationError (401) or PermissionDeniedError (403) from OpenAIChatCompletionClient: the API key is missing, invalid, or lacks access to the requested model. Check the OPENAI_API_KEY environment variable.
  • Infinite loop. No termination condition was supplied. Add TextMentionTermination("DONE") or MaxMessageTermination(15).
  • Tool not being called. Tool signature must be typed (annotations on every parameter); without types, AutoGen can't generate the schema.
  • ag2 vs autogen confusion. They're forks — don't mix imports.
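The "tool not being called" case is cheap to catch before the agent ever runs. A minimal sketch using only `inspect`: the hypothetical `missing_annotations` helper lists parameters that have no type annotation, which is what the bullet above says blocks schema generation.

```python
import inspect
from typing import Callable

def missing_annotations(fn: Callable) -> list[str]:
    """Return the names of fn's parameters that have no type annotation."""
    return [
        name
        for name, param in inspect.signature(fn).parameters.items()
        if param.annotation is inspect.Parameter.empty
    ]

async def get_forecast(city: str, days: int = 1) -> str:
    """Get a forecast for a city."""
    return f"{city}: sunny for {days} day(s)"

async def get_forecast_untyped(city, days=1):
    return f"{city}: sunny"

print(missing_annotations(get_forecast))          # []
print(missing_annotations(get_forecast_untyped))  # ['city', 'days']
```

An `assert not missing_annotations(tool)` in a unit test surfaces the problem in CI rather than mid-run.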

Production deployment

AutoGen multi-agent systems are usually deployed as backend workers rather than as user-facing services — the conversation pattern is too long-lived for typical HTTP request lifecycles.

  • Worker / queue pattern. Receive a task on a queue (SQS, Pub/Sub, Celery), run the team to completion, post the result. Don't run agents inside a synchronous HTTP handler.
  • Async runtime. autogen-core is async-first; the worker should use asyncio directly, not threads.
  • Persistent state. Agent and team state is in-memory by default. For long-running conversations, snapshot it with save_state() and restore it with load_state(), persisting the returned state to durable storage between turns.
  • Sandbox code execution. Always use DockerCommandLineCodeExecutor in production; the local-process executor is unsafe with untrusted inputs.
  • Concurrency. One team per worker process. Spinning up additional teams within one process is possible but couples their failure modes.
  • Observability. Trace every agent + tool call. The fanout is high — without traces, debugging a failed team run is nearly impossible.
  • Cost ceilings. Set per-task budgets and abort runs that exceed them.
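The worker/queue and cost-ceiling bullets combine into a small loop. A minimal sketch, assuming `asyncio.Queue` as a stand-in for SQS, Pub/Sub, or Celery and a hypothetical `run_team` coroutine in place of `await team.run(task=...)`. The per-task budget here is wall-clock time enforced with `asyncio.wait_for`, which cancels the run when it expires; a token or dollar budget would instead go in a termination condition.

```python
import asyncio

async def run_team(task: str) -> str:
    """Hypothetical stand-in for `await team.run(task=task)`; sleep simulates work."""
    await asyncio.sleep(0.01 if task.startswith("quick") else 1.0)
    return f"result for {task!r}"

async def worker(queue: asyncio.Queue, results: dict[str, str], timeout_s: float) -> None:
    # One team run at a time in this worker; each run gets a wall-clock budget.
    while True:
        task = await queue.get()
        try:
            results[task] = await asyncio.wait_for(run_team(task), timeout=timeout_s)
        except asyncio.TimeoutError:
            results[task] = "aborted: over time budget"
        finally:
            queue.task_done()

async def main() -> dict[str, str]:
    queue: asyncio.Queue = asyncio.Queue()
    results: dict[str, str] = {}
    for task in ("quick summary", "slow research"):
        queue.put_nowait(task)
    consumer = asyncio.create_task(worker(queue, results, timeout_s=0.2))
    await queue.join()  # returns once every queued task has been marked done
    consumer.cancel()   # then shut the worker down
    await asyncio.gather(consumer, return_exceptions=True)
    return results

results = asyncio.run(main())
print(results)
```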

Security considerations

  • Code execution is the headline risk. Any CodeExecutorAgent can run arbitrary Python. Always sandbox via Docker; never expose to untrusted users.
  • Tool registration is eval() for the model. Tools that touch databases, file systems, or external services need allowlists and audit logging.
  • Prompt injection through messages. A malicious tool result, document, or user message can re-direct subsequent agents. Filter inputs and pin system prompts.
  • Secrets in messages. Agent conversations get logged everywhere. Never put API keys in system messages or user inputs.
  • Cross-agent secret leakage. If one agent has access to a secret (DB credential), avoid surfacing it in messages other agents see.
  • Model-client key handling. OpenAIChatCompletionClient(api_key=...) accepts keys directly — prefer env-based configuration so keys don't end up in tracebacks.
  • Container escapes. Docker code execution mitigates but does not eliminate escape risk — use rootless Docker or gVisor for hardened environments.
  • Replay risk. Saved conversation logs contain prompts, completions, and tool args — treat as sensitive.
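For the tool-allowlist point, a minimal sketch of a file-reading tool confined to one directory, with every call written to an audit logger. The factory `make_read_file` and the logger name are hypothetical. `Path.resolve()` plus `is_relative_to` (Python 3.9+) rejects `../` traversal, absolute paths, and symlinks that resolve outside the root.

```python
import logging
import tempfile
from pathlib import Path
from typing import Callable

logging.basicConfig(level=logging.INFO)
audit = logging.getLogger("tool_audit")  # hypothetical logger name; route it to your audit sink

def make_read_file(root: str) -> Callable[[str], str]:
    allowed = Path(root).resolve()

    def read_file(path: str) -> str:
        """Read a UTF-8 text file from the agent's data directory."""
        target = (allowed / path).resolve()  # collapses ../ and follows symlinks
        if not target.is_relative_to(allowed):
            audit.warning("DENIED read_file path=%r", path)
            raise PermissionError(f"path outside allowlist: {path}")
        audit.info("ALLOWED read_file path=%r", path)
        return target.read_text(encoding="utf-8")

    return read_file

with tempfile.TemporaryDirectory() as data_dir:
    Path(data_dir, "notes.txt").write_text("hello", encoding="utf-8")
    read_file = make_read_file(data_dir)
    content = read_file("notes.txt")
    try:
        read_file("../../etc/passwd")
        blocked = False
    except PermissionError:
        blocked = True

print(content, blocked)  # hello True
```

The returned `read_file` is a plain typed callable, so it can be passed to `AssistantAgent(tools=[...])` like any other tool.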

Multi-provider patterns

The v0.4 design puts model selection behind the ChatCompletionClient interface — any agent can use any provider.

  • autogen-ext[openai] / [anthropic] / [azure] / [google] ship matching client classes. Same interface; swap construction.
  • LiteLLM proxy in front of an OpenAIChatCompletionClient pointing at the proxy's base URL gives you any provider behind one client class. Useful for centralised cost/quota control.
  • Per-agent provider choice. Different agents can use different providers — cheap dispatcher, expensive worker — without architectural changes.
  • Failover. AutoGen does not ship retry-across-providers natively; wrap the model client in a custom adapter that catches errors and falls back.
  • Token budgeting across providers. Different providers report tokens differently; aggregate via your observability layer.
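The failover bullet describes a wrapper that catches errors and falls back. A minimal sketch of that control flow, assuming a simplified hypothetical client protocol with a single `create(prompt)` coroutine. A real adapter would subclass `ChatCompletionClient` from `autogen_core.models` and implement its full interface, and would narrow `retry_on` to the provider SDK's rate-limit and server-error exceptions.

```python
import asyncio
from typing import Protocol, Sequence

class Client(Protocol):
    """Simplified hypothetical interface; AutoGen's real ChatCompletionClient has more methods."""
    name: str
    async def create(self, prompt: str) -> str: ...

class FailoverClient:
    """Try each client in order; return the first success, raise if all fail."""

    def __init__(self, clients: Sequence[Client],
                 retry_on: tuple[type[BaseException], ...] = (Exception,)) -> None:
        self.clients = list(clients)
        self.retry_on = retry_on
        self.last_used: str | None = None

    async def create(self, prompt: str) -> str:
        errors: list[BaseException] = []
        for client in self.clients:
            try:
                result = await client.create(prompt)
                self.last_used = client.name
                return result
            except self.retry_on as exc:  # e.g. rate-limit or 5xx errors from a provider
                errors.append(exc)
        raise RuntimeError(f"all {len(self.clients)} providers failed: {errors!r}")

class FlakyClient:
    name = "primary"
    async def create(self, prompt: str) -> str:
        raise ConnectionError("429 rate limited")

class StableClient:
    name = "backup"
    async def create(self, prompt: str) -> str:
        return f"backup answered: {prompt}"

failover = FailoverClient([FlakyClient(), StableClient()])
answer = asyncio.run(failover.create("hello"))
print(answer, failover.last_used)  # backup answered: hello backup
```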

Ecosystem integrations

| Layer | Integrations |
|---|---|
| Model clients | autogen-ext[openai], [anthropic], [azure], [google] cover the major hosted providers. |
| Tools | autogen-ext[web-surfer], [file-surfer], third-party tool repos. Tools are plain Python callables; anything you can wrap in a function works. |
| Code execution | DockerCommandLineCodeExecutor, LocalCommandLineCodeExecutor, JupyterCodeExecutor (via autogen-ext). |
| UI / Studio | AutoGen Studio (autogenstudio on PyPI): graphical workbench for building and debugging teams. |
| Observability | langsmith and OpenTelemetry instrumentation; OpenInference agent spans. |
| Frameworks alongside | Often combined with LangGraph (for sub-agent flows), instructor (for structured outputs), crewai for higher-level role abstractions in adjacent services. |
| Memory | Only a basic memory interface ships (autogen_core.memory, e.g. ListMemory); couple with mem0 or a vector store for long-term recall. |

When NOT to use this

  • Single-agent linear flows. A single LangGraph node or a direct provider SDK call is simpler.
  • You want role-playing first. crewai's Agent/Task/Crew abstraction is more opinionated and faster to bootstrap for that style.
  • Sync code. AutoGen v0.4 is async-first. If your environment can't run an event loop, the friction isn't worth it.
  • Deterministic pipelines. If the workflow doesn't need dynamic agent dispatch, LangGraph (explicit graph nodes) gives you tighter control.
  • You need v0.2 stability. v0.4 is the active surface; v0.2 is maintenance-only. If existing v0.2 code works, don't migrate just to migrate.

See also