cheat sheet

crewai

Package-level reference for the crewai library on PyPI plus the crewai-tools companion — install, versioning, and multi-agent alternatives.

updated 05-31-2026

crewai

What it is

crewai is a Python framework for orchestrating role-playing LLM agents that collaborate on tasks. Its core abstractions are Agent (a role with a goal, backstory, and toolset), Task (a unit of work assigned to an agent), and Crew (a team of agents executing tasks in sequence or hierarchy). The framework also exposes a Flow API for stateful branching workflows.

CrewAI's appeal is the deliberately opinionated, YAML-friendly mental model — define agents and tasks in configuration files, then assemble a crew with a few lines of Python. It is the most-cited "easy on-ramp" multi-agent framework, with a heavy community presence in 2024–2025.

Install

bash

pip install crewai

Output: installs the core framework

bash

pip install "crewai[tools]"

Output: installs crewai-tools companion — file readers, web scrapers, search tools, etc.

bash

uv add crewai crewai-tools

Output: dependencies resolved + added to pyproject.toml

bash

poetry add crewai

Output: updated lockfile + virtualenv install

Versioning & Python support

Current line is 0.x (as of late 2025), with frequent minor releases. Pre-1.0 — minor bumps occasionally rename agent / task / process options. Pin tight in production.
Python 3.10+ on current releases.
crewai and crewai-tools are released independently and must move together — a fresh crewai paired with an old crewai-tools is the #1 source of ImportError on first run.
Built on top of langchain / litellm under the hood for LLM-provider abstraction — pulls in a substantial dep tree.
The framework has moved from LangChain-coupled internals toward litellm-based provider abstraction; check release notes when upgrading across minor versions.

Package metadata

Maintainer: CrewAI Inc. (commercial sponsor) + open-source community
Project home: github.com/crewAIInc/crewAI
Tools repo: github.com/crewAIInc/crewAI-tools
Docs: docs.crewai.com
PyPI: pypi.org/project/crewai
License: MIT
Governance: commercial-backed open source; hosted "CrewAI Enterprise" runs the same framework as the free package
First released: late 2023
Downloads: millions per month

Optional dependencies & extras

Extra	Purpose
`crewai[tools]`	Pulls in the `crewai-tools` companion package
`crewai[embeddings]`	Embedding-provider deps for crew memory
`crewai[agentops]`	Built-in AgentOps observability hookup

The companion crewai-tools package is what most projects actually want — it carries dozens of pre-built tools: FileReadTool, DirectoryReadTool, SerperDevTool (web search), WebsiteSearchTool, PDFSearchTool, CodeInterpreterTool, etc.

Heavy transitive deps include:

langchain-core, langchain-openai, langchain-community (for some tools)
litellm — multi-provider LLM client
chromadb — default memory backend
pydantic
instructor — structured output

Alternatives

Package	Trade-off
`autogen-agentchat`	Microsoft's multi-agent framework. Lower-level message passing; less opinionated.
`langgraph`	Stateful graph-based agents from LangChain. Finer control over state; more verbose.
`llama-index` (agents)	RAG-first agent stack; tightly integrated with indexes.
`swarm` / `openai-agents`	OpenAI's lightweight + newer Agents SDK. Less opinionated; OpenAI-tied.
`dspy`	Optimises prompts and pipelines automatically. Different paradigm — programmatic, not role-playing.
Hand-rolled loop	A `while`-loop + provider SDK is often enough for 1–2 agent flows.

Common gotchas

crewai and crewai-tools version drift. Upgrade them together: pip install -U crewai crewai-tools. A stale crewai-tools against a current crewai is the most common first-run import error.
Agent / Task / Crew abstraction takes practice. Confusing what belongs on the Agent (persistent role/backstory) vs the Task (one-shot description + expected output) is the most common modeling mistake. Tasks live for a single execution; agents persist across a crew run.
YAML config vs Python config. CrewAI supports defining agents and tasks in YAML (agents.yaml + tasks.yaml) loaded via @CrewBase decorators, OR fully in Python. Mixing both in the same project is allowed but quickly becomes confusing — pick one.
Process modes change behaviour drastically. Process.sequential runs tasks in order; Process.hierarchical introduces a manager agent that delegates. Hierarchical needs an explicit manager_llm config or it silently falls back to a default model.
Memory backend defaults to local ChromaDB. A .chroma directory appears in the working directory on first run. For ephemeral environments (Docker, CI), configure memory=False or a different backend.
Tool dependency surface is wide. Each tool in crewai-tools can pull in further deps (Playwright, Selenium, embedchain, PyPDF, etc.). Don't pip install crewai-tools[all] casually — install per-tool extras instead.
Litellm middleman. Provider-specific failures (rate-limits, malformed function-calls) surface as litellm.exceptions.* rather than the underlying SDK's exceptions. Catch Exception broadly or import from litellm.exceptions.
Async support is partial. Most of the framework is sync; mixing with FastAPI/asyncio requires asyncio.to_thread() or running the crew in a separate worker.

Real-world recipes

CrewAI's value shines in the medium-complexity zone — 3-7 agent crews with clearly delineated roles. Below are the canonical shapes.

Recipe: sequential research → write → review

python

from crewai import Agent, Task, Crew, Process
from crewai_tools import SerperDevTool

search = SerperDevTool()

researcher = Agent(
    role="Senior Research Analyst",
    goal="Surface accurate, recent facts about a topic.",
    backstory="A meticulous analyst who values primary sources.",
    tools=[search],
    verbose=True,
)
writer = Agent(
    role="Technical Writer",
    goal="Turn research notes into a clear briefing.",
    backstory="A writer who explains technical topics without jargon.",
    verbose=True,
)
reviewer = Agent(
    role="Editorial Reviewer",
    goal="Catch factual errors and unclear passages.",
    backstory="An editor with an eye for sloppy claims.",
    verbose=True,
)

t1 = Task(description="Research recent advances in vector databases.",
          expected_output="A bulleted summary with citations.",
          agent=researcher)
t2 = Task(description="Write a 400-word briefing using the research notes.",
          expected_output="Polished prose, ready to publish.",
          agent=writer, context=[t1])
t3 = Task(description="Review the briefing for errors and clarity issues.",
          expected_output="Approved briefing or a list of fixes.",
          agent=reviewer, context=[t2])

crew = Crew(agents=[researcher, writer, reviewer], tasks=[t1, t2, t3], process=Process.sequential)
print(crew.kickoff())

Output: task chain produces a reviewed briefing with citations.

Recipe: hierarchical crew with manager LLM

python

from crewai import Agent, Task, Crew, Process
from langchain_openai import ChatOpenAI

manager_llm = ChatOpenAI(model="gpt-4o-mini")

crew = Crew(
    agents=[researcher, writer, reviewer],
    tasks=[Task(description="Produce a publication-ready briefing about pgvector.",
                expected_output="Reviewed, polished briefing.")],
    process=Process.hierarchical,
    manager_llm=manager_llm,
)
print(crew.kickoff())

Output: the manager LLM delegates sub-tasks; the explicit task list is replaced by a top-level goal.

Recipe: YAML-driven crew (`@CrewBase`)

python

from crewai import Agent, Crew, Process, Task
from crewai.project import CrewBase, agent, crew, task

@CrewBase
class ResearchCrew:
    agents_config = "config/agents.yaml"
    tasks_config  = "config/tasks.yaml"

    @agent
    def researcher(self) -> Agent:
        return Agent(config=self.agents_config["researcher"])

    @agent
    def writer(self) -> Agent:
        return Agent(config=self.agents_config["writer"])

    @task
    def research_task(self) -> Task:
        return Task(config=self.tasks_config["research"], agent=self.researcher())

    @crew
    def crew(self) -> Crew:
        return Crew(agents=self.agents, tasks=self.tasks, process=Process.sequential)

Output: declarative config separated from Python wiring — useful for non-developers tweaking prompts.

Recipe: custom tool integration

python

from crewai.tools import BaseTool
from pydantic import BaseModel, Field

class SQLInput(BaseModel):
    query: str = Field(description="A SELECT query")

class SQLTool(BaseTool):
    name: str = "SQL Query"
    description: str = "Run a SELECT against the analytics DB."
    args_schema: type = SQLInput
    def _run(self, query: str) -> str:
        # connect, run, return text
        return run_select(query)

analyst = Agent(role="Data Analyst", goal="Answer questions with SQL.", tools=[SQLTool()])

Output: typed tool with Pydantic args; agent now has SQL capability.

Recipe: flow-based stateful workflow

python

from crewai.flow.flow import Flow, listen, start

class ReleaseFlow(Flow):
    @start()
    def collect(self):
        return {"commits": fetch_commits()}

    @listen(collect)
    def categorise(self, ctx):
        return crew_categorise.kickoff(inputs=ctx)

    @listen(categorise)
    def write(self, categories):
        return crew_write.kickoff(inputs={"categories": categories})

ReleaseFlow().kickoff()

Output: branching, stateful workflow combining multiple crews with explicit data flow.

Cost & rate-limit management

Like AutoGen, crews are cost-multipliers — every task is at least one LLM call, often several, and tools add their own.

Set max_iter on every agent. Each agent's per-task iteration ceiling. Without it, an agent can loop calling tools until the context window fills.
Smaller model for tool-heavy agents. A simple "fetch and summarise" role rarely needs a flagship.
Process.sequential is cheaper than hierarchical. Hierarchical adds a manager LLM and extra coordination calls.
Disable memory unless you need it. Crew(memory=False) skips embedding/storage costs.
Cache embeddings. The default ChromaDB memory backend embeds every observation; reuse embeddings across runs by pointing at a persistent path.
Cap context length. Long backstories and verbose tools inflate every call. Trim ruthlessly.
LiteLLM-side cost tracking. Since CrewAI runs LLM calls through LiteLLM, point LiteLLM at a proxy that logs costs by team/project.
Streaming saves perceived latency. With step_callback, surface progress as agents work.

Version migration guide

CrewAI is pre-1.0 with frequent minor releases. The API has stabilised around Agent / Task / Crew / Process / Flow, but renames and option changes happen.

Roughly	What tends to change
Early `0.x`	LangChain-coupled internals; `llm=ChatOpenAI(...)` was the way to configure models.
Mid `0.x`	Migration toward `litellm` for provider abstraction. `llm="gpt-4o-mini"` (string) became the canonical form.
Recent `0.x`	`Flow` API added for stateful workflows. YAML-driven `@CrewBase` decorators became the recommended pattern for larger projects.

Migration discipline:

Upgrade crewai and crewai-tools together. Version skew between the two is the most common breakage source.
Replace LangChain LLM instantiation with strings. Agent(..., llm="gpt-4o-mini") is the modern form; Agent(..., llm=ChatOpenAI(...)) works but couples to LangChain.
Re-check tool import paths. crewai-tools reorganises namespaces between minors; pin a known-good version.
memory=True schema may shift. If you persisted memory across versions, rebuild it after a major bump.
Hedge: exact symbol moves and signature tweaks across 0.x releases are best confirmed against the project's CHANGELOG — pre-1.0 churn is the norm.

Troubleshooting common errors

ImportError on first run after upgrade. Stale crewai-tools. pip install -U crewai crewai-tools.
Process.hierarchical silently uses a default model. No manager_llm was supplied. Pass one explicitly.
Agent loops calling the same tool. No max_iter on the agent, or the tool result is too vague. Set max_iter=10 and tighten tool descriptions.
KeyError: 'memory' — Crew configuration mismatch with the installed version. Pin both crewai and crewai-tools to known-good versions.
.chroma directory appearing. Default ChromaDB memory backend creates one in cwd. Set memory=False or point to a configured path.
litellm.exceptions.RateLimitError — provider throttled. Add tenacity retry or use LiteLLM's built-in retry policy.
Tool not called. Tool descriptions are critical — agents call tools whose descriptions semantically match. Sharpen the wording.
asyncio event loop conflicts. CrewAI is sync; wrap in asyncio.to_thread(crew.kickoff) from async handlers.

Performance tuning

CrewAI runs are sequential by default — performance comes from doing fewer LLM calls, not faster ones.

Crisp task descriptions. Vague tasks cause iterative tool calls. Sharp expected_output shapes cut iterations.
max_iter bounds. Always set explicitly per agent. The default ceiling is generous; bring it down for cost-sensitive workloads.
Trim backstories. Backstories are prepended to every call; long backstories inflate every prompt.
Process.sequential over Process.hierarchical unless you genuinely need delegation. Hierarchical adds a manager LLM that costs both money and time.
Disable memory unless used. ChromaDB embedding on every observation is non-trivial latency.
Cache memory backend. Pre-warm embeddings; use persistent ChromaDB storage so re-runs hit cache.
Stream progress via step_callback for user-facing progress indicators — perceived latency improves even when wall-clock doesn't.
Parallel tasks via Flow. Crews are sequential; if you have independent crews, run them concurrently via Flow branches.

Production deployment

Crews are best deployed as background workers, not synchronous request handlers — runs can take minutes.

Queue + worker pattern. Receive task on a queue, run the crew, persist the result. Don't tie up HTTP threads.
Stateless workers. Pass all context as task inputs; avoid in-process state across runs.
Cap run time. Use max_iter per agent + a wall-clock timeout via signal or a supervisor — runaway crews can spend hundreds of dollars before you notice.
Memory persistence. If using memory, point the storage backend at durable disk (S3-backed FUSE, persistent volume) rather than container ephemeral disk.
Logging. Set verbose=True during development; switch to structured logging via a step_callback in production.
Container shape. crewai pulls heavy transitive deps (LangChain, ChromaDB, instructor) — expect a ~500 MB Python image with all extras.
Healthcheck. A minimal crew with a no-op task validates that imports work and provider keys are valid.

Security considerations

Tool calls are eval() for the model. Treat each tool in crewai-tools like a privileged operation. Sandbox code-interpreter tools (Docker) and allowlist file-system tools.
Prompt injection through tools. Web-scraping tools return attacker-controlled content. Sanitise or wrap in a system prompt that says tool output is data, not instructions.
Memory persistence is data persistence. If memory backs onto ChromaDB, treat the .chroma directory like the database it is — encrypt at rest, control access.
LiteLLM middleman. Errors surface as litellm.exceptions.* rather than provider-native types — catch broadly and log raw responses if debugging.
Secrets in backstories. Agent backstories are part of every prompt; never embed credentials.
Crew output validation. Models can hallucinate plausible but wrong outputs; for high-stakes use cases, route final output through a deterministic validator.
Multi-tenant isolation. Each tenant should run in its own process with scoped credentials; sharing a Python interpreter across tenants risks accidental cross-leakage.

Multi-provider patterns

CrewAI uses LiteLLM under the hood, so multi-provider is built-in.

Per-agent provider. Agent(..., llm="claude-sonnet-4-6") for one agent, llm="gpt-4o-mini" for another. LiteLLM resolves to the right provider.
Custom LiteLLM config. Set LITELLM_* env vars or use a config file to specify model aliases, fallback chains, and rate limits.
LiteLLM proxy. Point CrewAI at a self-hosted LiteLLM proxy for centralised quota and logging.
Tokenizer parity. LiteLLM handles tokenizer differences — use its litellm.token_counter(...) for cross-provider budgeting.

Evaluation & observability

CrewAI runs are sequential, multi-step, and easy to debug only with traces.

langsmith via LangChain callbacks. Every model call gets traced; the trace tree reflects agent → task → tool nesting.
agentops — CrewAI ships with built-in AgentOps observability via crewai[agentops]. Enable for trace UI focused on multi-agent workflows.
Per-task metrics. Track tokens, latency, and cost per task; identify the bottleneck task.
End-to-end metrics. Track success/failure outcomes for entire crew runs against a gold dataset.
Trajectory replay. Save the full crew.usage_metrics and message history for failed runs.

Ecosystem integrations

Layer	Integrations
Tools	`crewai-tools` ships dozens — search (Serper, Brave), web (Playwright, requests), files (PDF, CSV), code (Python REPL via interpreter tools), and DB connectors.
Providers	Anything `litellm` supports — OpenAI, Anthropic, Gemini, Mistral, Cohere, Bedrock, Ollama, vLLM.
Memory	ChromaDB (default), Pinecone, Weaviate, Qdrant via crewai-tools / direct integration.
Observability	`agentops`, `langsmith`, custom callbacks via `step_callback`.
Structured output	`instructor`-based output parsing on Task definitions (`output_pydantic`).
Hosted	"CrewAI Enterprise" runs the same framework managed; the open package is identical.
CLI	`crewai create`, `crewai run`, `crewai test` for project scaffolding.

When NOT to use this

Single-agent tasks. A direct provider SDK + a loop is simpler.
Deterministic, non-LLM workflows. Airflow, Prefect, or Dagster are better fits — CrewAI is overkill for cron-like pipelines.
You need low-level message-passing control. AutoGen autogen-core and LangGraph give more control over per-message routing.
High-frequency, latency-sensitive interactions. Crew runs are slow (minutes); for sub-second user-facing flows, this is the wrong tool.
You want optimised prompts. DSPy treats prompts as optimisable; CrewAI's prompts are templates baked into agent definitions.
Programmatic orchestration with hard guarantees. Reach for explicit graph frameworks (LangGraph) where state transitions are enforced rather than emergent.

crewai

What it is

Install

Versioning & Python support

Package metadata

Optional dependencies & extras

Alternatives

Common gotchas

Real-world recipes

Recipe: sequential research → write → review

Recipe: hierarchical crew with manager LLM

Recipe: YAML-driven crew (@CrewBase)

Recipe: custom tool integration

Recipe: flow-based stateful workflow

Cost & rate-limit management

Version migration guide

Troubleshooting common errors

Performance tuning

Production deployment

Security considerations

Multi-provider patterns

Evaluation & observability

Ecosystem integrations

When NOT to use this

See also

Recipe: YAML-driven crew (`@CrewBase`)