cheat sheet
crewai
Package-level reference for the crewai library on PyPI plus the crewai-tools companion — install, versioning, and multi-agent alternatives.
crewai
What it is
crewai is a Python framework for orchestrating role-playing LLM agents that collaborate on tasks. Its core abstractions are Agent (a role with a goal, backstory, and toolset), Task (a unit of work assigned to an agent), and Crew (a team of agents executing tasks in sequence or hierarchy). The framework also exposes a Flow API for stateful branching workflows.
CrewAI's appeal is the deliberately opinionated, YAML-friendly mental model — define agents and tasks in configuration files, then assemble a crew with a few lines of Python. It is the most-cited "easy on-ramp" multi-agent framework, with a heavy community presence in 2024–2025.
Install
pip install crewai
Output: installs the core framework
pip install "crewai[tools]"
Output: installs crewai-tools companion — file readers, web scrapers, search tools, etc.
uv add crewai crewai-tools
Output: dependencies resolved + added to pyproject.toml
poetry add crewai
Output: updated lockfile + virtualenv install
Versioning & Python support
- Current line is
0.x(as of late 2025), with frequent minor releases. Pre-1.0 — minor bumps occasionally rename agent / task / process options. Pin tight in production. - Python
3.10+on current releases. crewaiandcrewai-toolsare released independently and must move together — a freshcrewaipaired with an oldcrewai-toolsis the #1 source ofImportErroron first run.- Built on top of
langchain/litellmunder the hood for LLM-provider abstraction — pulls in a substantial dep tree. - The framework has moved from LangChain-coupled internals toward
litellm-based provider abstraction; check release notes when upgrading across minor versions.
Package metadata
- Maintainer: CrewAI Inc. (commercial sponsor) + open-source community
- Project home: github.com/crewAIInc/crewAI
- Tools repo: github.com/crewAIInc/crewAI-tools
- Docs: docs.crewai.com
- PyPI: pypi.org/project/crewai
- License: MIT
- Governance: commercial-backed open source; hosted "CrewAI Enterprise" runs the same framework as the free package
- First released: late 2023
- Downloads: millions per month
Optional dependencies & extras
| Extra | Purpose |
|---|---|
crewai[tools] | Pulls in the crewai-tools companion package |
crewai[embeddings] | Embedding-provider deps for crew memory |
crewai[agentops] | Built-in AgentOps observability hookup |
The companion crewai-tools package is what most projects actually want — it carries dozens of pre-built tools: FileReadTool, DirectoryReadTool, SerperDevTool (web search), WebsiteSearchTool, PDFSearchTool, CodeInterpreterTool, etc.
Heavy transitive deps include:
langchain-core,langchain-openai,langchain-community(for some tools)litellm— multi-provider LLM clientchromadb— default memory backendpydanticinstructor— structured output
Alternatives
| Package | Trade-off |
|---|---|
autogen-agentchat | Microsoft's multi-agent framework. Lower-level message passing; less opinionated. |
langgraph | Stateful graph-based agents from LangChain. Finer control over state; more verbose. |
llama-index (agents) | RAG-first agent stack; tightly integrated with indexes. |
swarm / openai-agents | OpenAI's lightweight + newer Agents SDK. Less opinionated; OpenAI-tied. |
dspy | Optimises prompts and pipelines automatically. Different paradigm — programmatic, not role-playing. |
| Hand-rolled loop | A while-loop + provider SDK is often enough for 1–2 agent flows. |
Common gotchas
crewaiandcrewai-toolsversion drift. Upgrade them together:pip install -U crewai crewai-tools. A stalecrewai-toolsagainst a currentcrewaiis the most common first-run import error.- Agent / Task / Crew abstraction takes practice. Confusing what belongs on the Agent (persistent role/backstory) vs the Task (one-shot description + expected output) is the most common modeling mistake. Tasks live for a single execution; agents persist across a crew run.
- YAML config vs Python config. CrewAI supports defining agents and tasks in YAML (
agents.yaml+tasks.yaml) loaded via@CrewBasedecorators, OR fully in Python. Mixing both in the same project is allowed but quickly becomes confusing — pick one. - Process modes change behaviour drastically.
Process.sequentialruns tasks in order;Process.hierarchicalintroduces a manager agent that delegates. Hierarchical needs an explicitmanager_llmconfig or it silently falls back to a default model. - Memory backend defaults to local ChromaDB. A
.chromadirectory appears in the working directory on first run. For ephemeral environments (Docker, CI), configurememory=Falseor a different backend. - Tool dependency surface is wide. Each tool in
crewai-toolscan pull in further deps (Playwright, Selenium, embedchain, PyPDF, etc.). Don'tpip install crewai-tools[all]casually — install per-tool extras instead. - Litellm middleman. Provider-specific failures (rate-limits, malformed function-calls) surface as
litellm.exceptions.*rather than the underlying SDK's exceptions. CatchExceptionbroadly or import fromlitellm.exceptions. - Async support is partial. Most of the framework is sync; mixing with FastAPI/asyncio requires
asyncio.to_thread()or running the crew in a separate worker.
Real-world recipes
CrewAI's value shines in the medium-complexity zone — 3-7 agent crews with clearly delineated roles. Below are the canonical shapes.
Recipe: sequential research → write → review
from crewai import Agent, Task, Crew, Process
from crewai_tools import SerperDevTool
search = SerperDevTool()
researcher = Agent(
role="Senior Research Analyst",
goal="Surface accurate, recent facts about a topic.",
backstory="A meticulous analyst who values primary sources.",
tools=[search],
verbose=True,
)
writer = Agent(
role="Technical Writer",
goal="Turn research notes into a clear briefing.",
backstory="A writer who explains technical topics without jargon.",
verbose=True,
)
reviewer = Agent(
role="Editorial Reviewer",
goal="Catch factual errors and unclear passages.",
backstory="An editor with an eye for sloppy claims.",
verbose=True,
)
t1 = Task(description="Research recent advances in vector databases.",
expected_output="A bulleted summary with citations.",
agent=researcher)
t2 = Task(description="Write a 400-word briefing using the research notes.",
expected_output="Polished prose, ready to publish.",
agent=writer, context=[t1])
t3 = Task(description="Review the briefing for errors and clarity issues.",
expected_output="Approved briefing or a list of fixes.",
agent=reviewer, context=[t2])
crew = Crew(agents=[researcher, writer, reviewer], tasks=[t1, t2, t3], process=Process.sequential)
print(crew.kickoff())
Output: task chain produces a reviewed briefing with citations.
Recipe: hierarchical crew with manager LLM
from crewai import Agent, Task, Crew, Process
from langchain_openai import ChatOpenAI
manager_llm = ChatOpenAI(model="gpt-4o-mini")
crew = Crew(
agents=[researcher, writer, reviewer],
tasks=[Task(description="Produce a publication-ready briefing about pgvector.",
expected_output="Reviewed, polished briefing.")],
process=Process.hierarchical,
manager_llm=manager_llm,
)
print(crew.kickoff())
Output: the manager LLM delegates sub-tasks; the explicit task list is replaced by a top-level goal.
Recipe: YAML-driven crew (@CrewBase)
from crewai import Agent, Crew, Process, Task
from crewai.project import CrewBase, agent, crew, task
@CrewBase
class ResearchCrew:
agents_config = "config/agents.yaml"
tasks_config = "config/tasks.yaml"
@agent
def researcher(self) -> Agent:
return Agent(config=self.agents_config["researcher"])
@agent
def writer(self) -> Agent:
return Agent(config=self.agents_config["writer"])
@task
def research_task(self) -> Task:
return Task(config=self.tasks_config["research"], agent=self.researcher())
@crew
def crew(self) -> Crew:
return Crew(agents=self.agents, tasks=self.tasks, process=Process.sequential)
Output: declarative config separated from Python wiring — useful for non-developers tweaking prompts.
Recipe: custom tool integration
from crewai.tools import BaseTool
from pydantic import BaseModel, Field
class SQLInput(BaseModel):
query: str = Field(description="A SELECT query")
class SQLTool(BaseTool):
name: str = "SQL Query"
description: str = "Run a SELECT against the analytics DB."
args_schema: type = SQLInput
def _run(self, query: str) -> str:
# connect, run, return text
return run_select(query)
analyst = Agent(role="Data Analyst", goal="Answer questions with SQL.", tools=[SQLTool()])
Output: typed tool with Pydantic args; agent now has SQL capability.
Recipe: flow-based stateful workflow
from crewai.flow.flow import Flow, listen, start
class ReleaseFlow(Flow):
@start()
def collect(self):
return {"commits": fetch_commits()}
@listen(collect)
def categorise(self, ctx):
return crew_categorise.kickoff(inputs=ctx)
@listen(categorise)
def write(self, categories):
return crew_write.kickoff(inputs={"categories": categories})
ReleaseFlow().kickoff()
Output: branching, stateful workflow combining multiple crews with explicit data flow.
Cost & rate-limit management
Like AutoGen, crews are cost-multipliers — every task is at least one LLM call, often several, and tools add their own.
- Set
max_iteron every agent. Each agent's per-task iteration ceiling. Without it, an agent can loop calling tools until the context window fills. - Smaller model for tool-heavy agents. A simple "fetch and summarise" role rarely needs a flagship.
Process.sequentialis cheaper thanhierarchical. Hierarchical adds a manager LLM and extra coordination calls.- Disable memory unless you need it.
Crew(memory=False)skips embedding/storage costs. - Cache embeddings. The default ChromaDB memory backend embeds every observation; reuse embeddings across runs by pointing at a persistent path.
- Cap context length. Long backstories and verbose tools inflate every call. Trim ruthlessly.
- LiteLLM-side cost tracking. Since CrewAI runs LLM calls through LiteLLM, point LiteLLM at a proxy that logs costs by team/project.
- Streaming saves perceived latency. With
step_callback, surface progress as agents work.
Version migration guide
CrewAI is pre-1.0 with frequent minor releases. The API has stabilised around Agent / Task / Crew / Process / Flow, but renames and option changes happen.
| Roughly | What tends to change |
|---|---|
Early 0.x | LangChain-coupled internals; llm=ChatOpenAI(...) was the way to configure models. |
Mid 0.x | Migration toward litellm for provider abstraction. llm="gpt-4o-mini" (string) became the canonical form. |
Recent 0.x | Flow API added for stateful workflows. YAML-driven @CrewBase decorators became the recommended pattern for larger projects. |
Migration discipline:
- Upgrade
crewaiandcrewai-toolstogether. Version skew between the two is the most common breakage source. - Replace LangChain LLM instantiation with strings.
Agent(..., llm="gpt-4o-mini")is the modern form;Agent(..., llm=ChatOpenAI(...))works but couples to LangChain. - Re-check tool import paths.
crewai-toolsreorganises namespaces between minors; pin a known-good version. memory=Trueschema may shift. If you persisted memory across versions, rebuild it after a major bump.- Hedge: exact symbol moves and signature tweaks across
0.xreleases are best confirmed against the project's CHANGELOG — pre-1.0 churn is the norm.
Troubleshooting common errors
ImportErroron first run after upgrade. Stalecrewai-tools.pip install -U crewai crewai-tools.Process.hierarchicalsilently uses a default model. Nomanager_llmwas supplied. Pass one explicitly.- Agent loops calling the same tool. No
max_iteron the agent, or the tool result is too vague. Setmax_iter=10and tighten tool descriptions. KeyError: 'memory'—Crewconfiguration mismatch with the installed version. Pin bothcrewaiandcrewai-toolsto known-good versions..chromadirectory appearing. Default ChromaDB memory backend creates one in cwd. Setmemory=Falseor point to a configured path.litellm.exceptions.RateLimitError— provider throttled. Addtenacityretry or use LiteLLM's built-in retry policy.- Tool not called. Tool descriptions are critical — agents call tools whose descriptions semantically match. Sharpen the wording.
asyncioevent loop conflicts. CrewAI is sync; wrap inasyncio.to_thread(crew.kickoff)from async handlers.
Performance tuning
CrewAI runs are sequential by default — performance comes from doing fewer LLM calls, not faster ones.
- Crisp task descriptions. Vague tasks cause iterative tool calls. Sharp
expected_outputshapes cut iterations. max_iterbounds. Always set explicitly per agent. The default ceiling is generous; bring it down for cost-sensitive workloads.- Trim backstories. Backstories are prepended to every call; long backstories inflate every prompt.
Process.sequentialoverProcess.hierarchicalunless you genuinely need delegation. Hierarchical adds a manager LLM that costs both money and time.- Disable memory unless used. ChromaDB embedding on every observation is non-trivial latency.
- Cache memory backend. Pre-warm embeddings; use persistent ChromaDB storage so re-runs hit cache.
- Stream progress via
step_callbackfor user-facing progress indicators — perceived latency improves even when wall-clock doesn't. - Parallel tasks via Flow. Crews are sequential; if you have independent crews, run them concurrently via
Flowbranches.
Production deployment
Crews are best deployed as background workers, not synchronous request handlers — runs can take minutes.
- Queue + worker pattern. Receive task on a queue, run the crew, persist the result. Don't tie up HTTP threads.
- Stateless workers. Pass all context as task inputs; avoid in-process state across runs.
- Cap run time. Use
max_iterper agent + a wall-clock timeout viasignalor a supervisor — runaway crews can spend hundreds of dollars before you notice. - Memory persistence. If using memory, point the storage backend at durable disk (S3-backed FUSE, persistent volume) rather than container ephemeral disk.
- Logging. Set
verbose=Trueduring development; switch to structured logging via astep_callbackin production. - Container shape.
crewaipulls heavy transitive deps (LangChain, ChromaDB, instructor) — expect a ~500 MB Python image with all extras. - Healthcheck. A minimal crew with a no-op task validates that imports work and provider keys are valid.
Security considerations
- Tool calls are
eval()for the model. Treat each tool increwai-toolslike a privileged operation. Sandbox code-interpreter tools (Docker) and allowlist file-system tools. - Prompt injection through tools. Web-scraping tools return attacker-controlled content. Sanitise or wrap in a system prompt that says tool output is data, not instructions.
- Memory persistence is data persistence. If memory backs onto ChromaDB, treat the
.chromadirectory like the database it is — encrypt at rest, control access. - LiteLLM middleman. Errors surface as
litellm.exceptions.*rather than provider-native types — catch broadly and log raw responses if debugging. - Secrets in backstories. Agent backstories are part of every prompt; never embed credentials.
- Crew output validation. Models can hallucinate plausible but wrong outputs; for high-stakes use cases, route final output through a deterministic validator.
- Multi-tenant isolation. Each tenant should run in its own process with scoped credentials; sharing a Python interpreter across tenants risks accidental cross-leakage.
Multi-provider patterns
CrewAI uses LiteLLM under the hood, so multi-provider is built-in.
- Per-agent provider.
Agent(..., llm="claude-sonnet-4-6")for one agent,llm="gpt-4o-mini"for another. LiteLLM resolves to the right provider. - Custom LiteLLM config. Set
LITELLM_*env vars or use a config file to specify model aliases, fallback chains, and rate limits. - LiteLLM proxy. Point CrewAI at a self-hosted LiteLLM proxy for centralised quota and logging.
- Tokenizer parity. LiteLLM handles tokenizer differences — use its
litellm.token_counter(...)for cross-provider budgeting.
Evaluation & observability
CrewAI runs are sequential, multi-step, and easy to debug only with traces.
langsmithvia LangChain callbacks. Every model call gets traced; the trace tree reflects agent → task → tool nesting.agentops— CrewAI ships with built-in AgentOps observability viacrewai[agentops]. Enable for trace UI focused on multi-agent workflows.- Per-task metrics. Track tokens, latency, and cost per task; identify the bottleneck task.
- End-to-end metrics. Track success/failure outcomes for entire crew runs against a gold dataset.
- Trajectory replay. Save the full
crew.usage_metricsand message history for failed runs.
Ecosystem integrations
| Layer | Integrations |
|---|---|
| Tools | crewai-tools ships dozens — search (Serper, Brave), web (Playwright, requests), files (PDF, CSV), code (Python REPL via interpreter tools), and DB connectors. |
| Providers | Anything litellm supports — OpenAI, Anthropic, Gemini, Mistral, Cohere, Bedrock, Ollama, vLLM. |
| Memory | ChromaDB (default), Pinecone, Weaviate, Qdrant via crewai-tools / direct integration. |
| Observability | agentops, langsmith, custom callbacks via step_callback. |
| Structured output | instructor-based output parsing on Task definitions (output_pydantic). |
| Hosted | "CrewAI Enterprise" runs the same framework managed; the open package is identical. |
| CLI | crewai create, crewai run, crewai test for project scaffolding. |
When NOT to use this
- Single-agent tasks. A direct provider SDK + a loop is simpler.
- Deterministic, non-LLM workflows. Airflow, Prefect, or Dagster are better fits — CrewAI is overkill for cron-like pipelines.
- You need low-level message-passing control. AutoGen
autogen-coreand LangGraph give more control over per-message routing. - High-frequency, latency-sensitive interactions. Crew runs are slow (minutes); for sub-second user-facing flows, this is the wrong tool.
- You want optimised prompts. DSPy treats prompts as optimisable; CrewAI's prompts are templates baked into agent definitions.
- Programmatic orchestration with hard guarantees. Reach for explicit graph frameworks (LangGraph) where state transitions are enforced rather than emergent.
See also
- AI: CrewAI — agents, tasks, crews, flows
- Packages: pip-autogen — alternative multi-agent framework
- Concept: agents — agent loop fundamentals
- Concept: api — agent + tool registration patterns