cheat sheet

crewai

Package-level reference for the crewai library on PyPI plus the crewai-tools companion — install, versioning, and multi-agent alternatives.


What it is

crewai is a Python framework for orchestrating role-playing LLM agents that collaborate on tasks. Its core abstractions are Agent (a role with a goal, backstory, and toolset), Task (a unit of work assigned to an agent), and Crew (a team of agents executing tasks in sequence or hierarchy). The framework also exposes a Flow API for stateful branching workflows.

CrewAI's appeal is its deliberately opinionated, YAML-friendly mental model: define agents and tasks in configuration files, then assemble a crew with a few lines of Python. It is one of the most frequently cited "easy on-ramp" multi-agent frameworks, with a large community presence through 2024–2025.

Install

bash
pip install crewai

Output: installs the core framework

bash
pip install "crewai[tools]"

Output: installs crewai-tools companion — file readers, web scrapers, search tools, etc.

bash
uv add crewai crewai-tools

Output: dependencies resolved + added to pyproject.toml

bash
poetry add crewai

Output: updated lockfile + virtualenv install

Versioning & Python support

  • Current line is 0.x (as of late 2025), with frequent minor releases. Pre-1.0 — minor bumps occasionally rename agent / task / process options. Pin tight in production.
  • Python 3.10+ on current releases.
  • crewai and crewai-tools are released independently and must move together — a fresh crewai paired with an old crewai-tools is the #1 source of ImportError on first run.
  • Routes LLM calls through litellm for provider abstraction, which pulls in a substantial dependency tree.
  • Earlier releases were coupled to LangChain internals. The framework has since moved to litellm-based provider abstraction, so check release notes when upgrading across minor versions.

Package metadata

  • Maintainer: CrewAI Inc. (commercial sponsor) + open-source community
  • Project home: github.com/crewAIInc/crewAI
  • Tools repo: github.com/crewAIInc/crewAI-tools
  • Docs: docs.crewai.com
  • PyPI: pypi.org/project/crewai
  • License: MIT
  • Governance: commercial-backed open source; hosted "CrewAI Enterprise" runs the same framework as the free package
  • First released: late 2023
  • Downloads: millions per month

Optional dependencies & extras

| Extra | Purpose |
| --- | --- |
| crewai[tools] | Pulls in the crewai-tools companion package |
| crewai[embeddings] | Embedding-provider deps for crew memory |
| crewai[agentops] | Built-in AgentOps observability hookup |

The companion crewai-tools package is what most projects actually want — it carries dozens of pre-built tools: FileReadTool, DirectoryReadTool, SerperDevTool (web search), WebsiteSearchTool, PDFSearchTool, CodeInterpreterTool, etc.

Heavy transitive deps include:

  • langchain-core, langchain-openai, langchain-community (older releases, and some tools)
  • litellm — multi-provider LLM client
  • chromadb — default memory backend
  • pydantic
  • instructor — structured output

Alternatives

| Package | Trade-off |
| --- | --- |
| autogen-agentchat | Microsoft's multi-agent framework. Lower-level message passing; less opinionated. |
| langgraph | Stateful graph-based agents from LangChain. Finer control over state; more verbose. |
| llama-index (agents) | RAG-first agent stack; tightly integrated with indexes. |
| swarm / openai-agents | OpenAI's lightweight Swarm experiment and its successor, the Agents SDK. Less opinionated; OpenAI-centric. |
| dspy | Optimises prompts and pipelines automatically. Different paradigm: programmatic, not role-playing. |
| Hand-rolled loop | A while-loop + provider SDK is often enough for 1–2 agent flows. |

Common gotchas

  1. crewai and crewai-tools version drift. Upgrade them together: pip install -U crewai crewai-tools. A stale crewai-tools against a current crewai is the most common first-run import error.
  2. Agent / Task / Crew abstraction takes practice. Confusing what belongs on the Agent (persistent role/backstory) vs the Task (one-shot description + expected output) is the most common modeling mistake. Tasks live for a single execution; agents persist across a crew run.
  3. YAML config vs Python config. CrewAI supports defining agents and tasks in YAML (agents.yaml + tasks.yaml) loaded via @CrewBase decorators, OR fully in Python. Mixing both in the same project is allowed but quickly becomes confusing — pick one.
  4. Process modes change behaviour drastically. Process.sequential runs tasks in order; Process.hierarchical introduces a manager agent that delegates. Hierarchical needs an explicit manager_llm (or a custom manager_agent). Depending on the version, omitting it either raises a validation error or silently falls back to a default model.
  5. Memory backend defaults to local ChromaDB. A .chroma directory appears in the working directory on first run. For ephemeral environments (Docker, CI), configure memory=False or a different backend.
  6. Tool dependency surface is wide. Each tool in crewai-tools can pull in further deps (Playwright, Selenium, embedchain, PyPDF, etc.). Don't pip install crewai-tools[all] casually — install per-tool extras instead.
  7. LiteLLM middleman. Provider-specific failures (rate limits, malformed function calls) surface as litellm.exceptions.* rather than as the underlying SDK's exceptions. Either catch Exception broadly or import the specific classes from litellm.exceptions.
  8. Async support is partial. Most of the framework is sync; mixing with FastAPI/asyncio requires asyncio.to_thread() or running the crew in a separate worker.
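For gotcha 8, here is a minimal sketch of calling a blocking crew from async code with asyncio.to_thread (Python 3.9+). blocking_kickoff is a hypothetical stand-in for crew.kickoff(inputs=...), so the example runs without API keys.

```python
import asyncio
import time


def blocking_kickoff(topic: str) -> str:
    """Hypothetical stand-in for a synchronous crew.kickoff(inputs={"topic": topic})."""
    time.sleep(0.1)  # simulate a slow, blocking multi-step run
    return f"briefing about {topic}"


async def handle_request(topic: str) -> str:
    # Offload the blocking call to a worker thread so the event loop keeps serving requests.
    return await asyncio.to_thread(blocking_kickoff, topic)


async def main() -> list[str]:
    # Both runs overlap because each occupies its own thread rather than the event loop.
    return list(await asyncio.gather(handle_request("pgvector"), handle_request("qdrant")))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The same await asyncio.to_thread(...) line works inside a FastAPI async endpoint. For runs that last minutes, a separate worker is usually the better fit, as the gotcha notes.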

Real-world recipes

CrewAI is at its best in the medium-complexity zone: crews of 3–7 agents with clearly delineated roles. The recipes below cover the common shapes.

Recipe: sequential research → write → review

python
from crewai import Agent, Task, Crew, Process
from crewai_tools import SerperDevTool

search = SerperDevTool()

researcher = Agent(
    role="Senior Research Analyst",
    goal="Surface accurate, recent facts about a topic.",
    backstory="A meticulous analyst who values primary sources.",
    tools=[search],
    verbose=True,
)
writer = Agent(
    role="Technical Writer",
    goal="Turn research notes into a clear briefing.",
    backstory="A writer who explains technical topics without jargon.",
    verbose=True,
)
reviewer = Agent(
    role="Editorial Reviewer",
    goal="Catch factual errors and unclear passages.",
    backstory="An editor with an eye for sloppy claims.",
    verbose=True,
)

t1 = Task(description="Research recent advances in vector databases.",
          expected_output="A bulleted summary with citations.",
          agent=researcher)
t2 = Task(description="Write a 400-word briefing using the research notes.",
          expected_output="Polished prose, ready to publish.",
          agent=writer, context=[t1])
t3 = Task(description="Review the briefing for errors and clarity issues.",
          expected_output="Approved briefing or a list of fixes.",
          agent=reviewer, context=[t2])

crew = Crew(agents=[researcher, writer, reviewer], tasks=[t1, t2, t3], process=Process.sequential)
print(crew.kickoff())

Output: task chain produces a reviewed briefing with citations.
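The context=[t1] and context=[t2] arguments are what turn three independent tasks into a chain: each task receives the outputs of the tasks it lists. The framework-free toy below models only that data flow. ToyTask and the lambda "agents" are hypothetical, and CrewAI's real prompt assembly does far more than pass strings along.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ToyTask:
    """Hypothetical stand-in for a Task: a description, a worker, and upstream context."""
    description: str
    run: Callable[[str, list[str]], str]  # (description, context outputs) -> output
    context: list["ToyTask"] = field(default_factory=list)
    output: Optional[str] = None


def run_sequential(tasks: list[ToyTask]) -> str:
    """Run tasks in list order; each one receives the outputs of its context tasks."""
    for task in tasks:
        upstream = [t.output for t in task.context]
        if any(out is None for out in upstream):
            raise ValueError(f"a context task of {task.description!r} has not run yet")
        task.output = task.run(task.description, upstream)
    return tasks[-1].output


research = ToyTask("research", lambda desc, ctx: "notes")
write = ToyTask("write", lambda desc, ctx: f"draft from {ctx[0]}", context=[research])
review = ToyTask("review", lambda desc, ctx: f"approved: {ctx[0]}", context=[write])
print(run_sequential([research, write, review]))  # approved: draft from notes
```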

Recipe: hierarchical crew with manager LLM

python
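from crewai import Crew, Process, Task

# researcher, writer and reviewer are the agents defined in the previous recipe.
crew = Crew(
    agents=[researcher, writer, reviewer],
    # In hierarchical mode the manager assigns work, so the task has no agent=.
    tasks=[Task(description="Produce a publication-ready briefing about pgvector.",
                expected_output="Reviewed, polished briefing.")],
    process=Process.hierarchical,
    # Model string resolved via litellm on recent releases; older releases took a
    # LangChain chat model here. Always set it explicitly.
    manager_llm="gpt-4o-mini",
)
print(crew.kickoff())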

Output: the manager LLM delegates sub-tasks; the explicit task list is replaced by a top-level goal.

Recipe: YAML-driven crew (@CrewBase)

python
from crewai import Agent, Crew, Process, Task
from crewai.project import CrewBase, agent, crew, task

@CrewBase
class ResearchCrew:
    agents_config = "config/agents.yaml"
    tasks_config  = "config/tasks.yaml"

    @agent
    def researcher(self) -> Agent:
        return Agent(config=self.agents_config["researcher"])

    @agent
    def writer(self) -> Agent:
        return Agent(config=self.agents_config["writer"])

    @task
    def research_task(self) -> Task:
        return Task(config=self.tasks_config["research"], agent=self.researcher())

    @crew
    def crew(self) -> Crew:
        return Crew(agents=self.agents, tasks=self.tasks, process=Process.sequential)

Output: declarative config separated from Python wiring — useful for non-developers tweaking prompts.

Recipe: custom tool integration

python
from typing import Type

from crewai import Agent
from crewai.tools import BaseTool
from pydantic import BaseModel, Field


class SQLInput(BaseModel):
    query: str = Field(description="A SELECT query")


class SQLTool(BaseTool):
    name: str = "SQL Query"
    description: str = "Run a SELECT against the analytics DB."
    args_schema: Type[BaseModel] = SQLInput

    def _run(self, query: str) -> str:
        # run_select is a placeholder helper: connect, run the SELECT, return rows as text.
        return run_select(query)


analyst = Agent(
    role="Data Analyst",
    goal="Answer questions with SQL.",
    backstory="An analyst who checks every number against the source tables.",  # backstory is required
    tools=[SQLTool()],
)

Output: typed tool with Pydantic args; agent now has SQL capability.
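The recipe leaves run_select undefined. One possible shape for it assumes a SQLite database (the recipe's "analytics DB" could be anything). In this version, the helper enforces the "SELECT only" promise from the tool description instead of trusting the model to honour it. is_select_only is a coarse keyword heuristic, not a SQL parser. make_run_select is a hypothetical factory that keeps the tool's run_select(query) call shape unchanged.

```python
import sqlite3
from typing import Callable

# Write/admin keywords refused even inside a statement that starts with SELECT.
_FORBIDDEN = {"insert", "update", "delete", "drop", "alter", "create", "attach", "pragma"}


def is_select_only(query: str) -> bool:
    """Coarse heuristic: one statement that starts with SELECT and has no write keywords."""
    q = query.strip().rstrip(";").lower()
    if ";" in q or not q.startswith("select"):
        return False
    words = set(q.replace("(", " ").replace(")", " ").replace(",", " ").split())
    return not (words & _FORBIDDEN)


def make_run_select(conn: sqlite3.Connection) -> Callable[[str], str]:
    """Bind a connection and return a run_select(query) helper matching the recipe's call."""
    def run_select(query: str) -> str:
        if not is_select_only(query):
            return "Refused: only single SELECT statements are allowed."
        cur = conn.execute(query)
        header = "\t".join(col[0] for col in cur.description)
        rows = ["\t".join(str(value) for value in row) for row in cur.fetchall()]
        return "\n".join([header, *rows])
    return run_select


# Demo against an in-memory database (hypothetical table and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signups (day TEXT, n INTEGER)")
conn.executemany("INSERT INTO signups VALUES (?, ?)", [("mon", 3), ("tue", 5)])
run_select = make_run_select(conn)
print(run_select("SELECT day, n FROM signups ORDER BY day"))
```

Returning a refusal string rather than raising lets the agent read the reason and retry with a valid query. A read-only connection, such as sqlite3.connect("file:analytics.db?mode=ro", uri=True), or a read-only database role is a stronger guarantee than any keyword filter.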

Recipe: flow-based stateful workflow

python
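from crewai.flow.flow import Flow, listen, start

# fetch_commits, crew_categorise and crew_write are placeholders: a helper that
# returns recent commits and two crews defined elsewhere.

class ReleaseFlow(Flow):
    @start()
    def collect(self):
        return {"commits": fetch_commits()}

    @listen(collect)
    def categorise(self, ctx):
        # A listener receives the return value of the method it listens to.
        return crew_categorise.kickoff(inputs=ctx).raw  # pass plain text downstream

    @listen(categorise)
    def write(self, categories):
        return crew_write.kickoff(inputs={"categories": categories}).raw

ReleaseFlow().kickoff()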

Output: a stateful workflow that chains multiple crews, with explicit data flow between steps.

Cost & rate-limit management

As with AutoGen, crews multiply cost. Every task is at least one LLM call, often several, and tools can add calls of their own.

  • Set max_iter on every agent. It is each agent's per-task iteration ceiling. The default is generous, so without a tighter bound an agent can keep calling tools until the context window fills.
  • Smaller model for tool-heavy agents. A simple "fetch and summarise" role rarely needs a flagship.
  • Process.sequential is cheaper than hierarchical. Hierarchical adds a manager LLM and extra coordination calls.
  • Disable memory unless you need it. Crew(memory=False) skips embedding/storage costs.
  • Cache embeddings. The default ChromaDB memory backend embeds every observation; reuse embeddings across runs by pointing at a persistent path.
  • Cap context length. Long backstories and verbose tools inflate every call. Trim ruthlessly.
  • LiteLLM-side cost tracking. Since CrewAI runs LLM calls through LiteLLM, point LiteLLM at a proxy that logs costs by team/project.
  • Streaming reduces perceived latency. Use step_callback to surface progress as agents work.
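Because every task costs at least one call, a hard spend cap is a cheap safety net against runaway crews. Below is a minimal sketch of a budget guard. The per-million-token prices and token counts are inputs you supply, for example from provider pricing and your own usage tracking (such as a LiteLLM proxy). CrewAI does not hand you these values in this shape.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised once cumulative estimated spend passes the cap."""


@dataclass
class BudgetGuard:
    cap_usd: float
    usd_per_m_input: float   # price per 1M input tokens (supply your provider's rate)
    usd_per_m_output: float  # price per 1M output tokens
    spent_usd: float = 0.0

    def record(self, input_tokens: int, output_tokens: int) -> float:
        """Add one call's estimated cost and return the running total."""
        cost = (input_tokens * self.usd_per_m_input
                + output_tokens * self.usd_per_m_output) / 1_000_000
        self.spent_usd += cost
        if self.spent_usd > self.cap_usd:
            raise BudgetExceeded(f"spent ${self.spent_usd:.4f}, cap is ${self.cap_usd:.2f}")
        return self.spent_usd


# Placeholder prices for illustration only.
guard = BudgetGuard(cap_usd=5.00, usd_per_m_input=0.15, usd_per_m_output=0.60)
guard.record(input_tokens=3_000, output_tokens=800)
```

Call record() wherever you observe token usage, and treat BudgetExceeded as the signal to stop dispatching further work.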

Version migration guide

CrewAI is pre-1.0 with frequent minor releases. The API has stabilised around Agent / Task / Crew / Process / Flow, but renames and option changes happen.

| Roughly | What tends to change |
| --- | --- |
| Early 0.x | LangChain-coupled internals; llm=ChatOpenAI(...) was the way to configure models. |
| Mid 0.x | Migration toward litellm for provider abstraction. llm="gpt-4o-mini" (string) became the canonical form. |
| Recent 0.x | Flow API added for stateful workflows. YAML-driven @CrewBase decorators became the recommended pattern for larger projects. |

Migration discipline:

  1. Upgrade crewai and crewai-tools together. Version skew between the two is the most common breakage source.
  2. Replace LangChain LLM instantiation with strings. Agent(..., llm="gpt-4o-mini") is the modern form; Agent(..., llm=ChatOpenAI(...)) works but couples to LangChain.
  3. Re-check tool import paths. crewai-tools reorganises namespaces between minors; pin a known-good version.
  4. memory=True schema may shift. If you persist memory across versions, rebuild it after any significant version bump; pre-1.0, minor bumps can be breaking.
  5. Confirm against the CHANGELOG. Exact symbol moves and signature tweaks across 0.x releases are best verified there; pre-1.0 churn is the norm.

Troubleshooting common errors

  • ImportError on first run after upgrade. Stale crewai-tools. pip install -U crewai crewai-tools.
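  • Process.hierarchical fails or silently uses a default model. No manager_llm (or manager_agent) was supplied. Depending on the version, this either raises a validation error or falls back to a default. Pass one explicitly.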
  • Agent loops calling the same tool. No max_iter on the agent, or the tool result is too vague. Set max_iter=10 and tighten tool descriptions.
  • KeyError: 'memory'. Crew configuration mismatch with the installed version. Pin both crewai and crewai-tools to known-good versions.
  • .chroma directory appearing. Default ChromaDB memory backend creates one in cwd. Set memory=False or point to a configured path.
  • litellm.exceptions.RateLimitError — provider throttled. Add tenacity retry or use LiteLLM's built-in retry policy.
  • Tool not called. Tool descriptions are critical — agents call tools whose descriptions semantically match. Sharpen the wording.
  • asyncio event loop conflicts. CrewAI is sync; wrap in asyncio.to_thread(crew.kickoff) from async handlers.
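For the RateLimitError case, here is a dependency-free sketch of capped exponential backoff with full jitter. This is roughly what a tenacity retry policy gives you. RateLimited is a hypothetical stand-in; in real code the except clause would name litellm.exceptions.RateLimitError. The sleep parameter is injectable only so the example can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for litellm.exceptions.RateLimitError."""


def with_backoff(
    fn: Callable[[], T],
    *,
    attempts: int = 5,
    base_s: float = 1.0,
    max_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimited with capped exponential backoff and full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == attempts - 1:
                raise  # out of retries: surface the original error
            delay = min(max_s, base_s * 2**attempt)
            sleep(random.uniform(0, delay))  # jitter spreads out retries from parallel runs
    raise AssertionError("unreachable")


calls = {"n": 0}


def flaky_kickoff() -> str:
    """Hypothetical run that is throttled twice before succeeding."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimited("429 Too Many Requests")
    return "done"


delays: list[float] = []
print(with_backoff(flaky_kickoff, base_s=0.01, sleep=delays.append))  # records delays instead of sleeping
```

Retrying a whole kickoff repeats every call that already succeeded, so wrap the smallest unit you control.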

Performance tuning

CrewAI runs are sequential by default — performance comes from doing fewer LLM calls, not faster ones.

  • Crisp task descriptions. Vague tasks cause iterative tool calls. Sharp expected_output shapes cut iterations.
  • max_iter bounds. Always set explicitly per agent. The default ceiling is generous; bring it down for cost-sensitive workloads.
  • Trim backstories. Backstories are prepended to every call; long backstories inflate every prompt.
  • Process.sequential over Process.hierarchical unless you genuinely need delegation. Hierarchical adds a manager LLM that costs both money and time.
  • Disable memory unless used. ChromaDB embedding on every observation is non-trivial latency.
  • Cache memory backend. Pre-warm embeddings; use persistent ChromaDB storage so re-runs hit cache.
  • Stream progress via step_callback for user-facing progress indicators — perceived latency improves even when wall-clock doesn't.
  • Parallel tasks via Flow. Crews are sequential; if you have independent crews, run them concurrently via Flow branches.
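Here is a sketch of a progress callback in the spirit of step_callback, which can be set on an Agent or a Crew. The object passed to the callback differs across CrewAI versions, so this sketch relies only on str(step). The loop at the bottom is a hypothetical stand-in for steps emitted during a real run.

```python
import time
from typing import Any, Callable


def make_progress_callback(emit: Callable[[str], None]) -> Callable[[Any], None]:
    """Build a step callback that emits 'step N (+T s): summary' lines."""
    start = time.monotonic()
    count = 0

    def on_step(step: Any) -> None:
        nonlocal count
        count += 1
        summary = str(step).replace("\n", " ")[:80]  # keep progress lines short
        emit(f"step {count} (+{time.monotonic() - start:.1f}s): {summary}")

    return on_step


lines: list[str] = []
on_step = make_progress_callback(lines.append)
for fake_step in ["searching the web", "drafting the briefing"]:  # hypothetical steps
    on_step(fake_step)
print("\n".join(lines))
```

In a real crew the returned function would be passed as step_callback=on_step, with emit writing to a log, a websocket, or a progress bar.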

Production deployment

Crews are best deployed as background workers, not synchronous request handlers — runs can take minutes.

  • Queue + worker pattern. Receive task on a queue, run the crew, persist the result. Don't tie up HTTP threads.
  • Stateless workers. Pass all context as task inputs; avoid in-process state across runs.
  • Cap run time. Use max_iter per agent + a wall-clock timeout via signal or a supervisor — runaway crews can spend hundreds of dollars before you notice.
  • Memory persistence. If using memory, point the storage backend at durable disk (S3-backed FUSE, persistent volume) rather than container ephemeral disk.
  • Logging. Set verbose=True during development; switch to structured logging via a step_callback in production.
  • Container shape. crewai pulls heavy transitive deps (litellm, ChromaDB, instructor); expect a ~500 MB Python image with all extras.
  • Healthcheck. A minimal crew with a no-op task validates that imports work and provider keys are valid.
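Below is a stdlib-only sketch of the queue + worker pattern with a wall-clock deadline per job. demo_kickoff is a hypothetical stand-in for a function that builds and runs a crew from the job's inputs. One limitation applies: future.result(timeout=...) stops waiting but cannot kill a Python thread. A timed-out run therefore keeps executing (and spending) in the background. A hard stop needs a process-level supervisor, as the list notes.

```python
import concurrent.futures as cf
import queue
import time
from typing import Callable

Kickoff = Callable[[dict], str]  # hypothetical: builds and runs a crew from job inputs


def run_with_deadline(kickoff: Kickoff, inputs: dict, timeout_s: float) -> dict:
    """Run one job, reporting a timeout instead of blocking past the deadline."""
    pool = cf.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(kickoff, inputs)
    try:
        return {"status": "ok", "result": future.result(timeout=timeout_s)}
    except cf.TimeoutError:
        return {"status": "timeout", "result": None}
    except Exception as exc:  # persist failures instead of crashing the worker
        return {"status": "error", "result": repr(exc)}
    finally:
        pool.shutdown(wait=False)  # stop waiting; an abandoned thread still runs to completion


def drain(jobs: "queue.Queue[dict | None]", kickoff: Kickoff, timeout_s: float) -> dict[str, dict]:
    """Worker loop: process jobs until a None sentinel, storing each result by job id."""
    results: dict[str, dict] = {}
    while (job := jobs.get()) is not None:
        results[job["id"]] = run_with_deadline(kickoff, job["inputs"], timeout_s)
    return results


def demo_kickoff(inputs: dict) -> str:
    """Hypothetical stand-in for a crew run; 'seconds' simulates a slow run."""
    time.sleep(inputs.get("seconds", 0))
    return f"report on {inputs['topic']}"


if __name__ == "__main__":
    jobs: "queue.Queue[dict | None]" = queue.Queue()
    jobs.put({"id": "a", "inputs": {"topic": "pgvector"}})
    jobs.put({"id": "b", "inputs": {"topic": "qdrant", "seconds": 1.0}})
    jobs.put(None)  # sentinel: tells the worker to stop
    print(drain(jobs, demo_kickoff, timeout_s=0.3))
```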

Security considerations

  • Tool calls are eval() for the model. Treat each tool in crewai-tools like a privileged operation. Sandbox code-interpreter tools (Docker) and allowlist file-system tools.
  • Prompt injection through tools. Web-scraping tools return attacker-controlled content. Sanitise or wrap in a system prompt that says tool output is data, not instructions.
  • Memory persistence is data persistence. If memory backs onto ChromaDB, treat the .chroma directory like the database it is — encrypt at rest, control access.
  • LiteLLM middleman. Errors surface as litellm.exceptions.* rather than provider-native types — catch broadly and log raw responses if debugging.
  • Secrets in backstories. Agent backstories are part of every prompt; never embed credentials.
  • Crew output validation. Models can hallucinate plausible but wrong outputs; for high-stakes use cases, route final output through a deterministic validator.
  • Multi-tenant isolation. Each tenant should run in its own process with scoped credentials; sharing a Python interpreter across tenants risks accidental cross-leakage.
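The first two points can be backed by two small guards, sketched here with the standard library. is_allowed checks a requested path against an allowlist of roots after resolving symlinks and .. segments. wrap_untrusted fences tool output so the system prompt can declare fenced text to be data only. Both names and the <<< >>> marker convention are hypothetical. Delimiting reduces prompt-injection risk; it does not eliminate it.

```python
from pathlib import Path


def is_allowed(path: str, allowed_roots: list[str]) -> bool:
    """True if path, after resolving symlinks and '..', lies inside one of the allowed roots."""
    target = Path(path).resolve()
    return any(target.is_relative_to(Path(root).resolve()) for root in allowed_roots)


def wrap_untrusted(tool_name: str, content: str) -> str:
    """Delimit tool output so the system prompt can treat the fenced text as data only."""
    # Strip the marker sequences so scraped content cannot close the fence early.
    cleaned = content.replace("<<<", "").replace(">>>", "")
    return f"<<<{tool_name} output (untrusted data)\n{cleaned}\n>>>"
```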

Multi-provider patterns

CrewAI uses LiteLLM under the hood, so multi-provider is built-in.

  • Per-agent provider. Agent(..., llm="claude-sonnet-4-6") for one agent, llm="gpt-4o-mini" for another. LiteLLM resolves to the right provider.
  • Custom LiteLLM config. Set LITELLM_* env vars or use a config file to specify model aliases, fallback chains, and rate limits.
  • LiteLLM proxy. Point CrewAI at a self-hosted LiteLLM proxy for centralised quota and logging.
  • Tokenizer parity. LiteLLM handles tokenizer differences — use its litellm.token_counter(...) for cross-provider budgeting.
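LiteLLM can express fallback chains in its own configuration. The framework-free sketch below only illustrates the behaviour: try models in order, and move on when one fails. ProviderError and fake_call are hypothetical stand-ins, and the model names are placeholders, not real identifiers.

```python
from typing import Callable


class ProviderError(Exception):
    """Hypothetical stand-in for a provider failure surfaced through LiteLLM."""


def call_with_fallback(models: list[str], call_model: Callable[[str], str]) -> tuple[str, str]:
    """Try each model in order; return (model_used, reply) from the first that succeeds."""
    errors: list[str] = []
    for model in models:
        try:
            return model, call_model(model)
        except ProviderError as exc:
            errors.append(f"{model}: {exc}")  # record why this model was skipped
    raise ProviderError("all models failed; " + "; ".join(errors))


def fake_call(model: str) -> str:
    """Hypothetical provider call in which the primary model is overloaded."""
    if model == "primary-model":
        raise ProviderError("overloaded")
    return f"reply from {model}"


print(call_with_fallback(["primary-model", "backup-model"], fake_call))
```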

Evaluation & observability

CrewAI runs are sequential and multi-step, which makes them hard to debug without traces.

  • langsmith via LangChain callbacks. Every model call gets traced; the trace tree reflects agent → task → tool nesting.
  • agentops — CrewAI ships with built-in AgentOps observability via crewai[agentops]. Enable for trace UI focused on multi-agent workflows.
  • Per-task metrics. Track tokens, latency, and cost per task; identify the bottleneck task.
  • End-to-end metrics. Track success/failure outcomes for entire crew runs against a gold dataset.
  • Trajectory replay. Save the full crew.usage_metrics and message history for failed runs.
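Here is a sketch of per-task aggregation for spotting the bottleneck task. CallRecord is a hypothetical schema you would populate from your own traces or callbacks. It is not the structure of crew.usage_metrics.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    """One LLM call, tagged with the task that made it (hypothetical schema)."""
    task: str
    tokens: int
    latency_s: float
    cost_usd: float


def per_task_totals(records: list[CallRecord]) -> dict[str, dict[str, float]]:
    """Sum call count, tokens, latency and cost per task across a run."""
    totals: dict[str, dict[str, float]] = defaultdict(
        lambda: {"calls": 0, "tokens": 0, "latency_s": 0.0, "cost_usd": 0.0}
    )
    for r in records:
        t = totals[r.task]
        t["calls"] += 1
        t["tokens"] += r.tokens
        t["latency_s"] += r.latency_s
        t["cost_usd"] += r.cost_usd
    return dict(totals)


def bottleneck(totals: dict[str, dict[str, float]], metric: str = "latency_s") -> str:
    """Name of the task with the largest total for the chosen metric."""
    return max(totals, key=lambda task: totals[task][metric])


records = [
    CallRecord("research", 1200, 4.0, 0.002),
    CallRecord("research", 900, 3.5, 0.0015),
    CallRecord("write", 1500, 2.0, 0.006),
]
totals = per_task_totals(records)
print(bottleneck(totals), bottleneck(totals, "cost_usd"))  # research write
```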

Ecosystem integrations

| Layer | Integrations |
| --- | --- |
| Tools | crewai-tools ships dozens: search (Serper, Brave), web (Playwright, requests), files (PDF, CSV), code (Python REPL via interpreter tools), and DB connectors. |
| Providers | Anything litellm supports: OpenAI, Anthropic, Gemini, Mistral, Cohere, Bedrock, Ollama, vLLM. |
| Memory | ChromaDB (default), Pinecone, Weaviate, Qdrant via crewai-tools / direct integration. |
| Observability | agentops, langsmith, custom callbacks via step_callback. |
| Structured output | instructor-based output parsing on Task definitions (output_pydantic). |
| Hosted | "CrewAI Enterprise" runs the same framework as a managed service; the open package is identical. |
| CLI | crewai create, crewai run, crewai test for scaffolding, running, and testing projects. |

When NOT to use this

  • Single-agent tasks. A direct provider SDK + a loop is simpler.
  • Deterministic, non-LLM workflows. Airflow, Prefect, or Dagster are better fits — CrewAI is overkill for cron-like pipelines.
  • You need low-level message-passing control. AutoGen's autogen-core and LangGraph give more control over per-message routing.
  • High-frequency, latency-sensitive interactions. Crew runs are slow (minutes); for sub-second user-facing flows, this is the wrong tool.
  • You want optimised prompts. DSPy treats prompts as optimisable; CrewAI's prompts are templates baked into agent definitions.
  • Programmatic orchestration with hard guarantees. Reach for explicit graph frameworks (LangGraph) where state transitions are enforced rather than emergent.

See also