cheat sheet

pycparser

Package-level reference for pycparser on PyPI — install, AST walking, fake stdlib headers, and use as a cffi dependency.

#pip #package #binding #c
updated 05-31-2026

What it is

pycparser is a complete C99 parser written entirely in Python by Eli Bendersky. It produces a fully traversable AST from preprocessed C source, with no external dependencies (no libclang, no compiler invocation). It is the parsing engine inside cffi — every ffi.cdef(...) call ends up in pycparser — and it is occasionally used directly for static analysis, header munging, or code generation against C interfaces.

Reach for pycparser when you need to: read a C header into a structured form for code generation, transform a small C source, or build tooling that emits language bindings without depending on a Clang install. For full-fidelity parsing of modern C (C11/C17/C23 features, GCC extensions), libclang (clang.cindex) is the better choice.

Install

bash
pip install pycparser

Output: pip's download and install log, ending with a "Successfully installed pycparser-<version>" line; exits 0 on success.

bash
uv add pycparser

Output: resolved + added to pyproject.toml

bash
poetry add pycparser

Output: updated lockfile + virtualenv install

Pure-Python wheel — works on every Python platform without a build step. You usually do NOT install this directly; it arrives as a transitive dependency of cffi.

Versioning & Python support

  • Current line is the 2.21.x / 2.22.x series in 2025-26.
  • Supports Python 3.8+ on recent releases.
  • The grammar (PLY-generated) is stable; releases mostly add bug fixes and edge-case fidelity. Expect a slow release cadence — months to a year between bumps.
  • Stable API; pinning to a minor (e.g. pycparser>=2.21,<2.23) is plenty.

Package metadata

  • Maintainer: Eli Bendersky
  • Project home: github.com/eliben/pycparser
  • Docs: README + examples/ in the repo
  • PyPI: pypi.org/project/pycparser
  • License: BSD-3-Clause
  • Governance: single-maintainer; long-term stable
  • First released: 2008
  • Downloads: hundreds of millions per month (transitive of cffi)

Optional dependencies & extras

pycparser has no PyPI extras and no runtime dependencies. The historical dep on ply was vendored in; the package is fully self-contained.

For preprocessing real-world headers, you typically need an external preprocessor — either install cpp (from GCC) or run clang -E -P yourself. pycparser ships a directory of "fake" stdlib headers (utils/fake_libc_include/) that contain empty stubs for <stdio.h>, <stdlib.h>, etc., letting you parse code that includes the C standard library without needing real system headers.

Alternatives

Package | Trade-off
clang.cindex (libclang Python bindings) | Full C/C++ support, preprocessor included; needs libclang shared lib installed.
tree-sitter-c (tree-sitter Python bindings) | Fast incremental parser; less semantic info than libclang.
lark + a custom C grammar | Build-your-own; only justified for teaching or research.
Direct regex on headers | Fast but fragile; only OK for trivial bindings.
swig | Generates bindings directly; less flexible than parsing yourself.

Common gotchas

  1. pycparser does NOT preprocess. #include, #define, #ifdef must be expanded before parsing. Either let parse_file(..., use_cpp=True) invoke the preprocessor for you, or run cpp (or clang -E) yourself and pass the output to CParser().parse.
  2. It targets C99 plus a few C11 keywords. GCC extensions (__attribute__((...)), __asm__, statement expressions, nested functions) and most other C11+ features fail. Strip them with sed or skip the bindings.
  3. Anonymous structs/unions are partially supported. Nested anonymous structs work; field flattening behavior matches the C99 spec, not GCC's extension.
  4. typedef ordering matters. Forward typedefs must appear before use. The parser is single-pass.
  5. The PLY-generated tables (lextab.py, yacctab.py) are regenerated the first time a CParser is constructed if they are missing. Pre-generate them in production builds to avoid that first-use latency.
  6. Error messages are PLY-level, not Clang-level — they say "syntax error at line N" with limited context. Diff your preprocessed input against a known-good version to localize.
  7. CGenerator().visit(ast) round-trips, but the result has no comments (the preprocessor removes them before parsing) and none of the original whitespace. It is a source-to-source generator for syntactic round-trips only.
  8. No semantic analysis. pycparser doesn't know that int x = "string" is a type error; it parses and you walk the AST yourself.
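Gotchas 4 and 8 are easy to confirm on in-memory snippets. The sketch below assumes a pycparser 2.x install, where ParseError can be imported from pycparser.c_parser; the helper name parses is made up for illustration.

```python
from pycparser.c_parser import CParser, ParseError


def parses(src: str) -> bool:
    """Return True if pycparser accepts src (already-preprocessed C text)."""
    try:
        CParser().parse(src)
        return True
    except ParseError:
        return False


# Gotcha 8: there is no semantic analysis, so a C type error still parses.
type_error_ok = parses('int x = "string";')

# Gotcha 4: a typedef name must be declared before its first use.
typedef_first = parses("typedef int T; T a;")
typedef_late = parses("T a; typedef int T;")

print(type_error_ok, typedef_first, typedef_late)
```

A helper like this is also handy for bisecting a large preprocessed file: feed it growing prefixes of the top-level declarations until it flips to False.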

Real-world recipes

The recipes below cover parsing a header, walking the AST, regenerating source, faking the stdlib, and using it as a cffi helper — the patterns that come up when wrapping a C library.

Recipe 1 — Parse a C header to AST.

python
from pycparser import parse_file

ast = parse_file("api.h", use_cpp=True, cpp_args=["-E", "-Iinclude"])
ast.show()  # pretty-prints the tree

Output: a c_ast.FileAST whose ext list contains every top-level declaration. show() writes the structure to stdout for human inspection.

Recipe 2 — Walk the AST with NodeVisitor to enumerate function decls.

python
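from pycparser import c_ast, parse_file
from pycparser.c_generator import CGenerator

class FuncDeclVisitor(c_ast.NodeVisitor):
    def visit_FuncDecl(self, node):
        # The name sits on the innermost TypeDecl; pointer return types
        # wrap it in one or more PtrDecl nodes, so peel those first.
        inner = node.type
        while not isinstance(inner, c_ast.TypeDecl):
            inner = inner.type
        rtype = CGenerator().visit(node.type)  # e.g. "int" or "const char *"
        print(f"{rtype} {inner.declname}(...)")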

ast = parse_file("api.h", use_cpp=True)
FuncDeclVisitor().visit(ast)

Output: one line per function declaration — useful for generating wrappers.

Recipe 3 — Round-trip AST back to source.

python
from pycparser import parse_file
from pycparser.c_generator import CGenerator

ast = parse_file("input.c", use_cpp=True)
gen = CGenerator()
print(gen.visit(ast))

Output: reformatted C source — equivalent semantics, normalized whitespace. Useful for code-gen tools that emit modified headers.
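The normalization is easiest to see on a small in-memory input. This is a minimal sketch that skips the preprocessor and parses a string directly; the sample declarations are invented for illustration.

```python
from pycparser.c_generator import CGenerator
from pycparser.c_parser import CParser

# Irregular spacing on purpose; the input must already be comment-free.
src = "int   add(int a,int b);\nstruct  point{int x;int y;};\n"

ast = CParser().parse(src)
regenerated = CGenerator().visit(ast)
print(regenerated)

# Feeding the generator's own output back in should be a fixed point.
again = CGenerator().visit(CParser().parse(regenerated))
print(again == regenerated)
```

The second comparison is a cheap regression check for code-gen tools: if regenerated output does not survive a second round trip unchanged, the tool has emitted something the parser reads differently.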

Recipe 4 — Use the bundled fake stdlib so headers including <stdio.h> parse.

python
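import os
import pycparser

# utils/ lives in the pycparser source tree: this path resolves in a source
# checkout but usually not after a wheel install (vendor the directory then).
fake_libc = os.path.join(os.path.dirname(pycparser.__file__), "..", "utils", "fake_libc_include")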
ast = pycparser.parse_file(
    "api.h",
    use_cpp=True,
    cpp_args=["-E", f"-I{fake_libc}"],
)

Output: AST built from api.h with the real <stdio.h> etc. replaced by empty stubs — no need for system headers.
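Because the fake headers are not guaranteed to be on disk, it can help to resolve the directory defensively before building cpp_args. A minimal sketch, assuming a vendored copy may live at third_party/fake_libc_include (a hypothetical path in your own repo) and using the helper name find_fake_libc purely for illustration:

```python
import os

import pycparser


def find_fake_libc(vendored_dir="third_party/fake_libc_include"):
    """Return a usable fake_libc_include directory, or None if none exists.

    vendored_dir is a hypothetical location inside your own repository.
    """
    candidates = [
        # Present in a pycparser source checkout, usually absent in wheels.
        os.path.join(os.path.dirname(pycparser.__file__), "..", "utils", "fake_libc_include"),
        vendored_dir,
    ]
    for path in candidates:
        if os.path.isdir(path):
            return os.path.abspath(path)
    return None


fake_libc = find_fake_libc()
# -nostdinc keeps the real system headers out when the stubs are available.
cpp_args = ["-E", "-nostdinc", f"-I{fake_libc}"] if fake_libc else ["-E"]
print(cpp_args)
```

Failing loudly when the function returns None is often the better choice in a build script; the silent fallback here only keeps the sketch runnable anywhere.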

Recipe 5 — Generate cffi.cdef strings from a header.

python
from pycparser import c_ast, parse_file
from pycparser.c_generator import CGenerator

ast = parse_file("api.h", use_cpp=True, cpp_args=["-E"])
gen = CGenerator()

cdef_lines = []
for ext in ast.ext:
    if isinstance(ext, (c_ast.Decl, c_ast.Typedef)):
        cdef_lines.append(gen.visit(ext) + ";")
print("\n".join(cdef_lines))

Output: body suitable to pass to cffi.FFI().cdef(...). cffi parses that string with pycparser again, so everything emitted here must stay within what pycparser accepts.

Production deployment notes

  • Regenerate PLY tables at build time. Ship lextab.py and yacctab.py pregenerated so the first parse in a deployed container doesn't trigger a regenerate. Common pattern: construct a CParser() once in your wheel-build step.
  • Don't run preprocessing in production paths. Parse once during build, ship the resulting AST (pickle or codegen output), and load that at runtime.
  • Pin to a known minor. The grammar is stable but minor bug fixes occasionally change which corner-case constructs parse.
  • Vendor fake_libc_include into your repo if you depend on it — it's part of the pycparser sdist but not always present after wheel-only installs.

Performance tuning

  • Parsing is slow. pycparser is pure Python, so throughput is far below what a native parser such as libclang achieves; it is not designed for hot paths.
  • Cache parsed ASTs. Use pickle if you parse the same header repeatedly; the AST is a simple object graph.
  • Compose narrowly. Don't parse_file an entire SDK; parse just the headers you wrap.
  • Use CParser().parse(text) for in-memory snippets to avoid the file I/O and the preprocessor subprocess.
  • Precompile PLY tables. First parse is much slower than subsequent ones — warm the parser at build/CI time.
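The caching advice above can be sketched with pickle and a content hash. This is one possible layout, not a pycparser feature: the function name parse_cached and the cache file naming are assumptions made for illustration.

```python
import hashlib
import os
import pickle
import tempfile

from pycparser.c_generator import CGenerator
from pycparser.c_parser import CParser


def parse_cached(src: str, cache_dir: str):
    """Parse preprocessed C text, reusing a pickled AST when one exists."""
    key = hashlib.sha256(src.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, f"{key}.ast.pickle")
    if os.path.exists(path):
        with open(path, "rb") as fh:
            return pickle.load(fh)
    ast = CParser().parse(src)
    with open(path, "wb") as fh:
        pickle.dump(ast, fh, protocol=pickle.HIGHEST_PROTOCOL)
    return ast


cache_dir = tempfile.mkdtemp()
src = "typedef unsigned int u32; u32 checksum(const char *buf, u32 len);"
first = parse_cached(src, cache_dir)   # parses and writes the cache entry
second = parse_cached(src, cache_dir)  # loaded back from the pickle
same_source = CGenerator().visit(first) == CGenerator().visit(second)
print(same_source)
```

Two cautions apply: only unpickle files your own build produced, since pickle can execute code on load, and consider adding the pycparser version to the cache key so an upgrade invalidates old entries.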

Version migration guide

  • 2.20 → 2.21 — minor grammar fixes for _Alignas, _Generic (C11 keywords parsed but not fully analyzed).
  • 2.21 → 2.22 — Python 2 and pre-3.8 Python 3 support dropped; packaging modernization (PEP 517 build); no API change.
  • No removals expected in foreseeable releases.

python
from pycparser.c_parser import CParser

src1 = "typedef int T; T a;"
src2 = "int T;"  # here T is an ordinary variable name

# Old (pre-2.20): manual lexer reset between parses
parser = CParser()
parser.parse(src1)
parser.parse(src2)  # could leak state

# Current: construct a fresh parser per parse for safety
CParser().parse(src1)
CParser().parse(src2)

Output: clean state per parse; matters when parsing files that redefine typedef names.

Security considerations

  • C source can be arbitrarily large. Don't parse_file a user-uploaded header without size limits — pycparser's memory use is roughly proportional to AST node count.
  • The preprocessor is your sandbox boundary. cpp_args=["-Iuser-controlled-dir"] lets a caller pull headers from anywhere. Validate include paths.
  • No code execution. pycparser only parses; it does NOT evaluate #define macros that look executable.
  • No CVE history of note. As a pure-Python parser with no I/O of its own, attack surface is minimal — but the preprocessor you pair it with (cpp/clang) is its own concern.
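The first two bullets translate into a pair of small guards. This is a sketch under stated assumptions: the 1 MB ceiling, the /srv/project root, and both function names are hypothetical, and a real service would also want a timeout around the preprocessor call.

```python
from pathlib import Path

from pycparser.c_parser import CParser

MAX_SOURCE_BYTES = 1_000_000  # hypothetical ceiling; tune for your service


def parse_untrusted(src: str, max_bytes: int = MAX_SOURCE_BYTES):
    """Refuse oversized input before handing it to the parser."""
    size = len(src.encode("utf-8"))
    if size > max_bytes:
        raise ValueError(f"source is {size} bytes; limit is {max_bytes}")
    return CParser().parse(src)


def safe_include_args(requested, allowed_root):
    """Turn caller-supplied include dirs into -I flags, rejecting escapes."""
    root = Path(allowed_root).resolve()
    flags = []
    for item in requested:
        candidate = (root / item).resolve()
        if candidate != root and root not in candidate.parents:
            raise ValueError(f"include path escapes {root}: {item}")
        flags.append(f"-I{candidate}")
    return flags


ast = parse_untrusted("int ok;")
flags = safe_include_args(["include", "vendor/zlib"], "/srv/project")
print(len(ast.ext), flags)
```

Resolving each candidate before the containment check is what defeats "../" segments and absolute paths; comparing raw strings would not.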

Testing & CI integration

  • Snapshot the AST output (ast.show(buf=...) writing into an io.StringIO) for representative headers; diff in CI to catch unintended grammar changes.
  • Run pycparser tests against real headers from libraries you ship bindings for; this catches preprocessor surprises early.
  • Use pytest.mark.parametrize over a set of known-good and known-failing snippets.

python
import pycparser
import pytest

def test_parse_basic_decl():
    src = "int x; int add(int a, int b);"
    ast = pycparser.CParser().parse(src)
    assert len(ast.ext) == 2

def test_unknown_attr_rejected():
    src = "int __attribute__((unused)) x;"  # GCC extension
    with pytest.raises(pycparser.plyparser.ParseError):
        pycparser.CParser().parse(src)

Output: both tests pass; documents the C99-strict behavior.
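For the snapshot approach, show() accepts a buf argument, so the tree can be captured as text instead of printed. A minimal sketch; the helper name ast_snapshot is invented here, and where the snapshot file lives is up to your test layout.

```python
import io

from pycparser.c_parser import CParser


def ast_snapshot(src: str) -> str:
    """Render the AST of preprocessed C text exactly as show() would print it."""
    buf = io.StringIO()
    CParser().parse(src).show(buf=buf)
    return buf.getvalue()


snapshot = ast_snapshot("int x; int add(int a, int b);")
print(snapshot)
```

In CI, compare the returned string with a committed file and fail on any difference. Pin the pycparser version alongside the snapshots, because the text layout of show() can change between releases.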

Ecosystem integrations

  • cffi — pycparser is the C parser inside cdef().
  • pyclibrary — older library binding system that uses pycparser.
  • gccxml / castxml — alternative XML-based dumpers if you need full C++.
  • autopxd2 — generates Cython .pxd files from C headers using pycparser.
  • Custom code generators — anyone writing a "headers → bindings" tool tends to reach for pycparser unless they need C++ or libclang's semantic analysis.

Compatibility matrix

Python | pycparser line | Notes
3.7 | 2.21 and earlier | Dropped in 2.22.
3.8 | 2.21+ | Current floor.
3.9 | 2.21+ | Supported.
3.10 | 2.21+ | Supported.
3.11 | 2.21+ | Supported.
3.12 | 2.21+ | Supported.
3.13 | 2.22+ | Supported.

Troubleshooting common errors

Error / Symptom | Likely cause | Fix
ParseError: ... before: '__attribute__' | GCC extension in header | Strip with sed -e 's/__attribute__((.*))//' before parsing, or use libclang.
Could not find cpp | No system preprocessor | Install GCC or use cpp_path="clang", cpp_args=["-E"].
fatal error: stdio.h: No such file during preprocess | Real stdlib headers missing | Use the bundled fake_libc_include/.
Slow first import | PLY table regeneration | Precompile tables at build time.
ParseError on a typedef'd name later used as a type | Single-pass parsing limitation | Ensure typedefs precede uses.
Crash with very deep nesting | Python recursion limit | sys.setrecursionlimit(10000).
Empty AST returned | parse_file got an empty preprocessed string | Check cpp_args; missing include path drops everything.

When NOT to use this

  • You need C++ parsing. Use libclang.
  • You need full C11/C17/C23 fidelity: _Generic arms with complex deduction, atomics with full type info. Use libclang.
  • You don't actually need an AST. For "extract function signatures from a header", regex + sanity checks may be enough.
  • You need preprocessor-aware analysis (e.g., reading #define constants symbolically). pycparser only sees post-preprocessor source.
  • Performance-critical hot path. Parsing big SDK headers takes seconds; cache the result.

Worked example: extract function signatures into JSON

A common direct use case — read a header, emit a JSON catalogue of function signatures for downstream tooling (binding generators, doc generators, lint rules).

Step 1 — minimal NodeVisitor that collects FuncDecls.

python
import json
from pycparser import c_ast, parse_file
from pycparser.c_generator import CGenerator

class FuncCatalog(c_ast.NodeVisitor):
    def __init__(self):
        self.entries = []
        self.gen = CGenerator()

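    def visit_FuncDecl(self, node):
        # Pointer-returning functions wrap the name in PtrDecl nodes, so
        # walk down to the innermost TypeDecl before reading declname.
        inner = node.type
        while not isinstance(inner, c_ast.TypeDecl):
            inner = inner.type
        name = inner.declname
        if not name:
            return
        return_type = self.gen.visit(node.type)  # keeps "const" and "*"
        params = []
        if node.args:
            for p in node.args.params:
                if isinstance(p, c_ast.EllipsisParam):  # variadic "..."
                    params.append({"type": "...", "name": None})
                elif isinstance(p, c_ast.Typename):
                    params.append({"type": self.gen.visit(p), "name": None})
                else:
                    params.append({"type": self.gen.visit(p.type), "name": p.name})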
        self.entries.append({"name": name, "return": return_type, "params": params})

Output: a list of dicts describing each function — copy-paste into a doc generator or binding scaffold.

Step 2 — run the visitor against a preprocessed header.

python
ast = parse_file(
    "api.h", use_cpp=True,
    cpp_args=["-E", "-Iinclude", "-Ifake_libc"]
)
cat = FuncCatalog()
cat.visit(ast)
print(json.dumps(cat.entries, indent=2)[:500])

Output: JSON like [{"name": "do_thing", "return": "int", "params": [{"type": "const char *", "name": "input"}]}, ...].

Step 3 — feed the JSON to a binding generator.

python
TEMPLATE = '''def {name}({args}):  # C return type: {ret}
    return lib.{name}({args})
'''
for f in cat.entries:
    # "(void)" and variadic "..." entries are markers, not real arguments.
    real = [p for p in f["params"] if p["type"] not in ("void", "...")]
    args = ", ".join(p["name"] or f"a{i}" for i, p in enumerate(real))
    print(TEMPLATE.format(name=f["name"], args=args, ret=f["return"]))

Output: Python wrapper stubs you can then refine. Real production codegen would map C types to Python types (char * → bytes, int * → ctypes pointer), but this skeleton is the starting point.
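The type mapping mentioned above could start as a lookup over the C type strings the catalogue already holds. This is only a sketch: the two tables and the function name py_annotation are assumptions for illustration, and the coverage is deliberately tiny.

```python
# Hypothetical tables; extend them for the types your header actually uses.
PY_SCALARS = {"int": "int", "unsigned int": "int", "long": "int",
              "size_t": "int", "float": "float", "double": "float",
              "void": "None"}
CTYPES_NAMES = {"int": "c_int", "unsigned int": "c_uint", "long": "c_long",
                "size_t": "c_size_t", "float": "c_float", "double": "c_double"}


def py_annotation(c_type: str) -> str:
    """Map a C type string such as 'const char *' to a Python annotation."""
    tokens = [t for t in c_type.replace("*", " * ").split() if t != "const"]
    stars = tokens.count("*")
    base = " ".join(t for t in tokens if t != "*")
    if stars == 0:
        return PY_SCALARS.get(base, "object")
    if stars == 1 and base == "char":
        return "bytes"
    if stars == 1 and base in CTYPES_NAMES:
        return f"ctypes.POINTER(ctypes.{CTYPES_NAMES[base]})"
    return "ctypes.c_void_p"  # fallback for anything not modelled here


for c_type in ("int", "const char *", "int *", "struct foo **"):
    print(c_type, "->", py_annotation(c_type))
```

Working on strings keeps the sketch short; a sturdier generator would map from the AST nodes themselves so that typedefs and struct types can be resolved instead of falling through to the void-pointer default.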

Step 4 — verify on a known header.

python
# Run against /usr/include/zlib.h (after preprocess) and check zlibVersion appears.
assert any(e["name"] == "zlibVersion" for e in cat.entries)

Output: sanity check — a known function survives the parse + visit pipeline.

FAQ

Q: How do I parse a string instead of a file? A: from pycparser import CParser; CParser().parse(src_string) — bypasses the preprocessor. You must hand-strip #include and #define first.

Q: What's the difference between c_ast.Typename and c_ast.Decl? A: Typename represents a type without a name (e.g. const char * in a parameter where the name was omitted). Decl carries both a type and a name. Visit both in code that walks function parameters.
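A two-parameter prototype with one unnamed parameter shows both node types side by side. This is a minimal illustration on an invented declaration:

```python
from pycparser import c_ast
from pycparser.c_parser import CParser

ast = CParser().parse("int f(const char *, int n);")
func_decl = ast.ext[0].type  # the top-level Decl wraps a FuncDecl
unnamed, named = func_decl.args.params

kinds = [type(p).__name__ for p in (unnamed, named)]
names = [p.name for p in (unnamed, named)]
print(kinds, names)
```

Both classes expose a name attribute, but it is None on the Typename, which is why parameter-walking code needs a fallback such as a positional placeholder.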

Q: How do I handle __attribute__((...)) from GCC? A: Strip it before parsing. Common one-liner:

bash
sed -E 's/__attribute__\(\(.*\)\)//g' input.h > stripped.h

Or define the keyword away during preprocessing by passing -D'__attribute__(x)=' to cpp; a pure-Python preprocessor such as pcpp should accept the same definition.
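The greedy .* in the sed one-liner removes everything between the first and the last attribute on a line. If that bites, a paren-matching pass in Python is a small step up. This is a sketch with an invented function name; it does not understand string literals or comments.

```python
import re

_ATTR = re.compile(r"__attribute__\s*\(")


def strip_gcc_attributes(text: str) -> str:
    """Remove each __attribute__((...)) by matching parentheses, not greedily."""
    out = []
    pos = 0
    for match in _ATTR.finditer(text):
        if match.start() < pos:
            continue  # lies inside an attribute that was already removed
        depth = 1
        i = match.end()
        while i < len(text) and depth:
            if text[i] == "(":
                depth += 1
            elif text[i] == ")":
                depth -= 1
            i += 1
        out.append(text[pos:match.start()])
        pos = i
    out.append(text[pos:])
    return "".join(out)


line = "int a __attribute__((unused)); void f(void) __attribute__((noreturn)); int b;"
cleaned = strip_gcc_attributes(line)
print(cleaned)
```

On this input the declaration of f survives, whereas the greedy sed pattern would have deleted it along with the two attributes.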

Q: Can pycparser parse C++? A: No. C++ has features (templates, namespaces, classes, references) far outside C99. Use libclang for C++.

Q: Why does my parse fail on a header that compiles fine? A: Almost always one of: (1) GCC extension you didn't strip, (2) macro that didn't expand because preprocessor wasn't run, (3) typedef-before-use rule violated by an #include chain. Run cpp -E manually and inspect the output.

Q: How big a header can pycparser handle? A: Tens of thousands of lines is fine; memory and time scale roughly linearly. Beyond that, parse in segments or switch to libclang.

AST node types you'll actually use

A small but high-value subset of c_ast nodes covers most real-world traversal needs. Cheat-sheet:

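  • c_ast.FuncDecl — function declarator; node.args is the parameter list and node.type the return type. The name is node.type.declname when node.type is a TypeDecl; for pointer return types, follow the PtrDecl chain down to the TypeDecl first.
  • c_ast.FuncDef — function definition (header + body); contains a decl (a Decl whose type is the FuncDecl) and a body (compound statement).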
  • c_ast.Decl — generic declaration; carries name, type, init, quals (qualifiers like const).
  • c_ast.Typedef — type aliases.
  • c_ast.Struct / c_ast.Union / c_ast.Enum — composite types. decls/values lists their members.
  • c_ast.PtrDecl — pointer wrapper; chain through node.type to reach the pointed-at type.
  • c_ast.ArrayDecl — array; node.dim is the size expression.
  • c_ast.IdentifierType — primitive types; node.names is a list like ["unsigned", "int"].
  • c_ast.TypeDecl — typed name wrapper around primitives/structs.

The pattern for traversal: subclass c_ast.NodeVisitor, define visit_<Class> methods for each type you care about, and call self.generic_visit(node) to continue walking children. Use pycparser.c_generator.CGenerator().visit(node) to turn any subtree back into source — useful for emitting wrappers from the AST.

python
# Quick sanity check: dump every Decl's type as C source.
from pycparser.c_generator import CGenerator
gen = CGenerator()
for decl in ast.ext:
    if hasattr(decl, "type"):
        print(decl.name, "→", gen.visit(decl.type))

Output: globalCount → int, compute → int (*)(const char *), etc. — fast way to introspect what a header declares.
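The visit_<Class> plus generic_visit pattern described above looks like this in practice. The sketch collects struct fields from an invented snippet; the class name StructFields is hypothetical.

```python
from pycparser import c_ast
from pycparser.c_parser import CParser


class StructFields(c_ast.NodeVisitor):
    """Collect {struct name: [field names]} for every struct with a body."""

    def __init__(self):
        self.fields = {}

    def visit_Struct(self, node):
        if node.decls is not None:  # None means a bare reference, no body
            self.fields[node.name] = [d.name for d in node.decls]
        # Keep walking so structs nested inside this one are visited too.
        self.generic_visit(node)


src = """
struct outer {
    int id;
    struct inner { int a; int b; } payload;
};
"""
visitor = StructFields()
visitor.visit(CParser().parse(src))
print(visitor.fields)
```

Dropping the generic_visit call would stop the walk at outer and silently miss inner, which is the most common mistake with custom visitors.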

Real-world example: parsing a Linux kernel uAPI header

The kernel's uapi headers are an interesting stress test — large, depend on subtle preprocessor expansion, and use GCC-isms. The recipe that usually works:

bash
# 1. Strip GCC extensions
sed -E 's/__attribute__\s*\(\(.*\)\)//g; s/__extension__//g' \
    /usr/include/linux/limits.h > /tmp/limits.h

# 2. Preprocess with the fake stdlib
cpp -nostdinc -E -I/tmp -I"$(python -c 'import pycparser, os; print(os.path.join(os.path.dirname(pycparser.__file__), "..", "utils", "fake_libc_include"))')" \
    /tmp/limits.h > /tmp/limits.i

# 3. Parse the result
python -c 'import pycparser; pycparser.CParser().parse(open("/tmp/limits.i").read()); print("ok")'

Output: ok, plus a usable AST. The general lesson: pycparser is a pure parser; the prep work to feed it valid C99 is where 90% of the engineering happens. For kernel headers in particular, prefer libclang when you can.

See also