cheat sheet
pycparser
Package-level reference for pycparser on PyPI — install, AST walking, fake stdlib headers, and use as a cffi dependency.
What it is
pycparser is a complete C99 parser written entirely in Python by Eli Bendersky. It produces a fully traversable AST from preprocessed C source, with no external dependencies (no libclang, no compiler invocation). It is the parsing engine inside cffi — every ffi.cdef(...) call ends up in pycparser — and it is occasionally used directly for static analysis, header munging, or code generation against C interfaces.
Reach for pycparser when you need to: read a C header into a structured form for code generation, transform a small C source, or build tooling that emits language bindings without depending on a Clang install. For full-fidelity parsing of modern C (C11/C17/C23 features, GCC extensions), libclang (clang.cindex) is the better choice.
Install
pip install pycparser
Output: pip's install log, ending in Successfully installed pycparser-<version> on a fresh install; exits 0 on success
uv add pycparser
Output: resolved + added to pyproject.toml
poetry add pycparser
Output: updated lockfile + virtualenv install
Pure-Python wheel — works on every Python platform without a build step. You usually do NOT install this directly; it arrives as a transitive dependency of cffi.
Versioning & Python support
- Current line is the 2.21.x/2.22.x series in 2025-26.
- Supports Python 3.8+ on recent releases.
- The grammar (PLY-generated) is stable; releases mostly add bug fixes and edge-case fidelity. Expect a slow release cadence: months to a year between bumps.
- Stable API; pinning to a minor range (e.g. pycparser>=2.21,<2.23) is plenty.
Package metadata
- Maintainer: Eli Bendersky
- Project home: github.com/eliben/pycparser
- Docs: README + examples/ in the repo
- PyPI: pypi.org/project/pycparser
- License: BSD-3-Clause
- Governance: single-maintainer; long-term stable
- First released: 2010
- Downloads: hundreds of millions per month (transitive of cffi)
Optional dependencies & extras
pycparser has no PyPI extras and no runtime dependencies. The historical dep on ply was vendored in; the package is fully self-contained.
For preprocessing real-world headers, you typically need an external preprocessor: either install cpp (from GCC) or run clang -E -P yourself. The pycparser source tree includes a directory of "fake" stdlib headers (utils/fake_libc_include/) with minimal stubs for <stdio.h>, <stdlib.h>, etc., letting you parse code that includes the C standard library without needing real system headers. The directory is not part of the installed package, so copy it from the repo or sdist.
Alternatives
| Package | Trade-off |
|---|---|
| clang.cindex (libclang Python bindings) | Full C/C++ support, preprocessor included; needs libclang shared lib installed. |
| tree-sitter-c (tree-sitter Python bindings) | Fast incremental parser; less semantic info than libclang. |
| lark + a custom C grammar | Build-your-own; only justified for teaching or research. |
| Direct regex on headers | Fast but fragile; only OK for trivial bindings. |
| swig | Generates bindings directly; less flexible than parsing yourself. |
Common gotchas
- pycparser does NOT preprocess. #include, #define, and #ifdef must be expanded before parsing. Either pass use_cpp=True to parse_file, or run cpp (or clang -E) yourself and parse the output.
- It targets C99 plus a subset of C11. GCC extensions (__attribute__((...)), __asm__, statement expressions, nested functions) and many C11+ features fail. Strip them during preprocessing or skip those declarations.
- Anonymous structs/unions are partially supported. Nested anonymous structs parse, but pycparser does no field flattening: members stay nested in the AST, and resolving them is up to your code.
- typedef ordering matters. A typedef must appear before its first use, because the parser is single-pass and needs to know which identifiers are type names.
- The PLY-generated tables (lextab.py, yacctab.py) are regenerated on first use if missing. Pre-generate them in production builds to avoid that first-use latency.
- Error messages are PLY-level, not Clang-level: they report a position and the offending token with limited context. Diff your preprocessed input against a known-good version to localize.
- CGenerator().visit(ast) regenerates source but loses comments and the original whitespace. Treat it as a syntactic round-trip, not a formatting-preserving one.
- No semantic analysis. pycparser doesn't know that int x = "string" is a type error; it parses and you walk the AST yourself.
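Two of the gotchas above (no semantic analysis, typedef-before-use) can be observed in a few lines. This is a minimal sketch that assumes only that pycparser is importable; the C snippets are invented for illustration and are already preprocessor-free.

```python
from pycparser.c_parser import CParser

# No semantic analysis: assigning a string literal to an int still parses.
ast = CParser().parse('int x = "string";')
init = ast.ext[0].init
init_kind = (type(init).__name__, init.type)

# Typedef ordering: u32 is accepted as a type name only because its
# typedef appears earlier in the same translation unit.
ast2 = CParser().parse("typedef unsigned int u32; u32 counter;")
counter_type = ast2.ext[1].type.type.names

print(init_kind)
print(counter_type)
```

Swapping the two declarations in the second snippet makes the parse fail, which is the usual symptom of an #include chain delivering typedefs too late.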
Real-world recipes
The recipes below cover parsing a header, walking the AST, regenerating source, faking the stdlib, and using it as a cffi helper — the patterns that come up when wrapping a C library.
Recipe 1 — Parse a C header to AST.
from pycparser import parse_file
ast = parse_file("api.h", use_cpp=True, cpp_args=["-E", "-Iinclude"])
ast.show() # pretty-prints the tree
Output: a c_ast.FileAST whose ext list contains every top-level declaration. show() writes the structure to stdout for human inspection.
Recipe 2 — Walk the AST with NodeVisitor to enumerate function decls.
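from pycparser import c_ast, parse_file
from pycparser.c_generator import CGenerator

class FuncDeclVisitor(c_ast.NodeVisitor):
    gen = CGenerator()

    def visit_FuncDecl(self, node):
        # node.type is the return type; pointer returns wrap the TypeDecl in PtrDecl.
        inner = node.type
        while isinstance(inner, c_ast.PtrDecl):
            inner = inner.type
        name = getattr(inner, "declname", None)
        print(f"{self.gen.visit(node.type)} {name}(...)")

ast = parse_file("api.h", use_cpp=True)
FuncDeclVisitor().visit(ast)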
Output: one line per function declaration — useful for generating wrappers.
Recipe 3 — Round-trip AST back to source.
from pycparser import parse_file
from pycparser.c_generator import CGenerator
ast = parse_file("input.c", use_cpp=True)
gen = CGenerator()
print(gen.visit(ast))
Output: reformatted C source — equivalent semantics, normalized whitespace. Useful for code-gen tools that emit modified headers.
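To see the normalization without needing a preprocessor, the same round-trip can be run on an in-memory snippet. The sketch below is an illustration, not part of the original recipe: the input string is made up, and the second pass simply checks that the generator's own output parses back to identical text.

```python
from pycparser.c_parser import CParser
from pycparser.c_generator import CGenerator

messy = "int   add(int a,int b){return a+b;}"

# First pass: parse the cramped source and regenerate it.
regenerated = CGenerator().visit(CParser().parse(messy))

# Second pass: the generator's output should parse back to the same text.
second_pass = CGenerator().visit(CParser().parse(regenerated))

print(regenerated)
```

Comparing regenerated text rather than raw input is a convenient way to diff two headers while ignoring formatting.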
Recipe 4 — Use the bundled fake stdlib so headers including <stdio.h> parse.
import os
import pycparser

# fake_libc_include lives in the pycparser source tree (utils/), not in the
# installed package. "vendor/" is a placeholder for wherever you copied it.
fake_libc = os.path.join("vendor", "fake_libc_include")
ast = pycparser.parse_file(
    "api.h",
    use_cpp=True,
    cpp_args=["-E", f"-I{fake_libc}"],
)
Output: AST built from api.h with the real <stdio.h> etc. replaced by minimal stubs, so no system headers are needed.
Recipe 5 — Generate cffi.cdef strings from a header.
from pycparser import c_ast, parse_file
from pycparser.c_generator import CGenerator

ast = parse_file("api.h", use_cpp=True, cpp_args=["-E"])
gen = CGenerator()
cdef_lines = []
for ext in ast.ext:
    # Keep declarations and typedefs; function bodies (FuncDef) are skipped.
    if isinstance(ext, (c_ast.Decl, c_ast.Typedef)):
        cdef_lines.append(gen.visit(ext) + ";")
print("\n".join(cdef_lines))
Output: body suitable to pass to cffi.FFI().cdef(...). cffi parses the cdef() string with pycparser but does not run a C preprocessor, so this flattening step is what gets a real header into a form it accepts.
Production deployment notes
- Regenerate PLY tables at build time. Ship lextab.py and yacctab.py precompiled so the first parse in a deployed container doesn't trigger a regenerate. Common pattern: construct a CParser() once in your wheel-build step.
- Don't run preprocessing in production paths. Parse once during build, ship the resulting AST (pickle or codegen output), and load that at runtime.
- Pin to a known minor. The grammar is stable but minor bug fixes occasionally change which corner-case constructs parse.
- Vendor fake_libc_include into your repo if you depend on it; it's part of the pycparser sdist but not always present after wheel-only installs.
Performance tuning
- Parsing is slow. It is a pure-Python parser, so expect throughput far below a compiled parser such as libclang; it is not designed for hot paths.
- Cache parsed ASTs. Use pickle if you parse the same header repeatedly; the AST is a simple object graph.
- Compose narrowly. Don't parse_file an entire SDK; parse just the headers you wrap.
- Use CParser().parse(text) for in-memory snippets to avoid the file-IO cost.
- Precompile PLY tables. First parse is much slower than subsequent ones, so warm the parser at build/CI time.
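The caching advice can be sketched as a build step that pickles the AST and a runtime step that loads it. This is an assumption-laden illustration: the C source and the api.h filename are invented, and it relies on c_ast nodes being ordinary slotted Python objects, which is why a pickle protocol of 2 or higher is used.

```python
import pickle
from pycparser.c_parser import CParser
from pycparser.c_generator import CGenerator

source = (
    "typedef struct point { int x; int y; } point_t; "
    "int dist(const point_t *a, const point_t *b);"
)

# Build step: parse once and serialize the AST.
# Classes that use __slots__ (as c_ast nodes do) need pickle protocol 2+.
ast = CParser().parse(source, filename="api.h")
blob = pickle.dumps(ast, protocol=pickle.HIGHEST_PROTOCOL)

# Runtime: load the AST without constructing a parser at all.
restored = pickle.loads(blob)

# Regenerated source is a cheap equality check between the two trees.
original_src = CGenerator().visit(ast)
restored_src = CGenerator().visit(restored)
```

Pickles are tied to the pycparser version that wrote them, so rebuild the cache whenever the pin changes, and never unpickle data from an untrusted source.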
Version migration guide
- 2.18 → 2.19: Python 2 support dropped.
- 2.20 → 2.21: minor grammar fixes for _Alignas, _Generic (C11 keywords parsed but not fully analyzed).
- 2.21 → 2.22: packaging modernization (PEP 517 build); no API change.
- No removals expected in foreseeable releases.
from pycparser.c_parser import CParser

# Placeholder inputs: the second reuses T as an ordinary identifier.
src1, src2 = "typedef int T; T a;", "int T;"

# Old (pre-2.20): manual lexer reset between parses
parser = CParser()
parser.parse(src1)
parser.parse(src2)  # could leak state

# Current: construct a fresh parser per parse for safety
CParser().parse(src1)
CParser().parse(src2)
Output: clean state per parse; matters when parsing files that redefine typedef names.
Security considerations
- C source can be arbitrarily large. Don't parse_file a user-uploaded header without size limits; pycparser's memory use is roughly proportional to AST node count.
- The preprocessor is your sandbox boundary. cpp_args=["-Iuser-controlled-dir"] lets a caller pull headers from anywhere. Validate include paths.
- No code execution. pycparser only parses; it never compiles or runs the C it reads, and macros are expanded by the external preprocessor, not by pycparser.
- No CVE history of note. As a pure-Python parser, its attack surface is minimal, but the preprocessor you pair it with (cpp/clang) is its own concern.
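The size-limit advice can be enforced with a small wrapper in front of the parser. The function name parse_untrusted and the one-megabyte cap are hypothetical choices for this sketch; it takes already-preprocessed text so that no include path is ever derived from user input.

```python
from pycparser.c_parser import CParser

MAX_SOURCE_BYTES = 1_000_000  # hypothetical cap; tune for your service


def parse_untrusted(text, max_bytes=MAX_SOURCE_BYTES):
    """Parse preprocessed C text, refusing oversized input before parsing."""
    size = len(text.encode("utf-8"))
    if size > max_bytes:
        raise ValueError(f"source is {size} bytes; limit is {max_bytes}")
    return CParser().parse(text, filename="<untrusted>")


ast = parse_untrusted("int ok(void);")

# Oversized input is rejected before the parser sees it.
try:
    parse_untrusted("int x;" * 10, max_bytes=16)
    rejected = False
except ValueError:
    rejected = True
```

A byte cap bounds memory only loosely, since deeply nested expressions cost more per byte than flat declarations, so pair it with a process-level memory or time limit where that matters.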
Testing & CI integration
- Snapshot the AST output (ast.show() written to a string buffer) for representative headers; diff in CI to catch unintended grammar changes.
- Run pycparser against real headers from libraries you ship bindings for; this catches preprocessor surprises early.
- Use pytest.mark.parametrize over a set of known-good and known-failing snippets.
import pycparser
import pytest

def test_parse_basic_decl():
    src = "int x; int add(int a, int b);"
    ast = pycparser.CParser().parse(src)
    assert len(ast.ext) == 2

def test_unknown_attr_rejected():
    src = "int __attribute__((unused)) x;"  # GCC extension
    with pytest.raises(pycparser.plyparser.ParseError):
        pycparser.CParser().parse(src)
Output: both tests pass; documents the C99-strict behavior.
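For the snapshot approach, show() accepts a buf argument, so the tree can be captured as a string instead of printed. A minimal sketch follows; the helper name ast_snapshot is made up here, and the exact text of the dump may differ between pycparser releases, which is precisely what a snapshot diff is meant to surface.

```python
import io
from pycparser.c_parser import CParser


def ast_snapshot(source):
    """Render the AST of preprocessed C text as a string for snapshot tests."""
    buf = io.StringIO()
    CParser().parse(source).show(buf=buf, attrnames=True)
    return buf.getvalue()


# Coordinates are not shown by default, so formatting differences in the
# input do not change the snapshot.
snap_a = ast_snapshot("int x; int add(int a, int b);")
snap_b = ast_snapshot("int x;   int add(int a,\n int b);")
```

Store the returned string next to the test and compare on each run; regenerate the stored copy deliberately when you bump the pycparser pin.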
Ecosystem integrations
- cffi: pycparser is the C parser inside cdef().
- pyclibrary: older library binding system that uses pycparser.
- gccxml / castxml: alternative XML-based dumpers if you need full C++.
- autopxd2: generates Cython .pxd files from C headers using pycparser.
- Custom code generators: anyone writing a "headers → bindings" tool tends to reach for pycparser unless they need C++ or libclang's semantic analysis.
Compatibility matrix
| Python | pycparser line | Notes |
|---|---|---|
| 3.7 | 2.20 and earlier | Dropped. |
| 3.8 | 2.21+ | Current floor. |
| 3.9 | 2.21+ | Supported. |
| 3.10 | 2.21+ | Supported. |
| 3.11 | 2.21+ | Supported. |
| 3.12 | 2.21+ | Supported. |
| 3.13 | 2.22+ | Supported. |
Troubleshooting common errors
| Error / Symptom | Likely cause | Fix |
|---|---|---|
| ParseError: ... before: '__attribute__' | GCC extension in header | Strip with sed -e 's/__attribute__((.*))//' before parsing, or use libclang. |
| Could not find cpp | No system preprocessor | Install GCC or use cpp_path="clang", cpp_args=["-E"]. |
| fatal error: stdio.h: No such file during preprocess | Real stdlib headers missing | Use the bundled fake_libc_include/. |
| Slow first import | PLY table regeneration | Precompile tables at build time. |
| ParseError on a typedef'd name later used as a type | Single-pass parsing limitation | Ensure typedefs precede uses. |
| Crash with very deep nesting | Python recursion limit | sys.setrecursionlimit(10000). |
| Empty AST returned | parse_file got an empty preprocessed string | Check cpp_args; missing include path drops everything. |
When NOT to use this
- You need C++ parsing. Use libclang.
- You need full C11/C17/C23 fidelity (_Generic arms with complex deduction, atomics with full type info). Use libclang.
- You don't actually need an AST. For "extract function signatures from a header", regex + sanity checks may be enough.
- You need preprocessor-aware analysis (e.g., reading #define constants symbolically). pycparser only sees post-preprocessor source.
- Performance-critical hot path. Parsing big SDK headers takes seconds; cache the result.
Worked example: extract function signatures into JSON
A common direct use case — read a header, emit a JSON catalogue of function signatures for downstream tooling (binding generators, doc generators, lint rules).
Step 1 — minimal NodeVisitor that collects FuncDecls.
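import json
from pycparser import c_ast, parse_file
from pycparser.c_generator import CGenerator

class FuncCatalog(c_ast.NodeVisitor):
    def __init__(self):
        self.entries = []
        self.gen = CGenerator()

    def visit_FuncDecl(self, node):
        # Pointer-returning functions wrap the TypeDecl in PtrDecl; unwrap to find the name.
        inner = node.type
        while isinstance(inner, c_ast.PtrDecl):
            inner = inner.type
        name = getattr(inner, "declname", None)
        if not name:
            return
        # Rendering node.type keeps qualifiers and pointers, e.g. "const char *".
        return_type = self.gen.visit(node.type)
        params = []
        if node.args:
            for p in node.args.params:
                if isinstance(p, c_ast.EllipsisParam):
                    # Variadic marker: it has neither a type nor a name.
                    params.append({"type": "...", "name": None})
                elif isinstance(p, c_ast.Typename):
                    params.append({"type": self.gen.visit(p), "name": None})
                else:
                    params.append({"type": self.gen.visit(p.type), "name": p.name})
        self.entries.append({"name": name, "return": return_type, "params": params})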
Output: a list of dicts describing each function — copy-paste into a doc generator or binding scaffold.
Step 2 — run the visitor against a preprocessed header.
ast = parse_file(
"api.h", use_cpp=True,
cpp_args=["-E", "-Iinclude", "-Ifake_libc"]
)
cat = FuncCatalog()
cat.visit(ast)
print(json.dumps(cat.entries, indent=2)[:500])
Output: JSON like [{"name": "do_thing", "return": "int", "params": [{"type": "const char *", "name": "input"}]}, ...].
Step 3 — feed the JSON to a binding generator.
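TEMPLATE = '''def {name}({args}) -> "{ret}":
    return lib.{name}({call})
'''
for f in cat.entries:
    # Drop the lone unnamed void of f(void); the C return type is quoted so
    # that "const char *" is still a valid Python annotation.
    real = [p for p in f["params"] if p["type"] != "void"]
    args = ", ".join(
        "*rest" if p["type"] == "..." else (p["name"] or f"a{i}")
        for i, p in enumerate(real)
    )
    call = args
    print(TEMPLATE.format(name=f["name"], args=args, ret=f["return"], call=call))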
Output: Python wrapper stubs you can then refine. Real production codegen would map C types to Python types (char * → bytes, int * → ctypes pointer), but this skeleton is the starting line.
Step 4 — verify on a known header.
# Run against /usr/include/zlib.h (after preprocess) and check zlibVersion appears.
assert any(e["name"] == "zlibVersion" for e in cat.entries)
Output: sanity check — a known function survives the parse + visit pipeline.
FAQ
Q: How do I parse a string instead of a file?
A: from pycparser import CParser; CParser().parse(src_string) — bypasses the preprocessor. You must hand-strip #include and #define first.
Q: What's the difference between c_ast.Typename and c_ast.Decl?
A: Typename represents a type without a name (e.g. const char * in a parameter where the name was omitted). Decl carries both a type and a name. Visit both in code that walks function parameters.
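A short illustration of the difference, using a made-up prototype with one unnamed and one named parameter. It is a sketch that parses an inline string, so no preprocessor is involved.

```python
from pycparser import c_ast
from pycparser.c_parser import CParser
from pycparser.c_generator import CGenerator

ast = CParser().parse("int send(int, const char *msg);")
func = ast.ext[0].type  # the FuncDecl under the top-level Decl
gen = CGenerator()

described = []
for p in func.args.params:
    if isinstance(p, c_ast.Typename):
        # Unnamed parameter: render the whole node to get its type.
        described.append((type(p).__name__, None, gen.visit(p)))
    else:
        # Named parameter: a Decl with separate name and type.
        described.append((type(p).__name__, p.name, gen.visit(p.type)))

print(described)
```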
Q: How do I handle __attribute__((...)) from GCC?
A: Strip it before parsing. Common one-liner:
sed -E 's/__attribute__\(\(.*\)\)//g' input.h > stripped.h
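Or erase it in the preprocessor by adding -D'__attribute__(x)=' to the cpp command line (as a cpp_args entry, drop the shell quotes). pcpp, a pure-Python preprocessor, is an option when no system cpp is available.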
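As an alternative to sed, the attribute can be defined away as an empty function-like macro when the preprocessor runs. The helper below is a hypothetical convenience (build_cpp_args is not a pycparser function) that only assembles the argument list, and the include and vendor paths are placeholders.

```python
def build_cpp_args(include_dirs, fake_libc_dir=None):
    """Assemble cpp_args for parse_file that erase __attribute__((...))."""
    args = [
        "-E",
        # Function-like macro with an empty body: each __attribute__((...))
        # expands to nothing during preprocessing.
        "-D__attribute__(x)=",
    ]
    if fake_libc_dir:
        # Listed first so the stub headers win over any real ones.
        args.append(f"-I{fake_libc_dir}")
    args.extend(f"-I{d}" for d in include_dirs)
    return args


cpp_args = build_cpp_args(["include"], fake_libc_dir="vendor/fake_libc_include")

# Typical use (needs a preprocessor on PATH, so it is not executed here):
#   from pycparser import parse_file
#   ast = parse_file("api.h", use_cpp=True, cpp_args=cpp_args)
```

Unlike a greedy regex, the macro approach handles several attributes on one line, because the preprocessor matches the parentheses itself.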
Q: Can pycparser parse C++?
A: No. C++ has features (templates, namespaces, classes, references) far outside C99. Use libclang for C++.
Q: Why does my parse fail on a header that compiles fine?
A: Almost always one of: (1) GCC extension you didn't strip, (2) macro that didn't expand because preprocessor wasn't run, (3) typedef-before-use rule violated by an #include chain. Run cpp -E manually and inspect the output.
Q: How big a header can pycparser handle?
A: Tens of thousands of lines is fine; memory and time scale roughly linearly. Beyond that, parse in segments or switch to libclang.
AST node types you'll actually use
A small but high-value subset of c_ast nodes covers most real-world traversal needs. Cheat-sheet:
- c_ast.FuncDecl: function declarator; node.args is the parameter list and node.type the return type. The name is the declname of the innermost TypeDecl (node.type.declname when the return type is not a pointer).
- c_ast.FuncDef: function definition (header + body); contains a decl (a Decl whose type is the FuncDecl) and a body (compound statement).
- c_ast.Decl: generic declaration; carries name, type, init, quals (qualifiers like const).
- c_ast.Typedef: type aliases.
- c_ast.Struct / c_ast.Union / c_ast.Enum: composite types. decls (struct, union) or values (enum) lists their members.
- c_ast.PtrDecl: pointer wrapper; chain through node.type to reach the pointed-at type.
- c_ast.ArrayDecl: array; node.dim is the size expression.
- c_ast.IdentifierType: primitive types; node.names is a list like ["unsigned", "int"].
- c_ast.TypeDecl: typed name wrapper around primitives/structs.
The pattern for traversal: subclass c_ast.NodeVisitor, define visit_<Class> methods for each type you care about, and call self.generic_visit(node) to continue walking children. Use pycparser.c_generator.CGenerator().visit(node) to turn any subtree back into source — useful for emitting wrappers from the AST.
# Quick sanity check: dump every Decl's type as C source.
from pycparser.c_generator import CGenerator
gen = CGenerator()
for decl in ast.ext:
    if hasattr(decl, "type"):
        print(decl.name, "→", gen.visit(decl.type))
Output: globalCount → int, compute → int (*)(const char *), etc. — fast way to introspect what a header declares.
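The role of generic_visit in the traversal pattern is easiest to see with nested nodes. In this sketch the StructFields class and the C snippet are invented for illustration; the visitor records member names per struct and relies on generic_visit to reach the struct declared inside another.

```python
from pycparser import c_ast
from pycparser.c_parser import CParser


class StructFields(c_ast.NodeVisitor):
    """Collect member names for every struct that has a body."""

    def __init__(self):
        self.fields = {}

    def visit_Struct(self, node):
        # decls is None for bare references such as `struct inner *p`.
        if node.decls is not None:
            self.fields[node.name] = [d.name for d in node.decls]
        # Without this call the walk stops here and nested structs are missed.
        self.generic_visit(node)


src = "struct outer { int id; struct inner { int a; int b; } nested; };"
collector = StructFields()
collector.visit(CParser().parse(src))
print(collector.fields)
```

Leaving out the generic_visit call is a common reason a visitor silently reports fewer nodes than expected.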
Real-world example: parsing a Linux kernel uAPI header
The kernel's uapi headers are an interesting stress test — large, depend on subtle preprocessor expansion, and use GCC-isms. The recipe that usually works:
# 1. Strip GCC extensions (the greedy .* assumes at most one attribute per line)
sed -E 's/__attribute__\s*\(\(.*\)\)//g; s/__extension__//g' \
  /usr/include/linux/limits.h > /tmp/limits.h
# 2. Preprocess with the fake stdlib. FAKE_LIBC is a placeholder for your copy of
#    utils/fake_libc_include, which is not installed alongside the package.
FAKE_LIBC=vendor/fake_libc_include
cpp -nostdinc -E -I/tmp -I"$FAKE_LIBC" /tmp/limits.h > /tmp/limits.i
# 3. Parse the result
python -c 'import pycparser; pycparser.CParser().parse(open("/tmp/limits.i").read()); print("ok")'
Output: ok, plus a usable AST. The general lesson: pycparser is a pure parser; the prep work to feed it valid C99 is where 90% of the engineering happens. For kernel headers in particular, prefer libclang when you can.
See also
- Concept: API — generating Python APIs from C headers