cheat sheet
PyYAML
Package-level reference for PyYAML on PyPI — safe_load vs load, dump, custom tags, install, alternatives like ruamel.yaml.
What it is
PyYAML is the long-standing YAML 1.1 parser and emitter for Python. It underpins much of the Python ecosystem's YAML handling: Ansible playbooks, Python tooling that reads Kubernetes manifests, GitHub Actions workflows, and Docker Compose files, MkDocs configs, and countless internal configuration schemas. When the libyaml C extension is available, the C-backed loaders and dumpers are roughly 3-5× faster; otherwise the library falls back to a pure-Python parser.
Reach for PyYAML when you need to read or write YAML configuration files. For round-tripping with preserved comments and ordering, look at ruamel.yaml. For new configuration designs, consider whether TOML or JSON would be a better fit (PyYAML has historical CVEs around yaml.load — see Security).
Note: the PyPI name is PyYAML (case-insensitive: pip install pyyaml works); the import name is yaml.
Install
pip install pyyaml
Output: (none — exits 0 on success; prefers a wheel with bundled libyaml)
uv add pyyaml
Output: dependency resolved + added to pyproject.toml
poetry add pyyaml
Output: updated lockfile + virtualenv install
pip install --no-binary=:all: pyyaml
Output: forces a source build — links against the system libyaml. Useful when the wheel was built without C extension on your platform.
Versioning & Python support
- Current line is the 6.x series. Major releases are infrequent (5.x → 6.0 in 2021).
- Supports Python 3.6+ on the 6.x line; older Pythons can pin 5.4.x.
- Releases are slow: security fixes can take months. The 5.1 FullLoader introduction (CVE-2017-18342 family) was a long-running project; YAML 1.2 support remains incomplete.
- The library implements YAML 1.1, not YAML 1.2. Differences: yes/no/on/off parse as booleans (1.1) but as strings (1.2). For 1.2, consider ruamel.yaml.
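A minimal sketch of the YAML 1.1 boolean behavior described above, assuming PyYAML is installed. The key names are made up for illustration.

```python
import yaml

# Unquoted yes/no/on/off resolve to booleans under YAML 1.1 rules;
# quoting keeps the value a string.
doc = yaml.safe_load("""
a: yes
b: no
c: on
d: off
e: "yes"
""")
print(doc)
# {'a': True, 'b': False, 'c': True, 'd': False, 'e': 'yes'}
```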
Package metadata
- Maintainers: the PyYAML org (including Ingy döt Net); original author: Kirill Simonov
- Project home: github.com/yaml/pyyaml
- Docs: pyyaml.org
- PyPI: pypi.org/project/PyYAML
- License: MIT
- Governance: community-maintained; YAML core team alignment
- First released: 2006
- Downloads: consistently in PyPI top 10 (transitive via everything that touches YAML)
Optional dependencies & extras
- libyaml: optional C library for the fast parser/emitter. Bundled in the prebuilt wheels for Linux / macOS / Windows. If absent, PyYAML falls back to a pure-Python implementation (significantly slower).
- No PyPI extras.
Alternatives
| Package | Trade-off |
|---|---|
| ruamel.yaml | YAML 1.2-compliant; preserves comments and ordering on round-trip. Slower, heavier, but the right answer for editable config. |
| strictyaml | Subset of YAML: no flow style, no implicit typing. Use for security-sensitive config where YAML's footguns matter. |
| yaml-rs / rapidyaml | Native-code YAML parsers (yaml-rs is Rust-backed; rapidyaml is a C++ library with Python bindings). Fast; less mature ecosystem. |
| tomllib / tomli-w | tomllib is in the stdlib (Python 3.11+) for reading; tomli-w handles writing. If you can choose the format, TOML is safer and simpler. |
| JSON | If your config doesn't need comments, JSON is the boring-good choice. |
Common gotchas
- yaml.load(data) is unsafe with a permissive loader. Never use it on untrusted input, because it can instantiate arbitrary Python classes. Always use yaml.safe_load() (or pass Loader=yaml.SafeLoader). PyYAML 5.1+ warns when Loader= is omitted, and 6.0+ requires it.
- yes/no/on/off parse as booleans. YAML 1.1 quirk: enabled: no is False, not the string "no". Quote your strings.
- Octal literals. version: 010 is the integer 8 (YAML 1.1 octal). Quote literals.
- The Norway problem. Country code NO parses as False. Country code NO_ parses as "NO_". Always quote.
- Key order is NOT preserved by default on round-trip with dump. Use sort_keys=False to keep insertion order; comments and blank lines are still lost.
- !!python/object tags in a YAML payload are the security vulnerability. safe_load rejects them; load(Loader=FullLoader) rejects them as of 5.4 (earlier 5.x releases were exploitable); load(Loader=UnsafeLoader) accepts them, so avoid it.
- Multi-doc YAML. Files separated by --- need yaml.safe_load_all() (iterator), not safe_load().
- Indentation is significant. Two-space indentation is the convention; tabs are forbidden.
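A few of these gotchas can be reproduced directly. The sketch below assumes a standard PyYAML install; the sample keys and values are illustrative only.

```python
import yaml

# Octal: an unquoted leading zero is read as a YAML 1.1 octal integer.
octal = yaml.safe_load("version: 010")
quoted = yaml.safe_load('version: "010"')
print(octal, quoted)

# Dumping a string that looks like a boolean: safe_dump adds quotes so the
# value survives a round-trip as a string.
dumped = yaml.safe_dump({"country": "NO"})
print(dumped)

# Keys are sorted by default; sort_keys=False keeps insertion order.
data = {"zeta": 1, "alpha": 2}
sorted_out = yaml.safe_dump(data)
ordered_out = yaml.safe_dump(data, sort_keys=False)
print(sorted_out)
print(ordered_out)
```

Quoting on input and sort_keys=False on output cover the most common surprises; neither restores comments or blank lines.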
Real-world recipes
The six recipes cover the daily-use surface: safe loading, dumping with shape control, the round-trip caveat, multi-doc, a config loader with environment overrides, and custom tags.
Recipe 1 — Safe load and basic inspection.
import yaml
data = """
name: alice-dev
roles:
  - admin
  - editor
limits:
  cpu: 2
  memory: 4Gi
"""
obj = yaml.safe_load(data)
print(obj["name"], obj["roles"], obj["limits"])
Output:
alice-dev ['admin', 'editor'] {'cpu': 2, 'memory': '4Gi'}
safe_load returns plain Python types: dict, list, str, int, float, bool, None, plus datetime.date / datetime.datetime for timestamps. Never use load for untrusted input.
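To see which Python types come back for common scalars, a short sketch like the following can help. The document content is invented for illustration.

```python
import yaml

doc = yaml.safe_load("""
ratio: 1.5
count: 3
missing: null
released: 2024-01-15
tags: [a, b]
""")

# Map each key to the name of the Python type safe_load produced.
types = {key: type(value).__name__ for key, value in doc.items()}
print(types)
# {'ratio': 'float', 'count': 'int', 'missing': 'NoneType', 'released': 'date', 'tags': 'list'}
```

Note that the unquoted date becomes a datetime.date object; quote it if the application expects a string.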
Recipe 2 — Dump with custom indent and key order.
import yaml
data = {
    "name": "alice-dev",
    "roles": ["admin", "editor"],
    "limits": {"cpu": 2, "memory": "4Gi"},
}
out = yaml.safe_dump(
    data,
    sort_keys=False,  # preserve insertion order
    indent=2,
    default_flow_style=False,
    allow_unicode=True,
)
print(out)
Output:
name: alice-dev
roles:
- admin
- editor
limits:
  cpu: 2
  memory: 4Gi
sort_keys=False is the most-requested knob — defaults to True, which surprises everyone.
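The default_flow_style knob controls block versus inline output. The following sketch compares the three settings on a small, made-up dict; it is meant as an illustration rather than a complete tour of the dumper options.

```python
import yaml

data = {"name": "alice-dev", "roles": ["admin", "editor"]}

# Block style everywhere (the default since PyYAML 5.1).
block = yaml.safe_dump(data, sort_keys=False, default_flow_style=False)
# Flow style everywhere: the whole document in inline {...} / [...] form.
flow = yaml.safe_dump(data, sort_keys=False, default_flow_style=True)
# None: block style for nested containers, flow style for leaf collections.
mixed = yaml.safe_dump(data, sort_keys=False, default_flow_style=None)

for label, text in (("block", block), ("flow", flow), ("mixed", mixed)):
    print(f"--- {label}")
    print(text)
```

All three strings load back to the same dict; only the layout differs.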
Recipe 3 — Round-trip preserves data but NOT comments.
import yaml
original = """\
# Top-level config
name: alice-dev # primary user
roles:
- admin
"""
obj = yaml.safe_load(original)
roundtripped = yaml.safe_dump(obj, sort_keys=False)
print(roundtripped)
Output:
name: alice-dev
roles:
- admin
Note: comments were dropped. PyYAML does NOT round-trip comments. If you need to preserve them, use ruamel.yaml instead:
# pip install ruamel.yaml
from ruamel.yaml import YAML
import io
yml = YAML(typ="rt") # round-trip mode
yml.preserve_quotes = True
doc = yml.load(original)
out = io.StringIO()
yml.dump(doc, out)
print(out.getvalue())
Output: comments and ordering are preserved. ruamel.yaml is the right tool for editable configuration files.
Recipe 4 — Multi-document YAML (Kubernetes manifests).
import yaml
stream = """
apiVersion: v1
kind: ConfigMap
metadata:
  name: cm-1
---
apiVersion: v1
kind: Secret
metadata:
  name: s-1
"""
docs = list(yaml.safe_load_all(stream))
for d in docs:
    print(d["kind"], d["metadata"]["name"])
Output:
ConfigMap cm-1
Secret s-1
safe_load_all returns an iterator — wrap in list() for multi-pass.
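Writing is the mirror image of reading: safe_dump_all emits several documents separated by ---. A minimal sketch, using made-up manifests:

```python
import yaml

docs = [
    {"kind": "ConfigMap", "metadata": {"name": "cm-1"}},
    {"kind": "Secret", "metadata": {"name": "s-1"}},
]

# Serialize all documents into one stream, keeping key order.
text = yaml.safe_dump_all(docs, sort_keys=False)
print(text)

# Reading the stream back yields the same documents.
reloaded = list(yaml.safe_load_all(text))
```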
Recipe 5 — Config loader pattern with environment overrides.
import os
from pathlib import Path
import yaml
def load_config(path: str) -> dict:
    with open(path) as f:
        cfg = yaml.safe_load(f) or {}
    # Allow env vars to override leaf values via a double-underscore path,
    # e.g. APP__DB__HOST. Override values are always strings.
    # str.removeprefix needs Python 3.9+.
    for k, v in os.environ.items():
        if k.startswith("APP__"):
            keys = k.removeprefix("APP__").lower().split("__")
            d = cfg
            for kk in keys[:-1]:
                d = d.setdefault(kk, {})
            d[keys[-1]] = v
    return cfg
# Sample config on disk: {"db": {"host": "localhost", "port": 5432}}
Path("/tmp/cfg.yaml").write_text("db:\n  host: localhost\n  port: 5432\n")
os.environ["APP__DB__HOST"] = "myhost"
print(load_config("/tmp/cfg.yaml"))
Output:
{'db': {'host': 'myhost', 'port': 5432}}
The APP__SECTION__KEY override pattern is a common 12-factor-app idiom.
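Environment variables are always strings, so an override such as APP__DB__PORT=5432 would replace an int with "5432". One possible way to handle this, sketched under the assumption that YAML scalar rules are acceptable for env values, is to parse each value with safe_load. The helper name coerce_env_value is hypothetical.

```python
import yaml

def coerce_env_value(raw: str):
    """Hypothetical helper: interpret an env var string as a YAML scalar."""
    try:
        value = yaml.safe_load(raw)
    except yaml.YAMLError:
        return raw  # not valid YAML: keep the raw string
    # Keep the raw string for empty/null input and for anything that
    # parsed as a collection; only scalars are coerced.
    if value is None or isinstance(value, (dict, list)):
        return raw
    return value

print(coerce_env_value("5432"), coerce_env_value("true"), coerce_env_value("myhost"))
```

The YAML 1.1 quirks apply here too: a value of no becomes False, so this approach suits numeric and boolean settings better than free-form text.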
Recipe 6 — Custom tag for typed loading.
import yaml
from datetime import datetime
# Constructor for the custom tag: turns the tagged scalar into a datetime
def construct_iso(loader, node):
    return datetime.fromisoformat(loader.construct_scalar(node))
# Register on SafeLoader so safe_load understands !iso_date
yaml.SafeLoader.add_constructor("!iso_date", construct_iso)
print(yaml.safe_load("when: !iso_date 2026-05-31T14:32:00"))
Output:
{'when': datetime.datetime(2026, 5, 31, 14, 32)}
Custom tags let you keep types in the YAML; register against SafeLoader to keep the safety guarantees.
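Registering on yaml.SafeLoader changes behavior for every safe_load call in the process. A variation, sketched here with a hypothetical AppLoader subclass, keeps the custom tag local to one loader class while leaving the global SafeLoader untouched.

```python
import yaml
from datetime import datetime

class AppLoader(yaml.SafeLoader):
    """Hypothetical project-local loader; inherits SafeLoader's restrictions."""

def construct_iso(loader, node):
    # Convert the tagged scalar into a datetime.
    return datetime.fromisoformat(loader.construct_scalar(node))

# add_constructor on the subclass does not modify yaml.SafeLoader itself.
AppLoader.add_constructor("!iso_date", construct_iso)

text = "when: !iso_date 2026-05-31T14:32:00"
doc = yaml.load(text, Loader=AppLoader)
print(doc)
```

Plain yaml.safe_load on the same text still raises ConstructorError, which is usually the behavior you want for code that never opted in to the tag.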
Performance tuning
- Install the C-backed wheel. Confirm with:
import yaml
print(yaml.__with_libyaml__)  # True if the libyaml extension is loaded
Output: True on prebuilt wheels (Linux, macOS, Windows). False means only the pure-Python implementation is available, which is roughly 3-5× slower.
- Use CSafeLoader / CSafeDumper explicitly when the C extension is available. safe_load always uses the pure-Python SafeLoader and does not accept a Loader argument, so pass the C loader to yaml.load:
import yaml
with open("config.yaml") as f:
    obj = yaml.load(f, Loader=yaml.CSafeLoader)
This is where the libyaml speedup actually comes from.
- Cache parsed config. YAML parsing is moderately expensive (roughly 10× JSON). For long-running services, parse once and hand around the dict.
- Avoid huge anchors / aliases. Pathological YAML (e.g. "billion laughs" structures) can blow memory; cap input size for untrusted sources.
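Because the C loader is only present when the extension was built, code that must run on both kinds of install usually needs a fallback. A small sketch, where fast_safe_load is a hypothetical helper name:

```python
import yaml

# Prefer the libyaml-backed safe loader; fall back to the pure-Python
# SafeLoader when the C extension is not available in this install.
FastSafeLoader = getattr(yaml, "CSafeLoader", yaml.SafeLoader)

def fast_safe_load(stream):
    """Safe load that uses libyaml when possible."""
    return yaml.load(stream, Loader=FastSafeLoader)

result = fast_safe_load("name: alice-dev\nroles: [admin]")
print(result)
# {'name': 'alice-dev', 'roles': ['admin']}
```

Both loaders apply the same safe tag set, so the fallback changes speed, not safety.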
Version migration guide
- Pre-5.1 → 5.1: yaml.load deprecation warning added; recommends yaml.safe_load or explicit Loader=.
- 5.1 → 5.4: FullLoader tightened; UnsafeLoader kept for legacy.
- 5.4 → 6.0: minimum Python 3.6; bundled libyaml build cleanups; yaml.load now requires the Loader argument.
- 6.0 → 6.0.x: maintenance releases (build fixes for newer Cython and Python versions); no API breaks.
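# yaml.load without a Loader: insecure default before 5.1, warns on 5.1-5.4, TypeError on 6.0+
obj = yaml.load(open("config.yaml"))  # DANGEROUS for untrusted input on old versions
# Any version (safe)
with open("config.yaml") as f:
    obj = yaml.safe_load(f)  # always prefer this
Output: on 6.0+ the first call raises TypeError because Loader is required; safe_load is the documented recommendation.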
Security considerations
This is the single most important section of the article. PyYAML's history is a series of CVEs around the default load() behavior.
- NEVER use yaml.load(untrusted_input). It can instantiate arbitrary Python objects via !!python/object tags. CVE-2017-18342 family.
- Always use yaml.safe_load(...) for any input that didn't originate from your own trusted source.
- FullLoader is still risky. It restricts the executable surface but not enough; safe_load is the recommended default.
- UnsafeLoader is explicit-by-name to discourage accidental use. Only use it for YAML you yourself wrote.
- Schema validation. Use pydantic, jsonschema, or strictyaml to validate structure AFTER safe_load. PyYAML alone doesn't validate against a schema.
- Resource limits. Pathological YAML (deeply nested aliases) can DoS the parser. Cap file size; consider a sandboxed process for untrusted input.
- Quote ambiguous values. password: yes becomes the boolean True, so quote secrets.
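The resource-limit advice can be sketched as a thin wrapper around safe_load. The function name load_untrusted and both default limits are assumptions chosen for illustration, not values recommended by PyYAML; tune them to your inputs.

```python
import yaml

def load_untrusted(text: str, max_bytes: int = 1_000_000, max_aliases: int = 100):
    """Hypothetical guard: cap input size and alias count before loading."""
    if len(text.encode("utf-8")) > max_bytes:
        raise ValueError("YAML input too large")
    # Walk the parser events (no objects are constructed yet) and count
    # alias references such as *name.
    aliases = sum(
        isinstance(event, yaml.events.AliasEvent)
        for event in yaml.parse(text, Loader=yaml.SafeLoader)
    )
    if aliases > max_aliases:
        raise ValueError("too many YAML aliases")
    return yaml.safe_load(text)

print(load_untrusted("base: &b [1, 2]\ncopy: *b"))
```

This parses the input twice, which is acceptable for small config payloads; it does not replace a sandbox for genuinely hostile input.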
Testing & CI
import yaml, pytest
def test_round_trip():
    data = {"name": "alice-dev", "roles": ["admin"]}
    s = yaml.safe_dump(data, sort_keys=False)
    assert yaml.safe_load(s) == data
def test_norway_problem():
    # Country code NO parses as False unless quoted
    assert yaml.safe_load("country: NO") == {"country": False}
    assert yaml.safe_load('country: "NO"') == {"country": "NO"}
def test_unsafe_rejected():
    with pytest.raises(yaml.constructor.ConstructorError):
        yaml.safe_load("!!python/object/apply:os.system ['ls']")
Output: all assertions hold; the test makes the YAML 1.1 quirks visible and ensures safe_load actively refuses dangerous constructs.
For CI, also lint that no yaml.load( slipped in:
grep -rEn '\byaml\.load\s*\(' src/ && exit 1 || true
Output: exits 1 if any file uses yaml.load() instead of safe_load().
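Where grep is unavailable or too coarse (it also matches comments and strings), a syntax-aware check is one alternative. The sketch below uses only the standard library; find_yaml_load_calls is a hypothetical helper, and it only catches calls written literally as yaml.load(...), not aliased imports.

```python
import ast

def find_yaml_load_calls(source: str) -> list[int]:
    """Return line numbers of calls written as yaml.load(...)."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "load"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "yaml"
        ):
            hits.append(node.lineno)
    return sorted(hits)

sample = "import yaml\nyaml.safe_load(s)\nyaml.load(s)\n"
print(find_yaml_load_calls(sample))
# [3]
```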
Ecosystem integrations
- ruamel.yaml: round-trip with comment preservation; YAML 1.2.
- pydantic-settings: load typed config from YAML files.
- jsonschema: validate YAML-loaded dicts against JSON Schema.
- Hydra: Facebook's config framework for ML experiments; YAML-backed.
- Ansible: YAML is the playbook format; uses PyYAML internally.
- Kubernetes Python client: manifests parse via PyYAML.
Compatibility matrix
| Python | PyYAML | Notes |
|---|---|---|
| 3.5 | 5.4 (frozen) | Final supported version. |
| 3.6 | 6.0+ | Lowest current floor. |
| 3.7 | 6.0+ | Stable. |
| 3.8 | 6.0+ | Stable. |
| 3.9 | 6.0+ | Stable. |
| 3.10 | 6.0+ | Stable. |
| 3.11 | 6.0+ | Stable. |
| 3.12 | 6.0.1+ | Required for the imp/distutils removal. |
| 3.13 | 6.0.2+ | Build fixes for the latest Python. |
Production deployment
- Pin a recent version (PyYAML>=6.0.2). The library updates infrequently; when an update lands, it's usually security-relevant.
- Bake the C-backed wheel into the image; the pure-Python fallback is a perf regression. On Alpine, install yaml-dev and rebuild from source.
- Centralize YAML loading: wrap safe_load in a project helper that adds size limits and schema validation.
- Validate schemas at startup so misconfiguration fails fast.
- Don't accept user-uploaded YAML without a sandbox. If you must, cap size, use safe_load, and validate against a strict schema.
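A centralized loader with fail-fast validation might look roughly like the following. The REQUIRED schema and the load_validated name are hypothetical; in a real project pydantic or jsonschema would typically replace the hand-written checks.

```python
import yaml

# Hypothetical schema: required top-level keys and their exact types.
REQUIRED = {"name": str, "port": int}

def load_validated(text: str) -> dict:
    """Load YAML safely and fail fast on structural problems."""
    cfg = yaml.safe_load(text)
    if not isinstance(cfg, dict):
        raise ValueError("top-level YAML value must be a mapping")
    for key, expected in REQUIRED.items():
        if key not in cfg:
            raise ValueError(f"missing required key: {key}")
        # Exact type check: bool is a subclass of int, and YAML 1.1 turns
        # values like yes/no into booleans.
        if type(cfg[key]) is not expected:
            raise ValueError(f"{key} must be {expected.__name__}")
    return cfg

print(load_validated("name: svc\nport: 8080"))
```

The exact-type comparison is deliberate: it rejects name: no, which safe_load would otherwise deliver as False.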
When NOT to use this
- You need to preserve comments on edit. Use ruamel.yaml.
- You need YAML 1.2 compliance. Use ruamel.yaml (or a native-code parser such as yaml-rs or rapidyaml for speed).
- You're choosing a config format from scratch. Strongly consider TOML (stdlib tomllib) or JSON. YAML's ambiguities are not worth its conciseness in greenfield code.
- You're accepting untrusted YAML from the network. Use strictyaml (no flow style, no implicit typing) or a sandboxed parser.
Troubleshooting common errors
| Error / Symptom | Likely cause | Fix |
|---|---|---|
| yaml.YAMLError: while parsing a block mapping | Indentation mismatch (tabs or mixed) | Convert tabs to two-space indentation. |
| country: NO returns False | YAML 1.1 boolean parsing | Quote the value: country: "NO". |
| Comments lost on round-trip | PyYAML doesn't preserve them | Switch to ruamel.yaml. |
| ConstructorError: could not determine a constructor for the tag | Custom tag without registered constructor | Register with SafeLoader.add_constructor. |
| pip install PyYAML slow / fails | Building from source on a system without libyaml | Install a wheel: pip install --only-binary=:all: PyYAML. |
| yaml.__with_libyaml__ is False | Pure-Python fallback installed | Reinstall with wheel; on Alpine, apk add yaml-dev. |
| Values parse as unexpected numbers or strings | Leading zeros (010 is octal 8), sexagesimal (1:30:00 becomes 5400), or quoted digits staying strings | Quote values that must stay strings; for stricter typing use strictyaml. |
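When chasing parse errors like the ones in the table, the exception usually carries a position. The sketch below shows one way to surface it; describe_yaml_error is a hypothetical helper, and problem_mark is read defensively because not every YAMLError subclass sets it.

```python
import yaml

def describe_yaml_error(text: str):
    """Return None if text parses, else a 'line N, column M: problem' string."""
    try:
        yaml.safe_load(text)
    except yaml.YAMLError as exc:
        mark = getattr(exc, "problem_mark", None)
        problem = getattr(exc, "problem", None) or str(exc)
        if mark is None:
            return problem
        # Marks are zero-based; report one-based positions.
        return f"line {mark.line + 1}, column {mark.column + 1}: {problem}"
    return None

# A tab used for indentation on the second line triggers a scanner error.
message = describe_yaml_error("a:\n\tb: 1")
print(message)
```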
See also
- Concept: JSON — related serialization format
- Official PyYAML docs
- YAML 1.2 specification
- ruamel.yaml — round-trip alternative