cheat sheet

PyYAML

Package-level reference for PyYAML on PyPI — safe_load vs load, dump, custom tags, install, alternatives like ruamel.yaml.

What it is

PyYAML is the long-standing YAML 1.1 parser and emitter for Python. It is the default choice for reading YAML from Python, whether the files are Ansible playbooks, Kubernetes manifests, GitHub Actions workflows, Docker Compose files, mkdocs configs, or internal configuration schemas. When built against libyaml, the library ships a C extension that is roughly 3-5× faster; otherwise it falls back to a pure-Python parser.

Reach for PyYAML when you need to read or write YAML configuration files. For round-tripping with preserved comments and ordering, look at ruamel.yaml. For new configuration designs, consider whether TOML or JSON would be a better fit (PyYAML has historical CVEs around yaml.load — see Security).

Note: the PyPI name is PyYAML (case-insensitive: pip install pyyaml works); the import name is yaml.

Install

bash
pip install pyyaml

Output: (none — exits 0 on success; prefers a wheel with bundled libyaml)

bash
uv add pyyaml

Output: dependency resolved + added to pyproject.toml

bash
poetry add pyyaml

Output: updated lockfile + virtualenv install

bash
pip install --no-binary=:all: pyyaml

Output: forces a source build — links against the system libyaml. Useful when the wheel was built without C extension on your platform.

Versioning & Python support

  • Current line is the 6.x series. Major releases are infrequent (5.x → 6.0 in 2021).
  • Supports Python 3.6+ on the 6.x line; older Pythons can pin 5.4.x.
  • Releases are slow — security fixes can take months. The 5.1 FullLoader introduction (CVE-2017-18342 family) was a long-running project; YAML 1.2 support remains incomplete.
  • The library implements YAML 1.1, not YAML 1.2. Differences: yes/no/on/off parse as booleans (1.1) but as strings (1.2). For 1.2, consider ruamel.yaml.
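
The boolean difference is easy to see in a few lines. This is a minimal sketch, assuming PyYAML is installed; the feature_* keys are made up for illustration.

```python
import yaml

# YAML 1.1 resolution as implemented by PyYAML: bare yes/no/on/off become bools.
doc = """
feature_a: yes
feature_b: no
feature_c: on
feature_d: off
feature_e: "yes"
"""

parsed = yaml.safe_load(doc)
for key, value in parsed.items():
    # Only the quoted value stays a string.
    print(f"{key}: {value!r} ({type(value).__name__})")
```

A YAML 1.2 parser would return the strings "yes", "no", "on", and "off" for the first four keys.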

Package metadata

  • Maintainers: the PyYAML organization on GitHub, including Ingy döt Net; original author: Kirill Simonov
  • Project home: github.com/yaml/pyyaml
  • Docs: pyyaml.org
  • PyPI: pypi.org/project/PyYAML
  • License: MIT
  • Governance: community-maintained; YAML core team alignment
  • First released: 2006
  • Downloads: consistently in PyPI top 10 (transitive via everything that touches YAML)

Optional dependencies & extras

  • libyaml — optional C library for the fast parser/emitter. Bundled in the prebuilt wheels for Linux / macOS / Windows. If absent, PyYAML falls back to a pure-Python implementation (significantly slower).
  • No PyPI extras.

Alternatives

Package | Trade-off
ruamel.yaml | YAML 1.2-compliant; preserves comments and ordering on round-trip. Slower, heavier, but the right answer for editable config.
strictyaml | Subset of YAML: no flow style, no implicit typing. Use for security-sensitive config where YAML's footguns matter.
rapidyaml (Python bindings) | Bindings for the C++ rapidyaml (ryml) parser. Fast; less mature Python ecosystem.
tomllib / tomli-w | If you can choose the format, TOML is safer and simpler. tomllib is in the stdlib from Python 3.11 and is read-only; tomli-w handles writing. Migrating? Consider it.
JSON | If your config doesn't need comments, JSON is the boring-good choice.

Common gotchas

  1. yaml.load is unsafe with a permissive loader. Never use it on untrusted input: with an unsafe loader it can instantiate arbitrary Python classes. Always use yaml.safe_load() (or pass Loader=yaml.SafeLoader). PyYAML 5.1 through 5.4 warn when Loader= is omitted; 6.0 makes Loader a required argument, so a bare yaml.load(data) raises TypeError.
  2. yes/no/on/off parse as booleans. YAML 1.1 quirk — enabled: no is False, not the string "no". Quote your strings.
  3. Octal literals. version: 010 is the integer 8 (YAML 1.1 octal). Quote literals.
  4. The Norway problem. The country code NO parses as False, while NO_ parses as the string "NO_". Always quote such values.
  5. Key order changes on dump by default. dump and safe_dump sort mapping keys alphabetically; pass sort_keys=False (5.1+) to keep insertion order. Comments and blank lines are lost either way.
  6. !!python/object tags in a YAML payload are the attack vector. safe_load rejects them; load(Loader=FullLoader) rejects them since 5.4 (earlier 5.x releases let a restricted subset through, which proved exploitable); load(Loader=UnsafeLoader) accepts them, so avoid it.
  7. Multi-doc YAML. Files separated by --- need yaml.safe_load_all() (iterator), not safe_load().
  8. Indentation is significant. Two-space indentation is the convention; tabs are forbidden.
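
Several of these gotchas can be checked directly. The sketch below uses made-up keys and shows implicit typing on load, plus the fact that safe_dump quotes strings that would otherwise resolve to another type.

```python
import yaml

# Unquoted scalars go through YAML 1.1 implicit typing; quoted ones stay strings.
loaded = yaml.safe_load("""
version: 010
build: "010"
country: NO
region: "NO"
""")
print(loaded)

# On output, safe_dump adds quotes where a plain scalar would change type,
# so these strings survive a dump/load cycle.
data = {"country": "NO", "version": "010", "enabled": "yes"}
text = yaml.safe_dump(data, sort_keys=False)
print(text)
restored = yaml.safe_load(text)
```

The risk is therefore mostly in hand-written YAML. Values emitted by PyYAML itself come back with the types they had.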

Real-world recipes

The six recipes cover the daily-use surface: safe loading, dumping with shape control, the round-trip caveat, multi-doc, a config loader with environment overrides, and custom tags.

Recipe 1 — Safe load and basic inspection.

python
import yaml

data = """
name: alice-dev
roles:
  - admin
  - editor
limits:
  cpu: 2
  memory: 4Gi
"""

obj = yaml.safe_load(data)
print(obj["name"], obj["roles"], obj["limits"])

Output:

text
alice-dev ['admin', 'editor'] {'cpu': 2, 'memory': '4Gi'}

safe_load returns plain Python types: dict, list, str, int, bool, None. Never use load for untrusted input.

Recipe 2 — Dump with custom indent and key order.

python
import yaml

data = {
    "name": "alice-dev",
    "roles": ["admin", "editor"],
    "limits": {"cpu": 2, "memory": "4Gi"},
}

out = yaml.safe_dump(
    data,
    sort_keys=False,        # preserve insertion order
    indent=2,
    default_flow_style=False,
    allow_unicode=True,
)
print(out)

Output:

yaml
name: alice-dev
roles:
- admin
- editor
limits:
  cpu: 2
  memory: 4Gi

sort_keys=False is the most-requested knob — defaults to True, which surprises everyone.
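
To make the default visible, the sketch below dumps the same dict twice. The keys are arbitrary, and sort_keys requires PyYAML 5.1 or later.

```python
import yaml

data = {"zeta": 1, "alpha": 2, "mid": {"b": 1, "a": 2}}

sorted_text = yaml.safe_dump(data)                     # default: sort_keys=True
ordered_text = yaml.safe_dump(data, sort_keys=False)   # keep insertion order

print(sorted_text)
print(ordered_text)
```

Sorting applies at every nesting level, not only to the top-level mapping.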

Recipe 3 — Round-trip preserves data but NOT comments.

python
import yaml

original = """\
# Top-level config
name: alice-dev   # primary user
roles:
  - admin
"""

obj = yaml.safe_load(original)
roundtripped = yaml.safe_dump(obj, sort_keys=False)
print(roundtripped)

Output:

text
name: alice-dev
roles:
- admin

Note: comments were dropped. PyYAML does NOT round-trip comments. If you need to preserve them, use ruamel.yaml instead:

python
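# pip install ruamel.yaml
import io

from ruamel.yaml import YAML

# `original` is the commented YAML string defined in the PyYAML example above
yml = YAML(typ="rt")              # round-trip mode (keeps comments and key order)
yml.preserve_quotes = True
doc = yml.load(original)
out = io.StringIO()
yml.dump(doc, out)
print(out.getvalue())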

Output: comments and ordering are preserved. ruamel.yaml is the right tool for editable configuration files.

Recipe 4 — Multi-document YAML (Kubernetes manifests).

python
import yaml

stream = """
apiVersion: v1
kind: ConfigMap
metadata:
  name: cm-1
---
apiVersion: v1
kind: Secret
metadata:
  name: s-1
"""

docs = list(yaml.safe_load_all(stream))
for d in docs:
    print(d["kind"], d["metadata"]["name"])

Output:

code
ConfigMap cm-1
Secret s-1

safe_load_all returns an iterator — wrap in list() for multi-pass.
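
The writing side is symmetric. A minimal sketch using yaml.safe_dump_all, with the same two illustrative manifests as above:

```python
import yaml

docs = [
    {"apiVersion": "v1", "kind": "ConfigMap", "metadata": {"name": "cm-1"}},
    {"apiVersion": "v1", "kind": "Secret", "metadata": {"name": "s-1"}},
]

# safe_dump_all writes one YAML document per item, separated by "---".
stream = yaml.safe_dump_all(docs, sort_keys=False)
print(stream)

# Reading the stream back yields the same documents, one at a time.
reloaded = list(yaml.safe_load_all(stream))
```

Passing a file object as the second argument of safe_dump_all writes straight to disk instead of returning a string.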

Recipe 5 — Config loader pattern with environment overrides.

python
import os
from pathlib import Path

import yaml

def load_config(path: str) -> dict:
    with open(path) as f:
        cfg = yaml.safe_load(f) or {}
    # Env vars named APP__SECTION__KEY override leaf values; the value stays a string
    for k, v in os.environ.items():
        if k.startswith("APP__"):
            keys = k.removeprefix("APP__").lower().split("__")  # str.removeprefix needs Python 3.9+
            d = cfg
            for kk in keys[:-1]:
                d = d.setdefault(kk, {})
            d[keys[-1]] = v
    return cfg

# Write a sample file equivalent to {"db": {"host": "localhost", "port": 5432}}
Path("/tmp/cfg.yaml").write_text("db:\n  host: localhost\n  port: 5432\n")
os.environ["APP__DB__HOST"] = "myhost"
print(load_config("/tmp/cfg.yaml"))

Output:

text
{'db': {'host': 'myhost', 'port': 5432}}

The APP__SECTION__KEY override pattern is a common 12-factor-app idiom.
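
One limitation of the recipe is that every override arrives as a string, so APP__DB__PORT=6543 would replace the integer port with "6543". One possible way around this, sketched here with a hypothetical helper coerce_env_value, is to run each value through safe_load and keep only scalar results.

```python
import yaml

def coerce_env_value(raw: str):
    """Interpret an env var string as a YAML scalar; fall back to the raw string."""
    if raw == "":
        return raw
    try:
        value = yaml.safe_load(raw)
    except yaml.YAMLError:
        return raw
    # Only accept scalars; a value like "{a: 1}" or "[1, 2]" stays a plain string.
    if isinstance(value, (dict, list)):
        return raw
    return value

for sample in ["6543", "true", "1.5", "myhost", "{a: 1}", "no"]:
    print(repr(sample), "->", repr(coerce_env_value(sample)))
```

The usual YAML 1.1 rules apply here as well: an override of no becomes False, so this helper suits numeric and boolean settings better than free-form text.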

Recipe 6 — Custom tag for typed loading.

python
import yaml
from datetime import datetime

# Constructor: turn the tagged scalar into a datetime
def construct_iso(loader, node):
    return datetime.fromisoformat(loader.construct_scalar(node))

# Register on SafeLoader so safe_load understands the !iso_date tag
yaml.SafeLoader.add_constructor("!iso_date", construct_iso)

print(yaml.safe_load("when: !iso_date 2026-05-31T14:32:00"))

Output:

text
{'when': datetime.datetime(2026, 5, 31, 14, 32)}

Custom tags let you keep types in the YAML; register against SafeLoader to keep the safety guarantees.
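
Registering on yaml.SafeLoader changes behavior for every caller in the process. A more contained variant, sketched below, registers the tag on private subclasses and adds a matching representer for dumping. The names ConfigLoader, ConfigDumper, and the !path tag are illustrative and not part of PyYAML.

```python
from pathlib import PurePosixPath

import yaml

class ConfigLoader(yaml.SafeLoader):
    """Private loader: tags registered here do not leak into yaml.SafeLoader."""

class ConfigDumper(yaml.SafeDumper):
    """Private dumper paired with ConfigLoader."""

def construct_path(loader, node):
    # Build a path object from the tagged scalar.
    return PurePosixPath(loader.construct_scalar(node))

def represent_path(dumper, value):
    # Emit the path as a scalar carrying the same custom tag.
    return dumper.represent_scalar("!path", str(value))

ConfigLoader.add_constructor("!path", construct_path)
ConfigDumper.add_representer(PurePosixPath, represent_path)

doc = yaml.load("log_dir: !path /var/log/app", Loader=ConfigLoader)
text = yaml.dump(doc, Dumper=ConfigDumper)
print(doc)
print(text)
```

Because both classes derive from the safe variants, the usual restrictions still apply; plain yaml.safe_load continues to reject !path.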

Performance tuning

  • Install the C-backed wheel. Confirm with:

    python
    import yaml
    print(yaml.__with_libyaml__)        # True if libyaml extension is loaded
    

    Output: True on prebuilt wheels (Linux, macOS, Windows). False means you fell back to pure-Python — ~3-5× slower.

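  • Select CSafeLoader / CSafeDumper explicitly when the C extension is available. yaml.safe_load and yaml.safe_dump always use the pure-Python classes, even when libyaml is present:

    python
    import yaml
    from yaml import CSafeLoader as Loader   # ImportError if the C extension is missing

    with open("config.yaml") as f:
        obj = yaml.load(f, Loader=Loader)    # safe_load() takes no Loader= argument
    

    This is where the libyaml speedup actually comes from; CSafeLoader keeps the same restrictions as SafeLoader.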

  • Cache parsed config. YAML parsing is moderately expensive (~10× JSON). For long-running services, parse once, hand around the dict.

  • Avoid huge anchors / aliases. Pathological YAML (e.g. "billion laughs" structures) can blow memory; cap input size for untrusted sources.
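
The first three points can be combined into one helper. This is a sketch under assumptions: load_config_cached and FastSafeLoader are hypothetical names, and the temporary file exists only for the demo.

```python
import functools
import os
import tempfile

import yaml

try:
    from yaml import CSafeLoader as FastSafeLoader   # C-backed, needs libyaml
except ImportError:
    from yaml import SafeLoader as FastSafeLoader    # pure-Python fallback

@functools.lru_cache(maxsize=None)
def load_config_cached(path: str):
    """Parse a YAML file once per path; later calls return the cached object."""
    with open(path, "rb") as f:
        return yaml.load(f, Loader=FastSafeLoader)

# Demo with a throwaway file.
with tempfile.NamedTemporaryFile("w", suffix=".yaml", delete=False) as tmp:
    tmp.write("db:\n  host: localhost\n  port: 5432\n")

first = load_config_cached(tmp.name)
second = load_config_cached(tmp.name)   # served from the cache, no re-parse
os.unlink(tmp.name)
print(first, first is second)
```

The cache hands out the same mutable dict on every call, so callers that modify the config should take a copy.deepcopy first.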

Version migration guide

  • 5.0 → 5.1: yaml.load deprecation warning added; recommends yaml.safe_load or explicit Loader=.
  • 5.1 → 5.4: FullLoader tightened; UnsafeLoader kept for legacy.
  • 5.4 → 6.0: minimum Python 3.6 (Python 2.7 dropped); bundled libyaml build cleanups; Loader= becomes a required argument of yaml.load.
  • 6.0 → 6.0.x: build and packaging fixes for newer Python and Cython releases; no API breaks.

python
import yaml

# Bare yaml.load: silent and DANGEROUS before 5.1, warns in 5.1-5.4, TypeError in 6.0+
# obj = yaml.load(open("config.yaml"))

# Any version (safe)
with open("config.yaml") as f:
    obj = yaml.safe_load(f)                    # always prefer this

Output: safe_load is the documented recommendation for any input you do not fully trust.

Security considerations

This is the single most important section of the article. PyYAML's history is a series of CVEs around the default load() behavior.

  • NEVER use yaml.load(untrusted_input). It can instantiate arbitrary Python objects via !!python/object tags. CVE-2017-18342 family.
  • Always use yaml.safe_load(...) for any input that didn't originate from your own trusted source.
  • FullLoader is still risky. It restricts the executable surface but not enough; safe_load is the recommended default.
  • UnsafeLoader is explicit-by-name to discourage accidental use. Only use it for YAML you yourself wrote.
  • Schema validation. Use pydantic, jsonschema, or strictyaml to validate structure AFTER safe_load. PyYAML alone doesn't validate against a schema.
  • Resource limits. Pathological YAML (deeply nested aliases) can DoS the parser. Cap file size; consider a sandboxed process for untrusted input.
  • Quote ambiguous values. password: yes becomes the boolean True — quote secrets.
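
For the resource-limit point, one possible hardening step is a loader that refuses aliases outright, which removes the expansion mechanism that "billion laughs" documents rely on. The sketch below assumes the pure-Python SafeLoader and overrides its compose_node hook; NoAliasSafeLoader is a hypothetical name, not a PyYAML class.

```python
import yaml
from yaml.events import AliasEvent

class NoAliasSafeLoader(yaml.SafeLoader):
    """SafeLoader variant that refuses aliases (and therefore merge keys too)."""

    def compose_node(self, parent, index):
        # The composer peeks at the next event; an AliasEvent means "*name" was used.
        if self.check_event(AliasEvent):
            event = self.peek_event()
            raise yaml.YAMLError(f"aliases are not allowed (found *{event.anchor})")
        return super().compose_node(parent, index)

plain = yaml.load("a: [1, 2]\nb: 3\n", Loader=NoAliasSafeLoader)
print(plain)

rejected = None
try:
    yaml.load("base: &b [1, 2]\ncopy: *b\n", Loader=NoAliasSafeLoader)
except yaml.YAMLError as exc:
    rejected = str(exc)
    print("rejected:", rejected)
```

This complements a size cap rather than replacing it, and it also blocks legitimate uses such as `<<: *defaults` merge keys.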

Testing & CI

python
import yaml, pytest

def test_round_trip():
    data = {"name": "alice-dev", "roles": ["admin"]}
    s = yaml.safe_dump(data, sort_keys=False)
    assert yaml.safe_load(s) == data

def test_norway_problem():
    # Country code NO parses as False unless quoted
    assert yaml.safe_load("country: NO") == {"country": False}
    assert yaml.safe_load('country: "NO"') == {"country": "NO"}

def test_unsafe_rejected():
    with pytest.raises(yaml.constructor.ConstructorError):
        yaml.safe_load("!!python/object/apply:os.system ['ls']")

Output: all assertions hold; the test makes the YAML 1.1 quirks visible and ensures safe_load actively refuses dangerous constructs.

For CI, also lint that no yaml.load( slipped in:

bash
grep -rEn '\byaml\.load\s*\(' src/ && exit 1 || true

Output: exits 1 if any file uses yaml.load() instead of safe_load().
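
A grep also matches comments and string literals, and it flags calls that already pass a safe loader. Where that noise matters, an AST-based check is one alternative. The sketch below uses only the standard library; find_risky_yaml_load is a hypothetical helper that recognizes only the literal spelling yaml.load(...).

```python
import ast

SAFE_LOADERS = {"SafeLoader", "CSafeLoader"}

def _loader_name(call):
    """Return the name passed as Loader=..., or None if the keyword is absent."""
    for kw in call.keywords:
        if kw.arg == "Loader":
            if isinstance(kw.value, ast.Attribute):
                return kw.value.attr
            if isinstance(kw.value, ast.Name):
                return kw.value.id
    return None

def find_risky_yaml_load(source, filename="<string>"):
    """Line numbers of yaml.load(...) calls that do not name a safe loader."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_yaml_load = (
            isinstance(func, ast.Attribute)
            and func.attr == "load"
            and isinstance(func.value, ast.Name)
            and func.value.id == "yaml"
        )
        if is_yaml_load and _loader_name(node) not in SAFE_LOADERS:
            hits.append(node.lineno)
    return sorted(hits)

sample = """\
import yaml
a = yaml.load(text)
b = yaml.safe_load(text)
c = yaml.load(text, Loader=yaml.SafeLoader)
d = yaml.load(text, Loader=yaml.FullLoader)
"""
hits = find_risky_yaml_load(sample)
print(hits)
```

Aliased imports (import yaml as y) and loaders passed positionally are not detected, so treat this as a complement to code review, not a guarantee.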

Ecosystem integrations

  • ruamel.yaml — round-trip with comment preservation; YAML 1.2.
  • pydantic-settings — load typed config from YAML files.
  • jsonschema — validate YAML-loaded dicts against JSON Schema.
  • Hydra — Facebook's config framework for ML experiments; YAML-backed.
  • Ansible — YAML is the playbook format; uses PyYAML internally.
  • Kubernetes Python client — manifests parse via PyYAML.

Compatibility matrix

Python | PyYAML | Notes
3.5 | 5.4 (frozen) | Final supported version.
3.6 | 6.0+ | Lowest current floor.
3.7 | 6.0+ | Stable.
3.8 | 6.0+ | Stable.
3.9 | 6.0+ | Stable.
3.10 | 6.0+ | Stable.
3.11 | 6.0+ | Stable.
3.12 | 6.0.1+ | Required for the imp/distutils removal.
3.13 | 6.0.2+ | Build fixes for the latest Python.

Production deployment

  • Pin a recent version (PyYAML>=6.0.2). The library updates infrequently — when an update lands, it's usually security-relevant.
  • Bake the C-backed wheel into the image — pure-Python fallback is a perf regression. On Alpine, install yaml-dev and rebuild from source.
  • Centralize YAML loading — wrap safe_load in a project helper that adds size limits and schema validation.
  • Validate schemas at startup so misconfiguration fails fast.
  • Don't accept user-uploaded YAML without a sandbox. If you must, cap size, use safe_load, and validate against a strict schema.
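
The centralizing and fail-fast points might look like the sketch below. Everything named here (load_checked, ConfigError, MAX_CONFIG_BYTES, the {key: type} schema) is a hypothetical project helper, and a real deployment would likely use pydantic or jsonschema for the validation step.

```python
import yaml

MAX_CONFIG_BYTES = 1_000_000  # hypothetical cap; tune for your configs

class ConfigError(ValueError):
    """Raised for oversized, malformed, or structurally invalid config."""

def load_checked(text: str, required: dict) -> dict:
    """safe_load `text`, enforcing a size cap and a {key: type} check on top-level keys."""
    if len(text.encode("utf-8")) > MAX_CONFIG_BYTES:
        raise ConfigError("config exceeds size limit")
    try:
        cfg = yaml.safe_load(text)
    except yaml.YAMLError as exc:
        raise ConfigError(f"invalid YAML: {exc}") from exc
    if not isinstance(cfg, dict):
        raise ConfigError("top level must be a mapping")
    for key, expected in required.items():
        if key not in cfg:
            raise ConfigError(f"missing key: {key}")
        if not isinstance(cfg[key], expected):
            raise ConfigError(
                f"{key}: expected {expected.__name__}, got {type(cfg[key]).__name__}"
            )
    return cfg

schema = {"name": str, "debug": bool}
cfg = load_checked("name: alice-dev\ndebug: no\n", schema)
print(cfg)

error = None
try:
    load_checked("name: alice-dev\ndebug: nope\n", schema)
except ConfigError as exc:
    error = str(exc)
    print("startup aborted:", error)
```

Calling such a helper once at startup turns a mistyped value into an immediate, readable failure instead of a surprise deep inside request handling.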

When NOT to use this

  • You need to preserve comments on edit. Use ruamel.yaml.
  • You need YAML 1.2 compliance. Use ruamel.yaml (or the rapidyaml Python bindings if parsing speed is the priority).
  • You're choosing a config format from scratch. Strongly consider TOML (stdlib tomllib) or JSON. YAML's ambiguities are not worth its conciseness in greenfield code.
  • You're accepting untrusted YAML from the network. Use strictyaml (no flow style, no implicit typing) or a sandboxed parser.

Troubleshooting common errors

Error / Symptom | Likely cause | Fix
yaml.YAMLError: while parsing a block mapping | Indentation mismatch (tabs or mixed) | Convert tabs to two-space indentation.
country: NO returns False | YAML 1.1 boolean parsing | Quote the value: country: "NO".
Comments lost on round-trip | PyYAML doesn't preserve them | Switch to ruamel.yaml.
ConstructorError: could not determine a constructor for the tag | Custom tag without registered constructor | Register with SafeLoader.add_constructor.
pip install PyYAML slow / fails | Building from source on a system without libyaml | Install a wheel: pip install --only-binary=:all: PyYAML.
yaml.__with_libyaml__ is False | Pure-Python fallback installed | Reinstall with wheel; on Alpine, apk add yaml-dev.
Numbers and strings get confused | Leading zeros (010 is octal 8), sexagesimal (1:30:00 is 5400), or quoted digits (stay strings) | YAML 1.1 quirks: quote values that must stay strings, or use strictyaml.

See also

  • Concept: JSON — related serialization format
  • Official PyYAML docs
  • YAML 1.2 specification
  • ruamel.yaml — round-trip alternative