cheat sheet

idna

Package-level reference for idna on PyPI — IDNA2008 vs UTS46, encode/decode, install, integration with requests / urllib3, alternatives.

What it is

idna is a Python implementation of RFC 5891 (IDNA2008) for converting between Unicode domain names like münchen.de and their ASCII-compatible encoding (xn--mnchen-3ya.de). It also implements the Unicode Technical Standard #46 (UTS46) for compatibility with the older IDNA2003 mappings that web browsers historically used. The library is part of the request-validation path in requests, urllib3, httpx, and other HTTP libraries — they call idna.encode() before passing a host to the resolver.

Reach for idna directly when you need to: validate or normalize a user-supplied domain name; convert between Unicode and Punycode forms; check whether a label conforms to IDNA2008 rules; or implement a protocol (SMTP, FTP) that requires IDN-safe hostnames.

Install

bash
pip install idna

Output: pip's usual progress lines, ending with "Successfully installed idna-<version>" (pure Python, zero dependencies)

bash
uv add idna

Output: dependency resolved + added to pyproject.toml

bash
poetry add idna

Output: updated lockfile + virtualenv install

There are no optional extras — idna is a single pure-Python package with no install variants.

Versioning & Python support

  • Current line is the 3.x series. Semantic versioning — minor releases are backwards-compatible.
  • The 2.x line is frozen; remaining downstream pins gradually migrate.
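  • The 3.x line is Python 3 only (Python 2 support ended with 2.x). The minimum Python 3 version has been raised during the 3.x series, so confirm the Requires-Python metadata of the release you pin.
  • The library is small and stable. Releases are infrequent and mostly update the Unicode tables to match the latest Unicode standard.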
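
To see which release and which Unicode tables are actually installed, a quick check along these lines may help. It is a sketch that assumes idna exposes `__version__` at the top level and that the generated `idna.idnadata` module carries its own `__version__` (the Unicode version of the tables); both are read defensively in case a release lacks them. The helper name `installed_idna_info` is made up for this example.

```python
import idna
import idna.idnadata


def installed_idna_info() -> dict:
    """Return the installed idna release and, if exposed, its Unicode table version."""
    return {
        # Assumption: the package exposes its release number as idna.__version__.
        "idna": getattr(idna, "__version__", "unknown"),
        # Assumption: idnadata records the Unicode version of the shipped tables.
        "unicode_tables": getattr(idna.idnadata, "__version__", "unknown"),
    }


print(installed_idna_info())
```

Logging this once at startup makes it easier to tell whether a validation difference between two environments comes from different table versions.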

Package metadata

  • Maintainer: Kim Davies
  • Project home: github.com/kjd/idna
  • Docs: pypi.org/project/idna
  • License: BSD-3-Clause
  • Governance: single-maintainer; ICANN engagement
  • First released: 2014
  • Downloads: consistently in PyPI top 10 (transitive via requests, urllib3, httpx)
  • Standards followed: RFC 5891 (IDNA2008), RFC 5892 (tables), UTS46

Optional dependencies & extras

  • None. idna has no third-party dependencies.

Alternatives

Package | Trade-off
encodings.idna (stdlib) | Implements only the older IDNA2003 RFC. Use only as a last resort: it accepts strings IDNA2008 would reject.
libidn2 (via ctypes / bindings) | Reference C implementation. Faster on large batches; native dependency.
tld | Higher-level: extracts the effective TLD ("public suffix"). Different layer; pair with idna.
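
The difference between the stdlib codec and idna is easiest to see on a name containing ß. The sketch below encodes the same name both ways; the variable names are illustrative only.

```python
import idna

name = "straße.de"

# Stdlib codec (IDNA2003): nameprep folds ß to "ss" before encoding.
legacy = name.encode("idna").decode("ascii")

# idna package (IDNA2008): ß is a valid code point and is kept.
modern = idna.encode(name).decode("ascii")

print(legacy)   # strasse.de
print(modern)   # xn--strae-oqa.de
```

The two results are different hostnames, so mixing the two encoders within one system can send the same user input to different servers.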

Common gotchas

  1. idna.encode() returns bytes, not str. Many callers want .decode("ascii") afterward to get a regular string like "xn--mnchen-3ya.de".
  2. IDNA2008 is strict. Labels containing code points the tables disallow (uppercase letters in a non-ASCII label, symbols, emoji) or that break the contextual, bidi, or hyphen rules raise IDNAError. Mixed-script labels such as Latin + Cyrillic are not rejected as such. The stdlib idna codec is more permissive, which can hide bugs.
  3. uts46=True enables the browser-style mapping: it lowercases, maps compatibility characters (for example full-width forms) to their standard equivalents, and normalizes to NFC before validation. Use it when parsing user input such as address-bar text.
  4. Empty labels (consecutive dots) and labels longer than 63 octets raise IDNAError. ASCII-only valid hosts pass through unchanged.
  5. A trailing dot (example.com.) is accepted and preserved: idna.encode("example.com.") returns b"example.com.". Strip the dot yourself if the consumer does not expect a fully qualified name.
  6. alabel() and ulabel() are per-label functions; for full names, use encode() and decode() which split on . for you.
  7. Internationalized TLDs. .рф and .中国 work through idna like any other label, with nothing special to enable. ASCII new gTLDs such as .tokyo need no encoding at all.
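
Several of the gotchas above can be checked in a few lines. This is a minimal sketch; `is_rejected` is a hypothetical helper written for this example, not part of idna.

```python
import idna

# encode() returns bytes; decode it to get a regular str.
encoded = idna.encode("münchen.de")
host = encoded.decode("ascii")
print(type(encoded).__name__, host)       # bytes xn--mnchen-3ya.de

# alabel() and ulabel() convert one label at a time, not a dotted name.
print(idna.alabel("münchen"))             # b'xn--mnchen-3ya'
print(idna.ulabel("xn--mnchen-3ya"))      # münchen


def is_rejected(name: str) -> bool:
    """Return True if idna.encode() raises IDNAError for name."""
    try:
        idna.encode(name)
    except idna.IDNAError:
        return True
    return False


print(is_rejected("a..b.com"))            # True: empty label
print(is_rejected("a" * 64 + ".com"))     # True: label longer than 63 octets
print(is_rejected("example.com."))        # False: trailing dot is accepted
```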

Real-world recipes

The recipes cover the operations you'll actually do: encode, decode, validate, the UTS46-vs-IDNA2008 split, and a round trip through a non-ASCII TLD.

Recipe 1 — Encode an internationalised domain to Punycode.

python
import idna
ascii_name = idna.encode("münchen.de").decode("ascii")
print(ascii_name)

Output:

text
xn--mnchen-3ya.de

Pass ascii_name to socket.getaddrinfo() or any other ASCII-only API.

Recipe 2 — Decode Punycode back to Unicode.

python
import idna
print(idna.decode("xn--mnchen-3ya.de"))
print(idna.decode("xn--80akhbyknj4f.xn--p1ai"))

Output:

text
münchen.de
испытание.рф

decode() accepts either bytes or str input.

Recipe 3 — Validate an arbitrary domain.

python
import idna

def is_valid_domain(name: str) -> bool:
    try:
        idna.encode(name)
        return True
    except idna.IDNAError:
        return False

print(is_valid_domain("example.com"))         # True
print(is_valid_domain("münchen.de"))          # True
print(is_valid_domain("ab--cd.com"))          # False: hyphens in 3rd and 4th position
print(is_valid_domain("-leading.com"))        # False: label starts with a hyphen

Output:

text
True
True
False
False

idna.IDNAError covers every rejection reason; inspect str(exc) for the specific cause.
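
When the category of failure matters more than the message text, the more specific exception classes can be caught before the base class. The sketch below assumes the subclasses `InvalidCodepoint`, `InvalidCodepointContext`, and `IDNABidiError` are importable from the top-level package; `rejection_reason` is a hypothetical helper.

```python
import idna


def rejection_reason(name: str) -> str:
    """Return a short category for why idna rejects name, or "ok"."""
    try:
        idna.encode(name)
    except idna.InvalidCodepoint:
        return "invalid codepoint"
    except idna.InvalidCodepointContext:
        return "invalid context"
    except idna.IDNABidiError:
        return "bidi rule"
    except idna.IDNAError as exc:
        # Everything else: empty label, hyphen placement, length, and so on.
        return f"other: {exc}"
    return "ok"


print(rejection_reason("example.com"))      # ok
print(rejection_reason("Königsgäßchen"))    # invalid codepoint
print(rejection_reason("ab--cd.com"))
```

The subclasses must come before `idna.IDNAError` in the `except` chain, since the base class would otherwise match first.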

Recipe 4 — UTS46 vs IDNA2008 — browser-permissive vs spec-strict.

python
import idna

# Strict IDNA2008: uppercase in a non-ASCII label is a disallowed code point
try:
    idna.encode("Königsgäßchen")
except idna.IDNAError as e:
    print("strict:", e)

# UTS46: lowercases and applies compatibility mappings (browser behavior)
print("uts46:", idna.encode("ExamPle.Com", uts46=True).decode())

# UTS46 transitional: additionally maps ß to ss (IDNA2003-compatible)
print("uts46 transitional:", idna.encode("straße.de", uts46=True, transitional=True).decode())

Output:

text
strict: Codepoint U+004B at position 1 of 'Königsgäßchen' not allowed
uts46: example.com
uts46 transitional: strasse.de

With uts46=True, transitional=False (the default) is the modern behavior. transitional=True maps ß → ss, the historical browser behavior, and is rarely what you want today.

Recipe 5 — Round-trip with non-ASCII TLD.

python
import idna
original = "тест.испытание"     # Cyrillic example.test
encoded = idna.encode(original).decode("ascii")
decoded = idna.decode(encoded)
print(encoded)
print(decoded)
print(decoded == original)

Output:

text
xn--e1aybc.xn--80akhbyknj4f
тест.испытание
True

Round-trip is lossless for valid IDNA2008 input.

Performance tuning

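  • idna is fast enough for almost any workload: an encode() call is on the order of tens of microseconds, though the figure depends on hardware and input, and it is not a hot path in a normal HTTP stack.
  • Cache results when batch-processing large domain lists: wrap idna.encode in functools.lru_cache(maxsize=10_000) if you re-encode the same names repeatedly.
  • The Unicode tables ship pre-generated inside the package as Python modules. Nothing is generated at install time and nothing is downloaded at runtime.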
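
The caching suggestion can be sketched as a small wrapper. `encode_cached` is a hypothetical name; returning None for rejected names is a design choice for this example so that failures are cached too.

```python
from functools import lru_cache
from typing import Optional

import idna


@lru_cache(maxsize=10_000)
def encode_cached(name: str) -> Optional[str]:
    """Return the ASCII form of name, or None if idna rejects it."""
    try:
        return idna.encode(name).decode("ascii")
    except idna.IDNAError:
        return None


hosts = ["münchen.de", "example.com", "münchen.de"]
print([encode_cached(h) for h in hosts])
print(encode_cached.cache_info())
```

Caching the str result rather than raising keeps repeated bad input cheap as well, at the cost of losing the error message; keep the exception if callers need it.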

Version migration guide

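  • 2.x → 3.0: Python 2 support dropped, so 3.x is Python 3 only (the minimum Python 3 version was raised again in later 3.x releases); some helper functions removed in favor of encode/decode.
  • 3.0 → 3.2: uts46=True default for new releases of httpx matched here.
  • 3.4 → 3.6: Unicode tables refreshed to match Unicode 15.x.
  • 3.6 → 3.7: fixes CVE-2024-3651, where crafted input could make encode() take an excessively long time. Unicode 16 tables arrived in a later 3.x release.
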
python
# Pre-3.x removed helpers
from idna.codec import ulabel, alabel   # removed
# 3.x — use top-level functions
from idna import alabel, ulabel

Output: same semantics; cleaner imports.

Security considerations

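  • Homograph attacks are the main security risk that IDNs introduce. IDNA2008 rejects disallowed code points but does not reject mixed-script labels in general, so a Cyrillic а can still stand in for a Latin a. Add a separate script-mixing or confusables check (for example per UTS #39 or registry policy) where spoofing matters.
  • uts46=True, transitional=True maps ß → ss, so two visually distinct names collapse into one ASCII form. Avoid it in security-sensitive contexts.
  • Always normalize before comparing. Compare encoded forms (idna.encode(a) == idna.encode(b)), never raw Unicode: é (U+00E9) ≠ e + combining acute (U+0065 U+0301). Strict encode() rejects input that is not in NFC, so normalize with unicodedata.normalize("NFC", ...) or pass uts46=True first.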
  • Allowlist your TLDs. IDNA validates the format of labels; it doesn't tell you whether .zz is a real TLD. Pair with tld or the IANA TLD list.
  • Email IDN. SMTP and Email IDN (SMTPUTF8) have different rules — for email use idna for the domain part only, after @.
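
A script-mixing check can be approximated with the standard library alone. The sketch below is a rough heuristic, not an implementation of UTS #39: it treats the first word of each character's Unicode name as its script, and both function names are made up for this example.

```python
import unicodedata


def scripts_in_label(label: str) -> set:
    """Approximate the scripts used in a label from Unicode character names.

    Heuristic only: the first word of a character's name ("LATIN",
    "CYRILLIC", "GREEK", ...) stands in for its script. Digits and hyphens
    are skipped because they are shared across scripts.
    """
    scripts = set()
    for ch in label:
        if ch.isdigit() or ch == "-":
            continue
        name = unicodedata.name(ch, "")
        if name:
            scripts.add(name.split()[0])
    return scripts


def looks_mixed_script(label: str) -> bool:
    """Return True if the label appears to combine more than one script."""
    return len(scripts_in_label(label)) > 1


print(looks_mixed_script("paypal"))        # False
print(looks_mixed_script("p\u0430ypal"))   # True: U+0430 is Cyrillic
```

Legitimate names can mix scripts (Japanese labels commonly combine kana and kanji), so a check like this is better used to flag names for review or to feed an allowlist than to reject outright.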

Testing & CI

python
import idna, pytest

@pytest.mark.parametrize("name,expected", [
    ("example.com", "example.com"),
    ("münchen.de", "xn--mnchen-3ya.de"),
    ("испытание.рф", "xn--80akhbyknj4f.xn--p1ai"),
])
def test_encode(name, expected):
    assert idna.encode(name).decode("ascii") == expected

@pytest.mark.parametrize("bad", ["ab--cd.com", "-leading.com", "Königsgäßchen.de", "", ".com"])
def test_rejects(bad):
    with pytest.raises(idna.IDNAError):
        idna.encode(bad)

Output: parametrised test passes for valid names and asserts that invalid forms raise IDNAError.

Ecosystem integrations

  • requests: idna encodes the host portion of URLs whose host is not plain ASCII.
  • urllib3: same.
  • httpx: same.
  • smtplib / aiosmtplib: the stdlib smtplib does not import idna, so encode the domain part yourself before passing an address in; check the aiosmtplib documentation for its handling.
  • cryptography: used for IDN-aware Subject Alternative Name (SAN) checks in X.509 verification.
  • dnspython: used to encode names before DNS queries.
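
For clients that do not encode hosts themselves, the URL can be converted up front. This is a minimal sketch using urllib.parse from the standard library; `ascii_url` is a hypothetical helper, and IP-literal hosts (for example [::1]) are not handled.

```python
from urllib.parse import urlsplit, urlunsplit

import idna


def ascii_url(url: str) -> str:
    """Return url with its host converted to the ASCII (A-label) form."""
    parts = urlsplit(url)
    if parts.hostname is None:
        return url
    host = idna.encode(parts.hostname, uts46=True).decode("ascii")
    # Keep any "user:password@" prefix and the port exactly as given.
    userinfo, at, _ = parts.netloc.rpartition("@")
    netloc = f"{userinfo}{at}{host}"
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


print(ascii_url("https://münchen.de/path?q=1"))
# https://xn--mnchen-3ya.de/path?q=1
```

Only the host is touched; path and query still need ordinary percent-encoding if they contain non-ASCII characters.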

Compatibility matrix

Python | idna | Notes
3.5 | 2.x (frozen) | Final supported line for 3.5.
3.6 | 3.x | Lowest 3.x floor.
3.7 | 3.x | Stable.
3.8 | 3.x | Stable.
3.9 | 3.x | Stable.
3.10 | 3.x | Stable.
3.11 | 3.x | Stable.
3.12 | 3.x | Stable.
3.13 | 3.x | Wheel available immediately.

idna is pure Python — wheel availability is universal.

Production deployment

  • Pin a minimum version (idna>=3.7) to get recent Unicode tables and the fix for CVE-2024-3651, a denial-of-service issue in encode().
  • Validate user-supplied domain input at the boundary with idna.encode(...) inside a try/except idna.IDNAError. Failing fast is preferable to passing invalid hosts down the stack.
  • Log the encoded form in audit logs — Unicode in logs is a footgun.
  • uts46=True for user-facing input (browser-style permissiveness); uts46=False for protocol-internal validation (be strict with peer software).
  • Refresh idna annually to track Unicode standard updates.
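
Boundary validation for user-facing input might look like the sketch below. `normalize_host` is a hypothetical helper; it relies on uts46=True to lowercase and NFC-normalize before the IDNA2008 checks run, and returns None instead of raising.

```python
from typing import Optional

import idna


def normalize_host(raw: str) -> Optional[str]:
    """Return the canonical ASCII form of a user-supplied host, or None if invalid."""
    try:
        # uts46=True lowercases, applies UTS46 mappings and NFC-normalizes first.
        return idna.encode(raw.strip(), uts46=True).decode("ascii")
    except idna.IDNAError:
        return None


composed = "m\u00fcnchen.de"      # ü as a single code point
decomposed = "mu\u0308nchen.de"   # u followed by a combining diaeresis
print(normalize_host(composed) == normalize_host(decomposed))   # True
print(normalize_host("MÜNCHEN.DE"))                             # xn--mnchen-3ya.de
print(normalize_host("bad..host"))                              # None
```

The returned ASCII string is also the form to store, compare, and write to audit logs.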

When NOT to use this

  • You only handle ASCII hostnames. No real need for idna if your traffic is example.com-style. (You'll still get it transitively.)
  • You need actual DNS resolution. idna is encoding only — use dnspython or socket.getaddrinfo() for resolution.
  • You need the public suffix list. idna doesn't know .co.uk is a registry suffix — use tld or publicsuffix2.
  • You're decoding email local parts. Use email.headerregistry or email-validator; IDN rules don't apply to local parts.
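
For email addresses, only the part after the last @ goes through idna. A minimal sketch, with `ascii_email_domain` as a hypothetical helper that leaves the local part untouched and does no further address validation:

```python
import idna


def ascii_email_domain(address: str) -> str:
    """Encode only the domain part of an address; leave the local part as-is."""
    local, at, domain = address.rpartition("@")
    if not at or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return f"{local}@{idna.encode(domain).decode('ascii')}"


print(ascii_email_domain("user@münchen.de"))   # user@xn--mnchen-3ya.de
```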

Troubleshooting common errors

Error / Symptom | Likely cause | Fix
InvalidCodepoint: Codepoint U+XXXX at position N of '...' not allowed | Disallowed code point (uppercase in a non-ASCII label, symbols, emoji) | Use uts46=True for browser-permissive input; fail otherwise.
IDNAError: Empty Label | Consecutive dots or a leading dot | Strip and validate input.
IDNAError: Label too long | A label > 63 octets after encoding | Shorten the label.
TypeError after idna.encode | Mixing the bytes result with str | Call .decode("ascii") on the result.
requests raises InvalidURL ("URL has an invalid label.") | Host fails IDNA encoding | Validate the URL before passing it to requests.
Old stdlib encodings.idna accepts what idna rejects | Stdlib uses IDNA2003 | Always prefer the idna library; treat the stdlib codec as legacy.

See also