cheat sheet
idna
Package-level reference for idna on PyPI — IDNA2008 vs UTS46, encode/decode, install, integration with requests / urllib3, alternatives.
What it is
idna is a Python implementation of IDNA2008 (RFC 5891) for converting between Unicode domain names like münchen.de and their ASCII-compatible encoding (xn--mnchen-3ya.de). It also implements Unicode Technical Standard #46 (UTS46), which maps input written for the older IDNA2003 conventions that web browsers historically used (uppercase letters, for example) onto IDNA2008 form. HTTP libraries such as requests, urllib3 and httpx call idna.encode() on non-ASCII hostnames before passing the host to the resolver.
Reach for idna directly when you need to: validate or normalize a user-supplied domain name; convert between Unicode and Punycode forms; check whether a label conforms to IDNA2008 rules; or implement a protocol (SMTP, FTP) that requires IDN-safe hostnames.
Install
pip install idna
Output: pip's usual install log ending in "Successfully installed idna-<version>"; pure-Python, zero dependencies
uv add idna
Output: dependency resolved + added to pyproject.toml
poetry add idna
Output: updated lockfile + virtualenv install
There are no optional extras — idna is a single pure-Python package with no install variants.
Versioning & Python support
- The current line is the 3.x series. Releases are infrequent and generally backwards-compatible; most of them refresh the Unicode data tables to match the latest Unicode standard.
- The 2.x line is frozen (it was the last line to support Python 2); remaining downstream pins are gradually migrating.
- Recent 3.x releases support Python 3.6+; Python 3.5 support was dropped in 3.8.
Package metadata
- Maintainer: Kim Davies
- Project home: github.com/kjd/idna
- Docs: pypi.org/project/idna
- License: BSD-3-Clause
- Governance: single-maintainer; ICANN engagement
- First released: 2014
- Downloads: consistently in the PyPI top 10 (mostly transitive, via requests, httpx and other HTTP stacks)
- Standards followed: RFC 5891 (IDNA2008), RFC 5892 (tables), UTS46
Optional dependencies & extras
- None. idna has no third-party dependencies.
Alternatives
| Package | Trade-off |
|---|---|
| encodings.idna (stdlib) | Implements only the older IDNA2003 rules (RFC 3490). Use only as a last resort: it accepts, or maps differently, strings that IDNA2008 would reject or encode another way. |
| libidn2 (via ctypes / bindings) | GNU C implementation of IDNA2008 and UTS46. Can be faster on large batches; adds a native dependency. |
| tld | Higher-level: extracts the effective TLD ("public suffix"). Different layer; pair with idna. |
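A quick way to see the IDNA2003/IDNA2008 difference is to encode the same name with the stdlib codec and with idna. This is a minimal sketch using only the built-in "idna" codec and idna.encode(); the German sharp s (ß) is one of the characters the two standards treat differently.

```python
import idna

name = "straße.de"

# Stdlib codec: IDNA2003 nameprep folds "ß" to "ss" before encoding.
legacy = name.encode("idna")

# idna package: IDNA2008 treats "ß" as a valid character and Punycode-encodes it.
modern = idna.encode(name)

print(legacy)  # b'strasse.de'
print(modern)  # b'xn--strae-oqa.de'
```

The two results are different DNS names, which is why mixing the stdlib codec and idna in one system can send traffic to the wrong host.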
Common gotchas
- idna.encode() returns bytes, not str. Most callers want .decode("ascii") afterwards to get a regular string like "xn--mnchen-3ya.de".
- IDNA2008 is strict about code points. Labels containing characters its tables disallow (for example uppercase letters in a non-ASCII label such as "München"), misplaced hyphens, or invalid bidirectional text raise IDNAError. It does not, on its own, reject mixed-script labels (Latin + Cyrillic). The stdlib idna codec is more permissive, which can hide bugs.
- uts46=True enables browser-style mapping: it lowercases, applies NFC normalization and maps deprecated characters. Use it when parsing user input from address bars.
- Empty labels (consecutive dots) and labels longer than 63 octets raise IDNAError. Valid ASCII-only hosts pass through unchanged, including their letter case.
- A trailing dot (example.com.) is preserved: idna.encode("example.com.") returns b"example.com.".
- alabel() and ulabel() are per-label functions; for full names, use encode() and decode(), which split on . for you.
- Internationalized TLDs such as .рф and .中国 (and ASCII gTLDs like .tokyo) all work through idna; there is nothing special to enable.
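Several of these gotchas are easy to check interactively. A short sketch, assuming idna 3.x; the variable names are illustrative only:

```python
import idna

# encode() returns bytes; decode them to get a str for logging or string APIs.
raw = idna.encode("münchen.de")
host = raw.decode("ascii")

# A trailing dot (fully qualified name) is kept in the output.
fqdn = idna.encode("example.com.")

# Per-label helpers: alabel() returns bytes, ulabel() returns str.
a_label = idna.alabel("münchen")
u_label = idna.ulabel("xn--mnchen-3ya")

# Empty labels and labels over 63 octets are rejected with IDNAError.
for bad in ("a..b", "a" * 64 + ".com"):
    try:
        idna.encode(bad)
    except idna.IDNAError as exc:
        print(f"rejected {bad[:12]!r}: {exc}")
```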
Real-world recipes
The recipes cover the four operations you'll actually do: encode, decode, validate, and the UTS46-vs-IDNA2008 split.
Recipe 1 — Encode an internationalised domain to Punycode.
import idna
ascii_name = idna.encode("münchen.de").decode("ascii")
print(ascii_name)
Output:
xn--mnchen-3ya.de
Pass ascii_name to socket.getaddrinfo() or any other ASCII-only API.
Recipe 2 — Decode Punycode back to Unicode.
import idna
print(idna.decode("xn--mnchen-3ya.de"))
print(idna.decode("xn--80akhbyknj4f.xn--p1ai"))
Output:
münchen.de
испытание.рф
decode() accepts either bytes or str input.
Recipe 3 — Validate an arbitrary domain.
import idna
def is_valid_domain(name: str) -> bool:
    try:
        idna.encode(name)
        return True
    except idna.IDNAError:
        return False
print(is_valid_domain("example.com"))   # True
print(is_valid_domain("münchen.de"))    # True
print(is_valid_domain("ab--cd.com"))    # False: IDNA2008 reserves "--" in positions 3 and 4
print(is_valid_domain("-example.com"))  # False: a label must not start with a hyphen
Output:
True
True
False
False
idna.IDNAError covers every rejection reason; inspect str(exc) for the specific cause.
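When you need the reason rather than a boolean, return the exception text. validation_error() below is a hypothetical helper, not part of idna; the exact wording of the messages can change between releases, so treat them as diagnostics rather than something to match on.

```python
from typing import Optional

import idna


def validation_error(name: str) -> Optional[str]:
    """Return None if `name` passes strict IDNA2008 encoding, else the error text."""
    try:
        idna.encode(name)
    except idna.IDNAError as exc:
        return str(exc)
    return None


for candidate in ("münchen.de", "ab--cd.com", "-example.com", "München.de"):
    print(candidate, "->", validation_error(candidate) or "ok")
```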
Recipe 4 — UTS46 vs IDNA2008 — browser-permissive vs spec-strict.
import idna
# Strict IDNA2008: uppercase is disallowed in a non-ASCII label
try:
    idna.encode("München.de")
except idna.IDNAError as e:
    print("strict:", e)
# UTS46: lowercases and maps deprecated chars (browser behavior)
print("uts46:", idna.encode("München.de", uts46=True).decode())
# UTS46 transitional: also maps "deviation" characters such as ß (eszett) to ss
print("uts46 transitional:", idna.encode("straße.de", uts46=True, transitional=True).decode())
Output:
strict: Codepoint U+004D at position 1 of 'München' not allowed
uts46: xn--mnchen-3ya.de
uts46 transitional: strasse.de
With uts46=True, keep transitional=False (the default): non-transitional processing is what current browsers use. transitional=True maps ß to ss, the historical IDNA2003-era behavior, and is rarely what you want today.
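For display, one option is to map user input with UTS46 and then decode the A-labels back to Unicode, which yields a canonical lowercase, NFC spelling. display_form() is a hypothetical helper sketched under that assumption:

```python
import idna


def display_form(user_input: str) -> str:
    """Canonical Unicode spelling of a user-typed domain (UTS46 map, encode, then decode)."""
    ascii_bytes = idna.encode(user_input, uts46=True)
    return idna.decode(ascii_bytes)


print(display_form("MÜNCHEN.de"))
print(display_form("xn--mnchen-3ya.de"))
```

Both calls print the same Unicode name, so the helper can also serve as a normalization step before storing user input.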
Recipe 5 — Round-trip with non-ASCII TLD.
import idna
original = "тест.испытание"  # Russian: "тест" and "испытание" both mean "test"
encoded = idna.encode(original).decode("ascii")
decoded = idna.decode(encoded)
print(encoded)
print(decoded)
print(decoded == original)
Output:
xn--e1aybc.xn--80akhbyknj4f
тест.испытание
True
Round-trip is lossless for valid IDNA2008 input.
Performance tuning
- idna is fast enough for normal use: roughly 10-50 µs per encode() call (machine-dependent), which is not a hot path in any normal HTTP stack.
- Cache results when batch-processing large domain lists: wrap idna.encode in functools.lru_cache(maxsize=10_000) if you re-encode the same names repeatedly.
- The Unicode tables ship as pre-generated Python modules inside the package (idnadata.py, uts46data.py), so there is no build step and no runtime download.
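A minimal caching wrapper, assuming uts46=True is the mode you want and that identical input strings are the common case; cached_encode() is a hypothetical name:

```python
import functools

import idna


@functools.lru_cache(maxsize=10_000)
def cached_encode(name: str) -> bytes:
    # Only successful results are cached; an IDNAError propagates and is recomputed next time.
    return idna.encode(name, uts46=True)


for _ in range(3):
    cached_encode("münchen.de")

print(cached_encode.cache_info())
```

Because lru_cache stores return values only, names that fail validation are re-checked on every call; cache a sentinel yourself if repeated bad input is common.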
Version migration guide
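- 2.x → 3.0: Python 2 support dropped (pin idna<3 if you still need it). The public functions encode, decode, alabel and ulabel keep their names.
- 3.7: fixes CVE-2024-3651, a denial of service through quadratic-time processing of crafted input in idna.encode(). Treat it as the floor for untrusted input.
- Later 3.x releases refresh the Unicode tables; check the project changelog for the Unicode version each release ships.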
# alabel/ulabel are exported at the package top level; import them from there
from idna import alabel, ulabel
Output: (none; the same import works on 2.x and 3.x)
Security considerations
- Homograph attacks are the main security risk IDNs introduce. IDNA2008 does not reject mixed-script labels (e.g. a Cyrillic а masquerading as a Latin a); detecting confusables needs extra checks, such as those in Unicode TS #39, or registry policy.
- uts46=True, transitional=True maps ß → ss. Avoid it in security-sensitive contexts: the same Unicode name can resolve to a different host than it does under non-transitional processing.
- Always normalize before comparing. Compare encoded forms (idna.encode(a, uts46=True) == idna.encode(b, uts46=True)), never raw Unicode: é (U+00E9) ≠ e + combining acute (U+0065 U+0301). UTS46 mode applies NFC; strict mode raises IDNAError on non-NFC input instead.
- Allowlist your TLDs. IDNA validates the format of labels; it doesn't tell you whether .zz is a real TLD. Pair it with tld or the IANA TLD list.
- Email IDN. SMTP with internationalized addresses (SMTPUTF8) has different rules; for email, use idna for the domain part only, after the @.
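The comparison and homograph points can be sketched as follows. same_host() and letter_scripts() are hypothetical helpers; letter_scripts() is a crude heuristic based on Unicode character names, not an implementation of UTS #39 confusable detection.

```python
import unicodedata

import idna


def same_host(a: str, b: str) -> bool:
    """Compare two host names by their UTS46-mapped (lowercased, NFC) ASCII form."""
    return idna.encode(a, uts46=True) == idna.encode(b, uts46=True)


def letter_scripts(label: str) -> set:
    """Crude script guess: first word of each letter's Unicode name (LATIN, CYRILLIC, ...)."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in label if ch.isalpha()}


spoof = "p\u0430ypal.com"  # U+0430 CYRILLIC SMALL LETTER A in place of Latin "a"

print(same_host("Example.COM", "example.com"))
print(same_host("caf\u00e9.fr", "cafe\u0301.fr"))
print(idna.encode(spoof))  # accepted: IDNA2008 does not forbid mixed scripts
print(letter_scripts(spoof.split(".")[0]))
```

A label whose letters span more than one script is worth flagging for review, but legitimate names can mix scripts too, so treat the heuristic as a signal rather than a verdict.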
Testing & CI
import idna
import pytest
@pytest.mark.parametrize("name,expected", [
    ("example.com", "example.com"),
    ("münchen.de", "xn--mnchen-3ya.de"),
    ("испытание.рф", "xn--80akhbyknj4f.xn--p1ai"),
])
def test_encode(name, expected):
    assert idna.encode(name).decode("ascii") == expected
@pytest.mark.parametrize("bad", ["ab--cd.com", "München.de", "", ".com"])
def test_rejects(bad):
    with pytest.raises(idna.IDNAError):
        idna.encode(bad)
Output: parametrised test passes for valid names and asserts that invalid forms raise IDNAError.
Ecosystem integrations
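- requests: encodes non-ASCII hosts with idna.encode(host, uts46=True) and raises InvalidURL if that fails.
- urllib3: uses idna, when it is installed, to encode non-ASCII hosts.
- httpx: same.
- smtplib (stdlib): does not use idna; encode the domain part of addresses yourself if the server lacks SMTPUTF8.
- cryptography: older releases used idna for IDN names in X.509; current releases do not depend on it.
- dnspython: can use idna (optional extra) for IDNA2008 name encoding.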
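HTTP clients encode only the host part of a URL. The sketch below shows that idea with urllib.parse; to_ascii_url() is a hypothetical helper, not the code requests or httpx actually run, and it assumes a DNS host name (IPv6 literals are not handled, and the path and query are left as-is).

```python
from urllib.parse import urlsplit, urlunsplit

import idna


def to_ascii_url(url: str) -> str:
    """Rewrite the host of `url` to its A-label form; everything else is untouched."""
    parts = urlsplit(url)
    if parts.hostname is None:
        return url
    host = idna.encode(parts.hostname, uts46=True).decode("ascii")
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    userinfo = parts.netloc.rpartition("@")[0]  # "" when there is no "user@" prefix
    if userinfo:
        netloc = f"{userinfo}@{netloc}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(to_ascii_url("https://münchen.de:8443/karte?q=1"))
```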
Compatibility matrix
| Python | idna | Notes |
|---|---|---|
| 3.5 | 2.x, or 3.x up to 3.7 | idna 3.8 dropped Python 3.5. |
| 3.6 | 3.x | Minimum for current 3.x releases. |
| 3.7 | 3.x | Stable. |
| 3.8 | 3.x | Stable. |
| 3.9 | 3.x | Stable. |
| 3.10 | 3.x | Stable. |
| 3.11 | 3.x | Stable. |
| 3.12 | 3.x | Stable. |
| 3.13 | 3.x | Wheel available immediately. |
idna is pure Python — wheel availability is universal.
Production deployment
- Pin a minimum version (idna>=3.7) to get the CVE-2024-3651 fix and recent Unicode tables.
- Validate user-supplied domain input at the boundary with idna.encode(...) inside a try/except idna.IDNAError. Failing fast is preferable to passing invalid hosts down the stack.
- Log the encoded (A-label) form in audit logs; raw Unicode in logs invites homograph confusion and encoding bugs.
- Use uts46=True for user-facing input (browser-style permissiveness) and uts46=False for protocol-internal validation (be strict with peer software).
- Upgrade idna regularly (for example annually) to track Unicode standard updates.
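If you want a runtime guard in addition to the pin, a sketch using importlib.metadata could look like this. idna_at_least() is hypothetical, and the (3, 7) floor is an assumption based on the CVE-2024-3651 fix; it warns instead of failing so a deployment does not break on import.

```python
import warnings
from importlib.metadata import version


def idna_at_least(minimum=(3, 7)) -> bool:
    """True if the installed idna release is >= `minimum`, compared as (major, minor)."""
    major, minor = version("idna").split(".")[:2]  # "3.10" -> ("3", "10")
    return (int(major), int(minor)) >= minimum


if not idna_at_least():
    warnings.warn("idna is older than 3.7; upgrade before handling untrusted input")
```

Comparing integer tuples rather than version strings avoids the classic "3.10" < "3.7" string-ordering bug.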
When NOT to use this
- You only handle ASCII hostnames. There's no real need for idna if your traffic is example.com-style. (You'll still get it transitively.)
- You need actual DNS resolution. idna only encodes; use dnspython or socket.getaddrinfo() for resolution.
- You need the public suffix list. idna doesn't know that .co.uk is a registry suffix; use tld or publicsuffix2.
- You're decoding email local parts. Use email.headerregistry or email-validator; IDN rules don't apply to local parts.
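For email, only the domain after the last @ goes through idna. encode_email_domain() is a hypothetical helper; it assumes a bare address (no display name) and leaves the local part untouched, so a non-ASCII local part still needs an SMTPUTF8-capable server.

```python
import idna


def encode_email_domain(address: str) -> str:
    """A-label the domain after the last '@'; the local part is left as-is."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local:
        raise ValueError(f"not an address: {address!r}")
    return f"{local}@{idna.encode(domain, uts46=True).decode('ascii')}"


print(encode_email_domain("jürgen@münchen.de"))
```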
Troubleshooting common errors
| Error / Symptom | Likely cause | Fix |
|---|---|---|
| IDNAError: Codepoint U+XXXX at position N of '<label>' not allowed | Disallowed code point (e.g. uppercase in a non-ASCII label) | Use uts46=True for browser-permissive input; reject otherwise. |
| IDNAError: Empty Label | Consecutive dots or a leading dot | Strip and validate input. |
| IDNAError: Label too long | A label longer than 63 octets after encoding | Shorten the label. |
| TypeError when combining the result with a str | idna.encode() returns bytes | Call .decode("ascii") on the result. |
| requests raises InvalidURL ("URL has an invalid label.") | The host fails IDNA encoding even with UTS46 mapping | Validate the host before building the URL. |
| Stdlib encodings.idna accepts what idna rejects | The stdlib codec implements IDNA2003 | Prefer the idna library; treat the stdlib codec as legacy. |
See also
- Concept: DNS — domain name fundamentals
- Concept: HTTP — protocol context
- Packages: pip-requests — primary consumer
- Official idna repo