Work with filesystem paths using Python's built-in pathlib module. Covers Path creation, navigation, reading/writing files, glob patterns, and stat.
pathlib — Object-Oriented File Paths
What it is
pathlib is part of the Python standard library (no install needed). It represents filesystem paths as Path objects instead of plain strings, giving you methods for reading, writing, navigating, and querying files — all in a cross-platform way that handles Windows backslashes automatically.
It replaces os.path, os.getcwd(), open() boilerplate, and glob.glob() for most filesystem tasks.
Quick example
from pathlib import Path
p = Path("documents/readme.txt")
print(p.name) # filename with extension
print(p.stem) # filename without extension
print(p.suffix) # extension with dot
print(p.parent) # containing directory
print(p.parts) # tuple of path components
Output:
readme.txt
readme
.txt
documents
('documents', 'readme.txt')
When / why to use it
- Any time you manipulate file paths: pathlib is cleaner and more readable than os.path.join().
- Reading and writing files without managing file handles.
- Recursive directory searches with rglob("*.py").
- Cross-platform code: Path uses the right separator on every OS.
Prefer Path over str for all path values in new code. If a library requires a string, wrap the path in str(p), or use p.as_posix() for forward-slash strings.
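As a minimal sketch of handing a path to string-only code: `legacy_api` below is a hypothetical stand-in for an older library, not a real function. The pure path classes are used so the result is the same on every OS.

```python
import os
from pathlib import PurePosixPath, PureWindowsPath


def legacy_api(path: str) -> str:
    """Hypothetical string-only function standing in for an older library."""
    if not isinstance(path, str):
        raise TypeError("expected str")
    return path.upper()


p = PurePosixPath("documents/readme.txt")
as_str = str(p)        # native string form of the path
as_fs = os.fspath(p)   # same string, obtained through the os.PathLike protocol

# as_posix() gives forward slashes even for a Windows-flavoured path
win = PureWindowsPath(r"documents\readme.txt")
forward = win.as_posix()

print(legacy_api(as_str))
print(as_fs == as_str, forward)
```

Converting at the boundary keeps the rest of the code working with Path objects.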
Common pitfalls
The / operator builds paths, it does not divide: Path("/home") / "user" / "file.txt" is path concatenation. If you pass an absolute path as the right operand, it replaces everything to the left: Path("/home") / "/etc/passwd" → Path("/etc/passwd").
Path.open() vs open(): both work, but path.read_text() / path.write_text() are shorter for simple file reads and writes and accept encoding arguments directly.
Special paths
Path.home() returns the current user's home directory; Path.cwd() returns the process working directory. In scripts, Path(__file__).parent gives the directory of the script itself — useful for building paths relative to the source file rather than wherever the script is invoked from.
from pathlib import Path
print(Path.cwd()) # current working directory
print(Path.home()) # home directory (~)
print(Path("/").root) # "/"
Output:
/home/user/myproject
/home/user
/
Reading and writing
read_text() and write_text() handle the open/read/close cycle in one call and accept an encoding parameter. read_bytes() and write_bytes() do the same for binary data. For appending or more control, fall back to p.open("a") as a context manager.
from pathlib import Path
p = Path("notes.txt")
# Write text (creates or overwrites)
p.write_text("Hello\nWorld\n", encoding="utf-8")
# Read text
content = p.read_text(encoding="utf-8")
print(repr(content))
# Append (open explicitly)
with p.open("a", encoding="utf-8") as f:
    f.write("More text\n")
# Binary
p.write_bytes(b"\x89PNG\r\n")
data = p.read_bytes()
print(len(data), "bytes")
Output:
'Hello\nWorld\n'
6 bytes
Richer example — directory operations and glob
from pathlib import Path
base = Path("/tmp/demo_project")
# Create a directory tree
(base / "src").mkdir(parents=True, exist_ok=True)
(base / "tests").mkdir(parents=True, exist_ok=True)
# Write some files
(base / "src" / "app.py").write_text("print('app')\n")
(base / "src" / "utils.py").write_text("# utilities\n")
(base / "tests" / "test_app.py").write_text("# tests\n")
(base / "README.md").write_text("# Demo\n")
# Find all Python files recursively
py_files = sorted(base.rglob("*.py"))
print("Python files:")
for f in py_files:
    print(f" {f.relative_to(base)} ({f.stat().st_size} bytes)")
# Find only in src/ (non-recursive)
src_files = sorted((base / "src").glob("*.py"))
print("\nSrc files:")
for f in src_files:
    print(f" {f.name}")
Output:
Python files:
src/app.py (13 bytes)
src/utils.py (12 bytes)
tests/test_app.py (8 bytes)
Src files:
app.py
utils.py
Checking existence and type
exists() returns True for any path that exists on the filesystem, whether it is a file or a directory. It follows symlinks, so a symlink counts only if its target exists. Use the more specific is_file() or is_dir() when you need to distinguish between them; both return False if the path doesn't exist at all.
from pathlib import Path
p = Path("/tmp/demo_project/src/app.py")
print(p.exists()) # True if path exists (any type)
print(p.is_file()) # True if regular file
print(p.is_dir()) # True if directory
print(p.is_symlink()) # True if symlink
Output:
True
True
False
False
Quick reference
| Task | Code |
|---|---|
| Build path | Path("dir") / "sub" / "file.txt" |
| Absolute path | p.resolve() |
| Home dir | Path.home() |
| Current dir | Path.cwd() |
| Read text | p.read_text(encoding="utf-8") |
| Write text | p.write_text("content") |
| Read bytes | p.read_bytes() |
| Create dir | p.mkdir(parents=True, exist_ok=True) |
| List dir | list(p.iterdir()) |
| Glob (non-recursive) | list(p.glob("*.py")) |
| Glob (recursive) | list(p.rglob("*.py")) |
| File size | p.stat().st_size |
| Rename / move | p.rename(new_path) |
| Copy file (use shutil) | shutil.copy2(src, dst) |
| Delete file | p.unlink() |
| Delete empty dir | p.rmdir() |
| Delete tree | shutil.rmtree(p) |
| Change extension | p.with_suffix(".txt") |
| Change name | p.with_name("other.txt") |
| Change stem | p.with_stem("other") |
| Relative path | p.relative_to(base) |
| Parent dirs | p.parents[0], p.parents[1], … |
Path vs PurePath
PurePath is the string-manipulation base class — it understands path syntax (separators, suffixes, parts) but never touches the filesystem. Path is the concrete subclass that adds I/O methods (read_text, iterdir, exists, stat). Reach for PurePath when you are constructing paths for another system (e.g., building a Linux path on Windows for a remote machine) or when writing pure logic that should not hit the disk.
from pathlib import Path, PurePath, PurePosixPath, PureWindowsPath
# Pure: string-only, no filesystem access
p = PurePosixPath("/srv/data/logs.txt")
print(p.suffix, p.parts)
w = PureWindowsPath(r"C:\Users\Alice\Desktop\notes.txt")
print(w.drive, w.parts)
# Concrete: hits the filesystem
c = Path.home() / ".bashrc"
print(c.exists())
Output:
.txt ('/', 'srv', 'data', 'logs.txt')
C: ('C:\\', 'Users', 'Alice', 'Desktop', 'notes.txt')
True
PurePath("a/b") instantiates either PurePosixPath or PureWindowsPath depending on the host OS; instantiate the specific class to force one set of semantics.
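The remote-machine case can be sketched with the pure classes alone. The directory layout and file names below are hypothetical, and nothing here touches a filesystem, so the result is the same on any host.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical remote layout; these objects only manipulate strings.
remote_root = PurePosixPath("/srv/deploy")
local_artifact = PureWindowsPath(r"C:\build\out\app.tar.gz")

# Reuse just the file name of the Windows path on the POSIX side
remote_target = remote_root / "releases" / local_artifact.name
print(remote_target)

# Re-express a relative Windows path component by component
rel = PureWindowsPath(r"assets\img\logo.png")
remote_asset = remote_root.joinpath(*rel.parts)
print(remote_asset)
```

Going through `.name` or `.parts` avoids copying backslashes into the POSIX path.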
OS-specific subclasses — PosixPath and WindowsPath
On a POSIX system, Path() returns a PosixPath; on Windows it returns a WindowsPath. Both share the same API, but each applies its own parsing rules (separators, drive letters, case sensitivity when comparing). You rarely instantiate a subclass directly; use Path and let it pick. Instantiating the subclass for a foreign system (for example WindowsPath on Linux) raises NotImplementedError, so use the Pure classes for that.
from pathlib import Path
p = Path("/tmp/example")
print(type(p).__name__)
# On Windows this would print:
# WindowsPath
# On macOS/Linux:
# PosixPath
# Force POSIX-style serialisation regardless of host
print(p.as_posix())
print(p.as_uri()) # → file:///tmp/example
Output (Linux/macOS):
PosixPath
/tmp/example
file:///tmp/example
When writing to JSON/YAML/TOML for cross-platform tools, store p.as_posix() rather than str(p). The POSIX form of a relative path parses correctly on every OS.
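A small sketch of that round trip, using the pure classes so it behaves identically on any host. The `input` key and the file path are made up for illustration.

```python
import json
from pathlib import PurePosixPath, PureWindowsPath

# A relative path as it might be built on a Windows machine
win_path = PureWindowsPath(r"data\raw\input.csv")

# Serialise with forward slashes
payload = json.dumps({"input": win_path.as_posix()})
print(payload)

# Both flavours parse the stored string into the same components
stored = json.loads(payload)["input"]
posix_parts = PurePosixPath(stored).parts
windows_parts = PureWindowsPath(stored).parts
print(posix_parts == windows_parts)
```

Storing `str(win_path)` instead would write backslashes, which a POSIX reader treats as part of a single file name.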
The / operator and joinpath
The division operator on Path performs path joining, not arithmetic. It can be chained, each right-hand operand may be a str or a Path, and it is the idiomatic replacement for os.path.join(...). joinpath(*parts) is the method form, useful when you have a list of components.
from pathlib import Path
base = Path.home()
# Operator form (preferred)
log = base / "logs" / "app.log"
print(log)
# Method form — same result
log2 = base.joinpath("logs", "app.log")
print(log2)
# Joining with a list (use *unpacking)
parts = ["src", "lib", "core.py"]
p = base.joinpath(*parts)
print(p)
Output:
/home/alice/logs/app.log
/home/alice/logs/app.log
/home/alice/src/lib/core.py
Absolute paths on the right reset everything.
Path("/home/alice") / "/etc/passwd" returns Path("/etc/passwd"): anything to the left is discarded. This mirrors os.path.join behaviour but surprises newcomers. Validate untrusted user input with is_absolute() before joining.
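One way to apply that validation is sketched below. `join_untrusted` is a hypothetical helper name, and the check is purely lexical: it rejects absolute input and `..` segments but knows nothing about symlinks.

```python
from pathlib import PurePosixPath


def join_untrusted(base: PurePosixPath, user_part: str) -> PurePosixPath:
    """Join user_part under base, rejecting absolute paths and '..' segments."""
    candidate = PurePosixPath(user_part)
    if candidate.is_absolute():
        raise ValueError(f"absolute path not allowed: {user_part!r}")
    if ".." in candidate.parts:
        raise ValueError(f"parent traversal not allowed: {user_part!r}")
    return base / candidate


base = PurePosixPath("/home/alice")
print(join_untrusted(base, "logs/app.log"))

for bad in ("/etc/passwd", "../bob/.ssh/id_rsa"):
    try:
        join_untrusted(base, bad)
    except ValueError as exc:
        print("rejected:", exc)
```

For paths that will actually be opened, a check on the resolved path is stricter, because a symlink inside the base can still point elsewhere.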
Resolution, normalisation, and absolute paths
absolute() prepends the current working directory but does not resolve symlinks or .. segments. resolve(strict=False) does the full thing: it makes the path absolute and resolves symlinks, dots, and .. to a canonical form. Use resolve(strict=True) to raise FileNotFoundError if any path component is missing — a useful sanity check.
from pathlib import Path
p = Path("../docs/./readme.md") # the lone "." segment is dropped at construction
print(p.absolute()) # cwd-prepended; ".." is kept
print(p.resolve()) # canonical; symlinks resolved
print(Path("~/notes.md").expanduser()) # ~ → /home/alice
Output:
/home/alice/project/../docs/readme.md
/home/alice/docs/readme.md
/home/alice/notes.md
expanduser() expands a leading ~ or ~user. Combine with resolve() for full canonicalisation: Path("~/notes.md").expanduser().resolve().
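The difference between the default and strict modes of resolve() can be checked in a throwaway directory. This is a sketch; the `docs` and `project` names are arbitrary.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "docs").mkdir()
    (root / "project").mkdir()

    # ".." is kept as written until resolve() is called
    messy = root / "project" / ".." / "docs" / "readme.md"
    clean = messy.resolve()  # readme.md need not exist in the default mode

    strict_failed = False
    try:
        messy.resolve(strict=True)  # readme.md was never created
    except FileNotFoundError:
        strict_failed = True

    print(messy)
    print(clean)
    print(strict_failed)
```

The default mode suits paths you are about to create; strict mode suits paths you expect to exist.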
Glob and rglob — pattern matching
glob(pattern) matches files in a directory using shell-style wildcards (*, ?, [abc], **); rglob(pattern) is the recursive form, equivalent to glob("**/" + pattern). Both return generators — wrap in list(...) or sorted(...) when you need a concrete sequence.
from pathlib import Path
root = Path("/tmp/demo_project")
# All Python files anywhere under root (recursive)
py = sorted(root.rglob("*.py"))
print(py)
# Python files directly inside src/ (non-recursive)
direct = sorted(root.glob("src/*.py"))
# Multiple patterns with chain.from_iterable
from itertools import chain
images = sorted(chain.from_iterable(
    root.rglob(pattern) for pattern in ("*.png", "*.jpg", "*.webp")
))
# Use ** explicitly for arbitrary-depth match
deep = sorted(root.glob("**/test_*.py"))
# Glob with character class
test_files = sorted(root.rglob("test_[a-z]*.py"))
Output: (paths printed depend on filesystem contents)
[PosixPath('/tmp/demo_project/src/app.py'), ...]
rglob traverses hidden directories (starting with .) as well. If you want to skip .git or node_modules, filter the iterator or use os.walk with explicit pruning. Pathlib added a case_sensitive= argument in Python 3.12.
For complex filesystem queries (e.g. "Python files under 1 MB modified this week"), chain rglob with stat():
import time
cutoff = time.time() - 7 * 86400
recent = [
    p for p in root.rglob("*.py")
    if (info := p.stat()).st_size < 1_000_000 and info.st_mtime > cutoff
]
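Filtering the iterator to skip hidden directories might look like the sketch below. `visible_files` is a hypothetical helper; it still walks the hidden directories and only discards their results, so it saves no I/O.

```python
import tempfile
from pathlib import Path


def visible_files(root: Path, pattern: str = "*"):
    """Yield matches whose path below root has no dot-prefixed component."""
    for p in root.rglob(pattern):
        rel_parts = p.relative_to(root).parts
        if not any(part.startswith(".") for part in rel_parts):
            yield p


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / ".git" / "hooks").mkdir(parents=True)
    (root / "src" / "app.py").write_text("x = 1\n")
    (root / ".git" / "hooks" / "pre_commit.py").write_text("y = 2\n")

    found = sorted(p.relative_to(root).as_posix() for p in visible_files(root, "*.py"))
    print(found)
```

When the hidden trees are large, pruning during the walk is the cheaper option.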
iterdir and walking trees
iterdir() yields the immediate children (files and directories) of a directory; it does not recurse. For full tree walks, use rglob("*") or Path.walk() (Python 3.12+), which mirrors os.walk but yields Path objects.
from pathlib import Path
root = Path("/tmp/demo_project")
# Direct children
for child in sorted(root.iterdir()):
    kind = "DIR " if child.is_dir() else "FILE"
    print(kind, child.name)
Output:
FILE README.md
DIR src
DIR tests
Path.walk() (Python 3.12+) is the modern recursive walker; on earlier versions, use os.walk(root) and wrap results in Path.
# Python 3.12+
from pathlib import Path
for dirpath, dirnames, filenames in Path("/tmp/demo_project").walk():
    # Prune unwanted dirs in-place (just like os.walk)
    dirnames[:] = [d for d in dirnames if d not in {".git", "__pycache__"}]
    for name in filenames:
        print(dirpath / name)
# Pre-3.12 fallback
import os
from pathlib import Path
for dirpath, dirnames, filenames in os.walk("/tmp/demo_project"):
    dirnames[:] = [d for d in dirnames if d not in {".git"}]
    for name in filenames:
        print(Path(dirpath) / name)
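If one code base has to run on both sides of 3.12, the two variants can be hidden behind a small wrapper. This is a sketch; `walk_paths` is a hypothetical name, and in-place pruning of `dirnames` works in both branches.

```python
import os
import tempfile
from pathlib import Path


def walk_paths(root):
    """Yield (dirpath, dirnames, filenames) with dirpath as a Path on any version."""
    if hasattr(Path, "walk"):  # Python 3.12+
        yield from Path(root).walk()
    else:
        for dirpath, dirnames, filenames in os.walk(root):
            yield Path(dirpath), dirnames, filenames


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "__pycache__").mkdir()
    (root / "src" / "app.py").write_text("")
    (root / "__pycache__" / "app.pyc").write_text("")

    seen = []
    for dirpath, dirnames, filenames in walk_paths(root):
        # Prune in place so the walker never enters __pycache__
        dirnames[:] = [d for d in dirnames if d != "__pycache__"]
        seen.extend((dirpath / name).relative_to(root).as_posix() for name in filenames)
    print(sorted(seen))
```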
Creating directories — mkdir
mkdir() creates a single directory and raises FileExistsError if it exists or FileNotFoundError if a parent is missing. parents=True creates missing intermediates (mkdir -p); exist_ok=True makes the call idempotent. The canonical safe-create idiom is p.mkdir(parents=True, exist_ok=True).
from pathlib import Path
target = Path("/tmp/myapp/logs/2026")
target.mkdir(parents=True, exist_ok=True)
print(target.is_dir())
# Mode argument controls permissions on Unix (subject to umask)
restricted = Path("/tmp/myapp/secrets")
restricted.mkdir(mode=0o700, exist_ok=True)
print(oct(restricted.stat().st_mode & 0o777))
Output:
True
0o700
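The two exceptions and the idempotent form can be seen side by side in a temporary directory. This is a sketch for illustration; the directory names `a` and `b` are arbitrary.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    outcomes = []

    try:
        (base / "a" / "b").mkdir()  # parent "a" does not exist yet
    except FileNotFoundError:
        outcomes.append("missing parent")

    (base / "a" / "b").mkdir(parents=True)  # creates "a", then "b"

    try:
        (base / "a" / "b").mkdir()  # "b" is already there
    except FileExistsError:
        outcomes.append("already exists")

    # The safe-create idiom: no error in either situation
    (base / "a" / "b").mkdir(parents=True, exist_ok=True)
    outcomes.append("idempotent ok")

    print(outcomes)
```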
Renaming, moving, and replacing
rename(target) renames or moves a path. On POSIX it silently overwrites an existing file at target; on Windows it raises FileExistsError. Moving across filesystems with rename raises OSError. replace(target) overwrites the target on every platform, atomically when source and target are on the same filesystem; use it for safe writes (write to temp, then replace). For cross-device moves, which are not atomic, fall back to shutil.move.
from pathlib import Path
import shutil
src = Path("/tmp/notes.txt")
src.write_text("v1")
# Rename within the same dir
new = src.rename(src.with_name("notes-2026.txt"))
print(new)
# Atomic replace (overwrites)
tmp = Path("/tmp/notes-2026.txt.tmp")
tmp.write_text("v2")
tmp.replace(new)
print(new.read_text())
# Cross-device move (or unknown FS): use shutil; the destination directory must exist
Path("/tmp/archive").mkdir(exist_ok=True)
shutil.move(str(new), "/tmp/archive/notes-2026.txt")
Output:
/tmp/notes-2026.txt
v2
The atomic-write pattern: write to path.with_suffix(path.suffix + ".tmp"), fsync, then replace(path). Readers always see either the old or the complete new file, never a partial write.
Stat and metadata
stat() returns an os.stat_result with size, timestamps, mode, and inode. Use it to filter, sort, or audit large directories. lstat() does the same but does not follow symlinks (returns metadata of the symlink itself).
from pathlib import Path
import time
p = Path("/tmp/demo_project/src/app.py")
s = p.stat()
print(f"size: {s.st_size} bytes")
print(f"mode: {oct(s.st_mode & 0o777)}")
print(f"mtime: {time.ctime(s.st_mtime)}")
print(f"is_dir: {p.is_dir()}")
print(f"owner: {p.owner()} group: {p.group()}") # Unix only
Output:
size: 13 bytes
mode: 0o644
mtime: Sun Apr 25 14:30:00 2026
is_dir: False
owner: alice group: alice
Every p.stat() call is a syscall. When filtering thousands of files, cache the result: for p in root.rglob('*'): info = p.stat(); .... Repeating p.stat().st_size and p.stat().st_mtime doubles the I/O.
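A sketch of the cached form follows. `recent_small_files` is a hypothetical helper, and the file type is read from the same stat result through `stat.S_ISREG` instead of a separate is_file() call. A dangling symlink would still raise in `p.stat()`; that case is not handled here.

```python
import stat
import tempfile
import time
from pathlib import Path


def recent_small_files(root: Path, max_bytes: int, max_age_s: float):
    """Yield (path, size) for regular files, calling stat() once per path."""
    cutoff = time.time() - max_age_s
    for p in root.rglob("*"):
        info = p.stat()  # single syscall; every check below reuses it
        if (stat.S_ISREG(info.st_mode)
                and info.st_size < max_bytes
                and info.st_mtime > cutoff):
            yield p, info.st_size


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "small.txt").write_text("hi")
    (root / "big.txt").write_text("x" * 5000)

    hits = sorted((p.name, size) for p, size in recent_small_files(root, 1000, 3600))
    print(hits)
```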
Symbolic links
is_symlink() checks symlink status without following; resolve() follows the symlink to its real target; readlink() returns the immediate target (a single hop, not transitively resolved). Create symlinks with symlink_to() and hardlinks with hardlink_to() (Python 3.10+).
from pathlib import Path
target = Path("/tmp/real_file.txt")
target.write_text("data")
link = Path("/tmp/alias.txt")
link.unlink(missing_ok=True)
link.symlink_to(target)
print(link.is_symlink())
print(link.readlink()) # one hop
print(link.resolve()) # full canonical target
print(link.read_text()) # symlink follows transparently
Output:
True
/tmp/real_file.txt
/tmp/real_file.txt
data
Deletion — unlink, rmdir, shutil.rmtree
unlink() removes a single file or symlink; rmdir() removes an empty directory; shutil.rmtree(path) recursively removes a whole tree. None of these are reversible — there is no recycle bin. Use missing_ok=True (Python 3.8+) to suppress the FileNotFoundError for idempotent cleanup.
from pathlib import Path
import shutil
# Single file (idempotent)
Path("/tmp/old.log").unlink(missing_ok=True)
# Empty directory
Path("/tmp/emptydir").rmdir()
# Recursive
shutil.rmtree("/tmp/demo_project", ignore_errors=True)
shutil.rmtree is permanent and recursive. Pass ignore_errors=True only when you've already verified the path. A bug in path construction can wipe production data, so always log the path first and use a confirmation in interactive scripts (click.confirm if you're using click).
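One possible guard, sketched under the assumption that every deletable tree lives below a known base directory. `remove_tree_under` is a hypothetical helper, not part of pathlib or shutil.

```python
import shutil
import tempfile
from pathlib import Path


def remove_tree_under(base: Path, target: Path) -> bool:
    """Delete target only if it resolves to a directory strictly inside base."""
    base = base.resolve()
    target = target.resolve()
    if target == base or base not in target.parents:
        raise ValueError(f"refusing to delete {target}: not inside {base}")
    if not target.is_dir():
        return False
    print(f"removing {target}")  # log the path before deleting
    shutil.rmtree(target)
    return True


with tempfile.TemporaryDirectory() as tmp:
    sandbox = Path(tmp)
    (sandbox / "cache" / "old").mkdir(parents=True)

    removed = remove_tree_under(sandbox, sandbox / "cache")
    still_there = (sandbox / "cache").exists()

    try:
        remove_tree_under(sandbox, sandbox / "..")  # escapes the sandbox
        refused = False
    except ValueError:
        refused = True

    print(removed, still_there, refused)
```

Resolving both paths first means a `..` segment or a symlink cannot move the target outside the base unnoticed.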
Integration with os and shutil
Pathlib covers the high-frequency 95% of filesystem work. For the rest, os and shutil are still the right answer. Most of their functions accept Path objects directly (they take any os.PathLike); for legacy APIs that insist on strings, pass str(p).
| Task | Module | Call |
|---|---|---|
| Copy file (with metadata) | shutil | shutil.copy2(src, dst) |
| Copy file tree | shutil | shutil.copytree(src, dst) |
| Move (cross-device-safe) | shutil | shutil.move(src, dst) |
| Recursive delete | shutil | shutil.rmtree(path) |
| Disk usage | shutil | shutil.disk_usage(path) |
| Free space on the filesystem holding path | shutil | shutil.disk_usage(path).free |
| Temp directory | tempfile | tempfile.TemporaryDirectory() |
| Temp file | tempfile | tempfile.NamedTemporaryFile(delete=False) |
| Get/change cwd | os | os.getcwd() / os.chdir(path) |
| File mode | os | os.chmod(path, 0o644) |
| Path expansion | os.path | os.path.expandvars("$HOME/...") |
| Walk a tree | os | os.walk(path) (or Path.walk() on 3.12+) |
import shutil
from pathlib import Path
src = Path("photo.jpg")
dst = Path("/tmp/backup") / src.name
dst.parent.mkdir(parents=True, exist_ok=True)
shutil.copy2(src, dst)
# Walk for tree size
size = sum(f.stat().st_size for f in dst.parent.rglob("*") if f.is_file())
print(f"Tree size: {size} bytes")
Output:
Tree size: 245801 bytes
with_name, with_stem, with_suffix
These three methods return new Path objects with one component swapped — they do not mutate the original (pathlib paths are immutable). Use them when transforming filenames in bulk renames or output-path derivation.
from pathlib import Path
p = Path("/var/log/app.log.gz")
print(p.with_suffix(".zip")) # last suffix only
print(p.with_stem("alice")) # everything before the last suffix
print(p.with_name("new.txt")) # name = stem + suffix
# Strip every suffix with a loop
q = Path("backup.tar.gz")
while q.suffix:
    q = q.with_suffix("")
print(q)
Output:
/var/log/app.log.zip
/var/log/alice.gz
/var/log/new.txt
backup
p.suffixes returns all suffixes (['.tar', '.gz']), which is useful for double-extension files. p.suffix is just the last one (.gz).
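As an alternative to the loop, `suffixes` lets you strip every extension in one step. A minimal sketch; `strip_all_suffixes` is a hypothetical helper, and it treats every dot-separated tail as an extension, so a name like `report.v2.txt` is reduced to `report`.

```python
from pathlib import PurePosixPath


def strip_all_suffixes(p):
    """Return p with every suffix removed from its final component."""
    extra = "".join(p.suffixes)
    if not extra:
        return p
    return p.with_name(p.name[: len(p.name) - len(extra)])


archive = PurePosixPath("/backups/site.tar.gz")
print(archive.suffixes)
print(strip_all_suffixes(archive))
print(strip_all_suffixes(PurePosixPath("/backups/README")))
```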
Comparing and sorting paths
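Path objects implement __eq__ and __hash__ based on their parsed components (case-folded on Windows), and they sort component by component in lexicographic order. Comparing across pure/concrete classes of the same flavour works as expected. Paths of different flavours never compare equal (PurePosixPath("a") == PureWindowsPath("a") is False), and ordering them with < raises TypeError.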
from pathlib import Path
ps = [Path("b/2.txt"), Path("a/10.txt"), Path("a/2.txt")]
for p in sorted(ps):
    print(p)
Output:
a/10.txt
a/2.txt
b/2.txt
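Lexicographic order treats 10.txt as less than 2.txt. For natural ordering, sort with a key that compares numeric stems as integers, such as (parent, int(stem)) when every stem is numeric (mixing int and str keys in one sort raises TypeError), or use the natsort library.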
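A standard-library sort key for natural ordering could look like the sketch below. `natural_key` is a hypothetical helper; it tags every chunk so that integers and strings are never compared with each other, and it only recognises ASCII digit runs.

```python
import re
from pathlib import PurePosixPath


def natural_key(p):
    """Split each component into text and integer chunks so 2 sorts before 10."""
    key = []
    for part in p.parts:
        chunks = re.split(r"([0-9]+)", part)  # odd indices hold the digit runs
        key.append([
            (0, int(c), "") if i % 2 else (1, 0, c)
            for i, c in enumerate(chunks)
            if c
        ])
    return key


paths = [PurePosixPath("b/2.txt"), PurePosixPath("a/10.txt"), PurePosixPath("a/2.txt")]
ordered = sorted(paths, key=natural_key)
for p in ordered:
    print(p)
```

Numeric chunks sort before text chunks at the same position, which is a design choice of this sketch and not a standard.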
Relative paths and is_relative_to
p.relative_to(base) returns the path of p relative to base, raising ValueError if p is not under base. is_relative_to(base) (Python 3.9+) is the boolean form — no exception, just True/False.
from pathlib import Path
root = Path("/srv/myapp")
log = Path("/srv/myapp/logs/2026/app.log")
print(log.relative_to(root))
print(log.is_relative_to(root))
print(log.is_relative_to("/etc"))
Output:
logs/2026/app.log
True
False
This is the standard guard against directory-traversal attacks: validate that user-supplied paths resolve to inside an allowed base before opening them.
from pathlib import Path
UPLOAD_ROOT = Path("/srv/uploads").resolve()
def safe_read(rel: str) -> str:
    p = (UPLOAD_ROOT / rel).resolve()
    if not p.is_relative_to(UPLOAD_ROOT):
        raise PermissionError(f"refusing to access {p}")
    return p.read_text()
Common pitfalls
Path is immutable: methods like with_suffix, with_name, and joinpath return new objects. p.with_suffix(".bak") alone does nothing; assign the result: p = p.with_suffix(".bak").
exists() follows symlinks: a broken symlink returns False. Use p.is_symlink() (does not follow) to detect the symlink itself, even if its target is missing.
Path("") is the current directory: an empty string parses to Path("."), so str(Path("")) is "." and Path("").resolve() is the working directory. An unexpectedly empty variable therefore points at the cwd without raising. Use Path.cwd() or Path(".") explicitly, and reject empty input.
rename can clobber silently on POSIX: if the destination exists, POSIX rename(2) overwrites it. Use replace() for explicit overwrite or check existence first.
p.stat() raises FileNotFoundError for missing paths: there is no "soft" stat. Either check exists() first or wrap in try/except FileNotFoundError.
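rglob and symlinked directories: in recent Python versions, rglob does not descend into symlinked directories when expanding **. Python 3.13 adds recurse_symlinks=True to opt in, at which point symlink cycles become a concern. Path.walk() likewise defaults to follow_symlinks=False. Check the behaviour on the Python version you target, or filter out symlinks manually with is_symlink().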
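The broken-symlink pitfall is easy to reproduce. The sketch below assumes a platform where an unprivileged process may create symlinks (Linux and macOS by default; Windows may need extra rights).

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    target = root / "real.txt"
    link = root / "alias.txt"

    target.write_text("data")
    link.symlink_to(target)
    target.unlink()  # the link now dangles

    exists = link.exists()       # follows the link, so the answer is False
    is_link = link.is_symlink()  # inspects the link itself

    try:
        link.stat()              # follows the link and fails
        stat_ok = True
    except FileNotFoundError:
        stat_ok = False

    lstat_ok = link.lstat() is not None  # lstat() never follows
    print(exists, is_link, stat_ok, lstat_ok)
```

A cleanup script that tests only exists() would leave such links behind.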
Real-world recipes
Recursive size of a directory
Summing stat().st_size across rglob("*") is the pathlib idiom for "how big is this directory?". Filter is_file() to skip the directory entries themselves.
from pathlib import Path
def tree_size(root: Path) -> int:
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())
print(f"{tree_size(Path('/tmp/demo_project'))} bytes")
Output:
40 bytes
Batch rename — date-prefix every file
Walk a directory, build the new name with with_name, and call rename. Always print the planned rename first in dry-run mode before applying.
from pathlib import Path
from datetime import datetime
def date_prefix(root: Path, dry_run=True):
    today = datetime.now().strftime("%Y-%m-%d")
    for p in sorted(root.iterdir()):
        if p.is_file() and not p.name.startswith(today):
            target = p.with_name(f"{today}_{p.name}")
            print(f"{'DRY' if dry_run else 'MV '} {p} -> {target}")
            if not dry_run:
                p.rename(target)
date_prefix(Path("/tmp/inbox"), dry_run=True)
Output:
DRY /tmp/inbox/notes.md -> /tmp/inbox/2026-05-25_notes.md
DRY /tmp/inbox/photo.jpg -> /tmp/inbox/2026-05-25_photo.jpg
Find the largest N files under a tree
Build a list of (size, path) tuples from rglob, then use heapq.nlargest for an O(n log k) top-K.
import heapq
from pathlib import Path
def largest(root: Path, n: int = 10):
    files = ((p.stat().st_size, p) for p in root.rglob("*") if p.is_file())
    for size, p in heapq.nlargest(n, files, key=lambda pair: pair[0]):
        print(f"{size:>10} {p}")
largest(Path.home(), n=5)
Output:
10240000 /home/alice/Videos/clip.mp4
2456789 /home/alice/Downloads/dataset.csv
893456 /home/alice/Pictures/photo.jpg
102400 /home/alice/notes.tar.gz
45120 /home/alice/.bash_history
Atomic config write
Write to a temp sibling and replace the target. This guarantees readers never see a partially written file. Pair with os.fsync if you need durability across power loss.
from pathlib import Path
import json, os
def write_atomic(path: Path, data: dict) -> None:
    tmp = path.with_suffix(path.suffix + ".tmp")
    with tmp.open("w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)
        f.flush()
        os.fsync(f.fileno())
    # Replace only after the temp file is closed
    tmp.replace(path)
write_atomic(Path("/tmp/config.json"), {"host": "myhost", "port": 8080})
Find git repos under a tree
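Recursively locate every directory containing a .git subdirectory. rglob(".git") is the shortest way to write this, but it does not prune: it keeps walking into every directory, including the internals of each .git. On large trees, a walker that stops descending once .git is found is faster.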
from pathlib import Path
def find_repos(root: Path):
    for p in root.rglob(".git"):
        if p.is_dir():
            yield p.parent
for repo in find_repos(Path.home() / "Code"):
    print(repo)
Output:
/home/alice/Code/jockey
/home/alice/Code/dotfiles
/home/alice/Code/notes
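A pruning variant can be sketched with os.walk, which works on every Python 3 version. `find_repos_pruned` is a hypothetical name. Once a repository is found the walk stops there, so repositories nested inside another repository's working tree are not reported.

```python
import os
import tempfile
from pathlib import Path


def find_repos_pruned(root):
    """Yield directories containing a .git directory, without descending into them."""
    for dirpath, dirnames, _filenames in os.walk(root):
        if ".git" in dirnames:
            yield Path(dirpath)
            dirnames[:] = []  # stop here: skip .git and everything below


with tempfile.TemporaryDirectory() as tmp:
    code = Path(tmp)
    (code / "a" / ".git" / "objects").mkdir(parents=True)
    (code / "b" / "sub" / ".git").mkdir(parents=True)
    (code / "c" / "plain").mkdir(parents=True)

    repos = sorted(p.relative_to(code).as_posix() for p in find_repos_pruned(code))
    print(repos)
```

Clearing `dirnames` in place is what tells os.walk not to recurse further.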