cheat sheet

pathlib

Work with filesystem paths using Python's built-in pathlib module. Covers Path creation, navigation, reading/writing files, glob patterns, and stat.

pathlib — Object-Oriented File Paths

What it is

pathlib is part of the Python standard library (no install needed). It represents filesystem paths as Path objects instead of plain strings, giving you methods for reading, writing, navigating, and querying files — all in a cross-platform way that handles Windows backslashes automatically.

It replaces os.path, os.getcwd(), open() boilerplate, and glob.glob() for most filesystem tasks.

Quick example

python
from pathlib import Path

p = Path("documents/readme.txt")
print(p.name)       # filename with extension
print(p.stem)       # filename without extension
print(p.suffix)     # extension with dot
print(p.parent)     # containing directory
print(p.parts)      # tuple of path components

Output:

text
readme.txt
readme
.txt
documents
('documents', 'readme.txt')

When / why to use it

  • Any time you manipulate file paths — pathlib is cleaner and more readable than os.path.join().
  • Reading and writing files without managing file handles.
  • Recursive directory searches with rglob("*.py").
  • Cross-platform code: Path uses the right separator on every OS.

Prefer Path over str for all path values in new code. If a library requires a string, wrap: str(p) or use p.as_posix() for forward-slash strings.

Common pitfalls

The / operator builds paths, it does not divide: Path("/home") / "user" / "file.txt" is path concatenation. If you pass an absolute path as the right operand it replaces everything to the left: Path("/home") / "/etc/passwd" → Path("/etc/passwd").

Path.open() vs open() — both work, but path.read_text() / path.write_text() are shorter for simple file reads/writes and handle encoding arguments directly.

Special paths

Path.home() returns the current user's home directory; Path.cwd() returns the process working directory. In scripts, Path(__file__).parent gives the directory of the script itself — useful for building paths relative to the source file rather than wherever the script is invoked from.

python
from pathlib import Path

print(Path.cwd())        # current working directory
print(Path.home())       # home directory (~)
print(Path("/").root)    # "/"

Output:

text
/home/user/myproject
/home/user
/
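
As a minimal sketch of the script-relative idea above, the helper below builds a path next to a given source file. The name resource_path and the file settings.toml are hypothetical, chosen only for illustration; in a real script you would pass __file__ as the first argument.

```python
from pathlib import Path


def resource_path(source_file: str, name: str) -> Path:
    """Return the path to `name` in the same directory as `source_file`."""
    # resolve() makes the result absolute, so it no longer depends on
    # the directory the script happens to be launched from
    return Path(source_file).resolve().parent / name


# In a script you would call: resource_path(__file__, "settings.toml")
example = resource_path("/opt/tool/main.py", "settings.toml")
print(example.name)         # settings.toml
print(example.parent.name)  # tool
```

Taking the source file as a parameter keeps the helper testable; hard-coding __file__ inside it would tie it to one module.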

Reading and writing

read_text() and write_text() handle the open/read/close cycle in one call and accept an encoding parameter. read_bytes() and write_bytes() do the same for binary data. For appending or more control, fall back to p.open("a") as a context manager.

python
from pathlib import Path

p = Path("notes.txt")

# Write text (creates or overwrites)
p.write_text("Hello\nWorld\n", encoding="utf-8")

# Read text
content = p.read_text(encoding="utf-8")
print(repr(content))

# Append (open explicitly)
with p.open("a", encoding="utf-8") as f:
    f.write("More text\n")

# Binary
p.write_bytes(b"\x89PNG\r\n")
data = p.read_bytes()
print(len(data), "bytes")

Output:

text
'Hello\nWorld\n'
6 bytes

Richer example — directory operations and glob

python
from pathlib import Path

base = Path("/tmp/demo_project")

# Create a directory tree
(base / "src").mkdir(parents=True, exist_ok=True)
(base / "tests").mkdir(parents=True, exist_ok=True)

# Write some files
(base / "src" / "app.py").write_text("print('app')\n")
(base / "src" / "utils.py").write_text("# utilities\n")
(base / "tests" / "test_app.py").write_text("# tests\n")
(base / "README.md").write_text("# Demo\n")

# Find all Python files recursively
py_files = sorted(base.rglob("*.py"))
print("Python files:")
for f in py_files:
    print(f"  {f.relative_to(base)}  ({f.stat().st_size} bytes)")

# Find only in src/ (non-recursive)
src_files = sorted((base / "src").glob("*.py"))
print("\nSrc files:")
for f in src_files:
    print(f"  {f.name}")

Output:

text
Python files:
  src/app.py  (13 bytes)
  src/utils.py  (12 bytes)
  tests/test_app.py  (8 bytes)

Src files:
  app.py
  utils.py

Checking existence and type

exists() returns True for any path that exists on the filesystem, whether file or directory. It follows symlinks, so a symlink whose target is missing reports False. Use the more specific is_file() or is_dir() when you need to distinguish between them; both return False if the path doesn't exist at all.

python
from pathlib import Path

p = Path("/tmp/demo_project/src/app.py")

print(p.exists())     # True if path exists (any type)
print(p.is_file())    # True if regular file
print(p.is_dir())     # True if directory
print(p.is_symlink()) # True if symlink

Output:

text
True
True
False
False

Quick reference

Task                                Code
Build path                          Path("dir") / "sub" / "file.txt"
Absolute path                       p.resolve()
Home dir                            Path.home()
Current dir                         Path.cwd()
Read text                           p.read_text(encoding="utf-8")
Write text                          p.write_text("content")
Read bytes                          p.read_bytes()
Create dir                          p.mkdir(parents=True, exist_ok=True)
List dir                            list(p.iterdir())
Glob (non-recursive)                list(p.glob("*.py"))
Glob (recursive)                    list(p.rglob("*.py"))
File size                           p.stat().st_size
Rename / move                       p.rename(new_path)
Copy (no Path method; use shutil)   shutil.copy2(src, dst)
Delete file                         p.unlink()
Delete empty dir                    p.rmdir()
Delete tree                         shutil.rmtree(p)
Change extension                    p.with_suffix(".txt")
Change name                         p.with_name("other.txt")
Change stem                         p.with_stem("other")
Relative path                       p.relative_to(base)
Parent dirs                         p.parents[0], p.parents[1], …

Path vs PurePath

PurePath is the string-manipulation base class — it understands path syntax (separators, suffixes, parts) but never touches the filesystem. Path is the concrete subclass that adds I/O methods (read_text, iterdir, exists, stat). Reach for PurePath when you are constructing paths for another system (e.g., building a Linux path on Windows for a remote machine) or when writing pure logic that should not hit the disk.

python
from pathlib import Path, PurePath, PurePosixPath, PureWindowsPath

# Pure: string-only, no filesystem access
p = PurePosixPath("/srv/data/logs.txt")
print(p.suffix, p.parts)

w = PureWindowsPath(r"C:\Users\Alice\Desktop\notes.txt")
print(w.drive, w.parts)

# Concrete: hits the filesystem
c = Path.home() / ".bashrc"
print(c.exists())

Output:

text
.txt ('/', 'srv', 'data', 'logs.txt')
C: ('C:\\', 'Users', 'Alice', 'Desktop', 'notes.txt')
True

PurePath("a/b") resolves to either PurePosixPath or PureWindowsPath depending on the host OS; instantiate the specific class to force semantics.
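
To make the "paths for another system" case concrete, here is a small sketch that parses a Windows path and builds a POSIX path without touching the disk. The file name and the /srv/uploads directory are made up for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Parse a Windows path on any host; nothing touches the disk
win = PureWindowsPath(r"C:\Users\Alice\report.final.pdf")
print(win.name)        # report.final.pdf
print(win.as_posix())  # C:/Users/Alice/report.final.pdf

# Build a POSIX path for a (hypothetical) remote Linux host
remote = PurePosixPath("/srv/uploads") / win.name
print(remote)          # /srv/uploads/report.final.pdf

# Windows paths compare case-insensitively; POSIX paths do not
print(PureWindowsPath("C:/Temp") == PureWindowsPath("c:/temp"))  # True
print(PurePosixPath("/Temp") == PurePosixPath("/temp"))          # False
```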

OS-specific subclasses — PosixPath and WindowsPath

On a POSIX system, Path() returns a PosixPath; on Windows it returns a WindowsPath. Both inherit the same API, but each applies its own separator and case-sensitivity rules. You rarely instantiate the subclass directly; use Path and let it pick. Instantiating the subclass for the other OS (for example WindowsPath on Linux) raises NotImplementedError.

python
from pathlib import Path

p = Path("/tmp/example")
print(type(p).__name__)

# On Windows this would print:
#   WindowsPath
# On macOS/Linux:
#   PosixPath

# Force POSIX-style serialisation regardless of host
print(p.as_posix())
print(p.as_uri())   # → file:///tmp/example

Output (Linux/macOS):

text
PosixPath
/tmp/example
file:///tmp/example

When writing to JSON/YAML/TOML for cross-platform tools, store p.as_posix() rather than str(p). The Posix form round-trips on every OS.

The / operator and joinpath

The division operator on Path performs path joining, not arithmetic. The right operand can be a str or a Path, and the operator chains (a / "b" / "c"), which makes it the idiomatic replacement for os.path.join(...). joinpath(*parts) is the method form, useful when you have a list of components.

python
from pathlib import Path

base = Path.home()

# Operator form (preferred)
log = base / "logs" / "app.log"
print(log)

# Method form — same result
log2 = base.joinpath("logs", "app.log")
print(log2)

# Joining with a list (use *unpacking)
parts = ["src", "lib", "core.py"]
p = base.joinpath(*parts)
print(p)

Output:

text
/home/alice/logs/app.log
/home/alice/logs/app.log
/home/alice/src/lib/core.py

Absolute paths on the right reset everything. Path("/home/alice") / "/etc/passwd" returns Path("/etc/passwd") — anything to the left is discarded. This mirrors os.path.join behaviour but surprises newcomers. Validate untrusted user input with is_absolute() before joining.
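
The is_absolute() check mentioned above could look roughly like this. The function name join_user_path is hypothetical, and the sketch uses PurePosixPath so it behaves the same on every host; swap in Path for real filesystem work.

```python
from pathlib import PurePosixPath


def join_user_path(base: PurePosixPath, user_part: str) -> PurePosixPath:
    """Join `user_part` under `base`, refusing absolute input."""
    if PurePosixPath(user_part).is_absolute():
        raise ValueError(f"absolute path not allowed: {user_part!r}")
    return base / user_part


base = PurePosixPath("/home/alice")
print(join_user_path(base, "notes/todo.txt"))  # /home/alice/notes/todo.txt

try:
    join_user_path(base, "/etc/passwd")
except ValueError as exc:
    print("rejected:", exc)
```

This check only stops the "reset" behaviour. It does not stop .. segments from climbing out of base; the resolve-and-compare guard under "Relative paths and is_relative_to" covers that case.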

Resolution, normalisation, and absolute paths

absolute() prepends the current working directory but does not resolve symlinks or .. segments. resolve(strict=False) does the full thing: it makes the path absolute and resolves symlinks, dots, and .. to a canonical form. Use resolve(strict=True) to raise FileNotFoundError if any path component is missing — a useful sanity check.

python
from pathlib import Path

p = Path("../docs/./readme.md")  # the lone "." segment is dropped on construction
print(p.absolute())   # cwd-prepended; ".." kept, symlinks untouched
print(p.resolve())    # canonical; ".." and symlinks resolved
print(p.expanduser()) # unchanged here: no leading ~ to expand

Output:

text
/home/alice/project/../docs/readme.md
/home/alice/docs/readme.md
../docs/readme.md

expanduser() expands a leading ~ or ~user. Combine with resolve() for full canonicalisation: Path("~/notes.md").expanduser().resolve().
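
The difference between the default and strict=True is easiest to see on a throwaway directory. The sketch below uses tempfile so it does not depend on any existing files; the directory and file names are arbitrary.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp).resolve()
    (base / "docs").mkdir()
    (base / "docs" / "readme.md").write_text("hi\n", encoding="utf-8")

    # A path with a redundant ".." hop that still points at a real file
    messy = base / "docs" / ".." / "docs" / "readme.md"
    clean = messy.resolve(strict=True)            # exists, so no error
    print(clean == base / "docs" / "readme.md")   # True

    try:
        (base / "missing.md").resolve(strict=True)
    except FileNotFoundError:
        print("missing.md does not exist")
```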

Glob and rglob — pattern matching

glob(pattern) matches files in a directory using shell-style wildcards (*, ?, [abc], **); rglob(pattern) is the recursive form, equivalent to glob("**/" + pattern). Both return generators — wrap in list(...) or sorted(...) when you need a concrete sequence.

python
from pathlib import Path

root = Path("/tmp/demo_project")

# All Python files anywhere under root (recursive)
py = sorted(root.rglob("*.py"))

# Python files directly inside src/ (non-recursive)
direct = sorted(root.glob("src/*.py"))

# Multiple patterns with chain.from_iterable
from itertools import chain
images = sorted(chain.from_iterable(root.rglob(p)
                                     for p in ("*.png", "*.jpg", "*.webp")))

# Use ** explicitly for arbitrary-depth match
deep = sorted(root.glob("**/test_*.py"))

# Glob with character class
test_files = sorted(root.rglob("test_[a-z]*.py"))

Output: (paths printed depend on filesystem contents)

text
[PosixPath('/tmp/demo_project/src/app.py'), ...]

rglob traverses hidden directories (starting with .) as well. If you want to skip .git or node_modules, filter the iterator or use os.walk with explicit pruning. Pathlib added a case_sensitive= argument in Python 3.12.
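
One way to filter the iterator, as a sketch: drop any match that has a skipped directory among its components. The SKIP set and the function name visible_files are assumptions for this example.

```python
import tempfile
from pathlib import Path

SKIP = {".git", "node_modules", "__pycache__"}  # hypothetical skip list


def visible_files(root: Path, pattern: str = "*.py"):
    """Yield matches for `pattern`, skipping paths inside a SKIP directory."""
    for p in root.rglob(pattern):
        # Look only at components below root, so root's own location is ignored
        if SKIP.isdisjoint(p.relative_to(root).parts):
            yield p


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / ".git" / "hooks").mkdir(parents=True)
    (root / "src" / "app.py").write_text("", encoding="utf-8")
    (root / ".git" / "hooks" / "hook.py").write_text("", encoding="utf-8")

    found = sorted(p.relative_to(root).as_posix() for p in visible_files(root))
    print(found)  # ['src/app.py']
```

Note that this filters after the fact: rglob still walks into the skipped directories. When those directories are large, a walk with in-place pruning avoids the wasted traversal.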

For complex filesystem queries (e.g. "Python files under 1 MB modified this week"), chain rglob with stat():

python
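import time
from pathlib import Path

root = Path("/tmp/demo_project")
cutoff = time.time() - 7 * 86400  # one week ago, in seconds since the epoch
recent = [p for p in root.rglob("*.py")
          if (s := p.stat()).st_size < 1_000_000  # stat once, reuse below
          and s.st_mtime > cutoff]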

iterdir and walking trees

iterdir() yields the immediate children (files and directories) of a directory; it does not recurse. For full tree walks, use rglob("*") or Path.walk() (Python 3.12+), which mirrors os.walk but yields Path objects.

python
from pathlib import Path

root = Path("/tmp/demo_project")

# Direct children
for child in sorted(root.iterdir()):
    kind = "DIR " if child.is_dir() else "FILE"
    print(kind, child.name)

Output:

text
FILE README.md
DIR  src
DIR  tests

Path.walk() (Python 3.12+) is the modern recursive walker; on earlier versions, use os.walk(root) and wrap results in Path.

python
# Python 3.12+
from pathlib import Path

for dirpath, dirnames, filenames in Path("/tmp/demo_project").walk():
    # Prune unwanted dirs in-place (just like os.walk)
    dirnames[:] = [d for d in dirnames if d not in {".git", "__pycache__"}]
    for name in filenames:
        print(dirpath / name)

python
# Pre-3.12 fallback
import os
from pathlib import Path

for dirpath, dirnames, filenames in os.walk("/tmp/demo_project"):
    dirnames[:] = [d for d in dirnames if d not in {".git"}]
    for name in filenames:
        print(Path(dirpath) / name)

Creating directories — mkdir

mkdir() creates a single directory and raises FileExistsError if it exists or FileNotFoundError if a parent is missing. parents=True creates missing intermediates (mkdir -p); exist_ok=True makes the call idempotent. The canonical safe-create idiom is p.mkdir(parents=True, exist_ok=True).

python
from pathlib import Path

target = Path("/tmp/myapp/logs/2026")

target.mkdir(parents=True, exist_ok=True)
print(target.is_dir())

# Mode argument controls permissions on Unix (subject to umask)
restricted = Path("/tmp/myapp/secrets")
restricted.mkdir(mode=0o700, exist_ok=True)
print(oct(restricted.stat().st_mode & 0o777))

Output:

text
True
0o700
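
The two failure modes and the flags that silence them can be checked in a scratch directory. This is only an illustrative sketch; the directory names a and b are arbitrary.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    outcomes = []

    try:
        (base / "a" / "b").mkdir()          # parent "a" is missing
    except FileNotFoundError:
        outcomes.append("missing parent")

    (base / "a" / "b").mkdir(parents=True)  # creates "a" and then "b"

    try:
        (base / "a" / "b").mkdir()          # already exists
    except FileExistsError:
        outcomes.append("already exists")

    (base / "a" / "b").mkdir(parents=True, exist_ok=True)  # no error
    print(outcomes)  # ['missing parent', 'already exists']
```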

Renaming, moving, and replacing

rename(target) renames or moves a path. On POSIX it silently overwrites an existing target file; on Windows it raises FileExistsError instead. replace(target) overwrites the target on every platform, atomically when possible, so use it for safe writes (write to temp, then replace). Neither method works across filesystems (the call raises OSError); fall back to shutil.move for that case, which copies and then deletes and is therefore not atomic.

python
from pathlib import Path
import shutil

src = Path("/tmp/notes.txt")
src.write_text("v1")

# Rename within the same dir
new = src.rename(src.with_name("notes-2026.txt"))
print(new)

# Atomic replace (overwrites)
tmp = Path("/tmp/notes-2026.txt.tmp")
tmp.write_text("v2")
tmp.replace(new)
print(new.read_text())

# Cross-device move (or unknown FS): use shutil
Path("/tmp/archive").mkdir(exist_ok=True)  # destination directory must exist
shutil.move(str(new), "/tmp/archive/notes-2026.txt")

Output:

text
/tmp/notes-2026.txt
v2

The atomic-write pattern: write to path.with_suffix(path.suffix + ".tmp"), fsync, then replace(path). Readers always see either the old or the complete new file — never a partial write.

Stat and metadata

stat() returns an os.stat_result with size, timestamps, mode, and inode. Use it to filter, sort, or audit large directories. lstat() does the same but does not follow symlinks (returns metadata of the symlink itself).

python
from pathlib import Path
import time

p = Path("/tmp/demo_project/src/app.py")
s = p.stat()

print(f"size:   {s.st_size} bytes")
print(f"mode:   {oct(s.st_mode & 0o777)}")
print(f"mtime:  {time.ctime(s.st_mtime)}")
print(f"is_dir: {p.is_dir()}")
print(f"owner:  {p.owner()}  group: {p.group()}")  # Unix only

Output:

text
size:   13 bytes
mode:   0o644
mtime:  Sat Apr 25 14:30:00 2026
is_dir: False
owner:  alice  group: alice

Every p.stat() call is a syscall. When filtering thousands of files, cache the result: for p in root.rglob('*'): info = p.stat(); .... Repeating p.stat().st_size and p.stat().st_mtime doubles the I/O.
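
A sketch of the cache-the-result advice: call stat() once per path and reuse the returned object for the file-type check, the size, and the timestamp. The function name newest_files is hypothetical, and os.utime is used only to give the demo files predictable timestamps.

```python
import os
import stat
import tempfile
from pathlib import Path


def newest_files(root: Path, limit: int = 3):
    """Return up to `limit` (path, size) pairs, newest first, one stat() each."""
    rows = []
    for p in root.rglob("*"):
        info = p.stat()                      # single syscall, reused below
        if not stat.S_ISREG(info.st_mode):   # skip directories and the like
            continue
        rows.append((info.st_mtime, info.st_size, p))
    rows.sort(key=lambda r: r[0], reverse=True)
    return [(p, size) for _, size, p in rows[:limit]]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for i, name in enumerate(["old.txt", "mid.txt", "new.txt"]):
        f = root / name
        f.write_text("x" * (i + 1), encoding="utf-8")
        os.utime(f, (1_000 + i, 1_000 + i))  # force distinct mtimes

    result = [(p.name, size) for p, size in newest_files(root, limit=2)]
    print(result)  # [('new.txt', 3), ('mid.txt', 2)]
```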

Symlinks and hardlinks

is_symlink() checks symlink status without following; resolve() follows the symlink to its real target; readlink() returns the immediate target (a single hop, not transitively resolved). Create symlinks with symlink_to() and hardlinks with hardlink_to() (Python 3.10+).

python
from pathlib import Path

target = Path("/tmp/real_file.txt")
target.write_text("data")

link = Path("/tmp/alias.txt")
link.unlink(missing_ok=True)
link.symlink_to(target)

print(link.is_symlink())
print(link.readlink())          # one hop
print(link.resolve())           # full canonical target
print(link.read_text())         # symlink follows transparently

Output:

text
True
/tmp/real_file.txt
/tmp/real_file.txt
data
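
A dangling link is worth seeing once, because exists() and is_symlink() disagree about it. The sketch below assumes a POSIX host (creating symlinks on Windows may need extra privileges) and uses made-up names.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    link = Path(tmp) / "dangling"
    link.symlink_to(Path(tmp) / "no_such_target")  # target is never created

    # exists() follows the link and finds nothing; is_symlink() looks at the link
    state = (link.exists(), link.is_symlink())
    print(state)  # (False, True)
```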

Deleting files and directories

unlink() removes a single file or symlink; rmdir() removes an empty directory; shutil.rmtree(path) recursively removes a whole tree. None of these are reversible: there is no recycle bin. Use missing_ok=True (Python 3.8+) to suppress the FileNotFoundError for idempotent cleanup.

python
from pathlib import Path
import shutil

# Single file (idempotent)
Path("/tmp/old.log").unlink(missing_ok=True)

# Empty directory
Path("/tmp/emptydir").rmdir()

# Recursive
shutil.rmtree("/tmp/demo_project", ignore_errors=True)

shutil.rmtree is permanent and recursive. Pass ignore_errors=True only when you've already verified the path. A bug in path construction can wipe production data — always log the path first and use a confirmation in interactive scripts (click.confirm if you're using click).

Integration with os and shutil

Pathlib covers the high-frequency 95% of filesystem work. For the rest, os and shutil are still the right answer. Most standard-library functions accept a Path directly (it implements os.PathLike); for legacy APIs that insist on a string, pass str(p).

Task                          Module    Call
Copy file (with metadata)     shutil    shutil.copy2(src, dst)
Copy file tree                shutil    shutil.copytree(src, dst)
Move (cross-device-safe)      shutil    shutil.move(src, dst)
Recursive delete              shutil    shutil.rmtree(path)
Disk usage                    shutil    shutil.disk_usage(path)
Free space on filesystem      shutil    shutil.disk_usage(path).free
Temp directory                tempfile  tempfile.TemporaryDirectory()
Temp file                     tempfile  tempfile.NamedTemporaryFile(delete=False)
Get/change cwd                os        os.getcwd() / os.chdir(path)
File mode                     os        os.chmod(path, 0o644)
Path expansion                os.path   os.path.expandvars("$HOME/...")
Walk a tree                   os        os.walk(path) (or Path.walk() on 3.12+)

python
import shutil
from pathlib import Path

src  = Path("photo.jpg")
dst  = Path("/tmp/backup") / src.name
dst.parent.mkdir(parents=True, exist_ok=True)
shutil.copy2(src, dst)

# Walk for tree size
size = sum(f.stat().st_size for f in dst.parent.rglob("*") if f.is_file())
print(f"Tree size: {size} bytes")

Output:

text
Tree size: 245801 bytes

with_name, with_stem, with_suffix

These three methods return new Path objects with one component swapped — they do not mutate the original (pathlib paths are immutable). Use them when transforming filenames in bulk renames or output-path derivation.

python
from pathlib import Path

p = Path("/var/log/app.log.gz")

print(p.with_suffix(".zip"))    # last suffix only
print(p.with_stem("alice"))     # everything before the last suffix
print(p.with_name("new.txt"))   # name = stem + suffix

# Strip every suffix with a loop
q = Path("backup.tar.gz")
while q.suffix:
    q = q.with_suffix("")
print(q)

Output:

text
/var/log/app.log.zip
/var/log/alice.gz
/var/log/new.txt
backup

suffixes returns all suffixes: Path("backup.tar.gz").suffixes is ['.tar', '.gz'], which is useful for double-extension files. suffix is just the last one (.gz).
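
As an alternative to the loop, suffixes can be joined and sliced off the name in one step. The helper name strip_suffixes is hypothetical; this is a sketch, not a pathlib feature.

```python
from pathlib import PurePath


def strip_suffixes(p: PurePath) -> PurePath:
    """Return `p` with every suffix removed (backup.tar.gz -> backup)."""
    combined = "".join(p.suffixes)   # e.g. ".tar.gz"
    if not combined:
        return p                     # nothing to strip
    return p.with_name(p.name[: -len(combined)])


print(strip_suffixes(PurePath("backup.tar.gz")))          # backup
print(strip_suffixes(PurePath("archive/data.csv")).name)  # data
print(strip_suffixes(PurePath("README")))                 # README
```

Both this helper and the loop treat every dot-separated tail as a suffix, so a name such as tool-1.2.tar.gz is cut back to tool-1. Match against a known list of extensions if that matters.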

Comparing and sorting paths

Path objects implement __eq__ and __hash__ on their parsed components (case-folded for Windows paths), and they sort lexicographically. Comparing pure and concrete paths of the same flavour works as expected. Across flavours, PurePosixPath == PureWindowsPath is simply False, while ordering comparisons (<, >, sorted) raise TypeError.

python
from pathlib import Path

ps = [Path("b/2.txt"), Path("a/10.txt"), Path("a/2.txt")]
for p in sorted(ps):
    print(p)

Output:

text
a/10.txt
a/2.txt
b/2.txt

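Lexicographic order treats 10.txt as less than 2.txt. For natural ordering, sort with a key that compares digit runs as integers, or use the natsort library. Avoid keys that return an int for some names and a str for others: Python cannot order the two and raises TypeError.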
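
One possible natural-sort key without a third-party library, as a sketch: split the path on digit runs and turn those runs into integers. The name natural_key is an assumption for this example.

```python
import re
from pathlib import Path


def natural_key(p: Path):
    """Sort key that compares digit runs as integers, text case-insensitively."""
    pieces = re.split(r"(\d+)", p.as_posix())
    # re.split with a capture group alternates text, digits, text, ...
    # so odd positions are always digit runs and types line up across keys
    return [int(s) if i % 2 else s.lower() for i, s in enumerate(pieces)]


ps = [Path("b/2.txt"), Path("a/10.txt"), Path("a/2.txt")]
print([p.as_posix() for p in sorted(ps, key=natural_key)])
# ['a/2.txt', 'a/10.txt', 'b/2.txt']
```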

Relative paths and is_relative_to

p.relative_to(base) returns the path of p relative to base, raising ValueError if p is not under base. is_relative_to(base) (Python 3.9+) is the boolean form — no exception, just True/False.

python
from pathlib import Path

root = Path("/srv/myapp")
log  = Path("/srv/myapp/logs/2026/app.log")

print(log.relative_to(root))
print(log.is_relative_to(root))
print(log.is_relative_to("/etc"))

Output:

text
logs/2026/app.log
True
False

This is the standard guard against directory-traversal attacks: validate that user-supplied paths resolve to inside an allowed base before opening them.

python
from pathlib import Path

UPLOAD_ROOT = Path("/srv/uploads").resolve()

def safe_read(rel: str) -> str:
    p = (UPLOAD_ROOT / rel).resolve()
    if not p.is_relative_to(UPLOAD_ROOT):
        raise PermissionError(f"refusing to access {p}")
    return p.read_text()

Common pitfalls

Path is immutable — methods like with_suffix, with_name, and joinpath return new objects. p.with_suffix(".bak") alone does nothing; assign the result: p = p.with_suffix(".bak").

exists() follows symlinks — a broken symlink returns False. Use p.is_symlink() (does not follow) to detect the symlink itself, even if its target is missing.

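Path("") is the current directory: it is equal to Path("."), and str(Path("")) is ".". An empty string from a config file or user input therefore silently becomes the working directory. Validate such values before constructing a Path, and write Path.cwd() or Path(".") when you mean the current directory.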

rename can clobber silently on POSIX — if the destination exists, POSIX rename(2) overwrites it. Use replace() for explicit overwrite or check existence first.

p.stat() raises FileNotFoundError for missing paths — there is no "soft" stat. Either check exists() first or wrap in try/except FileNotFoundError.

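rglob does not follow symlinks to directories when expanding ** by default. Python 3.13 added recurse_symlinks=True to glob() and rglob() to opt in; with it enabled, symlink cycles become your problem, so track visited directories or filter out symlinks manually.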
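
For the missing-path stat() pitfall above, one possible shape for a "soft" stat is a small wrapper that turns FileNotFoundError into None. The name size_or_none is hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Optional


def size_or_none(p: Path) -> Optional[int]:
    """Return the file size in bytes, or None if the path does not exist."""
    try:
        return p.stat().st_size
    except FileNotFoundError:
        return None


with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp) / "present.txt"
    present.write_bytes(b"12345")

    sizes = (size_or_none(present), size_or_none(Path(tmp) / "absent.txt"))
    print(sizes)  # (5, None)
```

Catching the exception is usually safer than calling exists() first, because the file can disappear between the check and the stat() call.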

Real-world recipes

Recursive size of a directory

Summing stat().st_size across rglob("*") is the pathlib idiom for "how big is this directory?". Filter is_file() to skip the directory entries themselves.

python
from pathlib import Path

def tree_size(root: Path) -> int:
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())

print(f"{tree_size(Path('/tmp/demo_project'))} bytes")

Output:

text
40 bytes

Batch rename — date-prefix every file

Walk a directory, build the new name with with_name, and call rename. Always print the planned rename first in dry-run mode before applying.

python
from pathlib import Path
from datetime import datetime

def date_prefix(root: Path, dry_run=True):
    today = datetime.now().strftime("%Y-%m-%d")
    for p in sorted(root.iterdir()):
        if p.is_file() and not p.name.startswith(today):
            target = p.with_name(f"{today}_{p.name}")
            print(f"{'DRY' if dry_run else 'MV '} {p} -> {target}")
            if not dry_run:
                p.rename(target)

date_prefix(Path("/tmp/inbox"), dry_run=True)

Output:

text
DRY /tmp/inbox/notes.md -> /tmp/inbox/2026-05-25_notes.md
DRY /tmp/inbox/photo.jpg -> /tmp/inbox/2026-05-25_photo.jpg

Find the largest N files under a tree

Stream (size, path) tuples from rglob into heapq.nlargest for an O(n log k) top-K that never holds the full file list in memory.

python
import heapq
from pathlib import Path

def largest(root: Path, n: int = 10):
    files = ((p.stat().st_size, p) for p in root.rglob("*") if p.is_file())
    for size, p in heapq.nlargest(n, files, key=lambda pair: pair[0]):
        print(f"{size:>10}  {p}")

largest(Path.home(), n=5)

Output:

text
  10240000  /home/alice/Videos/clip.mp4
   2456789  /home/alice/Downloads/dataset.csv
    893456  /home/alice/Pictures/photo.jpg
    102400  /home/alice/notes.tar.gz
     45120  /home/alice/.bash_history

Atomic config write

Write to a temp sibling and replace the target. This guarantees readers never see a partially written file. Pair with os.fsync if you need durability across power loss.

python
from pathlib import Path
import json, os

def write_atomic(path: Path, data: dict) -> None:
    tmp = path.with_suffix(path.suffix + ".tmp")
    with tmp.open("w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)
        f.flush()
        os.fsync(f.fileno())
    tmp.replace(path)

write_atomic(Path("/tmp/config.json"), {"host": "myhost", "port": 8080})

Find git repos under a tree

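Recursively locate every directory containing a .git subdirectory. rglob(".git") is the shortest way to write this, but it cannot prune: it still walks into every repository, including the .git directory itself. On large trees, a walk that stops descending once .git is found is faster.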

python
from pathlib import Path

def find_repos(root: Path):
    for p in root.rglob(".git"):
        if p.is_dir():
            yield p.parent

for repo in find_repos(Path.home() / "Code"):
    print(repo)

Output:

text
/home/alice/Code/jockey
/home/alice/Code/dotfiles
/home/alice/Code/notes
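
A pruning variant might look like the following sketch, which uses os.walk and clears the directory list as soon as a repository is found. The function name find_repos_pruned and the directory layout are made up for the demo.

```python
import os
import tempfile
from pathlib import Path


def find_repos_pruned(root: Path):
    """Yield each directory under `root` that contains a .git directory."""
    for dirpath, dirnames, _ in os.walk(root):
        if ".git" in dirnames:
            yield Path(dirpath)
            dirnames.clear()  # repo found: do not descend into it at all


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "alpha" / ".git").mkdir(parents=True)
    (root / "alpha" / "vendor" / "nested" / ".git").mkdir(parents=True)
    (root / "misc" / "beta" / ".git").mkdir(parents=True)

    repos = sorted(p.relative_to(root).as_posix() for p in find_repos_pruned(root))
    print(repos)  # ['alpha', 'misc/beta']
```

The trade-off is that repositories nested inside another repository (vendored checkouts, submodules) are not reported, as the skipped alpha/vendor/nested entry shows.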