Bundle directory trees into a single archive file with optional compression. Covers create/extract/list, gzip/bzip2/xz formats, exclusions, transforms, and incremental backups.
tar — Archive & Compress
What it is
tar (short for tape archive) is a standard Unix utility that bundles files and directories into a single stream. It was designed for magnetic tape and is now used universally for distributing source code, backups, and Docker image layers. GNU tar (current stable 1.35, released July 2023) is the implementation shipped on most Linux distributions and is the reference for the flags below; BSD tar (used on macOS and FreeBSD, built on libarchive 3.8.7, April 2026) is broadly compatible but lacks a few GNU-specific options like --listed-incremental. Reach for tar whenever you need to preserve a directory tree as one file, especially when compression is desired; for one-way directory replication across machines, rsync is usually a better choice.
Install
tar is preinstalled on every mainstream Linux distribution and on macOS. The package only needs to be installed manually on a minimal container image.
# Debian/Ubuntu (already present on full installs)
sudo apt install tar
# Fedora/RHEL
sudo dnf install tar
# Alpine (containers)
apk add tar
# macOS — GNU tar via Homebrew (installed as `gtar`)
brew install gnu-tar
Output: (none — exits 0 on success)
Syntax
tar takes a one-letter operation mode (c, x, t, u, r) plus modifier flags, then a list of paths. The flags can be passed with or without a leading dash, which is why you often see the historic compact form czvf rather than -c -z -v -f.
tar [OPERATION][OPTIONS] -f ARCHIVE [PATH...]
tar [OPERATION][OPTIONS] f ARCHIVE [PATH...] # historic, no dash
Essential operations & flags
The operation letter chooses what tar does; the compression letter chooses the codec; -f always names the archive file (use - for stdin/stdout).
| Flag | Meaning |
|---|---|
| -c | Create a new archive |
| -x | Extract files from archive |
| -t | List archive contents |
| -u | Update archive (append newer copies) |
| -r | Append files to an uncompressed archive |
| -f FILE | Use archive FILE (- for stdin/stdout) |
| -v | Verbose: print each file as it's processed |
| -z | gzip compression (.tar.gz / .tgz) |
| -j | bzip2 compression (.tar.bz2) |
| -J | xz compression (.tar.xz) |
| --zstd | zstd compression (.tar.zst) |
| -a | Auto-detect compression from -f extension |
| -C DIR | Change to DIR before operating |
| -p | Preserve permissions on extract (default for root) |
| --strip-components=N | Drop N leading path components on extract |
| --exclude=GLOB | Skip paths matching GLOB |
| --transform=SED | Rewrite paths with a sed expression |
| --listed-incremental=SNAR | Incremental backup using snapshot file |
The classic mnemonics
The three most common invocations are easy to remember as four-letter flag strings: create-zip-verbose-file, extract-zip-verbose-file, list-zip-verbose-file. Read each letter left-to-right and the meaning falls out.
tar czvf project.tar.gz project/ # Create gzip archive
tar xzvf project.tar.gz # eXtract gzip archive
tar tzvf project.tar.gz # lisT gzip archive (no extract)
Output (tar czvf project.tar.gz project/):
project/
project/README.md
project/src/
project/src/main.py
project/src/utils.py
project/tests/test_main.py
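The three mnemonics can be exercised as one round trip. This is a minimal sketch, assuming GNU tar and gzip are installed; the scratch directory and file names are invented for illustration.

```shell
set -eu
work=$(mktemp -d)                      # scratch area; every name below is made up
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/project/src"
echo 'print("hi")' > "$work/project/src/main.py"
echo '# demo'      > "$work/project/README.md"

cd "$work"
tar czf project.tar.gz project/        # c = create, z = gzip, f = archive name
listing=$(tar tzf project.tar.gz)      # t = list; nothing is written to disk
mkdir restore
tar xzf project.tar.gz -C restore      # x = extract, here into restore/

echo "$listing"
roundtrip=failed
diff -r project restore/project && roundtrip=ok   # compare source and restored tree
echo "round trip: $roundtrip"
```

The -v flag is left out so that only the explicit listing is printed; add it back to watch each member as it is processed.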
Choosing a compression format
tar itself does not compress — it pipes through an external compressor selected by a flag. The trade-off is speed vs. ratio: gzip is fast and universally readable, bzip2 is slower with better ratio, xz is slowest with the best ratio, and zstd rivals xz for size at gzip-class speed. As of 2026, zstd is the recommended default for new archives — it has native multi-threading, ships in every mainstream distro, and is used by Btrfs, Docker, and most package managers.
| Flag | Tool | Extension | Speed | Ratio |
|---|---|---|---|---|
| --zstd | zstd | .tar.zst | very fast | high |
| -z | gzip | .tar.gz / .tgz | fast | low |
| -j | bzip2 | .tar.bz2 | medium | medium |
| -J | xz | .tar.xz | slow | highest |
| (none) | none | .tar | fastest | none |
tar --zstd -cf site.tar.zst site/ # zstd (recommended)
tar czf site.tar.gz site/ # gzip (legacy / max portability)
tar cjf site.tar.bz2 site/ # bzip2
tar cJf site.tar.xz site/ # xz (smallest, slowest)
tar caf site.tar.gz site/ # auto from extension
Output: (none — exits 0 on success)
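To see the trade-off on your own data, one option is to build the same tree with every codec that happens to be installed and compare sizes. The sketch below assumes GNU tar and gzip; the other compressors are skipped when missing, and the sample data is synthetic and highly compressible, so real ratios will differ.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/site"
# Synthetic, very repetitive sample content (hypothetical data)
yes 'lorem ipsum dolor sit amet' | head -n 20000 > "$work/site/page.txt"

cd "$work"
tar cf site.tar site/                              # uncompressed baseline
for pair in gz:gzip bz2:bzip2 xz:xz zst:zstd; do
    ext=${pair%%:*}
    tool=${pair##*:}
    command -v "$tool" >/dev/null 2>&1 || continue # codec not installed, skip it
    # -a picks the compressor from the archive suffix; drop the file if tar is too old for it
    tar caf "site.tar.$ext" site/ 2>/dev/null || rm -f "site.tar.$ext"
done

plain=$(wc -c < site.tar)
gz=$(wc -c < site.tar.gz)
wc -c site.tar*                                    # byte count per archive
```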
Parallel / multi-threaded compression
Single-threaded compression is often the bottleneck on a modern multi-core CPU. zstd has built-in threading; gzip and xz get parallelism from drop-in replacements (pigz, pixz). Use --use-compress-program (or the shorthand -I) to swap the codec, and pass -T0 to zstd to use every core.
# zstd with all CPU cores (fastest, modern default)
tar -cf site.tar.zst -I 'zstd -T0 -19' site/
# pzstd — parallel zstd front-end
tar -cf site.tar.zst -I pzstd site/
# pigz — parallel gzip, drop-in for .tar.gz
tar -cf site.tar.gz -I pigz site/
# pixz — parallel xz, also enables tar-aware indexed extraction
tar -cf site.tar.xz -I pixz site/
Output: (none — exits 0 on success)
Decompression of zstd is single-threaded by design; threading only helps on the create side. For random access into a huge archive, pixz writes a per-file index so that pixz -x can seek directly to one entry without scanning the whole stream.
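Scripts that must run on hosts with and without the parallel tools can pick the compressor at run time. A minimal sketch, assuming GNU tar; it falls back to plain gzip when pigz is not installed, and the site/ directory is a made-up example.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/site"
echo '<h1>hi</h1>' > "$work/site/index.html"
cd "$work"

# Prefer pigz when present; gzip writes the same format on a single core.
if command -v pigz >/dev/null 2>&1; then comp=pigz; else comp=gzip; fi

tar -I "$comp" -cf site.tar.gz site/    # -I is short for --use-compress-program
entries=$(tar -tzf site.tar.gz)         # any gzip-compatible reader can list it
echo "compressed with $comp:"
echo "$entries"
```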
Creating archives
-c writes a new archive; pair it with -f to name the output file and an optional compression flag. Pass directories or files as positional arguments; tar recurses into directories by default.
# Single directory
tar czvf backup.tar.gz /home/alice/notes
# Multiple paths
tar czvf bundle.tar.gz /home/alice/notes /home/alice/photos
# Use -C to avoid embedding absolute paths
tar -C /home/alice -czvf notes.tar.gz notes
Output (tar -C /home/alice -czvf notes.tar.gz notes):
notes/
notes/2026-05-01.md
notes/2026-05-15.md
notes/inbox/
notes/inbox/draft.md
Always test new archives with tar tzvf archive.tar.gz | head before deleting the source: a typo in the create command can produce an empty archive without error.
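The effect of -C on stored member names is easiest to see side by side. The sketch below assumes GNU tar and uses a throwaway directory that mimics /home/alice; none of these paths are real.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/home/alice/notes"
echo 'todo' > "$work/home/alice/notes/inbox.md"

# Absolute path: the whole path (minus the leading /) ends up in every member name.
tar -czf "$work/abs.tar.gz" "$work/home/alice/notes" 2>/dev/null
# With -C tar changes directory first, so members start at notes/.
tar -C "$work/home/alice" -czf "$work/rel.tar.gz" notes

abs_first=$(tar -tzf "$work/abs.tar.gz" | head -n 1)
rel_first=$(tar -tzf "$work/rel.tar.gz" | head -n 1)
echo "absolute: $abs_first"
echo "relative: $rel_first"
```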
Extracting archives
-x unpacks an archive into the current directory unless -C DIR redirects it. By default existing files are overwritten; use --keep-old-files or --skip-old-files to change that, and --strip-components=N when you want to flatten away wrapper directories.
# Extract here
tar xzvf project.tar.gz
# Extract into another directory
mkdir -p /tmp/restore
tar xzvf project.tar.gz -C /tmp/restore
# Drop the top-level "project/" wrapper
tar xzvf project.tar.gz --strip-components=1
# Only one file/dir
tar xzvf project.tar.gz project/src/main.py
Output (tar xzvf project.tar.gz --strip-components=1):
README.md
src/
src/main.py
src/utils.py
tests/test_main.py
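The overwrite behaviour can be checked on a scratch tree. This is a sketch assuming GNU tar, where --keep-old-files treats an existing file as an error; the project layout is invented.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/project/src" "$work/restore"
echo 'x = 1' > "$work/project/src/main.py"
tar -C "$work" -czf "$work/project.tar.gz" project

# --strip-components=1 removes the leading "project/" from every member.
tar -xzf "$work/project.tar.gz" -C "$work/restore" --strip-components=1

# A second extraction with --keep-old-files refuses to replace src/main.py.
if tar -xzf "$work/project.tar.gz" -C "$work/restore" \
       --strip-components=1 --keep-old-files 2>/dev/null
then overwrite=allowed
else overwrite=refused
fi
echo "second extraction: $overwrite"
```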
Listing without extracting
-t prints the archive table-of-contents without writing any files — the cheapest way to confirm what's inside before extracting. Combine with -v for the long-format listing (permissions, owner, size, mtime).
tar tzf archive.tar.gz # filenames only
tar tzvf archive.tar.gz # long listing
tar tzf archive.tar.gz | wc -l # how many entries
tar tzf archive.tar.gz | grep '\.py$' # filter listing (dot escaped, anchored at end)
Output (tar tzvf archive.tar.gz):
drwxr-xr-x alice/alice 0 2026-05-24 10:00 project/
-rw-r--r-- alice/alice 1024 2026-05-24 10:01 project/README.md
drwxr-xr-x alice/alice 0 2026-05-24 10:01 project/src/
-rw-r--r-- alice/alice 4096 2026-05-24 10:02 project/src/main.py
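The long listing is regular enough to parse with awk. The sketch below sums the size column for regular files; it assumes GNU tar's column order (mode, owner/group, size, date, time, name) and owner names without spaces, and the two sample files are made up.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/project"
printf 'hello'      > "$work/project/a.txt"   # 5 bytes
printf '0123456789' > "$work/project/b.txt"   # 10 bytes
tar -C "$work" -czf "$work/archive.tar.gz" project

# Lines whose mode starts with "-" are regular files; field 3 is the size in bytes.
total=$(tar -tzvf "$work/archive.tar.gz" \
        | awk '$1 ~ /^-/ { sum += $3 } END { print sum + 0 }')
echo "regular-file bytes: $total"
```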
Excluding files
--exclude=GLOB skips any path matching the shell glob; pass it multiple times for several patterns, or list patterns in a file with --exclude-from=FILE. Put the flag before the path arguments it applies to: depending on the tar version, an --exclude that follows them may be ignored.
# Skip caches and node_modules
tar czvf src.tar.gz \
--exclude='*.pyc' \
--exclude='__pycache__' \
--exclude='node_modules' \
/home/alice/project
# Many patterns from a file
cat > /tmp/skip.txt <<'EOF'
*.log
.git
build/
dist/
EOF
tar czvf src.tar.gz --exclude-from=/tmp/skip.txt /home/alice/project
# Anchor a pattern to the archive root
tar czvf src.tar.gz --exclude='./tmp/*' .
Output: (none — exits 0 on success)
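Listing the result is a quick way to confirm that the patterns matched what you intended. A minimal sketch, assuming GNU tar; the project tree and its cache files are fabricated for the demonstration.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/project/__pycache__"
echo 'print(1)' > "$work/project/app.py"
: > "$work/project/app.pyc"
: > "$work/project/__pycache__/app.cpython-312.pyc"

# Exclusions come before the path they apply to.
tar -C "$work" -czf "$work/src.tar.gz" \
    --exclude='*.pyc' --exclude='__pycache__' \
    project

kept=$(tar -tzf "$work/src.tar.gz")
echo "$kept"
```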
Path transforms
--transform=EXPR rewrites every stored path with a sed substitution as the archive is written or extracted. This is the cleanest way to rename a top-level directory without first renaming on disk, or to drop a prefix on extract.
# Store project/ as project-2026-05-24/ inside the archive
tar czvf release.tar.gz \
--transform='s,^project,project-2026-05-24,' \
project/
# Strip a leading "src/" on extract
tar xzvf release.tar.gz --transform='s,^src/,,'
# Rename everything to lowercase on extract
tar xzvf docs.tar.gz --transform='s/.*/\L&/'
Output: (none — exits 0 on success)
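Because the rewrite happens while the archive is written, a plain listing shows the new names. The sketch assumes GNU tar (--transform is a GNU option); the version label project-1.0 is a hypothetical example.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/project"
echo 'v1' > "$work/project/README.md"

# Store project/ under a versioned name without renaming anything on disk.
tar -C "$work" -czf "$work/release.tar.gz" \
    --transform='s,^project,project-1.0,' \
    project

names=$(tar -tzf "$work/release.tar.gz")
echo "$names"
```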
Piped extraction (download + untar)
tar reads from stdin when the archive is given as -f -, which makes it natural to pipe a download straight into extraction without ever writing the .tar.gz to disk. Many builds also default to stdin when -f is omitted, but that default is chosen at compile time, so spelling out -f - is the safer habit. This is the canonical way to install upstream tarballs.
# Download and extract in one shot (curl)
curl -fsSL https://example.com/release-1.4.0.tar.gz | tar xzf -
# Same with wget
wget -qO- https://example.com/release-1.4.0.tar.gz | tar xzf -
# Strip the top-level directory on the fly
curl -fsSL https://example.com/release-1.4.0.tar.gz \
| tar xzf - --strip-components=1 -C /opt/myapp
# Extract one specific file out of a remote archive
curl -fsSL https://example.com/release-1.4.0.tar.gz \
| tar xzf - -O release/CHANGELOG.md
Output: (none — exits 0 on success)
-O ("to stdout") prints the named entry to stdout instead of writing it to disk. Combine with curl | tar xzf - -O ... to peek inside a remote archive without staging anything locally.
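The same pipeline can be rehearsed offline before pointing it at a real URL. In this sketch cat stands in for curl or wget, and the release/ tree and changelog text are invented; GNU tar and gzip are assumed.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/release" "$work/app"
echo 'v1.4.0: first release' > "$work/release/CHANGELOG.md"
tar -C "$work" -czf "$work/release.tar.gz" release

# cat plays the role of the downloader so the example needs no network.
changelog=$(cat "$work/release.tar.gz" | tar -xzf - -O release/CHANGELOG.md)
echo "$changelog"

# Same stream, unpacked without its top-level directory.
cat "$work/release.tar.gz" | tar -xzf - --strip-components=1 -C "$work/app"
```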
Incremental backups
--listed-incremental=SNAR produces a backup that contains only files changed since the previous run, using the snapshot file SNAR to track state. The first run with an empty/missing snapshot creates a full archive (level 0); subsequent runs create incrementals (level 1, 2, …). Restoring requires extracting every archive in order. This is a GNU tar feature — BSD tar does not support it.
# Level 0 — full backup, snapshot is created
tar -cvf /backups/alice-full.tar \
--listed-incremental=/backups/alice.snar \
/home/alice
# Level 1 — only files changed since the level 0
tar -cvf /backups/alice-2026-05-25.tar \
--listed-incremental=/backups/alice.snar \
/home/alice
# Restore: extract them in order
cd /
tar -xvf /backups/alice-full.tar --listed-incremental=/dev/null
tar -xvf /backups/alice-2026-05-25.tar --listed-incremental=/dev/null
Output (tar -cvf ... --listed-incremental=... on a level 1):
tar: /home/alice: Directory is new
home/alice/notes/2026-05-25.md
home/alice/photos/IMG_2026.jpg
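A small end-to-end run shows what a level 1 archive actually contains. This sketch requires GNU tar; the data/ tree and the one-second pause (added only to keep timestamps clearly apart) are choices made for the demonstration.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/data" "$work/restore"
echo one > "$work/data/a.txt"
cd "$work"

tar -cf full.tar --listed-incremental=state.snar data   # level 0: everything
sleep 1
echo two > data/b.txt
tar -cf incr.tar --listed-incremental=state.snar data   # level 1: changes only

level1=$(tar -tf incr.tar)
echo "$level1"

# Restore in order; /dev/null applies the incremental metadata without recording state.
tar -xf full.tar -C restore --listed-incremental=/dev/null
tar -xf incr.tar -C restore --listed-incremental=/dev/null
ls restore/data
```

Note that the same snapshot file is updated by each run, so keep a copy of it if you want several level 1 archives relative to the same full backup.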
Preserving (or stripping) ownership
By default GNU tar stores the numeric and named uid/gid of every file; on extract it preserves them when run as root and falls back to the extracting user otherwise. Use --owner, --group, and --numeric-owner to override this behaviour, which is critical when archives move between machines with different user databases.
# Store numeric uid/gid only (portable across hosts)
tar czvf backup.tar.gz --numeric-owner /home/alice
# Force all files to be owned by uid 1000:1000 in the archive
tar czvf release.tar.gz --owner=1000 --group=1000 dist/
# On extract, force ownership to the current user
tar xzvf backup.tar.gz --no-same-owner
Output: (none — exits 0 on success)
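The stored ownership can be inspected from the long listing. A minimal sketch, assuming GNU tar; it forces uid and gid 0 as an example and reads back the owner column, and the dist/ directory is hypothetical.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/dist"
echo 'binary placeholder' > "$work/dist/app"

# Record every member as 0:0 regardless of who owns the files on disk.
tar -C "$work" -czf "$work/release.tar.gz" \
    --owner=0 --group=0 --numeric-owner dist

# Field 2 of the long listing is owner/group; --numeric-owner prints ids, not names.
owners=$(tar -tzvf "$work/release.tar.gz" --numeric-owner \
         | awk '{ print $2 }' | sort -u)
echo "owners in archive: $owners"
```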
Verifying archives
-W (--verify) re-reads the archive after writing to confirm it matches the source — only valid when the archive lives on a seekable medium (file), not a pipe. Combine with a separate sha256sum step for end-to-end checksum coverage.
# Verify on write (uncompressed only)
tar -cvf archive.tar -W project/
# Verify against original tree
tar dzf archive.tar.gz # diff archive vs filesystem
# Checksum the archive
sha256sum archive.tar.gz > archive.tar.gz.sha256
sha256sum -c archive.tar.gz.sha256
Output (sha256sum -c archive.tar.gz.sha256):
archive.tar.gz: OK
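The checksum and the diff mode answer different questions: the first detects a damaged archive, the second detects a source tree that has drifted from it. The sketch below assumes GNU tar and the coreutils sha256sum (on macOS the equivalent is shasum -a 256); file names are invented.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/project"
echo 'stable' > "$work/project/file.txt"
cd "$work"

tar -czf archive.tar.gz project
sha256sum archive.tar.gz > archive.tar.gz.sha256
sha256sum -c archive.tar.gz.sha256        # archive bytes are intact

before=differs
tar -dzf archive.tar.gz && before=match   # archive and tree agree

echo 'changed-content' > project/file.txt # tree drifts after the backup
rc=0
tar -dzf archive.tar.gz >/dev/null 2>&1 || rc=$?
echo "before: $before, diff exit code after change: $rc"
```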
Common pitfalls
- Absolute paths leak into the archive: tar czvf x.tar.gz /home/alice/project stores entries as home/alice/project/.... Use -C /home/alice -czvf x.tar.gz project to anchor the archive at project/.
- --exclude placement: put it before the path arguments; depending on the tar version, an --exclude that follows them may not be applied.
- -z on a plain .tar: passing -z to a non-gzipped archive returns gzip: stdin: not in gzip format. Omit the codec flag when reading; GNU tar detects the compression of an archive file on its own (-a only chooses a codec when creating).
- Ownership lost on extract: a non-root user cannot restore foreign uid/gid, so extracted files end up owned by the extracting user. Either run as root or use --no-same-owner deliberately and remap.
- Forgetting -f: without -f, GNU tar falls back to the TAPE environment variable or a compiled-in default (stdin/stdout on many builds, a tape device on others). Always specify -f ARCHIVE.
- Symlinks: by default tar archives the link itself, not its target. Use -h (--dereference) to follow links and store the target file.
Real-world recipes
Timestamped directory snapshot
A one-liner that captures a directory into a date-stamped archive, ideal for ad-hoc backups before a risky change.
DIR=/home/alice/project
STAMP=$(date +%Y-%m-%d_%H%M%S)
tar -C "$(dirname "$DIR")" -czvf "${DIR##*/}-${STAMP}.tar.gz" "$(basename "$DIR")"
Output:
project/
project/README.md
project/src/main.py
project/tests/test_main.py
Copy a directory tree preserving permissions
tar piped into another tar is the classic technique for replicating a directory with all metadata intact when cp -a is not enough (e.g. across filesystems or into a remote shell).
# Local copy
tar -C /home/alice/src -cf - . | tar -C /home/alice/dst -xpvf -
# Remote copy over SSH
tar -C /home/alice/src -cf - . \
| ssh alicedev@myhost 'tar -C /opt/app -xpvf -'
Output: (none — exits 0 on success)
Extract a single file from a huge archive
When you only need one file out of a multi-gigabyte backup, name it as a positional argument so tar stops after finding it (with --occurrence=1).
tar xzvf backup.tar.gz --occurrence=1 home/alice/notes/2026-05-15.md
Output:
home/alice/notes/2026-05-15.md
Stream a directory to a remote host (no temp file)
Combine tar and ssh to move data without ever writing a .tar.gz to disk on either side — useful when local disk is tight.
tar -C /home/alice/photos -czf - . \
| ssh alicedev@myhost 'cat > /backups/photos-$(date +%F).tar.gz'
Output: (none — exits 0 on success)
Split a giant archive into chunks
split cuts a huge archive into fixed-size chunks for transfer over flaky networks or services with file-size caps. Reassemble with cat then pipe into tar.
# Split into 2 GiB chunks
tar czvf - /home/alice | split -b 2G - bigbackup.tar.gz.
# Reassemble and extract
cat bigbackup.tar.gz.* | tar xzvf -
Output: (none — exits 0 on success)
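The split and reassemble steps can be tried on a small payload first. In this sketch the chunk size is 100 KiB instead of 2 GiB and the data is random bytes, both chosen only to keep the run short; GNU tar and gzip are assumed.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/data" "$work/restore"
head -c 300000 /dev/urandom > "$work/data/blob.bin"   # incompressible sample payload
cd "$work"

# Stream the archive into split; chunks are named big.tar.gz.aa, .ab, ...
tar -czf - data | split -b 100k - big.tar.gz.
parts=$(ls big.tar.gz.* | wc -l)
echo "chunks written: $parts"

# The shell expands the glob in sorted order, so cat restores the original stream.
cat big.tar.gz.* | tar -xzf - -C restore
same=no
cmp data/blob.bin restore/data/blob.bin && same=yes
echo "payload identical: $same"
```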
Build a reproducible release tarball
--sort=name, --mtime, --owner, and --group together produce byte-identical archives across runs, which makes sha256sum comparisons meaningful for releases.
tar --sort=name \
--mtime='2026-05-24 00:00:00 UTC' \
--owner=0 --group=0 --numeric-owner \
-czvf myapp-1.4.0.tar.gz \
-C dist myapp-1.4.0
Output:
myapp-1.4.0/
myapp-1.4.0/bin/myapp
myapp-1.4.0/share/man/myapp.1
myapp-1.4.0/README.md
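One way to check reproducibility is to build twice and compare digests. The sketch below is an assumption-laden variant of the recipe: it pipes through gzip -n so that no timestamp lands in the gzip header, pins --format=gnu so the archive format does not depend on the build's default, and uses a hypothetical build helper. It needs GNU tar 1.28 or later for --sort and the coreutils sha256sum.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/dist/myapp-1.4.0"
echo 'hello' > "$work/dist/myapp-1.4.0/README.md"

build() {   # hypothetical helper: $1 is the output file
    tar --format=gnu --sort=name \
        --mtime='2026-05-24 00:00:00 UTC' \
        --owner=0 --group=0 --numeric-owner \
        -C "$work/dist" -cf - myapp-1.4.0 | gzip -n > "$1"
}

build "$work/a.tar.gz"
touch "$work/dist/myapp-1.4.0/README.md"   # change the on-disk mtime between builds
build "$work/b.tar.gz"

sum_a=$(sha256sum < "$work/a.tar.gz")
sum_b=$(sha256sum < "$work/b.tar.gz")
echo "$sum_a"
echo "$sum_b"
```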
TAR_OPTIONS environment variable
tar has no traditional config file, but GNU tar reads the TAR_OPTIONS environment variable on every invocation and prepends its contents to the command line. This is the canonical way to pin a default behaviour (codec, owner remap, verbosity) across a shell session or systemd unit without rewriting every script.
# Always use zstd and numeric owners in this shell
export TAR_OPTIONS='--zstd --numeric-owner'
tar -cf release.tar.zst dist/ # picks up both flags automatically
# One-off override: empty TAR_OPTIONS for this command
TAR_OPTIONS= tar czf legacy.tar.gz dist/
Output: (none — exits 0 on success)
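TAR_OPTIONS applies to every tar call that sees it, including listings, so it is worth scoping it to a single command when experimenting. A minimal sketch, assuming GNU tar; the dist/ contents are made up.

```shell
set -eu
unset TAR_OPTIONS || true               # start from a clean environment
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/dist"
echo 'app'   > "$work/dist/app.txt"
echo 'noise' > "$work/dist/debug.log"
cd "$work"

# The variable is set for this one invocation only.
TAR_OPTIONS='--exclude=*.log' tar -czf filtered.tar.gz dist
tar -czf plain.tar.gz dist              # no TAR_OPTIONS: nothing is excluded

filtered=$(tar -tzf filtered.tar.gz)
plain=$(tar -tzf plain.tar.gz)
echo "with TAR_OPTIONS:";    echo "$filtered"
echo "without TAR_OPTIONS:"; echo "$plain"
```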
Modern alternatives
GNU tar is intentionally stable, but a few newer tools are worth knowing for specific workflows. bsdtar is the default on macOS and FreeBSD and is a near drop-in replacement; ouch is a Rust-based front-end that auto-detects format from filename and parallelises compression; pigz / pzstd / pixz give you per-codec parallelism without leaving the tar ecosystem.
| Tool | When to reach for it |
|---|---|
| bsdtar (libarchive) | Cross-platform scripts; reads/writes tar, zip, 7z, iso, cpio from one CLI |
| ouch | Friendly CLI: ouch compress src/ out.tar.zst; no flag soup, parallel by default |
| pzstd / pigz / pixz | Drop-in parallel compressors used via tar -I |
| zpaq | Deduplicating, versioned archives for long-term backups (very slow, very small) |
# bsdtar — same flags as GNU tar for the common cases
bsdtar -cf site.tar.zst --zstd site/
bsdtar -xf site.zip # also handles zip natively
bsdtar -cf site.7z --format=7zip site/ # write 7z without a separate tool
# ouch — format inferred from extension
ouch compress src/ release.tar.zst
ouch decompress release.tar.zst
ouch list release.tar.zst
Output: (none — exits 0 on success)
Exit codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Some files differ (e.g. -d diff mode) |
| 2 | Fatal error |
tar -tf archive.tar.gz | xargs -d'\n' -I{} stat -c '%s %n' {} is not a thing: tar does not stat the underlying files at list time. Use tar tzvf and parse the long listing if you need sizes, or stream the contents with tar xzf archive.tar.gz -O | wc -c to get the total uncompressed size.
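In scripts the exit code is usually captured and branched on instead of letting the shell abort. The sketch below provokes a fatal error by listing an archive that does not exist; it assumes GNU tar's exit codes as listed in the table, and other implementations may return different values.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

rc=0
tar -tf "$work/does-not-exist.tar" 2>/dev/null || rc=$?   # capture instead of aborting

case $rc in
    0) status='success' ;;
    1) status='some files differ' ;;
    *) status='fatal error' ;;
esac
echo "tar exited with $rc: $status"
```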
Sources
- GNU tar 1.35 manual
- GNU tar 1.35 release announcement (info-gnu, July 2023)
- GNU tar — Appendix A: Changes
- libarchive / bsdtar project
- libarchive on GitHub (3.8.x releases)
- ouch — Painless compression and decompression in the terminal
- Speed up tar archiving with zstd and LZ4 (Transloadit)
- Multicore compress and decompress with tar and pzstd
- Arch Wiki — Archiving and compression