Bundle directory trees into a single archive file with optional compression. Covers create/extract/list, gzip/bzip2/xz formats, exclusions, transforms, and incremental backups.

tar — Archive & Compress

What it is

tar (short for tape archiver) is a POSIX utility that bundles files and directories into a single stream — originally for magnetic tape, now used universally for distributing source code, backups, and Docker image layers. GNU tar (current stable 1.35, released August 2023) is the implementation shipped on most Linux distributions and is the reference for the flags below; BSD tar (used on macOS and FreeBSD, built on libarchive 3.8.7, April 2026) is broadly compatible but lacks a few GNU-specific options like --listed-incremental. Reach for tar whenever you need to preserve a directory tree as one file, especially when compression is desired; for one-way directory replication across machines, rsync is usually a better choice.

Install

tar is preinstalled on every mainstream Linux distribution and on macOS. The package only needs to be installed manually on a minimal container image.

bash
# Debian/Ubuntu (already present on full installs)
sudo apt install tar

# Fedora/RHEL
sudo dnf install tar

# Alpine (containers)
apk add tar

# macOS — GNU tar via Homebrew (installed as `gtar`)
brew install gnu-tar

Output: package-manager progress messages (they vary by distribution); exits 0 on success

Syntax

tar takes a one-letter operation mode (c, x, t, u, r) plus modifier flags, then a list of paths. The flags can be passed with or without a leading dash, which is why you often see the historic compact form czvf rather than -c -z -v -f.

bash
tar [OPERATION][OPTIONS] -f ARCHIVE [PATH...]
tar [OPERATION][OPTIONS]  f  ARCHIVE [PATH...]   # historic, no dash

Output: (none — exits 0 on success)

Essential operations & flags

The operation letter chooses what tar does; the compression letter chooses the codec; -f always names the archive file (use - for stdin/stdout).

| Flag | Meaning |
| --- | --- |
| -c | Create a new archive |
| -x | Extract files from archive |
| -t | List archive contents |
| -u | Update archive (append newer copies) |
| -r | Append files to an uncompressed archive |
| -f FILE | Use archive FILE (- for stdin/stdout) |
| -v | Verbose: print each file as it's processed |
| -z | gzip compression (.tar.gz / .tgz) |
| -j | bzip2 compression (.tar.bz2) |
| -J | xz compression (.tar.xz) |
| --zstd | zstd compression (.tar.zst) |
| -a | Auto-detect compression from -f extension |
| -C DIR | Change to DIR before operating |
| -p | Preserve permissions on extract (default for root) |
| --strip-components=N | Drop N leading path components on extract |
| --exclude=GLOB | Skip paths matching GLOB |
| --transform=SED | Rewrite paths with a sed expression |
| --listed-incremental=SNAR | Incremental backup using snapshot file |

The classic mnemonics

The three most common invocations are easy to remember as four-letter flag bundles: create-zip-verbose-file, extract-zip-verbose-file, list-zip-verbose-file. Read each letter left-to-right and the meaning falls out.

bash
tar czvf project.tar.gz project/   # Create gzip archive
tar xzvf project.tar.gz            # eXtract gzip archive
tar tzvf project.tar.gz            # lisT gzip archive (no extract)

Output (tar czvf project.tar.gz project/):

text
project/
project/README.md
project/src/
project/src/main.py
project/src/utils.py
project/tests/
project/tests/test_main.py

Choosing a compression format

tar itself does not compress — it pipes through an external compressor selected by a flag. The trade-off is speed vs. ratio: gzip is fast and universally readable, bzip2 is slower with better ratio, xz is slowest with the best ratio, and zstd rivals xz for size at gzip-class speed. As of 2026, zstd is the recommended default for new archives — it has native multi-threading, ships in every mainstream distro, and is used by Btrfs, Docker, and most package managers.

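| Flag | Tool | Extension | Speed | Ratio |
| --- | --- | --- | --- | --- |
| --zstd | zstd | .tar.zst | very fast | high |
| -z | gzip | .tar.gz / .tgz | fast | low |
| -j | bzip2 | .tar.bz2 | medium | medium |
| -J | xz | .tar.xz | slow | highest |
| (none) | | .tar | fastest | none |

bash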
tar --zstd -cf site.tar.zst site/ # zstd (recommended)
tar czf  site.tar.gz   site/      # gzip (legacy / max portability)
tar cjf  site.tar.bz2  site/      # bzip2
tar cJf  site.tar.xz   site/      # xz (smallest, slowest)
tar caf  site.tar.gz   site/      # auto from extension

Output: (none — exits 0 on success)
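To see the speed-versus-ratio trade-off on your own data, a rough sketch like the following can help. It assumes GNU tar (for -a) and uses a throwaway sample tree; the work directory and file names are made up for illustration, and codecs that are not installed are skipped.

```shell
# Build a small, highly compressible sample tree in a temp directory.
work=$(mktemp -d)
mkdir -p "$work/site"
for i in $(seq 1 2000); do echo "line $i of sample content"; done > "$work/site/index.html"

# -a picks the compressor from the archive suffix (plain .tar means no compression).
for ext in tar tar.gz tar.bz2 tar.xz tar.zst; do
  case $ext in
    tar)     tool=tar ;;
    tar.gz)  tool=gzip ;;
    tar.bz2) tool=bzip2 ;;
    tar.xz)  tool=xz ;;
    tar.zst) tool=zstd ;;
  esac
  # Skip codecs whose binary is missing on this machine.
  command -v "$tool" >/dev/null 2>&1 || continue
  tar -C "$work" -caf "$work/site.$ext" site
  printf '%-8s %s bytes\n' "$ext" "$(wc -c < "$work/site.$ext")"
done
```

Sizes depend heavily on the input, so treat the numbers as a comparison for this sample only.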

Parallel / multi-threaded compression

Single-threaded compression is often the bottleneck on a modern multi-core CPU. zstd has built-in threading; gzip and xz get parallelism from drop-in replacements (pigz, pixz). Use --use-compress-program (or the shorthand -I) to swap the codec, and pass -T0 to zstd to use every core.

bash
# zstd with all CPU cores (fastest, modern default)
tar -cf site.tar.zst -I 'zstd -T0 -19' site/

# pzstd — parallel zstd front-end
tar -cf site.tar.zst -I pzstd site/

# pigz — parallel gzip, drop-in for .tar.gz
tar -cf site.tar.gz -I pigz site/

# pixz — parallel xz, also enables tar-aware indexed extraction
tar -cf site.tar.xz -I pixz site/

Output: (none — exits 0 on success)

Decompression of zstd is single-threaded by design; threading only helps on the create side. For random access into a huge archive, pixz writes a per-file index, so pixz -x can seek directly to one entry and pass just that entry to tar instead of decompressing the whole stream.
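Scripts that must run on machines with and without the parallel tools can choose the compressor at run time. A minimal sketch, assuming GNU tar's -I option and a throwaway sample directory:

```shell
# Prefer pigz when it is installed, otherwise fall back to plain gzip.
if command -v pigz >/dev/null 2>&1; then
  gz=pigz
else
  gz=gzip
fi

work=$(mktemp -d)
mkdir -p "$work/site"
echo hello > "$work/site/index.html"

tar -C "$work" -I "$gz" -cf "$work/site.tar.gz" site

# Both tools write standard gzip data, so an ordinary -z read works.
tar -tzf "$work/site.tar.gz"
```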

Creating archives

-c writes a new archive; pair it with -f to name the output file and an optional compression flag. Pass directories or files as positional arguments; tar recurses into directories by default.

bash
# Single directory
tar czvf backup.tar.gz /home/alice/notes

# Multiple paths
tar czvf bundle.tar.gz /home/alice/notes /home/alice/photos

# Use -C to avoid embedding absolute paths
tar -C /home/alice -czvf notes.tar.gz notes

Output (tar -C /home/alice -czvf notes.tar.gz notes):

text
notes/
notes/2026-05-01.md
notes/2026-05-15.md
notes/inbox/
notes/inbox/draft.md

Always test new archives with tar tzvf archive.tar.gz | head before deleting the source — a typo in the create command can produce an empty archive without error.
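The check can be wrapped in a small helper. The function name archive_and_check is hypothetical and the sketch deliberately deletes nothing; it only reports how many entries the new archive holds.

```shell
# Hypothetical helper: create SRC as OUT, then confirm OUT lists at least one entry.
archive_and_check() {
  src=$1
  out=$2
  tar -C "$(dirname "$src")" -czf "$out" "$(basename "$src")" || return 1
  # A listing that fails or comes back empty means the archive is unusable.
  listing=$(tar -tzf "$out") || return 1
  [ -n "$listing" ] || return 1
  printf '%s\n' "$listing" | wc -l
}

# Demo on throwaway data
work=$(mktemp -d)
mkdir -p "$work/notes/inbox"
echo "draft" > "$work/notes/inbox/draft.md"
count=$(archive_and_check "$work/notes" "$work/notes.tar.gz")
echo "entries in archive: $count"
```

Only after a non-zero entry count (and ideally a spot check of a few paths) is it reasonable to remove the source tree.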

Extracting archives

-x unpacks an archive into the current directory unless -C DIR redirects it. By default existing files are overwritten; use --keep-old-files or --skip-old-files to change that, and --strip-components=N when you want to flatten away wrapper directories.

bash
# Extract here
tar xzvf project.tar.gz

# Extract into another directory
mkdir -p /tmp/restore
tar xzvf project.tar.gz -C /tmp/restore

# Drop the top-level "project/" wrapper
tar xzvf project.tar.gz --strip-components=1

# Only one file/dir
tar xzvf project.tar.gz project/src/main.py

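Output (tar xzvf project.tar.gz --strip-components=1 --show-transformed-names; without the extra flag, GNU tar's -v prints the original project/... names):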

text
README.md
src/
src/main.py
src/utils.py
tests/test_main.py
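The difference between the default overwrite and --skip-old-files is easy to check on scratch data. This sketch assumes GNU tar; the directory names are invented for the demo.

```shell
work=$(mktemp -d)
mkdir -p "$work/src/project" "$work/dest"
echo "from archive" > "$work/src/project/README.md"
tar -C "$work/src" -czf "$work/project.tar.gz" project

# A locally modified copy already exists at the destination.
echo "local edit" > "$work/dest/README.md"

# --skip-old-files leaves existing files untouched.
tar -xzf "$work/project.tar.gz" -C "$work/dest" --strip-components=1 --skip-old-files
kept=$(cat "$work/dest/README.md")

# Default behaviour: the archived copy replaces the existing file.
tar -xzf "$work/project.tar.gz" -C "$work/dest" --strip-components=1
replaced=$(cat "$work/dest/README.md")

echo "with --skip-old-files: $kept"
echo "default:               $replaced"
```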

Listing without extracting

-t prints the archive table-of-contents without writing any files — the cheapest way to confirm what's inside before extracting. Combine with -v for the long-format listing (permissions, owner, size, mtime).

bash
tar tzf  archive.tar.gz            # filenames only
tar tzvf archive.tar.gz            # long listing
tar tzf  archive.tar.gz | wc -l    # how many entries
tar tzf  archive.tar.gz | grep '\.py$' # filter listing (escape the dot)

Output (tar tzvf archive.tar.gz):

text
drwxr-xr-x alice/alice       0 2026-05-24 10:00 project/
-rw-r--r-- alice/alice    1024 2026-05-24 10:01 project/README.md
drwxr-xr-x alice/alice       0 2026-05-24 10:01 project/src/
-rw-r--r-- alice/alice    4096 2026-05-24 10:02 project/src/main.py
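Because the long listing carries sizes, it can be summed without extracting anything. The sketch below assumes GNU tar's column layout (size in the third field); bsdtar prints an ls-style listing with the size in a different column.

```shell
work=$(mktemp -d)
mkdir -p "$work/project"
printf 'abc'   > "$work/project/a.txt"   # 3 bytes
printf 'hello' > "$work/project/b.txt"   # 5 bytes
tar -C "$work" -czf "$work/archive.tar.gz" project

# Sum field 3 (size in bytes) for every row that is not a directory.
total=$(tar -tzvf "$work/archive.tar.gz" \
  | awk '$1 !~ /^d/ { sum += $3 } END { print sum + 0 }')
echo "payload bytes: $total"
```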

Excluding files

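--exclude=GLOB skips any path matching the shell glob; pass it multiple times for several patterns, or list patterns in a file with --exclude-from=FILE. In GNU tar the flag is position-sensitive: it must appear before the path argument it applies to. Placed after the path it has no effect, and GNU tar 1.30 and later report that and exit with status 2 rather than staying silent.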

bash
# Skip caches and node_modules
tar czvf src.tar.gz \
  --exclude='*.pyc' \
  --exclude='__pycache__' \
  --exclude='node_modules' \
  /home/alice/project

# Many patterns from a file
cat > /tmp/skip.txt <<'EOF'
*.log
.git
build
dist
EOF
tar czvf src.tar.gz --exclude-from=/tmp/skip.txt /home/alice/project

# Anchor a pattern to the archive root
tar czvf src.tar.gz --exclude='./tmp/*' .

Output: one line per archived path (from -v); exits 0 on success
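Listing the result is the quickest way to confirm that the patterns did what was intended. A small sketch on a made-up project tree:

```shell
work=$(mktemp -d)
mkdir -p "$work/project/src/__pycache__" "$work/project/node_modules/dep"
touch "$work/project/src/main.py" \
      "$work/project/src/__pycache__/main.cpython-312.pyc" \
      "$work/project/node_modules/dep/index.js"

# Exclusions come before the path they apply to.
tar -C "$work" -czf "$work/src.tar.gz" \
  --exclude='__pycache__' \
  --exclude='node_modules' \
  project

# Only project/, project/src/ and project/src/main.py should remain.
tar -tzf "$work/src.tar.gz" | sort
```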

Path transforms

--transform=EXPR rewrites every stored path with a sed substitution as the archive is written or extracted. This is the cleanest way to rename a top-level directory without first renaming on disk, or to drop a prefix on extract.

bash
# Store project/ as project-2026-05-24/ inside the archive
tar czvf release.tar.gz \
  --transform='s,^project,project-2026-05-24,' \
  project/

# Strip a leading "src/" on extract
tar xzvf release.tar.gz --transform='s,^src/,,'

# Rename everything to lowercase on extract
tar xzvf docs.tar.gz --transform='s/.*/\L&/'

Output: one line per processed path (from -v); exits 0 on success
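A transform applied at create time changes what is stored, which a plain listing confirms. In this sketch (GNU tar assumed) the version suffix project-1.0 is just an example name.

```shell
work=$(mktemp -d)
mkdir -p "$work/project/src"
touch "$work/project/src/main.py"

# On disk the directory is project/; inside the archive it becomes project-1.0/.
tar -C "$work" -czf "$work/release.tar.gz" \
  --transform='s,^project,project-1.0,' \
  project

stored=$(tar -tzf "$work/release.tar.gz" | head -n 1)
echo "first stored entry: $stored"
```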

Piped extraction (download + untar)

tar reads the archive from stdin when -f - is given; GNU tar on Linux typically also defaults to stdin/stdout when -f is omitted, which is why the short tar xz form works there. That makes it natural to pipe a download straight into extraction without ever writing the .tar.gz to disk. This is the canonical way to install upstream tarballs.

bash
# Download and extract in one shot (curl)
curl -fsSL https://example.com/release-1.4.0.tar.gz | tar xz

# Same with wget
wget -qO- https://example.com/release-1.4.0.tar.gz | tar xz

# Strip the top-level directory on the fly
curl -fsSL https://example.com/release-1.4.0.tar.gz \
  | tar xz --strip-components=1 -C /opt/myapp

# Extract one specific file out of a remote archive
curl -fsSL https://example.com/release-1.4.0.tar.gz \
  | tar xz -O release/CHANGELOG.md

Output: (none — exits 0 on success)

-O ("to stdout") prints the named entry to stdout instead of writing it to disk. Combine with curl | tar xz -O ... to peek inside a remote archive without staging anything locally.
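The same pipeline can be rehearsed offline. In this sketch cat stands in for curl -fsSL URL, and -f - is spelled out so the stdin source is explicit; the archive contents are invented for the demo.

```shell
work=$(mktemp -d)
mkdir -p "$work/release"
echo "1.4.0: initial notes" > "$work/release/CHANGELOG.md"
tar -C "$work" -czf "$work/release.tar.gz" release

# Stream the archive through a pipe and print one member to stdout.
changelog=$(cat "$work/release.tar.gz" | tar -xzf - -O release/CHANGELOG.md)
echo "$changelog"
```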

Incremental backups

--listed-incremental=SNAR produces a backup that contains only files changed since the previous run, using the snapshot file SNAR to track state. The first run with an empty/missing snapshot creates a full archive (level 0); subsequent runs create incrementals (level 1, 2, …). Restoring requires extracting every archive in order. This is a GNU tar feature — BSD tar does not support it.

bash
# Level 0 — full backup, snapshot is created
tar -cvf /backups/alice-full.tar \
  --listed-incremental=/backups/alice.snar \
  /home/alice

# Level 1 — only files changed since the level 0
tar -cvf /backups/alice-2026-05-25.tar \
  --listed-incremental=/backups/alice.snar \
  /home/alice

# Restore: extract them in order
cd /
tar -xvf /backups/alice-full.tar           --listed-incremental=/dev/null
tar -xvf /backups/alice-2026-05-25.tar     --listed-incremental=/dev/null

Output (tar -cvf ... --listed-incremental=... on a level 1; abridged, directory entries omitted):

text
tar: Removing leading `/' from member names
home/alice/notes/2026-05-25.md
home/alice/photos/IMG_2026.jpg
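The full cycle (level 0, level 1, ordered restore) can be tried safely in a temp directory. This is a sketch assuming GNU tar; the data directory and snapshot file name are placeholders.

```shell
work=$(mktemp -d)
mkdir -p "$work/data" "$work/restore"
echo one > "$work/data/a.txt"

# Level 0: the snapshot file does not exist yet, so everything is archived.
tar -C "$work" -cf "$work/full.tar" --listed-incremental="$work/data.snar" data

sleep 1   # keep the new file's timestamp clearly after the level 0 run
echo two > "$work/data/b.txt"

# Level 1: only b.txt (plus directory records) goes into this archive.
tar -C "$work" -cf "$work/incr.tar" --listed-incremental="$work/data.snar" data

# Restore in order; /dev/null selects incremental mode without updating a snapshot.
tar -C "$work/restore" -xf "$work/full.tar" --listed-incremental=/dev/null
tar -C "$work/restore" -xf "$work/incr.tar" --listed-incremental=/dev/null
ls "$work/restore/data"
```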

Preserving (or stripping) ownership

By default GNU tar stores the numeric and named uid/gid of every file; on extract it preserves them when run as root and falls back to the extracting user otherwise. Use --owner, --group, and --numeric-owner to override this behaviour, which is critical when archives move between machines with different user databases.

bash
# Store numeric uid/gid only (portable across hosts)
tar czvf backup.tar.gz --numeric-owner /home/alice

# Force all files to be owned by uid 1000:1000 in the archive
tar czvf release.tar.gz --owner=1000 --group=1000 dist/

# On extract, force ownership to the current user
tar xzvf backup.tar.gz --no-same-owner

Output: one line per processed path (from -v); exits 0 on success
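The stored ownership shows up in the second column of the long listing, so an override can be checked without extracting. A sketch assuming GNU tar, with uid/gid 1000 as an arbitrary example:

```shell
work=$(mktemp -d)
mkdir -p "$work/dist"
echo bin > "$work/dist/app"

# Record every entry as 1000:1000 and store no user or group names.
tar -C "$work" -czf "$work/release.tar.gz" \
  --owner=1000 --group=1000 --numeric-owner dist

# Collect the distinct owner/group pairs stored in the archive.
owners=$(tar -tzvf "$work/release.tar.gz" --numeric-owner | awk '{ print $2 }' | sort -u)
echo "stored owners: $owners"
```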

Verifying archives

-W (--verify) re-reads the archive after writing to confirm it matches the source — only valid when the archive lives on a seekable medium (file), not a pipe. Combine with a separate sha256sum step for end-to-end checksum coverage.

bash
# Verify on write (uncompressed only)
tar -cvf archive.tar -W project/

# Verify against original tree
tar dzf archive.tar.gz                # diff archive vs filesystem

# Checksum the archive
sha256sum archive.tar.gz > archive.tar.gz.sha256
sha256sum -c archive.tar.gz.sha256

Output (sha256sum -c archive.tar.gz.sha256):

text
archive.tar.gz: OK
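Diff mode reports drift through its exit status, which makes it easy to script. A small sketch on scratch data:

```shell
work=$(mktemp -d)
mkdir -p "$work/project"
echo v1 > "$work/project/README.md"
tar -C "$work" -czf "$work/archive.tar.gz" project

# Tree unchanged: diff mode prints nothing and exits 0.
tar -C "$work" -dzf "$work/archive.tar.gz" && echo "tree matches archive"

# Modify a file, then compare again and capture the exit status.
echo "v2 with more text" > "$work/project/README.md"
status=0
tar -C "$work" -dzf "$work/archive.tar.gz" >/dev/null 2>&1 || status=$?
echo "after edit: exit status $status"
```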

Common pitfalls

  1. Absolute paths leak into the archive: tar czvf x.tar.gz /home/alice/project stores entries as home/alice/project/.... Use tar -C /home/alice -czvf x.tar.gz project to anchor the archive at project/.
  2. --exclude placement: put it before the path argument or GNU tar will not apply it.
  3. -z on a plain .tar: passing -z to a non-gzipped archive returns gzip: stdin: not in gzip format. Omit the codec flag when reading; GNU tar detects the compression on its own (-a only applies when creating).
  4. Ownership lost on extract: a non-root extraction cannot restore the archived uid/gid, so files end up owned by the extracting user. Either run as root, or use --no-same-owner deliberately and remap afterwards.
  5. Forgetting -f: without -f, GNU tar falls back to the TAPE environment variable or a compiled-in default (stdin/stdout on most Linux builds, a tape device on some others) and refuses to write an archive to a terminal. Always specify -f ARCHIVE.
  6. Symlinks: by default tar archives the link itself, not its target. Use -h (--dereference) to follow links and store the target file.
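The symlink pitfall is easy to see by comparing the entry type with and without -h. A sketch with invented file names; the first character of the long listing is l for a link and - for a regular file.

```shell
work=$(mktemp -d)
mkdir -p "$work/app"
echo "real config" > "$work/real.conf"
ln -s ../real.conf "$work/app/app.conf"

tar -C "$work" -cf  "$work/links.tar" app   # stores the symlink itself
tar -C "$work" -chf "$work/files.tar" app   # -h stores the target's contents

link_type=$(tar -tvf "$work/links.tar" app/app.conf | cut -c1)
file_type=$(tar -tvf "$work/files.tar" app/app.conf | cut -c1)
echo "without -h: $link_type   with -h: $file_type"
```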

Real-world recipes

Timestamped directory snapshot

A one-liner that captures a directory into a date-stamped archive, ideal for ad-hoc backups before a risky change.

bash
DIR=/home/alice/project
STAMP=$(date +%Y-%m-%d_%H%M%S)
tar -C "$(dirname "$DIR")" -czvf "${DIR##*/}-${STAMP}.tar.gz" "$(basename "$DIR")"

Output:

text
project/
project/README.md
project/src/main.py
project/tests/test_main.py

Copy a directory tree preserving permissions

tar piped into another tar is the classic technique for replicating a directory with all metadata intact when cp -a is not enough (e.g. across filesystems or into a remote shell).

bash
# Local copy
tar -C /home/alice/src -cf - . | tar -C /home/alice/dst -xpvf -

# Remote copy over SSH
tar -C /home/alice/src -cf - . \
  | ssh alicedev@myhost 'tar -C /opt/app -xpvf -'

Output: one line per extracted path (from -v on the receiving tar); exits 0 on success
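A quick local run shows that file modes survive the pipe. The sketch uses a scratch tree and mode 775, which a typical 022 umask would otherwise trim to 755 on extract.

```shell
work=$(mktemp -d)
mkdir -p "$work/src/bin" "$work/dst"
echo '#!/bin/sh' > "$work/src/bin/run"
chmod 775 "$work/src/bin/run"

# Writer streams to stdout, reader extracts from stdin; -p keeps the exact mode bits.
tar -C "$work/src" -cf - . | tar -C "$work/dst" -xpf -

mode=$(ls -l "$work/dst/bin/run" | cut -c1-10)
echo "copied mode: $mode"
```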

Extract a single file from a huge archive

When you only need one file out of a multi-gigabyte backup, name it as a positional argument so tar stops after finding it (with --occurrence=1).

bash
tar xzvf backup.tar.gz --occurrence=1 home/alice/notes/2026-05-15.md

Output:

text
home/alice/notes/2026-05-15.md

Stream a directory to a remote host (no temp file)

Combine tar and ssh to move data without ever writing a .tar.gz to disk on either side — useful when local disk is tight.

bash
tar -C /home/alice/photos -czf - . \
  | ssh alicedev@myhost 'cat > /backups/photos-$(date +%F).tar.gz'

Output: (none — exits 0 on success)

Split a giant archive into chunks

split cuts a huge archive into fixed-size chunks for transfer over flaky networks or services with file-size caps. Reassemble with cat then pipe into tar.

bash
# Split into 2 GiB chunks
tar czvf - /home/alice | split -b 2G - bigbackup.tar.gz.

# Reassemble and extract
cat bigbackup.tar.gz.* | tar xzv

Output: one line per path (from -v, written to stderr while the archive itself goes to stdout); exits 0 on success
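Before trusting the technique with a real backup, it is worth rehearsing the round trip at small scale. This sketch uses 100 KiB chunks and random sample data so that several pieces are produced.

```shell
work=$(mktemp -d)
mkdir -p "$work/data" "$work/restore"
head -c 300000 /dev/urandom > "$work/data/blob.bin"   # incompressible sample

# Split the compressed stream into 100 KiB pieces: big.tar.gz.aa, .ab, ...
tar -C "$work" -czf - data | split -b 100k - "$work/big.tar.gz."
ls "$work"/big.tar.gz.*

# The shell expands the glob in sorted order, which matches split's suffix order.
cat "$work"/big.tar.gz.* | tar -C "$work/restore" -xzf -
cmp "$work/data/blob.bin" "$work/restore/data/blob.bin" && echo "round trip OK"
```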

Build a reproducible release tarball

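--sort=name, --mtime, --owner, and --group together produce byte-identical tar streams across runs, which makes sha256sum comparisons meaningful for releases. The compression layer is separate: some gzip versions embed the compression time in their header, so if two runs still differ, compare the uncompressed stream or compress with gzip -n.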

bash
tar --sort=name \
    --mtime='2026-05-24 00:00:00 UTC' \
    --owner=0 --group=0 --numeric-owner \
    -czvf myapp-1.4.0.tar.gz \
    -C dist myapp-1.4.0

Output:

text
myapp-1.4.0/
myapp-1.4.0/README.md
myapp-1.4.0/bin/myapp
myapp-1.4.0/share/man/myapp.1
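Reproducibility is only worth claiming once it has been checked by building twice and comparing. The sketch below is one way to do that under a few assumptions: GNU tar 1.28 or later (for --sort), --format=gnu added to pin the header format, and gzip -n in a pipe so the gzip header carries no timestamp. The build function is hypothetical.

```shell
work=$(mktemp -d)
mkdir -p "$work/dist/myapp-1.4.0/bin"
echo '#!/bin/sh' > "$work/dist/myapp-1.4.0/bin/myapp"
echo 'docs'      > "$work/dist/myapp-1.4.0/README.md"

# Hypothetical helper: write a normalised tarball to the path given as $1.
build() {
  tar --sort=name --format=gnu \
      --mtime='2026-05-24 00:00:00 UTC' \
      --owner=0 --group=0 --numeric-owner \
      -C "$work/dist" -cf - myapp-1.4.0 | gzip -n > "$1"
}

build "$work/a.tar.gz"
sleep 1
touch "$work/dist/myapp-1.4.0/README.md"   # change only the on-disk mtime
build "$work/b.tar.gz"

cmp "$work/a.tar.gz" "$work/b.tar.gz" && echo "identical"
```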

TAR_OPTIONS environment variable

tar has no traditional config file, but GNU tar reads the TAR_OPTIONS environment variable on every invocation and prepends its contents to the command line. This is the canonical way to pin a default behaviour (codec, owner remap, verbosity) across a shell session or systemd unit without rewriting every script.

bash
# Always use zstd and numeric owners in this shell
export TAR_OPTIONS='--zstd --numeric-owner'

tar -cf release.tar.zst dist/      # picks up both flags automatically

# One-off override: empty TAR_OPTIONS for this command
TAR_OPTIONS= tar czf legacy.tar.gz dist/

Output: (none — exits 0 on success)
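Because TAR_OPTIONS is prepended to the command line, it also suits position-sensitive flags such as --exclude. A sketch assuming GNU tar; the *.log default is only an example of something a session might want to pin.

```shell
work=$(mktemp -d)
mkdir -p "$work/dist"
echo app   > "$work/dist/app"
echo noise > "$work/dist/debug.log"

# Session default: never archive log files.
export TAR_OPTIONS='--exclude=*.log'
tar -C "$work" -cf "$work/filtered.tar" dist

# One-off override: an empty TAR_OPTIONS for this single command.
TAR_OPTIONS= tar -C "$work" -cf "$work/full.tar" dist

unset TAR_OPTIONS
filtered=$(tar -tf "$work/filtered.tar" | wc -l)
full=$(tar -tf "$work/full.tar" | wc -l)
echo "entries with default: $filtered, with override: $full"
```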

Modern alternatives

GNU tar is intentionally stable, but a few newer tools are worth knowing for specific workflows. bsdtar is the default on macOS and FreeBSD and is a near drop-in replacement; ouch is a Rust-based front-end that auto-detects format from filename and parallelises compression; pigz / pzstd / pixz give you per-codec parallelism without leaving the tar ecosystem.

| Tool | When to reach for it |
| --- | --- |
| bsdtar (libarchive) | Cross-platform scripts; reads/writes tar, zip, 7z, iso, cpio from one CLI |
| ouch | Friendly CLI: ouch compress src/ out.tar.zst; no flag soup, parallel by default |
| pzstd / pigz / pixz | Drop-in parallel compressors used via tar -I |
| zpaq | Deduplicating, versioned archives for long-term backups (very slow, very small) |

bash
# bsdtar — same flags as GNU tar for the common cases
bsdtar -cf site.tar.zst --zstd site/
bsdtar -xf site.zip                       # also handles zip natively
bsdtar -cf site.7z --format=7zip site/    # write 7z without a separate tool

# ouch — format inferred from extension
ouch compress src/ release.tar.zst
ouch decompress release.tar.zst
ouch list release.tar.zst

Output: (none — exits 0 on success)

Exit codes

| Code | Meaning |
| --- | --- |
| 0 | Success |
| 1 | Some files differ (e.g. -d diff mode) |
| 2 | Fatal error |

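Listing an archive never touches the filesystem, so piping tar -tf archive.tar.gz into xargs stat only works if the same paths happen to exist on disk, and then it reports the on-disk sizes rather than the archived ones. To get the sizes stored in the archive, parse the long listing from tar tzvf; for the total uncompressed payload, tar -xzOf archive.tar.gz | wc -c streams every member to stdout without writing any files.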
