wget
Non-interactive network downloader. Covers single and batch downloads, recursive mirroring, authentication, resuming, rate limiting, and site archiving.
wget — File Downloader
What it is
wget is GNU's free, non-interactive command-line file downloader, available on virtually every Linux distribution and maintained as part of the GNU Project. It supports HTTP, HTTPS, and FTP, and excels at recursive website mirroring, resuming interrupted downloads, and scripted bulk retrieval without any user interaction. Reach for wget when you need to mirror a directory tree or download files reliably in scripts; use curl for API work where fine-grained control over request headers, methods, and TLS is needed.
Basic downloads
wget https://example.com/file.tar.gz # download to cwd
Output:
--2026-04-24 10:00:01-- https://example.com/file.tar.gz
Resolving example.com (example.com)... 93.184.216.34
Connecting to example.com (example.com)|93.184.216.34|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 52428800 (50M) [application/x-gzip]
Saving to: 'file.tar.gz'
file.tar.gz 100%[===================>] 50.00M 4.21MB/s in 11s
2026-04-24 10:00:12 (4.21 MB/s) - 'file.tar.gz' saved [52428800/52428800]
wget -O output.tar.gz https://example.com/f # custom filename
wget -P /tmp https://example.com/file.tar.gz # save to /tmp/
wget -q https://example.com/file # quiet (no progress bar)
wget -nv https://example.com/file # no verbose (minimal output)
Output (wget -nv https://example.com/file):
2026-04-24 10:00:01 URL:https://example.com/file [1024/1024] -> "file" [1]
wget -S https://example.com # show server response headers
Output:
HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
Content-Length: 1256
Cache-Control: max-age=604800
Date: Thu, 24 Apr 2026 10:00:01 GMT
Expires: Thu, 01 May 2026 10:00:01 GMT
Server: ECS (nyb/1D2B)
wget --spider https://example.com/file # check URL without downloading
Output:
Spider mode enabled. Check if remote file exists.
--2026-04-24 10:00:01-- https://example.com/file
Resolving example.com... 93.184.216.34
Connecting to example.com|93.184.216.34|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1256 (1.2K) [text/html]
Remote file exists.
Resume and retry
-c (continue) appends to a partially downloaded file by sending a Range request from the last byte received, provided the server supports it. --tries (-t) sets the maximum number of attempts, and --waitretry sets the upper bound of a linear backoff between retries of the same file (1s, 2s, and so on up to the given value); --tries=0 retries indefinitely, which is useful for unreliable connections.
wget -c https://example.com/bigfile.tar.gz # continue/resume download
Output:
--2026-04-24 10:05:00-- https://example.com/bigfile.tar.gz
Connecting to example.com|93.184.216.34|:443... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 52428800 (50M), 31457280 (30M) remaining [application/x-gzip]
Saving to: 'bigfile.tar.gz'
bigfile.tar.gz 60%[=========> ] 30.00M 3.87MB/s in 7s
2026-04-24 10:05:07 (3.87 MB/s) - 'bigfile.tar.gz' saved [52428800/52428800]
wget --retry-connrefused -t 10 URL # retry up to 10 times
wget -t 0 URL # infinite retries
wget --waitretry=5 -t 10 URL # wait up to 5s between retries (-w only spaces out separate retrievals)
wget --timeout=30 URL # connection/read timeout
Output: (none — exits 0 on success)
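The resume and retry flags above combine naturally into a small wrapper for scripts. The sketch below is one possible arrangement, not a canonical recipe: the function name `fetch_resumable` and the specific values (10 tries, 5-second backoff cap, 30-second timeout) are assumptions chosen for illustration.

```shell
# fetch_resumable URL [DEST_DIR]
# Hypothetical helper: resume partial files and retry transient failures.
fetch_resumable() {
    local url="$1"
    local dest_dir="${2:-.}"
    # -c                   resume a partial file instead of starting over
    # --tries=10           give up after 10 attempts
    # --waitretry=5        linear backoff between retries, capped at 5s
    # --retry-connrefused  treat "connection refused" as a transient error
    # --timeout=30         DNS, connect, and read timeouts, in seconds
    wget -c --tries=10 --waitretry=5 --retry-connrefused --timeout=30 \
         -P "$dest_dir" "$url"
}

# Usage (not run here):
#   fetch_resumable https://example.com/bigfile.tar.gz /tmp
```

Keeping the flags in one function means every download in a script gets the same retry policy, and the policy can be tuned in a single place.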
Authentication
--user and --password supply HTTP Basic or Digest credentials; prefer a ~/.netrc file (chmod 600), which wget consults automatically when no credentials are given on the command line, to avoid exposing passwords in shell history. For HTTP Basic specifically, --auth-no-challenge sends credentials immediately without waiting for a 401 challenge, which speeds up requests to servers that require auth on the first hit.
wget --user=alice --password=secret https://protected.example.com/file
wget --ask-password --user=alice https://example.com/file
wget --http-user=alice --http-password=s3cr3t https://example.com
wget --no-http-keep-alive --user=alice ... # disable keepalive
# .netrc file (~/.netrc, chmod 600)
# machine example.com login alice password secret
wget https://example.com/private/file # credentials read from ~/.netrc
Output: (none — exits 0 on success)
Headers and cookies
--header appends arbitrary HTTP headers to every request, useful for passing Authorization tokens or custom Accept types. Cookie handling works in two steps: --save-cookies captures the session cookie from a login POST, and --load-cookies replays it on subsequent requests to access protected pages.
wget --header="Authorization: Bearer TOKEN" https://api.example.com
wget --header="Accept: application/json" URL
wget --save-cookies cookies.txt \
--post-data "user=alice&pass=secret" \
https://example.com/login
wget --load-cookies cookies.txt https://example.com/protected
wget --keep-session-cookies --save-cookies cookies.txt URL
Output: (none — exits 0 on success)
Download speed control
--limit-rate caps the download bandwidth so wget doesn't saturate the connection — useful in scripts running during business hours or alongside latency-sensitive traffic. Suffix values with k for kilobytes/s or m for megabytes/s; combine with -w (wait) and --random-wait to also throttle request frequency.
wget --limit-rate=500k URL # limit to 500 KB/s
wget --limit-rate=2m URL # limit to 2 MB/s
wget -w 1 URL # wait 1 second between requests
wget --random-wait URL # random wait 0.5–1.5× -w value
Output: (none — exits 0 on success)
TLS / SSL
--no-check-certificate disables server certificate validation entirely — acceptable for quick tests against self-signed certs but never appropriate for sensitive data. --ca-certificate points wget to a custom CA bundle for verifying private PKI certificates; --certificate and --private-key supply a client certificate for mutual TLS authentication.
wget --no-check-certificate URL # skip cert verification
wget --ca-certificate=/path/to/ca.pem URL
wget --certificate=client.pem --private-key=client.key URL
Output: (none — exits 0 on success)
Batch and list downloads
-i reads URLs from a plain-text file, one per line, and downloads each in sequence. This is the simplest way to run wget against a pre-generated list without shell loops; combine with -P to direct all files into a single directory and -c to resume any that were interrupted.
wget -i urls.txt # download all URLs from file
wget -i urls.txt -P /downloads/ # save all to directory
wget -q -i urls.txt -P /dest/ & # background batch download
# Generate URL list and download
seq 1 100 | sed 's|.*|https://example.com/page/&|' | wget -i - -P pages/
Output: (none — exits 0 on success)
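URL lists assembled by hand or scraped from other output often contain blank lines, notes, and duplicates. The sketch below shows one way to pre-filter such a list before handing it to wget -i. The function name `clean_url_list` is hypothetical, and the rule "keep only lines starting with http://, https://, or ftp://" is an assumption about what belongs in the list.

```shell
# clean_url_list FILE
# Print the usable URLs from FILE: one per line, duplicates removed,
# original order preserved.
clean_url_list() {
    # 1. strip carriage returns left by Windows editors
    # 2. keep only lines that begin with an explicit scheme
    # 3. drop repeated lines, keeping the first occurrence
    tr -d '\r' < "$1" \
        | grep -E '^(https?|ftp)://' \
        | awk '!seen[$0]++'
}

# Usage (not run here):
#   clean_url_list urls.txt | wget -i - -P ./downloads/
```

Filtering up front keeps the wget log free of errors caused by junk lines, so any failures that remain are real download problems.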
Recursive / mirror
-r follows links within downloaded pages up to a configurable depth (-l), while --mirror is shorthand for -r -N -l inf --no-remove-listing, which is the standard combination for creating a full offline copy of a site. Always pair with --no-parent to prevent wget from crawling up to parent directories outside the target path.
# Download a full website
wget --mirror -p --convert-links \
--no-parent -P ./site-mirror \
https://docs.example.com/
Output:
--2026-04-24 10:10:00-- https://docs.example.com/
Connecting to docs.example.com|93.184.216.34|:443... connected.
HTTP request sent, awaiting response... 200 OK
Saving to: 'site-mirror/docs.example.com/index.html'
site-mirror/docs.example.com/index.html saved [12345]
--2026-04-24 10:10:01-- https://docs.example.com/guide/
Saving to: 'site-mirror/docs.example.com/guide/index.html'
…
FINISHED --2026-04-24 10:10:45--
Total wall clock time: 45s
Downloaded: 182 files, 8.4M in 45s (191 KB/s)
# Flags explained:
# --mirror = -r -N -l inf --no-remove-listing
# -p = download all assets (CSS, images, JS)
# --convert-links = rewrite links for offline use
# --no-parent = don't go above the given directory
# Recursive, limited depth
wget -r -l 2 https://example.com/docs/
Output:
--2026-04-24 10:12:00-- https://example.com/docs/
…
Loading robots.txt; please ignore errors.
--2026-04-24 10:12:00-- https://example.com/robots.txt
…
FINISHED --2026-04-24 10:12:22--
Total wall clock time: 22s
Downloaded: 47 files, 2.1M in 22s (97.7 KB/s)
# Download only specific file types
wget -r -l 2 -A "*.pdf,*.doc" https://example.com/resources/
# Exclude file types
wget -r -l 2 -R "*.jpg,*.png,*.gif" https://example.com/
# Span hosts (-H), but only those within example.com (-D), e.g. its subdomains
wget -r -H -D example.com https://example.com/
Output: (none — exits 0 on success)
Recursive options reference
| Flag | Meaning |
|---|---|
| -r / --recursive | Recursive download |
| -l N | Recursion depth (default 5; inf = unlimited) |
| -np / --no-parent | Don't go up to parent directories |
| -N | Only download newer files (timestamping) |
| -k / --convert-links | Convert links for local browsing |
| -p / --page-requisites | Get all assets needed to display page |
| -H | Span hosts (follow links to other domains) |
| -D DOMAIN | Comma-separated domains to follow |
| -A LIST | Accept list (file patterns/extensions) |
| -R LIST | Reject list |
| -I LIST | Include directories |
| -X LIST | Exclude directories |
| --no-clobber | Don't overwrite existing files |
Output and logging
-q silences wget's progress output entirely; -o redirects all messages to a log file while -a appends to an existing one. For scripts that should be quiet but still show a progress bar, -q --show-progress is the right pairing.
wget -a wget.log URL # append log to file
wget -o wget.log URL # write log to file (overwrite)
wget --progress=bar URL # progress bar style
wget --progress=dot:giga URL # dot progress for big files
wget -q --show-progress URL # quiet + progress bar
Output (wget -q --show-progress URL):
file.tar.gz 100%[===================>] 50.00M 5.12MB/s in 9s
Timestamps and conditional fetch
-N (timestamping) sends a conditional If-Modified-Since header and skips downloading files that haven't changed since the local copy was last written. This makes repeated runs of a mirror script efficient — wget only transfers files the server reports as newer than what you already have.
wget -N URL # only download if newer than local copy
wget --no-if-modified-since URL # always download
Output: (none — exits 0 on success)
Background and daemon
-b forks wget into the background immediately after starting, freeing the terminal for other work. Progress is written to wget-log by default; use -o logfile to redirect it, then tail -f the log to monitor progress without keeping wget in the foreground.
wget -b URL # download in background
wget -b -o background.log URL # background with log
tail -f wget-log # watch background download progress
Output: (none — exits 0 on success)
FTP support
wget supports plain FTP and FTPS using the same --user/--password flags as HTTP, and -r works for recursive FTP directory traversal. wget does not speak SFTP: for SSH-based transfers or key-based authentication, use sftp:// URIs with a tool like curl or lftp instead.
wget ftp://ftp.example.com/pub/file.tar.gz
wget --ftp-user=alice --ftp-password=secret ftp://ftp.example.com/file
wget -r ftp://ftp.example.com/pub/ # recursive FTP
Output: (none — exits 0 on success)
wget vs curl
| Task | wget | curl |
|---|---|---|
| Simple download | wget URL | curl -LO URL |
| Save with custom name | wget -O name URL | curl -o name URL |
| Resume | wget -c URL | curl -C - URL |
| Batch from file | wget -i list.txt | xargs -n1 curl -LO < list.txt |
| Recursive mirror | wget --mirror | not built-in |
| API / REST calls | limited | curl -X POST -d ... |
| Pipe to stdout | wget -qO- URL | curl -sS URL |
For simple scripted downloads, wget -q --show-progress -c -O "$dest" "$url" is the most reliable combination: quiet (no clutter), shows a progress bar, resumes if interrupted, and saves to a named file.
Mirroring in depth
--mirror is the canonical "make me a local copy of this site" switch — it expands to -r -N -l inf --no-remove-listing, enabling unlimited-depth recursion with timestamping so a second run only fetches what changed. Pair it with --page-requisites (-p) to pull every CSS/JS/image referenced from each page, and --convert-links (-k) to rewrite the HTML so it renders correctly from the local filesystem.
# Full offline copy ready to open in a browser
wget --mirror \
--page-requisites \
--convert-links \
--adjust-extension \
--no-parent \
--restrict-file-names=windows \
-e robots=off \
-P ./mirror \
https://docs.example.com/guide/
Output: (none — exits 0 on success)
Flag-by-flag rationale:
| Flag | Why |
|---|---|
| --mirror | -r -N -l inf --no-remove-listing: recurse forever, only re-fetch newer files |
| -p / --page-requisites | Also pull CSS, JS, images, fonts needed to render each saved page |
| -k / --convert-links | Rewrite absolute links to local relative paths post-download |
| --adjust-extension | Append an .html extension to pages served as text/html without one |
| --no-parent | Never ascend above the URL's directory; keeps the crawl scoped |
| --restrict-file-names=windows | Escape :?* etc. so the mirror is portable to NTFS/exFAT |
| -e robots=off | Ignore robots.txt; only do this on sites you own or are permitted to crawl |
--mirror -np still walks every linked page within the starting directory. A docs site with thousands of pages produces thousands of HTTP requests. Throttle with --wait, --random-wait, and --limit-rate, and confirm you have permission before crawling third-party sites.
Recursion depth and scope
-l N caps recursion at N levels (default 5; -l inf is unlimited). -D domain1,domain2 plus -H (span hosts) lets wget follow into another domain — required for sites where assets live on cdn.example.com while pages are on www.example.com.
# Two levels deep, stay on this host
wget -r -l 2 --no-parent https://example.com/docs/
# Follow into the CDN domain for assets but not anywhere else
wget --mirror -p -k -H -D example.com,cdn.example.com https://example.com/
# Mirror only PDFs and HTML, two levels deep
wget -r -l 2 -A "*.pdf,*.html" --no-parent https://example.com/papers/
# Mirror everything EXCEPT images and large archives
wget -r -l 3 -R "*.jpg,*.png,*.iso,*.zip" https://example.com/
# Include only specific directories
wget -r -I /docs,/api -X /docs/legacy https://example.com/
Output: (none — exits 0 on success)
-A/-R (accept/reject) filter by filename pattern; -I/-X filter by URL path. During recursion wget still fetches HTML pages that fail the accept/reject filter so that it can extract their links, and deletes them afterwards, so -R "*.html" saves no bandwidth. Use -A for terminal file types (PDFs, tarballs) and let wget crawl HTML normally.
--page-requisites + --convert-links in detail
-p pulls every resource referenced from each downloaded HTML page that the browser would need: <link rel="stylesheet">, <img src>, <script src>, inline url(...) references, and a few others. Without -p, the mirror is HTML-only and renders as unstyled text.
# Single page with all its assets, no link rewriting
wget -p https://example.com/article/
# Single page, fully self-contained for offline use
wget -p -k -H -nd -P ./snapshot https://example.com/article/
Output: (none — exits 0 on success)
-nd (no directories) flattens everything into the destination — useful for "save one article" but disastrous for whole-site mirrors where filenames collide.
-k/--convert-links runs after the download completes and only rewrites links to files that were actually downloaded; missing assets stay as their original absolute URL so the mirror is honest about what failed. Combine with -K (--backup-converted) to keep an .orig copy of every modified HTML file.
Cookies and authenticated scraping
wget's cookie handling is built around the Netscape-format jar shared with curl, Firefox, and most other tools. The recipe for scraping a login-walled site is always the same: POST to the login form to obtain session cookies, save them to a jar, then load the jar for every subsequent request.
# Step 1 — log in and capture session cookies
wget --save-cookies cookies.txt \
--keep-session-cookies \
--post-data 'username=alicedev&password=secret' \
--delete-after \
https://example.com/login
# Step 2 — pull a protected page using those cookies
wget --load-cookies cookies.txt \
https://example.com/account/dashboard
# Mirror an authenticated section
wget --mirror -p -k --no-parent \
--load-cookies cookies.txt \
--user-agent="Mozilla/5.0" \
https://example.com/private/docs/
Output: (none — exits 0 on success)
Key flags:
- --save-cookies FILE: write received cookies to FILE on exit.
- --keep-session-cookies: also save session cookies (the kind without an expiry); without this they're discarded.
- --load-cookies FILE: send cookies from FILE on every request.
- --delete-after: remove the body of the response after processing; we only wanted the cookies.
Cookies exported from Firefox/Chrome (via "Get cookies.txt" extensions) work directly with --load-cookies. For sites that set cookies during a redirect chain, also pass --keep-session-cookies to the first request.
If a site refuses to log in via wget but works in the browser, the missing piece is almost always --user-agent, an extra header like --header="X-CSRF-Token: ...", or a hidden form field grabbed by xidel or curl from the login page first.
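When a login flow misbehaves, it helps to check what actually landed in the cookie jar. The sketch below lists the cookies in a Netscape-format file, which stores seven tab-separated fields per line (domain, subdomain flag, path, secure flag, expiry, name, value). The function name `list_cookies` is hypothetical, and the sketch deliberately omits cookie values so that secrets do not end up in terminal scrollback or logs.

```shell
# list_cookies JAR
# Print "domain<TAB>name<TAB>expiry" for each cookie in a Netscape-format jar.
list_cookies() {
    awk -F '\t' '
        /^#/ || NF < 7 { next }          # skip comments, blanks, malformed lines
        { print $1 "\t" $6 "\t" $5 }     # domain, cookie name, expiry timestamp
    ' "$1"
}

# Usage (not run here):
#   list_cookies cookies.txt
```

Lines beginning with # are treated as comments, which means entries that curl marks with a #HttpOnly_ prefix are skipped by this sketch.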
Authentication recipes
Beyond Basic and Digest auth, real-world scraping often needs bearer tokens, custom CSRF headers, or form-based login flows. The common pattern is "extract token with a first request, then replay it on subsequent ones".
# Bearer token read from a password manager (keeps it out of shell history)
TOKEN=$(pass show api/example | head -1)
wget --header="Authorization: Bearer $TOKEN" https://api.example.com/blob
# CSRF + cookie: grab the form, extract the token, post it back
csrf=$(wget -qO- --save-cookies cookies.txt --keep-session-cookies \
https://example.com/login \
| grep -oP 'name="csrf_token" value="\K[^"]+')
wget --load-cookies cookies.txt \
--save-cookies cookies.txt \
--keep-session-cookies \
--post-data "csrf_token=$csrf&user=alicedev&pass=secret" \
https://example.com/login \
-O /dev/null
# .netrc-driven (recommended for repeatable scripts)
cat > ~/.netrc <<'EOF'
machine example.com
login alicedev
password s3cret
EOF
chmod 600 ~/.netrc
wget --netrc https://example.com/private/file
Output: (none — exits 0 on success)
Bandwidth and politeness controls
When mirroring third-party sites, throttling matters as much as correctness — fast crawlers get IP-banned and burn the host's bandwidth. wget combines per-connection rate limiting with inter-request delays and a User-Agent override that identifies you to the site's operators.
# Cap rate AND space requests apart
wget --mirror -p -k \
--limit-rate=300k \
--wait=2 \
--random-wait \
--user-agent="alicedev-mirror/1.0 (alice@example.com)" \
--no-parent \
https://docs.example.com/
# Even slower — 1 request every 5–15s, 100 KB/s cap
wget -r -l 3 --wait=10 --random-wait --limit-rate=100k --no-parent URL
Output: (none — exits 0 on success)
--random-wait multiplies --wait by a random factor between 0.5 and 1.5 for each request, smoothing the request pattern enough that simple rate-based bot detection won't flag you. Combine with --quota=SIZE to abort the run after N bytes downloaded, a useful safety net for unattended jobs.
# Stop after 500 MB regardless of how many files are left
wget -r --no-parent --quota=500m https://example.com/
Output: (none — exits 0 on success)
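The arithmetic behind --wait and --random-wait is easy to check before starting a long crawl. The sketch below computes the delay range for a given --wait value and a rough total waiting time for a crawl. Both function names are hypothetical, and the total assumes the random factor averages 1.0 and ignores transfer time.

```shell
# random_wait_range WAIT
# Print the shortest and longest delay --random-wait can pick (0.5x to 1.5x).
random_wait_range() {
    awk -v w="$1" 'BEGIN { printf "%g %g\n", 0.5 * w, 1.5 * w }'
}

# expected_delay PAGES WAIT
# Rough total seconds spent waiting across PAGES requests.
expected_delay() {
    awk -v n="$1" -v w="$2" 'BEGIN { printf "%d\n", n * w }'
}

random_wait_range 10     # prints "5 15"
expected_delay 1000 2    # prints "2000"
```

A 1000-page site crawled with --wait=2 therefore spends roughly 33 minutes just waiting, which is worth knowing before scheduling the job.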
Header injection and User-Agent
--header adds a custom request header and can be repeated; --user-agent is shorthand for the User-Agent header. Some sites filter by both User-Agent and Referer, so a realistic-looking pair is often necessary to fetch HTML that renders correctly.
# Pretend to be a recent Firefox
wget --user-agent="Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0" URL
# Set Referer (some hotlink-protected images require it)
wget --referer="https://example.com/page/" https://example.com/images/photo.jpg
# Multiple custom headers
wget --header="Accept: application/json" \
--header="X-Request-ID: $(uuidgen)" \
--header="Authorization: Bearer $TOKEN" \
https://api.example.com/data
Output: (none — exits 0 on success)
wgetrc — system-wide and per-user defaults
/etc/wgetrc (system) and ~/.wgetrc (per-user) hold long-form option = value pairs that wget reads at startup. Anything settable on the command line is settable here; the file is the right place for proxy configuration, default user agent, retry behaviour, and a "quiet but progress visible" default.
# ~/.wgetrc
# Always retry transient failures
tries = 5
waitretry = 5
retry_connrefused = on
# Default UA — replace with something descriptive
user_agent = alicedev-wget/1.0 (alice@example.com)
# Be polite
wait = 1
random_wait = on
limit_rate = 1M
# Cookie handling enabled (this is the default)
cookies = on
# Always continue partial files
continue = on
# Logging
quiet = off
verbose = off
# Proxy — also honours http_proxy / https_proxy env vars
# use_proxy = on
# http_proxy = http://proxy.example.com:8080
# https_proxy = http://proxy.example.com:8080
A single command-line flag overrides the same key in ~/.wgetrc, and ~/.wgetrc overrides /etc/wgetrc. Setting WGETRC=/dev/null replaces only the per-user file for one invocation (/etc/wgetrc is still read); pass --no-config to skip every rc file.
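Because the WGETRC environment variable names the per-user file, a script can carry its own settings without touching ~/.wgetrc. The sketch below writes a throwaway rc file and prints its path. The function name `make_polite_wgetrc` is hypothetical, and the values are copied from the ~/.wgetrc example above purely for illustration.

```shell
# make_polite_wgetrc
# Write a temporary wgetrc with conservative defaults and print its path.
make_polite_wgetrc() {
    local rc
    rc=$(mktemp) || return 1
    cat > "$rc" <<'EOF'
tries = 5
waitretry = 5
wait = 1
random_wait = on
limit_rate = 1M
EOF
    printf '%s\n' "$rc"
}

# Usage (not run here): apply the file to a single invocation, then clean up.
#   rc=$(make_polite_wgetrc)
#   WGETRC="$rc" wget https://example.com/file
#   rm -f "$rc"
```

This keeps the script's behaviour independent of whatever the invoking user has in their own ~/.wgetrc.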
Proxy support
wget reads the standard http_proxy, https_proxy, ftp_proxy, and no_proxy environment variables. Equivalent wgetrc keys are http_proxy, https_proxy, use_proxy, and no_proxy; explicit -e flags let you set them inline.
# Inline proxy
wget -e use_proxy=yes -e http_proxy=http://proxy.example.com:8080 URL
# Authenticated proxy
wget --proxy-user=alicedev --proxy-password=secret \
-e use_proxy=yes -e https_proxy=http://proxy:8080 URL
# Bypass proxy for internal hosts
# (wget matches no_proxy entries as hostname suffixes; CIDR ranges and * wildcards are not understood)
export no_proxy="localhost,127.0.0.1,.internal"
wget URL
# Disable proxy for one invocation (override env)
wget --no-proxy URL
Output: (none — exits 0 on success)
For SOCKS proxies, wget itself has no native support; wrap it in tsocks/proxychains4, or use curl with -x socks5h://host:port instead.
wget2 notes
wget2 is the GNU successor to classic wget, written from scratch around libwget. The 2.x series is stable (latest 2.2.1, January 2026) and adds HTTP/2 multiplexing, multi-threaded parallel downloads (5 threads by default), brotli + zstd decompression, full HSTS, and TCP Fast Open, while staying option-compatible for the most common flags. The binary ships as wget2 (not wget, so it doesn't clash) in Debian, Ubuntu, Alpine, Fedora, and Homebrew.
# Install
sudo apt install wget2 # Debian/Ubuntu
brew install wget2 # macOS
# Same flags as wget1 for basic use
wget2 -c https://example.com/file.tar.gz
# Parallel-connection acceleration (wget1 has no equivalent)
wget2 --max-threads=8 --chunk-size=2M https://example.com/big.iso
# HTTP/2 mirror with brotli + zstd compression negotiated automatically
wget2 --mirror --compression=br,zstd https://example.com/docs/
Output: (none — exits 0 on success)
Differences worth knowing:
- wget2 can open multiple connections per file for parallel range fetching when --chunk-size is set (wget1 is strictly single-connection).
- HTTP/2 multiplexes many requests over one TCP connection, so a recursive mirror over HTTP/2 finishes substantially faster than wget1's serialized HTTP/1.1.
- Accept-Encoding is advertised automatically for br, zstd, lzip, bzip2, xz, gzip, and deflate when the matching libraries are present at build time.
- Cookies, recursion, and --mirror all work the same way.
- A few flags are new or renamed: --threads/--max-threads is new; the legacy single-letter shorts are mostly preserved.
- Output format is similar but subtly different, so scripts that parse wget1's stderr may need adjusting.
wget2 is the right choice for new scripts that need parallelism without taking on aria2c's complexity; stick with wget (wget1) for compatibility with old scripts or environments where wget2 isn't packaged.
wget 1.25 — security & shorthand URL removal
wget 1.25.0 (released November 2024) is the current stable wget1 line and the version shipping in Debian 13, Ubuntu 26.04, Fedora 41+, and Alpine 3.21+. The headline change is the fix for CVE-2024-10524: support for the legacy shorthand URL formats user@host/path and host:/path was removed because wget was interpreting any URL containing : as FTP, allowing crafted credentials to redirect requests to attacker-controlled hosts.
# OLD (now rejected on 1.25+)
wget alice@example.com/file # was interpreted as http://alice@example.com/file
wget example.com:/pub/file # was interpreted as ftp://example.com/pub/file
# Use full URLs everywhere
wget https://alice@example.com/file
wget ftp://example.com/pub/file
Output: (none — exits 0 on success)
If a script suddenly emits Invalid URL on 1.25+, the cause is almost always a shorthand URL — rewrite it with an explicit scheme. The wget --version output shows GNU Wget 1.25.0 …; everything older is vulnerable and should be updated.
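Before upgrading, it can be worth scanning existing URL lists for entries that rely on shorthand forms. The sketch below is a conservative check under stated assumptions: the function names are hypothetical, the accepted scheme list (http, https, ftp, ftps) is an assumption, and the check flags every entry without an explicit scheme, which is stricter than wget itself.

```shell
# has_explicit_scheme URL
# Succeed only when URL starts with an explicit scheme.
has_explicit_scheme() {
    case "$1" in
        http://*|https://*|ftp://*|ftps://*) return 0 ;;
        *) return 1 ;;
    esac
}

# report_shorthand_urls FILE
# Print each non-empty line of FILE that lacks an explicit scheme.
report_shorthand_urls() {
    local line
    while IFS= read -r line || [ -n "$line" ]; do
        [ -z "$line" ] && continue
        has_explicit_scheme "$line" || printf '%s\n' "$line"
    done < "$1"
}

# Usage (not run here):
#   report_shorthand_urls urls.txt
```

Any line this prints is a candidate for rewriting with a full https:// or ftp:// prefix.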
Common recipes
Archive a documentation site
wget --mirror \
--page-requisites \
--convert-links \
--adjust-extension \
--no-parent \
--restrict-file-names=windows \
--user-agent="alicedev-archive/1.0 (alice@example.com)" \
--wait=1 --random-wait \
-P ./docs-mirror \
https://docs.example.com/v2/
Output: (none — exits 0 on success)
Scripted login + scrape
#!/usr/bin/env bash
set -euo pipefail
JAR=$(mktemp)
trap 'rm -f "$JAR"' EXIT
# 1. Log in
wget --save-cookies "$JAR" --keep-session-cookies \
--post-data "user=alicedev&pass=$LOGIN_PW" \
--delete-after \
https://app.example.com/login
# 2. Iterate over protected pages
for id in 1 2 3 4 5; do
wget --load-cookies "$JAR" \
--output-document="report-$id.html" \
"https://app.example.com/reports/$id"
done
Output: (none — exits 0 on success)
Resume a stalled mirror
A mid-mirror crash leaves a partial tree behind. The same wget --mirror command re-run with -N (already implied by --mirror) only re-fetches files the server reports as newer than the local copy.
wget --mirror -p -k -np -c -P ./mirror https://docs.example.com/
Output: (none — exits 0 on success)
-c ensures any individual file that was mid-transfer at the crash continues from its current size, instead of restarting from byte 0.
Download every .pdf linked from a single page
# Two-step: list the PDFs, then download them
wget -qO- https://example.com/papers/index.html \
| grep -oE 'href="[^"]+\.pdf"' \
| sed -E 's/href="([^"]+)"/\1/' \
| xargs -I{} wget -P ./pdfs/ "https://example.com/papers/{}"
# Or in one wget invocation
wget -r -l 1 -np -nd -A pdf -P ./pdfs https://example.com/papers/
Output: (none — exits 0 on success)
Periodic mirror via cron
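# Refresh nightly at 03:00, fetching only changed files.
# A crontab entry must fit on one line (cron does not honour backslash continuations).
# -nv rather than -q, so the redirected log records each saved file.
0 3 * * * cd /srv/mirror && wget --mirror -p -k -np -nv --limit-rate=2m --user-agent="mirror-bot/1.0 (alice@example.com)" https://docs.example.com/ >> /var/log/wget-mirror.log 2>&1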
Pipe output through tar without saving the tarball
wget -qO- https://example.com/release.tar.gz | tar -xz -C /opt/app
Output: (none — exits 0 on success)
-qO- is the canonical "stream to stdout" combo: quiet, output to - (stdout). Equivalent to curl -fsSL.
wget vs curl vs aria2c
| Task | wget | curl | aria2c |
|---|---|---|---|
| Recursive site mirror | --mirror (native) | not built-in | not built-in |
| API/REST work | limited | first-class (-X, -H, -d) | limited |
| Parallel segments for one file | no (wget2: yes) | --parallel (multi-URL only) | -x N -s N (per-file segmenting) |
| Magnet / BitTorrent | no | no | yes |
| Resume large download | -c | -C - | -c |
| Cookies | --save-cookies / --load-cookies | -c / -b | --save-cookies / --load-cookies |
| Pipe to stdout | -qO- | -fsSL | not the right tool |
| Long-lived daemon with RPC | no | no | --enable-rpc |
| Batch from URL list | -i list.txt | shell loop | -i list.txt (richer format) |
| HTTP/2 | wget2 yes; wget1 no | yes | no (HTTP/1.1 only) |
Exit codes
| Code | Meaning |
|---|---|
| 0 | All files downloaded successfully |
| 1 | Generic error |
| 2 | Parse error (bad command-line option) |
| 3 | I/O error (cannot read/write file) |
| 4 | Network failure |
| 5 | SSL verification failure |
| 6 | Authentication failure |
| 7 | Protocol error |
| 8 | Server issued an error response (4xx/5xx) |
wget exits 0 only when every URL succeeded — for partial-success batch runs, inspect the log file (-o/-a) rather than relying on exit code.
# Distinguish "all succeeded" from "some failed"
if wget -i urls.txt -P ./out -nv -a wget.log; then
echo "All downloads succeeded"
else
echo "Some downloads failed — see wget.log"
grep -E 'ERROR|failed' wget.log
fi
Output: (none — exits 0 on success)
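In scripts, a numeric exit status is easier to act on once it is turned into a message. The sketch below maps the codes from the table above to short descriptions. The function name `wget_exit_reason` is hypothetical, and the wording of each message is paraphrased from the table.

```shell
# wget_exit_reason CODE
# Translate a wget exit status into a short human-readable reason.
wget_exit_reason() {
    case "$1" in
        0) echo "success" ;;
        1) echo "generic error" ;;
        2) echo "parse error (bad command-line option)" ;;
        3) echo "file I/O error" ;;
        4) echo "network failure" ;;
        5) echo "SSL verification failure" ;;
        6) echo "authentication failure" ;;
        7) echo "protocol error" ;;
        8) echo "server issued an error response" ;;
        *) echo "unknown exit code $1" ;;
    esac
}

# Usage (not run here):
#   wget -q "$url"
#   rc=$?
#   [ "$rc" -eq 0 ] || echo "wget failed: $(wget_exit_reason "$rc")" >&2
```

Capturing $? immediately after the wget call matters, because any command that runs in between overwrites it.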
Sources
- GNU Wget — official project page
- wget 1.25.0 — savannah commit fixing CVE-2024-10524 (drop shorthand URLs)
- CVE-2024-10524 — JFrog write-up of the shorthand-URL flaw
- wget2 — GitLab releases (2.x stable line, latest 2.2.1, Jan 2026)
- wget2 — README (HTTP/2, parallel threads, brotli/zstd, HSTS)
- wget2 2.2.1 announcement on info-gnu