cheat sheet

wget

Non-interactive network downloader. Covers single and batch downloads, recursive mirroring, authentication, resuming, rate limiting, and site archiving.

updated 05-25-2026

wget — File Downloader

What it is

wget is GNU's free, non-interactive command-line file downloader, available on virtually every Linux distribution and maintained as part of the GNU Project. It supports HTTP, HTTPS, and FTP, and excels at recursive website mirroring, resuming interrupted downloads, and scripted bulk retrieval without any user interaction. Reach for wget when you need to mirror a directory tree or download files reliably in scripts; use curl for API work where fine-grained control over request headers, methods, and TLS is needed.

Basic downloads

bash

wget https://example.com/file.tar.gz         # download to cwd

Output:

text

--2026-04-24 10:00:01--  https://example.com/file.tar.gz
Resolving example.com (example.com)... 93.184.216.34
Connecting to example.com (example.com)|93.184.216.34|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 52428800 (50M) [application/x-gzip]
Saving to: 'file.tar.gz'

file.tar.gz         100%[===================>]  50.00M  4.21MB/s    in 11s

2026-04-24 10:00:12 (4.21 MB/s) - 'file.tar.gz' saved [52428800/52428800]

bash

wget -O output.tar.gz https://example.com/f  # custom filename
wget -P /tmp https://example.com/file.tar.gz # save to /tmp/
wget -q https://example.com/file             # quiet (no progress bar)
wget -nv https://example.com/file            # no verbose (minimal output)

Output (wget -nv https://example.com/file):

text

2026-04-24 10:00:01 URL:https://example.com/file [1024/1024] -> "file" [1]

bash

wget -S https://example.com                  # show server response headers

Output:

text

  HTTP/1.1 200 OK
  Content-Type: text/html; charset=UTF-8
  Content-Length: 1256
  Cache-Control: max-age=604800
  Date: Thu, 24 Apr 2026 10:00:01 GMT
  Expires: Thu, 01 May 2026 10:00:01 GMT
  Server: ECS (nyb/1D2B)

bash

wget --spider https://example.com/file       # check URL without downloading

Output:

text

Spider mode enabled. Check if remote file exists.
--2026-04-24 10:00:01--  https://example.com/file
Resolving example.com... 93.184.216.34
Connecting to example.com|93.184.216.34|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1256 (1.2K) [text/html]
Remote file exists.

Resume and retry

-c (continue) resumes a partially downloaded file by sending a Range request that starts at the local file's current size, provided the server supports it. --tries (-t) sets the maximum number of attempts (default 20), and --waitretry=N makes wget back off linearly between retries of a failed download: 1 s, then 2 s, and so on up to N seconds. --tries=0 retries indefinitely, which is useful on unreliable connections.

bash

wget -c https://example.com/bigfile.tar.gz   # continue/resume download

Output:

text

--2026-04-24 10:05:00--  https://example.com/bigfile.tar.gz
Connecting to example.com|93.184.216.34|:443... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 52428800 (50M), 31457280 (30M) remaining [application/x-gzip]
Saving to: 'bigfile.tar.gz'

bigfile.tar.gz      100%[+++++++============>]  50.00M  3.87MB/s    in 7s

2026-04-24 10:05:07 (3.87 MB/s) - 'bigfile.tar.gz' saved [52428800/52428800]

bash

wget --retry-connrefused -t 10 URL           # up to 10 tries; treat "connection refused" as transient
wget -t 0 URL                                # infinite retries
wget --waitretry=5 -t 10 URL                 # back off between retries, up to 5s
wget --timeout=30 URL                        # DNS/connect/read timeout

Output: (none — exits 0 on success)
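The resume and retry flags above are usually combined. A minimal sketch of a wrapper follows; the function name fetch_resumable and the WGET override variable are hypothetical, introduced only so the command can be stubbed out when testing, and the retry values are illustrative rather than recommended defaults.

```shell
# Hypothetical helper: resumable download with bounded retries.
# WGET is an assumed override hook (defaults to the real wget binary).
fetch_resumable() {
  local url="$1"
  local dest_dir="${2:-.}"
  "${WGET:-wget}" --continue \
    --tries=10 \
    --waitretry=5 \
    --retry-connrefused \
    --timeout=30 \
    --directory-prefix="$dest_dir" \
    "$url"
}
```

Calling fetch_resumable https://example.com/bigfile.tar.gz /tmp would resume into /tmp; running it with WGET=echo prints the assembled command line instead of downloading.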

Authentication

--user and --password supply HTTP Basic or Digest credentials. To keep passwords out of shell history and process listings, store them in ~/.netrc (chmod 600), which wget 1.x reads automatically when no credentials are given on the command line; wget 1.x has no --netrc-file option (that flag belongs to curl and wget2). For HTTP Basic specifically, --auth-no-challenge sends credentials immediately without waiting for a 401 challenge, which speeds up requests to servers that require auth on the first hit.

bash

wget --user=alice --password=secret https://protected.example.com/file
wget --ask-password --user=alice https://example.com/file
wget --http-user=alice --http-password=s3cr3t https://example.com
wget --no-http-keep-alive --user=alice ...   # disable keepalive

# .netrc file (~/.netrc, chmod 600); read automatically, no flag needed
# machine example.com login alice password secret
wget https://example.com/private/file

Output: (none — exits 0 on success)
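Because a credentials file is only as safe as its permissions, a script might check them before relying on the file. The sketch below is a hypothetical pre-flight helper (netrc_is_private is not part of wget); it treats modes 600 and 400 as private and uses only POSIX find predicates.

```shell
# Hypothetical pre-flight check for a credentials file.
# Returns 0 if the mode is 600 or 400, 1 if it is anything else,
# and 2 if the file does not exist.
netrc_is_private() {
  local file="${1:-$HOME/.netrc}"
  [ -f "$file" ] || return 2
  # find prints the path only when one of the exact modes matches.
  [ -n "$(find "$file" -prune \( -perm 600 -o -perm 400 \) -print)" ]
}
```

A script could then run netrc_is_private || exit 1 before its first authenticated download.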

Headers and cookies

--header appends arbitrary HTTP headers to every request, useful for passing Authorization tokens or custom Accept types. Cookie handling works in two steps: --save-cookies captures the session cookie from a login POST, and --load-cookies replays it on subsequent requests to access protected pages.

bash

wget --header="Authorization: Bearer TOKEN" https://api.example.com
wget --header="Accept: application/json" URL

wget --save-cookies cookies.txt \
     --post-data "user=alice&pass=secret" \
     https://example.com/login
wget --load-cookies cookies.txt https://example.com/protected
wget --keep-session-cookies --save-cookies cookies.txt URL

Output: (none — exits 0 on success)

Download speed control

--limit-rate caps the download bandwidth so wget doesn't saturate the connection — useful in scripts running during business hours or alongside latency-sensitive traffic. Suffix values with k for kilobytes/s or m for megabytes/s; combine with -w (wait) and --random-wait to also throttle request frequency.

bash

wget --limit-rate=500k URL                  # limit to 500 KB/s
wget --limit-rate=2m URL                   # limit to 2 MB/s
wget -w 1 URL                              # wait 1 second between requests
wget --random-wait URL                     # random wait 0.5–1.5× -w value

Output: (none — exits 0 on success)

TLS / SSL

--no-check-certificate disables server certificate validation entirely — acceptable for quick tests against self-signed certs but never appropriate for sensitive data. --ca-certificate points wget to a custom CA bundle for verifying private PKI certificates; --certificate and --private-key supply a client certificate for mutual TLS authentication.

bash

wget --no-check-certificate URL            # skip cert verification
wget --ca-certificate=/path/to/ca.pem URL
wget --certificate=client.pem --private-key=client.key URL

Output: (none — exits 0 on success)

Batch and list downloads

-i reads URLs from a plain-text file, one per line, and downloads each in sequence. This is the simplest way to run wget against a pre-generated list without shell loops; combine with -P to direct all files into a single directory and -c to resume any that were interrupted.

bash

wget -i urls.txt                            # download all URLs from file
wget -i urls.txt -P /downloads/             # save all to directory
wget -q -i urls.txt -P /dest/ &            # background batch download

# Generate URL list and download
seq 1 100 | sed 's|.*|https://example.com/page/&|' | wget -i - -P pages/

Output: (none — exits 0 on success)
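The seq and sed pipeline above can also be written as a small function, which is easier to reuse when the numbering range changes. This is a sketch; make_url_list is a hypothetical name and the URL layout (base/N) simply mirrors the example above.

```shell
# Print one URL per line for the pages first..last under base.
# A trailing slash on base is tolerated.
make_url_list() {
  local base="${1%/}"
  local i="$2"
  local last="$3"
  while [ "$i" -le "$last" ]; do
    printf '%s/%d\n' "$base" "$i"
    i=$((i + 1))
  done
}
```

The output can be piped straight into wget, for example make_url_list https://example.com/page 1 100 | wget -i - -P pages/.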

Recursive / mirror

-r follows links within downloaded pages up to a configurable depth (-l), while --mirror is shorthand for -r -N -l inf --no-remove-listing, which is the standard combination for creating a full offline copy of a site. Always pair with --no-parent to prevent wget from crawling up to parent directories outside the target path.

bash

# Download a full website
wget --mirror -p --convert-links \
     --no-parent -P ./site-mirror \
     https://docs.example.com/

Output:

text

--2026-04-24 10:10:00--  https://docs.example.com/
Connecting to docs.example.com|93.184.216.34|:443... connected.
HTTP request sent, awaiting response... 200 OK
Saving to: 'site-mirror/docs.example.com/index.html'

site-mirror/docs.example.com/index.html saved [12345]

--2026-04-24 10:10:01--  https://docs.example.com/guide/
Saving to: 'site-mirror/docs.example.com/guide/index.html'
…
FINISHED --2026-04-24 10:10:45--
Total wall clock time: 45s
Downloaded: 182 files, 8.4M in 45s (191 KB/s)

bash

# Flags explained:
# --mirror        = -r -N -l inf --no-remove-listing
# -p              = download all assets (CSS, images, JS)
# --convert-links = rewrite links for offline use
# --no-parent     = don't go above the given directory

# Recursive, limited depth
wget -r -l 2 https://example.com/docs/

Output:

text

--2026-04-24 10:12:00--  https://example.com/docs/
…
Loading robots.txt; please ignore errors.
--2026-04-24 10:12:00--  https://example.com/robots.txt
…
FINISHED --2026-04-24 10:12:22--
Total wall clock time: 22s
Downloaded: 47 files, 2.1M in 22s (97.7 KB/s)

bash

# Download only specific file types
wget -r -l 2 -A "*.pdf,*.doc" https://example.com/resources/

# Exclude file types
wget -r -l 2 -R "*.jpg,*.png,*.gif" https://example.com/

# Span hosts, but only within example.com and its subdomains
wget -r -H -D example.com https://example.com/

Output: (none — exits 0 on success)

Recursive options reference

Flag	Meaning
`-r` / `--recursive`	Recursive download
`-l N`	Recursion depth (default 5; `inf` = unlimited)
`-np` / `--no-parent`	Don't go up to parent directories
`-N`	Only download newer files (timestamping)
`-k` / `--convert-links`	Convert links for local browsing
`-p` / `--page-requisites`	Get all assets needed to display page
`-H`	Span hosts (follow links to other domains)
`-D DOMAIN`	Comma-separated domains to follow
`-A LIST`	Accept list (file patterns/extensions)
`-R LIST`	Reject list
`-I LIST`	Include directories
`-X LIST`	Exclude directories
`--no-clobber`	Don't overwrite existing files

Output and logging

-q silences wget's progress output entirely; -o redirects all messages to a log file while -a appends to an existing one. For scripts that should be quiet but still show a progress bar, -q --show-progress is the right pairing.

bash

wget -a wget.log URL                       # append log to file
wget -o wget.log URL                       # write log to file (overwrite)
wget --progress=bar URL                    # progress bar style
wget --progress=dot:giga URL              # dot progress for big files
wget -q --show-progress URL               # quiet + progress bar

Output (wget -q --show-progress URL):

text

file.tar.gz         100%[===================>]  50.00M  5.12MB/s    in 9s

Timestamps and conditional fetch

-N (timestamping) sends a conditional If-Modified-Since header and skips downloading files that haven't changed since the local copy was last written. This makes repeated runs of a mirror script efficient — wget only transfers files the server reports as newer than what you already have.

bash

wget -N URL                    # only download if newer than local copy
wget -N --no-if-modified-since URL # with -N: compare via a HEAD request instead of If-Modified-Since

Output: (none — exits 0 on success)

Background and daemon

-b forks wget into the background immediately after starting, freeing the terminal for other work. Progress is written to wget-log by default; use -o logfile to redirect it, then tail -f the log to monitor progress without keeping wget in the foreground.

bash

wget -b URL                    # download in background
wget -b -o background.log URL  # background with log
tail -f wget-log               # watch background download progress

Output: (none — exits 0 on success)
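When wget goes to the background it prints a line of the form "Continuing in background, pid N." before detaching. A script that needs to wait for the download could capture that PID. The helper below is a sketch (bg_pid is a hypothetical name) and assumes the untranslated English message, so run wget with LC_ALL=C when using it.

```shell
# Extract the PID from the message wget prints when started with -b.
# Reads wget's output on stdin and prints the PID, or nothing if absent.
bg_pid() {
  sed -n 's/^Continuing in background, pid \([0-9][0-9]*\)\.*$/\1/p'
}

# Example use (not run here):
#   pid=$(LC_ALL=C wget -b -o background.log "$url" 2>&1 | bg_pid)
#   while kill -0 "$pid" 2>/dev/null; do sleep 5; done
```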

FTP support

wget supports plain FTP and FTPS using the same --user/--password flags as HTTP, and -r works for recursive FTP directory traversal. wget does not speak SFTP (file transfer over SSH); for SSH-based transfers or key-based authentication, use sftp:// URIs with a tool like curl or lftp instead.

bash

wget ftp://ftp.example.com/pub/file.tar.gz
wget --ftp-user=alice --ftp-password=secret ftp://ftp.example.com/file
wget -r ftp://ftp.example.com/pub/                # recursive FTP

Output: (none — exits 0 on success)

wget vs curl

Task	wget	curl
Simple download	`wget URL`	`curl -LO URL`
Save with custom name	`wget -O name URL`	`curl -o name URL`
Resume	`wget -c URL`	`curl -C - URL`
Batch from file	`wget -i list.txt`	`xargs -n1 curl -LO < list.txt`
Recursive mirror	`wget --mirror`	not built-in
API / REST calls	limited	`curl -X POST -d ...`
Pipe to stdout	`wget -qO- URL`	`curl -sS URL`

For simple scripted downloads, wget -q --show-progress -c -O "$dest" "$url" is the most reliable combination: quiet (no clutter), shows a progress bar, resumes if interrupted, and saves to a named file.

Mirroring in depth

--mirror is the canonical "make me a local copy of this site" switch — it expands to -r -N -l inf --no-remove-listing, enabling unlimited-depth recursion with timestamping so a second run only fetches what changed. Pair it with --page-requisites (-p) to pull every CSS/JS/image referenced from each page, and --convert-links (-k) to rewrite the HTML so it renders correctly from the local filesystem.

bash

# Full offline copy ready to open in a browser
wget --mirror \
     --page-requisites \
     --convert-links \
     --adjust-extension \
     --no-parent \
     --restrict-file-names=windows \
     -e robots=off \
     -P ./mirror \
     https://docs.example.com/guide/

Output: (none — exits 0 on success)

Flag-by-flag rationale:

Flag	Why
`--mirror`	`-r -N -l inf --no-remove-listing` — recurse forever, only re-fetch newer files
`-p` / `--page-requisites`	Also pull CSS, JS, images, fonts needed to render each saved page
`-k` / `--convert-links`	Rewrite absolute links to local relative paths post-download
`--adjust-extension`	Save `.html` extension on pages served as `text/html` without one
`--no-parent`	Never ascend above the URL's directory — keeps the crawl scoped
`--restrict-file-names=windows`	Escape characters that Windows file systems forbid (`:?*|<>"` etc.) so the mirror is portable to NTFS/exFAT
`-e robots=off`	Ignore `robots.txt` — only do this on sites you own or are permitted to crawl

--mirror -np still walks every linked page within the starting directory. A docs site with thousands of pages produces thousands of HTTP requests. Throttle with --wait, --random-wait, and --limit-rate, and confirm you have permission before crawling third-party sites.

Recursion depth and scope

-l N caps recursion at N levels (default 5; -l inf is unlimited). -D domain1,domain2 plus -H (span hosts) lets wget follow into another domain — required for sites where assets live on cdn.example.com while pages are on www.example.com.

bash

# Two levels deep, stay on this host
wget -r -l 2 --no-parent https://example.com/docs/

# Follow into the CDN domain for assets but not anywhere else
wget --mirror -p -k -H -D example.com,cdn.example.com https://example.com/

# Mirror only PDFs and HTML, two levels deep
wget -r -l 2 -A "*.pdf,*.html" --no-parent https://example.com/papers/

# Mirror everything EXCEPT images and large archives
wget -r -l 3 -R "*.jpg,*.png,*.iso,*.zip" https://example.com/

# Include only specific directories
wget -r -I /docs,/api -X /docs/legacy https://example.com/

Output: (none — exits 0 on success)

-A/-R (accept/reject) filter by file-name suffix or pattern; -I/-X filter by URL directory path. During a recursive crawl wget still downloads HTML pages that fail the -A/-R test, because it needs them to discover links, and deletes them once they have been parsed. A -R "*.html" therefore saves no bandwidth; it only removes the pages afterwards. Use -A for terminal file types (PDFs, tarballs) and let wget crawl HTML normally.

`--page-requisites` + `--convert-links` in detail

-p pulls every resource referenced from each downloaded HTML page that the browser would need: <link rel="stylesheet">, <img src>, <script src>, inline url(...) references, and a few others. Without -p, the mirror is HTML-only and renders as unstyled text.

bash

# Single page with all its assets, no link rewriting
wget -p https://example.com/article/

# Single page, fully self-contained for offline use
wget -p -k -H -nd -P ./snapshot https://example.com/article/

Output: (none — exits 0 on success)

-nd (no directories) flattens everything into the destination — useful for "save one article" but disastrous for whole-site mirrors where filenames collide.

-k/--convert-links runs after the download completes and only rewrites links to files that were actually downloaded; missing assets stay as their original absolute URL so the mirror is honest about what failed. Combine with -K (--backup-converted) to keep an .orig copy of every modified HTML file.

Cookies and authenticated scraping

wget's cookie handling is built around the Netscape-format jar shared with curl, Firefox, and most other tools. The recipe for scraping a login-walled site is always the same: POST to the login form to obtain session cookies, save them to a jar, then load the jar for every subsequent request.

bash

# Step 1 — log in and capture session cookies
wget --save-cookies cookies.txt \
     --keep-session-cookies \
     --post-data 'username=alicedev&password=secret' \
     --delete-after \
     https://example.com/login

# Step 2 — pull a protected page using those cookies
wget --load-cookies cookies.txt \
     https://example.com/account/dashboard

# Mirror an authenticated section
wget --mirror -p -k --no-parent \
     --load-cookies cookies.txt \
     --user-agent="Mozilla/5.0" \
     https://example.com/private/docs/

Output: (none — exits 0 on success)

Key flags:

--save-cookies FILE — write received cookies to FILE on exit.
--keep-session-cookies — also save session cookies (the kind without an expiry); without this they're discarded.
--load-cookies FILE — send cookies from FILE on every request.
--delete-after — remove the body of the response after processing; we only wanted the cookies.

Cookies exported from Firefox/Chrome (via "Get cookies.txt" extensions) work directly with --load-cookies. For sites that set cookies during a redirect chain, also pass --keep-session-cookies to the first request.

If a site refuses to log in via wget but works in the browser, the missing piece is almost always --user-agent, an extra header like --header="X-CSRF-Token: ...", or a hidden form field grabbed by xidel or curl from the login page first.
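When a login step silently fails, the quickest check is to look at what actually landed in the jar. The sketch below lists cookies from a Netscape-format file, whose seven tab-separated fields are domain, subdomain flag, path, secure flag, expiry, name, and value. The function name list_cookies is hypothetical; the handling of the #HttpOnly_ prefix assumes the jar may have been written by curl, and an expiry of 0 is shown as "session".

```shell
# Print domain, cookie name, and expiry for each cookie in a Netscape jar.
list_cookies() {
  local jar="$1"
  awk -F '\t' '
    /^#HttpOnly_/ { sub(/^#HttpOnly_/, "") }  # assumed curl-style prefix
    /^#/ || NF < 7 { next }                   # skip comments and blank lines
    { printf "%s\t%s\t%s\n", $1, $6, ($5 == 0 ? "session" : $5) }
  ' "$jar"
}
```

An empty result after the login request suggests the POST did not authenticate, or that session cookies were dropped because --keep-session-cookies was missing.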

Authentication recipes

Beyond Basic and Digest auth, real-world scraping often needs bearer tokens, custom CSRF headers, or form-based login flows. The common pattern is "extract token with a first request, then replay it on subsequent ones".

bash

# Bearer token from a password manager (keeps the literal token out of shell history)
TOKEN=$(pass show api/example | head -1)
wget --header="Authorization: Bearer $TOKEN" https://api.example.com/blob

# CSRF + cookie: grab the form, extract the token, post it back
csrf=$(wget -qO- --save-cookies cookies.txt --keep-session-cookies \
       https://example.com/login \
       | grep -oP 'name="csrf_token" value="\K[^"]+')
wget --load-cookies cookies.txt \
     --save-cookies cookies.txt \
     --keep-session-cookies \
     --post-data "csrf_token=$csrf&user=alicedev&pass=secret" \
     https://example.com/login \
     -O /dev/null

# .netrc-driven (recommended for repeatable scripts)
cat > ~/.netrc <<'EOF'
machine example.com
  login alicedev
  password s3cret
EOF
chmod 600 ~/.netrc
wget --netrc https://example.com/private/file

Output: (none — exits 0 on success)

Bandwidth and politeness controls

When mirroring third-party sites, throttling matters as much as correctness — fast crawlers get IP-banned and burn the host's bandwidth. wget combines per-connection rate limiting with inter-request delays and a User-Agent override that identifies you to the site's operators.

bash

# Cap rate AND space requests apart
wget --mirror -p -k \
     --limit-rate=300k \
     --wait=2 \
     --random-wait \
     --user-agent="alicedev-mirror/1.0 (alice@example.com)" \
     --no-parent \
     https://docs.example.com/

# Even slower — 1 request every 5–15s, 100 KB/s cap
wget -r -l 3 --wait=10 --random-wait --limit-rate=100k --no-parent URL

Output: (none — exits 0 on success)

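--random-wait multiplies --wait by a random factor between 0.5 and 1.5 for each request, which makes the request pattern less regular and less likely to trip simple rate-based bot detection. Combine with --quota=SIZE as a safety net for unattended jobs: once the total downloaded exceeds SIZE, wget finishes the file in progress and starts no further downloads. The quota applies to recursive and -i runs only; it never cuts short a single file named on the command line.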

bash

# Stop after 500 MB regardless of how many files are left
wget -r --no-parent --quota=500m https://example.com/

Output: (none — exits 0 on success)
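Before starting a throttled crawl it helps to estimate how much time the delays alone will add. A rough sketch, assuming an integer --wait value and ignoring transfer time entirely; crawl_delay_bounds is a hypothetical helper name.

```shell
# Estimate the total pause time for a crawl that uses --wait with --random-wait.
# Each pause falls between 0.5x and 1.5x the --wait value.
crawl_delay_bounds() {
  local requests="$1"
  local wait_s="$2"
  local min=$(( requests * wait_s / 2 ))
  local max=$(( requests * wait_s * 3 / 2 ))
  printf 'min=%ds max=%ds\n' "$min" "$max"
}
```

For 1000 requests with --wait=2, crawl_delay_bounds 1000 2 prints min=1000s max=3000s, so the pauses alone take between roughly 17 and 50 minutes.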

Header injection and User-Agent

--header adds a custom request header and can be repeated; --user-agent is shorthand for the User-Agent header. Some sites filter by both User-Agent and Referer, so a realistic-looking pair is often necessary to fetch HTML that renders correctly.

bash

# Pretend to be a recent Firefox
wget --user-agent="Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0" URL

# Set Referer (some hotlink-protected images require it)
wget --referer="https://example.com/page/" https://example.com/images/photo.jpg

# Multiple custom headers
wget --header="Accept: application/json" \
     --header="X-Request-ID: $(uuidgen)" \
     --header="Authorization: Bearer $TOKEN" \
     https://api.example.com/data

Output: (none — exits 0 on success)

`wgetrc` — system-wide and per-user defaults

/etc/wgetrc (system) and ~/.wgetrc (per-user) hold long-form option = value pairs that wget reads at startup. Anything settable on the command line is settable here; the file is the right place for proxy configuration, default user agent, retry behaviour, and a "quiet but progress visible" default.

text

# ~/.wgetrc
# Always retry transient failures
tries = 5
waitretry = 5
retry_connrefused = on

# Default UA — replace with something descriptive
user_agent = alicedev-wget/1.0 (alice@example.com)

# Be polite
wait = 1
random_wait = on
limit_rate = 1M

# Cookies in a stable location
cookies = on

# Always continue partial files
continue = on

# Logging
quiet = off
verbose = off

# Proxy — also honours http_proxy / https_proxy env vars
# use_proxy = on
# http_proxy = http://proxy.example.com:8080
# https_proxy = http://proxy.example.com:8080

A command-line flag overrides the same key in ~/.wgetrc, and ~/.wgetrc overrides /etc/wgetrc. Prepending WGETRC=/dev/null to the command replaces the per-user file with an empty one for that invocation, but the system-wide file is still read; pass --no-config to skip every rc file.
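The WGETRC variable also makes per-project settings possible without touching ~/.wgetrc. The sketch below writes a small rc file to a path of your choosing; write_polite_wgetrc is a hypothetical helper, and the values and user-agent string are placeholders, not recommendations.

```shell
# Write a per-project wgetrc to the given path.
# contact is embedded in the User-Agent so site operators can reach you.
write_polite_wgetrc() {
  local path="$1"
  local contact="$2"
  cat > "$path" <<EOF
# Per-project settings (illustrative values)
tries = 5
waitretry = 5
wait = 1
random_wait = on
limit_rate = 500k
user_agent = project-mirror/1.0 ($contact)
EOF
}

# Example use (not run here):
#   rc=$(mktemp); write_polite_wgetrc "$rc" alice@example.com
#   WGETRC="$rc" wget --mirror --no-parent https://docs.example.com/
```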

Proxy support

wget reads the standard http_proxy, https_proxy, ftp_proxy, and no_proxy environment variables. Equivalent wgetrc keys are http_proxy, https_proxy, use_proxy, and no_proxy; explicit -e flags let you set them inline.

bash

# Inline proxy
wget -e use_proxy=yes -e http_proxy=http://proxy.example.com:8080 URL

# Authenticated proxy
wget --proxy-user=alicedev --proxy-password=secret \
     -e use_proxy=yes -e https_proxy=http://proxy:8080 URL

# Bypass proxy for internal hosts (wget matches host-name suffixes; CIDR ranges and * wildcards are not understood)
export no_proxy="localhost,127.0.0.1,.internal"
wget URL

# Disable proxy for one invocation (override env)
wget --no-proxy URL

Output: (none — exits 0 on success)

For SOCKS proxies, wget itself has no native support — wrap it in tsocks/proxychains4 or use aria2c --all-proxy=socks5://... instead.

`wget2` notes

wget2 is the GNU successor to classic wget, written from scratch around libwget. The 2.x series is stable (latest 2.2.1, January 2026) and adds HTTP/2 multiplexing, multi-threaded parallel downloads (5 threads by default), brotli + zstd decompression, full HSTS, and TCP Fast Open, while staying option-compatible for the most common flags. The binary ships as wget2 (not wget, so it doesn't clash) in Debian, Ubuntu, Alpine, Fedora, and Homebrew.

bash

# Install
sudo apt install wget2            # Debian/Ubuntu
brew install wget2                 # macOS

# Same flags as wget1 for basic use
wget2 -c https://example.com/file.tar.gz

# Parallel-connection acceleration (wget1 has no equivalent)
wget2 --max-threads=8 --chunk-size=2M https://example.com/big.iso

# HTTP/2 mirror with brotli + zstd compression negotiated automatically
wget2 --mirror --compression=br,zstd https://example.com/docs/

Output: (none — exits 0 on success)

Differences worth knowing:

With --chunk-size set, wget2 can split a single file into ranges and fetch them over parallel connections (wget1 is strictly single-connection).
HTTP/2 multiplexes many requests over one TCP connection — a recursive mirror over HTTP/2 finishes substantially faster than wget1's serialized HTTP/1.1.
Accept-Encoding is advertised automatically for br, zstd, lzip, bzip2, xz, gzip, and deflate when the matching libraries are present at build time.
Cookies, recursion, and --mirror all work the same way.
A few options are new rather than renamed: --max-threads and --chunk-size have no wget1 equivalent; the legacy single-letter shorts are mostly preserved.
Output format is similar but subtly different — scripts that parse wget1's stderr may need adjusting.

wget2 is the right choice for new scripts that need parallelism without taking on aria2c's complexity; stick with wget (wget1) for compatibility with old scripts or environments where wget2 isn't packaged.

wget 1.25 — security & shorthand URL removal

wget 1.25.0 (released November 2024) is the current stable wget1 line and the version shipping in Debian 13, Ubuntu 26.04, Fedora 41+, and Alpine 3.21+. The headline change is the fix for CVE-2024-10524: support for the legacy shorthand URL formats user@host/path and host:/path was removed because wget was interpreting any URL containing : as FTP, allowing crafted credentials to redirect requests to attacker-controlled hosts.

bash

# OLD (now rejected on 1.25+)
wget alice@example.com/file        # was interpreted as http://alice@example.com/file
wget example.com:/pub/file         # was interpreted as ftp://example.com/pub/file

# Use full URLs everywhere
wget https://alice@example.com/file
wget ftp://example.com/pub/file

Output: (none — exits 0 on success)

If a script suddenly emits Invalid URL on 1.25+, the cause is almost always a shorthand URL; rewrite it with an explicit scheme. The wget --version output shows GNU Wget 1.25.0 …; older upstream releases are affected, so update them or confirm that your distribution has backported the fix.
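URL lists that were written for older wget versions can be screened for missing schemes before an upgrade. A minimal sketch: find_schemeless_urls is a hypothetical helper, and skipping blank lines and # comments is an assumption about how the list file is written, not a statement about wget's -i parser.

```shell
# Read URLs on stdin (one per line) and print those without an explicit scheme.
find_schemeless_urls() {
  local line
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      ''|'#'*) continue ;;                      # blank line or comment
      http://*|https://*|ftp://*|ftps://*) ;;   # explicit scheme, fine
      *) printf '%s\n' "$line" ;;
    esac
  done
}
```

Running find_schemeless_urls < urls.txt and getting no output means every entry already carries a scheme.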

Common recipes

Archive a documentation site

bash

wget --mirror \
     --page-requisites \
     --convert-links \
     --adjust-extension \
     --no-parent \
     --restrict-file-names=windows \
     --user-agent="alicedev-archive/1.0 (alice@example.com)" \
     --wait=1 --random-wait \
     -P ./docs-mirror \
     https://docs.example.com/v2/

Output: (none — exits 0 on success)

Scripted login + scrape

bash

#!/usr/bin/env bash
set -euo pipefail
JAR=$(mktemp)
trap 'rm -f "$JAR"' EXIT

# 1. Log in
wget --save-cookies "$JAR" --keep-session-cookies \
     --post-data "user=alicedev&pass=$LOGIN_PW" \
     --delete-after \
     https://app.example.com/login

# 2. Iterate over protected pages
for id in 1 2 3 4 5; do
  wget --load-cookies "$JAR" \
       --output-document="report-$id.html" \
       "https://app.example.com/reports/$id"
done

Output: (none — exits 0 on success)

Resume a stalled mirror

A mid-mirror crash leaves a partial tree behind. The same wget --mirror command re-run with -N (already implied by --mirror) only re-fetches files the server reports as newer than the local copy.

bash

wget --mirror -p -k -np -c -P ./mirror https://docs.example.com/

Output: (none — exits 0 on success)

-c ensures any individual file that was mid-transfer at the crash continues from its current size, instead of restarting from byte 0.

Download every `.pdf` linked from a single page

bash

# Two-step: list the PDFs, then download them
wget -qO- https://example.com/papers/index.html \
  | grep -oE 'href="[^"]+\.pdf"' \
  | sed -E 's/href="([^"]+)"/\1/' \
  | xargs -I{} wget -P ./pdfs/ "https://example.com/papers/{}"

# Or in one wget invocation
wget -r -l 1 -np -nd -A pdf -P ./pdfs https://example.com/papers/

Output: (none — exits 0 on success)
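The two-step pipeline above assumes every href is relative to the page's directory. A sketch that also copes with absolute and root-relative links follows; pdf_links is a hypothetical helper, it expects the directory URL of the page as its argument, and it does not resolve ../ segments or protocol-relative links.

```shell
# Read HTML on stdin and print one absolute URL per .pdf link.
# $1 is the directory URL the page was fetched from.
pdf_links() {
  local base="${1%/}"
  local origin
  origin=$(printf '%s\n' "$base" | sed -E 's|^(https?://[^/]+).*|\1|')
  grep -oE 'href="[^"]+\.pdf"' \
    | sed -E 's/^href="([^"]+)"$/\1/' \
    | while IFS= read -r href; do
        case "$href" in
          http://*|https://*) printf '%s\n' "$href" ;;             # already absolute
          /*)                 printf '%s%s\n' "$origin" "$href" ;; # root-relative
          *)                  printf '%s/%s\n' "$base" "$href" ;;  # relative to the page directory
        esac
      done
}
```

The result can be fed to wget through its list mode, for example wget -qO- https://example.com/papers/ | pdf_links https://example.com/papers/ | wget -i - -P ./pdfs/.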

Periodic mirror via cron

text

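# Refresh nightly at 03:00 (only changed files). A crontab entry must be a single line; cron does not honour backslash continuations.
# -nv rather than -q, so the log actually records what was fetched.
0 3 * * *  cd /srv/mirror && wget --mirror -p -k -np -nv --limit-rate=2m --user-agent="mirror-bot/1.0 (alice@example.com)" https://docs.example.com/ >> /var/log/wget-mirror.log 2>&1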

Pipe output through tar without saving the tarball

bash

wget -qO- https://example.com/release.tar.gz | tar -xz -C /opt/app

Output: (none — exits 0 on success)

-qO- is the canonical "stream to stdout" combo: quiet, output to - (stdout). Roughly equivalent to curl -fsSL, since wget follows redirects and fails on HTTP errors by default.

wget vs curl vs aria2c

Task	wget	curl	aria2c
Recursive site mirror	`--mirror` (native)	not built-in	not built-in
API/REST work	limited	first-class (`-X`, `-H`, `-d`)	limited
Parallel segments for one file	no (wget2: yes)	`--parallel` (multi-URL only)	`-x N -s N` (per-file segmenting)
Magnet / BitTorrent	no	no	yes
Resume large download	`-c`	`-C -`	`-c`
Cookies	`--save-cookies` / `--load-cookies`	`-c` / `-b`	`--save-cookies` / `--load-cookies`
Pipe to stdout	`-qO-`	`-fsSL`	not the right tool
Long-lived daemon with RPC	no	no	`--enable-rpc`
Batch from URL list	`-i list.txt`	shell loop	`-i list.txt` (richer format)
HTTP/2	wget2 yes; wget1 no	yes	yes

Exit codes

Code	Meaning
`0`	All files downloaded successfully
`1`	Generic error
`2`	Parse error (bad command-line option)
`3`	I/O error (cannot read/write file)
`4`	Network failure
`5`	SSL verification failure
`6`	Authentication failure
`7`	Protocol error
`8`	Server issued an error response (4xx/5xx)

wget exits 0 only when every URL succeeded — for partial-success batch runs, inspect the log file (-o/-a) rather than relying on exit code.

bash

# Distinguish "all succeeded" from "some failed"
if wget -i urls.txt -P ./out -nv -a wget.log; then
  echo "All downloads succeeded"
else
  echo "Some downloads failed — see wget.log"
  grep -E 'ERROR|failed' wget.log
fi

Output: (none — exits 0 on success)
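Scripts that log failures are easier to debug when the numeric status is translated into words. A small sketch based on the exit-code table above; wget_exit_reason is a hypothetical helper name.

```shell
# Map a wget exit status to a short description.
wget_exit_reason() {
  case "$1" in
    0) echo "success" ;;
    1) echo "generic error" ;;
    2) echo "parse error" ;;
    3) echo "file I/O error" ;;
    4) echo "network failure" ;;
    5) echo "SSL verification failure" ;;
    6) echo "authentication failure" ;;
    7) echo "protocol error" ;;
    8) echo "server error response" ;;
    *) echo "unknown exit status $1" ;;
  esac
}

# Example use (not run here):
#   rc=0; wget -q "$url" || rc=$?
#   [ "$rc" -eq 0 ] || echo "wget failed: $(wget_exit_reason "$rc")" >&2
```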
