Pipes

An operating-system primitive that streams one process's stdout into another's stdin, letting small composable tools build large data flows.

Definition

A pipe is a unidirectional, in-kernel byte buffer that connects one process's standard output to another process's standard input. The shell operator | is the user-facing syntax, but the underlying primitive is the pipe(2) system call, which returns a pair of file descriptors — one for reading, one for writing — sharing a fixed-size kernel ring buffer. Pipes apply anywhere a programming environment needs to compose independent processes into a streaming data flow without intermediate temp files.
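As a minimal sketch of that definition, the snippet below uses Python's os.pipe wrapper around pipe(2) to show the descriptor pair, in-order delivery, the bounded buffer, and EOF. The 4096-byte probe size and the variable names are illustrative choices, and the measured capacity depends on the platform (the 64 KiB Linux default is discussed under "How it works").

```python
import os

# pipe(2) hands back two descriptors: (read end, write end).
read_fd, write_fd = os.pipe()

# Bytes written to one end come out of the other in the same order.
os.write(write_fd, b"hello\n")
first = os.read(read_fd, 1024)

# Probe the buffer size: switch the write end to non-blocking mode and
# write until the kernel refuses to take more.
os.set_blocking(write_fd, False)
capacity = 0
try:
    while True:
        capacity += os.write(write_fd, b"x" * 4096)
except BlockingIOError:
    pass  # buffer full; a blocking writer would sleep here instead

# Closing the last write end lets the reader reach EOF once it has drained.
os.close(write_fd)
drained = 0
while chunk := os.read(read_fd, 65536):
    drained += len(chunk)
os.close(read_fd)  # the loop ended on a zero-byte read, i.e. EOF

print(first, capacity, drained)
```

The BlockingIOError branch is the non-blocking view of back-pressure: an ordinary blocking writer would simply stall at that point until a reader made room.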

Why it matters

Pipes are the reason the Unix toolset is greater than the sum of its parts. Each utility (grep, sort, awk, jq, cut, xargs) does one thing, reads from stdin, and writes to stdout, so the tools can be chained in whatever order the data flow calls for. That composition collapses what would otherwise be bespoke scripts into a single line, runs every stage concurrently (each process is scheduled independently and works as soon as input is available), and keeps memory use bounded for streaming stages because a full kernel buffer blocks the producer until the consumer catches up. The same model carries over to PowerShell (where the payload is .NET objects instead of bytes), to programming languages (subprocess.Popen in Python, streams in Node, channels in Go), and to other IPC and messaging layers (Unix-domain sockets, gRPC streaming, Kafka): anywhere a producer/consumer boundary needs to stream rather than batch.
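To make the subprocess.Popen point concrete, here is a small sketch of a three-stage pipeline built without a shell. The commands (printf, sort, uniq) are stand-ins chosen for a deterministic result and are assumed to be on PATH; the variable names are illustrative.

```python
import subprocess

# Equivalent in spirit to: printf 'b\na\nb\n' | sort | uniq
producer = subprocess.Popen(["printf", r"b\na\nb\n"], stdout=subprocess.PIPE)
sorter = subprocess.Popen(["sort"], stdin=producer.stdout, stdout=subprocess.PIPE)
dedupe = subprocess.Popen(["uniq"], stdin=sorter.stdout, stdout=subprocess.PIPE)

# The parent must drop its own copies of the intermediate read ends.
# Otherwise an upstream stage would never get SIGPIPE if a downstream
# stage exited early, because a reader (the parent) would still exist.
producer.stdout.close()
sorter.stdout.close()

output, _ = dedupe.communicate()   # read the last stage until EOF
producer.wait()
sorter.wait()

print(output)
```

All three children run at the same time; the parent only wires descriptors together and reads the final result.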

Pipes were not in the original Unix. M. Doug McIlroy floated the "garden hose" memo in 1964, and Ken Thompson finally implemented the | operator and the pipe(2) system call overnight on January 15, 1973, in Version 3 Unix. The Bell Labs team called it "a day-long orgy of one-liners." More than fifty years later the same character does the same job in every POSIX shell.

How it works

A shell pipeline like ps aux | grep nginx | wc -l does the following:

  1. The shell calls pipe(2) twice — once between ps and grep, once between grep and wc. Each call returns two file descriptors (read_fd, write_fd) backed by a kernel-resident ring buffer.
  2. The shell forks a child for each stage, then uses dup2(2) to wire each child's fd 0 (stdin) and fd 1 (stdout) to the appropriate pipe end before execing the command. The producer's write end becomes the consumer's stdin; descriptors not needed by a given child are closed so the kernel knows when the pipe is truly orphaned.
  3. All three processes run concurrently — they are real OS processes scheduled independently. As ps writes, grep is reading; as grep writes, wc is reading. There is no temp file and no batch boundary.
  4. The kernel ring buffer provides back-pressure. On Linux the default capacity is 64 KiB (16 × 4 KiB pages); writes of up to PIPE_BUF (4096 bytes on Linux) are guaranteed atomic. When the buffer is full, the writer blocks in write(2) until the reader drains it. When the buffer is empty, the reader blocks in read(2) until more data arrives.
  5. When every write end is closed and the buffer is drained, the reader sees EOF (a zero-byte read). When the reader closes its end and the writer then tries to write, the kernel raises SIGPIPE on the writer, which by default terminates it silently; this is what stops yes when you pipe it into head. A writer that ignores or handles SIGPIPE gets EPIPE from write(2) instead, which is the source of the familiar "Broken pipe" error message.
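The numbered steps above can be sketched with Python's thin wrappers over the same system calls (os.pipe, os.fork, os.dup2, os.execvp). This is an illustration under assumptions, not how any particular shell is implemented: run_pipeline is a hypothetical helper, printf | grep a | wc -l stands in for ps aux | grep nginx | wc -l so the result is predictable, and the sketch creates one extra pipe after the last stage so the parent can capture its output (a shell would leave the last stage on the terminal). It assumes a Unix-like system and Python 3.9 or later for os.waitstatus_to_exitcode.

```python
import os

def run_pipeline(commands):
    """Run a list of argv lists as one pipeline (hypothetical helper).

    Returns (stdout bytes of the last stage, exit code of every stage).
    """
    upstream = None  # read end that feeds the next stage's stdin
    pids = []
    for argv in commands:
        read_fd, write_fd = os.pipe()      # step 1: a pipe for this stage's output
        pid = os.fork()                    # step 2: one child per stage
        if pid == 0:
            try:
                if upstream is not None:
                    os.dup2(upstream, 0)   # previous pipe becomes stdin
                os.dup2(write_fd, 1)       # this pipe becomes stdout
                os.close(read_fd)          # drop the ends this child does not use
                os.close(write_fd)
                os.execvp(argv[0], argv)
            finally:
                os._exit(127)              # reached only if setup or exec failed
        pids.append(pid)
        os.close(write_fd)                 # parent holds no write ends, so EOF can propagate
        if upstream is not None:
            os.close(upstream)
        upstream = read_fd

    chunks = []
    while chunk := os.read(upstream, 65536):  # a zero-byte read means EOF (step 5)
        chunks.append(chunk)
    os.close(upstream)
    codes = [os.waitstatus_to_exitcode(os.waitpid(pid, 0)[1]) for pid in pids]
    return b"".join(chunks), codes

output, codes = run_pipeline([
    ["printf", r"a\nb\nab\n"],
    ["grep", "a"],
    ["wc", "-l"],
])
print(output.decode().strip(), codes)
```

Closing the unused descriptors in the parent matters as much as the dup2 calls: if the parent kept a write end open, the downstream stage would never see EOF and the final read loop would hang.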

Two important variants extend the basic model:

  • Named pipes (FIFOs) — created with mkfifo, they live on the filesystem so unrelated processes can find them by path. Behaviour is otherwise identical to anonymous pipes (same kernel buffer, same SIGPIPE semantics), but open() blocks until both a reader and a writer have shown up.
  • Object pipelines — PowerShell and similar shells pass typed objects (e.g. System.Diagnostics.Process instances with Name, Id, CPU properties) instead of bytes. The consumer can address fields by name (| Where-Object CPU -gt 10) with no awk/cut parsing required. The trade-off is that you give up the "everything is a stream of bytes" universality that makes Unix pipes interoperate with any language and any tool.
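The named-pipe behaviour described above can be tried with a short sketch. The helper name fifo_demo and the file name demo.fifo are made up for illustration, and a thread plays the role of the unrelated writer process. The non-blocking probe relies on the POSIX rule that opening a FIFO write-only with O_NONBLOCK fails with ENXIO when no reader has it open.

```python
import errno
import os
import tempfile
import threading

def fifo_demo(message):
    """Create a FIFO, show that it needs both ends, then pass one message."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.fifo")  # hypothetical name
        os.mkfifo(path)

        # With no reader present, a non-blocking write-only open is refused.
        try:
            os.close(os.open(path, os.O_WRONLY | os.O_NONBLOCK))
            refused = False
        except OSError as exc:
            refused = exc.errno == errno.ENXIO

        def writer():
            # A plain blocking open waits here until a reader shows up.
            with open(path, "wb") as fifo:
                fifo.write(message)

        thread = threading.Thread(target=writer)
        thread.start()
        with open(path, "rb") as fifo:  # blocks until the writer has opened
            data = fifo.read()          # returns at EOF, when the writer closes
        thread.join()
        return refused, data

refused, data = fifo_demo(b"hi\n")
print(refused, data)
```

Apart from the rendezvous at open time, the data path is the same kernel buffer as an anonymous pipe, which is why the read loop and EOF handling look identical.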

Modern shells have also begun to shrink the pipe model where a pipe is overkill. Two 2025-era examples worth knowing because they change the trade-offs around small captures and optional inputs:

  • Bash 5.3 in-shell command substitution (July 2025) introduces ${ cmd; } and ${| cmd; }. Classic $(cmd) always forks a subshell connected by an anonymous pipe so the parent can capture stdout, which is cheap once and expensive in tight loops. The new forms run cmd in the current shell with no fork and no pipe: ${ cmd; } captures stdout, while ${| cmd; } expands to whatever cmd leaves in $REPLY. The pipe semantics still apply when you actually want concurrency or backpressure; these forms are the right call for "I just need this command's output as a string, right now, in this shell."
  • Fish 4.0 silent-optional input redirection (<? file, January 2025) reads from file if it exists and silently falls back to /dev/null otherwise. It replaces guards such as [ -f file ] && cat file | mycmd, where the pipeline exists only because the file is optional, with a single redirection on mycmd itself.

Common pitfalls

  1. Forgetting set -o pipefail — by default, a bash pipeline's exit status is only the status of the last command. failing-cmd | tee log returns 0 even if failing-cmd crashed, because tee succeeded. Set pipefail (and ideally the rest of strict mode: set -euo pipefail) so the pipeline returns the rightmost non-zero status.
  2. SIGPIPE killing the producer silently — when a downstream stage exits early (yes | head -n 1), the upstream process is hit with SIGPIPE on its next write. In a shell that's usually fine, but a long-running program (Python, Node, a service) needs to either handle the signal or ignore it and treat EPIPE from write() as a normal end of output; otherwise it looks like it crashed for no reason. CPython ignores SIGPIPE at startup, so the failure surfaces as BrokenPipeError; calling signal.signal(signal.SIGPIPE, signal.SIG_DFL) at process start restores the default die-quietly behaviour, which suits command-line filters.
  3. Atomicity assumptions above PIPE_BUF — only writes of PIPE_BUF bytes or fewer (4096 on Linux) are guaranteed atomic. Anything larger may interleave with concurrent writers, corrupting line-oriented consumers. If multiple producers write to the same pipe, keep each write() ≤ 4096 bytes or serialize through a single writer.
  4. Subshell scoping in the last stage — cmd | while read line; do count=$((count+1)); done runs the loop in a subshell, so count is lost when the pipeline exits. Use process substitution (while read line; do …; done < <(cmd)) when you need variable assignments to survive.
  5. Mixing buffered I/O with pipes — when stdout is a pipe, libc stdio switches from line buffering to block buffering (typically 4 KiB), so output from a long-running command arrives in bursts. For real-time output through | tee or | less, use grep --line-buffered or wrap the command as stdbuf -oL cmd.
  6. Treating named pipes like regular files — mkfifo /tmp/p; echo hi > /tmp/p blocks until a reader opens it. The mistake is assuming the write completes immediately; nothing happens until both ends are open.
  7. Confusing the Unix and PowerShell models — copy-pasting ps | grep foo | awk '{print $2}' into PowerShell will not work as expected, because PowerShell passes objects and grep/awk are not native cmdlets. The equivalent is Get-Process | Where-Object Name -like '*foo*' | Select-Object Id.
  8. Reaching for a pipe when a fork-free substitution is cheaper — every $(cmd) and every shell pipeline forks at least one child and wires an anonymous pipe to capture stdout. Inside a tight loop (for i in {1..10000}; do x=$(date +%s); done) the fork cost dominates the work. On Bash 5.3+, ${ cmd; } runs in the current shell with no fork and no pipe, and ${| cmd; } returns via $REPLY for helper functions. Reach for them only when you don't need concurrency or backpressure — i.e. when you were treating the pipe as a glorified capture channel rather than a streaming boundary.
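Pitfall 2 is easy to reproduce from Python. The sketch below shows both faces of a closed reader: inside the interpreter, where SIGPIPE is ignored and the write fails with EPIPE, and in a child running yes, which keeps the default disposition and is killed by the signal. It assumes a Unix-like system with yes on PATH and relies on subprocess resetting SIGPIPE to its default in the child (restore_signals defaults to True); the variable names are illustrative.

```python
import os
import signal
import subprocess

# 1. In-process: CPython ignores SIGPIPE, so writing to a pipe whose read
#    end is closed raises BrokenPipeError (EPIPE) instead of killing us.
read_fd, write_fd = os.pipe()
os.close(read_fd)                       # the reader goes away
try:
    os.write(write_fd, b"data")
    saw_broken_pipe = False
except BrokenPipeError:
    saw_broken_pipe = True
finally:
    os.close(write_fd)

# 2. Child process: yes runs with the default SIGPIPE disposition, so it is
#    terminated once we stop reading, much like `yes | head -n 1`.
producer = subprocess.Popen(["yes"], stdout=subprocess.PIPE)
first_line = producer.stdout.readline()
producer.stdout.close()                 # downstream "exits early"
returncode = producer.wait()
killed_by_sigpipe = returncode == -signal.SIGPIPE

print(saw_broken_pipe, first_line, killed_by_sigpipe)
```

A negative return code from subprocess means the child died from that signal number, so a supervisor can tell an early-exiting consumer apart from a real crash before reporting an error.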

Where to go next

  • #streaming — pipes are the OS-level realisation of the broader "stream of bytes" abstraction; the streaming tag covers in-process streams, network sockets, and async iterators.
  • /sections/linux/bash-redirection — the redirection-operator companion to |, including tee, process substitution, named pipes (FIFOs), coproc, and set -o pipefail.
  • /sections/linux/tr-xargs — two of the most frequently piped utilities, with xargs -P for parallel fan-out across pipeline stages.
  • /sections/linux/grep — the prototypical pipeline filter; demonstrates --line-buffered and how to keep colour through a pipe.
  • /sections/linux/fzf — interactive fuzzy filter that reads stdin and writes the selection to stdout, slotting into any pipeline.
  • /sections/osx/pbcopy-pbpaste — macOS clipboard endpoints that act as pipe sinks/sources (some-cmd | pbcopy).

Sources

References consulted while writing this concept page.

  • The New Stack — How the pipe system call came about — Provided the 1964 McIlroy "garden hose" memo and the January 15, 1973 implementation date for the history paragraph in "Why it matters".
  • Wikipedia — Pipeline (Unix) — Canonical overview of the shell | operator, concurrency model, and exit-status behaviour; backed the "How it works" walkthrough of the three-stage pipeline.
  • Linux pipe(7) man page — Authoritative source for pipe(2) semantics, the 64 KiB default buffer, F_SETPIPE_SZ, and SIGPIPE/EPIPE behaviour.
  • GNU libc manual — Pipe Atomicity — Defined the PIPE_BUF (4096 bytes on Linux) atomicity guarantee referenced in pitfall #3.
  • Baeldung — Anonymous and Named Pipes in Linux — Clarified the FIFO/anonymous distinction (persistence, unrelated processes, open-blocks-until-both-ends) used in the "How it works" variants section.
  • John D. Cook — Comparing the Unix and PowerShell pipelines — Grounded the object-pipeline contrast in "How it works" and pitfall #7.
  • Linuxiac — Bash 5.3 release: new in-shell command substitution — Source for the ${ cmd; } and ${| cmd; } fork-free substitution forms covered in the "How it works" modern-shells aside and pitfall #8.
  • Fish shell — Release notes (4.0 silent-optional redirection) — Authoritative source for the <? file operator added in fish 4.0 and referenced in the modern-shells aside.