concept · weight 5
Pipes
An operating-system primitive that streams one process's stdout into another's stdin, letting small composable tools build large data flows.
Definition
A pipe is a unidirectional, in-kernel byte buffer that connects one process's standard output to another process's standard input. The shell operator | is the user-facing syntax, but the underlying primitive is the pipe(2) system call, which returns a pair of file descriptors — one for reading, one for writing — sharing a fixed-size kernel ring buffer. Pipes apply anywhere a programming environment needs to compose independent processes into a streaming data flow without intermediate temp files.
Why it matters
Pipes are the reason the Unix toolset is greater than the sum of its parts. Each utility (grep, sort, awk, jq, cut, xargs) does one thing, reads from stdin, and writes to stdout, so any of them can be chained in any order. That composition collapses what would otherwise be bespoke scripts into a single line. It runs every stage concurrently, because each process is scheduled independently as soon as data is available, and it keeps memory flat, because the kernel buffer back-pressures the producer as soon as the consumer falls behind. The same model carries over to PowerShell (where the payload is .NET objects instead of bytes), to programming languages (subprocess.Popen in Python, streams in Node, channels in Go), and to inter-process and distributed messaging (Unix-domain sockets, Kafka, gRPC streaming): anywhere a producer/consumer boundary needs to stream rather than batch.
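As a small illustration of that composition, the sketch below finds the most frequent word in a string using only single-purpose filters. The sample text and the variable names `text` and `top` are invented for the example; it assumes bash and the usual POSIX utilities.

```shell
#!/usr/bin/env bash
# Sample input, invented for illustration.
text='the cat sat on the mat the end'

# Stages, left to right: split into one word per line, sort so equal
# words are adjacent, count each group, sort by count (highest first),
# keep the top line, then print it as "word count".
top=$(printf '%s\n' "$text" | tr -s ' ' '\n' | sort | uniq -c | sort -rn | head -n 1 | awk '{print $2, $1}')

echo "$top"
```

No stage knows about any other; swapping `head -n 1` for `head -n 3` or inserting a `grep -v` filter changes the result without touching the rest of the line.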
Pipes were not in the original Unix. Doug McIlroy floated the idea in his "garden hose" memo in 1964, and Ken Thompson finally implemented the | operator and the pipe(2) system call overnight on January 15, 1973, in Version 3 Unix. McIlroy later described the following day as an "orgy of one-liners." More than fifty years later the same character does the same job in every POSIX shell.
How it works
A shell pipeline like ps aux | grep nginx | wc -l does the following:
- The shell calls pipe(2) twice: once between ps and grep, once between grep and wc. Each call returns two file descriptors (read_fd, write_fd) backed by a kernel-resident ring buffer.
- The shell forks a child for each stage, then uses dup2(2) to wire each child's fd 0 (stdin) and fd 1 (stdout) to the appropriate pipe end before exec-ing the command. The producer's write end becomes its stdout and the matching read end becomes the consumer's stdin; descriptors not needed by a given child are closed, so the reader sees EOF as soon as the last write end is closed.
- All three processes run concurrently. They are real OS processes scheduled independently: as ps writes, grep is reading; as grep writes, wc is reading. There is no temp file and no batch boundary.
- The kernel ring buffer provides back-pressure. On Linux the default capacity is 64 KiB (16 × 4 KiB pages); writes of up to PIPE_BUF (4096 bytes on Linux) are guaranteed atomic. When the buffer is full, the writer blocks in write(2) until the reader drains it. When the buffer is empty, the reader blocks in read(2) until more data arrives.
- When the writer closes its end and the buffer is drained, the reader sees EOF (a zero-byte read). When the reader closes its end while the writer still has data to push, the kernel raises SIGPIPE on the writer, which by default terminates it. This is what stops yes when you pipe it into head. A program that ignores SIGPIPE instead gets EPIPE from write(2), which is the source of the familiar "Broken pipe" error message.
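The EOF and SIGPIPE behaviour described in the last step can be observed from the shell itself. The following is a minimal sketch, assuming bash (for the PIPESTATUS array) and default signal handling; the variable names `lines`, `first`, and `statuses` are illustrative only.

```shell
#!/usr/bin/env bash
# EOF: wc -l only returns once printf has closed its write end.
lines=$(printf 'a\nb\nc\n' | wc -l)
echo "lines read before EOF: $((lines))"

# Early exit: head reads one line and closes the read end, so the
# endless producer `yes` is stopped instead of running forever.
first=$(yes | head -n 1)
echo "first line: $first"

# PIPESTATUS records the exit status of every stage. With default
# signal handling the producer usually reports 141 (128 + 13, the
# number of SIGPIPE on Linux); this value is an expectation, not a guarantee.
yes | head -n 1 >/dev/null
statuses=("${PIPESTATUS[@]}")
echo "yes exited with ${statuses[0]}, head exited with ${statuses[1]}"
```

If the parent environment ignores SIGPIPE, `yes` typically exits with an ordinary error status and a "Broken pipe" message instead, which is the EPIPE path rather than the signal path.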
Two important variants extend the basic model:
- Named pipes (FIFOs): created with mkfifo, they live on the filesystem so unrelated processes can find them by path. Behaviour is otherwise identical to anonymous pipes (same kernel buffer, same SIGPIPE semantics), but open() blocks until both a reader and a writer have shown up.
- Object pipelines: PowerShell and similar shells pass typed objects (e.g. System.Diagnostics.Process instances with Name, Id, CPU properties) instead of bytes. The consumer can address fields by name (| Where-Object CPU -gt 10) with no awk / cut parsing required. The trade-off is that you give up the "everything is a stream of bytes" universality that makes Unix pipes interoperate with any language and any tool.
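The named-pipe variant is easy to try in isolation. This is a sketch, assuming bash and a writable temporary directory; the path `demo.fifo` and the variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical scratch location; mktemp -d creates a fresh directory.
dir=$(mktemp -d)
fifo="$dir/demo.fifo"
mkfifo "$fifo"

# The writer runs in the background because opening a FIFO for
# writing blocks until some process opens it for reading.
echo hi > "$fifo" &

# Opening the FIFO for reading releases the writer; read gets its line.
read -r msg < "$fifo"

wait            # reap the background writer
rm -r "$dir"    # a FIFO stays on disk until it is removed
echo "received: $msg"
```

Running the `echo` in the foreground instead would hang the script at that line, since no reader would ever get the chance to open the other end.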
Modern shells have also begun to shrink the pipe model where a pipe is overkill. Two 2025-era examples worth knowing because they change the trade-offs around small captures and optional inputs:
- Bash 5.3 in-shell command substitution (July 2025) introduces ${ cmd; } and ${| cmd; }. Classic $(cmd) always forks a subshell connected by an anonymous pipe so the parent can capture stdout, which is cheap once but expensive in tight loops. The new forms run cmd in the current shell with no fork and no pipe; ${ cmd; } captures stdout, and ${| cmd; } returns whatever the command leaves in $REPLY. The pipe semantics still apply when you actually want concurrency or backpressure; these forms are the right call for "I just need this command's output as a string, right now, in this shell."
- Fish 4.0 silent-optional input redirection (<? file, February 2025) reads from file if it exists and silently falls back to /dev/null otherwise. It collapses the [ -f file ] && cat file | mycmd guard, a pipeline you only need because the file is optional, into a single redirection, leaving pipes for inputs that are genuinely streams.
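The difference between the two Bash capture styles can be sketched as follows. This assumes bash; the version guard and the `eval` wrapper are additions for the example so that shells older than 5.3 take the classic path instead of failing on the new syntax, and the variable names `greeting` and `how` are hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: ${ cmd; } is only available in Bash 5.3 or newer.
if (( BASH_VERSINFO[0] > 5 || (BASH_VERSINFO[0] == 5 && BASH_VERSINFO[1] >= 3) )); then
  # eval defers parsing, so older shells never see the new syntax.
  eval 'greeting=${ echo hello; }'   # runs in the current shell, no fork
  how='in-shell substitution'
else
  greeting=$(echo hello)             # classic form: forked subshell plus pipe
  how='classic fallback'
fi

echo "$greeting (via $how)"
```

Both branches leave the same string in `greeting`; only the cost of producing it differs, which matters mainly inside loops.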
Common pitfalls
- Forgetting set -o pipefail: by default, a bash pipeline's exit status is only the status of the last command. failing-cmd | tee log returns 0 even if failing-cmd crashed, because tee succeeded. Set pipefail (and ideally the rest of strict mode: set -euo pipefail) so the pipeline returns the rightmost non-zero status.
- SIGPIPE killing the producer silently: when a downstream stage exits early (yes | head -n 1), the upstream process is hit with SIGPIPE on its next write. In a shell that's usually fine, but a long-running program (Python, Node, a service) needs to either trap the signal or ignore it and handle EPIPE from write(); otherwise it looks like it crashed for no reason. Python ignores SIGPIPE at startup and raises BrokenPipeError instead; calling signal.signal(signal.SIGPIPE, signal.SIG_DFL) at process start restores the default terminate-quietly behaviour expected of a filter.
- Atomicity assumptions above PIPE_BUF: only writes of PIPE_BUF bytes or fewer (4096 on Linux) are guaranteed atomic. Anything larger may interleave with concurrent writers, corrupting line-oriented consumers. If multiple producers write to the same pipe, keep each write() ≤ 4096 bytes or serialize through a single writer.
- Subshell scoping in the last stage: cmd | while read line; do count=$((count+1)); done runs the loop in a subshell, so count is lost when the pipeline exits. Use process substitution (while read line; do …; done < <(cmd)) when you need variable assignments to survive.
- Mixing buffered I/O with pipes: when stdout is a pipe, libc switches to block buffering (typically 4 KiB) instead of line buffering. Use grep --line-buffered or stdbuf -oL cmd when a long-running command must show real-time output through | tee or | less.
- Treating named pipes like regular files: mkfifo /tmp/p; echo hi > /tmp/p blocks until a reader opens it. The mistake is assuming the write completes immediately; nothing happens until both ends are open.
- Confusing the Unix and PowerShell models: copy-pasting ps | grep foo | awk '{print $2}' into PowerShell will not work as expected, because PowerShell passes objects and grep / awk are not native cmdlets. The equivalent is Get-Process | Where-Object Name -like '*foo*' | Select-Object Id.
- Reaching for a pipe when a fork-free substitution is cheaper: every $(cmd) and every shell pipeline forks at least one child and wires an anonymous pipe to capture stdout. Inside a tight loop (for i in {1..10000}; do x=$(date +%s); done) the fork cost dominates the work. On Bash 5.3+, ${ cmd; } runs in the current shell with no fork and no pipe, and ${| cmd; } returns via $REPLY for helper functions. Reach for them only when you don't need concurrency or backpressure, i.e. when you were treating the pipe as a glorified capture channel rather than a streaming boundary.
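Two of these pitfalls, the hidden exit status and the lost loop variable, can be reproduced in a few lines. The sketch below assumes bash with default options (in particular, lastpipe off); `false | true` stands in for a failing producer, and all variable names are illustrative.

```shell
#!/usr/bin/env bash
# Without pipefail, only the last stage's status is reported.
false | true
default_status=$?     # 0: the failure of `false` is hidden

# With pipefail, the rightmost non-zero status is reported.
set -o pipefail
if false | true; then pipefail_status=0; else pipefail_status=$?; fi
set +o pipefail

# The loop on the right of a pipe runs in a subshell, so its
# increments never reach the parent shell's variable.
count=0
printf 'a\nb\nc\n' | while read -r line; do count=$((count + 1)); done
lost=$count           # still 0

# Process substitution keeps the loop in the current shell.
count=0
while read -r line; do count=$((count + 1)); done < <(printf 'a\nb\nc\n')
kept=$count           # 3

echo "default=$default_status pipefail=$pipefail_status lost=$lost kept=$kept"
```

The `if … else` wrapper around the pipefail case is only there so the sketch also survives being run under set -e; in a real script the non-zero status is exactly what you want to propagate.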
Where to go next
- #streaming: pipes are the OS-level realisation of the broader "stream of bytes" abstraction; the streaming tag covers in-process streams, network sockets, and async iterators.
- /sections/linux/bash-redirection: the redirection-operator companion to |, including tee, process substitution, named pipes (FIFOs), coproc, and set -o pipefail.
- /sections/linux/tr-xargs: two of the most frequently piped utilities, with xargs -P for parallel fan-out across pipeline stages.
- /sections/linux/grep: the prototypical pipeline filter; demonstrates --line-buffered and how to keep colour through a pipe.
- /sections/linux/fzf: interactive fuzzy filter that reads stdin and writes the selection to stdout, slotting into any pipeline.
- /sections/osx/pbcopy-pbpaste: macOS clipboard endpoints that act as pipe sinks/sources (some-cmd | pbcopy).
Sources
References consulted while writing this concept page.
- The New Stack, "How the pipe system call came about": provided the 1964 McIlroy "garden hose" memo and the January 15, 1973 implementation date for the Definition and "Why it matters" history paragraph.
- Wikipedia, "Pipeline (Unix)": canonical overview of the shell | operator, concurrency model, and exit-status behaviour; backed the "How it works" walkthrough of the three-stage pipeline.
- Linux pipe(7) man page: authoritative source for pipe(2) semantics, the 64 KiB default buffer, F_SETPIPE_SZ, and SIGPIPE/EPIPE behaviour.
- GNU libc manual, "Pipe Atomicity": defined the PIPE_BUF (4096 bytes on Linux) atomicity guarantee referenced in pitfall #3.
- Baeldung, "Anonymous and Named Pipes in Linux": clarified the FIFO/anonymous distinction (persistence, unrelated processes, open-blocks-until-both-ends) used in the "How it works" variants section.
- John D. Cook, "Comparing the Unix and PowerShell pipelines": grounded the object-pipeline contrast in "How it works" and pitfall #7.
- Linuxiac, "Bash 5.3 release: new in-shell command substitution": source for the ${ cmd; } and ${| cmd; } fork-free substitution forms covered in the "How it works" modern-shells aside and pitfall #8.
- Fish shell, release notes (4.0 silent-optional redirection): authoritative source for the <? file operator added in fish 4.0 and referenced in the modern-shells aside.