cheat sheet

SoX

Comprehensive SoX reference covering file info, format conversion, synthesis, playback, recording, combining, effects (trim, reverb, compand, rate, pitch, tempo, noise gate), batch processing, spectrograms, and piping.

SoX — Swiss Army Knife of Audio

What it is

SoX (Sound eXchange) is a free, open-source cross-platform command-line audio processing tool originally developed in the 1990s and still actively packaged by major Linux distributions. It reads and writes audio files in nearly all common formats and can apply a chainable set of effects — trim, normalize, resample, reverb, compand, and more — in a single pass. Reach for SoX when you need scriptable, non-destructive audio conversion, batch processing, or effects that would take several steps in a GUI editor; three binaries are installed: sox (process/convert), play (playback), and rec (record).

Maintenance status (2026)

The upstream SourceForge project has been dormant since 14.4.2 (22 February 2015) — no further releases have shipped from the original maintainers. In practice, the binary that distributions install today is almost always 14.4.2 plus a long stack of distro-applied bug-fix and CVE patches, so the command-line surface documented below is still current. The de-facto active continuation is sox_ng — a hard fork from 14.4.2 dated 2024-05-18 that consolidates fixes from the ~50 downstream distributions and other forks. sox_ng follows semantic versioning on a six-month cadence (micro = bug fixes, minor = features); 14.4.3 cleared all known CVEs, 14.5+ added new features, and 14.9.0 is scheduled for 2026-11-18. Homebrew, Debian, and most other distros still ship the original sox package name and gradually backport sox_ng patches.

When to consider alternatives:

| Use case | Better tool |
|---|---|
| Modern codecs (Opus, AAC) and container handling | ffmpeg: broader format coverage, actively maintained |
| Resampling inside your own program | libsoxr (the SoX resampler library, usable standalone) |
| Reading/writing audio file headers from code | libsndfile |
| Loudness-normalised batch processing | ffmpeg -af loudnorm or r128gain |
| GUI editing with scripting support | Audacity (macros, or external scripting via mod-script-pipe) |

For most ad-hoc effect chains, batch scripts, spectrogram generation, and quick synthesis, plain sox (or sox_ng) remains the most ergonomic choice.

Installation

bash
# macOS
brew install sox

# Ubuntu / Debian
sudo apt install sox libsox-fmt-all

# Arch / Manjaro
sudo pacman -S sox

# Fedora
sudo dnf install sox

# Check version
sox --version

Output (sox --version):

text
SoX v14.4.2

The reported version string has been frozen at 14.4.2 for over a decade — the binary you get from Homebrew or apt is upstream 14.4.2 plus distro patches. To check whether your distribution has switched to the sox_ng fork (some have, under the same sox package name), look for 14.4.3 or higher, or run sox -h 2>&1 | head -1.
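The version check can be scripted roughly as follows. This is a minimal sketch: the helper names sox_version and is_post_upstream are made up for this example, and the exact banner text printed by sox_ng builds is an assumption, so the pattern relies only on an x.y.z number appearing in the output.

```shell
# Extract the first x.y.z version number from text on stdin.
sox_version() {
    grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Succeed (exit 0) if the given version is newer than upstream 14.4.2.
is_post_upstream() {
    printf '%s\n' "$1" | awk -F. '{
        if ($1 != 14) exit !($1 > 14)
        if ($2 != 4)  exit !($2 > 4)
        exit !($3 > 2)
    }'
}

# Usage (requires sox on PATH):
#   v=$(sox --version 2>&1 | sox_version)
#   if is_post_upstream "$v"; then echo "newer than upstream: $v"; else echo "upstream-based: $v"; fi
```

Comparing the numeric fields in awk avoids depending on sort -V, which not every platform provides.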

General usage

SoX follows an input → effects chain → output model: you specify one or more input files, optional per-file flags (sample rate, channel count, bit depth), an output file, and then a sequence of effects to apply in order. Using -n as the output discards audio and is the standard way to run analysis-only effects like stats or spectrogram.

bash
sox [global_flags] [input_flags] infile [input_flags] infile2 \
    [output_flags] outfile [effect [args]] ...

# Special output "-n" discards audio (use for analysis, synthesis to null)
sox input.wav -n stats

# Read from stdin / write to stdout
sox -t wav - -t wav - gain -3

# Pipe chain
sox input.wav -t wav - | sox -t wav - output.flac

Output: (none — exits 0 on success)

Getting information

soxi (or sox --info) reads the audio file header and reports sample rate, channel count, bit depth, duration, and encoding without processing any audio. The -n output with stats goes further, measuring actual signal properties like RMS level, peak, and DC offset across the decoded samples.

bash
# Full file info: sample rate, channels, bit depth, duration, encoding
sox --info input.wav
soxi input.wav

# Individual fields
soxi -r input.wav            # sample rate (Hz)
soxi -c input.wav            # number of channels
soxi -b input.wav            # bit depth
soxi -D input.wav            # duration in seconds (float)
soxi -s input.wav            # sample count
soxi -e input.wav            # encoding type
soxi -t input.wav            # file type / format

# Measure audio properties: RMS, peak, DC offset, noise floor
sox input.wav -n stats

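# Note: SoX 14.4.2 has no EBU R128 meter. Its "loudness" effect is an
# equal-loudness gain control, not a measurement. For integrated loudness,
# LRA, and true peak, use a dedicated tool such as ffmpeg's ebur128 filter.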

Output (soxi input.wav):

text
Input File     : 'input.wav'
Channels       : 2
Sample Rate    : 44100
Precision      : 16-bit
Duration       : 00:03:45.12 = 9927966 samples = 16884.3 CDDA sectors
File Size      : 39.7M
Bit Rate       : 1.41M
Sample Encoding: 16-bit Signed Integer PCM

Output (soxi -r input.wav):

text
44100

Output (soxi -D input.wav):

text
225.123946

Output (sox input.wav -n stats):

text
             Overall   Left      Right
DC offset   0.000027  0.000031  0.000022
Min level  -0.999939 -0.999939 -0.970001
Max level   0.999939  0.979980  0.999939
Pk lev dB   -0.00     -0.18     -0.00
RMS lev dB  -18.54   -19.12    -18.00
RMS Pk dB   -12.47   -13.31    -11.87
RMS Tr dB   -86.23   -82.14    -89.15
Crest factor   -         7.96      7.99
Flat factor     0.00      0.00      0.00
Pk count        2         1         1
Bit-depth      16/16    16/16    16/16
Num samples   264k
Length s      5.987
Scale max   1.000000
Window s       0.050
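In scripts it is often enough to pull a single figure out of the stats report. The sketch below extracts the overall peak level; peak_db is a hypothetical helper name, and it assumes the report layout shown above, where the first value after the "Pk lev dB" label is the overall figure.

```shell
# Print the overall peak level (dBFS) from `sox ... -n stats` output on stdin.
# The first value after the "Pk lev dB" label is the overall figure.
peak_db() {
    awk '/^Pk lev dB/ { print $4; exit }'
}

# Usage (stats writes its report to stderr, hence 2>&1):
#   sox input.wav -n stats 2>&1 | peak_db
```

The 2>&1 redirection matters: without it the report bypasses the pipe and the helper sees no input.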

Format conversion

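SoX picks the output format from the file extension, and identifies input files from their header where possible, falling back to the extension. MP3 support depends on how the package was built (on Debian/Ubuntu it comes from libsox-fmt-mp3, which libsox-fmt-all pulls in).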

bash
# Basic conversion (wav → flac)
sox input.wav output.flac

# wav → mp3
sox input.wav output.mp3

# mp3 → flac
sox input.mp3 output.flac

# wav → ogg
sox input.wav output.ogg

# Set mp3 bitrate in kbps (using -C / --compression)
sox input.wav -C 256 output.mp3

# Lossless: wav → 24-bit flac
sox input.wav -b 24 output.flac

# Resample to 48 kHz
sox input.wav -r 48000 output.wav

# Resample to 8 kHz mono (telephony)
sox input.wav -r 8000 -c 1 output.wav

# Convert 32-bit float to 16-bit signed integer PCM
sox input.wav -e signed-integer -b 16 output.wav

# Read headerless raw audio (must specify all parameters explicitly)
sox -r 44100 -e signed-integer -b 16 -c 1 raw-audio.raw raw-audio.wav

# Force output format when extension is ambiguous
sox input.wav -t flac output.flac

Output: (none — exits 0 on success)

Resampling and quality

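SoX's rate effect converts between sample rates using a high-quality polyphase filter. The default quality (-h) is already suitable for 16-bit output; the -v flag selects the very-high-quality (VHQ) mode, which is slower and mainly benefits 24-bit work. When reducing bit depth, apply dither so that quantisation error becomes low-level noise rather than distortion correlated with the signal. SoX adds dither automatically in that case unless -D is given; an explicit dither effect makes the intent visible and lets you set its options.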

bash
# Resample with explicit quality (VHQ = very high quality)
sox input.wav -r 48000 output.wav rate -v

# Dither when reducing bit depth (reduces quantisation noise)
sox input.wav -b 16 output.wav dither

# Full mastering chain: resample → bit-reduce → dither
sox input.wav -r 44100 -b 16 output.wav rate -v dither

Output: (none — exits 0 on success)

Trimming and splicing

The trim effect cuts audio to a time range: the first argument is the start offset and the second is either a duration or, when prefixed with =, an end position. pad adds silence at the start or end, and the silence effect removes samples below a threshold — useful for stripping leader/trailer silence from recordings automatically.

bash
# Keep first 30 seconds
sox input.wav output.wav trim 0 30

# Start at 1:30, keep 10 seconds
sox input.wav output.wav trim 1:30 10

# Remove first 5 seconds (start offset only)
sox input.wav output.wav trim 5

# Extract from 1:00 to 2:30 (end position, not duration — prefix with =)
sox input.wav output.wav trim 1:00 =2:30

# Pad silence: 1s at start, 2s at end
sox input.wav output.wav pad 1 2

# Trim leading and trailing silence (1% threshold, 0.1 s of sound required)
sox input.wav output.wav silence 1 0.1 1% reverse silence 1 0.1 1% reverse

# Trim silence only from the end
sox input.wav output.wav reverse silence 1 0.1 1% reverse

Output: (none — exits 0 on success)
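trim is also the usual building block for splitting a long file into equal pieces, via the ": newfile : restart" multiple-effects-chain syntax described in the sox manual. The sketch below only computes how many pieces to expect; chunk_count is a hypothetical helper, and the sox and soxi calls in the comments are usage illustrations that need sox installed.

```shell
# Number of output files produced when splitting <duration> seconds
# into chunks of <len> seconds (the last chunk may be shorter).
chunk_count() {
    awk -v d="$1" -v len="$2" 'BEGIN {
        n = int(d / len)
        if (n * len < d) n++
        print n
    }'
}

chunk_count 225.12 30    # prints 8

# Usage (requires sox): split into 30 s pieces written to numbered files
#   sox input.wav chunk.wav trim 0 30 : newfile : restart
#   chunk_count "$(soxi -D input.wav)" 30
```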

Level and dynamics

gain applies a fixed dB offset; norm finds the peak sample and sets the gain so that peak reaches a target level (0 dBFS by default). compand is SoX's combined compressor/expander: it takes attack/decay times and a transfer curve mapping input dB to output dB, making it versatile enough to handle gentle compression, aggressive limiting, or noise gating within a single effect.

bash
# Reduce level by 12 dB
sox input.wav output.wav gain -12

# Increase level by 3 dB
sox input.wav output.wav gain 3

# Normalize to 0 dBFS (loudest peak = full scale)
sox input.wav output.wav norm

# Normalize to -3 dBFS headroom
sox input.wav output.wav norm -3

# Same as "norm -3": norm is shorthand for gain -n
sox input.wav output.wav gain -n -3

# Compand: compressor/expander (no soft knee here; prefix the curve with e.g. "6:" to add one)
# attack,decay  in-dB,out-dB pairs  gain  initial-volume  delay
sox input.wav output.wav compand 0.01,0.3 -80,-80,-40,-40,-20,-10,0,-6 0 -90 0.1

# Peak-normalise to -1 dBFS (rescales the whole file; this is not a limiter)
sox input.wav output.wav norm -1

Output: (none — exits 0 on success)
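The arithmetic behind peak normalisation is a single subtraction: the gain to apply is the target level minus the measured peak, both in dB. A minimal sketch, with gain_for_target as a hypothetical helper name:

```shell
# Gain (dB) needed to move a measured peak to a target peak: target - peak.
gain_for_target() {
    awk -v peak="$1" -v target="$2" 'BEGIN { printf "%.2f\n", target - peak }'
}

gain_for_target -7.5 -3    # prints 4.50
```

A file peaking at -7.5 dBFS therefore needs +4.5 dB to peak at -3 dBFS, which is the adjustment norm -3 works out by scanning the file first.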

EQ and filtering

SoX provides shelf filters (bass, treble), a parametric EQ band (equalizer with center frequency, Q/width, and gain in dB), and gentle-slope filters (highpass, lowpass, bandpass, bandreject). These are one- or two-pole designs, not brickwall filters; for a steep cutoff use the sinc effect. Multiple filter effects can be chained in a single command and are applied left-to-right in one processing pass.

bash
# Bass boost: +6 dB at 100 Hz
sox input.wav output.wav bass 6 100

# Treble cut: -6 dB at 8 kHz
sox input.wav output.wav treble -6 8000

# Parametric EQ band: boost 4 dB at 1 kHz, Q=0.7
sox input.wav output.wav equalizer 1000 0.7 4

# High-pass filter at 80 Hz (order 2)
sox input.wav output.wav highpass 80

# Low-pass filter at 12 kHz
sox input.wav output.wav lowpass 12000

# Band-pass filter: center 1000 Hz, width 200 Hz
sox input.wav output.wav bandpass 1000 200

# Band-reject (notch) at 50 Hz (hum removal)
sox input.wav output.wav bandreject 50 1

# Chain multiple effects
sox input.wav output.wav highpass 80 equalizer 1000 0.7 3 treble -3 8000

Output: (none — exits 0 on success)

Reverb and spatial effects

SoX's reverb effect implements the Freeverb algorithm, taking parameters for reverberance (room decay), high-frequency damping, room scale, and stereo depth. Use it to add warmth to dry recordings or simulate a specific acoustic environment; echo is a simpler delay-based effect suited for pronounced slapback or multi-tap echoes.

bash
# Room reverb (reverberance% HF-damping% room-scale% stereo-depth%)
sox input.wav output.wav reverb 50 50 100

# Large hall reverb
sox input.wav output.wav reverb 80 40 90 100 0 0

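# Out-of-phase stereo: both channels become the L-R difference,
# cancelling centre-panned content (e.g. karaoke-style vocal removal)
sox input.wav output.wav oops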

# Delay: 0.5 s echo at 70% level
sox input.wav output.wav echo 0.8 0.88 500 0.7

Output: (none — exits 0 on success)

Time and pitch manipulation

speed changes both duration and pitch together (like playing a tape faster); tempo stretches or compresses duration while leaving pitch intact using WSOLA, a time-domain overlap-add algorithm; pitch shifts pitch in cents (100 per semitone) without affecting duration. tempo and pitch are computationally heavier than most effects and can introduce artifacts at extreme ratios.

bash
# Change playback speed (changes both pitch and duration)
sox input.wav output.wav speed 1.5

# Change tempo without affecting pitch (time-stretch)
sox input.wav output.wav tempo 1.2

# Change pitch without affecting duration (+3 semitones)
sox input.wav output.wav pitch 300        # cents: 100 per semitone

# Pitch down by 2 semitones
sox input.wav output.wav pitch -200

# Rate change (resample — different from speed)
sox input.wav output.wav rate 22050

Output: (none — exits 0 on success)
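Cents and speed factors are two views of the same quantity: a shift of c cents corresponds to a frequency ratio of 2^(c/1200). The sketch below does that conversion; cents_to_ratio is a hypothetical helper name.

```shell
# Convert a pitch shift in cents to the equivalent frequency ratio: 2^(cents/1200).
cents_to_ratio() {
    awk -v c="$1" 'BEGIN { printf "%.4f\n", exp(log(2) * c / 1200) }'
}

cents_to_ratio 1200    # one octave up: prints 2.0000
cents_to_ratio 300     # +3 semitones: prints 1.1892
```

So speed 1.1892 raises the pitch by about as much as pitch 300, except that speed also shortens the file by the same ratio.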

Channel manipulation

The remix effect selects and reorders channels by number (1-indexed), making it the go-to tool for extracting a single channel, swapping left/right, or building a custom channel mix from a surround file. channels does a simple downmix or upmix by summing or duplicating channels with equal weighting; for finer control over the mix matrix, use remix with explicit channel assignments.

bash
# Extract left channel (channel numbering starts at 1)
sox stereo.wav left.wav remix 1

# Extract right channel
sox stereo.wav right.wav remix 2

# Extract channels 1, 3, 5 from a surround file
sox surround.wav subset.wav remix 1 3 5

# Merge two mono files into stereo
sox -M left.wav right.wav stereo.wav

# Downmix stereo to mono
sox stereo.wav mono.wav channels 1

# Upmix mono to stereo
sox mono.wav stereo.wav channels 2

# Swap left and right channels
sox stereo.wav swapped.wav remix 2 1

# Mix down all channels with equal weighting
sox 4ch.wav -c 1 mixed.wav

Output: (none — exits 0 on success)

Combining files

By default, SoX concatenates multiple input files end-to-end into a single output; the inputs must share the same sample rate and channel count. The -m flag switches to mix mode, which sums the files sample-by-sample (sample rates must match, and SoX scales each input by 1/n to guard against clipping unless you set volumes with -v); -M merges the inputs' channels side by side into one multichannel file instead of summing them.

bash
# Concatenate files end-to-end
sox a.wav b.wav c.wav concatenated.wav

# Mix files (sum sample-by-sample; sample rates must match)
sox -m a.wav b.wav mixed.wav

# Mix with explicit per-input volumes (replaces the automatic 1/n scaling)
sox -m -v 0.5 a.wav -v 0.5 b.wav mixed.wav

# Multiplex mono files as separate channels in one file
sox -M left.wav right.wav stereo.wav

# Overlay at a specific time offset: delay b with 5 s of leading silence, then mix
sox -m a.wav "|sox b.wav -p pad 5" combined.wav   # b starts 5 s into a

Output: (none — exits 0 on success)

Synthesizing audio

SoX can generate test signals without an input file using -n as the input.

bash
# 1 second of white noise
sox -n -r 44100 -c 1 noise.wav synth 1 noise

# 1 second of pink noise (more natural)
sox -n -r 44100 -c 1 pinknoise.wav synth 1 pinknoise

# 1-second 440 Hz sine tone
sox -n -r 44100 sine440.wav synth 1 sine 440

# 1-second square wave at 440 Hz
sox -n -r 44100 square.wav synth 1 square 440

# 1-second sawtooth at 440 Hz
sox -n -r 44100 saw.wav synth 1 sawtooth 440

# Stereo sine with fade in/out
sox -n -r 44100 -c 2 tone.wav synth 3 sine 440 fade 0.3 3 0.3

# Logarithmic sine sweep 40 Hz → 20 kHz over 10 seconds
sox -n -r 44100 sweep.wav synth 10 sine 40/20000

# Dirac impulse (1 sample at full scale, rest silence)
sox -n -r 44100 -c 1 impulse.wav synth 1s square pad 0 44099s

# DTMF tone for key "5" (770 Hz + 1336 Hz, mixed to mono)
sox -n dtmf5.wav synth 0.3 sine 770 sine 1336 remix -

Output: (none — exits 0 on success)
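The DTMF example generalises to the whole keypad: each key is the sum of one row frequency (697, 770, 852, 941 Hz) and one column frequency (1209, 1336, 1477, 1633 Hz). A small lookup sketch, where dtmf_freqs is a hypothetical helper name:

```shell
# Print the low/high frequency pair (Hz) for a DTMF key.
dtmf_freqs() {
    case $1 in
        1|2|3|A)     low=697 ;;
        4|5|6|B)     low=770 ;;
        7|8|9|C)     low=852 ;;
        '*'|0|'#'|D) low=941 ;;
        *) return 1 ;;               # not a DTMF key
    esac
    case $1 in
        1|4|7|'*') high=1209 ;;
        2|5|8|0)   high=1336 ;;
        3|6|9|'#') high=1477 ;;
        A|B|C|D)   high=1633 ;;
    esac
    printf '%s %s\n' "$low" "$high"
}

dtmf_freqs 5    # prints: 770 1336

# Usage (requires sox): synthesise the key as a mono file
#   set -- $(dtmf_freqs 5)
#   sox -n dtmf5.wav synth 0.3 sine "$1" sine "$2" remix -
```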

Playback

play is a thin wrapper around sox that routes audio to the system's default output device instead of a file. It accepts the same effect chain as sox, so you can apply gain, reverb, or trimming in real time without writing a temporary file; using -n as the input with synth lets you audition generated tones directly.

bash
# Play a file through the default audio output
play input.wav

# Play with effects applied live
play input.wav gain -6 reverb 30

# Play at double speed
play input.wav speed 2.0

# Play only a section (seconds 10–20)
play input.wav trim 10 10

# Play synthesized audio directly (no output file)
play -n synth 1 sine 440

# Play 10 s of pink noise, attenuated by 20 dB
play -n synth 10 pinknoise gain -20

# Play a streaming URL
play -t mp3 http://stream.example.com/radio.mp3

# Play at lower volume
play input.wav gain -12

Output: (none — exits 0 on success)

Recording

rec is the recording counterpart to play: it captures audio from the default input device (typically the microphone) and writes it to a file, accepting the same format flags and effect chain as sox. To record from a specific device, call sox with an explicit driver: -t coreaudio plus the device name on macOS, or -t alsa plus a hardware address from arecord -l on Linux.

bash
# Record from the default microphone to wav
rec recording.wav

# Record 10 seconds then stop
rec recording.wav trim 0 10

# Record in FLAC at 48 kHz
rec -r 48000 -b 24 recording.flac

# Wait for sound: start recording once the input stays above 3% for 0.5 s
rec recording.wav silence 1 0.5 3%

# macOS — list available input devices
sox -V -t coreaudio null -n 2>&1 | grep "Found Audio" | cut -d'"' -f2

# macOS — record from a named device
sox -t coreaudio "MacBook Pro Microphone" recording.wav

# Linux ALSA — list capture devices
arecord -l

# Linux ALSA — record from hw:0,0
sox -t alsa hw:0,0 recording.wav

Output (sox -V -t coreaudio null -n 2>&1 | grep "Found Audio" | cut -d'"' -f2):

text
MacBook Pro Microphone
MacBook Pro Speakers
External Microphone

Noise reduction

SoX's noisered effect uses a two-pass approach: first run noiseprof on a segment that contains only background noise to capture a noise profile, then apply that profile to the full file with noisered. The amount parameter (0.0 to 1.0, default 0.5) controls aggressiveness: higher values remove more noise but are more likely to remove wanted signal and leave artifacts; 0.15 to 0.25 is a practical starting range.

bash
# Two-pass noise removal:
# 1. Capture a noise profile from a silent section (0–2 s)
sox input.wav -n trim 0 2 noiseprof noise.prof

# 2. Apply the profile to the full file
sox input.wav output.wav noisered noise.prof 0.21

# Tune the amount (lower = gentler; higher = more noise removed but more artifacts)
sox input.wav output.wav noisered noise.prof 0.15

# Combine noise reduction with stripping leading silence
sox input.wav output.wav noisered noise.prof 0.21 \
    silence 1 0.05 1%

Output: (none — exits 0 on success)
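The two passes can be wrapped in one function so the profile file is created and cleaned up automatically. This is a sketch under stated assumptions: denoise is a hypothetical helper, the first 2 seconds of the input are assumed to contain only background noise, and the SOX variable is an invented convenience for pointing at a different binary.

```shell
# Two-pass noise reduction with a temporary profile file.
# Assumes the first 2 s of the input contain only background noise.
# SOX (hypothetical override) selects the binary; defaults to "sox".
denoise() {
    src=$1
    dst=$2
    amount=${3:-0.21}
    prof=$(mktemp) || return 1
    if "${SOX:-sox}" "$src" -n trim 0 2 noiseprof "$prof" &&
       "${SOX:-sox}" "$src" "$dst" noisered "$prof" "$amount"; then
        status=0
    else
        status=1
    fi
    rm -f "$prof"        # remove the profile whether or not sox succeeded
    return "$status"
}

# Usage (requires sox): denoise input.wav output.wav 0.2
```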

Batch processing

bash
# Convert all wav files to flac (bash glob)
for f in *.wav; do
    sox "$f" "${f%.wav}.flac"
done

# Normalize every mp3 in the current directory
for f in *.mp3; do
    sox "$f" "norm_$f" norm -3
done

# Resample all wavs to 16 kHz mono (in-place via temp file; replace only on success)
for f in *.wav; do
    sox "$f" tmp_out.wav rate 16000 channels 1 && mv tmp_out.wav "$f"
done

# Batch trim leading and trailing silence from a folder of recordings
mkdir -p trimmed
for f in recordings/*.wav; do
    sox "$f" "trimmed/$(basename "$f")" silence 1 0.1 1% reverse silence 1 0.1 1% reverse
done

Output: (none — exits 0 on success)
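Loops like these get more robust with two small guards: skip the iteration when the glob matched nothing, and never overwrite an output that already exists. A minimal sketch, where out_name and convert_dir are hypothetical helper names:

```shell
# Derive an output path by swapping the extension:
#   out_name "dir/a b.wav" flac  ->  "dir/a b.flac"
out_name() {
    printf '%s.%s\n' "${1%.*}" "$2"
}

# Convert every .wav in a directory to FLAC, skipping files already converted.
convert_dir() {
    for f in "$1"/*.wav; do
        [ -e "$f" ] || continue            # glob matched nothing
        out=$(out_name "$f" flac)
        if [ -e "$out" ]; then continue; fi  # do not overwrite existing output
        sox "$f" "$out" || echo "failed: $f" >&2
    done
}

# Usage (requires sox): convert_dir recordings
```

Reporting failures to stderr and carrying on means one corrupt file does not abort the whole batch.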

Visualization

The spectrogram effect renders a time-frequency plot of the audio as a PNG image, with time on the x-axis, frequency on the y-axis, and amplitude encoded as color intensity. It is normally used with -n as the output, since only the image is wanted and the audio can be discarded. It is useful for visually inspecting noise floors, identifying frequency content, or checking the result of EQ and filtering.

bash
# Generate a spectrogram image (output: spectrogram.png)
sox input.wav -n spectrogram

# Custom title and output path
sox input.wav -n spectrogram -t "My Recording" -o my_spectrogram.png

# Zoom in on the first 10 seconds
sox input.wav -n trim 0 10 spectrogram

# High-resolution spectrogram (wider)
sox input.wav -n spectrogram -x 1200 -y 600 -o wide.png

Output: (none — exits 0 on success)

Piping and stdout

Using -t wav - as the output tells SoX to write WAV data (header included) to stdout, enabling audio processing pipelines with tools like ffmpeg, netcat, or a second SoX invocation. When chaining multiple SoX processes, use -p, a shorthand for -t sox -: SoX's native format carries samples at SoX's internal precision between stages, so nothing is lost to intermediate re-quantisation.

bash
# Send processed audio to stdout (pipe to another tool)
sox input.wav -t wav - gain -6 | ffmpeg -i pipe:0 output.aac

# Read from ffmpeg stdout, process, write file
ffmpeg -i input.mkv -f wav pipe:1 | sox -t wav - output.flac

# Stream audio to a network sink (e.g. netcat)
sox input.wav -t wav - | nc host 9999

# Apply effects and listen immediately (no temp file)
sox input.wav -t wav - reverb 50 | play -t wav -

# Measure stats without creating a file
sox input.wav -n stat

Output (sox input.wav -n stat):

text
Samples read:          264600
Length (seconds):       3.000
Scaled by:         2147483647.0
Maximum amplitude:   0.999939
Minimum amplitude:  -0.999939
Midline amplitude:   0.000000
Mean    norm:         0.165820
Mean    amplitude:   -0.000027
RMS     amplitude:    0.187649
Maximum delta:        0.334167
Minimum delta:        0.000000
Mean    delta:        0.014630
RMS     delta:        0.025274
Rough   frequency:      882
Volume adjustment:        1.000

Common effect reference

| Effect | Example | Purpose |
|---|---|---|
| gain | gain -6 | Adjust level in dB |
| norm | norm -3 | Normalize peak to a dBFS target |
| trim | trim 5 30 | Cut to time range |
| pad | pad 1 2 | Add silence |
| fade | fade 1 60 3 | Fade in / out |
| reverse | reverse | Time-reverse |
| speed | speed 1.5 | Change tempo and pitch together |
| tempo | tempo 1.2 | Tempo only |
| pitch | pitch 300 | Pitch only (cents) |
| rate | rate 48000 | Resample |
| channels | channels 1 | Set channel count (downmix/upmix) |
| remix | remix 1 2 | Select/remap channels |
| reverb | reverb 50 50 100 | Room reverb |
| echo | echo 0.8 0.88 500 0.7 | Delay/echo |
| compand | see Level and dynamics | Compress/expand |
| highpass | highpass 80 | HP filter |
| lowpass | lowpass 12000 | LP filter |
| equalizer | equalizer 1k 0.7 4 | Parametric EQ |
| bass | bass 6 100 | Low shelf |
| treble | treble -3 8000 | High shelf |
| silence | silence 1 0.1 1% | Trim leading silence |
| noisered | noisered noise.prof 0.21 | Noise removal |
| stat | (use with -n output) | Statistics |
| stats | (use with -n output) | Per-channel statistics |
| spectrogram | (use with -n output) | PNG spectrogram |
| synth | synth 1 sine 440 | Signal generation |
| dither | dither | Dither when reducing bit depth |
