cheat sheet

qsv

Comprehensive reference for qsv: count, headers, stats, moarstats, select, search, sort, dedup, frequency, join, sqlp, luau, apply, schema, validate, sample, split, MCP server, and more — with examples and outputs.

qsv — CSV Toolkit

What it is

qsv is a blazing-fast, Rust-based CSV toolkit with 80+ subcommands for querying, transforming, analyzing, and validating tabular data — a maintained, feature-rich fork of the original xsv project. It adds Polars-backed acceleration, an embedded Luau scripting engine, and support for CSV, TSV, Excel, JSON, Parquet, and Apache Arrow formats. Reach for qsv when you need to slice, filter, join, or summarize structured tabular data from the command line without loading it into a full database or spreadsheet.

Install

bash
# macOS
brew install qsv

# Windows
scoop install qsv

# Cargo
cargo install qsv --locked

# Or download binary from releases
curl -LO https://github.com/dathere/qsv/releases/latest/download/qsv-x86_64-unknown-linux-gnu.zip

Output: (none — exits 0 on success)

Variants: qsv (full), qsvlite (no Luau/Python), qsvmcp (Model Context Protocol), qsvpy (Python integration).

Sample data

All examples below use these two files:

bash
cat > people.csv << 'EOF'
name,age,city,salary
Alice,30,New York,75000
Bob,25,Chicago,62000
Carol,35,New York,88000
Dave,28,Chicago,71000
Eve,32,Boston,95000
EOF

cat > dept.csv << 'EOF'
name,department
Alice,Engineering
Bob,Marketing
Carol,Engineering
Dave,Sales
Eve,Engineering
EOF

Output: (none — exits 0 on success)


Discovery

Commands for understanding an unfamiliar CSV before you commit to processing it.

sniff — Detect schema without reading the whole file

Samples the first few thousand bytes of a file to detect delimiter, quoting, field count, types, and record count without reading the whole file. Use it as a fast first look before running heavier commands.

bash
qsv sniff people.csv

Output:

text
Sniff Results for people.csv
  Last Modified  : 2026-04-26 10:00:00 UTC
  File Size      : 132 bytes
  Delimiter      : ,
  Has Header Row : true
  Quote Char     : "
  Num Records    : 5
  Num Fields     : 4
  Fields         :
                    0: name     (String)
                    1: age      (Integer)
                    2: city     (String)
                    3: salary   (Integer)
bash
# Sniff a remote file
qsv sniff https://example.com/data.csv

Output: (a report in the same format as above, describing the remote file)

count — Count rows

Returns the number of data rows (excluding the header). More reliable than wc -l, which counts physical lines and so miscounts records that contain quoted newlines; near-instant on indexed files.

bash
qsv count people.csv

Output:

text
5
bash
# Human-readable (useful for millions of rows)
qsv count --human-readable largefile.csv

Output:

text
1,482,309
bash
# Include record width statistics
qsv count --width people.csv

Output:

text
5
32-27-28-22-5
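
The difference from wc -l is easiest to see on a file with a quoted newline. The following is a small sketch, not part of the original examples: notes.csv is a made-up file, and the qsv call is skipped when qsv is not on PATH.

```shell
# Work in a throwaway directory so nothing in the current directory is touched.
workdir=$(mktemp -d)

# One header plus two records; the first record's note field spans two physical lines.
printf 'name,note\nAlice,"line one\nline two"\nBob,plain\n' > "$workdir/notes.csv"

# wc -l counts newline characters, so it sees 4 lines for 2 records.
physical_lines=$(wc -l < "$workdir/notes.csv" | tr -d ' ')
echo "physical lines: $physical_lines"

# A CSV-aware counter should treat the quoted newline as data; run qsv only if installed.
if command -v qsv >/dev/null 2>&1; then
  echo "qsv count: $(qsv count "$workdir/notes.csv")"
fi
```

Subtracting one from the wc -l figure for the header would still give 3, one more than the two records actually in the file.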

headers — List column names

Prints each column name with its 1-based index. Use --just-names when you need a plain list for scripting, or --intersect to find the common columns across two files before a join.

bash
qsv headers people.csv

Output:

text
1   name
2   age
3   city
4   salary
bash
# Just names (for scripting)
qsv headers --just-names people.csv

Output:

text
name
age
city
salary
bash
# Find common columns across two files
qsv headers --intersect people.csv dept.csv

Output:

text
name
bash
# Count only
qsv headers --just-count people.csv

Output:

text
4

Summary statistics

Commands for computing numeric and categorical summaries across columns without writing a full query.

stats — Per-column statistics

Computes sum, min, max, mean, stddev, null count, and type for every column in a single pass. Results are cached alongside the file, so repeated runs are instant; use --everything to add median, quartiles, and mode.

bash
qsv stats people.csv

Output:

text
field,type,sum,min,max,range,sortorder,min_length,max_length,mean,stddev,variance,cv,nullcount,max_precision,sparsity
name,String,,Alice,Eve,,Ascending,3,5,,,,,0,,0
age,Integer,150,25,35,10,Unsorted,2,2,30,3.4059,11.6,0.1135,0,0,0
city,String,,Boston,New York,,Unsorted,6,8,,,,,0,,0
salary,Integer,391000,62000,95000,33000,Unsorted,5,5,78200,11855.8003,140560000,0.1516,0,0,0
bash
# Infer types only (fast — no numeric computation)
qsv stats --typesonly people.csv

Output:

text
field,type
name,String
age,Integer
city,String
salary,Integer
bash
# Full statistics including mode, median, quartiles
qsv stats --everything people.csv

Output:

text
field,type,...,mode,median,mad,q1,q2_median,q3,...
name,String,...,Alice|Bob|Carol|Dave|Eve,,,,,...
age,Integer,...,25|28|30|32|35,30,2,26.5,30,33.5,...
salary,Integer,...,62000|71000|75000|88000|95000,75000,13000,66500,75000,91500,...
bash
# Stats for specific columns only
qsv stats -s salary,age people.csv

Output: (the same columns as the first example, with rows for salary and age only)

qsv stats caches results in a .stats.csv.bin.sz file alongside the input. Subsequent calls are instant. Use --force to recompute.
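
To sanity-check the headline figures independently of qsv, the salary column can be summarized with awk. This is a rough cross-check, assuming the sample file has no quoted commas (awk splits on every comma); it is not how qsv computes its statistics.

```shell
workdir=$(mktemp -d)
printf 'name,age,city,salary\nAlice,30,New York,75000\nBob,25,Chicago,62000\nCarol,35,New York,88000\nDave,28,Chicago,71000\nEve,32,Boston,95000\n' > "$workdir/people.csv"

# Column 4 is salary; NR > 1 skips the header row.
summary=$(awk -F, 'NR > 1 {
    v = $4 + 0; sum += v; n++
    if (n == 1 || v < min) min = v
    if (n == 1 || v > max) max = v
  }
  END { printf "sum=%d min=%d max=%d mean=%d", sum, min, max, sum / n }' "$workdir/people.csv")
echo "$summary"
```

The sum, min, max, and mean printed here should agree with the salary row of the stats output above.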

moarstats — Extended statistics (qsv 12+)

Augments a stats output file with up to 55 additional advanced measures — extended outlier, robust, and bivariate statistics (covariance, correlation, kurtosis, MAD, IQR, Pearson/Spearman, etc.). Run stats first, then moarstats on the resulting .stats.csv to enrich it without re-scanning the original data.

bash
# Produce stats.csv, then enrich it with advanced measures
qsv stats people.csv -o people.stats.csv
qsv moarstats people.stats.csv

Output:

text
field,type,...,kurtosis,iqr,skewness,covariance,pearson_r,spearman_r,...
age,Integer,...,-1.30,6,0.21,...
salary,Integer,...,-1.20,25000,0.34,...
bash
# Restrict to a subset of advanced measures
qsv moarstats --select kurtosis,iqr,skewness people.stats.csv

Output: (none — exits 0 on success)

moarstats was introduced in qsv 12.0.0 and refined in 13.0.0. It also powers the per-column "FAIR metadata" inference used by the MCP server and TOON output.


Selecting columns

Commands for narrowing or reordering the columns in a file before downstream processing.

select — Pick, reorder, or drop columns

Outputs a subset (or reordering) of columns by name, index, range, or regex. Prefix a selector with ! to exclude it; the order of selectors controls the output order.

bash
# Pick two columns
qsv select name,salary people.csv

Output:

text
name,salary
Alice,75000
Bob,62000
Carol,88000
Dave,71000
Eve,95000
bash
# Drop a column (! prefix = all except)
qsv select '!age' people.csv

Output:

text
name,city,salary
Alice,New York,75000
Bob,Chicago,62000
Carol,New York,88000
Dave,Chicago,71000
Eve,Boston,95000
bash
# Select by column range
qsv select 1-3 people.csv

Output:

text
name,age,city
Alice,30,New York
Bob,25,Chicago
Carol,35,New York
Dave,28,Chicago
Eve,32,Boston
bash
# Select by regex (columns starting with 'a' or 'c')
qsv select '/^[ac]/' people.csv

Output:

text
age,city
30,New York
25,Chicago
35,New York
28,Chicago
32,Boston

Filtering rows

Commands for keeping or discarding rows based on patterns or positional ranges.

search — Filter rows by regex

Filters rows using a regular expression, optionally scoped to one or more columns with -s. Use -v to invert (exclude matches), or --flag to add a match-indicator column instead of dropping rows.

bash
# Keep rows matching a pattern
qsv search "New York" people.csv

Output:

text
name,age,city,salary
Alice,30,New York,75000
Carol,35,New York,88000
bash
# Search in a specific column only
qsv search -s city "Chicago" people.csv

Output:

text
name,age,city,salary
Bob,25,Chicago,62000
Dave,28,Chicago,71000
bash
# Invert match (exclude Chicago)
qsv search -s city -v "Chicago" people.csv

Output:

text
name,age,city,salary
Alice,30,New York,75000
Carol,35,New York,88000
Eve,32,Boston,95000
bash
# Add a match flag column instead of filtering
qsv search -s city --flag matched "New York" people.csv

Output:

text
name,age,city,salary,matched
Alice,30,New York,75000,1
Bob,25,Chicago,62000,0
Carol,35,New York,88000,1
Dave,28,Chicago,71000,0
Eve,32,Boston,95000,0
bash
# Count matches only (written to stderr)
qsv search -s city -c "New York" people.csv 2>&1 >/dev/null

Output:

text
2

slice — Extract row ranges

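Extracts a contiguous range of rows by start index, end index, length, or a single row. Differs from search in that it operates by position, not content; on indexed files it seeks directly to the first requested row instead of scanning from the top, so the cost depends on the slice length rather than the file size.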

bash
# First 3 rows
qsv slice -l 3 people.csv

Output:

text
name,age,city,salary
Alice,30,New York,75000
Bob,25,Chicago,62000
Carol,35,New York,88000
bash
# Rows 2–4 (0-based start, exclusive end)
qsv slice -s 1 -e 4 people.csv

Output:

text
name,age,city,salary
Bob,25,Chicago,62000
Carol,35,New York,88000
Dave,28,Chicago,71000
bash
# Single row by index
qsv slice -i 4 people.csv

Output:

text
name,age,city,salary
Eve,32,Boston,95000
bash
# Last 2 rows (negative index)
qsv slice -s -2 people.csv

Output:

text
name,age,city,salary
Dave,28,Chicago,71000
Eve,32,Boston,95000
bash
# JSON output for a single row
qsv slice -i 0 --json people.csv

Output:

text
[{"name":"Alice","age":"30","city":"New York","salary":"75000"}]
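
The 0-based start and exclusive end are easy to get off by one. As a sketch of the index arithmetic only (it assumes no record contains an embedded newline, so one record equals one physical line), the -s 1 -e 4 example maps onto plain sed like this:

```shell
workdir=$(mktemp -d)
printf 'name,age,city,salary\nAlice,30,New York,75000\nBob,25,Chicago,62000\nCarol,35,New York,88000\nDave,28,Chicago,71000\nEve,32,Boston,95000\n' > "$workdir/people.csv"

start=1   # 0-based index of the first data row to keep
end=4     # exclusive: the last row kept is end - 1

# Data row i sits on physical line i + 2, because line 1 is the header.
first_line=$((start + 2))
last_line=$((end + 1))

# Always print the header, then the requested range.
sliced=$(sed -n "1p;${first_line},${last_line}p" "$workdir/people.csv")
echo "$sliced"
```

With start=1 and end=4 this prints the header followed by the Bob, Carol, and Dave rows.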

Sorting and deduplication

Commands for ordering rows and removing duplicates, often a prerequisite for joins or frequency counts.

sort — Sort rows

Sorts rows by one or more columns; add -N for numeric comparison and -R for descending order. Also supports --random for reproducible shuffles with a --seed.

bash
# Sort by salary numerically (ascending)
qsv sort -s salary -N people.csv

Output:

text
name,age,city,salary
Bob,25,Chicago,62000
Dave,28,Chicago,71000
Alice,30,New York,75000
Carol,35,New York,88000
Eve,32,Boston,95000
bash
# Sort by salary descending
qsv sort -s salary -N -R people.csv

Output:

text
name,age,city,salary
Eve,32,Boston,95000
Carol,35,New York,88000
Alice,30,New York,75000
Dave,28,Chicago,71000
Bob,25,Chicago,62000
bash
# Multi-key sort (city then salary)
qsv sort -s city,salary -N people.csv

Output:

text
name,age,city,salary
Eve,32,Boston,95000
Bob,25,Chicago,62000
Dave,28,Chicago,71000
Alice,30,New York,75000
Carol,35,New York,88000
bash
# Reproducible random shuffle
qsv sort --random --seed 42 people.csv

Output:

text
name,age,city,salary
Carol,35,New York,88000
Eve,32,Boston,95000
Bob,25,Chicago,62000
Alice,30,New York,75000
Dave,28,Chicago,71000

dedup — Remove duplicate rows

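Removes rows that are identical across one or more key columns, keeping the first occurrence. dedup sorts the rows by the key before comparing neighbours (unless --sorted is given), so the output comes back in key order rather than input order. Use -D to write the dropped duplicates to a separate file for auditing.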

bash
# Dedup by city (keep first occurrence per city)
qsv dedup -s city people.csv

Output:

text
name,age,city,salary
Eve,32,Boston,95000
Bob,25,Chicago,62000
Alice,30,New York,75000
text
2  (duplicates removed, written to stderr)
bash
# Write duplicates to a separate file
qsv dedup -s city -D dupes.csv people.csv

Output: (none — exits 0 on success)

dupes.csv:

text
name,age,city,salary
Dave,28,Chicago,71000
Carol,35,New York,88000
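
When the original row order matters, the same keep-the-first-occurrence rule can be approximated in a single streaming pass with awk. This is an alternative sketch, not what qsv does internally; unique.csv and dupes.csv are illustrative file names, and the comma split assumes no quoted commas.

```shell
workdir=$(mktemp -d)
printf 'name,age,city,salary\nAlice,30,New York,75000\nBob,25,Chicago,62000\nCarol,35,New York,88000\nDave,28,Chicago,71000\nEve,32,Boston,95000\n' > "$workdir/people.csv"

# Column 3 is city. seen[$3]++ is 0 (false) the first time a city appears
# and non-zero afterwards, so later rows for the same city go to the dupes file.
awk -F, -v dupes="$workdir/dupes.csv" '
  NR == 1    { print; print > dupes; next }
  seen[$3]++ { print > dupes; next }
             { print }' "$workdir/people.csv" > "$workdir/unique.csv"

cat "$workdir/unique.csv"
```

Here unique.csv keeps Alice, Bob, and Eve in input order, and dupes.csv receives Carol and Dave.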

Frequency analysis

Commands for counting distinct values and understanding the distribution of categorical columns.

frequency — Value counts per column

Produces a ranked value-count table for each column (or a subset with -s), including the percentage each value represents. Use --no-other to suppress the catch-all "Other" bucket when there are many distinct values.

bash
qsv frequency -s city people.csv

Output:

text
field,value,count,percentage
city,Chicago,2,40.0000
city,New York,2,40.0000
city,Boston,1,20.0000
bash
# All columns, no truncation
qsv frequency --no-other people.csv

Output:

text
field,value,count,percentage
name,Alice,1,20.0000
name,Bob,1,20.0000
name,Carol,1,20.0000
name,Dave,1,20.0000
name,Eve,1,20.0000
age,25,1,20.0000
age,28,1,20.0000
age,30,1,20.0000
age,32,1,20.0000
age,35,1,20.0000
city,Chicago,2,40.0000
city,New York,2,40.0000
city,Boston,1,20.0000
salary,62000,1,20.0000
salary,71000,1,20.0000
salary,75000,1,20.0000
salary,88000,1,20.0000
salary,95000,1,20.0000
bash
# JSON output
qsv frequency -s city --json people.csv

Output:

text
[{"field":"city","data":[{"value":"Chicago","count":2,"percentage":40.0},{"value":"New York","count":2,"percentage":40.0},{"value":"Boston","count":1,"percentage":20.0}]}]
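
The count and percentage columns can be reproduced by hand, which helps when checking a surprising result. The sketch below recomputes the city table with awk and sort; it assumes simple unquoted fields and is only a cross-check, not qsv's implementation.

```shell
workdir=$(mktemp -d)
printf 'name,age,city,salary\nAlice,30,New York,75000\nBob,25,Chicago,62000\nCarol,35,New York,88000\nDave,28,Chicago,71000\nEve,32,Boston,95000\n' > "$workdir/people.csv"

# Tally column 3 (city), then print value,count,percentage.
# Sort by count descending, breaking ties alphabetically by value.
freq=$(awk -F, 'NR > 1 { count[$3]++; n++ }
  END { for (c in count) printf "%s,%d,%.4f\n", c, count[c], 100 * count[c] / n }' "$workdir/people.csv" |
  sort -t, -k2,2nr -k1,1)
echo "$freq"
```

The percentage is simply count divided by the number of data rows, times 100, so two of five rows gives 40.0000.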

Transforming columns

Commands for reshaping, renaming, filling, and computing new columns without leaving the command line.

rename — Rename column headers

Renames columns by supplying a comma-separated list of new names in positional order. Use --pairwise to rename only specific columns by specifying old,new pairs, leaving the rest untouched.

bash
# Rename all columns by position
qsv rename full_name,years_old,location,annual_pay people.csv

Output:

text
full_name,years_old,location,annual_pay
Alice,30,New York,75000
Bob,25,Chicago,62000
Carol,35,New York,88000
Dave,28,Chicago,71000
Eve,32,Boston,95000
bash
# Pairwise rename (only rename specific columns)
qsv rename --pairwise age,years,salary,income people.csv

Output:

text
name,years,city,income
Alice,30,New York,75000
Bob,25,Chicago,62000
Carol,35,New York,88000
Dave,28,Chicago,71000
Eve,32,Boston,95000

fill — Forward-fill empty values

Propagates the last non-empty value in a column downward to fill blanks — useful for sparse exports where a value is only written on the first row of a group. Use --default to fill with a fixed string instead.

bash
# Create a CSV with gaps
printf 'name,city,salary\nAlice,New York,75000\nBob,,62000\nCarol,,88000\nDave,Chicago,\nEve,Boston,95000\n' > gaps.csv

# Forward-fill city
qsv fill city gaps.csv

Output:

text
name,city,salary
Alice,New York,75000
Bob,New York,62000
Carol,New York,88000
Dave,Chicago,
Eve,Boston,95000
bash
# Fill with a fixed default
qsv fill --default "N/A" salary gaps.csv

Output:

text
name,city,salary
Alice,New York,75000
Bob,,62000
Carol,,88000
Dave,Chicago,N/A
Eve,Boston,95000

reverse — Reverse row order

Outputs all rows in reverse order without sorting. Use this when the last record is the most recent and you want newest-first output without a sort key.

bash
qsv reverse people.csv

Output:

text
name,age,city,salary
Eve,32,Boston,95000
Dave,28,Chicago,71000
Carol,35,New York,88000
Bob,25,Chicago,62000
Alice,30,New York,75000

transpose — Swap rows and columns

Rotates the CSV so rows become columns and columns become rows. Useful for turning a wide stat table into a narrow key-value layout, or for feeding column-oriented data into row-oriented tools.

bash
qsv transpose people.csv

Output:

text
name,Alice,Bob,Carol,Dave,Eve
age,30,25,35,28,32
city,New York,Chicago,New York,Chicago,Boston
salary,75000,62000,88000,71000,95000

enum — Add a row number column

Appends an index column containing the 0-based row number; -c/--new-column sets a different name and --start a different first value. Use it to add a stable surrogate key or to restore original ordering after a shuffle.

bash
qsv enum people.csv

Output:

text
name,age,city,salary,index
Alice,30,New York,75000,0
Bob,25,Chicago,62000,1
Carol,35,New York,88000,2
Dave,28,Chicago,71000,3
Eve,32,Boston,95000,4
bash
# Custom column name, 1-based
qsv enum --new-column row_id --start 1 people.csv

Output:

text
name,age,city,salary,row_id
Alice,30,New York,75000,1
Bob,25,Chicago,62000,2
Carol,35,New York,88000,3
Dave,28,Chicago,71000,4
Eve,32,Boston,95000,5

pseudo — Pseudonymize a column

Replaces each distinct value in a column with an incremental identifier, so the same input always maps to the same output within a file. Use it to mask PII before sharing data. Identifiers follow order of first appearance, so two files pseudonymized separately are not guaranteed to share identifiers.

bash
# Replace names with consistent incremental IDs
qsv pseudo name people.csv

Output:

text
name,age,city,salary
0,30,New York,75000
1,25,Chicago,62000
2,35,New York,88000
3,28,Chicago,71000
4,32,Boston,95000

safenames — Sanitize column names for SQL/Python

Rewrites column headers so they are valid identifiers for SQL, pandas, or R by replacing spaces and special characters with underscores. Use --mode verify first to count unsafe headers without modifying the file.

bash
# Create a CSV with messy headers
printf 'Full Name,Age (Years),City/Region,Annual Salary $\nAlice,30,NYC,75000\n' > messy.csv
qsv safenames messy.csv

Output:

text
Full_Name,Age__Years_,City_Region,Annual_Salary__
Alice,30,NYC,75000
bash
# Count unsafe names without changing anything (verify mode)
qsv safenames --mode verify messy.csv

Output:

text
4 unsafe header(s) found.

Format conversion

Commands for converting between CSV, TSV, JSONL, Excel, and other tabular formats.

fmt — Change delimiter or quoting

Reformats a CSV and writes the result to stdout (the input file is not modified): change the delimiter, quote character, or quoting style without altering the data. Use it to convert CSV to TSV before piping into tools that expect tab-delimited input.

bash
# CSV to TSV
qsv fmt -t T people.csv

Output:

text
name	age	city	salary
Alice	30	New York	75000
Bob	25	Chicago	62000
Carol	35	New York	88000
Dave	28	Chicago	71000
Eve	32	Boston	95000
bash
# Pipe-delimited
qsv fmt -t '|' people.csv

Output:

text
name|age|city|salary
Alice|30|New York|75000
Bob|25|Chicago|62000
Carol|35|New York|88000
Dave|28|Chicago|71000
Eve|32|Boston|95000
bash
# Quote every field
qsv fmt --quote-always people.csv

Output:

text
"name","age","city","salary"
"Alice","30","New York","75000"
"Bob","25","Chicago","62000"
"Carol","35","New York","88000"
"Dave","28","Chicago","71000"
"Eve","32","Boston","95000"

tojsonl — Convert CSV to JSONL

Converts each CSV row to a JSON object on its own line (JSON Lines format), with automatic type inference so numeric and boolean columns are emitted without quotes. Use it to feed CSV data into JSON-native tools or APIs.

bash
qsv tojsonl people.csv

Output:

text
{"name":"Alice","age":30,"city":"New York","salary":75000}
{"name":"Bob","age":25,"city":"Chicago","salary":62000}
{"name":"Carol","age":35,"city":"New York","salary":88000}
{"name":"Dave","age":28,"city":"Chicago","salary":71000}
{"name":"Eve","age":32,"city":"Boston","salary":95000}

Type inference is automatic: age and salary are emitted as integers (not quoted), boolean columns become true/false, and nulls become JSON null.

excel — Extract Excel sheet to CSV

Reads .xlsx or .xls files and converts a sheet to CSV, handling merged cells, date formatting, and formula results. Use --metadata j to list all sheets before deciding which to extract.

bash
# First sheet
qsv excel data.xlsx -o output.csv

# Specific sheet by name
qsv excel data.xlsx --sheet "Sales" -o sales.csv

# List all sheets as JSON
qsv excel data.xlsx --metadata j

Output:

text
{"filename":"data.xlsx","format":"Xlsx","num_sheets":3,"sheets":[{"index":0,"name":"Sheet1","typ":"WorkSheet","visible":"Visible","headers":["name","age","city","salary"],"num_columns":4,"num_rows":6},...]}
bash
# Extract a specific cell range
qsv excel data.xlsx --range "A1:C4" -o range.csv

Output: (none — exits 0 on success)


Combining files

Commands for stacking, joining, splitting, and partitioning CSV files.

cat — Concatenate CSVs

Stacks multiple CSV files vertically (rows) or side-by-side (columns). Use rowskey when the files have different or overlapping schemas — it aligns by column name and fills missing fields with empty strings.

bash
# Stack vertically (same schema required); people2.csv is a second file with the same header, here holding Frank and Grace
qsv cat rows people.csv people2.csv

Output:

text
name,age,city,salary
Alice,30,New York,75000
Bob,25,Chicago,62000
Carol,35,New York,88000
Dave,28,Chicago,71000
Eve,32,Boston,95000
Frank,40,Seattle,105000
Grace,29,Austin,67000
bash
# Stack with differing schemas (fills missing fields with empty)
qsv cat rowskey --group fname people.csv dept.csv

Output:

text
file,name,age,city,salary,department
people.csv,Alice,30,New York,75000,
people.csv,Bob,25,Chicago,62000,
dept.csv,Alice,,,,Engineering
dept.csv,Bob,,,,Marketing
...
bash
# Concatenate side by side (columns)
qsv cat columns people.csv dept.csv

Output:

text
name,age,city,salary,name,department
Alice,30,New York,75000,Alice,Engineering
Bob,25,Chicago,62000,Bob,Marketing
Carol,35,New York,88000,Carol,Engineering
Dave,28,Chicago,71000,Dave,Sales
Eve,32,Boston,95000,Eve,Engineering

join — Join two CSVs

Performs an inner, outer, semi, anti, or cross join between two CSV files on one or more key columns. Differs from sqlp joins in that it does not require SQL syntax and is optimized for streaming large files.

bash
# Inner join on name
qsv join name people.csv name dept.csv

Output:

text
name,age,city,salary,name,department
Alice,30,New York,75000,Alice,Engineering
Bob,25,Chicago,62000,Bob,Marketing
Carol,35,New York,88000,Carol,Engineering
Dave,28,Chicago,71000,Dave,Sales
Eve,32,Boston,95000,Eve,Engineering
bash
# Left anti-join (people NOT in dept.csv)
qsv join --left-anti name people.csv name dept.csv

Output (empty if all names matched):

text
name,age,city,salary
bash
# Cross join (cartesian product)
qsv join --cross name people.csv name dept.csv | qsv count

Output:

text
25

Join type          Flag
Inner (default)    (none)
Left outer         --left
Right outer        --right
Full outer         --full
Left anti          --left-anti
Left semi          --left-semi
Right anti         --right-anti
Cross (cartesian)  --cross
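
Because every name in people.csv also appears in dept.csv, the anti-join example above returns only a header. The sketch below shows the same left-anti semantics with a non-empty result, using awk and a hypothetical dept_partial.csv that omits Eve; it illustrates the logic and does not call qsv.

```shell
workdir=$(mktemp -d)
printf 'name,age,city,salary\nAlice,30,New York,75000\nBob,25,Chicago,62000\nCarol,35,New York,88000\nDave,28,Chicago,71000\nEve,32,Boston,95000\n' > "$workdir/people.csv"

# Hypothetical lookup file that is missing Eve, so the anti-join has something to return.
printf 'name,department\nAlice,Engineering\nBob,Marketing\nCarol,Engineering\nDave,Sales\n' > "$workdir/dept_partial.csv"

# First file (NR == FNR): remember every key in the right-hand file.
# Second file: print the header plus left-hand rows whose key was never seen.
unmatched=$(awk -F, 'NR == FNR { if (FNR > 1) keys[$1] = 1; next }
                     FNR == 1 || !($1 in keys)' "$workdir/dept_partial.csv" "$workdir/people.csv")
echo "$unmatched"
```

Only left-hand columns appear in the result, which matches the header-only output shown for the anti-join above.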

split — Split into multiple files

Writes sequential chunks of a CSV to separate files in an output directory, either by fixed row count (-s) or by total number of chunks (-c). Each file is named after the index of the first row it contains; use --pad and --filename to control zero-padding and naming.

bash
mkdir /tmp/split_out
# 2 rows per chunk
qsv split -s 2 /tmp/split_out people.csv
ls /tmp/split_out

Output:

text
0.csv  2.csv  4.csv
bash
# 3 chunks with padded, custom filenames
qsv split -c 3 --pad 3 --filename "chunk_{}.csv" /tmp/split_out people.csv

Output: (none — exits 0 on success)

partition — Partition by column value

Creates one output file per distinct value in a key column, named after that value. Differs from split in that grouping is by content rather than row count — ideal for producing per-department or per-region files.

bash
mkdir /tmp/by_city
qsv partition city /tmp/by_city people.csv
ls /tmp/by_city

Output:

text
Boston.csv  Chicago.csv  New York.csv

Chicago.csv:

text
name,age,city,salary
Bob,25,Chicago,62000
Dave,28,Chicago,71000
bash
# Drop the partition column from output files
qsv partition --drop city /tmp/by_city people.csv

Output: (none — exits 0 on success)
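
The grouping rule itself is simple: one file per distinct key, each starting with the header. As a rough illustration in awk (assuming unquoted fields, and using the raw city value as the file name, which may differ from how qsv names its files):

```shell
workdir=$(mktemp -d)
printf 'name,age,city,salary\nAlice,30,New York,75000\nBob,25,Chicago,62000\nCarol,35,New York,88000\nDave,28,Chicago,71000\nEve,32,Boston,95000\n' > "$workdir/people.csv"

outdir="$workdir/by_city"
mkdir -p "$outdir"

# Column 3 is city. The first time a city is seen, write the header to its file;
# every row is then written to the file for its own city.
awk -F, -v dir="$outdir" '
  NR == 1 { header = $0; next }
  {
    file = dir "/" $3 ".csv"
    if (!(file in started)) { print header > file; started[file] = 1 }
    print > file
  }' "$workdir/people.csv"

ls "$outdir"
```

Each output file holds the header plus only the rows for its city, so Chicago.csv ends up with three lines.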


Scripting and queries

Commands for running SQL, embedded Lua scripts, and built-in string operations directly against CSV files.

sqlp — SQL queries via Polars

The filename (without extension) becomes the table name.

bash
# WHERE filter and ORDER BY
qsv sqlp people.csv "SELECT name, salary FROM people WHERE salary > 70000 ORDER BY salary DESC"

Output:

text
name,salary
Eve,95000
Carol,88000
Alice,75000
Dave,71000
bash
# GROUP BY aggregation
qsv sqlp people.csv "SELECT city, COUNT(*) as n, AVG(salary) as avg_salary FROM people GROUP BY city ORDER BY avg_salary DESC"

Output:

text
city,n,avg_salary
Boston,1,95000.0
New York,2,81500.0
Chicago,2,66500.0
bash
# Join two files in SQL
qsv sqlp people.csv dept.csv \
  "SELECT p.name, p.salary, d.department
   FROM people p JOIN dept d ON p.name = d.name
   WHERE d.department = 'Engineering'
   ORDER BY p.salary DESC"

Output:

text
name,salary,department
Eve,95000,Engineering
Carol,88000,Engineering
Alice,75000,Engineering
bash
# Window function: salary rank
qsv sqlp people.csv \
  "SELECT name, salary, RANK() OVER (ORDER BY salary DESC) as rank FROM people"

Output:

text
name,salary,rank
Eve,95000,1
Carol,88000,2
Alice,75000,3
Dave,71000,4
Bob,62000,5
bash
# Output as JSON
qsv sqlp --format json people.csv "SELECT * FROM people WHERE city = 'Chicago'"

Output:

text
[{"name":"Bob","age":25,"city":"Chicago","salary":62000},{"name":"Dave","age":28,"city":"Chicago","salary":71000}]

Use --streaming for files larger than RAM. Add --try-parsedates to auto-parse date columns.

luau — Scripted transforms with embedded Lua

Runs a Luau (sandboxed Lua 5.1) expression per row to map a new column or filter rows, with optional --begin/--end blocks for initialization and aggregation. Reach for this when apply operations are too limited but a full sqlp query is overkill.

bash
# Add computed column (salary in thousands)
qsv luau map salary_k \
  "string.format('%.1f', col.salary / 1000)" \
  people.csv

Output:

text
name,age,city,salary,salary_k
Alice,30,New York,75000,75.0
Bob,25,Chicago,62000,62.0
Carol,35,New York,88000,88.0
Dave,28,Chicago,71000,71.0
Eve,32,Boston,95000,95.0
bash
# Filter rows with a script
qsv luau filter "tonumber(col.salary) > 75000" people.csv

Output:

text
name,age,city,salary
Carol,35,New York,88000
Eve,32,Boston,95000
bash
# Add a seniority label (conditional logic)
qsv luau map seniority \
  "if tonumber(col.age) >= 32 then return 'Senior' else return 'Junior' end" \
  people.csv

Output:

text
name,age,city,salary,seniority
Alice,30,New York,75000,Junior
Bob,25,Chicago,62000,Junior
Carol,35,New York,88000,Senior
Dave,28,Chicago,71000,Junior
Eve,32,Boston,95000,Senior
bash
# Aggregation using BEGIN/END blocks
qsv luau map dummy \
  --begin "total = 0" \
  "total = total + tonumber(col.salary); return ''" \
  --end "print('Total salary: ' .. total)" \
  people.csv > /dev/null

Output:

text
Total salary: 391000

Reference columns with col.column_name or col["col name"]. Use _IDX for the current row number. Scripts run with Luau 0.716 — a safe, sandboxed Lua 5.1 subset.

apply — Built-in string and numeric operations

Applies one or more named operations (case conversion, trimming, encoding, similarity, NLP sentiment, etc.) to a column without writing a script. Use dynfmt to produce a new column from a format string that interpolates other columns.

bash
# Uppercase a column
qsv apply operations upper name people.csv

Output:

text
name,age,city,salary
ALICE,30,New York,75000
BOB,25,Chicago,62000
CAROL,35,New York,88000
DAVE,28,Chicago,71000
EVE,32,Boston,95000
bash
# Compute string length into a new column
qsv apply operations len name -c name_len people.csv

Output:

text
name,age,city,salary,name_len
Alice,30,New York,75000,5
Bob,25,Chicago,62000,3
Carol,35,New York,88000,5
Dave,28,Chicago,71000,4
Eve,32,Boston,95000,3
bash
# Dynamic format string → computed description column
qsv apply dynfmt \
  --formatstr "{name} earns \${salary} in {city}" \
  -c description people.csv

Output:

text
name,age,city,salary,description
Alice,30,New York,75000,Alice earns $75000 in New York
Bob,25,Chicago,62000,Bob earns $62000 in Chicago
Carol,35,New York,88000,Carol earns $88000 in New York
Dave,28,Chicago,71000,Dave earns $71000 in Chicago
Eve,32,Boston,95000,Eve earns $95000 in Boston

Available apply operations:

Category    Operations
Case        lower, upper, titlecase
Whitespace  trim, ltrim, rtrim, squeeze
String      len, strip_prefix, strip_suffix, escape, replace, regex_replace
Encoding    encode64, decode64, encode62, decode62, crc32
Math        round, thousands
Financial   currencytonum, numtocurrency
Similarity  simdl, simjw, simsd, simhm
NLP         sentiment, whatlang, gender_guess, eudex

Schema and validation

Commands for inferring structure from a CSV and checking that data conforms to expected types and constraints.

schema — Infer JSON Schema from CSV

Scans a CSV and generates a JSON Schema (Draft 2020-12) file capturing field types, enum values, and numeric ranges. The output schema can be fed directly to validate to enforce those constraints on new data.

bash
qsv schema people.csv

Output: (none — exits 0 on success)

Generates people.csv.schema.json:

text
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "people.csv",
  "type": "object",
  "properties": {
    "name": {
      "type": "string",
      "enum": ["Alice", "Bob", "Carol", "Dave", "Eve"]
    },
    "age": { "type": "integer", "minimum": 25, "maximum": 35 },
    "city": {
      "type": "string",
      "enum": ["Boston", "Chicago", "New York"]
    },
    "salary": { "type": "integer", "minimum": 62000, "maximum": 95000 }
  },
  "required": ["name", "age", "city", "salary"]
}
bash
# Polars schema (for use with sqlp/Parquet pipelines)
qsv schema --polars people.csv

Output:

text
{"name":"Utf8","age":"Int64","city":"Utf8","salary":"Int64"}

validate — Validate CSV against JSON Schema

Without a schema argument, checks that the CSV is well-formed per RFC 4180. With a schema, validates each row against it and writes passing rows to .valid, failing rows to .invalid, and a validation-errors.tsv describing each violation.

bash
# RFC 4180 well-formedness check
qsv validate people.csv

Output:

text
people.csv is valid.
bash
# Schema validation (generates .valid, .invalid, and validation-errors.tsv)
qsv schema people.csv
printf 'name,age,city,salary\nBadRow,notanumber,Unknown,0\n' > bad.csv
qsv validate bad.csv people.csv.schema.json

Output: (exits non-zero, because the row in bad.csv fails the schema)

bad.csv.validation-errors.tsv:

text
row_number	field	error
2	age	notanumber is not of type "integer"
2	city	Unknown is not one of ["Boston","Chicago","New York"]
2	salary	0 is less than the minimum value of 62000

Sampling

Commands for drawing representative subsets from large files without loading everything into memory.

sample — Random sampling

Draws rows using reservoir sampling by default, guaranteeing a uniform random sample in a single pass without knowing the file size upfront. Supports stratified, Bernoulli, systematic, cluster, weighted, and time-series sampling modes; use --seed for reproducibility.

bash
# Reservoir sample (3 random rows)
qsv sample 3 people.csv

Output:

text
name,age,city,salary
Bob,25,Chicago,62000
Alice,30,New York,75000
Eve,32,Boston,95000
bash
# Reproducible sample with seed
qsv sample --seed 42 3 people.csv

Output:

text
name,age,city,salary
Alice,30,New York,75000
Carol,35,New York,88000
Eve,32,Boston,95000
bash
# 50% Bernoulli sample (each row independently included with probability 0.5)
qsv sample --bernoulli --seed 42 0.5 people.csv

Output:

text
name,age,city,salary
Alice,30,New York,75000
Carol,35,New York,88000
Eve,32,Boston,95000
bash
# Stratified: 1 row per unique city
qsv sample --stratified city --seed 42 1 people.csv

Output:

text
name,age,city,salary
Alice,30,New York,75000
Bob,25,Chicago,62000
Eve,32,Boston,95000

Method               Flag                 Use case
Reservoir (default)  (none)               General random sample
Indexed              (none, with .idx)    Random I/O, large files
Bernoulli            --bernoulli          Independent row probability
Systematic           --systematic <col>   Every nth record
Stratified           --stratified <col>   Representative subgroup samples
Weighted             --weighted <col>     Probability proportional to weight
Cluster              --cluster <col>      Sample entire clusters
Timeseries           --timeseries <col>   One record per time interval
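
Reservoir sampling is what makes a uniform sample possible in one pass without knowing the row count in advance. The awk sketch below implements the textbook version (Algorithm R) for k = 3; it is illustrative only, the seed and variable names are arbitrary, and the rows it picks will not match qsv's output for the same seed.

```shell
workdir=$(mktemp -d)
printf 'name,age,city,salary\nAlice,30,New York,75000\nBob,25,Chicago,62000\nCarol,35,New York,88000\nDave,28,Chicago,71000\nEve,32,Boston,95000\n' > "$workdir/people.csv"

sampled=$(awk -v k=3 -v seed=42 '
  BEGIN { srand(seed) }
  NR == 1 { print; next }                  # pass the header through unchanged
  {
    n++
    if (n <= k) {
      keep[n] = $0                         # fill the reservoir with the first k rows
    } else {
      j = int(rand() * n) + 1              # uniform integer in 1..n
      if (j <= k) keep[j] = $0             # row n replaces a slot with probability k/n
    }
  }
  END { for (i = 1; i <= k && i <= n; i++) print keep[i] }' "$workdir/people.csv")
echo "$sampled"
```

Only k rows are held in memory at any time, which is why the approach scales to files far larger than RAM.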

Flattening and display

Commands for rendering CSV records in a human-readable layout rather than a dense columnar format.

flatten — View records one at a time

Prints each record as a vertical key-value block separated by #, making wide CSVs readable in a terminal. Use -c to truncate long values to a fixed character limit for a quick overview.

bash
qsv flatten people.csv

Output:

text
name    Alice
age     30
city    New York
salary  75000
#
name    Bob
age     25
city    Chicago
salary  62000
#
...
bash
# Condense long values for a quick overview
qsv flatten -c 8 people.csv

Output:

text
name    Alice
age     30
city    New York
salary  75000
#
...

Indexing

An index file dramatically speeds up commands that support random access (slice, split, sample, count, dedup).

bash
qsv index people.csv
# Creates people.csv.idx alongside the source file

Output: (none — exits 0 on success)

After indexing, qsv count reads the row count from the index, and qsv slice seeks directly to the requested rows instead of scanning the file from the top.

bash
# Force rebuild
qsv index --force people.csv

Output: (none — exits 0 on success)
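
The idea behind an index can be shown with a list of byte offsets: once the start of each record is known, any row can be reached with a seek instead of a scan. This is a conceptual sketch only; offsets.txt is a made-up plain-text stand-in, not the .idx format, and it assumes ASCII data with one record per line.

```shell
workdir=$(mktemp -d)
printf 'name,age,city,salary\nAlice,30,New York,75000\nBob,25,Chicago,62000\nCarol,35,New York,88000\nDave,28,Chicago,71000\nEve,32,Boston,95000\n' > "$workdir/people.csv"

# Record the byte offset at which every physical line starts.
awk '{ print offset + 0; offset += length($0) + 1 }' "$workdir/people.csv" > "$workdir/offsets.txt"

# Data row 3 (0-based) is physical line 5; look up its offset and jump straight to it.
offset=$(sed -n '5p' "$workdir/offsets.txt")
row=$(tail -c "+$((offset + 1))" "$workdir/people.csv" | head -n 1)
echo "offset $offset -> $row"
```

The lookup cost is independent of how many rows precede the target, which is the property the index provides.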


Configuration

qsv reads runtime defaults from QSV_* environment variables and from a dotenv file. Use this to set delimiters, buffer sizes, parallelism, and remote-fetch behaviour project-wide without repeating flags on every invocation.

Environment variables

Many common options have a matching QSV_* variable; the variable supplies the default and an explicit flag overrides it. Run qsv --envlist to dump the active set.

bash
# Show every QSV_* variable currently in effect
qsv --envlist

Output:

text
QSV_DEFAULT_DELIMITER: ,
QSV_NO_HEADERS: false
QSV_COMMENT_CHAR:
QSV_MAX_JOBS: 8
QSV_CACHE_DIR: /home/alice/.qsv-cache
...
bash
# Project-wide TSV default + parallel job cap
export QSV_DEFAULT_DELIMITER=$'\t'
export QSV_MAX_JOBS=4
qsv stats data.tsv

Output: (one row of summary statistics per column of data.tsv)

| Variable | Purpose |
| --- | --- |
| QSV_DEFAULT_DELIMITER | One ASCII char; overrides --delimiter. |
| QSV_SNIFF_DELIMITER | If set, auto-detect delimiter per file. |
| QSV_NO_HEADERS | Treat first row as data, not a header. |
| QSV_MAX_JOBS | Cap parallel workers (default = logical CPUs). |
| QSV_CACHE_DIR | Where stats/fetch cache files are written. |
| QSV_DOTENV_PATH | Explicit dotenv file path; "<NONE>" disables loading. |
| QSV_LOG_LEVEL | error/warn/info/debug/trace. |
| QSV_LOG_DIR | Directory for structured log output. |
| QSV_PROGRESSBAR | 1 to show a TTY progress bar on long runs. |
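
An exported variable sets a session-wide default, while a prefix assignment changes it for one command only. The sketch below shows the difference using standard shell behaviour; printenv stands in for a qsv call so the snippet runs even where qsv is not installed.

```bash
# Session-wide default: every later command in this shell sees 4.
export QSV_MAX_JOBS=4

# One-off override: the prefix assignment applies to this command only.
# printenv stands in for a qsv call such as: QSV_MAX_JOBS=2 qsv stats people.csv
QSV_MAX_JOBS=2 printenv QSV_MAX_JOBS   # prints 2

# The exported default is unchanged afterwards.
printenv QSV_MAX_JOBS                  # prints 4
```

The prefix form is handy in scripts and CI jobs, where a one-off setting should not leak into later commands.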

Dotenv file

On startup, qsv loads a .env file from the current directory (or the path in QSV_DOTENV_PATH) and applies any QSV_*=value lines as if they were exported. Useful for pinning per-project defaults next to a dataset.

bash
cat > .env << 'EOF'
QSV_DEFAULT_DELIMITER=|
QSV_MAX_JOBS=4
QSV_LOG_LEVEL=info
EOF

qsv count people.csv   # picks up the .env automatically

Output: 5 (the record count; the header row is not counted)

bash
# Point at a shared dotenv outside the cwd
QSV_DOTENV_PATH=/home/alice/projects/etl/.env qsv stats people.csv

# Disable dotenv loading for one invocation
QSV_DOTENV_PATH="<NONE>" qsv stats people.csv

Output: (the qsv stats table for people.csv, computed under the selected settings)
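
Because a dotenv file changes qsv's behaviour without any visible flag, it can help to list what a file would set before running anything. The following is a minimal sketch using grep. list_qsv_vars is a hypothetical helper, and it assumes simple KEY=value lines with no export prefix.

```bash
# list_qsv_vars [FILE]
# Print the QSV_* assignments found in a dotenv file (default: ./.env),
# skipping comments, blank lines, and unrelated variables.
list_qsv_vars() {
  grep -E '^QSV_[A-Z0-9_]+=' "${1:-.env}"
}
```

Run against the .env file created above, list_qsv_vars prints its three QSV_* lines; it exits non-zero when the file sets none.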


MCP server (qsv 13+)

qsv 13 added a built-in Model Context Protocol server that lets AI agents (Claude Desktop, Claude Code, and other MCP clients) query and transform local CSV/Parquet/Excel files without uploading raw data — only statistical metadata and result rows cross the wire. Reach for it when you want a chatbot to drive qsv against your own files.

bash
# Start the MCP server on stdio (default transport)
qsvmcp serve

# Or use the full binary
qsv mcp serve

Output: (none; the server stays in the foreground and exchanges MCP messages over stdin/stdout until the client disconnects)

bash
# List the MCP-exposed tools and exit (handy for debugging)
qsvmcp list-skills

# Regenerate the bundled skill definitions
qsvmcp --update-mcp-skills

Output: (varies by qsv version; list-skills prints the exposed tools)

Register with Claude Desktop by adding the server to claude_desktop_config.json:

json
{
  "mcpServers": {
    "qsv": {
      "command": "qsvmcp",
      "args": ["serve"],
      "env": { "QSV_CACHE_DIR": "/home/alice/.qsv-cache" }
    }
  }
}

The qsvmcp binary ships ~63 of qsv's commands — enough for the MCP skill set with a smaller footprint. Use the full qsv binary if you need commands outside the MCP surface (e.g. geocode, python).

tojsonl --toon — Token-efficient output for LLMs

qsv 12 introduced TOON, a token-optimized tabular format designed for LLM contexts — denser than JSON, still parseable. Useful when piping CSV summaries into a prompt.

bash
qsv tojsonl --toon people.csv

Output:

text
[name|age|city|salary]
Alice|30|New York|75000
Bob|25|Chicago|62000
...

Piping commands together

qsv is designed to be composed — pipe subcommands to build multi-step pipelines:

bash
# Filter to Engineering dept, sort by salary desc, pick 3 columns
qsv join name people.csv name dept.csv \
  | qsv search -s department "Engineering" \
  | qsv select name,salary,department \
  | qsv sort -s salary -N -R

Output:

text
name,salary,department
Eve,95000,Engineering
Carol,88000,Engineering
Alice,75000,Engineering
bash
# Top city by total salary
qsv sqlp people.csv \
  "SELECT city, SUM(salary) as total FROM people GROUP BY city ORDER BY total DESC LIMIT 1"

Output:

text
city,total
New York,163000
bash
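# Count rows matching a pattern across several CSVs that share the same columns
# (qsv cat rows emits the header once; plain cat would repeat it as a data row)
qsv cat rows *.csv | qsv search "New York" | qsv count

Output: (the number of matching rows, as a single integer)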

Use qsv input to normalize messy CSVs (trim whitespace, fix quoting, skip comment lines) before piping to other commands. Use qsv fixlengths to pad rows with missing fields so downstream commands don't choke on ragged files.
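
As an illustration of the clean-up step, the sketch below pads a ragged file before counting it. ragged.csv is a hypothetical file made up for this example, and count_clean is a hypothetical helper name, not part of qsv.

```bash
# A hypothetical ragged file: the last row is missing its city field.
printf 'name,age,city\nAlice,30,New York\nBob,25\n' > ragged.csv

# count_clean FILE
# Pad short rows with empty fields, then count the records.
count_clean() {
  qsv fixlengths "$1" | qsv count
}
```

Calling count_clean ragged.csv should report 2 records, since fixlengths pads Bob's row to three fields before count reads it.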


Sources