cheat sheet

qsv

Comprehensive reference for qsv: count, headers, stats, moarstats, select, search, sort, dedup, frequency, join, sqlp, luau, apply, schema, validate, sample, split, MCP server, and more — with examples and outputs.

updated 05-25-2026

qsv — CSV Toolkit

What it is

qsv is a blazing-fast, Rust-based CSV toolkit with 80+ subcommands for querying, transforming, analyzing, and validating tabular data — a maintained, feature-rich fork of the original xsv project. It adds Polars-backed acceleration, an embedded Luau scripting engine, and support for CSV, TSV, Excel, JSON, Parquet, and Apache Arrow formats. Reach for qsv when you need to slice, filter, join, or summarize structured tabular data from the command line without loading it into a full database or spreadsheet.

Install

bash

# macOS
brew install qsv

# Windows
scoop install qsv

# Cargo
cargo install qsv --locked

# Or download binary from releases
curl -LO https://github.com/dathere/qsv/releases/latest/download/qsv-x86_64-unknown-linux-gnu.zip

Output: (none — exits 0 on success)

Variants: qsv (full), qsvlite (no Luau/Python), qsvmcp (Model Context Protocol), qsvpy (Python integration).

Sample data

All examples below use these two files:

bash

cat > people.csv << 'EOF'
name,age,city,salary
Alice,30,New York,75000
Bob,25,Chicago,62000
Carol,35,New York,88000
Dave,28,Chicago,71000
Eve,32,Boston,95000
EOF

cat > dept.csv << 'EOF'
name,department
Alice,Engineering
Bob,Marketing
Carol,Engineering
Dave,Sales
Eve,Engineering
EOF

Output: (none — exits 0 on success)

Discovery

Commands for understanding an unfamiliar CSV before you commit to processing it.

`sniff` — Detect schema without reading the whole file

Samples the first few thousand bytes of a file to detect delimiter, quoting, field count, types, and record count without reading the whole file. Use it as a fast first look before running heavier commands.

bash

qsv sniff people.csv

Output:

text

Sniff Results for people.csv
  Last Modified  : 2026-04-26 10:00:00 UTC
  File Size      : 132 bytes
  Delimiter      : ,
  Has Header Row : true
  Quote Char     : "
  Num Records    : 5
  Num Fields     : 4
  Fields         :
                    0: name     (String)
                    1: age      (Integer)
                    2: city     (String)
                    3: salary   (Integer)

bash

# Sniff a remote file
qsv sniff https://example.com/data.csv

Output: a report in the same format as above, describing the remote file.

`count` — Count rows

Returns the number of data rows (excluding the header). More reliable than wc -l because it handles newlines inside quoted fields correctly, and near-instant on indexed files.

bash

qsv count people.csv

Output:

text

5

bash

# Human-readable (useful for millions of rows)
qsv count --human-readable largefile.csv

Output:

text

1,482,309

bash

# Include record width statistics
qsv count --width people.csv

Output:

text

5
32-27-28-22-5
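
To see why a line count and a record count can disagree, the sketch below builds a small file whose quoted field contains a newline and counts it both ways. The file name multiline.csv and the awk quote-parity counter are illustrative assumptions, not part of qsv; the counter is a rough stand-in for a real CSV parser and does not handle every quoting edge case.

```bash
# Hypothetical file: the second field of the first data row contains a newline.
printf 'name,note\nAlice,"line one\nline two"\nBob,plain\n' > multiline.csv

# wc -l counts physical lines, so it reports 4 (header + 3 lines) for 2 data rows.
physical_lines=$(wc -l < multiline.csv | tr -d ' ')

# Quote-parity counter: a record is complete only when the running number of
# double quotes is even. Subtract 1 for the header.
records=$(awk '{ quotes += gsub(/"/, "&") }
  quotes % 2 == 0 { n++ }
  END { print n - 1 }' multiline.csv)

echo "wc -l: $physical_lines   data rows: $records"
```

Here the physical line count is 4 while the record-aware count is 2, which is the number a CSV-aware counter should report for this file.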

`headers` — List column names

Prints each column name with its 1-based index. Use --just-names when you need a plain list for scripting, or --intersect to find the common columns across two files before a join.

bash

qsv headers people.csv

Output:

text

1   name
2   age
3   city
4   salary

bash

# Just names (for scripting)
qsv headers --just-names people.csv

Output:

text

name
age
city
salary

bash

# Find common columns across two files
qsv headers --intersect people.csv dept.csv

Output:

text

name

bash

# Count only
qsv headers --just-count people.csv

Output:

text

4

Summary statistics

Commands for computing numeric and categorical summaries across columns without writing a full query.

`stats` — Per-column statistics

Computes sum, min, max, mean, stddev, null count, and type for every column in a single pass. Results are cached alongside the file, so repeated runs are instant; use --everything to add median, quartiles, and mode.

bash

qsv stats people.csv

Output:

text

field,type,sum,min,max,range,sortorder,min_length,max_length,mean,stddev,variance,cv,nullcount,max_precision,sparsity
name,String,,Alice,Eve,,Ascending,3,5,,,,,,0,0
age,Integer,150,25,35,10,Unsorted,2,2,30,3.406,11.6,0.1135,0,0,0
city,String,,Boston,New York,,Unsorted,6,8,,,,,,0,0
salary,Integer,391000,62000,95000,33000,Unsorted,5,5,78200,11855.80,140560000,0.1516,0,0,0

bash

# Infer types only (fast — no numeric computation)
qsv stats --typesonly people.csv

Output:

text

field,type
name,String
age,Integer
city,String
salary,Integer

bash

# Full statistics including mode, median, quartiles
qsv stats --everything people.csv

Output:

text

field,type,...,mode,median,mad,q1,q2_median,q3,...
name,String,...,Alice|Bob|Carol|Dave|Eve,,,,,...
age,Integer,...,25|28|30|32|35,30,2,27,30,33,...
salary,Integer,...,62000|71000|75000|88000|95000,75000,13000,66500,75000,91500,...

bash

# Stats for specific columns only
qsv stats -s salary,age people.csv

Output: the same columns as the full stats table above, restricted to the salary and age rows.

qsv stats caches results in a .stats.csv.bin.sz file alongside the input. Subsequent calls are instant. Use --force to recompute.
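
As a sanity check on summary figures, the sum, mean, and standard deviation of the salary column can be recomputed with plain awk. This is a minimal sketch that assumes the population form of the variance (dividing by n); which estimator a given qsv version reports is an assumption here, so compare against your own output rather than treating these numbers as authoritative.

```bash
# Recreate the sample file so the sketch is self-contained.
cat > people.csv << 'EOF'
name,age,city,salary
Alice,30,New York,75000
Bob,25,Chicago,62000
Carol,35,New York,88000
Dave,28,Chicago,71000
Eve,32,Boston,95000
EOF

# Column 4 is salary. Accumulate the sum and sum of squares in one pass,
# then derive the mean and the population standard deviation.
salary_summary=$(awk -F, 'NR > 1 { n++; sum += $4; sumsq += $4 * $4 }
  END { mean = sum / n
        variance = sumsq / n - mean * mean   # population variance (divide by n)
        printf "%d %.0f %.2f", sum, mean, sqrt(variance) }' people.csv)

echo "sum mean stddev: $salary_summary"
```

For the sample data this prints a sum of 391000, a mean of 78200, and a population standard deviation of 11855.80.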

`moarstats` — Extended statistics (qsv 12+)

Augments a stats output file with up to 55 additional advanced measures — extended outlier, robust, and bivariate statistics (covariance, correlation, kurtosis, MAD, IQR, Pearson/Spearman, etc.). Run stats first, then moarstats on the resulting .stats.csv to enrich it without re-scanning the original data.

bash

# Produce stats.csv, then enrich it with advanced measures
qsv stats people.csv -o people.stats.csv
qsv moarstats people.stats.csv

Output:

text

field,type,...,kurtosis,iqr,skewness,covariance,pearson_r,spearman_r,...
age,Integer,...,-1.30,6,0.21,...
salary,Integer,...,-1.20,25000,0.34,...

bash

# Restrict to a subset of advanced measures
qsv moarstats --select kurtosis,iqr,skewness people.stats.csv

Output: (none — exits 0 on success)

moarstats was introduced in qsv 12.0.0 and refined in 13.0.0. It also powers the per-column "FAIR metadata" inference used by the MCP server and TOON output.

Selecting columns

Commands for narrowing or reordering the columns in a file before downstream processing.

`select` — Pick, reorder, or drop columns

Outputs a subset (or reordering) of columns by name, index, range, or regex. Prefix a selector with ! to exclude it; the order of selectors controls the output order.

bash

# Pick two columns
qsv select name,salary people.csv

Output:

text

name,salary
Alice,75000
Bob,62000
Carol,88000
Dave,71000
Eve,95000

bash

# Drop a column (! prefix = all except)
qsv select '!age' people.csv

Output:

text

name,city,salary
Alice,New York,75000
Bob,Chicago,62000
Carol,New York,88000
Dave,Chicago,71000
Eve,Boston,95000

bash

# Select by column range
qsv select 1-3 people.csv

Output:

text

name,age,city
Alice,30,New York
Bob,25,Chicago
Carol,35,New York
Dave,28,Chicago
Eve,32,Boston

bash

# Select by regex (columns starting with 'a' or 'c')
qsv select '/^[ac]/' people.csv

Output:

text

age,city
30,New York
25,Chicago
35,New York
28,Chicago
32,Boston

Filtering rows

Commands for keeping or discarding rows based on patterns or positional ranges.

`search` — Filter rows by regex

Filters rows using a regular expression, optionally scoped to one or more columns with -s. Use -v to invert (exclude matches), or --flag to add a match-indicator column instead of dropping rows.

bash

# Keep rows matching a pattern
qsv search "New York" people.csv

Output:

text

name,age,city,salary
Alice,30,New York,75000
Carol,35,New York,88000

bash

# Search in a specific column only
qsv search -s city "Chicago" people.csv

Output:

text

name,age,city,salary
Bob,25,Chicago,62000
Dave,28,Chicago,71000

bash

# Invert match (exclude Chicago)
qsv search -s city -v "Chicago" people.csv

Output:

text

name,age,city,salary
Alice,30,New York,75000
Carol,35,New York,88000
Eve,32,Boston,95000

bash

# Add a match flag column instead of filtering
qsv search -s city --flag matched "New York" people.csv

Output:

text

name,age,city,salary,matched
Alice,30,New York,75000,1
Bob,25,Chicago,62000,0
Carol,35,New York,88000,1
Dave,28,Chicago,71000,0
Eve,32,Boston,95000,0

bash

# Count matches only (written to stderr)
qsv search -s city -c "New York" people.csv 2>&1 >/dev/null

Output:

text

2

`slice` — Extract row ranges

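Extracts a contiguous range of rows by start index, end index, length, or a single row. Differs from search in that it operates by position, not content; on indexed files it seeks directly to the first requested row instead of scanning from the top, so the cost depends on the size of the slice rather than the size of the file.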

bash

# First 3 rows
qsv slice -l 3 people.csv

Output:

text

name,age,city,salary
Alice,30,New York,75000
Bob,25,Chicago,62000
Carol,35,New York,88000

bash

# Rows 2–4 (0-based start, exclusive end)
qsv slice -s 1 -e 4 people.csv

Output:

text

name,age,city,salary
Bob,25,Chicago,62000
Carol,35,New York,88000
Dave,28,Chicago,71000

bash

# Single row by index
qsv slice -i 4 people.csv

Output:

text

name,age,city,salary
Eve,32,Boston,95000

bash

# Last 2 rows (negative index)
qsv slice -s -2 people.csv

Output:

text

name,age,city,salary
Dave,28,Chicago,71000
Eve,32,Boston,95000

bash

# JSON output for a single row
qsv slice -i 0 --json people.csv

Output:

text

[{"name":"Alice","age":"30","city":"New York","salary":"75000"}]

Sorting and deduplication

Commands for ordering rows and removing duplicates, often a prerequisite for joins or frequency counts.

`sort` — Sort rows

Sorts rows by one or more columns; add -N for numeric comparison and -R for descending order. Also supports --random for reproducible shuffles with a --seed.

bash

# Sort by salary numerically (ascending)
qsv sort -s salary -N people.csv

Output:

text

name,age,city,salary
Bob,25,Chicago,62000
Dave,28,Chicago,71000
Alice,30,New York,75000
Carol,35,New York,88000
Eve,32,Boston,95000

bash

# Sort by salary descending
qsv sort -s salary -N -R people.csv

Output:

text

name,age,city,salary
Eve,32,Boston,95000
Carol,35,New York,88000
Alice,30,New York,75000
Dave,28,Chicago,71000
Bob,25,Chicago,62000

bash

# Multi-key sort (city then salary)
qsv sort -s city,salary -N people.csv

Output:

text

name,age,city,salary
Eve,32,Boston,95000
Bob,25,Chicago,62000
Dave,28,Chicago,71000
Alice,30,New York,75000
Carol,35,New York,88000

bash

# Reproducible random shuffle
qsv sort --random --seed 42 people.csv

Output:

text

name,age,city,salary
Carol,35,New York,88000
Eve,32,Boston,95000
Bob,25,Chicago,62000
Alice,30,New York,75000
Dave,28,Chicago,71000
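
The reason -N matters is that text order and numeric order differ as soon as values have different digit counts. The sketch below shows the difference using the standard sort utility as a stand-in; the file amounts.csv is a hypothetical example, and sort's -n key flag plays the role that -N plays in qsv.

```bash
# Three values whose text order and numeric order differ.
printf 'id,amount\na,9\nb,100\nc,10\n' > amounts.csv

# Text comparison goes character by character, so "10" < "100" < "9".
lexical=$(tail -n +2 amounts.csv | LC_ALL=C sort -t, -k2,2 | cut -d, -f2 | paste -sd' ' -)

# Numeric comparison treats the field as a number: 9 < 10 < 100.
numeric=$(tail -n +2 amounts.csv | LC_ALL=C sort -t, -k2,2n | cut -d, -f2 | paste -sd' ' -)

echo "lexical: $lexical"
echo "numeric: $numeric"
```

The first line prints 10 100 9 and the second prints 9 10 100.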

`dedup` — Remove duplicate rows

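Removes rows that are identical across one or more key columns, keeping the first occurrence. The input is sorted by the key columns first, which is why the output below is ordered by city rather than by original position. Use -D to write the dropped duplicates to a separate file for auditing.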

bash

# Dedup by city (keep first occurrence per city)
qsv dedup -s city people.csv

Output:

text

name,age,city,salary
Eve,32,Boston,95000
Bob,25,Chicago,62000
Alice,30,New York,75000

text

2  (duplicates removed, written to stderr)

bash

# Write duplicates to a separate file
qsv dedup -s city -D dupes.csv people.csv

Output: the deduplicated rows shown above on stdout; the dropped rows go to dupes.csv.

dupes.csv:

text

name,age,city,salary
Dave,28,Chicago,71000
Carol,35,New York,88000
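
The keep-first-per-key rule can be sketched with awk's seen-array idiom. This is an illustration of the rule only, not how qsv implements it: the awk version preserves file order, whereas the qsv output above is ordered by city, and it splits on bare commas so it does not handle quoted fields.

```bash
cat > people.csv << 'EOF'
name,age,city,salary
Alice,30,New York,75000
Bob,25,Chicago,62000
Carol,35,New York,88000
Dave,28,Chicago,71000
Eve,32,Boston,95000
EOF

# Keep the header, then keep a row only the first time its city (column 3) appears.
deduped=$(awk -F, 'NR == 1 || !seen[$3]++' people.csv)

# The complement: data rows whose city has already been seen.
dropped=$(awk -F, 'NR > 1 && seen[$3]++' people.csv)

printf 'kept:\n%s\n\ndropped:\n%s\n' "$deduped" "$dropped"
```

The same two rows (Carol and Dave) are dropped as in dupes.csv; only the ordering differs.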

Frequency analysis

Commands for counting distinct values and understanding the distribution of categorical columns.

`frequency` — Value counts per column

Produces a ranked value-count table for each column (or a subset with -s), including the percentage each value represents. Use --no-other to suppress the catch-all "Other" bucket when there are many distinct values.

bash

qsv frequency -s city people.csv

Output:

text

field,value,count,percentage
city,Chicago,2,40.0000
city,New York,2,40.0000
city,Boston,1,20.0000
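
The count and percentage columns are simple to recompute by hand, which is useful when checking a result. The sketch below rebuilds the city rows of the table above with awk and sort; the tie-break by value name and the four-decimal formatting are choices made here to match the listing, not documented qsv behaviour.

```bash
cat > people.csv << 'EOF'
name,age,city,salary
Alice,30,New York,75000
Bob,25,Chicago,62000
Carol,35,New York,88000
Dave,28,Chicago,71000
Eve,32,Boston,95000
EOF

# Tally each distinct city (column 3), then express each tally as a share of all rows.
# Sort by count descending, breaking ties alphabetically by value.
freq=$(awk -F, 'NR > 1 { count[$3]++; total++ }
  END { for (v in count)
          printf "city,%s,%d,%.4f\n", v, count[v], 100 * count[v] / total }' people.csv |
  LC_ALL=C sort -t, -k3,3nr -k2,2)

printf 'field,value,count,percentage\n%s\n' "$freq"
```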

bash

# All columns, no truncation
qsv frequency --no-other people.csv

Output:

text

field,value,count,percentage
name,Alice,1,20.0000
name,Bob,1,20.0000
name,Carol,1,20.0000
name,Dave,1,20.0000
name,Eve,1,20.0000
age,25,1,20.0000
age,28,1,20.0000
age,30,1,20.0000
age,32,1,20.0000
age,35,1,20.0000
city,Chicago,2,40.0000
city,New York,2,40.0000
city,Boston,1,20.0000
salary,62000,1,20.0000
salary,71000,1,20.0000
salary,75000,1,20.0000
salary,88000,1,20.0000
salary,95000,1,20.0000

bash

# JSON output
qsv frequency -s city --json people.csv

Output:

text

[{"field":"city","data":[{"value":"Chicago","count":2,"percentage":40.0},{"value":"New York","count":2,"percentage":40.0},{"value":"Boston","count":1,"percentage":20.0}]}]

Transforming columns

Commands for reshaping, renaming, filling, and computing new columns without leaving the command line.

`rename` — Rename column headers

Renames columns by supplying a comma-separated list of new names in positional order. Use --pairwise to rename only specific columns by specifying old,new pairs, leaving the rest untouched.

bash

# Rename all columns by position
qsv rename full_name,years_old,location,annual_pay people.csv

Output:

text

full_name,years_old,location,annual_pay
Alice,30,New York,75000
Bob,25,Chicago,62000
Carol,35,New York,88000
Dave,28,Chicago,71000
Eve,32,Boston,95000

bash

# Pairwise rename (only rename specific columns)
qsv rename --pairwise age,years,salary,income people.csv

Output:

text

name,years,city,income
Alice,30,New York,75000
Bob,25,Chicago,62000
Carol,35,New York,88000
Dave,28,Chicago,71000
Eve,32,Boston,95000

`fill` — Forward-fill empty values

Propagates the last non-empty value in a column downward to fill blanks — useful for sparse exports where a value is only written on the first row of a group. Use --default to fill with a fixed string instead.

bash

# Create a CSV with gaps
printf 'name,city,salary\nAlice,New York,75000\nBob,,62000\nCarol,,88000\nDave,Chicago,\nEve,Boston,95000\n' > gaps.csv

# Forward-fill city
qsv fill city gaps.csv

Output:

text

name,city,salary
Alice,New York,75000
Bob,New York,62000
Carol,New York,88000
Dave,Chicago,
Eve,Boston,95000

bash

# Fill with a fixed default
qsv fill --default "N/A" salary gaps.csv

Output:

text

name,city,salary
Alice,New York,75000
Bob,,62000
Carol,,88000
Dave,Chicago,N/A
Eve,Boston,95000
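
Forward-fill is a one-variable algorithm: remember the last non-empty value and substitute it for blanks. The awk sketch below applies it to the city column of gaps.csv as an illustration of the technique; it is not qsv's implementation and, because it splits on bare commas, it assumes no field contains a quoted comma.

```bash
printf 'name,city,salary\nAlice,New York,75000\nBob,,62000\nCarol,,88000\nDave,Chicago,\nEve,Boston,95000\n' > gaps.csv

# Column 2 is city. Track the most recent non-empty value and reuse it for blanks.
filled=$(awk 'BEGIN { FS = OFS = "," }
  NR == 1  { print; next }      # header passes through unchanged
  $2 != "" { last = $2 }        # remember the latest real value
  $2 == "" { $2 = last }        # fill the gap from above
  { print }' gaps.csv)

printf '%s\n' "$filled"
```

The result matches the forward-filled listing above: Bob and Carol inherit New York, and the empty salary on the Dave row is left alone because only the city column is filled.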

`reverse` — Reverse row order

Outputs all rows in reverse order without sorting. Use this when the last record is the most recent and you want newest-first output without a sort key.

bash

qsv reverse people.csv

Output:

text

name,age,city,salary
Eve,32,Boston,95000
Dave,28,Chicago,71000
Carol,35,New York,88000
Bob,25,Chicago,62000
Alice,30,New York,75000

`transpose` — Swap rows and columns

Rotates the CSV so rows become columns and columns become rows. Useful for turning a wide stat table into a narrow key-value layout, or for feeding column-oriented data into row-oriented tools.

bash

qsv transpose people.csv

Output:

text

name,Alice,Bob,Carol,Dave,Eve
age,30,25,35,28,32
city,New York,Chicago,New York,Chicago,Boston
salary,75000,62000,88000,71000,95000

`enum` — Add a row number column

Appends a _enum column containing the 0-based row index (or a custom name and start value). Use it to add a stable surrogate key or to restore original ordering after a shuffle.

bash

qsv enum people.csv

Output:

text

name,age,city,salary,_enum
Alice,30,New York,75000,0
Bob,25,Chicago,62000,1
Carol,35,New York,88000,2
Dave,28,Chicago,71000,3
Eve,32,Boston,95000,4

bash

# Custom column name, 1-based
qsv enum --new-column row_id --start-index 1 people.csv

Output:

text

name,age,city,salary,row_id
Alice,30,New York,75000,1
Bob,25,Chicago,62000,2
Carol,35,New York,88000,3
Dave,28,Chicago,71000,4
Eve,32,Boston,95000,5

`pseudo` — Pseudonymize a column

Replaces the values in a column with consistent, opaque identifiers so the same input always maps to the same output within a file. Use it to anonymize PII before sharing data while preserving join-ability.

bash

# Replace names with consistent opaque IDs
qsv pseudo name people.csv

Output:

text

name,age,city,salary
b3a4f2...,30,New York,75000
9c7d1e...,25,Chicago,62000
2f8a03...,35,New York,88000
7e1c94...,28,Chicago,71000
4b5d82...,32,Boston,95000

`safenames` — Sanitize column names for SQL/Python

Rewrites column headers so they are valid identifiers for SQL, pandas, or R by replacing spaces and special characters with underscores. Use --mode check first to count unsafe headers without modifying the file.

bash

# Create a CSV with messy headers
printf 'Full Name,Age (Years),City/Region,Annual Salary $\nAlice,30,NYC,75000\n' > messy.csv
qsv safenames messy.csv

Output:

text

Full_Name,Age__Years_,City_Region,Annual_Salary__
Alice,30,NYC,75000

bash

# Verify names are safe (check mode)
qsv safenames --mode check messy.csv

Output:

text

4 unsafe header(s) found.

Format conversion

Commands for converting between CSV, TSV, JSONL, Excel, and other tabular formats.

`fmt` — Change delimiter or quoting

Rewrites a CSV with a different delimiter, quote character, or quoting style without altering the data, and prints the result to stdout (the input file is not modified). Use it to convert CSV to TSV before piping into tools that expect tab-delimited input.

bash

# CSV to TSV
qsv fmt -t T people.csv

Output:

text

name	age	city	salary
Alice	30	New York	75000
Bob	25	Chicago	62000
Carol	35	New York	88000
Dave	28	Chicago	71000
Eve	32	Boston	95000

bash

# Pipe-delimited
qsv fmt -t '|' people.csv

Output:

text

name|age|city|salary
Alice|30|New York|75000
Bob|25|Chicago|62000
Carol|35|New York|88000
Dave|28|Chicago|71000
Eve|32|Boston|95000

bash

# Quote every field
qsv fmt --quote-always people.csv

Output:

text

"name","age","city","salary"
"Alice","30","New York","75000"
"Bob","25","Chicago","62000"
"Carol","35","New York","88000"
"Dave","28","Chicago","71000"
"Eve","32","Boston","95000"

`tojsonl` — Convert CSV to JSONL

Converts each CSV row to a JSON object on its own line (JSON Lines format), with automatic type inference so numeric and boolean columns are emitted without quotes. Use it to feed CSV data into JSON-native tools or APIs.

bash

qsv tojsonl people.csv

Output:

text

{"name":"Alice","age":30,"city":"New York","salary":75000}
{"name":"Bob","age":25,"city":"Chicago","salary":62000}
{"name":"Carol","age":35,"city":"New York","salary":88000}
{"name":"Dave","age":28,"city":"Chicago","salary":71000}
{"name":"Eve","age":32,"city":"Boston","salary":95000}

Type inference is automatic: age and salary are emitted as integers (not quoted), boolean columns become true/false, and nulls become JSON null.

`excel` — Extract Excel sheet to CSV

Reads .xlsx or .xls files and converts a sheet to CSV, handling merged cells, date formatting, and formula results. Use --metadata j to list all sheets before deciding which to extract.

bash

# First sheet
qsv excel data.xlsx -o output.csv

# Specific sheet by name
qsv excel data.xlsx --sheet "Sales" -o sales.csv

# List all sheets as JSON
qsv excel data.xlsx --metadata j

Output:

text

{"filename":"data.xlsx","format":"Xlsx","num_sheets":3,"sheets":[{"index":0,"name":"Sheet1","typ":"WorkSheet","visible":"Visible","headers":["name","age","city","salary"],"num_columns":4,"num_rows":6},...]}

bash

# Extract a specific cell range
qsv excel data.xlsx --range "A1:C4" -o range.csv

Output: (none — exits 0 on success)

Combining files

Commands for stacking, joining, splitting, and partitioning CSV files.

`cat` — Concatenate CSVs

Stacks multiple CSV files vertically (rows) or side-by-side (columns). Use rowskey when the files have different or overlapping schemas — it aligns by column name and fills missing fields with empty strings.

bash

# Stack vertically (same schema required); people2.csv is assumed to hold two more rows (Frank, Grace) under the same header
qsv cat rows people.csv people2.csv

Output:

text

name,age,city,salary
Alice,30,New York,75000
Bob,25,Chicago,62000
Carol,35,New York,88000
Dave,28,Chicago,71000
Eve,32,Boston,95000
Frank,40,Seattle,105000
Grace,29,Austin,67000

bash

# Stack with differing schemas (fills missing fields with empty)
qsv cat rowskey --group fname people.csv dept.csv

Output:

text

file,name,age,city,salary,department
people.csv,Alice,30,New York,75000,
people.csv,Bob,25,Chicago,62000,
dept.csv,Alice,,,,Engineering
dept.csv,Bob,,,,Marketing
...

bash

# Concatenate side by side (columns)
qsv cat columns people.csv dept.csv

Output:

text

name,age,city,salary,name,department
Alice,30,New York,75000,Alice,Engineering
Bob,25,Chicago,62000,Bob,Marketing
Carol,35,New York,88000,Carol,Engineering
Dave,28,Chicago,71000,Dave,Sales
Eve,32,Boston,95000,Eve,Engineering

`join` — Join two CSVs

Performs an inner, outer, semi, anti, or cross join between two CSV files on one or more key columns. Differs from sqlp joins in that it does not require SQL syntax and is optimized for streaming large files.

bash

# Inner join on name
qsv join name people.csv name dept.csv

Output:

text

name,age,city,salary,name,department
Alice,30,New York,75000,Alice,Engineering
Bob,25,Chicago,62000,Bob,Marketing
Carol,35,New York,88000,Carol,Engineering
Dave,28,Chicago,71000,Dave,Sales
Eve,32,Boston,95000,Eve,Engineering

bash

# Left anti-join (people NOT in dept.csv)
qsv join --left-anti name people.csv name dept.csv

Output (empty if all names matched):

text

name,age,city,salary

bash

# Cross join (cartesian product)
qsv join --cross name people.csv name dept.csv | qsv count

Output:

text

25

Join type	Flag
Inner (default)	(none)
Left outer	`--left`
Right outer	`--right`
Full outer	`--full`
Left anti	`--left-anti`
Left semi	`--left-semi`
Right anti	`--right-anti`
Cross (cartesian)	`--cross`
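
Conceptually, an inner join on a key is a lookup: index one file by the key, then stream the other and emit a combined row whenever the key is found. The awk sketch below shows that idea on the two sample files. It is a simplified illustration, not qsv's join algorithm, and it assumes unique keys in dept.csv and no quoted commas.

```bash
cat > people.csv << 'EOF'
name,age,city,salary
Alice,30,New York,75000
Bob,25,Chicago,62000
Carol,35,New York,88000
Dave,28,Chicago,71000
Eve,32,Boston,95000
EOF

cat > dept.csv << 'EOF'
name,department
Alice,Engineering
Bob,Marketing
Carol,Engineering
Dave,Sales
Eve,Engineering
EOF

# Pass 1 (dept.csv): index every row by its name column.
# Pass 2 (people.csv): emit the row plus its match when the name is in the index.
joined=$(awk -F, 'NR == FNR { dept[$1] = $0; next }
  $1 in dept { print $0 "," dept[$1] }' dept.csv people.csv)

printf '%s\n' "$joined"
```

Because both header rows share the key value name, the headers join too, giving the same name,age,city,salary,name,department header as the inner-join listing above.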

`split` — Split into multiple files

Writes sequential chunks of a CSV to separate files in an output directory, either by fixed row count (-s) or by total number of chunks (-c). Use --pad and --filename to control zero-padding and naming.

bash

mkdir -p /tmp/split_out
# 2 rows per chunk
qsv split -s 2 /tmp/split_out people.csv
ls /tmp/split_out

Output:

text

0.csv  2.csv  4.csv

bash

# 3 chunks with padded, custom filenames
qsv split -c 3 --pad 3 --filename "chunk_{}.csv" /tmp/split_out people.csv

Output: (none — exits 0 on success)

`partition` — Partition by column value

Creates one output file per distinct value in a key column, named after that value. Differs from split in that grouping is by content rather than row count — ideal for producing per-department or per-region files.

bash

mkdir -p /tmp/by_city
qsv partition city /tmp/by_city people.csv
ls /tmp/by_city

Output:

text

Boston.csv  Chicago.csv  New York.csv

Chicago.csv:

text

name,age,city,salary
Bob,25,Chicago,62000
Dave,28,Chicago,71000

bash

# Drop the partition column from output files
qsv partition --drop city /tmp/by_city people.csv

Output: (none — exits 0 on success)
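
Partitioning by content amounts to routing each row to a file chosen by its key, writing the header the first time a file is opened. The awk sketch below does this for the city column. The directory name by_city_sketch is a hypothetical choice, and the values are used as file names without any sanitizing, which a real tool would need to handle.

```bash
cat > people.csv << 'EOF'
name,age,city,salary
Alice,30,New York,75000
Bob,25,Chicago,62000
Carol,35,New York,88000
Dave,28,Chicago,71000
Eve,32,Boston,95000
EOF

outdir=by_city_sketch   # hypothetical output directory for this sketch
mkdir -p "$outdir"

# Route each data row to <outdir>/<city>.csv, emitting the header once per file.
awk -F, -v dir="$outdir" 'NR == 1 { header = $0; next }
  { file = dir "/" $3 ".csv"
    if (!(file in started)) { print header > file; started[file] = 1 }
    print > file }' people.csv

cat "$outdir/Chicago.csv"
```

The Chicago file ends up with the header plus the Bob and Dave rows, the same content as the Chicago.csv listing above.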

Scripting and queries

Commands for running SQL, embedded Lua scripts, and built-in string operations directly against CSV files.

`sqlp` — SQL queries via Polars

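Runs SQL queries against one or more CSV files using the Polars SQL engine. Each file is registered as a table named after the filename without its extension, so people.csv is queried as people.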

bash

# WHERE filter and ORDER BY
qsv sqlp people.csv "SELECT name, salary FROM people WHERE salary > 70000 ORDER BY salary DESC"

Output:

text

name,salary
Eve,95000
Carol,88000
Alice,75000
Dave,71000

bash

# GROUP BY aggregation
qsv sqlp people.csv "SELECT city, COUNT(*) as n, AVG(salary) as avg_salary FROM people GROUP BY city ORDER BY avg_salary DESC"

Output:

text

city,n,avg_salary
Boston,1,95000.0
New York,2,81500.0
Chicago,2,66500.0

bash

# Join two files in SQL
qsv sqlp people.csv dept.csv \
  "SELECT p.name, p.salary, d.department
   FROM people p JOIN dept d ON p.name = d.name
   WHERE d.department = 'Engineering'
   ORDER BY p.salary DESC"

Output:

text

name,salary,department
Eve,95000,Engineering
Carol,88000,Engineering
Alice,75000,Engineering

bash

# Window function: salary rank
qsv sqlp people.csv \
  "SELECT name, salary, RANK() OVER (ORDER BY salary DESC) as rank FROM people"

Output:

text

name,salary,rank
Eve,95000,1
Carol,88000,2
Alice,75000,3
Dave,71000,4
Bob,62000,5

bash

# Output as JSON
qsv sqlp --format json people.csv "SELECT * FROM people WHERE city = 'Chicago'"

Output:

text

[{"name":"Bob","age":25,"city":"Chicago","salary":62000},{"name":"Dave","age":28,"city":"Chicago","salary":71000}]

Use --streaming for files larger than RAM. Add --try-parsedates to auto-parse date columns.

`luau` — Scripted transforms with embedded Lua

Runs a Luau (sandboxed Lua 5.1) expression per row to map a new column or filter rows, with optional --begin/--end blocks for initialization and aggregation. Reach for this when apply operations are too limited but a full sqlp query is overkill.

bash

# Add computed column (salary in thousands)
qsv luau map salary_k \
  "string.format('%.1f', tonumber(col.salary) / 1000)" \
  people.csv

Output:

text

name,age,city,salary,salary_k
Alice,30,New York,75000,75.0
Bob,25,Chicago,62000,62.0
Carol,35,New York,88000,88.0
Dave,28,Chicago,71000,71.0
Eve,32,Boston,95000,95.0

bash

# Filter rows with a script
qsv luau filter "tonumber(col.salary) > 75000" people.csv

Output:

text

name,age,city,salary
Carol,35,New York,88000
Eve,32,Boston,95000

bash

# Add a seniority label (conditional logic)
qsv luau map seniority \
  "if tonumber(col.age) >= 32 then return 'Senior' else return 'Junior' end" \
  people.csv

Output:

text

name,age,city,salary,seniority
Alice,30,New York,75000,Junior
Bob,25,Chicago,62000,Junior
Carol,35,New York,88000,Senior
Dave,28,Chicago,71000,Junior
Eve,32,Boston,95000,Senior

bash

# Aggregation using BEGIN/END blocks
qsv luau map dummy \
  --begin "total = 0" \
  "total = total + tonumber(col.salary); return ''" \
  --end "print('Total salary: ' .. total)" \
  people.csv > /dev/null

Output:

text

Total salary: 391000

Reference columns with col.column_name or col["col name"]. Use _IDX for the current row number. Scripts run with Luau 0.716 — a safe, sandboxed Lua 5.1 subset.

`apply` — Built-in string and numeric operations

Applies one or more named operations (case conversion, trimming, encoding, similarity, NLP sentiment, etc.) to a column without writing a script. Use dynfmt to produce a new column from a format string that interpolates other columns.

bash

# Uppercase a column
qsv apply operations upper name people.csv

Output:

text

name,age,city,salary
ALICE,30,New York,75000
BOB,25,Chicago,62000
CAROL,35,New York,88000
DAVE,28,Chicago,71000
EVE,32,Boston,95000

bash

# Compute string length into a new column
qsv apply operations len name -c name_len people.csv

Output:

text

name,age,city,salary,name_len
Alice,30,New York,75000,5
Bob,25,Chicago,62000,3
Carol,35,New York,88000,5
Dave,28,Chicago,71000,4
Eve,32,Boston,95000,3

bash

# Dynamic format string → computed description column
qsv apply dynfmt \
  --formatstr "{name} earns \${salary} in {city}" \
  --new-column description people.csv

Output:

text

name,age,city,salary,description
Alice,30,New York,75000,Alice earns $75000 in New York
Bob,25,Chicago,62000,Bob earns $62000 in Chicago
Carol,35,New York,88000,Carol earns $88000 in New York
Dave,28,Chicago,71000,Dave earns $71000 in Chicago
Eve,32,Boston,95000,Eve earns $95000 in Boston

Available apply operations:

Category	Operations
Case	`lower`, `upper`, `titlecase`
Whitespace	`trim`, `ltrim`, `rtrim`, `squeeze`
String	`len`, `strip_prefix`, `strip_suffix`, `escape`, `replace`, `regex_replace`
Encoding	`encode64`, `decode64`, `encode62`, `decode62`, `crc32`
Math	`round`, `thousands`
Financial	`currencytonum`, `numtocurrency`
Similarity	`simdl`, `simjw`, `simsd`, `simhm`
NLP	`sentiment`, `whatlang`, `gender_guess`, `eudex`

Schema and validation

Commands for inferring structure from a CSV and checking that data conforms to expected types and constraints.

`schema` — Infer JSON Schema from CSV

Scans a CSV and generates a JSON Schema (Draft 2020-12) file capturing field types, enum values, and numeric ranges. The output schema can be fed directly to validate to enforce those constraints on new data.

bash

qsv schema people.csv

Output: (none — exits 0 on success)

Generates people.csv.schema.json:

text

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "people.csv",
  "type": "object",
  "properties": {
    "name": {
      "type": "string",
      "enum": ["Alice", "Bob", "Carol", "Dave", "Eve"]
    },
    "age": { "type": "integer", "minimum": 25, "maximum": 35 },
    "city": {
      "type": "string",
      "enum": ["Boston", "Chicago", "New York"]
    },
    "salary": { "type": "integer", "minimum": 62000, "maximum": 95000 }
  },
  "required": ["name", "age", "city", "salary"]
}

bash

# Polars schema (for use with sqlp/Parquet pipelines)
qsv schema --polars people.csv

Output:

text

{"name":"Utf8","age":"Int64","city":"Utf8","salary":"Int64"}

`validate` — Validate CSV against JSON Schema

Without a schema argument, checks that the CSV is well-formed per RFC 4180. With a schema, validates each row against it and writes passing rows to .valid, failing rows to .invalid, and a validation-errors.tsv describing each violation.

bash

# RFC 4180 well-formedness check
qsv validate people.csv

Output:

text

people.csv is valid.

bash

# Schema validation (generates .valid, .invalid, and validation-errors.tsv)
qsv schema people.csv
printf 'name,age,city,salary\nBadRow,notanumber,Unknown,0\n' > bad.csv
qsv validate bad.csv people.csv.schema.json

Output: (none — exits 0 on success)

validation-errors.tsv:

text

row_number	field	error
2	age	notanumber is not of type "integer"
2	city	Unknown is not one of ["Boston","Chicago","New York"]
2	salary	0 is less than the minimum value of 62000
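
The three violations above correspond to three kinds of constraint in the inferred schema: a type, an enum, and a numeric minimum. The awk sketch below hand-codes those same checks to make the logic concrete. The error wording is invented for this example, the file gets one extra valid row to show a pass, and row numbers here are physical line numbers with the header as line 1; none of this reproduces qsv's validator.

```bash
printf 'name,age,city,salary\nBadRow,notanumber,Unknown,0\nAlice,30,Boston,75000\n' > bad.csv

# Constraints copied from the schema listing: age is an integer,
# city is one of three values, salary is at least 62000.
errors=$(awk -F, 'BEGIN { OFS = "\t"; ok["Boston"]; ok["Chicago"]; ok["New York"] }
  NR == 1 { next }
  $2 !~ /^-?[0-9]+$/ { print NR, "age", $2 " is not an integer" }
  !($3 in ok)        { print NR, "city", $3 " is not an allowed value" }
  $4 + 0 < 62000     { print NR, "salary", $4 " is below the minimum 62000" }' bad.csv)

printf '%s\n' "$errors"
```

Only the BadRow line produces errors, one per failed constraint; the Alice row passes all three checks.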

Sampling

Commands for drawing representative subsets from large files without loading everything into memory.

`sample` — Random sampling

Draws rows using reservoir sampling by default, guaranteeing a uniform random sample in a single pass without knowing the file size upfront. Supports stratified, Bernoulli, systematic, cluster, weighted, and time-series sampling modes; use --seed for reproducibility.

bash

# Reservoir sample (3 random rows)
qsv sample 3 people.csv

Output:

text

name,age,city,salary
Bob,25,Chicago,62000
Alice,30,New York,75000
Eve,32,Boston,95000

bash

# Reproducible sample with seed
qsv sample --seed 42 3 people.csv

Output:

text

name,age,city,salary
Alice,30,New York,75000
Carol,35,New York,88000
Eve,32,Boston,95000

bash

# 50% Bernoulli sample (each row independently included with probability 0.5)
qsv sample --bernoulli --seed 42 0.5 people.csv

Output:

text

name,age,city,salary
Alice,30,New York,75000
Carol,35,New York,88000
Eve,32,Boston,95000

bash

# Stratified: 1 row per unique city
qsv sample --stratified city --seed 42 1 people.csv

Output:

text

name,age,city,salary
Alice,30,New York,75000
Bob,25,Chicago,62000
Eve,32,Boston,95000

Method	Flag	Use case
Reservoir (default)	—	General random sample
Indexed	— (with `.idx`)	Random I/O, large files
Bernoulli	`--bernoulli`	Independent row probability
Systematic	`--systematic <col>`	Every nth record
Stratified	`--stratified <col>`	Representative subgroup samples
Weighted	`--weighted <col>`	Probability proportional to weight
Cluster	`--cluster <col>`	Sample entire clusters
Timeseries	`--timeseries <col>`	One record per time interval
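
Reservoir sampling is what makes a single-pass uniform sample possible without knowing the row count in advance. The awk sketch below implements the classic version (often called Algorithm R) for k rows. It illustrates the technique only: whether qsv uses this exact variant is an assumption, and awk's random generator means the chosen rows will differ from qsv's for the same seed.

```bash
cat > people.csv << 'EOF'
name,age,city,salary
Alice,30,New York,75000
Bob,25,Chicago,62000
Carol,35,New York,88000
Dave,28,Chicago,71000
Eve,32,Boston,95000
EOF

# k = sample size, seed = fixed seed so reruns with the same awk give the same rows.
sample=$(awk -v k=3 -v seed=42 'BEGIN { srand(seed) }
  NR == 1 { print; next }                  # pass the header through
  { n++
    if (n <= k) { keep[n] = $0 }           # fill the reservoir with the first k rows
    else { j = int(rand() * n) + 1         # pick a slot uniformly in 1..n
           if (j <= k) keep[j] = $0 } }    # so row n replaces a slot with probability k/n
  END { for (i = 1; i <= k && i <= n; i++) print keep[i] }' people.csv)

printf '%s\n' "$sample"
```

Each row ends up in the sample with equal probability k/n, and memory use stays at k rows regardless of input size.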

Flattening and display

Commands for rendering CSV records in a human-readable layout rather than a dense columnar format.

`flatten` — View records one at a time

Prints each record as a vertical key-value block separated by #, making wide or deeply nested CSVs readable in a terminal. Use -c to truncate long values to a fixed character limit for a quick overview.

bash

qsv flatten people.csv

Output:

text

name    Alice
age     30
city    New York
salary  75000
#
name    Bob
age     25
city    Chicago
salary  62000
#
...

bash

# Condense long values for a quick overview
qsv flatten -c 8 people.csv

Output:

text

name    Alice
age     30
city    New York
salary  75000
#
...

Indexing

An index file dramatically speeds up commands that support random access (slice, split, sample, count, dedup).

bash

qsv index people.csv
# Creates people.csv.idx alongside the source file

Output: (none — exits 0 on success)

After indexing, qsv count reads the row count from the index and qsv slice seeks straight to the requested row, so neither has to scan the file.

bash

# Force rebuild
qsv index --force people.csv

Output: (none — exits 0 on success)
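
The idea behind an index is a table of byte offsets, one per record, so a reader can seek instead of scan. The sketch below builds such a table with awk and uses it to jump to one row. The file offsets.idx and its one-offset-per-line layout are invented for illustration and are not qsv's .idx format; the sketch also assumes one record per physical line.

```bash
cat > people.csv << 'EOF'
name,age,city,salary
Alice,30,New York,75000
Bob,25,Chicago,62000
Carol,35,New York,88000
Dave,28,Chicago,71000
Eve,32,Boston,95000
EOF

# Build the toy index: the 0-based byte offset at which each line starts.
LC_ALL=C awk 'BEGIN { off = 0 } { print off; off += length($0) + 1 }' people.csv > offsets.idx

# Data row 3 (0-based) is physical line 5 because the header is line 1.
offset=$(sed -n '5p' offsets.idx)

# Seek to that byte and read one line (tail -c +N is 1-based, hence the + 1).
row=$(tail -c +"$((offset + 1))" people.csv | head -n 1)

echo "offset $offset: $row"
```

For the sample file the lookup lands on byte offset 90 and returns the Dave row. Counting rows from such an index is just counting its entries, which is why an indexed count does not need to read the data file.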

Configuration

qsv reads runtime defaults from QSV_* environment variables and from a dotenv file. Use this to set delimiters, buffer sizes, parallelism, and remote-fetch behaviour project-wide without repeating flags on every invocation.

Environment variables

Many common options have a matching QSV_* variable; the variable sets the default for every invocation. Run qsv --envlist to dump the active set.

bash

# Show every QSV_* variable currently in effect
qsv --envlist

Output:

text

QSV_DEFAULT_DELIMITER: ,
QSV_NO_HEADERS: false
QSV_COMMENT_CHAR:
QSV_MAX_JOBS: 8
QSV_CACHE_DIR: /home/alice/.qsv-cache
...

bash

# Project-wide TSV default + parallel job cap
export QSV_DEFAULT_DELIMITER=$'\t'
export QSV_MAX_JOBS=4
qsv stats data.tsv

Output: (the usual stats table for data.tsv; omitted here because data.tsv is not one of the sample files)

Variable	Purpose
`QSV_DEFAULT_DELIMITER`	One ASCII char used as the default field delimiter.
`QSV_SNIFF_DELIMITER`	If set, auto-detect delimiter per file.
`QSV_NO_HEADERS`	Treat first row as data, not a header.
`QSV_MAX_JOBS`	Cap parallel workers (default = logical CPUs).
`QSV_CACHE_DIR`	Where `stats`/`fetch` cache files are written.
`QSV_DOTENV_PATH`	Explicit dotenv file path; `""` disables loading.
`QSV_LOG_LEVEL`	`error`/`warn`/`info`/`debug`/`trace`.
`QSV_LOG_DIR`	Directory for structured log output.
`QSV_PROGRESSBAR`	`1` to show a TTY progress bar on long runs.
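When a run behaves unexpectedly, it can help to see which QSV_* variables the shell is actually exporting, and to remember that a `VAR=value command` prefix applies to that one command only. The snippet below is a plain-shell sketch that complements qsv --envlist rather than replacing it; the helper name `show_qsv_env` is made up for illustration.

```bash
# Hypothetical helper: list QSV_* variables exported by the current shell.
show_qsv_env() {
  env | grep '^QSV_' | sort || true
}

export QSV_MAX_JOBS=4
show_qsv_env                      # includes QSV_MAX_JOBS=4

# A prefix assignment applies to a single command and does not persist
QSV_MAX_JOBS=2 sh -c 'echo "child sees QSV_MAX_JOBS=$QSV_MAX_JOBS"'
echo "shell still has QSV_MAX_JOBS=$QSV_MAX_JOBS"
```

The prefix form is handy for one-off overrides in a project that otherwise relies on exported defaults.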

Dotenv file

On startup, qsv loads a .env file from the current directory (or the path in QSV_DOTENV_PATH) and applies any QSV_*=value lines as if they were exported. Useful for pinning per-project defaults next to a dataset.

bash

cat > .env << 'EOF'
QSV_DEFAULT_DELIMITER=|
QSV_MAX_JOBS=4
QSV_LOG_LEVEL=info
EOF

qsv count people.csv   # picks up the .env automatically

Output:

text

5

bash

# Point at a shared dotenv outside the cwd
QSV_DOTENV_PATH=/home/alice/projects/etl/.env qsv stats people.csv

# Disable dotenv loading for one invocation
QSV_DOTENV_PATH= qsv stats people.csv

Output: (the stats table for people.csv from each command; omitted here)
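A misspelled key in a dotenv file is easy to miss, so a quick lint before a long run can help. The sketch below is plain shell, not a qsv feature. `lint_qsv_dotenv` is a hypothetical name, and it assumes the simple KEY=value line format shown above, with `#` comment lines allowed.

```bash
# Hypothetical helper: print every non-blank, non-comment line of a dotenv
# file that is not a QSV_*=value assignment. Exit 0 only if none are found.
lint_qsv_dotenv() {
  ! grep -vE '^[[:space:]]*(#.*)?$' "$1" | grep -vE '^QSV_[A-Z0-9_]+='
}

tmp=$(mktemp)
printf 'QSV_MAX_JOBS=4\nQVS_LOG_LEVEL=info\n' > "$tmp"   # second key is misspelled
lint_qsv_dotenv "$tmp" || echo "fix the lines printed above"
rm -f "$tmp"
```

The check is deliberately strict: any line that is not a QSV_* assignment is reported, including variables meant for other tools.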

MCP server (qsv 13+)

qsv 13 added a built-in Model Context Protocol server that lets AI agents (Claude Desktop, Claude Code, and other MCP clients) query and transform local CSV/Parquet/Excel files without uploading raw data — only statistical metadata and result rows cross the wire. Reach for it when you want a chatbot to drive qsv against your own files.

bash

# Start the MCP server on stdio (default transport)
qsvmcp serve

# Or use the full binary
qsv mcp serve

Output: (none; the server stays in the foreground and speaks MCP over stdin/stdout until the client disconnects)

bash

# List the MCP-exposed tools and exit (handy for debugging)
qsvmcp list-skills

# Regenerate the bundled skill definitions
qsvmcp --update-mcp-skills

Output: (not reproduced here; the listing depends on the installed version)

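Example client configuration (the `mcpServers` entry used by Claude Desktop and similar MCP clients):

json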

{
  "mcpServers": {
    "qsv": {
      "command": "qsvmcp",
      "args": ["serve"],
      "env": { "QSV_CACHE_DIR": "/home/alice/.qsv-cache" }
    }
  }
}

The qsvmcp binary ships ~63 of qsv's commands — enough for the MCP skill set with a smaller footprint. Use the full qsv binary if you need commands outside the MCP surface (e.g. geocode, python).

`tojsonl --toon` — Token-efficient output for LLMs

qsv 12 introduced TOON, a token-optimized tabular format designed for LLM contexts — denser than JSON, still parseable. Useful when piping CSV summaries into a prompt.

bash

qsv tojsonl --toon people.csv

Output:

text

[name|age|city|salary]
Alice|30|New York|75000
Bob|25|Chicago|62000
...

Piping commands together

qsv is designed to be composed — pipe subcommands to build multi-step pipelines:

bash

# Filter to Engineering dept, sort by salary desc, pick 3 columns
qsv join name people.csv name dept.csv \
  | qsv search -s department "Engineering" \
  | qsv select name,salary,department \
  | qsv sort -s salary -N -R

Output:

text

name,salary,department
Eve,95000,Engineering
Carol,88000,Engineering
Alice,75000,Engineering

bash

# Top city by total salary
qsv sqlp people.csv \
  "SELECT city, SUM(salary) as total FROM people GROUP BY city ORDER BY total DESC LIMIT 1"

Output:

text

city,total
New York,163000

bash

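# Count rows matching a pattern across a directory of CSVs.
# qsv cat rowskey merges files by column name; a plain shell `cat` would
# repeat each file's header row and fail on files with different column counts.
qsv cat rowskey *.csv | qsv search "New York" | qsv count

Output (with only people.csv and dept.csv in the directory):

text

2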

Use qsv input to normalize messy CSVs (trim whitespace, fix quoting, skip comment lines) before piping to other commands. Use qsv fixlengths to pad rows with missing fields so downstream commands don't choke on ragged files.
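In scripts, a multi-stage pipeline reports only the exit status of its last command by default, so a failure in an early stage can go unnoticed. Here is a minimal sketch of guarding against that with bash's `pipefail` option, applied to the join/search/select/sort pipeline from this section. It assumes bash; the qsv part runs only if qsv and the two sample files are present.

```bash
#!/usr/bin/env bash
set -o pipefail   # a pipeline now fails if any stage fails, not just the last

# Without pipefail this would report success because `cat` exits 0
if false | cat; then
  echo "pipeline ok"
else
  echo "pipeline failed"        # this branch runs
fi

# Same guard applied to the qsv pipeline from this section
if command -v qsv >/dev/null 2>&1 && [ -f people.csv ] && [ -f dept.csv ]; then
  qsv join name people.csv name dept.csv \
    | qsv search -s department "Engineering" \
    | qsv select name,salary,department \
    | qsv sort -s salary -N -R \
    || echo "qsv pipeline failed" >&2
fi
```

Adding `set -e` as well would stop the script at the first failing pipeline instead of only reporting it.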

Sources

qsv releases (dathere/qsv) — 13.0.0 "AI-native" launch, 12.0.0 moarstats + TOON, 11.0.2 streaming stats/frequency.
qsv CHANGELOG.md — full per-release diff including subcommand additions and --weight consolidation in frequency.
ENVIRONMENT_VARIABLES.md — authoritative list of QSV_* variables and dotenv loading rules.
qsv MCP server skill README — qsvmcp binary, skill counts, Claude Desktop wiring.
qsv.dathere.com — stats docs — current reference for stats, moarstats, and cached .stats.csv.bin.sz.
