Comprehensive reference for qsv: count, headers, stats, moarstats, select, search, sort, dedup, frequency, join, sqlp, luau, apply, schema, validate, sample, split, MCP server, and more — with examples and outputs.
qsv — CSV Toolkit
What it is
qsv is a blazing-fast, Rust-based CSV toolkit with 80+ subcommands for querying, transforming, analyzing, and validating tabular data — a maintained, feature-rich fork of the original xsv project. It adds Polars-backed acceleration, an embedded Luau scripting engine, and support for CSV, TSV, Excel, JSON, Parquet, and Apache Arrow formats. Reach for qsv when you need to slice, filter, join, or summarize structured tabular data from the command line without loading it into a full database or spreadsheet.
Install
# macOS
brew install qsv
# Windows
scoop install qsv
# Cargo
cargo install qsv --locked
# Or download binary from releases
curl -LO https://github.com/dathere/qsv/releases/latest/download/qsv-x86_64-unknown-linux-gnu.zip
Output: (none — exits 0 on success)
Variants: qsv (full), qsvlite (no Luau/Python), qsvmcp (Model Context Protocol), qsvpy (Python integration).
Sample data
All examples below use these two files:
cat > people.csv << 'EOF'
name,age,city,salary
Alice,30,New York,75000
Bob,25,Chicago,62000
Carol,35,New York,88000
Dave,28,Chicago,71000
Eve,32,Boston,95000
EOF
cat > dept.csv << 'EOF'
name,department
Alice,Engineering
Bob,Marketing
Carol,Engineering
Dave,Sales
Eve,Engineering
EOF
Output: (none — exits 0 on success)
Discovery
Commands for understanding an unfamiliar CSV before you commit to processing it.
sniff — Detect schema without reading the whole file
Samples the first few thousand bytes of a file to detect delimiter, quoting, field count, types, and record count without reading the whole file. Use it as a fast first look before running heavier commands.
qsv sniff people.csv
Output:
Sniff Results for people.csv
Last Modified : 2026-04-26 10:00:00 UTC
File Size : 123 bytes
Delimiter : ,
Has Header Row : true
Quote Char : "
Num Records : 5
Num Fields : 4
Fields :
0: name (String)
1: age (Integer)
2: city (String)
3: salary (Integer)
# Sniff a remote file
qsv sniff https://example.com/data.csv
Output: (the same report layout as above, describing the remote file)
count — Count rows
Returns the number of data rows (excluding the header). More reliable than wc -l because it handles quoted newlines correctly, and near-instant on indexed files.
qsv count people.csv
Output:
5
# Human-readable (useful for millions of rows)
qsv count --human-readable largefile.csv
Output:
1,482,309
# Include record width statistics
qsv count --width people.csv
Output:
5
32-27-28-22-5
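To see why a line count and a record count can disagree, here is a minimal Python sketch. It is an illustration using the standard csv module, not how qsv is implemented, and the two-record sample data is made up for the example.

```python
import csv
import io

# Hypothetical data: the first record has a newline inside a quoted field.
raw = 'name,note\nAlice,"line one\nline two"\nBob,ok\n'

# What a line counter such as wc -l sees: every newline character.
line_count = raw.count("\n")

# What a CSV-aware counter sees: parsed records, minus the header.
records = list(csv.reader(io.StringIO(raw)))
record_count = len(records) - 1

print(line_count, record_count)  # 4 2
```

The quoted newline inflates the line count, while the parsed record count stays at two.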
headers — List column names
Prints each column name with its 1-based index. Use --just-names when you need a plain list for scripting, or --intersect to find the common columns across two files before a join.
qsv headers people.csv
Output:
1 name
2 age
3 city
4 salary
# Just names (for scripting)
qsv headers --just-names people.csv
Output:
name
age
city
salary
# Find common columns across two files
qsv headers --intersect people.csv dept.csv
Output:
name
# Count only
qsv headers --just-count people.csv
Output:
4
Summary statistics
Commands for computing numeric and categorical summaries across columns without writing a full query.
stats — Per-column statistics
Computes sum, min, max, mean, stddev, null count, and type for every column in a single pass. Results are cached alongside the file, so repeated runs are instant; use --everything to add median, quartiles, and mode.
qsv stats people.csv
Output:
field,type,sum,min,max,range,sortorder,min_length,max_length,mean,stddev,variance,cv,nullcount,max_precision,sparsity
name,String,,Alice,Eve,,Ascending,3,5,,,,,0,,0
age,Integer,150,25,35,10,Unsorted,2,2,30,3.4059,11.6,0.1135,0,0,0
city,String,,Boston,New York,,Unsorted,6,8,,,,,0,,0
salary,Integer,391000,62000,95000,33000,Unsorted,5,5,78200,11855.8,140560000,0.1516,0,0,0
# Infer types only (fast — no numeric computation)
qsv stats --typesonly people.csv
Output:
field,type
name,String
age,Integer
city,String
salary,Integer
# Full statistics including mode, median, quartiles
qsv stats --everything people.csv
Output:
field,type,...,mode,median,mad,q1,q2_median,q3,...
name,String,...,Alice|Bob|Carol|Dave|Eve,,,,,...
age,Integer,...,25|28|30|32|35,30,2,27,30,33,...
salary,Integer,...,62000|71000|75000|88000|95000,75000,13000,66500,75000,91500,...
# Stats for specific columns only
qsv stats -s salary,age people.csv
Output: (the same columns as the default run, limited to the salary and age rows)
qsv stats caches results in a .stats.csv.bin.sz file alongside the input. Subsequent calls are instant. Use --force to recompute.
moarstats — Extended statistics (qsv 12+)
Augments a stats output file with up to 55 additional advanced measures — extended outlier, robust, and bivariate statistics (covariance, correlation, kurtosis, MAD, IQR, Pearson/Spearman, etc.). Run stats first, then moarstats on the resulting .stats.csv to enrich it without re-scanning the original data.
# Produce stats.csv, then enrich it with advanced measures
qsv stats people.csv -o people.stats.csv
qsv moarstats people.stats.csv
Output:
field,type,...,kurtosis,iqr,skewness,covariance,pearson_r,spearman_r,...
age,Integer,...,-1.30,6,0.21,...
salary,Integer,...,-1.20,25000,0.34,...
# Restrict to a subset of advanced measures
qsv moarstats --select kurtosis,iqr,skewness people.stats.csv
Output: (none — exits 0 on success)
moarstats was introduced in qsv 12.0.0 and refined in 13.0.0. It also powers the per-column "FAIR metadata" inference used by the MCP server and TOON output.
Selecting columns
Commands for narrowing or reordering the columns in a file before downstream processing.
select — Pick, reorder, or drop columns
Outputs a subset (or reordering) of columns by name, index, range, or regex. Prefix a selector with ! to exclude it; the order of selectors controls the output order.
# Pick two columns
qsv select name,salary people.csv
Output:
name,salary
Alice,75000
Bob,62000
Carol,88000
Dave,71000
Eve,95000
# Drop a column (! prefix = all except)
qsv select '!age' people.csv
Output:
name,city,salary
Alice,New York,75000
Bob,Chicago,62000
Carol,New York,88000
Dave,Chicago,71000
Eve,Boston,95000
# Select by column range
qsv select 1-3 people.csv
Output:
name,age,city
Alice,30,New York
Bob,25,Chicago
Carol,35,New York
Dave,28,Chicago
Eve,32,Boston
# Select by regex (columns starting with 'a' or 'c')
qsv select '/^[ac]/' people.csv
Output:
age,city
30,New York
25,Chicago
35,New York
28,Chicago
32,Boston
Filtering rows
Commands for keeping or discarding rows based on patterns or positional ranges.
search — Filter rows by regex
Filters rows using a regular expression, optionally scoped to one or more columns with -s. Use -v to invert (exclude matches), or --flag to add a match-indicator column instead of dropping rows.
# Keep rows matching a pattern
qsv search "New York" people.csv
Output:
name,age,city,salary
Alice,30,New York,75000
Carol,35,New York,88000
# Search in a specific column only
qsv search -s city "Chicago" people.csv
Output:
name,age,city,salary
Bob,25,Chicago,62000
Dave,28,Chicago,71000
# Invert match (exclude Chicago)
qsv search -s city -v "Chicago" people.csv
Output:
name,age,city,salary
Alice,30,New York,75000
Carol,35,New York,88000
Eve,32,Boston,95000
# Add a match flag column instead of filtering (matched rows get their row number, others get 0)
qsv search -s city --flag matched "New York" people.csv
Output:
name,age,city,salary,matched
Alice,30,New York,75000,1
Bob,25,Chicago,62000,0
Carol,35,New York,88000,3
Dave,28,Chicago,71000,0
Eve,32,Boston,95000,0
# Count matches only (written to stderr)
qsv search -s city -c "New York" people.csv 2>&1 >/dev/null
Output:
2
slice — Extract row ranges
Extracts a contiguous range of rows by start index, end index, length, or a single row. Differs from search in that it operates by position, not content; on indexed files it is O(1) regardless of file size.
# First 3 rows
qsv slice -l 3 people.csv
Output:
name,age,city,salary
Alice,30,New York,75000
Bob,25,Chicago,62000
Carol,35,New York,88000
# Rows 2–4 (0-based start, exclusive end)
qsv slice -s 1 -e 4 people.csv
Output:
name,age,city,salary
Bob,25,Chicago,62000
Carol,35,New York,88000
Dave,28,Chicago,71000
# Single row by index
qsv slice -i 4 people.csv
Output:
name,age,city,salary
Eve,32,Boston,95000
# Last 2 rows (negative index)
qsv slice -s -2 people.csv
Output:
name,age,city,salary
Dave,28,Chicago,71000
Eve,32,Boston,95000
# JSON output for a single row
qsv slice -i 0 --json people.csv
Output:
[{"name":"Alice","age":"30","city":"New York","salary":"75000"}]
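The positional flags map closely onto Python list slicing, which may help when reasoning about off-by-one cases. This is only an analogy, assuming the flags behave as the examples above show; the list holds the name column of people.csv with the header excluded.

```python
# Data rows of people.csv, identified by name (header row not included).
rows = ["Alice", "Bob", "Carol", "Dave", "Eve"]

first_three = rows[:3]   # like: slice -l 3
middle = rows[1:4]       # like: slice -s 1 -e 4 (0-based start, exclusive end)
single = rows[4:5]       # like: slice -i 4
last_two = rows[-2:]     # like: slice -s -2

print(first_three, middle, single, last_two)
```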
Sorting and deduplication
Commands for ordering rows and removing duplicates, often a prerequisite for joins or frequency counts.
sort — Sort rows
Sorts rows by one or more columns; add -N for numeric comparison and -R for descending order. Also supports --random for reproducible shuffles with a --seed.
# Sort by salary numerically (ascending)
qsv sort -s salary -N people.csv
Output:
name,age,city,salary
Bob,25,Chicago,62000
Dave,28,Chicago,71000
Alice,30,New York,75000
Carol,35,New York,88000
Eve,32,Boston,95000
# Sort by salary descending
qsv sort -s salary -N -R people.csv
Output:
name,age,city,salary
Eve,32,Boston,95000
Carol,35,New York,88000
Alice,30,New York,75000
Dave,28,Chicago,71000
Bob,25,Chicago,62000
# Multi-key sort (city then salary)
qsv sort -s city,salary -N people.csv
Output:
name,age,city,salary
Eve,32,Boston,95000
Bob,25,Chicago,62000
Dave,28,Chicago,71000
Alice,30,New York,75000
Carol,35,New York,88000
# Reproducible random shuffle
qsv sort --random --seed 42 people.csv
Output:
name,age,city,salary
Carol,35,New York,88000
Eve,32,Boston,95000
Bob,25,Chicago,62000
Alice,30,New York,75000
Dave,28,Chicago,71000
dedup — Remove duplicate rows
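Removes rows that share the same values in one or more key columns, keeping the first occurrence. By default dedup loads the file and sorts it by the key columns first, so the output comes back in key order; pass --sorted if the input is already sorted. Use -D to write the dropped duplicates to a separate file for auditing.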
# Dedup by city (keep first occurrence per city)
qsv dedup -s city people.csv
Output:
name,age,city,salary
Eve,32,Boston,95000
Bob,25,Chicago,62000
Alice,30,New York,75000
2 (duplicates removed, written to stderr)
# Write duplicates to a separate file
qsv dedup -s city -D dupes.csv people.csv
Output: (none — exits 0 on success)
dupes.csv:
name,age,city,salary
Dave,28,Chicago,71000
Carol,35,New York,88000
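The kept and dropped rows above can be reproduced with a short Python sketch: sort by the key with a stable sort, then keep the first row seen for each key. This is an illustration of the semantics under that assumption, not qsv's implementation, and the dedup_by_key helper is a hypothetical name.

```python
# (name, city) pairs from people.csv, in file order.
people = [
    ("Alice", "New York"),
    ("Bob", "Chicago"),
    ("Carol", "New York"),
    ("Dave", "Chicago"),
    ("Eve", "Boston"),
]

def dedup_by_key(rows, key_index):
    """Sort by key (stable), keep the first row per key, collect the rest."""
    kept, dropped, seen = [], [], set()
    for row in sorted(rows, key=lambda r: r[key_index]):
        if row[key_index] in seen:
            dropped.append(row)
        else:
            seen.add(row[key_index])
            kept.append(row)
    return kept, dropped

kept, dropped = dedup_by_key(people, key_index=1)
print([name for name, _ in kept])     # ['Eve', 'Bob', 'Alice']
print([name for name, _ in dropped])  # ['Dave', 'Carol']
```

Because Python's sort is stable, rows with equal keys keep their file order, so "first occurrence" means the earliest row in the file.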
Frequency analysis
Commands for counting distinct values and understanding the distribution of categorical columns.
frequency — Value counts per column
Produces a ranked value-count table for each column (or a subset with -s), including the percentage each value represents. Use --no-other to suppress the catch-all "Other" bucket when there are many distinct values.
qsv frequency -s city people.csv
Output:
field,value,count,percentage
city,Chicago,2,40.0000
city,New York,2,40.0000
city,Boston,1,20.0000
# All columns, no truncation
qsv frequency --no-other people.csv
Output:
field,value,count,percentage
name,Alice,1,20.0000
name,Bob,1,20.0000
name,Carol,1,20.0000
name,Dave,1,20.0000
name,Eve,1,20.0000
age,25,1,20.0000
age,28,1,20.0000
age,30,1,20.0000
age,32,1,20.0000
age,35,1,20.0000
city,Chicago,2,40.0000
city,New York,2,40.0000
city,Boston,1,20.0000
salary,62000,1,20.0000
salary,71000,1,20.0000
salary,75000,1,20.0000
salary,88000,1,20.0000
salary,95000,1,20.0000
# JSON output
qsv frequency -s city --json people.csv
Output:
[{"field":"city","data":[{"value":"Chicago","count":2,"percentage":40.0},{"value":"New York","count":2,"percentage":40.0},{"value":"Boston","count":1,"percentage":20.0}]}]
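The count and percentage columns are simple to reproduce, which is handy for spot-checking a frequency table. A minimal Python sketch follows; the alphabetical tie-break between equal counts is an assumption chosen to match the output shown above.

```python
from collections import Counter

# The city column of people.csv, in file order.
cities = ["New York", "Chicago", "New York", "Chicago", "Boston"]

counts = Counter(cities)
total = len(cities)

# Rank by descending count; break ties alphabetically (an assumption).
ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
table = [(value, n, round(100 * n / total, 4)) for value, n in ranked]

for value, n, pct in table:
    print(f"city,{value},{n},{pct:.4f}")
```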
Transforming columns
Commands for reshaping, renaming, filling, and computing new columns without leaving the command line.
rename — Rename column headers
Renames columns by supplying a comma-separated list of new names in positional order. Use --pairwise to rename only specific columns by specifying old,new pairs, leaving the rest untouched.
# Rename all columns by position
qsv rename full_name,years_old,location,annual_pay people.csv
Output:
full_name,years_old,location,annual_pay
Alice,30,New York,75000
Bob,25,Chicago,62000
Carol,35,New York,88000
Dave,28,Chicago,71000
Eve,32,Boston,95000
# Pairwise rename (only rename specific columns)
qsv rename --pairwise age,years,salary,income people.csv
Output:
name,years,city,income
Alice,30,New York,75000
Bob,25,Chicago,62000
Carol,35,New York,88000
Dave,28,Chicago,71000
Eve,32,Boston,95000
fill — Forward-fill empty values
Propagates the last non-empty value in a column downward to fill blanks — useful for sparse exports where a value is only written on the first row of a group. Use --default to fill with a fixed string instead.
# Create a CSV with gaps
printf 'name,city,salary\nAlice,New York,75000\nBob,,62000\nCarol,,88000\nDave,Chicago,\nEve,Boston,95000\n' > gaps.csv
# Forward-fill city
qsv fill city gaps.csv
Output:
name,city,salary
Alice,New York,75000
Bob,New York,62000
Carol,New York,88000
Dave,Chicago,
Eve,Boston,95000
# Fill with a fixed default
qsv fill --default "N/A" salary gaps.csv
Output:
name,city,salary
Alice,New York,75000
Bob,,62000
Carol,,88000
Dave,Chicago,N/A
Eve,Boston,95000
reverse — Reverse row order
Outputs all rows in reverse order without sorting. Use this when the last record is the most recent and you want newest-first output without a sort key.
qsv reverse people.csv
Output:
name,age,city,salary
Eve,32,Boston,95000
Dave,28,Chicago,71000
Carol,35,New York,88000
Bob,25,Chicago,62000
Alice,30,New York,75000
transpose — Swap rows and columns
Rotates the CSV so rows become columns and columns become rows. Useful for turning a wide stat table into a narrow key-value layout, or for feeding column-oriented data into row-oriented tools.
qsv transpose people.csv
Output:
name,Alice,Bob,Carol,Dave,Eve
age,30,25,35,28,32
city,New York,Chicago,New York,Chicago,Boston
salary,75000,62000,88000,71000,95000
enum — Add a row number column
Appends an index column containing the 0-based row number (or a custom name and start value). Use it to add a stable surrogate key or to restore original ordering after a shuffle.
qsv enum people.csv
Output:
name,age,city,salary,index
Alice,30,New York,75000,0
Bob,25,Chicago,62000,1
Carol,35,New York,88000,2
Dave,28,Chicago,71000,3
Eve,32,Boston,95000,4
# Custom column name, 1-based
qsv enum --new-column row_id --start 1 people.csv
Output:
name,age,city,salary,row_id
Alice,30,New York,75000,1
Bob,25,Chicago,62000,2
Carol,35,New York,88000,3
Dave,28,Chicago,71000,4
Eve,32,Boston,95000,5
pseudo — Pseudonymize a column
Replaces each distinct value in a column with an incremental identifier, so the same input always maps to the same output within a file. Use it to anonymize PII before sharing data while preserving join-ability.
# Replace names with consistent incremental IDs
qsv pseudo name people.csv
Output:
name,age,city,salary
0,30,New York,75000
1,25,Chicago,62000
2,35,New York,88000
3,28,Chicago,71000
4,32,Boston,95000
safenames — Sanitize column names for SQL/Python
Rewrites column headers so they are valid identifiers for SQL, pandas, or R by replacing spaces and special characters with underscores. Use --mode v (verify) first to count unsafe headers without modifying the file.
# Create a CSV with messy headers
printf 'Full Name,Age (Years),City/Region,Annual Salary $\nAlice,30,NYC,75000\n' > messy.csv
qsv safenames messy.csv
Output:
Full_Name,Age__Years_,City_Region,Annual_Salary__
Alice,30,NYC,75000
# Count unsafe names without rewriting them (verify mode)
qsv safenames --mode v messy.csv
Output:
4 unsafe header(s) found.
Format conversion
Commands for converting between CSV, TSV, JSONL, Excel, and other tabular formats.
fmt — Change delimiter or quoting
Re-emits a CSV to stdout with a different delimiter, quote character, or quoting style; the data and the input file are left untouched. Use it to convert CSV to TSV before piping into tools that expect tab-delimited input.
# CSV to TSV
qsv fmt -t T people.csv
Output:
name age city salary
Alice 30 New York 75000
Bob 25 Chicago 62000
Carol 35 New York 88000
Dave 28 Chicago 71000
Eve 32 Boston 95000
# Pipe-delimited
qsv fmt -t '|' people.csv
Output:
name|age|city|salary
Alice|30|New York|75000
Bob|25|Chicago|62000
Carol|35|New York|88000
Dave|28|Chicago|71000
Eve|32|Boston|95000
# Quote every field
qsv fmt --quote-always people.csv
Output:
"name","age","city","salary"
"Alice","30","New York","75000"
"Bob","25","Chicago","62000"
"Carol","35","New York","88000"
"Dave","28","Chicago","71000"
"Eve","32","Boston","95000"
tojsonl — Convert CSV to JSONL
Converts each CSV row to a JSON object on its own line (JSON Lines format), with automatic type inference so numeric and boolean columns are emitted without quotes. Use it to feed CSV data into JSON-native tools or APIs.
qsv tojsonl people.csv
Output:
{"name":"Alice","age":30,"city":"New York","salary":75000}
{"name":"Bob","age":25,"city":"Chicago","salary":62000}
{"name":"Carol","age":35,"city":"New York","salary":88000}
{"name":"Dave","age":28,"city":"Chicago","salary":71000}
{"name":"Eve","age":32,"city":"Boston","salary":95000}
Type inference is automatic: age and salary are emitted as integers (not quoted), boolean columns become true/false, and nulls become JSON null.
excel — Extract Excel sheet to CSV
Reads .xlsx or .xls files and converts a sheet to CSV, handling merged cells, date formatting, and formula results. Use --metadata j to list all sheets before deciding which to extract.
# First sheet
qsv excel data.xlsx -o output.csv
# Specific sheet by name
qsv excel data.xlsx --sheet "Sales" -o sales.csv
# List all sheets as JSON
qsv excel data.xlsx --metadata j
Output:
{"filename":"data.xlsx","format":"Xlsx","num_sheets":3,"sheets":[{"index":0,"name":"Sheet1","typ":"WorkSheet","visible":"Visible","headers":["name","age","city","salary"],"num_columns":4,"num_rows":6},...]}
# Extract a specific cell range
qsv excel data.xlsx --range "A1:C4" -o range.csv
Output: (none — exits 0 on success)
Combining files
Commands for stacking, joining, splitting, and partitioning CSV files.
cat — Concatenate CSVs
Stacks multiple CSV files vertically (rows) or side-by-side (columns). Use rowskey when the files have different or overlapping schemas — it aligns by column name and fills missing fields with empty strings.
# Stack vertically (same columns required; people2.csv is assumed to hold the Frank and Grace rows)
qsv cat rows people.csv people2.csv
Output:
name,age,city,salary
Alice,30,New York,75000
Bob,25,Chicago,62000
Carol,35,New York,88000
Dave,28,Chicago,71000
Eve,32,Boston,95000
Frank,40,Seattle,105000
Grace,29,Austin,67000
# Stack with differing schemas (fills missing fields with empty)
qsv cat rowskey --group fname people.csv dept.csv
Output:
file,name,age,city,salary,department
people.csv,Alice,30,New York,75000,
people.csv,Bob,25,Chicago,62000,
dept.csv,Alice,,,,Engineering
dept.csv,Bob,,,,Marketing
...
# Concatenate side by side (columns)
qsv cat columns people.csv dept.csv
Output:
name,age,city,salary,name,department
Alice,30,New York,75000,Alice,Engineering
Bob,25,Chicago,62000,Bob,Marketing
Carol,35,New York,88000,Carol,Engineering
Dave,28,Chicago,71000,Dave,Sales
Eve,32,Boston,95000,Eve,Engineering
join — Join two CSVs
Performs an inner, outer, semi, anti, or cross join between two CSV files on one or more key columns. Differs from sqlp joins in that it does not require SQL syntax and is optimized for streaming large files.
# Inner join on name
qsv join name people.csv name dept.csv
Output:
name,age,city,salary,name,department
Alice,30,New York,75000,Alice,Engineering
Bob,25,Chicago,62000,Bob,Marketing
Carol,35,New York,88000,Carol,Engineering
Dave,28,Chicago,71000,Dave,Sales
Eve,32,Boston,95000,Eve,Engineering
# Left anti-join (people NOT in dept.csv)
qsv join --left-anti name people.csv name dept.csv
Output (empty if all names matched):
name,age,city,salary
# Cross join (cartesian product)
qsv join --cross name people.csv name dept.csv | qsv count
Output:
25
| Join type | Flag |
|---|---|
| Inner (default) | (none) |
| Left outer | --left |
| Right outer | --right |
| Full outer | --full |
| Left anti | --left-anti |
| Left semi | --left-semi |
| Right anti | --right-anti |
| Cross (cartesian) | --cross |
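The join types in the table can be told apart with a small Python sketch. It illustrates the semantics only and is not how qsv joins files. Frank is a hypothetical extra person with no department, added so the anti and outer joins have something to show.

```python
# (name, city) rows on the left; (name, department) rows on the right.
people = [("Alice", "New York"), ("Bob", "Chicago"), ("Frank", "Seattle")]
dept = [("Alice", "Engineering"), ("Bob", "Marketing")]

# Index the right side by key so each left row can look up its matches.
lookup = {}
for name, department in dept:
    lookup.setdefault(name, []).append(department)

inner = [(n, c, d) for n, c in people for d in lookup.get(n, [])]
left_outer = [(n, c, d) for n, c in people for d in lookup.get(n, [""])]
left_semi = [(n, c) for n, c in people if n in lookup]
left_anti = [(n, c) for n, c in people if n not in lookup]
cross = [(p, d) for p in people for d in dept]

print(inner)
print(left_anti)   # [('Frank', 'Seattle')]
print(len(cross))  # 6
```

Semi and anti joins return only left-side columns, which is why the left anti-join output above carries no department column.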
split — Split into multiple files
Writes sequential chunks of a CSV to separate files in an output directory, either by fixed row count (-s) or by total number of chunks (-c). Use --pad and --filename to control zero-padding and naming.
mkdir /tmp/split_out
# 2 rows per chunk
qsv split -s 2 /tmp/split_out people.csv
ls /tmp/split_out
Output:
0.csv 2.csv 4.csv
(each file is named after the 0-based index of its first record)
# 3 chunks with padded, custom filenames
qsv split -c 3 --pad 3 --filename "chunk_{}.csv" /tmp/split_out people.csv
Output: (none — exits 0 on success)
partition — Partition by column value
Creates one output file per distinct value in a key column, named after that value with characters that are unsafe in filenames (such as spaces) dropped. Differs from split in that grouping is by content rather than row count, which makes it ideal for producing per-department or per-region files.
mkdir /tmp/by_city
qsv partition city /tmp/by_city people.csv
ls /tmp/by_city
Output:
Boston.csv Chicago.csv NewYork.csv
Chicago.csv:
name,age,city,salary
Bob,25,Chicago,62000
Dave,28,Chicago,71000
# Drop the partition column from output files
qsv partition --drop city /tmp/by_city people.csv
Output: (none — exits 0 on success)
Scripting and queries
Commands for running SQL, embedded Lua scripts, and built-in string operations directly against CSV files.
sqlp — SQL queries via Polars
Runs SQL queries against one or more CSV files using the Polars engine. The filename (without extension) becomes the table name.
# WHERE filter and ORDER BY
qsv sqlp people.csv "SELECT name, salary FROM people WHERE salary > 70000 ORDER BY salary DESC"
Output:
name,salary
Eve,95000
Carol,88000
Alice,75000
Dave,71000
# GROUP BY aggregation
qsv sqlp people.csv "SELECT city, COUNT(*) as n, AVG(salary) as avg_salary FROM people GROUP BY city ORDER BY avg_salary DESC"
Output:
city,n,avg_salary
Boston,1,95000.0
New York,2,81500.0
Chicago,2,66500.0
# Join two files in SQL
qsv sqlp people.csv dept.csv \
"SELECT p.name, p.salary, d.department
FROM people p JOIN dept d ON p.name = d.name
WHERE d.department = 'Engineering'
ORDER BY p.salary DESC"
Output:
name,salary,department
Eve,95000,Engineering
Carol,88000,Engineering
Alice,75000,Engineering
# Window function: salary rank
qsv sqlp people.csv \
"SELECT name, salary, RANK() OVER (ORDER BY salary DESC) as rank FROM people"
Output:
name,salary,rank
Eve,95000,1
Carol,88000,2
Alice,75000,3
Dave,71000,4
Bob,62000,5
# Output as JSON
qsv sqlp --format json people.csv "SELECT * FROM people WHERE city = 'Chicago'"
Output:
[{"name":"Bob","age":25,"city":"Chicago","salary":62000},{"name":"Dave","age":28,"city":"Chicago","salary":71000}]
Use --streaming for files larger than RAM. Add --try-parsedates to auto-parse date columns.
luau — Scripted transforms with embedded Lua
Runs a Luau (sandboxed Lua 5.1) expression per row to map a new column or filter rows, with optional --begin/--end blocks for initialization and aggregation. Reach for this when apply operations are too limited but a full sqlp query is overkill.
# Add computed column (salary in thousands)
qsv luau map salary_k \
"string.format('%.1f', col.salary / 1000)" \
people.csv
Output:
name,age,city,salary,salary_k
Alice,30,New York,75000,75.0
Bob,25,Chicago,62000,62.0
Carol,35,New York,88000,88.0
Dave,28,Chicago,71000,71.0
Eve,32,Boston,95000,95.0
# Filter rows with a script
qsv luau filter "tonumber(col.salary) > 75000" people.csv
Output:
name,age,city,salary
Carol,35,New York,88000
Eve,32,Boston,95000
# Add a seniority label (conditional logic)
qsv luau map seniority \
"if tonumber(col.age) >= 32 then return 'Senior' else return 'Junior' end" \
people.csv
Output:
name,age,city,salary,seniority
Alice,30,New York,75000,Junior
Bob,25,Chicago,62000,Junior
Carol,35,New York,88000,Senior
Dave,28,Chicago,71000,Junior
Eve,32,Boston,95000,Senior
# Aggregation using BEGIN/END blocks
qsv luau map dummy \
--begin "total = 0" \
"total = total + tonumber(col.salary); return ''" \
--end "print('Total salary: ' .. total)" \
people.csv > /dev/null
Output:
Total salary: 391000
Reference columns with col.column_name or col["col name"]. Use _IDX for the current row number. Scripts run with Luau 0.716, a safe, sandboxed Lua 5.1 subset.
apply — Built-in string and numeric operations
Applies one or more named operations (case conversion, trimming, encoding, similarity, NLP sentiment, etc.) to a column without writing a script. Use dynfmt to produce a new column from a format string that interpolates other columns.
# Uppercase a column
qsv apply operations upper name people.csv
Output:
name,age,city,salary
ALICE,30,New York,75000
BOB,25,Chicago,62000
CAROL,35,New York,88000
DAVE,28,Chicago,71000
EVE,32,Boston,95000
# Compute string length into a new column
qsv apply operations len name -c name_len people.csv
Output:
name,age,city,salary,name_len
Alice,30,New York,75000,5
Bob,25,Chicago,62000,3
Carol,35,New York,88000,5
Dave,28,Chicago,71000,4
Eve,32,Boston,95000,3
# Dynamic format string → computed description column
qsv apply dynfmt \
--formatstr "{name} earns \${salary} in {city}" \
--new-column description people.csv
Output:
name,age,city,salary,description
Alice,30,New York,75000,Alice earns $75000 in New York
Bob,25,Chicago,62000,Bob earns $62000 in Chicago
Carol,35,New York,88000,Carol earns $88000 in New York
Dave,28,Chicago,71000,Dave earns $71000 in Chicago
Eve,32,Boston,95000,Eve earns $95000 in Boston
Available apply operations:
| Category | Operations |
|---|---|
| Case | lower, upper, titlecase |
| Whitespace | trim, ltrim, rtrim, squeeze |
| String | len, strip_prefix, strip_suffix, escape, replace, regex_replace |
| Encoding | encode64, decode64, encode62, decode62, crc32 |
| Math | round, thousands |
| Financial | currencytonum, numtocurrency |
| Similarity | simdl, simjw, simsd, simhm |
| NLP | sentiment, whatlang, gender_guess, eudex |
Schema and validation
Commands for inferring structure from a CSV and checking that data conforms to expected types and constraints.
schema — Infer JSON Schema from CSV
Scans a CSV and generates a JSON Schema (Draft 2020-12) file capturing field types, enum values, and numeric ranges. The output schema can be fed directly to validate to enforce those constraints on new data.
qsv schema people.csv
Output: (none — exits 0 on success)
Generates people.csv.schema.json:
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "people.csv",
"type": "object",
"properties": {
"name": {
"type": "string",
"enum": ["Alice", "Bob", "Carol", "Dave", "Eve"]
},
"age": { "type": "integer", "minimum": 25, "maximum": 35 },
"city": {
"type": "string",
"enum": ["Boston", "Chicago", "New York"]
},
"salary": { "type": "integer", "minimum": 62000, "maximum": 95000 }
},
"required": ["name", "age", "city", "salary"]
}
# Polars schema (for use with sqlp/Parquet pipelines)
qsv schema --polars people.csv
Output:
{"name":"Utf8","age":"Int64","city":"Utf8","salary":"Int64"}
validate — Validate CSV against JSON Schema
Without a schema argument, checks that the CSV is well-formed per RFC 4180. With a schema, validates each row against it and writes passing rows to .valid, failing rows to .invalid, and a validation-errors.tsv describing each violation.
# RFC 4180 well-formedness check
qsv validate people.csv
Output:
people.csv is valid.
# Schema validation (generates .valid, .invalid, and validation-errors.tsv)
qsv schema people.csv
printf 'name,age,city,salary\nBadRow,notanumber,Unknown,0\n' > bad.csv
qsv validate bad.csv people.csv.schema.json
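Output: (validation fails here, so qsv exits non-zero and writes the files described above)
validation-errors.tsv:
row_number field error
2 name BadRow is not one of ["Alice","Bob","Carol","Dave","Eve"]
2 age notanumber is not of type "integer"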
2 city Unknown is not one of ["Boston","Chicago","New York"]
2 salary 0 is less than the minimum value of 62000
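The three kinds of violation in play here (wrong type, value outside an enum, value outside a numeric range) can be sketched with hand-rolled checks in Python. This is a simplified illustration, not the JSON Schema validator qsv uses; check_row and its error labels are hypothetical, and the constraints are copied from the generated schema shown earlier.

```python
# Constraints copied from the generated people.csv.schema.json.
schema = {
    "name": {"type": "string", "enum": ["Alice", "Bob", "Carol", "Dave", "Eve"]},
    "age": {"type": "integer", "minimum": 25, "maximum": 35},
    "city": {"type": "string", "enum": ["Boston", "Chicago", "New York"]},
    "salary": {"type": "integer", "minimum": 62000, "maximum": 95000},
}

def check_row(row):
    """Return (field, problem) pairs for every constraint the row violates."""
    errors = []
    for field, rule in schema.items():
        value = row[field]
        if rule["type"] == "integer":
            try:
                number = int(value)
            except ValueError:
                errors.append((field, "not an integer"))
                continue
            if number < rule["minimum"]:
                errors.append((field, "below minimum"))
            elif number > rule["maximum"]:
                errors.append((field, "above maximum"))
        elif value not in rule["enum"]:
            errors.append((field, "not in enum"))
    return errors

bad = {"name": "BadRow", "age": "notanumber", "city": "Unknown", "salary": "0"}
good = {"name": "Alice", "age": "30", "city": "New York", "salary": "75000"}
print(check_row(bad))
print(check_row(good))  # []
```

Since the inferred schema pins name to the five values seen in people.csv, a new name fails too. In practice you may want to edit the generated schema and loosen such enums before validating fresh data.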
Sampling
Commands for drawing representative subsets from large files without loading everything into memory.
sample — Random sampling
Draws rows using reservoir sampling by default, guaranteeing a uniform random sample in a single pass without knowing the file size upfront. Supports stratified, Bernoulli, systematic, cluster, weighted, and time-series sampling modes; use --seed for reproducibility.
# Reservoir sample (3 random rows)
qsv sample 3 people.csv
Output:
name,age,city,salary
Bob,25,Chicago,62000
Alice,30,New York,75000
Eve,32,Boston,95000
# Reproducible sample with seed
qsv sample --seed 42 3 people.csv
Output:
name,age,city,salary
Alice,30,New York,75000
Carol,35,New York,88000
Eve,32,Boston,95000
# 50% Bernoulli sample (each row independently included with probability 0.5)
qsv sample --bernoulli --seed 42 0.5 people.csv
Output:
name,age,city,salary
Alice,30,New York,75000
Carol,35,New York,88000
Eve,32,Boston,95000
# Stratified: 1 row per unique city
qsv sample --stratified city --seed 42 1 people.csv
Output:
name,age,city,salary
Alice,30,New York,75000
Bob,25,Chicago,62000
Eve,32,Boston,95000
| Method | Flag | Use case |
|---|---|---|
| Reservoir (default) | — | General random sample |
| Indexed | — (with .idx) | Random I/O, large files |
| Bernoulli | --bernoulli | Independent row probability |
| Systematic | --systematic <first or random> | Every nth record, starting at the first row or a random offset |
| Stratified | --stratified <col> | Representative subgroup samples |
| Weighted | --weighted <col> | Probability proportional to weight |
| Cluster | --cluster <col> | Sample entire clusters |
| Timeseries | --timeseries <col> | One record per time interval |
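Reservoir sampling, the default method in the table, is compact enough to sketch in Python. The version below is the classic "Algorithm R", given as an illustration of the technique rather than qsv's actual code; reservoir_sample is a hypothetical helper, and its seeded output will not match what qsv produces for the same seed.

```python
import random

def reservoir_sample(rows, k, seed=None):
    """Uniformly sample k items from an iterable of unknown length in one pass."""
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Row i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = row
    return reservoir

names = ["Alice", "Bob", "Carol", "Dave", "Eve"]
sample = reservoir_sample(iter(names), k=3, seed=42)
print(sample)
```

Only k rows are held in memory at any time, which is why the method works on streams whose length is not known in advance.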
Flattening and display
Commands for rendering CSV records in a human-readable layout rather than a dense columnar format.
flatten — View records one at a time
Prints each record as a vertical key-value block separated by #, making wide or deeply nested CSVs readable in a terminal. Use -c to truncate long values to a fixed character limit for a quick overview.
qsv flatten people.csv
Output:
name Alice
age 30
city New York
salary 75000
#
name Bob
age 25
city Chicago
salary 62000
#
...
# Condense long values for a quick overview
qsv flatten -c 8 people.csv
Output:
name Alice
age 30
city New York
salary 75000
#
...
Indexing
An index file dramatically speeds up commands that support random access (slice, split, sample, count, dedup).
qsv index people.csv
# Creates people.csv.idx alongside the source file
Output: (none — exits 0 on success)
After indexing, qsv count and qsv slice are O(1) regardless of file size.
# Force rebuild
qsv index --force people.csv
Output: (none — exits 0 on success)
Configuration
qsv reads runtime defaults from QSV_* environment variables and from a dotenv file. Use this to set delimiters, buffer sizes, parallelism, and remote-fetch behaviour project-wide without repeating flags on every invocation.
Environment variables
Every option exposed as a CLI flag has a matching QSV_* variable; the variable becomes the default and is overridden by an explicit flag. Run qsv --envlist to dump the active set.
# Show every QSV_* variable currently in effect
qsv --envlist
Output:
QSV_DEFAULT_DELIMITER: ,
QSV_NO_HEADERS: false
QSV_COMMENT_CHAR:
QSV_MAX_JOBS: 8
QSV_CACHE_DIR: /home/alice/.qsv-cache
...
# Project-wide TSV default + parallel job cap
export QSV_DEFAULT_DELIMITER=$'\t'
export QSV_MAX_JOBS=4
qsv stats data.tsv
Output: (the usual stats table, with data.tsv parsed as tab-delimited)
| Variable | Purpose |
|---|---|
| QSV_DEFAULT_DELIMITER | One ASCII char; the default delimiter when --delimiter is not given. |
| QSV_SNIFF_DELIMITER | If set, auto-detect delimiter per file. |
| QSV_NO_HEADERS | Treat first row as data, not a header. |
| QSV_MAX_JOBS | Cap parallel workers (default = logical CPUs). |
| QSV_CACHE_DIR | Where stats/fetch cache files are written. |
| QSV_DOTENV_PATH | Explicit dotenv file path; "" disables loading. |
| QSV_LOG_LEVEL | error/warn/info/debug/trace. |
| QSV_LOG_DIR | Directory for structured log output. |
| QSV_PROGRESSBAR | 1 to show a TTY progress bar on long runs. |
Dotenv file
On startup, qsv loads a .env file from the current directory (or the path in QSV_DOTENV_PATH) and applies any QSV_*=value lines as if they were exported. Useful for pinning per-project defaults next to a dataset.
cat > .env << 'EOF'
QSV_DEFAULT_DELIMITER=|
QSV_MAX_JOBS=4
QSV_LOG_LEVEL=info
EOF
qsv count people.csv # picks up the .env automatically
Output:
5
# Point at a shared dotenv outside the cwd
QSV_DOTENV_PATH=/home/alice/projects/etl/.env qsv stats people.csv
# Disable dotenv loading for one invocation
QSV_DOTENV_PATH= qsv stats people.csv
Output: (the usual stats table in both cases)
MCP server (qsv 13+)
qsv 13 added a built-in Model Context Protocol server that lets AI agents (Claude Desktop, Claude Code, and other MCP clients) query and transform local CSV/Parquet/Excel files without uploading raw data — only statistical metadata and result rows cross the wire. Reach for it when you want a chatbot to drive qsv against your own files.
# Start the MCP server on stdio (default transport)
qsvmcp serve
# Or use the full binary
qsv mcp serve
Output: (none on the terminal; the server keeps running and exchanges MCP messages over stdin/stdout until the client disconnects)
# List the MCP-exposed tools and exit (handy for debugging)
qsvmcp list-skills
# Regenerate the bundled skill definitions
qsvmcp --update-mcp-skills
Output: (none — exits 0 on success)
Register with Claude Desktop by adding the server to claude_desktop_config.json:
{
"mcpServers": {
"qsv": {
"command": "qsvmcp",
"args": ["serve"],
"env": { "QSV_CACHE_DIR": "/home/alice/.qsv-cache" }
}
}
}
The qsvmcp binary ships ~63 of qsv's commands, enough for the MCP skill set with a smaller footprint. Use the full qsv binary if you need commands outside the MCP surface (e.g. geocode, python).
tojsonl --toon — Token-efficient output for LLMs
qsv 12 introduced TOON, a token-optimized tabular format designed for LLM contexts — denser than JSON, still parseable. Useful when piping CSV summaries into a prompt.
qsv tojsonl --toon people.csv
Output:
[name|age|city|salary]
Alice|30|New York|75000
Bob|25|Chicago|62000
...
Piping commands together
qsv is designed to be composed — pipe subcommands to build multi-step pipelines:
# Filter to Engineering dept, sort by salary desc, pick 3 columns
qsv join name people.csv name dept.csv \
| qsv search -s department "Engineering" \
| qsv select name,salary,department \
| qsv sort -s salary -N -R
Output:
name,salary,department
Eve,95000,Engineering
Carol,88000,Engineering
Alice,75000,Engineering
# Top city by total salary
qsv sqlp people.csv \
"SELECT city, SUM(salary) as total FROM people GROUP BY city ORDER BY total DESC LIMIT 1"
Output:
city,total
New York,163000
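# Count rows matching a pattern across a directory of CSVs
# (qsv cat rowskey merges the files by column name; plain cat would leave every file's header row in the data)
qsv cat rowskey *.csv | qsv search "New York" | qsv count
Output: (a single number, the count of matching rows across all files)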
Use qsv input to normalize messy CSVs (trim whitespace, fix quoting, skip comment lines) before piping to other commands. Use qsv fixlengths to pad rows with missing fields so downstream commands don't choke on ragged files.
Sources
- qsv releases (dathere/qsv): 13.0.0 "AI-native" launch, 12.0.0 moarstats + TOON, 11.0.2 streaming stats/frequency.
- qsv CHANGELOG.md: full per-release diff including subcommand additions and --weight consolidation in frequency.
- ENVIRONMENT_VARIABLES.md: authoritative list of QSV_* variables and dotenv loading rules.
- qsv MCP server skill README: qsvmcp binary, skill counts, Claude Desktop wiring.
- qsv.dathere.com stats docs: current reference for stats, moarstats, and cached .stats.csv.bin.sz.