#data
15 pages tagged data.
pandas
Package-level reference for pandas — install, versioning, Python compatibility, extras, and gotchas. The de-facto DataFrame library on PyPI.
matplotlib
Package-level reference for matplotlib on PyPI — install variants, backends, version policy, extras, and alternatives.
jupyter
Package-level reference for the jupyter meta-package on PyPI — install variants, what it pulls in, version policy, and alternatives.
qsv
Comprehensive reference for qsv: count, headers, stats, moarstats, select, search, sort, dedup, frequency, join, sqlp, luau, apply, schema, validate, sample, split, MCP server, and more — with examples and outputs.
json
Encode and decode JSON in Python with the stdlib json module. Covers dumps/loads, indent/sort_keys/separators, custom default= and JSONEncoder, object_hook decoding, JSONL streaming, and orjson/ujson/msgspec comparison.
jq
Slice, filter, map, and transform JSON data from the command line. Covers all essential filters, built-in functions, select, map, reduce, streaming, jq 1.7/1.8 additions, and real-world API response processing.
datetime
Work with dates, times, and timezones in Python using the stdlib datetime module and zoneinfo. Covers aware vs naive datetimes, ISO-8601 parsing, strftime/strptime, timedelta arithmetic, and DST handling.
awk / gawk
Pattern-action language for structured text. Field splitting, built-in variables, arithmetic, string functions, arrays, BEGIN/END blocks, and practical data-processing recipes.
polars
High-performance DataFrames with a lazy expression API. Covers read/write, select, filter, group_by, joins, LazyFrame, datetime, string operations, and pandas interop.
modin
Speed up pandas workloads across all CPU cores with a one-line import swap. Covers Ray and Dask backends, config tuning, pandas interop, and when modin wins vs polars.
dagster
Build, schedule, and observe data pipelines as software-defined assets with Dagster. Covers assets, jobs, schedules, sensors, resources, partitions, and the Dagster UI.
pandas
Load, filter, transform, and aggregate tabular data with pandas. Covers DataFrame creation, read_csv, groupby, merge, and the SettingWithCopy pitfall.
numpy
Create and manipulate N-dimensional arrays with NumPy. Covers array creation, broadcasting, vectorized math, indexing, and matrix operations.
jupyter
Run interactive Python notebooks with Jupyter. Covers JupyterLab setup, cell types, keyboard shortcuts, magic commands, nbconvert export, and common pitfalls.
sort, uniq & wc
Sort lines (numerically, by field, human-readable sizes), deduplicate with uniq, count lines/words/bytes with wc, and number lines with nl. With real-world pipeline recipes.