data
33 pages in this category.
unstructured
Package-level reference for unstructured on PyPI — install variants, the huge extras tree, system-level dependencies, and alternative parsers.
streamlit
Package-level reference for the streamlit framework on PyPI — install variants, version policy, extras, and alternatives.
scipy
Package-level reference for scipy — install, versioning, submodules, license caveats, and gotchas. Covers optimization, statistics, signal processing, and linear algebra.
scikit-learn
Package-level reference for scikit-learn — install, versioning, extras, and gotchas. The de facto classical-ML library on PyPI.
prefect
Package-level reference for Prefect on PyPI — install variants, version policy, cloud-vs-OSS extras, and alternatives.
polars
Package-level reference for polars — install, versioning, extras, and gotchas. The Rust-powered Arrow-native alternative to pandas.
Pillow
Package-level reference for Pillow on PyPI — install variants, format-specific native deps, version policy, and alternatives.
pandas
Package-level reference for pandas — install, versioning, Python compatibility, extras, and gotchas. The de facto DataFrame library on PyPI.
numpy
Package-level reference for numpy — install, versioning, ABI breaks, extras, and gotchas. The bedrock of the Python scientific stack.
modin
Package-level reference for modin — install, backend extras, versioning, and gotchas. Speeds up existing pandas code with a one-line import swap.
matplotlib
Package-level reference for matplotlib on PyPI — install variants, backends, version policy, extras, and alternatives.
jupyter
Package-level reference for the jupyter meta-package on PyPI — install variants, what it pulls in, version policy, and alternatives.
duckdb
Package-level reference for duckdb — install, versioning, extensions, and gotchas. In-process columnar OLAP for Python.
dagster
Package-level reference for Dagster on PyPI — install variants, the dagster-* plugin family, version policy, and alternatives.
beautifulsoup4
Package-level reference for beautifulsoup4 on PyPI — install variants, parser-backend selection (lxml/html5lib/html.parser), and alternatives.
Db2 SPUFI
Run SQL through SPUFI, drive Db2 with DSN subsystem commands, BIND packages and plans, schedule DSNTEP2 in JCL, query the SYSIBM catalog, and generate DCLGEN.
streamlit
Build interactive web apps for data and ML in pure Python. Covers widgets, layout, session state, caching, multipage apps, and deployment patterns.
scikit-learn
Build classical ML pipelines with scikit-learn. Covers the estimator API, train_test_split, Pipeline, ColumnTransformer, cross-validation, metrics, and model persistence.
qsv
Comprehensive reference for qsv: count, headers, stats, moarstats, select, search, sort, dedup, frequency, join, sqlp, luau, apply, schema, validate, sample, split, MCP server, and more — with examples and outputs.
json
Encode and decode JSON in Python with the stdlib json module. Covers dumps/loads, indent/sort_keys/separators, custom default= and JSONEncoder, object_hook decoding, JSONL streaming, and orjson/ujson/msgspec comparison.
jq
Slice, filter, map, and transform JSON data from the command line. Covers all essential filters, built-in functions, select, map, reduce, streaming, jq 1.7/1.8 additions, and real-world API response processing.
BeautifulSoup
Parse, search, and mutate HTML/XML with BeautifulSoup 4. Covers parser choice (html.parser/lxml/html5lib), find/find_all/select, tree navigation, attribute access, and pairing with requests/httpx/playwright for end-to-end scraping.
prefect
Build, schedule, and observe Python workflows with Prefect. Covers flows, tasks, retries, schedules, deployments, caching, concurrency, and Prefect Cloud.
polars
High-performance DataFrames with a lazy expression API. Covers read/write, select, filter, group_by, joins, LazyFrame, datetime, string operations, and pandas interop.
modin
Speed up pandas workloads across all CPU cores with a one-line import swap. Covers Ray and Dask backends, config tuning, pandas interop, and when modin wins vs polars.
DuckDB
Run fast analytical SQL queries in-process with DuckDB. Covers Python API, CSV/Parquet ingestion, pandas interop, Arrow, window functions, and persistent databases.
dagster
Build, schedule, and observe data pipelines as software-defined assets with Dagster. Covers assets, jobs, schedules, sensors, resources, partitions, and the Dagster UI.
scipy
Statistical distributions, optimization, integration, signal processing, and linear algebra with SciPy. Builds on NumPy arrays.
Pillow
Open, resize, crop, convert, and save images with Pillow (PIL fork). Covers format conversion, filters, drawing, and EXIF handling.
pandas
Load, filter, transform, and aggregate tabular data with pandas. Covers DataFrame creation, read_csv, groupby, merge, and the SettingWithCopyWarning pitfall.
numpy
Create and manipulate N-dimensional arrays with NumPy. Covers array creation, broadcasting, vectorized math, indexing, and matrix operations.
matplotlib
Create publication-quality 2-D plots with matplotlib. Covers pyplot basics, subplots, savefig, common chart types, and the show-vs-save pitfall.
jupyter
Run interactive Python notebooks with Jupyter. Covers JupyterLab setup, cell types, keyboard shortcuts, magic commands, nbconvert export, and common pitfalls.