cheat sheet
scipy
Package-level reference for scipy — install, versioning, submodules, license caveats, and gotchas. Optimization, statistics, signal processing, and linear algebra.
What it is
scipy is the scientific-computing companion to numpy — a sibling project under the same NumFOCUS umbrella. It bundles production implementations of algorithms numpy intentionally does not ship: numerical optimization (scipy.optimize), statistics (scipy.stats), signal processing (scipy.signal), sparse matrices (scipy.sparse), interpolation, integration, FFTs, and special functions.
On PyPI scipy sits one rung below numpy in import-graph centrality — depended on by scikit-learn, statsmodels, scikit-image, networkx, and most domain-specific scientific stacks. Reach for scipy whenever numpy alone is not enough and you need a battle-tested algorithm rather than a hand-rolled one.
Install
pip install scipy
Output: (none — exits 0 on success)
uv add scipy
Output: dependency resolved, lockfile updated; pulls numpy automatically
poetry add scipy
Output: installed into the project venv
pip install scipy --only-binary=:all:
Output: forces a wheel install — avoids accidentally compiling SciPy from source on niche platforms (~30 min build)
Versioning & Python support
scipy follows the SPEC 0 support window (matched with numpy) — the latest three Python minor versions plus the most recent numpy versions. Releases are roughly twice a year; the public API is conservatively versioned with DeprecationWarning one minor before removal.
| SciPy line | Python support | NumPy requirement |
|---|---|---|
| 1.11.x | 3.9 – 3.12 | numpy >= 1.21 |
| 1.13.x | 3.9 – 3.12 | numpy >= 1.22.4, < 2 |
| 1.14.x | 3.10 – 3.13 | numpy >= 1.23.5, supports numpy 2.x |
| 1.15.x+ | 3.10 – 3.13 | numpy 2.x preferred |
Installing scipy can pull numpy upward. Each scipy release declares a minimum numpy version, so if you pip install scipy after pinning numpy==1.21, the resolver will either upgrade numpy or fail.
Package metadata
- Maintainer: SciPy steering council under NumFOCUS sponsorship
- Project home: github.com/scipy/scipy
- Docs: docs.scipy.org
- License: BSD-3-Clause; the binary wheels also bundle third-party runtime libraries under their own licenses (see Gotchas)
- PyPI: pypi.org/project/scipy
- Governance: SciPy Enhancement Proposals (SPEPs); steering council
- First released: 2001 (Travis Oliphant, Pearu Peterson, Eric Jones)
- Downloads: > 100 M / month on PyPI
Optional dependencies & extras
scipy ships no pip extras — submodules are imported as scipy.<subpackage> and are part of the same wheel. Companion packages typically installed alongside:
pip install numpy scipy matplotlib pandas scikit-learn jupyter
Output: installs the analytical / modelling stack
| Submodule | Purpose |
|---|---|
| scipy.optimize | minimisation, curve fitting, root finding |
| scipy.stats | distributions, hypothesis tests, descriptive stats |
| scipy.signal | filters, convolution, spectral analysis (spectrogram) |
| scipy.sparse | sparse matrices and operations |
| scipy.spatial | KD-trees, distance, geometry |
| scipy.integrate | quad, ODE solvers (solve_ivp) |
| scipy.interpolate | splines, RBF, gridded interpolation |
| scipy.linalg | BLAS/LAPACK-backed decompositions and solvers |
| scipy.special | gamma, bessel, erf, … |
| scipy.fft | FFT (preferred over the legacy scipy.fftpack) |
| scipy.io | MATLAB .mat, WAV, NetCDF readers and writers |
Install size is ~50 MB unpacked. That is significantly larger than numpy, mostly because scipy ships far more compiled extension code. The wheels also bundle their own BLAS/LAPACK library.
Alternatives
| Package | One-line trade-off |
|---|---|
| numpy | core arrays; scipy needed for algorithms numpy intentionally omits |
| statsmodels | richer regression / time-series stats than scipy.stats |
| scikit-learn | ML on top of scipy/numpy; partly overlapping (e.g. SVD) |
| pymc | Bayesian modelling; statsmodels/scipy for frequentist |
| jax.scipy | scipy-shaped API with autograd + GPU/TPU |
| cupyx.scipy | scipy mirror running on NVIDIA GPUs |
| sympy | symbolic math; scipy is numerical |
Common gotchas
- Large install (~50 MB unpacked). The wheels bundle their own BLAS/LAPACK library (OpenBLAS on most platforms). CI containers and serverless deployments feel this. Slim images should still install the prebuilt `scipy` wheels rather than building from source with `--no-binary`.
- Licenses of bundled libraries. The scipy source is BSD-3-Clause, but the binary wheels vendor third-party runtime libraries (OpenBLAS and, on some platforms, a Fortran runtime) that carry their own licenses. Most users are not affected. Only those building closed-source redistributions of scipy plus their own code need to audit the bundled license files.
- numpy upgrade coupling. Upgrading scipy frequently upgrades numpy. Pin both in lockfiles.
- `scipy.fftpack` is legacy and `scipy.fft` supersedes it. New code should not import the old one.
- `scipy.misc` is gone. Imageio / Pillow replaced the toy demo helpers years ago. Any tutorial that imports `scipy.misc.imread` is pre-2020.
- Default optimisation tolerances. `scipy.optimize.minimize` defaults can be too loose for ill-conditioned problems. Option names are solver-specific: for example `ftol`/`gtol` for L-BFGS-B and `xatol`/`fatol` for Nelder-Mead. Unknown names only trigger a warning, so check the method's docs. Alternatively, pass the generic `tol=` argument, which sets the chosen solver's own tolerance(s).
- `scipy.stats` per-call overhead. Every `pdf`/`cdf` call re-parses and validates its arguments, frozen or not. Constructing a frozen distribution (`rv = norm(loc=5, scale=2)`) is itself costly. Freeze once outside hot loops, then pass whole arrays to `rv.pdf(x)` instead of calling it once per scalar.
- Build-from-source is painful. On platforms without a wheel (uncommon platforms, niche Python versions), scipy needs a Fortran compiler and a BLAS library, and compiling easily takes 30 minutes. Use `--only-binary=:all:` to fail fast instead.
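The following sketch shows solver-specific option names next to the generic `tol=`. It uses the Rosenbrock test function that ships with `scipy.optimize` (`rosen`, `rosen_der`). The starting point and tolerance values are illustrative, not recommendations.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])  # illustrative starting point

# L-BFGS-B reads its own option names: ftol (relative change in f) and gtol (projected gradient)
tight = minimize(rosen, x0, jac=rosen_der, method="L-BFGS-B",
                 options={"ftol": 1e-12, "gtol": 1e-10})

# The generic tol= keyword lets minimize set the chosen solver's own tolerance(s)
generic = minimize(rosen, x0, jac=rosen_der, method="BFGS", tol=1e-10)

print(tight.x)    # both should land close to the known minimum at all ones
print(generic.x)
```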
Real-world recipes
scipy's submodules cover so much ground that the recipes here are organised by submodule rather than by pipeline. Each shows the packaging-level context — what's pulled in, what trade-offs you're making — rather than re-teaching the API (sections/python/scipy covers the API).
Curve fitting with scipy.optimize.curve_fit:
```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, L, k, x0):
    return L / (1 + np.exp(-k * (x - x0)))

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y_true = logistic(x, 1.0, 1.5, 5.0)
y_obs = y_true + rng.normal(scale=0.05, size=x.size)
(L, k, x0), cov = curve_fit(logistic, x, y_obs, p0=[1.0, 1.0, 5.0])
print(f"L={L:.3f}, k={k:.3f}, x0={x0:.3f}")
```
Output: parameter estimates close to the true (1.0, 1.5, 5.0); cov is the parameter covariance for confidence intervals
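The covariance matrix is what turns point estimates into uncertainties. The minimal sketch below reuses the same synthetic logistic data. The 1.96 multiplier assumes approximately normal parameter estimates, which is an assumption rather than something `curve_fit` guarantees.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, L, k, x0):
    return L / (1 + np.exp(-k * (x - x0)))

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y_obs = logistic(x, 1.0, 1.5, 5.0) + rng.normal(scale=0.05, size=x.size)
popt, pcov = curve_fit(logistic, x, y_obs, p0=[1.0, 1.0, 5.0])

# One-sigma standard errors are the square roots of the covariance diagonal
perr = np.sqrt(np.diag(pcov))
# Rough 95% intervals under a normal approximation (1.96 sigma)
lower, upper = popt - 1.96 * perr, popt + 1.96 * perr
for name, lo, hi in zip(["L", "k", "x0"], lower, upper):
    print(f"{name}: [{lo:.3f}, {hi:.3f}]")
```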
Sparse linear-system solve with scipy.sparse:
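```python
import numpy as np
from scipy.sparse import diags_array
from scipy.sparse.linalg import spsolve

N = 10_000
# Upper-bidiagonal system: 4 on the main diagonal, 1 on the first superdiagonal.
# Built directly in sparse form; the dense N x N matrix is never materialised.
A = diags_array([np.full(N, 4.0), np.ones(N - 1)], offsets=[0, 1], format="csr")
b = np.arange(N, dtype=float)
x = spsolve(A, b)
print(x[:5])
```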
Output: the first 5 values of the solution; the sparse solver handles the 10000x10000 system in milliseconds despite the dense equivalent being 800 MB
Distribution fitting with scipy.stats:
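```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
samples = rng.normal(loc=5.0, scale=2.0, size=10_000)
# Maximum-likelihood fit (the default method); returns (loc, scale) for norm
params = stats.norm.fit(samples)
print(f"mu={params[0]:.3f}, sigma={params[1]:.3f}")
# Goodness of fit: KS test against the fitted normal
ks_stat, ks_p = stats.kstest(samples, "norm", args=params)
print(f"KS={ks_stat:.4f}, p={ks_p:.4f}")
```
Output: estimated parameters close to (5.0, 2.0), plus a KS test statistic and p-value for the null that the sample is normal. Because mu and sigma were estimated from the same sample, this p-value is biased upward, so treat it as a rough check rather than a calibrated test.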
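`scipy.stats.goodness_of_fit` (added in scipy 1.10) corrects for that bias by re-fitting the distribution on every Monte Carlo sample. In the sketch below, the sample size and `n_mc_samples` are reduced only to keep it fast.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
samples = rng.normal(loc=5.0, scale=2.0, size=500)

# Null hypothesis: the data come from *some* normal distribution.
# loc and scale are re-estimated on every simulated sample, so the
# p-value accounts for the fact that they were fitted from the data.
res = stats.goodness_of_fit(stats.norm, samples, statistic="ks", n_mc_samples=999)
print(res.fit_result.params)  # fitted loc and scale
print(res.statistic, res.pvalue)
```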
Signal processing — filter design:
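```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 1000.0                           # sample rate in Hz
t = np.arange(int(fs * 2)) / fs       # 2 s of samples spaced exactly 1/fs apart
sig = np.sin(2 * np.pi * 5 * t)       # 5 Hz tone
noise = np.random.default_rng(0).normal(scale=0.5, size=sig.size)
noisy = sig + noise
# 4th-order Butterworth low-pass at 10 Hz, as second-order sections
sos = butter(N=4, Wn=10, btype="low", fs=fs, output="sos")
clean = sosfiltfilt(sos, noisy)       # zero-phase: filters forward, then backward
print(clean[:5])
```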
Output: the first 5 samples of the low-pass-filtered signal; second-order-sections (sos) format is numerically more stable than b, a for higher-order filters
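Before filtering real data, it is worth checking that the design does what you expect. The sketch below uses `scipy.signal.sosfreqz` to read the magnitude response of the same 4th-order, 10 Hz Butterworth design at a few arbitrary probe frequencies. `sosfiltfilt` runs the filter forward and backward, so the recipe's effective gain is the square of these values.

```python
import numpy as np
from scipy.signal import butter, sosfreqz

fs = 1000.0
sos = butter(N=4, Wn=10, btype="low", fs=fs, output="sos")

# With fs given, worN frequencies are in Hz rather than rad/sample
probe_hz = [5.0, 10.0, 100.0]
w, h = sosfreqz(sos, worN=probe_hz, fs=fs)
gain = np.abs(h)
for f, g in zip(w, gain):
    print(f"{f:6.1f} Hz  gain={g:.4f}")  # about 1/sqrt(2) (-3 dB) at the 10 Hz cutoff
```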
ODE solve with scipy.integrate.solve_ivp:
```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, y, sigma=10.0, rho=28.0, beta=8 / 3):
    x, y_, z = y
    return [sigma * (y_ - x), x * (rho - z) - y_, x * y_ - beta * z]

sol = solve_ivp(lorenz, t_span=(0, 5), y0=[1.0, 1.0, 1.0], dense_output=True, max_step=0.01)
print(sol.t.size, sol.y.shape)
```
Output: number of time steps taken and the (3, N) state trajectory; dense_output=True builds an interpolant for arbitrary-t evaluation
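`dense_output=True` attaches a continuous interpolant, `sol.sol`, so you can evaluate the trajectory at times the solver never stepped on. The sketch below reuses the Lorenz system above with an arbitrary query grid.

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, y, sigma=10.0, rho=28.0, beta=8 / 3):
    x, y_, z = y
    return [sigma * (y_ - x), x * (rho - z) - y_, x * y_ - beta * z]

sol = solve_ivp(lorenz, t_span=(0, 5), y0=[1.0, 1.0, 1.0], dense_output=True, max_step=0.01)

# Evaluate the interpolant on our own grid; result has shape (n_states, n_times)
t_query = np.linspace(0, 5, 11)
states = sol.sol(t_query)
print(sol.success, states.shape)
```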
Performance tuning
scipy performance breaks down into two layers. The first is the BLAS/LAPACK library scipy is linked against, which drives dense linear algebra, decompositions and eigensolvers. The second is the algorithms scipy implements itself: optimisation, integration, statistics and FFT. Tuning differs by submodule.
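```python
import numpy as np
import scipy

np.show_config()     # BLAS/LAPACK numpy was built against
scipy.show_config()  # BLAS/LAPACK scipy was built against; prints directly, no print() needed
```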
Output: the BLAS the build was linked against (OpenBLAS on most wheels, Accelerate on macOS) — confirms whether you have a fast or vendor-default backend
Tuning levers by submodule:
| Submodule | Tuning lever | When it matters |
|---|---|---|
| scipy.linalg | BLAS thread cap (OPENBLAS_NUM_THREADS) | inside parallel CV / multiprocessing |
| scipy.optimize | tol=, options={"maxiter": ...} | ill-conditioned problems |
| scipy.optimize | Provide analytical jac=, hess= | high-dim minimisation |
| scipy.fft | workers=-1 keyword | large FFTs on multi-core |
| scipy.sparse | Pick CSR for row ops, CSC for column ops | avoid format conversion churn |
| scipy.stats | Freeze once outside loops (rv = norm(5, 2)), then call rv.pdf(x) on a whole array | hot loops over a fixed distribution |
| scipy.integrate.solve_ivp | method="LSODA" (automatic stiffness switching), or "Radau" / "BDF" | stiff ODEs |
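The `scipy.stats` row above comes down to two habits: build the frozen object once, and pass it whole arrays. The minimal sketch below uses an arbitrary grid size. It also checks the frozen call against the unfrozen one and against the closed-form normal density.

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-5, 15, 100_000)

# Freeze once, outside any loop: constructing the frozen object is the costly part
rv = norm(loc=5, scale=2)

# One vectorised call over the whole array instead of a Python loop of scalar calls
dens = rv.pdf(x)

# Same numbers as the unfrozen call and the closed-form density with sigma = 2
manual = np.exp(-0.5 * ((x - 5) / 2) ** 2) / (2 * np.sqrt(2 * np.pi))
print(np.allclose(dens, norm.pdf(x, loc=5, scale=2)), np.allclose(dens, manual))
```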
Sparse matrix format gotcha:
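```python
from scipy.sparse import csr_array, csc_array

# CSR stores each row contiguously: row access is cheap, column access is not
A = csr_array([[1, 0, 0], [0, 0, 2], [0, 3, 0]])
print(A[0])      # row slice: one contiguous run of the underlying arrays
print(A[:, 0])   # column slice: works, but has to search every row's column indices
# When column access dominates, convert once and index the CSC copy
A_csc = csc_array(A)
print(A_csc[:, 0])
```
Output: the first row, then the first column (twice). Column slicing on CSR works but scales badly on large matrices. Switch to csc_array, converting once, when column access dominates.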
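CSC suits column access because, after one conversion, each column's row indices and values form a contiguous slice of `indices`/`data`, delimited by `indptr`. In the small sketch below, `column_entries` is a hypothetical helper written for illustration, not a scipy function.

```python
from scipy.sparse import csr_array

A = csr_array([[1, 0, 0], [0, 0, 2], [0, 3, 0]])
A_csc = A.tocsc()  # pay the format conversion once, up front

def column_entries(M, j):
    """Hypothetical helper: row indices and values stored for column j of a CSC array."""
    start, stop = M.indptr[j], M.indptr[j + 1]
    return M.indices[start:stop], M.data[start:stop]

rows, vals = column_entries(A_csc, 1)
print(rows, vals)  # column 1 holds one entry: the 3 in row 2
```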
Memory & dataset-size scaling
scipy's algorithms are generally in-RAM. The main scaling stories are sparse representations (massive memory reduction for matrices with mostly zeros) and chunked spectral / signal processing.
```python
import numpy as np
from scipy.sparse import csr_array

N = 1_000_000
nnz = 3_000_000
rng = np.random.default_rng(0)
data = np.ones(nnz)
row = rng.integers(0, N, nnz)
col = rng.integers(0, N, nnz)
A = csr_array((data, (row, col)), shape=(N, N))  # duplicate (row, col) pairs are summed
# CSR memory = values + column indices + row pointers
sparse_bytes = A.data.nbytes + A.indices.nbytes + A.indptr.nbytes
print(f"sparse: {sparse_bytes / 1e6:.1f} MB, dense would be: {N * N * 8 / 1e9:.0f} GB")
```
Output: the sparse matrix's total footprint (tens of MB) vs the 8000 GB (8 TB) dense equivalent. Sparse storage is the only way this fits in any RAM at all.
For genuinely huge problems:
- Iterative linear solvers (`scipy.sparse.linalg.cg`, `gmres`, `bicgstab`) for systems too big for direct factorisation.
- Partial eigensolvers: `scipy.sparse.linalg.eigsh` computes a few eigenpairs of a large sparse matrix (or `LinearOperator`) when a full dense eigendecomposition would not fit in memory.
- Chunked `scipy.signal` filtering: process audio / time series in blocks rather than loading the full waveform, carrying the filter state between blocks (the `zi` argument of `sosfilt` / `lfilter`).
- PyAMG, petsc4py: external libraries for very large sparse problems; pip-installable alongside scipy.
scipy does not have a streaming or out-of-core model of its own; the path past one node is usually a domain-specific library (PETSc, Trilinos) or hand-rolled chunking.
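Hand-rolled chunking for causal IIR filtering is mostly bookkeeping: pass the final filter state of one block as the initial state (`zi`) of the next. The sketch below uses a made-up block size and checks that streaming reproduces a single whole-array `sosfilt` call. Zero-phase `sosfiltfilt` also runs backwards over the whole signal, so it cannot be streamed this way.

```python
import numpy as np
from scipy.signal import butter, sosfilt

fs = 1000.0
sos = butter(N=4, Wn=10, btype="low", fs=fs, output="sos")
samples = np.random.default_rng(0).normal(size=10_000)

# Reference: filter the whole array in one call (zero initial state by default)
full = sosfilt(sos, samples)

# Streaming: filter fixed-size blocks, carrying the filter state across blocks
block_size = 1_000  # illustrative; any size works
zi = np.zeros((sos.shape[0], 2))  # one 2-element state per second-order section
pieces = []
for start in range(0, samples.size, block_size):
    out, zi = sosfilt(sos, samples[start:start + block_size], zi=zi)
    pieces.append(out)
streamed = np.concatenate(pieces)

print(np.allclose(full, streamed))  # chunking does not change the output
```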
Version migration guide
scipy ships roughly twice a year. The recent breaks worth knowing about:
1.10 → 1.11:
- Dropped Python 3.8 support.
- `scipy.misc` (already empty) finally removed.
- Several `scipy.stats` API tightenings (keyword-only arguments).
1.11 → 1.13:
- `scipy.fftpack` remains legacy; new code should always use `scipy.fft`.
- `scipy.sparse` has both a matrix-style (`csr_matrix`) and an array-style (`csr_array`) API. New code should use `csr_array`, which follows NumPy's `ndarray` semantics rather than the legacy MATLAB-style matrix semantics.
1.13 → 1.14:
- Full NumPy 2.x compatibility.
- Some `scipy.stats` distribution methods became keyword-only (`stats.norm.fit(data, loc=...)`).
1.14 → 1.15:
- Several optimisation method names tightened.
- Use `scipy.linalg.solve(..., assume_a="pos")` for the symmetric-positive-definite path (faster than the general solver).
```python
from scipy.sparse import csr_matrix, csr_array

# Legacy matrix-style (avoid in new code)
m = csr_matrix([[1, 0], [0, 2]])
print(type(m), m * m)  # legacy: `*` is matrix multiplication

# Modern array-style
a = csr_array([[1, 0], [0, 2]])
print(type(a), a @ a, a * a)  # `@` is matmul; `*` is element-wise
```
Output: the matrix prints with csr_matrix-style repr; the array prints with csr_array-style and supports the same operator semantics as numpy
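Migrating piecemeal is easier because the two APIs convert into each other cheaply. In the sketch below, the matrix is arbitrary but non-diagonal, so the matrix product and the element-wise product actually differ.

```python
import numpy as np
from scipy.sparse import csr_array, csr_matrix

dense = np.array([[1, 2], [0, 3]])
legacy = csr_matrix(dense)
modern = csr_array(legacy)       # convert an existing csr_matrix to the array API
round_trip = csr_matrix(modern)  # and back, e.g. for code that still expects a matrix

# Matrix product: legacy `*` vs modern `@`, both [[1, 8], [0, 9]]
print((legacy * legacy).toarray())
print((modern @ modern).toarray())
# Element-wise product: modern `*` vs legacy .multiply(), both [[1, 4], [0, 9]]
print((modern * modern).toarray())
print(legacy.multiply(legacy).toarray())
```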
Pin scipy in production. Pickled objects that contain scipy types (sparse matrices, or saved scikit-learn models that hold them) are not guaranteed to load correctly across scipy minor versions.
Interop with adjacent ecosystems
scipy lives upstream of most of the scientific stack. The main interop concerns are sparse-matrix exchange and the Array API.
| Library | scipy → other | other → scipy | Zero-copy? |
|---|---|---|---|
| numpy | n/a — scipy IS numpy underneath | n/a | Yes (same buffers) |
| scikit-learn | sklearn accepts scipy.sparse directly | sklearn returns numpy arrays | Yes |
| pandas | pd.DataFrame.sparse.from_spmatrix(m) | df.sparse.to_coo() | Partial |
| networkx | nx.from_scipy_sparse_array(A) | nx.to_scipy_sparse_array(G) | Copy |
| pytorch | torch.from_numpy(A.toarray()) | dense round-trip | Copy |
| jax | jnp.asarray(A.toarray()) | dense | Copy |
| matlab (.mat) | scipy.io.savemat / loadmat | n/a | Copy |
```python
import numpy as np
from scipy.sparse import csr_array
from sklearn.linear_model import LogisticRegression

# 1000 documents x 50000 vocab: ~400 MB as dense float64, well under 1 MB as sparse
rng = np.random.default_rng(0)
nnz = 10_000
vals = rng.normal(size=nnz)
rows = rng.integers(1000, size=nnz)
cols = rng.integers(50_000, size=nnz)
X = csr_array((vals, (rows, cols)), shape=(1000, 50_000))
y = rng.integers(0, 2, size=1000)
model = LogisticRegression(max_iter=1000).fit(X, y)
print(model.coef_.shape)
```
Output: shape (1, 50000). sklearn consumed the sparse matrix without densifying it, which avoids a 400 MB dense array made almost entirely of zeros.
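The MATLAB row of the interop table is a plain file exchange. The sketch below round-trips an array through an in-memory `.mat` buffer, with `io.BytesIO` standing in for a real file path. By default `savemat` stores 1-D arrays as 1 x N row vectors, so a 2-D array is used here to keep the shape unchanged.

```python
import io

import numpy as np
from scipy.io import loadmat, savemat

X = np.arange(6.0).reshape(2, 3)

buf = io.BytesIO()           # stands in for a path like "data.mat"
savemat(buf, {"X": X})       # the dict keys become MATLAB variable names
buf.seek(0)                  # rewind before reading back
contents = loadmat(buf)      # dict of arrays, plus __header__ / __version__ / __globals__ metadata
print(contents["X"])
```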
Troubleshooting common errors
The errors below cover the recurring frictions; most are about scipy assumptions (BLAS, sparse format) rather than the API itself.
- `ImportError: cannot import name 'imread' from 'scipy.misc'`: `scipy.misc` was removed; use `imageio.v3.imread` instead.
- `OptimizeWarning: Covariance of the parameters could not be estimated`: the fit is degenerate or the initial `p0=` was too far off. Try better initial guesses or a more robust method (`method="trf"`).
- `SparseEfficiencyWarning` when mixing CSR/CSC operations: an implicit format conversion or an expensive change to the sparsity structure. Convert once explicitly (`A.tocsc()`) or stick to one format throughout.
- `LinAlgError: SVD did not converge`: the matrix is ill-conditioned or contains NaN/inf values. Check `np.isfinite(A).all()`, inspect `np.linalg.cond(A)`, and add regularisation (Tikhonov) if it is ill-conditioned.
- Slow `scipy.stats` calls: you are creating the frozen object inside a hot loop, or calling it once per scalar. Create `rv = norm(loc=5, scale=2)` once, then call `rv.pdf(x)` on a whole array.
- `solve_ivp` returns `success=False`: the adaptive step failed. Try `method="LSODA"` (automatic stiff/non-stiff switching), an implicit method such as `"Radau"` or `"BDF"`, or relax `rtol`/`atol`.
- `scipy.signal.lfilter` numerically unstable on high-order filters: switch to second-order sections (`output="sos"` at design time, then `sosfilt`, or `sosfiltfilt` for zero-phase filtering).
- scipy starts building from source: no wheel matched. Use `pip install scipy --only-binary=:all:` to fail fast, then investigate why no wheel matched your platform.
- Old numpy with scipy 1.14+: scipy 1.14 requires numpy >= 1.23.5. Pin both.
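The sketch below shows the Tikhonov fix for the `LinAlgError` entry, posed as an augmented least-squares problem so the normal equations (which square the condition number) are never formed. The Hilbert matrix is a standard ill-conditioned test case. `lam` is an arbitrary illustrative value that would need tuning on real data.

```python
import numpy as np
from scipy.linalg import hilbert, lstsq

n = 12
H = hilbert(n)           # notoriously ill-conditioned test matrix
b = H @ np.ones(n)
print(f"cond(H) = {np.linalg.cond(H):.1e}")

lam = 1e-8               # illustrative regularisation strength; tune per problem
# Tikhonov as augmented least squares:
#   minimise ||H x - b||^2 + lam * ||x||^2  ==  ||[H; sqrt(lam) I] x - [b; 0]||^2
A_aug = np.vstack([H, np.sqrt(lam) * np.eye(n)])
b_aug = np.concatenate([b, np.zeros(n)])
x, *_ = lstsq(A_aug, b_aug)

print(np.linalg.norm(H @ x - b), np.linalg.norm(x))  # small residual, bounded solution
```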
When NOT to use this
scipy is the right answer most of the time it's reached for; the cases below are where a specialised library wins.
- Pure statistics with R-style formulas: statsmodels, e.g. `smf.ols("y ~ x1 + x2", data=df)`, plus richer regression diagnostics.
- Bayesian modelling: pymc, NumPyro. scipy.stats is frequentist.
- GPU computation: CuPy + cupyx.scipy, or JAX (`jax.scipy`).
- Symbolic math: SymPy.
- Production ML: scikit-learn directly; scipy is the foundation, sklearn is the layered API.
- Distributed compute: scipy is single-process. Use dask-glm, dask-ml, or ray.
See also
- sections/python/scipy — full API tutorial (stats, optimize, signal, sparse)
- sections/python/numpy — the array foundation scipy is built on
- sections/python/scikit-learn — ML on top of scipy
- sections/packages-pip/pip-numpy — sibling foundation
- sections/packages-pip/pip-scikit-learn — downstream consumer