cheat sheet

scipy

Package-level reference for scipy — install, versioning, submodules, license caveats, and gotchas. Optimization, statistics, signal processing, and linear algebra.

What it is

scipy is the scientific-computing companion to numpy — a sibling project under the same NumFOCUS umbrella. It bundles production implementations of algorithms numpy intentionally does not ship: numerical optimization (scipy.optimize), statistics (scipy.stats), signal processing (scipy.signal), sparse matrices (scipy.sparse), interpolation, integration, FFTs, and special functions.

On PyPI scipy sits one rung below numpy in import-graph centrality — depended on by scikit-learn, statsmodels, scikit-image, networkx, and most domain-specific scientific stacks. Reach for scipy whenever numpy alone is not enough and you need a battle-tested algorithm rather than a hand-rolled one.

Install

bash
pip install scipy

Output: (none — exits 0 on success)

bash
uv add scipy

Output: dependency resolved, lockfile updated; pulls numpy automatically

bash
poetry add scipy

Output: installed into the project venv

bash
pip install scipy --only-binary=:all:

Output: forces a wheel install — avoids accidentally compiling SciPy from source on niche platforms (~30 min build)

Versioning & Python support

scipy follows the SPEC 0 support window in step with numpy: Python versions are supported for roughly three years after their release and numpy versions for roughly two. Feature releases land about twice a year; the public API changes conservatively, typically with a DeprecationWarning emitted for at least two releases before a removal.

SciPy line | Python support | NumPy requirement
1.11.x | 3.9 – 3.12 | numpy >= 1.21.6
1.13.x | 3.9 – 3.12 | numpy >= 1.22.4; first line to support numpy 2.0
1.14.x | 3.10 – 3.13 | numpy >= 1.23.5; supports numpy 2.x
1.15.x+ | 3.10 – 3.13 | numpy 2.x preferred

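Installing scipy can force a numpy change: if you pip install scipy after pinning numpy==1.21, the pin sits below the minimum that current scipy releases declare, so the resolver will upgrade numpy, fall back to an older scipy, or fail with a conflict.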
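To see how an installed pair lines up, here is a minimal sketch that prints both versions and the numpy requirement the installed scipy declares. It assumes scipy was installed with its packaging metadata intact; importlib.metadata.requires can return None otherwise, which the sketch treats as an empty list.

```python
from importlib.metadata import requires

import numpy as np
import scipy

print(f"scipy {scipy.__version__}, numpy {np.__version__}")

# Requirement strings declared by the installed scipy distribution,
# filtered to the ones that mention numpy (these carry the version bounds).
declared = requires("scipy") or []
numpy_reqs = [r for r in declared if r.lower().startswith("numpy")]
print(numpy_reqs)
```

Comparing the printed bounds with the numpy pin in a lockfile is a quick way to spot a conflict before the resolver does.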

Package metadata

  • Maintainer: SciPy steering council under NumFOCUS sponsorship
  • Project home: github.com/scipy/scipy
  • Docs: docs.scipy.org
  • License: BSD-3-Clause (core); a few bundled algorithms have GPL/LGPL upstream (see Gotchas)
  • PyPI: pypi.org/project/scipy
  • Governance: SciPy Enhancement Proposals (SPEPs); steering council
  • First released: 2001 (Travis Oliphant, Pearu Peterson, Eric Jones)
  • Downloads: > 100 M / month on PyPI

Optional dependencies & extras

scipy ships no pip extras — submodules are imported as scipy.<subpackage> and are part of the same wheel. Companion packages typically installed alongside:

bash
pip install numpy scipy matplotlib pandas scikit-learn jupyter

Output: installs the analytical / modelling stack

Submodule | Purpose
scipy.optimize | minimisation, curve fitting, root finding
scipy.stats | distributions, hypothesis tests, descriptive stats
scipy.signal | filters, FFT, spectrogram
scipy.sparse | sparse matrices and operations
scipy.spatial | KD-trees, distance, geometry
scipy.integrate | quad, ODE solvers (solve_ivp)
scipy.interpolate | splines, RBF, gridded interpolation
scipy.linalg | LAPACK wrappers; lazy-loaded BLAS
scipy.special | gamma, bessel, erf, …
scipy.fft | FFT (preferred over the deprecated scipy.fftpack)
scipy.io | MATLAB .mat, WAV, NetCDF readers

Install size is ~50 MB unpacked — significantly larger than numpy because BLAS/LAPACK are statically linked.

Alternatives

Package | One-line trade-off
numpy | core arrays; scipy needed for algorithms numpy intentionally omits
statsmodels | richer regression / time-series stats than scipy.stats
scikit-learn | ML on top of scipy/numpy; partly overlapping (e.g. SVD)
pymc | Bayesian modelling; statsmodels/scipy for frequentist
jax.scipy | scipy-shaped API with autograd + GPU/TPU
cupyx.scipy | scipy mirror running on NVIDIA GPUs
sympy | symbolic math; scipy is numerical

Common gotchas

  • Large install (~50 MB unpacked). Includes a statically linked BLAS/LAPACK. CI containers and serverless deployments feel this; on slim images, install a prebuilt wheel (--only-binary=:all:) rather than building from source with --no-binary.
  • GPL/LGPL submodule corners. A handful of algorithms wrap GPL or LGPL upstream code (some optimisation routines and special functions historically). The scipy wheel itself stays BSD-3 but if you re-ship scipy plus your code, audit the license report. Most users are not affected — only those building closed-source redistributions need to check.
  • numpy upgrade coupling. Upgrading scipy frequently upgrades numpy. Pin both in lockfiles.
  • scipy.fftpack is legacy, superseded by scipy.fft. New code should not import the old module.
  • scipy.misc is gone. Imageio / Pillow replaced the toy demo helpers years ago. Any tutorial that imports scipy.misc.imread is pre-2020.
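  • Default optimisation tolerances. scipy.optimize.minimize defaults are loose for ill-conditioned problems. Tolerance options are method-specific: ftol and gtol for L-BFGS-B, xtol for Newton-CG, xatol and fatol for Nelder-Mead. Pass the ones your method accepts, e.g. options={"ftol": 1e-10} with method="L-BFGS-B", or use the generic tol= argument; option names a method does not know are typically reported with an OptimizeWarning rather than applied.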
  • scipy.stats distributions are slow to instantiate. norm.pdf(x, loc=5, scale=2) re-parses keyword arguments every call. Freeze the distribution: rv = norm(loc=5, scale=2) then rv.pdf(x) for hot loops.
  • Build-from-source is painful. On platforms without a wheel (uncommon platforms, niche Python versions), scipy needs a Fortran compiler and a BLAS library — easily 30 minutes to compile. Use --only-binary=:all: to fail fast instead.
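The tolerance gotcha above can be sketched on the Rosenbrock test function that ships with scipy.optimize. The choice of L-BFGS-B and the specific tolerance values are illustrative assumptions, not recommendations.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

x0 = np.array([-1.2, 1.0])

# Default tolerances for L-BFGS-B.
loose = minimize(rosen, x0, jac=rosen_der, method="L-BFGS-B")

# Tighter, method-specific tolerances: L-BFGS-B understands ftol and gtol.
tight = minimize(
    rosen, x0, jac=rosen_der, method="L-BFGS-B",
    options={"ftol": 1e-12, "gtol": 1e-10, "maxiter": 1000},
)

print(loose.fun, tight.fun)
print(tight.x)
```

Supplying the analytical gradient (jac=rosen_der) keeps the comparison about tolerances rather than finite-difference noise.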

Real-world recipes

scipy's submodules cover so much ground that the recipes here are organised by submodule rather than by pipeline. Each shows the packaging-level context — what's pulled in, what trade-offs you're making — rather than re-teaching the API (sections/python/scipy covers the API).

Curve fitting with scipy.optimize.curve_fit:

python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, L, k, x0):
    return L / (1 + np.exp(-k * (x - x0)))

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y_true = logistic(x, 1.0, 1.5, 5.0)
y_obs = y_true + rng.normal(scale=0.05, size=x.size)

(L, k, x0), cov = curve_fit(logistic, x, y_obs, p0=[1.0, 1.0, 5.0])
print(f"L={L:.3f}, k={k:.3f}, x0={x0:.3f}")

Output: parameter estimates close to the true (1.0, 1.5, 5.0); cov is the parameter covariance for confidence intervals
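As a follow-up to the covariance mentioned above, a minimal sketch of turning it into standard errors and rough intervals. The 1.96 multiplier assumes approximately normal parameter estimates, which is an assumption of this sketch rather than something curve_fit guarantees.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, L, k, x0):
    return L / (1 + np.exp(-k * (x - x0)))

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y_obs = logistic(x, 1.0, 1.5, 5.0) + rng.normal(scale=0.05, size=x.size)

popt, pcov = curve_fit(logistic, x, y_obs, p0=[1.0, 1.0, 5.0])

# One-sigma standard errors are the square roots of the covariance diagonal.
perr = np.sqrt(np.diag(pcov))

# Rough 95% intervals under a normal approximation (an assumption, not a
# guarantee for strongly nonlinear models or small samples).
for name, est, err in zip(["L", "k", "x0"], popt, perr):
    print(f"{name} = {est:.3f} +/- {1.96 * err:.3f}")
```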

Sparse linear-system solve with scipy.sparse:

python
import numpy as np
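from scipy.sparse import diags
from scipy.sparse.linalg import spsolve

N = 10_000
# Build the bidiagonal system directly in sparse form; going through np.diag
# would allocate the full 10000x10000 dense matrix first.
A = diags([4.0, 1.0], offsets=[0, 1], shape=(N, N), format="csr")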
b = np.arange(N, dtype=float)
x = spsolve(A, b)
print(x[:5])

Output: the first 5 values of the solution; the sparse solver handles the 10000x10000 system in milliseconds despite the dense equivalent being 800 MB

Distribution fitting with scipy.stats:

python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
samples = rng.normal(loc=5.0, scale=2.0, size=10_000)

# Maximum-likelihood fit (for the normal: sample mean and standard deviation)
params = stats.norm.fit(samples)
print(f"mu={params[0]:.3f}, sigma={params[1]:.3f}")

# Goodness of fit
ks_stat, ks_p = stats.kstest(samples, "norm", args=params)
print(f"KS={ks_stat:.4f}, p={ks_p:.4f}")

Output: estimated parameters close to (5.0, 2.0) plus a KS test statistic and p-value for the null that the sample is normal; because the parameters were fitted to the same sample, the p-value is optimistic

Signal processing — filter design:

python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 1000.0
sig = np.sin(2 * np.pi * 5 * np.arange(int(fs * 2)) / fs)  # samples spaced exactly 1/fs apart
noise = np.random.default_rng(0).normal(scale=0.5, size=sig.size)
noisy = sig + noise

sos = butter(N=4, Wn=10, btype="low", fs=fs, output="sos")
clean = sosfiltfilt(sos, noisy)
print(clean[:5])

Output: the first 5 samples of the low-pass-filtered signal; second-order-sections (sos) format is numerically more stable than b, a for higher-order filters

ODE solve with scipy.integrate.solve_ivp:

python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, y, sigma=10.0, rho=28.0, beta=8 / 3):
    x, y_, z = y
    return [sigma * (y_ - x), x * (rho - z) - y_, x * y_ - beta * z]

sol = solve_ivp(lorenz, t_span=(0, 5), y0=[1.0, 1.0, 1.0], dense_output=True, max_step=0.01)
print(sol.t.size, sol.y.shape)

Output: number of time steps taken and the (3, N) state trajectory; dense_output=True builds an interpolant for arbitrary-t evaluation
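The interpolant mentioned above is exposed as sol.sol. A minimal sketch of evaluating it at times the solver did not step on; the query times are arbitrary illustrative values.

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, y, sigma=10.0, rho=28.0, beta=8 / 3):
    x, y_, z = y
    return [sigma * (y_ - x), x * (rho - z) - y_, x * y_ - beta * z]

sol = solve_ivp(lorenz, t_span=(0, 5), y0=[1.0, 1.0, 1.0],
                dense_output=True, max_step=0.01)

# sol.sol is a callable interpolant over the integrated time span.
t_query = np.array([0.123, 2.5, 4.999])
states = sol.sol(t_query)
print(states.shape)  # one column per query time
```

Without dense_output=True, sol.sol is None and only the solver's own time points in sol.t are available (or those requested through t_eval).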

Performance tuning

scipy performance breaks down into two layers: the BLAS/LAPACK build that scipy links against (linear algebra, eigensolvers) and scipy's own algorithm code (optimisation, integration, statistics, FFT). Tuning differs by submodule.

python
import numpy as np
import scipy

np.show_config()
scipy.show_config()  # prints directly and returns None, so no print() wrapper

Output: the BLAS the build was linked against (OpenBLAS on most wheels, Accelerate on macOS) — confirms whether you have a fast or vendor-default backend

Tuning levers by submodule:

Submodule | Tuning lever | When it matters
scipy.linalg | BLAS thread cap (OPENBLAS_NUM_THREADS) | inside parallel CV / multiprocessing
scipy.optimize | tol=, options={"maxiter": ...} | ill-conditioned problems
scipy.optimize | Provide analytical jac=, hess= | high-dim minimisation
scipy.fft | workers=-1 keyword | large FFTs on multi-core
scipy.sparse | Pick CSR for row ops, CSC for column ops | avoid format conversion churn
scipy.stats | Freeze distributions: rv = norm(5, 2); rv.pdf(x) | hot loops over a fixed distribution
scipy.integrate.solve_ivp | method="LSODA" adaptive | stiff ODEs
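Two of the levers in the table, sketched together: freezing a distribution and passing workers= to scipy.fft. The array sizes are arbitrary; any speed-up depends on the machine and on how large or batched the transform is.

```python
import numpy as np
import scipy.fft
from scipy.stats import norm

x = np.linspace(-5.0, 15.0, 1001)

# Freeze once, reuse many times: the parameters are processed a single time.
rv = norm(loc=5, scale=2)
pdf_frozen = rv.pdf(x)
pdf_direct = norm.pdf(x, loc=5, scale=2)  # same values, more per-call overhead

# workers=-1 asks scipy.fft to use all available cores.
# The benefit shows up mainly for large or batched transforms.
signal = np.random.default_rng(0).normal(size=(64, 4096))
spectrum = scipy.fft.fft(signal, axis=-1, workers=-1)

print(np.allclose(pdf_frozen, pdf_direct), spectrum.shape)
```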

Sparse matrix format gotcha:

python
from scipy.sparse import csr_array, csc_array

# CSR is fast for row slicing, slow for column slicing
A = csr_array([[1, 0, 0], [0, 0, 2], [0, 3, 0]])
print(A[0])         # row slice — fast
print(A[:, 0])      # column slice: works, but CSR has to scan every row

Output: the first row, then the first column. The column slice is correct but slower on CSR; convert once with A.tocsc() (or build a csc_array) when column access dominates.

Memory & dataset-size scaling

scipy's algorithms are generally in-RAM. The main scaling stories are sparse representations (massive memory reduction for matrices with mostly zeros) and chunked spectral / signal processing.

python
import numpy as np
from scipy.sparse import csr_array

N = 1_000_000
data = np.ones(3_000_000)
row = np.random.default_rng(0).integers(0, N, 3_000_000)
col = np.random.default_rng(1).integers(0, N, 3_000_000)
A = csr_array((data, (row, col)), shape=(N, N))
print(f"sparse: {A.data.nbytes / 1e6:.1f} MB, dense would be: {N * N * 8 / 1e9:.0f} GB")

Output: the sparse matrix's data footprint vs the (8 TB) dense equivalent — sparse storage is the only way this fits in any RAM at all
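The figure above counts only the stored values. A minimal sketch of the full CSR footprint, which also includes the column indices and row pointers; it uses a smaller N so it runs quickly.

```python
import numpy as np
from scipy.sparse import csr_array

N = 100_000
nnz = 300_000
rng = np.random.default_rng(0)
row = rng.integers(0, N, nnz)
col = rng.integers(0, N, nnz)
A = csr_array((np.ones(nnz), (row, col)), shape=(N, N))

# CSR keeps three arrays: values, column indices, and row pointers.
total_bytes = A.data.nbytes + A.indices.nbytes + A.indptr.nbytes
print(f"data:    {A.data.nbytes / 1e6:.2f} MB")
print(f"indices: {A.indices.nbytes / 1e6:.2f} MB")
print(f"indptr:  {A.indptr.nbytes / 1e6:.2f} MB")
print(f"total:   {total_bytes / 1e6:.2f} MB for {A.nnz} stored entries")
```

Duplicate (row, col) pairs are summed during construction, so A.nnz can come out slightly below the number of triplets passed in.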

For genuinely huge problems:

  • Iterative linear solvers (scipy.sparse.linalg.cg, gmres, bicgstab) for systems too big for direct factorisation.
  • Partial eigensolvers (scipy.sparse.linalg.eigsh) for a few eigenpairs of a large sparse matrix when a full decomposition would not fit. These still run in memory.
  • scipy.signal chunked filtering — process audio / time series in overlapping blocks rather than loading the full waveform.
  • PyAMG, petsc4py — external libraries for very large sparse problems; pip-installable alongside scipy.
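As an illustration of the iterative-solver route in the list above, a minimal conjugate-gradient sketch. The tridiagonal test matrix is a made-up example chosen because it is symmetric positive definite, which cg requires.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg

N = 100_000
# Symmetric, diagonally dominant tridiagonal matrix (hence positive definite).
A = diags([-1.0, 4.0, -1.0], offsets=[-1, 0, 1], shape=(N, N), format="csr")
b = np.ones(N)

x, info = cg(A, b)  # info == 0 signals convergence at the default tolerance
residual = np.linalg.norm(A @ x - b) / np.linalg.norm(b)
print(info, residual)
```

For non-symmetric systems, gmres or bicgstab take the same (A, b) arguments and return the same (x, info) pair.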

scipy does not have a streaming or out-of-core model of its own; the path past one node is usually a domain-specific library (PETSc, Trilinos) or hand-rolled chunking.
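One form of hand-rolled chunking can be sketched with scipy.signal.sosfilt. Note that this swaps in causal filtering with carried-over filter state rather than the overlapping-block approach listed above: the state zi returned by one call is passed to the next, so the chunked result matches a single pass over the whole signal. The chunk count and filter settings are illustrative.

```python
import numpy as np
from scipy.signal import butter, sosfilt

fs = 1000.0
sos = butter(N=4, Wn=10, btype="low", fs=fs, output="sos")

rng = np.random.default_rng(0)
signal = rng.normal(size=10_000)

# Filter state: one row of two delay values per second-order section.
zi = np.zeros((sos.shape[0], 2))

chunks = []
for block in np.array_split(signal, 10):   # stand-in for blocks read from disk
    filtered, zi = sosfilt(sos, block, zi=zi)
    chunks.append(filtered)

chunked = np.concatenate(chunks)
full = sosfilt(sos, signal)
print(np.allclose(chunked, full))
```

Zero-phase filtering (sosfiltfilt) runs forward and backward over the data, so it cannot be streamed this way; that case needs overlapping blocks with the overlap discarded.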

Version migration guide

scipy ships roughly twice a year. The recent breaks worth knowing about:

1.10 → 1.11:

  • Dropped Python 3.8 support.
  • scipy.misc (already empty) finally removed.
  • Several scipy.stats API tightenings (keyword-only arguments).

1.11 → 1.13:

  • NumPy 2.0 support arrived in 1.13, which still runs on numpy 1.22.4 and later.
  • scipy.fftpack deprecation accelerated; new code should always use scipy.fft.
  • scipy.sparse now has both a matrix-style (csr_matrix) and an array-style (csr_array) API. New code should use csr_array, which follows NumPy's ndarray semantics rather than the legacy MATLAB-style matrix semantics.

1.13 → 1.14:

  • Dropped Python 3.9 support; minimum numpy raised to 1.23.5. NumPy 2.x remains supported.
  • Some scipy.stats distribution methods became keyword-only (stats.norm.fit(data, loc=...)).

1.14 → 1.15:

  • Several optimisation method names tightened.
  • scipy.linalg.solve(..., assume_a="pos") for the symmetric-positive-definite path (faster than the general solver).

Matrix-style versus array-style operator semantics:

python
from scipy.sparse import csr_matrix, csr_array

# Legacy matrix-style (avoid in new code)
m = csr_matrix([[1, 0], [0, 2]])
print(type(m), m * m)            # legacy: `*` is matrix multiplication

# Modern array-style
a = csr_array([[1, 0], [0, 2]])
print(type(a), a @ a, a * a)     # `@` is matmul; `*` is element-wise

Output: the matrix prints with csr_matrix-style repr; the array prints with csr_array-style and supports the same operator semantics as numpy

Pin scipy in production: pre-1.0 scipy versions are gone from most index mirrors, and within the post-1.0 line, pickles that embed scipy objects (saved sklearn models, for instance) are not guaranteed to load after a minor-version change.

Interop with adjacent ecosystems

scipy lives upstream of most of the scientific stack. The main interop concerns are sparse-matrix exchange and the Array API.

Library | scipy → other | other → scipy | Zero-copy?
numpy | n/a (scipy is built on numpy arrays) | n/a | Yes (same buffers)
scikit-learn | sklearn accepts scipy.sparse directly | sklearn returns numpy arrays | Yes
pandas | pd.DataFrame.sparse.from_spmatrix(m) | df.sparse.to_coo() | Partial
networkx | nx.from_scipy_sparse_array(A) | nx.to_scipy_sparse_array(G) | Copy
pytorch | torch.from_numpy(A.toarray()) | dense round-trip | Copy
jax | jnp.asarray(A.toarray()) | dense | Copy
matlab (.mat) | scipy.io.savemat / loadmat | n/a | Copy

python
from scipy.sparse import csr_array
from sklearn.linear_model import LogisticRegression
import numpy as np

# 1000 documents x 50000 vocab — way too big as dense, fine as sparse
rng = np.random.default_rng(0)
nnz = 10_000
X = csr_array((rng.normal(size=nnz), (rng.integers(1000, size=nnz), rng.integers(50_000, size=nnz))), shape=(1000, 50_000))
y = rng.integers(0, 2, size=1000)

model = LogisticRegression(max_iter=1000).fit(X, y)
print(model.coef_.shape)

Output: shape (1, 50000) — sklearn consumed the sparse matrix without densifying it, which is the difference between fits-in-RAM and 400 MB of zeros
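For the .mat row of the interop table, a minimal round-trip sketch. It uses an in-memory buffer in place of a file on disk, and the "weights" key is a made-up name for illustration.

```python
import io

import numpy as np
from scipy.io import loadmat, savemat

payload = {"weights": np.arange(6, dtype=float).reshape(2, 3)}

# savemat and loadmat accept file-like objects, so a BytesIO buffer
# stands in for a .mat file on disk.
buffer = io.BytesIO()
savemat(buffer, payload)
buffer.seek(0)

restored = loadmat(buffer)
print(restored["weights"])
```

loadmat returns a dict that also carries metadata keys such as __header__, and MATLAB's conventions mean 1-D arrays come back as 2-D, so index by variable name and check shapes after loading.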

Troubleshooting common errors

The errors below cover the recurring frictions; most are about scipy assumptions (BLAS, sparse format) rather than the API itself.

  • ImportError: cannot import name 'imread' from 'scipy.misc': scipy.misc was removed; use imageio.v3.imread instead.
  • OptimizeWarning: Covariance of the parameters could not be estimated — your fit is degenerate or your initial p0= was too far off. Try better initial guesses or use a more robust method (method="trf").
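  • SparseEfficiencyWarning: usually raised when you change the sparsity structure of a CSR/CSC matrix (assigning new non-zeros) or hand a solver a format it must convert. Build in LIL or COO and convert once explicitly (A.tocsr() or A.tocsc()), then stick to one format throughout.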
  • LinAlgError: SVD did not converge — ill-conditioned matrix. Inspect with np.linalg.cond(A); add regularisation (Tikhonov).
  • scipy.stats distribution slow — you are creating the frozen object inside a hot loop. rv = norm(loc=5, scale=2) once, then rv.pdf(x).
  • solve_ivp returns success=False — adaptive step failed. Try method="LSODA" (stiff/non-stiff auto-switching) or relax rtol/atol.
  • scipy.signal.lfilter numerically unstable on high-order filters: design with output="sos" and switch to sosfilt (causal) or sosfiltfilt (zero-phase) with second-order sections.
  • scipy build-from-source begins — wheel missing. Use pip install scipy --only-binary=:all: to fail fast, then investigate why no wheel matched your platform.
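  • Mixing numpy and scipy versions: each scipy line declares a numpy range. scipy 1.14 needs numpy >= 1.23.5 and works with both 1.x and 2.x, while scipy releases older than 1.13 do not work with numpy 2.x. Pin both.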
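The ill-conditioning advice in the list above can be sketched as follows: inspect the condition number, then solve a Tikhonov-regularised version of the system. The Hilbert matrix and the value of lam are illustrative assumptions; a real problem needs its own choice of regularisation strength.

```python
import numpy as np
from scipy.linalg import hilbert, solve

A = hilbert(8)                 # a classic ill-conditioned test matrix
b = A @ np.ones(8)

print(f"cond(A) = {np.linalg.cond(A):.2e}")

# Tikhonov (ridge) regularisation: solve (A^T A + lam * I) x = A^T b.
# lam is a hypothetical value chosen for illustration; tune it per problem.
lam = 1e-8
x_reg = solve(A.T @ A + lam * np.eye(8), A.T @ b, assume_a="pos")

print(np.linalg.norm(A @ x_reg - b))
```

The regularised matrix is symmetric positive definite, which is why assume_a="pos" is a reasonable hint to the solver here.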

When NOT to use this

scipy is the right answer most of the time it's reached for; the cases below are where a specialised library wins.

  • Pure statistics with R-style formulas: statsmodels — smf.ols("y ~ x1 + x2", data=df) plus richer regression diagnostics.
  • Bayesian modelling: pymc, NumPyro. scipy.stats is frequentist.
  • GPU computation: CuPy + cupyx.scipy, or JAX (jax.scipy).
  • Symbolic math: SymPy.
  • Production ML: scikit-learn directly; scipy is the foundation, sklearn is the layered API.
  • Distributed compute: scipy is single-process. Use dask-glm, dask-ml, or ray.

See also