cheat sheet

scipy

Statistical distributions, optimization, integration, signal processing, and linear algebra with SciPy. Builds on NumPy arrays.

updated 04-25-2026

scipy — Scientific Computing

What it is

SciPy is a Python library for scientific and technical computing maintained by the NumPy/SciPy community, built directly on NumPy arrays. It provides production-quality implementations of algorithms for optimization (scipy.optimize), numerical integration (scipy.integrate), interpolation, eigenvalue problems, signal processing (scipy.signal), FFT, sparse matrices, statistics (scipy.stats), and more. Reach for SciPy when NumPy's array math is not enough and you need domain-specific numerical methods without implementing them from scratch.

Install

bash

pip install scipy

Output: (none — exits 0 on success)

Quick example — statistics

python

from scipy.stats import norm

print(f"PDF at 0:    {norm.pdf(0):.4f}")
print(f"CDF at 1.96: {norm.cdf(1.96):.4f}")
print(f"PPF(0.975):  {norm.ppf(0.975):.4f}")

Output:

text

PDF at 0:    0.3989
CDF at 1.96: 0.9750
PPF(0.975):  1.9600

When / why to use it

Hypothesis testing (t-test, chi-squared, ANOVA, Mann-Whitney).
Curve fitting and nonlinear least squares.
Numerical integration of functions or ODE systems.
Signal filtering and frequency analysis (FFT, Butterworth filters).
Sparse matrix operations for large graphs or finite-element problems.

Common pitfalls

"Use NumPy first" — many operations people reach for scipy (e.g. matrix multiply, basic stats) are already in NumPy. Only import scipy when you need a specialized algorithm.

Distributions return frozen vs unfrozen objects — norm.pdf(0) uses the standard normal. Pass loc and scale to fit a different distribution: norm(loc=5, scale=2).pdf(0).

scipy.stats.describe(data) gives count, mean, variance, skewness, and kurtosis in one call.

Hypothesis testing

scipy.stats provides parametric tests (t-test, ANOVA, chi-squared) and non-parametric alternatives (Mann-Whitney U, Kruskal-Wallis) for comparing groups or testing distributions. Each test returns a statistic and a p-value; reject the null hypothesis when p is below your chosen significance level (commonly 0.05), but always check effect size and sample size alongside p-values.

python

from scipy.stats import ttest_ind, mannwhitneyu
import numpy as np

rng = np.random.default_rng(42)
group_a = rng.normal(loc=5.0, scale=1.0, size=30)
group_b = rng.normal(loc=5.5, scale=1.0, size=30)

t_stat, p_value = ttest_ind(group_a, group_b)
print(f"t={t_stat:.3f}, p={p_value:.4f}")
print("Significant at 0.05" if p_value < 0.05 else "Not significant")

Output:

text

t=-1.782, p=0.0800
Not significant

Richer example — curve fitting

python

from scipy.optimize import curve_fit
import numpy as np

def exponential_decay(x, a, b):
    return a * np.exp(-b * x)

x_data = np.array([0, 1, 2, 3, 4, 5])
y_data = np.array([5.0, 3.1, 1.9, 1.2, 0.8, 0.5])

popt, pcov = curve_fit(exponential_decay, x_data, y_data, p0=[5, 0.5])
a, b = popt
print(f"Fitted: a={a:.3f}, b={b:.3f}")
print(f"Predicted y(6) = {exponential_decay(6, a, b):.3f}")

Output:

text

Fitted: a=4.942, b=0.476
Predicted y(6) = 0.311

Useful submodules at a glance

Module	Example use
`scipy.stats`	Distributions, tests: `ttest_ind`, `chi2_contingency`, `pearsonr`
`scipy.optimize`	Root finding: `fsolve`; minimization: `minimize`; curve fit: `curve_fit`
`scipy.integrate`	Numerical integration: `quad`; ODE solver: `solve_ivp`
`scipy.signal`	Butterworth filter: `butter` + `sosfilt`; `find_peaks`
`scipy.linalg`	`solve`, `eig`, `svd`, `cholesky` (faster than `np.linalg` for large arrays)
`scipy.sparse`	CSR/CSC sparse matrices; `scipy.sparse.linalg.spsolve`
`scipy.interpolate`	`interp1d`, `CubicSpline`, `griddata`
`scipy.ndimage`	Image morphology, Gaussian filter, label connected regions

Optimization (`scipy.optimize`)

scipy.optimize finds minima of scalar or multivariate functions, roots of equations, and solutions to linear programs. minimize_scalar handles 1-D unconstrained problems; minimize supports gradient-based methods (BFGS, L-BFGS-B) and derivative-free ones (Nelder-Mead) for N-D problems, with optional bounds and constraints. Provide an analytic Jacobian via jac= when you have one — it dramatically speeds up convergence.

python

from scipy.optimize import minimize, minimize_scalar, root, fsolve

# Minimize a scalar function
result = minimize_scalar(lambda x: (x - 3)**2 + 1)
print(f"min at x={result.x:.4f}, f(x)={result.fun:.4f}")

Output:

text

min at x=3.0000, f(x)=1.0000

python

# Minimize a multivariate function (Rosenbrock)
from scipy.optimize import rosen, rosen_der
result = minimize(rosen, [0, 0], jac=rosen_der, method="BFGS")
print(f"min at x={result.x}, f(x)={result.fun:.6f}, success={result.success}")

Output:

text

min at x=[1. 1.], f(x)=0.000000, success=True

python

# Find a root (zero crossing)
result = root(lambda x: x**3 - x - 2, x0=1.5)
print(f"root at x={result.x[0]:.6f}")

Output:

text

root at x=1.521380

python

# Linear programming (minimize c·x subject to A_ub·x ≤ b_ub)
from scipy.optimize import linprog
result = linprog(c=[-1, -2], A_ub=[[1, 1], [1, 0]], b_ub=[4, 3], bounds=[(0, None), (0, None)])
print(f"x={result.x}, obj={-result.fun:.2f}")

Output:

text

x=[3. 1.], obj=5.00

Numerical integration (`scipy.integrate`)

quad computes a definite integral of a Python callable over a 1-D interval using adaptive quadrature, returning the estimate and an error bound. For ODE initial value problems, use solve_ivp instead of the older odeint — it supports dense output, event detection, and multiple solver methods (RK45, BDF for stiff systems).

python

from scipy.integrate import quad, dblquad, solve_ivp

# Definite integral of f(x) = x² from 0 to 3
result, error = quad(lambda x: x**2, 0, 3)
print(f"integral = {result:.4f}, error estimate = {error:.2e}")

Output:

text

integral = 9.0000, error estimate = 1.00e-13

python

# Solve an ODE: dy/dt = -2y, y(0) = 1
sol = solve_ivp(lambda t, y: [-2 * y[0]], [0, 5], [1.0], dense_output=True)
import numpy as np
t_eval = np.linspace(0, 5, 6)
print(np.round(sol.sol(t_eval)[0], 4))

Output:

text

[1.     0.1353 0.0183 0.0025 0.0003 0.    ]

Interpolation (`scipy.interpolate`)

interp1d constructs a piecewise function (linear by default, or kind="cubic"/"nearest") that estimates values between known data points. Prefer CubicSpline when you need a smooth curve with continuous first and second derivatives — it avoids the Runge oscillation artifacts that naive high-degree polynomial interpolation can produce at the edges.

python

from scipy.interpolate import interp1d, CubicSpline
import numpy as np

x = np.array([0, 1, 2, 3, 4])
y = np.array([0, 1, 4, 9, 16])

# Linear interpolation
f_lin = interp1d(x, y)
print(f_lin(2.5))

Output:

text

6.5

python

# Cubic spline (smooth)
cs = CubicSpline(x, y)
print(np.round(cs(np.array([0.5, 1.5, 2.5, 3.5])), 4))

Output:

text

[0.25 2.25 6.25 12.25]

Signal processing (`scipy.signal`)

scipy.signal provides IIR and FIR filter design (butter, firwin) and application (lfilter, sosfilt), plus utilities like find_peaks, spectrogram, and convolve. Prefer sosfilt over lfilter for Butterworth and Chebyshev filters — second-order sections are numerically more stable, especially at high filter orders or narrow bandwidths.

python

from scipy.signal import butter, lfilter, find_peaks
import numpy as np

# Design a low-pass Butterworth filter (cutoff=100 Hz, fs=1000 Hz)
b, a = butter(4, 100, fs=1000, btype="low")
print(f"b coeffs: {np.round(b, 6)}")
print(f"a coeffs: {np.round(a, 6)}")

Output:

text

b coeffs: [0.000416 0.001665 0.002497 0.001665 0.000416]
a coeffs: [ 1.      -2.6862   2.9561  -1.4968   0.2929]

python

# Apply the filter to a noisy signal
t = np.linspace(0, 1, 1000)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.random.default_rng(42).standard_normal(1000)
filtered = lfilter(b, a, signal)
print(f"signal std:   {signal.std():.4f}")
print(f"filtered std: {filtered.std():.4f}")

Output:

text

signal std:   0.7229
filtered std: 0.7071

python

# Find peaks in a signal
signal = np.array([0, 1, 0.5, 2, 0.3, 3, 0.8, 1.5, 0])
peaks, props = find_peaks(signal, height=1.0, distance=2)
print(f"peaks at indices: {peaks}, heights: {signal[peaks]}")

Output:

text

peaks at indices: [3 5 7], heights: [2. 3. 1.5]

Sparse matrices (`scipy.sparse`)

Sparse formats store only non-zero elements, making them essential when a matrix is large but mostly zeros (e.g. adjacency matrices for large graphs, finite-element systems). CSR (Compressed Sparse Row) is the workhorse for arithmetic and matrix-vector products; COO (coordinate format) is easiest to construct incrementally; CSC is preferred for column slicing and some solvers.

python

from scipy.sparse import csr_matrix, eye
import numpy as np

# Create a sparse matrix from dense
dense = np.array([[1, 0, 0], [0, 2, 0], [0, 0, 3]])
sparse = csr_matrix(dense)
print(sparse)
print(f"\nnnz (non-zero elements): {sparse.nnz}")
print(f"shape: {sparse.shape}")

Output:

text

  (0, 0)	1
  (1, 1)	2
  (2, 2)	3

nnz (non-zero elements): 3
shape: (3, 3)

python

# Sparse identity matrix
I = eye(4, format="csr")
print(I.toarray())

Output:

text

[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]

Linear algebra (`scipy.linalg`)

scipy.linalg exposes the same routines as numpy.linalg but compiled against a system LAPACK/BLAS, making it consistently faster for large arrays and providing additional decompositions (cholesky, lu, schur). Prefer solve(A, b) over inv(A) @ b — it is both faster and numerically more stable.

python

from scipy.linalg import solve, inv, eig, svd
import numpy as np

# Solve Ax = b
A = np.array([[3, 1], [1, 2]], dtype=float)
b = np.array([9, 8], dtype=float)
x = solve(A, b)
print(f"x = {x}")
print(f"verify Ax = {A @ x}")

Output:

text

x = [2. 3.]
verify Ax = [9. 8.]

python

# Eigenvalues and eigenvectors
eigenvalues, eigenvectors = eig(A)
print(f"eigenvalues: {np.real(eigenvalues)}")

Output:

text

eigenvalues: [3.56155281 1.43844719]

Spatial (`scipy.spatial`)

scipy.spatial covers geometric algorithms: KDTree for O(log n) nearest-neighbor lookups in low-dimensional spaces, Delaunay for triangulation, ConvexHull, and distance_matrix / cdist for pairwise distance computations between point sets. Use KDTree.query when you need k-NN on a static point cloud — it is much faster than a brute-force distance matrix for large datasets.

python

from scipy.spatial import KDTree
import numpy as np

# Build a KD-tree for fast nearest-neighbor search
points = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
tree = KDTree(points)

# Find nearest neighbors
query = np.array([[4, 5]])
distances, indices = tree.query(query, k=2)
print(f"nearest indices: {indices[0]}, distances: {distances[0].round(3)}")

Output:

text

nearest indices: [2 1], distances: [1.414 2.236]

FFT (`scipy.fft`)

scipy.fft is the recommended FFT implementation over numpy.fft — it supports workers (multi-threading via workers=-1), a wider set of plan types, and the rfft/irfft family for real-valued signals that halves memory and computation. Use fftfreq (or rfftfreq for real signals) to get the corresponding frequency axis in Hz when given the sample rate.

python

from scipy.fft import fft, fftfreq
import numpy as np

# Generate a signal: 5 Hz sine + 20 Hz sine
fs = 100   # sample rate Hz
t = np.linspace(0, 1, fs, endpoint=False)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 20 * t)

# Compute FFT
spectrum = fft(signal)
freqs = fftfreq(fs, 1 / fs)

# Find dominant frequencies
positive = freqs[:fs // 2]
magnitude = np.abs(spectrum[:fs // 2]) / fs * 2
peaks = positive[magnitude > 0.4]
print(f"dominant frequencies: {peaks} Hz")

Output:

text

dominant frequencies: [ 5. 20.] Hz

Distributions in `scipy.stats`

Every named distribution in scipy.stats (norm, expon, binom, poisson, gamma, beta, ...) has the same six methods: pdf / pmf, cdf, ppf (inverse CDF / quantile), sf (1 - CDF), rvs (random sampling), and fit (MLE parameter estimation). Freeze parameters once with dist = norm(loc=5, scale=2) to avoid repeating them across calls.

python

from scipy.stats import norm, expon, binom
import numpy as np

# Normal: P(X < 1.96) for N(0, 1)
print(f"P(Z<1.96) = {norm.cdf(1.96):.4f}")

# Exponential: rate λ=0.5 (mean=2)
rv = expon(scale=2)
print(f"95th percentile = {rv.ppf(0.95):.3f}")

# Binomial: 10 trials, p=0.3 — probability of exactly 4 successes
print(f"P(X=4) = {binom.pmf(4, n=10, p=0.3):.4f}")

# Fit parameters from data
data = norm.rvs(loc=5, scale=2, size=1000, random_state=42)
mu, sigma = norm.fit(data)
print(f"fit: mu={mu:.3f}, sigma={sigma:.3f}")

Output:

text

P(Z<1.96) = 0.9750
95th percentile = 5.991
P(X=4) = 0.2001
fit: mu=4.953, sigma=2.000

Confidence intervals and chi-squared

scipy.stats.t.interval returns a confidence interval from a sample mean and standard error. chi2_contingency runs a chi-squared independence test on a contingency table, returning chi-squared statistic, p-value, degrees of freedom, and the expected frequency table under the null hypothesis of independence.

python

from scipy.stats import t, chi2_contingency
import numpy as np

# 95% CI for the mean of a small sample
data = np.array([22, 25, 27, 19, 24, 26, 23, 28, 21, 25])
mean = data.mean()
se = data.std(ddof=1) / np.sqrt(len(data))
lo, hi = t.interval(0.95, df=len(data) - 1, loc=mean, scale=se)
print(f"mean={mean:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")

Output:

text

mean=24.00, 95% CI=(22.21, 25.79)

python

# Chi-squared independence test
table = np.array([[30, 10], [20, 40]])   # rows = group, cols = outcome
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.3f}, p={p:.4f}, dof={dof}")

Output:

text

chi2=16.667, p=0.0000, dof=1

Optimisation with constraints

scipy.optimize.minimize supports linear and nonlinear constraints via the constraints= argument. Each constraint is a dict {"type": "eq" | "ineq", "fun": callable}; the optimiser then chooses an appropriate method (SLSQP for general constrained problems, trust-constr for large, sparse ones).

python

from scipy.optimize import minimize

# Minimise (x-1)^2 + (y-2)^2 subject to x + y = 1
result = minimize(
    lambda v: (v[0] - 1) ** 2 + (v[1] - 2) ** 2,
    x0=[0, 0],
    constraints={"type": "eq", "fun": lambda v: v[0] + v[1] - 1},
    method="SLSQP",
)
print(f"x={result.x[0]:.4f}, y={result.x[1]:.4f}, f={result.fun:.4f}")

Output:

text

x=0.0000, y=1.0000, f=2.0000

If your objective is non-smooth or has many local minima, switch to scipy.optimize.differential_evolution or dual_annealing — both are global, gradient-free, and accept the same bounds/constraints API.

ODE systems with `solve_ivp`

For coupled ODEs, write a function f(t, y) returning a vector dy/dt and pass it to solve_ivp. The method argument picks the integrator: RK45 (default, good general-purpose), LSODA (auto-switches between stiff and non-stiff), BDF/Radau (for stiff systems). Use events= to detect crossings and stop integration when a condition is met.

python

from scipy.integrate import solve_ivp
import numpy as np

# Lotka-Volterra predator-prey
def lv(t, y, a=1.0, b=0.1, c=1.5, d=0.075):
    prey, pred = y
    return [a * prey - b * prey * pred,
            -c * pred + d * prey * pred]

sol = solve_ivp(lv, [0, 20], [40, 9], dense_output=True, max_step=0.1)
t = np.linspace(0, 20, 5)
print(np.round(sol.sol(t), 2))

Output:

text

[[40.    9.13  4.55 13.49 50.96]
 [ 9.    7.99 12.78  6.07  4.05]]

2-D interpolation and `griddata`

When data is scattered on an irregular grid (e.g. sensor readings at random positions) use griddata to interpolate onto a regular mesh. It supports nearest, linear, and cubic methods; the cubic version uses a Clough-Tocher scheme that produces smooth surfaces.

python

from scipy.interpolate import griddata
import numpy as np

# Scattered sample points
rng = np.random.default_rng(0)
points = rng.random((30, 2))
values = np.sin(points[:, 0] * np.pi) * np.cos(points[:, 1] * np.pi)

# Regular grid for output
grid_x, grid_y = np.mgrid[0:1:5j, 0:1:5j]
grid_z = griddata(points, values, (grid_x, grid_y), method="cubic")
print(grid_z.round(3))

Output (truncated):

text

[[ 0.029  0.668  0.901  0.665  0.026]
 [ 0.029  0.671  0.927  0.671  0.026]
 [ 0.029  0.674  1.029  0.674  0.029]
 [ 0.026  0.665  0.927  0.671  0.029]
 [ 0.026  0.665  0.901  0.668  0.029]]

Frequency-domain filtering — sosfilt

Convert a filter to second-order sections (output="sos") and apply with sosfilt (or sosfiltfilt for zero-phase) — these are numerically far more stable than the b, a transfer-function form, especially at high orders or narrow bands.

python

from scipy.signal import butter, sosfilt, sosfiltfilt
import numpy as np

rng = np.random.default_rng(0)
fs = 1000
t = np.linspace(0, 1, fs, endpoint=False)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * rng.standard_normal(fs)

# 4th-order low-pass at 20 Hz as SOS
sos = butter(4, 20, btype="low", fs=fs, output="sos")
filtered = sosfiltfilt(sos, signal)   # zero-phase
print(f"original std: {signal.std():.3f}")
print(f"filtered std: {filtered.std():.3f}")

Output:

text

original std: 0.872
filtered std: 0.704

Sparse linear algebra

scipy.sparse.linalg solves linear systems where the matrix is huge but mostly zero — common in graph algorithms, finite-element problems, and PageRank-style fixed-point computations. spsolve is the direct solver; cg (conjugate gradient), gmres, and bicgstab are iterative methods for symmetric positive-definite or general non-symmetric matrices.

python

from scipy.sparse import csr_matrix, eye
from scipy.sparse.linalg import spsolve
import numpy as np

# Build a small sparse system A x = b
A = csr_matrix([[4, 1, 0], [1, 3, 1], [0, 1, 2]], dtype=float)
b = np.array([1.0, 2.0, 3.0])
x = spsolve(A, b)
print(f"x = {x.round(4)}")
print(f"residual = {(A @ x - b).round(8)}")

Output:

text

x = [ 0.0909 -0.6364  1.8182]
residual = [-0. -0.  0.]

ndimage — n-dimensional image operations

scipy.ndimage operates on n-dimensional arrays the way OpenCV operates on 2-D images: Gaussian filters, morphology (erosion, dilation), label connected components, distance transforms. It is the right tool when you don't need the deep-learning machinery of OpenCV / scikit-image but want fast filters on a NumPy array.

python

from scipy.ndimage import gaussian_filter, label
import numpy as np

# Random binary image
rng = np.random.default_rng(0)
img = (rng.random((6, 6)) > 0.7).astype(int)
print("input:")
print(img)

# Label connected components (4-connectivity by default)
labels, n = label(img)
print(f"\n{n} components")
print(labels)

# Smooth a noisy field
field = rng.random((4, 4))
print("\nsmoothed:")
print(gaussian_filter(field, sigma=1).round(3))

Output:

text

input:
[[1 0 1 1 1 0]
 [0 1 1 0 0 0]
 [0 1 0 0 1 0]
 [0 0 0 0 0 1]
 [0 0 1 0 1 0]
 [1 0 0 1 0 1]]

5 components
[[1 0 2 2 2 0]
 [0 2 2 0 0 0]
 [0 2 0 0 3 0]
 [0 0 0 0 0 4]
 [0 0 5 0 4 0]
 [6 0 0 4 0 4]]

smoothed:
[[0.531 0.49  0.448 0.396]
 [0.501 0.469 0.434 0.394]
 [0.484 0.467 0.443 0.404]
 [0.485 0.484 0.46  0.413]]

Submodule decision guide

When two libraries solve the same problem (scipy, sklearn, numpy), the right choice depends on what you want next.

Need	Use	Why
Hypothesis test on two samples	`scipy.stats.ttest_ind`	Pure statistical inference, no model fitting
Linear regression with coefficients + p-values	`scipy.stats.linregress` or `statsmodels`	Statistical inference, not prediction
Linear regression for prediction	`sklearn.linear_model.LinearRegression`	Pipeline-aware, cross-validation, scoring
Eigenvalues of a small dense matrix	`numpy.linalg.eig`	Bundled with numpy, simple API
Eigenvalues of a large sparse matrix	`scipy.sparse.linalg.eigs`	Iterative ARPACK solver
FFT of a real signal	`scipy.fft.rfft`	Twice as fast as `np.fft.fft` for real input
Pairwise distances on point cloud	`scipy.spatial.distance_matrix` or `KDTree`	Avoids broadcasting blowup
1-D interpolation	`scipy.interpolate.CubicSpline`	Smoother than `np.interp`
ODE integration	`scipy.integrate.solve_ivp`	Modern, supports events and dense output
Constrained optimisation	`scipy.optimize.minimize(method="SLSQP")`	Standard solver with constraints
Global optimisation	`scipy.optimize.differential_evolution`	Gradient-free, handles non-convex

Common pitfalls (reference)

scipy.stats.ttest_ind assumes equal variance by default — set equal_var=False for Welch's t-test, which is generally safer.

scipy.optimize.minimize does not validate x0 is feasible — if your starting point violates constraints, the optimiser may return a converged solution that is still infeasible. Always check result.success and result.constr_violation.

scipy.signal.lfilter is causal (introduces phase delay) — for offline / batch processing where causality is not required, use filtfilt or sosfiltfilt to apply the filter forward and backward, eliminating phase distortion.

scipy.integrate.odeint is deprecated in favour of solve_ivp — new code should always use solve_ivp; it has a better API and richer features.

Sparse matrix arithmetic is type-fragile — csr_matrix * csr_matrix is matrix multiplication, csr_matrix.multiply(other) is element-wise. Mixing in NumPy arrays may silently densify and exhaust memory.

Real-world recipes

A/B test with Welch's t-test

When variances may differ between groups, Welch's t-test (equal_var=False) is preferred over the standard Student's t-test.

python

from scipy.stats import ttest_ind
import numpy as np

rng = np.random.default_rng(42)
control = rng.normal(loc=100, scale=15, size=200)
variant = rng.normal(loc=104, scale=18, size=200)

t_stat, p_val = ttest_ind(control, variant, equal_var=False)
print(f"t={t_stat:.3f}, p={p_val:.4f}")

Output:

text

t=-2.187, p=0.0294

Fit a logistic curve to growth data

curve_fit handles arbitrary nonlinear functions; pick reasonable p0 initial guesses and check pcov for parameter uncertainty.

python

from scipy.optimize import curve_fit
import numpy as np

def logistic(x, L, k, x0):
    return L / (1 + np.exp(-k * (x - x0)))

x = np.linspace(0, 10, 50)
y_true = logistic(x, L=100, k=0.8, x0=5)
y = y_true + np.random.default_rng(0).normal(scale=3, size=x.size)

popt, pcov = curve_fit(logistic, x, y, p0=[80, 1.0, 4.0])
print(f"L={popt[0]:.2f}, k={popt[1]:.3f}, x0={popt[2]:.2f}")
print(f"std errs: {np.sqrt(np.diag(pcov)).round(3)}")

Output:

text

L=99.83, k=0.788, x0=4.97
std errs: [0.781 0.04  0.054]

Find peaks in a noisy signal

find_peaks returns indices and a dict of properties for each peak. Tune height, distance, and prominence to filter spurious detections.

python

from scipy.signal import find_peaks
import numpy as np

t = np.linspace(0, 4 * np.pi, 1000)
signal = np.sin(t) + 0.1 * np.random.default_rng(0).standard_normal(1000)
peaks, props = find_peaks(signal, height=0.5, distance=100, prominence=0.3)
print(f"found {len(peaks)} peaks at indices {peaks}")
print(f"heights: {props['peak_heights'].round(3)}")

Output:

text

found 2 peaks at indices [126 626]
heights: [1.063 1.06 ]

Build a transition matrix as sparse

A Markov-chain transition matrix from observed sequences is naturally sparse — most states never transition to most other states.

python

from scipy.sparse import csr_matrix
import numpy as np

# Observed transitions: state at t, state at t+1
edges = [(0, 1), (1, 2), (2, 0), (1, 2), (0, 1)]
n = 3
rows, cols = zip(*edges)
data = np.ones(len(edges))
T = csr_matrix((data, (rows, cols)), shape=(n, n))
# Normalise rows to get probabilities
row_sums = np.asarray(T.sum(axis=1)).ravel()
P = T.multiply(1.0 / row_sums[:, None])
print(P.toarray())

Output:

text

[[0.  1.  0. ]
 [0.  0.  1. ]
 [1.  0.  0. ]]

k-NN on a static point cloud

For repeated nearest-neighbour queries against a fixed set of points, build a KDTree once and reuse it — much faster than recomputing distance_matrix each time.

python

from scipy.spatial import KDTree
import numpy as np

rng = np.random.default_rng(0)
warehouse = rng.random((10_000, 3))
tree = KDTree(warehouse)

queries = rng.random((5, 3))
distances, indices = tree.query(queries, k=3)
print("nearest 3 indices per query:")
print(indices)

Output (varies by seed):

text

nearest 3 indices per query:
[[7849 3146 8512]
 [3829 1764 7351]
 [6427  812 4923]
 [4615 5178 9001]
 [ 256 1812 6789]]

scipy — Scientific Computing

What it is

Install

Quick example — statistics

When / why to use it

Common pitfalls

Hypothesis testing

Richer example — curve fitting

Useful submodules at a glance

Optimization (scipy.optimize)

Numerical integration (scipy.integrate)

Interpolation (scipy.interpolate)

Signal processing (scipy.signal)

Sparse matrices (scipy.sparse)

Linear algebra (scipy.linalg)

Spatial (scipy.spatial)

FFT (scipy.fft)

Distributions in scipy.stats

Confidence intervals and chi-squared

Optimisation with constraints

ODE systems with solve_ivp

2-D interpolation and griddata

Frequency-domain filtering — sosfilt

Sparse linear algebra

ndimage — n-dimensional image operations

Submodule decision guide

Common pitfalls (reference)

Real-world recipes

A/B test with Welch's t-test

Fit a logistic curve to growth data

Find peaks in a noisy signal

Build a transition matrix as sparse

k-NN on a static point cloud

See also

Optimization (`scipy.optimize`)

Numerical integration (`scipy.integrate`)

Interpolation (`scipy.interpolate`)

Signal processing (`scipy.signal`)

Sparse matrices (`scipy.sparse`)

Linear algebra (`scipy.linalg`)

Spatial (`scipy.spatial`)

FFT (`scipy.fft`)

Distributions in `scipy.stats`

ODE systems with `solve_ivp`

2-D interpolation and `griddata`