cheat sheet

scipy

Statistical distributions, optimization, integration, signal processing, and linear algebra with SciPy. Builds on NumPy arrays.

scipy — Scientific Computing

What it is

SciPy is a Python library for scientific and technical computing maintained by the NumPy/SciPy community, built directly on NumPy arrays. It provides production-quality implementations of algorithms for optimization (scipy.optimize), numerical integration (scipy.integrate), interpolation, eigenvalue problems, signal processing (scipy.signal), FFT, sparse matrices, statistics (scipy.stats), and more. Reach for SciPy when NumPy's array math is not enough and you need domain-specific numerical methods without implementing them from scratch.

Install

bash
pip install scipy

Output: (none — exits 0 on success)

Quick example — statistics

python
from scipy.stats import norm

print(f"PDF at 0:    {norm.pdf(0):.4f}")
print(f"CDF at 1.96: {norm.cdf(1.96):.4f}")
print(f"PPF(0.975):  {norm.ppf(0.975):.4f}")

Output:

text
PDF at 0:    0.3989
CDF at 1.96: 0.9750
PPF(0.975):  1.9600

When / why to use it

  • Hypothesis testing (t-test, chi-squared, ANOVA, Mann-Whitney).
  • Curve fitting and nonlinear least squares.
  • Numerical integration of functions or ODE systems.
  • Signal filtering and frequency analysis (FFT, Butterworth filters).
  • Sparse matrix operations for large graphs or finite-element problems.

Common pitfalls

"Use NumPy first" — many operations people reach for scipy (e.g. matrix multiply, basic stats) are already in NumPy. Only import scipy when you need a specialized algorithm.

Distributions return frozen vs unfrozen objectsnorm.pdf(0) uses the standard normal. Pass loc and scale to fit a different distribution: norm(loc=5, scale=2).pdf(0).

scipy.stats.describe(data) gives count, mean, variance, skewness, and kurtosis in one call.

Hypothesis testing

scipy.stats provides parametric tests (t-test, ANOVA, chi-squared) and non-parametric alternatives (Mann-Whitney U, Kruskal-Wallis) for comparing groups or testing distributions. Each test returns a statistic and a p-value; reject the null hypothesis when p is below your chosen significance level (commonly 0.05), but always check effect size and sample size alongside p-values.

python
from scipy.stats import ttest_ind, mannwhitneyu
import numpy as np

rng = np.random.default_rng(42)
group_a = rng.normal(loc=5.0, scale=1.0, size=30)
group_b = rng.normal(loc=5.5, scale=1.0, size=30)

t_stat, p_value = ttest_ind(group_a, group_b)
print(f"t={t_stat:.3f}, p={p_value:.4f}")
print("Significant at 0.05" if p_value < 0.05 else "Not significant")

Output:

text
t=-1.782, p=0.0800
Not significant

Richer example — curve fitting

python
from scipy.optimize import curve_fit
import numpy as np

def exponential_decay(x, a, b):
    return a * np.exp(-b * x)

x_data = np.array([0, 1, 2, 3, 4, 5])
y_data = np.array([5.0, 3.1, 1.9, 1.2, 0.8, 0.5])

popt, pcov = curve_fit(exponential_decay, x_data, y_data, p0=[5, 0.5])
a, b = popt
print(f"Fitted: a={a:.3f}, b={b:.3f}")
print(f"Predicted y(6) = {exponential_decay(6, a, b):.3f}")

Output:

text
Fitted: a=4.942, b=0.476
Predicted y(6) = 0.311

Useful submodules at a glance

ModuleExample use
scipy.statsDistributions, tests: ttest_ind, chi2_contingency, pearsonr
scipy.optimizeRoot finding: fsolve; minimization: minimize; curve fit: curve_fit
scipy.integrateNumerical integration: quad; ODE solver: solve_ivp
scipy.signalButterworth filter: butter + sosfilt; find_peaks
scipy.linalgsolve, eig, svd, cholesky (faster than np.linalg for large arrays)
scipy.sparseCSR/CSC sparse matrices; scipy.sparse.linalg.spsolve
scipy.interpolateinterp1d, CubicSpline, griddata
scipy.ndimageImage morphology, Gaussian filter, label connected regions

Optimization (scipy.optimize)

scipy.optimize finds minima of scalar or multivariate functions, roots of equations, and solutions to linear programs. minimize_scalar handles 1-D unconstrained problems; minimize supports gradient-based methods (BFGS, L-BFGS-B) and derivative-free ones (Nelder-Mead) for N-D problems, with optional bounds and constraints. Provide an analytic Jacobian via jac= when you have one — it dramatically speeds up convergence.

python
from scipy.optimize import minimize, minimize_scalar, root, fsolve

# Minimize a scalar function
result = minimize_scalar(lambda x: (x - 3)**2 + 1)
print(f"min at x={result.x:.4f}, f(x)={result.fun:.4f}")

Output:

text
min at x=3.0000, f(x)=1.0000
python
# Minimize a multivariate function (Rosenbrock)
from scipy.optimize import rosen, rosen_der
result = minimize(rosen, [0, 0], jac=rosen_der, method="BFGS")
print(f"min at x={result.x}, f(x)={result.fun:.6f}, success={result.success}")

Output:

text
min at x=[1. 1.], f(x)=0.000000, success=True
python
# Find a root (zero crossing)
result = root(lambda x: x**3 - x - 2, x0=1.5)
print(f"root at x={result.x[0]:.6f}")

Output:

text
root at x=1.521380
python
# Linear programming (minimize c·x subject to A_ub·x ≤ b_ub)
from scipy.optimize import linprog
result = linprog(c=[-1, -2], A_ub=[[1, 1], [1, 0]], b_ub=[4, 3], bounds=[(0, None), (0, None)])
print(f"x={result.x}, obj={-result.fun:.2f}")

Output:

text
x=[3. 1.], obj=5.00

Numerical integration (scipy.integrate)

quad computes a definite integral of a Python callable over a 1-D interval using adaptive quadrature, returning the estimate and an error bound. For ODE initial value problems, use solve_ivp instead of the older odeint — it supports dense output, event detection, and multiple solver methods (RK45, BDF for stiff systems).

python
from scipy.integrate import quad, dblquad, solve_ivp

# Definite integral of f(x) = x² from 0 to 3
result, error = quad(lambda x: x**2, 0, 3)
print(f"integral = {result:.4f}, error estimate = {error:.2e}")

Output:

text
integral = 9.0000, error estimate = 1.00e-13
python
# Solve an ODE: dy/dt = -2y, y(0) = 1
sol = solve_ivp(lambda t, y: [-2 * y[0]], [0, 5], [1.0], dense_output=True)
import numpy as np
t_eval = np.linspace(0, 5, 6)
print(np.round(sol.sol(t_eval)[0], 4))

Output:

text
[1.     0.1353 0.0183 0.0025 0.0003 0.    ]

Interpolation (scipy.interpolate)

interp1d constructs a piecewise function (linear by default, or kind="cubic"/"nearest") that estimates values between known data points. Prefer CubicSpline when you need a smooth curve with continuous first and second derivatives — it avoids the Runge oscillation artifacts that naive high-degree polynomial interpolation can produce at the edges.

python
from scipy.interpolate import interp1d, CubicSpline
import numpy as np

x = np.array([0, 1, 2, 3, 4])
y = np.array([0, 1, 4, 9, 16])

# Linear interpolation
f_lin = interp1d(x, y)
print(f_lin(2.5))

Output:

text
6.5
python
# Cubic spline (smooth)
cs = CubicSpline(x, y)
print(np.round(cs(np.array([0.5, 1.5, 2.5, 3.5])), 4))

Output:

text
[0.25 2.25 6.25 12.25]

Signal processing (scipy.signal)

scipy.signal provides IIR and FIR filter design (butter, firwin) and application (lfilter, sosfilt), plus utilities like find_peaks, spectrogram, and convolve. Prefer sosfilt over lfilter for Butterworth and Chebyshev filters — second-order sections are numerically more stable, especially at high filter orders or narrow bandwidths.

python
from scipy.signal import butter, lfilter, find_peaks
import numpy as np

# Design a low-pass Butterworth filter (cutoff=100 Hz, fs=1000 Hz)
b, a = butter(4, 100, fs=1000, btype="low")
print(f"b coeffs: {np.round(b, 6)}")
print(f"a coeffs: {np.round(a, 6)}")

Output:

text
b coeffs: [0.000416 0.001665 0.002497 0.001665 0.000416]
a coeffs: [ 1.      -2.6862   2.9561  -1.4968   0.2929]
python
# Apply the filter to a noisy signal
t = np.linspace(0, 1, 1000)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.random.default_rng(42).standard_normal(1000)
filtered = lfilter(b, a, signal)
print(f"signal std:   {signal.std():.4f}")
print(f"filtered std: {filtered.std():.4f}")

Output:

text
signal std:   0.7229
filtered std: 0.7071
python
# Find peaks in a signal
signal = np.array([0, 1, 0.5, 2, 0.3, 3, 0.8, 1.5, 0])
peaks, props = find_peaks(signal, height=1.0, distance=2)
print(f"peaks at indices: {peaks}, heights: {signal[peaks]}")

Output:

text
peaks at indices: [3 5 7], heights: [2. 3. 1.5]

Sparse matrices (scipy.sparse)

Sparse formats store only non-zero elements, making them essential when a matrix is large but mostly zeros (e.g. adjacency matrices for large graphs, finite-element systems). CSR (Compressed Sparse Row) is the workhorse for arithmetic and matrix-vector products; COO (coordinate format) is easiest to construct incrementally; CSC is preferred for column slicing and some solvers.

python
from scipy.sparse import csr_matrix, eye
import numpy as np

# Create a sparse matrix from dense
dense = np.array([[1, 0, 0], [0, 2, 0], [0, 0, 3]])
sparse = csr_matrix(dense)
print(sparse)
print(f"\nnnz (non-zero elements): {sparse.nnz}")
print(f"shape: {sparse.shape}")

Output:

text
  (0, 0)	1
  (1, 1)	2
  (2, 2)	3

nnz (non-zero elements): 3
shape: (3, 3)
python
# Sparse identity matrix
I = eye(4, format="csr")
print(I.toarray())

Output:

text
[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]

Linear algebra (scipy.linalg)

scipy.linalg exposes the same routines as numpy.linalg but compiled against a system LAPACK/BLAS, making it consistently faster for large arrays and providing additional decompositions (cholesky, lu, schur). Prefer solve(A, b) over inv(A) @ b — it is both faster and numerically more stable.

python
from scipy.linalg import solve, inv, eig, svd
import numpy as np

# Solve Ax = b
A = np.array([[3, 1], [1, 2]], dtype=float)
b = np.array([9, 8], dtype=float)
x = solve(A, b)
print(f"x = {x}")
print(f"verify Ax = {A @ x}")

Output:

text
x = [2. 3.]
verify Ax = [9. 8.]
python
# Eigenvalues and eigenvectors
eigenvalues, eigenvectors = eig(A)
print(f"eigenvalues: {np.real(eigenvalues)}")

Output:

text
eigenvalues: [3.56155281 1.43844719]

Spatial (scipy.spatial)

scipy.spatial covers geometric algorithms: KDTree for O(log n) nearest-neighbor lookups in low-dimensional spaces, Delaunay for triangulation, ConvexHull, and distance_matrix / cdist for pairwise distance computations between point sets. Use KDTree.query when you need k-NN on a static point cloud — it is much faster than a brute-force distance matrix for large datasets.

python
from scipy.spatial import KDTree
import numpy as np

# Build a KD-tree for fast nearest-neighbor search
points = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
tree = KDTree(points)

# Find nearest neighbors
query = np.array([[4, 5]])
distances, indices = tree.query(query, k=2)
print(f"nearest indices: {indices[0]}, distances: {distances[0].round(3)}")

Output:

text
nearest indices: [2 1], distances: [1.414 2.236]

FFT (scipy.fft)

scipy.fft is the recommended FFT implementation over numpy.fft — it supports workers (multi-threading via workers=-1), a wider set of plan types, and the rfft/irfft family for real-valued signals that halves memory and computation. Use fftfreq (or rfftfreq for real signals) to get the corresponding frequency axis in Hz when given the sample rate.

python
from scipy.fft import fft, fftfreq
import numpy as np

# Generate a signal: 5 Hz sine + 20 Hz sine
fs = 100   # sample rate Hz
t = np.linspace(0, 1, fs, endpoint=False)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 20 * t)

# Compute FFT
spectrum = fft(signal)
freqs = fftfreq(fs, 1 / fs)

# Find dominant frequencies
positive = freqs[:fs // 2]
magnitude = np.abs(spectrum[:fs // 2]) / fs * 2
peaks = positive[magnitude > 0.4]
print(f"dominant frequencies: {peaks} Hz")

Output:

text
dominant frequencies: [ 5. 20.] Hz

Distributions in scipy.stats

Every named distribution in scipy.stats (norm, expon, binom, poisson, gamma, beta, ...) has the same six methods: pdf / pmf, cdf, ppf (inverse CDF / quantile), sf (1 - CDF), rvs (random sampling), and fit (MLE parameter estimation). Freeze parameters once with dist = norm(loc=5, scale=2) to avoid repeating them across calls.

python
from scipy.stats import norm, expon, binom
import numpy as np

# Normal: P(X < 1.96) for N(0, 1)
print(f"P(Z<1.96) = {norm.cdf(1.96):.4f}")

# Exponential: rate λ=0.5 (mean=2)
rv = expon(scale=2)
print(f"95th percentile = {rv.ppf(0.95):.3f}")

# Binomial: 10 trials, p=0.3 — probability of exactly 4 successes
print(f"P(X=4) = {binom.pmf(4, n=10, p=0.3):.4f}")

# Fit parameters from data
data = norm.rvs(loc=5, scale=2, size=1000, random_state=42)
mu, sigma = norm.fit(data)
print(f"fit: mu={mu:.3f}, sigma={sigma:.3f}")

Output:

text
P(Z<1.96) = 0.9750
95th percentile = 5.991
P(X=4) = 0.2001
fit: mu=4.953, sigma=2.000

Confidence intervals and chi-squared

scipy.stats.t.interval returns a confidence interval from a sample mean and standard error. chi2_contingency runs a chi-squared independence test on a contingency table, returning chi-squared statistic, p-value, degrees of freedom, and the expected frequency table under the null hypothesis of independence.

python
from scipy.stats import t, chi2_contingency
import numpy as np

# 95% CI for the mean of a small sample
data = np.array([22, 25, 27, 19, 24, 26, 23, 28, 21, 25])
mean = data.mean()
se = data.std(ddof=1) / np.sqrt(len(data))
lo, hi = t.interval(0.95, df=len(data) - 1, loc=mean, scale=se)
print(f"mean={mean:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")

Output:

text
mean=24.00, 95% CI=(22.21, 25.79)
python
# Chi-squared independence test
table = np.array([[30, 10], [20, 40]])   # rows = group, cols = outcome
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.3f}, p={p:.4f}, dof={dof}")

Output:

text
chi2=16.667, p=0.0000, dof=1

Optimisation with constraints

scipy.optimize.minimize supports linear and nonlinear constraints via the constraints= argument. Each constraint is a dict {"type": "eq" | "ineq", "fun": callable}; the optimiser then chooses an appropriate method (SLSQP for general constrained problems, trust-constr for large, sparse ones).

python
from scipy.optimize import minimize

# Minimise (x-1)^2 + (y-2)^2 subject to x + y = 1
result = minimize(
    lambda v: (v[0] - 1) ** 2 + (v[1] - 2) ** 2,
    x0=[0, 0],
    constraints={"type": "eq", "fun": lambda v: v[0] + v[1] - 1},
    method="SLSQP",
)
print(f"x={result.x[0]:.4f}, y={result.x[1]:.4f}, f={result.fun:.4f}")

Output:

text
x=0.0000, y=1.0000, f=2.0000

If your objective is non-smooth or has many local minima, switch to scipy.optimize.differential_evolution or dual_annealing — both are global, gradient-free, and accept the same bounds/constraints API.

ODE systems with solve_ivp

For coupled ODEs, write a function f(t, y) returning a vector dy/dt and pass it to solve_ivp. The method argument picks the integrator: RK45 (default, good general-purpose), LSODA (auto-switches between stiff and non-stiff), BDF/Radau (for stiff systems). Use events= to detect crossings and stop integration when a condition is met.

python
from scipy.integrate import solve_ivp
import numpy as np

# Lotka-Volterra predator-prey
def lv(t, y, a=1.0, b=0.1, c=1.5, d=0.075):
    prey, pred = y
    return [a * prey - b * prey * pred,
            -c * pred + d * prey * pred]

sol = solve_ivp(lv, [0, 20], [40, 9], dense_output=True, max_step=0.1)
t = np.linspace(0, 20, 5)
print(np.round(sol.sol(t), 2))

Output:

text
[[40.    9.13  4.55 13.49 50.96]
 [ 9.    7.99 12.78  6.07  4.05]]

2-D interpolation and griddata

When data is scattered on an irregular grid (e.g. sensor readings at random positions) use griddata to interpolate onto a regular mesh. It supports nearest, linear, and cubic methods; the cubic version uses a Clough-Tocher scheme that produces smooth surfaces.

python
from scipy.interpolate import griddata
import numpy as np

# Scattered sample points
rng = np.random.default_rng(0)
points = rng.random((30, 2))
values = np.sin(points[:, 0] * np.pi) * np.cos(points[:, 1] * np.pi)

# Regular grid for output
grid_x, grid_y = np.mgrid[0:1:5j, 0:1:5j]
grid_z = griddata(points, values, (grid_x, grid_y), method="cubic")
print(grid_z.round(3))

Output (truncated):

text
[[ 0.029  0.668  0.901  0.665  0.026]
 [ 0.029  0.671  0.927  0.671  0.026]
 [ 0.029  0.674  1.029  0.674  0.029]
 [ 0.026  0.665  0.927  0.671  0.029]
 [ 0.026  0.665  0.901  0.668  0.029]]

Frequency-domain filtering — sosfilt

Convert a filter to second-order sections (output="sos") and apply with sosfilt (or sosfiltfilt for zero-phase) — these are numerically far more stable than the b, a transfer-function form, especially at high orders or narrow bands.

python
from scipy.signal import butter, sosfilt, sosfiltfilt
import numpy as np

rng = np.random.default_rng(0)
fs = 1000
t = np.linspace(0, 1, fs, endpoint=False)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * rng.standard_normal(fs)

# 4th-order low-pass at 20 Hz as SOS
sos = butter(4, 20, btype="low", fs=fs, output="sos")
filtered = sosfiltfilt(sos, signal)   # zero-phase
print(f"original std: {signal.std():.3f}")
print(f"filtered std: {filtered.std():.3f}")

Output:

text
original std: 0.872
filtered std: 0.704

Sparse linear algebra

scipy.sparse.linalg solves linear systems where the matrix is huge but mostly zero — common in graph algorithms, finite-element problems, and PageRank-style fixed-point computations. spsolve is the direct solver; cg (conjugate gradient), gmres, and bicgstab are iterative methods for symmetric positive-definite or general non-symmetric matrices.

python
from scipy.sparse import csr_matrix, eye
from scipy.sparse.linalg import spsolve
import numpy as np

# Build a small sparse system A x = b
A = csr_matrix([[4, 1, 0], [1, 3, 1], [0, 1, 2]], dtype=float)
b = np.array([1.0, 2.0, 3.0])
x = spsolve(A, b)
print(f"x = {x.round(4)}")
print(f"residual = {(A @ x - b).round(8)}")

Output:

text
x = [ 0.0909 -0.6364  1.8182]
residual = [-0. -0.  0.]

ndimage — n-dimensional image operations

scipy.ndimage operates on n-dimensional arrays the way OpenCV operates on 2-D images: Gaussian filters, morphology (erosion, dilation), label connected components, distance transforms. It is the right tool when you don't need the deep-learning machinery of OpenCV / scikit-image but want fast filters on a NumPy array.

python
from scipy.ndimage import gaussian_filter, label
import numpy as np

# Random binary image
rng = np.random.default_rng(0)
img = (rng.random((6, 6)) > 0.7).astype(int)
print("input:")
print(img)

# Label connected components (4-connectivity by default)
labels, n = label(img)
print(f"\n{n} components")
print(labels)

# Smooth a noisy field
field = rng.random((4, 4))
print("\nsmoothed:")
print(gaussian_filter(field, sigma=1).round(3))

Output:

text
input:
[[1 0 1 1 1 0]
 [0 1 1 0 0 0]
 [0 1 0 0 1 0]
 [0 0 0 0 0 1]
 [0 0 1 0 1 0]
 [1 0 0 1 0 1]]

5 components
[[1 0 2 2 2 0]
 [0 2 2 0 0 0]
 [0 2 0 0 3 0]
 [0 0 0 0 0 4]
 [0 0 5 0 4 0]
 [6 0 0 4 0 4]]

smoothed:
[[0.531 0.49  0.448 0.396]
 [0.501 0.469 0.434 0.394]
 [0.484 0.467 0.443 0.404]
 [0.485 0.484 0.46  0.413]]

Submodule decision guide

When two libraries solve the same problem (scipy, sklearn, numpy), the right choice depends on what you want next.

NeedUseWhy
Hypothesis test on two samplesscipy.stats.ttest_indPure statistical inference, no model fitting
Linear regression with coefficients + p-valuesscipy.stats.linregress or statsmodelsStatistical inference, not prediction
Linear regression for predictionsklearn.linear_model.LinearRegressionPipeline-aware, cross-validation, scoring
Eigenvalues of a small dense matrixnumpy.linalg.eigBundled with numpy, simple API
Eigenvalues of a large sparse matrixscipy.sparse.linalg.eigsIterative ARPACK solver
FFT of a real signalscipy.fft.rfftTwice as fast as np.fft.fft for real input
Pairwise distances on point cloudscipy.spatial.distance_matrix or KDTreeAvoids broadcasting blowup
1-D interpolationscipy.interpolate.CubicSplineSmoother than np.interp
ODE integrationscipy.integrate.solve_ivpModern, supports events and dense output
Constrained optimisationscipy.optimize.minimize(method="SLSQP")Standard solver with constraints
Global optimisationscipy.optimize.differential_evolutionGradient-free, handles non-convex

Common pitfalls (reference)

scipy.stats.ttest_ind assumes equal variance by default — set equal_var=False for Welch's t-test, which is generally safer.

scipy.optimize.minimize does not validate x0 is feasible — if your starting point violates constraints, the optimiser may return a converged solution that is still infeasible. Always check result.success and result.constr_violation.

scipy.signal.lfilter is causal (introduces phase delay) — for offline / batch processing where causality is not required, use filtfilt or sosfiltfilt to apply the filter forward and backward, eliminating phase distortion.

scipy.integrate.odeint is deprecated in favour of solve_ivp — new code should always use solve_ivp; it has a better API and richer features.

Sparse matrix arithmetic is type-fragilecsr_matrix * csr_matrix is matrix multiplication, csr_matrix.multiply(other) is element-wise. Mixing in NumPy arrays may silently densify and exhaust memory.

Real-world recipes

A/B test with Welch's t-test

When variances may differ between groups, Welch's t-test (equal_var=False) is preferred over the standard Student's t-test.

python
from scipy.stats import ttest_ind
import numpy as np

rng = np.random.default_rng(42)
control = rng.normal(loc=100, scale=15, size=200)
variant = rng.normal(loc=104, scale=18, size=200)

t_stat, p_val = ttest_ind(control, variant, equal_var=False)
print(f"t={t_stat:.3f}, p={p_val:.4f}")

Output:

text
t=-2.187, p=0.0294

Fit a logistic curve to growth data

curve_fit handles arbitrary nonlinear functions; pick reasonable p0 initial guesses and check pcov for parameter uncertainty.

python
from scipy.optimize import curve_fit
import numpy as np

def logistic(x, L, k, x0):
    return L / (1 + np.exp(-k * (x - x0)))

x = np.linspace(0, 10, 50)
y_true = logistic(x, L=100, k=0.8, x0=5)
y = y_true + np.random.default_rng(0).normal(scale=3, size=x.size)

popt, pcov = curve_fit(logistic, x, y, p0=[80, 1.0, 4.0])
print(f"L={popt[0]:.2f}, k={popt[1]:.3f}, x0={popt[2]:.2f}")
print(f"std errs: {np.sqrt(np.diag(pcov)).round(3)}")

Output:

text
L=99.83, k=0.788, x0=4.97
std errs: [0.781 0.04  0.054]

Find peaks in a noisy signal

find_peaks returns indices and a dict of properties for each peak. Tune height, distance, and prominence to filter spurious detections.

python
from scipy.signal import find_peaks
import numpy as np

t = np.linspace(0, 4 * np.pi, 1000)
signal = np.sin(t) + 0.1 * np.random.default_rng(0).standard_normal(1000)
peaks, props = find_peaks(signal, height=0.5, distance=100, prominence=0.3)
print(f"found {len(peaks)} peaks at indices {peaks}")
print(f"heights: {props['peak_heights'].round(3)}")

Output:

text
found 2 peaks at indices [126 626]
heights: [1.063 1.06 ]

Build a transition matrix as sparse

A Markov-chain transition matrix from observed sequences is naturally sparse — most states never transition to most other states.

python
from scipy.sparse import csr_matrix
import numpy as np

# Observed transitions: state at t, state at t+1
edges = [(0, 1), (1, 2), (2, 0), (1, 2), (0, 1)]
n = 3
rows, cols = zip(*edges)
data = np.ones(len(edges))
T = csr_matrix((data, (rows, cols)), shape=(n, n))
# Normalise rows to get probabilities
row_sums = np.asarray(T.sum(axis=1)).ravel()
P = T.multiply(1.0 / row_sums[:, None])
print(P.toarray())

Output:

text
[[0.  1.  0. ]
 [0.  0.  1. ]
 [1.  0.  0. ]]

k-NN on a static point cloud

For repeated nearest-neighbour queries against a fixed set of points, build a KDTree once and reuse it — much faster than recomputing distance_matrix each time.

python
from scipy.spatial import KDTree
import numpy as np

rng = np.random.default_rng(0)
warehouse = rng.random((10_000, 3))
tree = KDTree(warehouse)

queries = rng.random((5, 3))
distances, indices = tree.query(queries, k=3)
print("nearest 3 indices per query:")
print(indices)

Output (varies by seed):

text
nearest 3 indices per query:
[[7849 3146 8512]
 [3829 1764 7351]
 [6427  812 4923]
 [4615 5178 9001]
 [ 256 1812 6789]]

See also

  • numpy — array foundation; everything in scipy operates on ndarray.
  • scikit-learn — for prediction-oriented modelling; scipy is the right choice for statistical inference.
  • matplotlib — pair with scipy to visualise fits and distributions.
  • pandas — convert DataFrame columns to NumPy with .to_numpy() before feeding into scipy.