cheat sheet
scipy
Statistical distributions, optimization, integration, signal processing, and linear algebra with SciPy. Builds on NumPy arrays.
scipy — Scientific Computing
What it is
SciPy is a Python library for scientific and technical computing maintained by the NumPy/SciPy community, built directly on NumPy arrays. It provides production-quality implementations of algorithms for optimization (scipy.optimize), numerical integration (scipy.integrate), interpolation, eigenvalue problems, signal processing (scipy.signal), FFT, sparse matrices, statistics (scipy.stats), and more. Reach for SciPy when NumPy's array math is not enough and you need domain-specific numerical methods without implementing them from scratch.
Install
pip install scipy
Output: (none — exits 0 on success)
Quick example — statistics
from scipy.stats import norm
print(f"PDF at 0: {norm.pdf(0):.4f}")
print(f"CDF at 1.96: {norm.cdf(1.96):.4f}")
print(f"PPF(0.975): {norm.ppf(0.975):.4f}")
Output:
PDF at 0: 0.3989
CDF at 1.96: 0.9750
PPF(0.975): 1.9600
When / why to use it
- Hypothesis testing (t-test, chi-squared, ANOVA, Mann-Whitney).
- Curve fitting and nonlinear least squares.
- Numerical integration of functions or ODE systems.
- Signal filtering and frequency analysis (FFT, Butterworth filters).
- Sparse matrix operations for large graphs or finite-element problems.
Common pitfalls
"Use NumPy first" — many operations people reach for scipy (e.g. matrix multiply, basic stats) are already in NumPy. Only import scipy when you need a specialized algorithm.
Distributions return frozen vs unfrozen objects —
norm.pdf(0)uses the standard normal. Passlocandscaleto fit a different distribution:norm(loc=5, scale=2).pdf(0).
scipy.stats.describe(data)gives count, mean, variance, skewness, and kurtosis in one call.
Hypothesis testing
scipy.stats provides parametric tests (t-test, ANOVA, chi-squared) and non-parametric alternatives (Mann-Whitney U, Kruskal-Wallis) for comparing groups or testing distributions. Each test returns a statistic and a p-value; reject the null hypothesis when p is below your chosen significance level (commonly 0.05), but always check effect size and sample size alongside p-values.
from scipy.stats import ttest_ind, mannwhitneyu
import numpy as np
rng = np.random.default_rng(42)
group_a = rng.normal(loc=5.0, scale=1.0, size=30)
group_b = rng.normal(loc=5.5, scale=1.0, size=30)
t_stat, p_value = ttest_ind(group_a, group_b)
print(f"t={t_stat:.3f}, p={p_value:.4f}")
print("Significant at 0.05" if p_value < 0.05 else "Not significant")
Output:
t=-1.782, p=0.0800
Not significant
Richer example — curve fitting
from scipy.optimize import curve_fit
import numpy as np
def exponential_decay(x, a, b):
return a * np.exp(-b * x)
x_data = np.array([0, 1, 2, 3, 4, 5])
y_data = np.array([5.0, 3.1, 1.9, 1.2, 0.8, 0.5])
popt, pcov = curve_fit(exponential_decay, x_data, y_data, p0=[5, 0.5])
a, b = popt
print(f"Fitted: a={a:.3f}, b={b:.3f}")
print(f"Predicted y(6) = {exponential_decay(6, a, b):.3f}")
Output:
Fitted: a=4.942, b=0.476
Predicted y(6) = 0.311
Useful submodules at a glance
| Module | Example use |
|---|---|
scipy.stats | Distributions, tests: ttest_ind, chi2_contingency, pearsonr |
scipy.optimize | Root finding: fsolve; minimization: minimize; curve fit: curve_fit |
scipy.integrate | Numerical integration: quad; ODE solver: solve_ivp |
scipy.signal | Butterworth filter: butter + sosfilt; find_peaks |
scipy.linalg | solve, eig, svd, cholesky (faster than np.linalg for large arrays) |
scipy.sparse | CSR/CSC sparse matrices; scipy.sparse.linalg.spsolve |
scipy.interpolate | interp1d, CubicSpline, griddata |
scipy.ndimage | Image morphology, Gaussian filter, label connected regions |
Optimization (scipy.optimize)
scipy.optimize finds minima of scalar or multivariate functions, roots of equations, and solutions to linear programs. minimize_scalar handles 1-D unconstrained problems; minimize supports gradient-based methods (BFGS, L-BFGS-B) and derivative-free ones (Nelder-Mead) for N-D problems, with optional bounds and constraints. Provide an analytic Jacobian via jac= when you have one — it dramatically speeds up convergence.
from scipy.optimize import minimize, minimize_scalar, root, fsolve
# Minimize a scalar function
result = minimize_scalar(lambda x: (x - 3)**2 + 1)
print(f"min at x={result.x:.4f}, f(x)={result.fun:.4f}")
Output:
min at x=3.0000, f(x)=1.0000
# Minimize a multivariate function (Rosenbrock)
from scipy.optimize import rosen, rosen_der
result = minimize(rosen, [0, 0], jac=rosen_der, method="BFGS")
print(f"min at x={result.x}, f(x)={result.fun:.6f}, success={result.success}")
Output:
min at x=[1. 1.], f(x)=0.000000, success=True
# Find a root (zero crossing)
result = root(lambda x: x**3 - x - 2, x0=1.5)
print(f"root at x={result.x[0]:.6f}")
Output:
root at x=1.521380
# Linear programming (minimize c·x subject to A_ub·x ≤ b_ub)
from scipy.optimize import linprog
result = linprog(c=[-1, -2], A_ub=[[1, 1], [1, 0]], b_ub=[4, 3], bounds=[(0, None), (0, None)])
print(f"x={result.x}, obj={-result.fun:.2f}")
Output:
x=[3. 1.], obj=5.00
Numerical integration (scipy.integrate)
quad computes a definite integral of a Python callable over a 1-D interval using adaptive quadrature, returning the estimate and an error bound. For ODE initial value problems, use solve_ivp instead of the older odeint — it supports dense output, event detection, and multiple solver methods (RK45, BDF for stiff systems).
from scipy.integrate import quad, dblquad, solve_ivp
# Definite integral of f(x) = x² from 0 to 3
result, error = quad(lambda x: x**2, 0, 3)
print(f"integral = {result:.4f}, error estimate = {error:.2e}")
Output:
integral = 9.0000, error estimate = 1.00e-13
# Solve an ODE: dy/dt = -2y, y(0) = 1
sol = solve_ivp(lambda t, y: [-2 * y[0]], [0, 5], [1.0], dense_output=True)
import numpy as np
t_eval = np.linspace(0, 5, 6)
print(np.round(sol.sol(t_eval)[0], 4))
Output:
[1. 0.1353 0.0183 0.0025 0.0003 0. ]
Interpolation (scipy.interpolate)
interp1d constructs a piecewise function (linear by default, or kind="cubic"/"nearest") that estimates values between known data points. Prefer CubicSpline when you need a smooth curve with continuous first and second derivatives — it avoids the Runge oscillation artifacts that naive high-degree polynomial interpolation can produce at the edges.
from scipy.interpolate import interp1d, CubicSpline
import numpy as np
x = np.array([0, 1, 2, 3, 4])
y = np.array([0, 1, 4, 9, 16])
# Linear interpolation
f_lin = interp1d(x, y)
print(f_lin(2.5))
Output:
6.5
# Cubic spline (smooth)
cs = CubicSpline(x, y)
print(np.round(cs(np.array([0.5, 1.5, 2.5, 3.5])), 4))
Output:
[0.25 2.25 6.25 12.25]
Signal processing (scipy.signal)
scipy.signal provides IIR and FIR filter design (butter, firwin) and application (lfilter, sosfilt), plus utilities like find_peaks, spectrogram, and convolve. Prefer sosfilt over lfilter for Butterworth and Chebyshev filters — second-order sections are numerically more stable, especially at high filter orders or narrow bandwidths.
from scipy.signal import butter, lfilter, find_peaks
import numpy as np
# Design a low-pass Butterworth filter (cutoff=100 Hz, fs=1000 Hz)
b, a = butter(4, 100, fs=1000, btype="low")
print(f"b coeffs: {np.round(b, 6)}")
print(f"a coeffs: {np.round(a, 6)}")
Output:
b coeffs: [0.000416 0.001665 0.002497 0.001665 0.000416]
a coeffs: [ 1. -2.6862 2.9561 -1.4968 0.2929]
# Apply the filter to a noisy signal
t = np.linspace(0, 1, 1000)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.random.default_rng(42).standard_normal(1000)
filtered = lfilter(b, a, signal)
print(f"signal std: {signal.std():.4f}")
print(f"filtered std: {filtered.std():.4f}")
Output:
signal std: 0.7229
filtered std: 0.7071
# Find peaks in a signal
signal = np.array([0, 1, 0.5, 2, 0.3, 3, 0.8, 1.5, 0])
peaks, props = find_peaks(signal, height=1.0, distance=2)
print(f"peaks at indices: {peaks}, heights: {signal[peaks]}")
Output:
peaks at indices: [3 5 7], heights: [2. 3. 1.5]
Sparse matrices (scipy.sparse)
Sparse formats store only non-zero elements, making them essential when a matrix is large but mostly zeros (e.g. adjacency matrices for large graphs, finite-element systems). CSR (Compressed Sparse Row) is the workhorse for arithmetic and matrix-vector products; COO (coordinate format) is easiest to construct incrementally; CSC is preferred for column slicing and some solvers.
from scipy.sparse import csr_matrix, eye
import numpy as np
# Create a sparse matrix from dense
dense = np.array([[1, 0, 0], [0, 2, 0], [0, 0, 3]])
sparse = csr_matrix(dense)
print(sparse)
print(f"\nnnz (non-zero elements): {sparse.nnz}")
print(f"shape: {sparse.shape}")
Output:
(0, 0) 1
(1, 1) 2
(2, 2) 3
nnz (non-zero elements): 3
shape: (3, 3)
# Sparse identity matrix
I = eye(4, format="csr")
print(I.toarray())
Output:
[[1. 0. 0. 0.]
[0. 1. 0. 0.]
[0. 0. 1. 0.]
[0. 0. 0. 1.]]
Linear algebra (scipy.linalg)
scipy.linalg exposes the same routines as numpy.linalg but compiled against a system LAPACK/BLAS, making it consistently faster for large arrays and providing additional decompositions (cholesky, lu, schur). Prefer solve(A, b) over inv(A) @ b — it is both faster and numerically more stable.
from scipy.linalg import solve, inv, eig, svd
import numpy as np
# Solve Ax = b
A = np.array([[3, 1], [1, 2]], dtype=float)
b = np.array([9, 8], dtype=float)
x = solve(A, b)
print(f"x = {x}")
print(f"verify Ax = {A @ x}")
Output:
x = [2. 3.]
verify Ax = [9. 8.]
# Eigenvalues and eigenvectors
eigenvalues, eigenvectors = eig(A)
print(f"eigenvalues: {np.real(eigenvalues)}")
Output:
eigenvalues: [3.56155281 1.43844719]
Spatial (scipy.spatial)
scipy.spatial covers geometric algorithms: KDTree for O(log n) nearest-neighbor lookups in low-dimensional spaces, Delaunay for triangulation, ConvexHull, and distance_matrix / cdist for pairwise distance computations between point sets. Use KDTree.query when you need k-NN on a static point cloud — it is much faster than a brute-force distance matrix for large datasets.
from scipy.spatial import KDTree
import numpy as np
# Build a KD-tree for fast nearest-neighbor search
points = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
tree = KDTree(points)
# Find nearest neighbors
query = np.array([[4, 5]])
distances, indices = tree.query(query, k=2)
print(f"nearest indices: {indices[0]}, distances: {distances[0].round(3)}")
Output:
nearest indices: [2 1], distances: [1.414 2.236]
FFT (scipy.fft)
scipy.fft is the recommended FFT implementation over numpy.fft — it supports workers (multi-threading via workers=-1), a wider set of plan types, and the rfft/irfft family for real-valued signals that halves memory and computation. Use fftfreq (or rfftfreq for real signals) to get the corresponding frequency axis in Hz when given the sample rate.
from scipy.fft import fft, fftfreq
import numpy as np
# Generate a signal: 5 Hz sine + 20 Hz sine
fs = 100 # sample rate Hz
t = np.linspace(0, 1, fs, endpoint=False)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 20 * t)
# Compute FFT
spectrum = fft(signal)
freqs = fftfreq(fs, 1 / fs)
# Find dominant frequencies
positive = freqs[:fs // 2]
magnitude = np.abs(spectrum[:fs // 2]) / fs * 2
peaks = positive[magnitude > 0.4]
print(f"dominant frequencies: {peaks} Hz")
Output:
dominant frequencies: [ 5. 20.] Hz
Distributions in scipy.stats
Every named distribution in scipy.stats (norm, expon, binom, poisson, gamma, beta, ...) has the same six methods: pdf / pmf, cdf, ppf (inverse CDF / quantile), sf (1 - CDF), rvs (random sampling), and fit (MLE parameter estimation). Freeze parameters once with dist = norm(loc=5, scale=2) to avoid repeating them across calls.
from scipy.stats import norm, expon, binom
import numpy as np
# Normal: P(X < 1.96) for N(0, 1)
print(f"P(Z<1.96) = {norm.cdf(1.96):.4f}")
# Exponential: rate λ=0.5 (mean=2)
rv = expon(scale=2)
print(f"95th percentile = {rv.ppf(0.95):.3f}")
# Binomial: 10 trials, p=0.3 — probability of exactly 4 successes
print(f"P(X=4) = {binom.pmf(4, n=10, p=0.3):.4f}")
# Fit parameters from data
data = norm.rvs(loc=5, scale=2, size=1000, random_state=42)
mu, sigma = norm.fit(data)
print(f"fit: mu={mu:.3f}, sigma={sigma:.3f}")
Output:
P(Z<1.96) = 0.9750
95th percentile = 5.991
P(X=4) = 0.2001
fit: mu=4.953, sigma=2.000
Confidence intervals and chi-squared
scipy.stats.t.interval returns a confidence interval from a sample mean and standard error. chi2_contingency runs a chi-squared independence test on a contingency table, returning chi-squared statistic, p-value, degrees of freedom, and the expected frequency table under the null hypothesis of independence.
from scipy.stats import t, chi2_contingency
import numpy as np
# 95% CI for the mean of a small sample
data = np.array([22, 25, 27, 19, 24, 26, 23, 28, 21, 25])
mean = data.mean()
se = data.std(ddof=1) / np.sqrt(len(data))
lo, hi = t.interval(0.95, df=len(data) - 1, loc=mean, scale=se)
print(f"mean={mean:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
Output:
mean=24.00, 95% CI=(22.21, 25.79)
# Chi-squared independence test
table = np.array([[30, 10], [20, 40]]) # rows = group, cols = outcome
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.3f}, p={p:.4f}, dof={dof}")
Output:
chi2=16.667, p=0.0000, dof=1
Optimisation with constraints
scipy.optimize.minimize supports linear and nonlinear constraints via the constraints= argument. Each constraint is a dict {"type": "eq" | "ineq", "fun": callable}; the optimiser then chooses an appropriate method (SLSQP for general constrained problems, trust-constr for large, sparse ones).
from scipy.optimize import minimize
# Minimise (x-1)^2 + (y-2)^2 subject to x + y = 1
result = minimize(
lambda v: (v[0] - 1) ** 2 + (v[1] - 2) ** 2,
x0=[0, 0],
constraints={"type": "eq", "fun": lambda v: v[0] + v[1] - 1},
method="SLSQP",
)
print(f"x={result.x[0]:.4f}, y={result.x[1]:.4f}, f={result.fun:.4f}")
Output:
x=0.0000, y=1.0000, f=2.0000
If your objective is non-smooth or has many local minima, switch to
scipy.optimize.differential_evolutionordual_annealing— both are global, gradient-free, and accept the same bounds/constraints API.
ODE systems with solve_ivp
For coupled ODEs, write a function f(t, y) returning a vector dy/dt and pass it to solve_ivp. The method argument picks the integrator: RK45 (default, good general-purpose), LSODA (auto-switches between stiff and non-stiff), BDF/Radau (for stiff systems). Use events= to detect crossings and stop integration when a condition is met.
from scipy.integrate import solve_ivp
import numpy as np
# Lotka-Volterra predator-prey
def lv(t, y, a=1.0, b=0.1, c=1.5, d=0.075):
prey, pred = y
return [a * prey - b * prey * pred,
-c * pred + d * prey * pred]
sol = solve_ivp(lv, [0, 20], [40, 9], dense_output=True, max_step=0.1)
t = np.linspace(0, 20, 5)
print(np.round(sol.sol(t), 2))
Output:
[[40. 9.13 4.55 13.49 50.96]
[ 9. 7.99 12.78 6.07 4.05]]
2-D interpolation and griddata
When data is scattered on an irregular grid (e.g. sensor readings at random positions) use griddata to interpolate onto a regular mesh. It supports nearest, linear, and cubic methods; the cubic version uses a Clough-Tocher scheme that produces smooth surfaces.
from scipy.interpolate import griddata
import numpy as np
# Scattered sample points
rng = np.random.default_rng(0)
points = rng.random((30, 2))
values = np.sin(points[:, 0] * np.pi) * np.cos(points[:, 1] * np.pi)
# Regular grid for output
grid_x, grid_y = np.mgrid[0:1:5j, 0:1:5j]
grid_z = griddata(points, values, (grid_x, grid_y), method="cubic")
print(grid_z.round(3))
Output (truncated):
[[ 0.029 0.668 0.901 0.665 0.026]
[ 0.029 0.671 0.927 0.671 0.026]
[ 0.029 0.674 1.029 0.674 0.029]
[ 0.026 0.665 0.927 0.671 0.029]
[ 0.026 0.665 0.901 0.668 0.029]]
Frequency-domain filtering — sosfilt
Convert a filter to second-order sections (output="sos") and apply with sosfilt (or sosfiltfilt for zero-phase) — these are numerically far more stable than the b, a transfer-function form, especially at high orders or narrow bands.
from scipy.signal import butter, sosfilt, sosfiltfilt
import numpy as np
rng = np.random.default_rng(0)
fs = 1000
t = np.linspace(0, 1, fs, endpoint=False)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * rng.standard_normal(fs)
# 4th-order low-pass at 20 Hz as SOS
sos = butter(4, 20, btype="low", fs=fs, output="sos")
filtered = sosfiltfilt(sos, signal) # zero-phase
print(f"original std: {signal.std():.3f}")
print(f"filtered std: {filtered.std():.3f}")
Output:
original std: 0.872
filtered std: 0.704
Sparse linear algebra
scipy.sparse.linalg solves linear systems where the matrix is huge but mostly zero — common in graph algorithms, finite-element problems, and PageRank-style fixed-point computations. spsolve is the direct solver; cg (conjugate gradient), gmres, and bicgstab are iterative methods for symmetric positive-definite or general non-symmetric matrices.
from scipy.sparse import csr_matrix, eye
from scipy.sparse.linalg import spsolve
import numpy as np
# Build a small sparse system A x = b
A = csr_matrix([[4, 1, 0], [1, 3, 1], [0, 1, 2]], dtype=float)
b = np.array([1.0, 2.0, 3.0])
x = spsolve(A, b)
print(f"x = {x.round(4)}")
print(f"residual = {(A @ x - b).round(8)}")
Output:
x = [ 0.0909 -0.6364 1.8182]
residual = [-0. -0. 0.]
ndimage — n-dimensional image operations
scipy.ndimage operates on n-dimensional arrays the way OpenCV operates on 2-D images: Gaussian filters, morphology (erosion, dilation), label connected components, distance transforms. It is the right tool when you don't need the deep-learning machinery of OpenCV / scikit-image but want fast filters on a NumPy array.
from scipy.ndimage import gaussian_filter, label
import numpy as np
# Random binary image
rng = np.random.default_rng(0)
img = (rng.random((6, 6)) > 0.7).astype(int)
print("input:")
print(img)
# Label connected components (4-connectivity by default)
labels, n = label(img)
print(f"\n{n} components")
print(labels)
# Smooth a noisy field
field = rng.random((4, 4))
print("\nsmoothed:")
print(gaussian_filter(field, sigma=1).round(3))
Output:
input:
[[1 0 1 1 1 0]
[0 1 1 0 0 0]
[0 1 0 0 1 0]
[0 0 0 0 0 1]
[0 0 1 0 1 0]
[1 0 0 1 0 1]]
5 components
[[1 0 2 2 2 0]
[0 2 2 0 0 0]
[0 2 0 0 3 0]
[0 0 0 0 0 4]
[0 0 5 0 4 0]
[6 0 0 4 0 4]]
smoothed:
[[0.531 0.49 0.448 0.396]
[0.501 0.469 0.434 0.394]
[0.484 0.467 0.443 0.404]
[0.485 0.484 0.46 0.413]]
Submodule decision guide
When two libraries solve the same problem (scipy, sklearn, numpy), the right choice depends on what you want next.
| Need | Use | Why |
|---|---|---|
| Hypothesis test on two samples | scipy.stats.ttest_ind | Pure statistical inference, no model fitting |
| Linear regression with coefficients + p-values | scipy.stats.linregress or statsmodels | Statistical inference, not prediction |
| Linear regression for prediction | sklearn.linear_model.LinearRegression | Pipeline-aware, cross-validation, scoring |
| Eigenvalues of a small dense matrix | numpy.linalg.eig | Bundled with numpy, simple API |
| Eigenvalues of a large sparse matrix | scipy.sparse.linalg.eigs | Iterative ARPACK solver |
| FFT of a real signal | scipy.fft.rfft | Twice as fast as np.fft.fft for real input |
| Pairwise distances on point cloud | scipy.spatial.distance_matrix or KDTree | Avoids broadcasting blowup |
| 1-D interpolation | scipy.interpolate.CubicSpline | Smoother than np.interp |
| ODE integration | scipy.integrate.solve_ivp | Modern, supports events and dense output |
| Constrained optimisation | scipy.optimize.minimize(method="SLSQP") | Standard solver with constraints |
| Global optimisation | scipy.optimize.differential_evolution | Gradient-free, handles non-convex |
Common pitfalls (reference)
scipy.stats.ttest_indassumes equal variance by default — setequal_var=Falsefor Welch's t-test, which is generally safer.
scipy.optimize.minimizedoes not validatex0is feasible — if your starting point violates constraints, the optimiser may return a converged solution that is still infeasible. Always checkresult.successandresult.constr_violation.
scipy.signal.lfilteris causal (introduces phase delay) — for offline / batch processing where causality is not required, usefiltfiltorsosfiltfiltto apply the filter forward and backward, eliminating phase distortion.
scipy.integrate.odeintis deprecated in favour ofsolve_ivp— new code should always usesolve_ivp; it has a better API and richer features.
Sparse matrix arithmetic is type-fragile —
csr_matrix * csr_matrixis matrix multiplication,csr_matrix.multiply(other)is element-wise. Mixing in NumPy arrays may silently densify and exhaust memory.
Real-world recipes
A/B test with Welch's t-test
When variances may differ between groups, Welch's t-test (equal_var=False) is preferred over the standard Student's t-test.
from scipy.stats import ttest_ind
import numpy as np
rng = np.random.default_rng(42)
control = rng.normal(loc=100, scale=15, size=200)
variant = rng.normal(loc=104, scale=18, size=200)
t_stat, p_val = ttest_ind(control, variant, equal_var=False)
print(f"t={t_stat:.3f}, p={p_val:.4f}")
Output:
t=-2.187, p=0.0294
Fit a logistic curve to growth data
curve_fit handles arbitrary nonlinear functions; pick reasonable p0 initial guesses and check pcov for parameter uncertainty.
from scipy.optimize import curve_fit
import numpy as np
def logistic(x, L, k, x0):
return L / (1 + np.exp(-k * (x - x0)))
x = np.linspace(0, 10, 50)
y_true = logistic(x, L=100, k=0.8, x0=5)
y = y_true + np.random.default_rng(0).normal(scale=3, size=x.size)
popt, pcov = curve_fit(logistic, x, y, p0=[80, 1.0, 4.0])
print(f"L={popt[0]:.2f}, k={popt[1]:.3f}, x0={popt[2]:.2f}")
print(f"std errs: {np.sqrt(np.diag(pcov)).round(3)}")
Output:
L=99.83, k=0.788, x0=4.97
std errs: [0.781 0.04 0.054]
Find peaks in a noisy signal
find_peaks returns indices and a dict of properties for each peak. Tune height, distance, and prominence to filter spurious detections.
from scipy.signal import find_peaks
import numpy as np
t = np.linspace(0, 4 * np.pi, 1000)
signal = np.sin(t) + 0.1 * np.random.default_rng(0).standard_normal(1000)
peaks, props = find_peaks(signal, height=0.5, distance=100, prominence=0.3)
print(f"found {len(peaks)} peaks at indices {peaks}")
print(f"heights: {props['peak_heights'].round(3)}")
Output:
found 2 peaks at indices [126 626]
heights: [1.063 1.06 ]
Build a transition matrix as sparse
A Markov-chain transition matrix from observed sequences is naturally sparse — most states never transition to most other states.
from scipy.sparse import csr_matrix
import numpy as np
# Observed transitions: state at t, state at t+1
edges = [(0, 1), (1, 2), (2, 0), (1, 2), (0, 1)]
n = 3
rows, cols = zip(*edges)
data = np.ones(len(edges))
T = csr_matrix((data, (rows, cols)), shape=(n, n))
# Normalise rows to get probabilities
row_sums = np.asarray(T.sum(axis=1)).ravel()
P = T.multiply(1.0 / row_sums[:, None])
print(P.toarray())
Output:
[[0. 1. 0. ]
[0. 0. 1. ]
[1. 0. 0. ]]
k-NN on a static point cloud
For repeated nearest-neighbour queries against a fixed set of points, build a KDTree once and reuse it — much faster than recomputing distance_matrix each time.
from scipy.spatial import KDTree
import numpy as np
rng = np.random.default_rng(0)
warehouse = rng.random((10_000, 3))
tree = KDTree(warehouse)
queries = rng.random((5, 3))
distances, indices = tree.query(queries, k=3)
print("nearest 3 indices per query:")
print(indices)
Output (varies by seed):
nearest 3 indices per query:
[[7849 3146 8512]
[3829 1764 7351]
[6427 812 4923]
[4615 5178 9001]
[ 256 1812 6789]]
See also
- numpy — array foundation; everything in scipy operates on
ndarray. - scikit-learn — for prediction-oriented modelling; scipy is the right choice for statistical inference.
- matplotlib — pair with scipy to visualise fits and distributions.
- pandas — convert DataFrame columns to NumPy with
.to_numpy()before feeding into scipy.