cheat sheet
numpy
Create and manipulate N-dimensional arrays with NumPy. Covers array creation, broadcasting, vectorized math, indexing, and matrix operations.
numpy — Numerical Arrays
What it is
NumPy provides the ndarray — a fixed-type, C-backed multi-dimensional array. It is the foundation of the entire Python scientific stack (pandas, scipy, matplotlib all build on it). Vectorized operations on ndarray are typically 100–1000× faster than equivalent pure-Python loops.
Install
pip install numpy
Output: (none — exits 0 on success)
Quick example
import numpy as np
a = np.array([1, 2, 3, 4, 5])
print(a * 2)
print(f"mean={a.mean():.1f}, std={a.std():.4f}")
print(np.dot(a, a))
Output:
[ 2 4 6 8 10]
mean=3.0, std=1.4142
55
When / why to use it
- You need fast numeric computation without pandas overhead.
- Working with images, audio, or other multi-dimensional data (images are just 3-D arrays).
- Linear algebra (matrix multiply, decompositions).
- Feeding data into machine learning libraries (all expect numpy arrays or array-like objects).
Common pitfalls
dtypedefaults can surprise you —np.array([1, 2, 3])createsint64, butnp.zeros((3, 3))createsfloat64. Mixing dtypes in operations silently upcasts. Be explicit:np.array([1, 2], dtype=np.float32).
Views vs copies — slicing a numpy array returns a view, not a copy. Modifying the slice modifies the original. Use
.copy()when you need independence:b = a[1:3].copy().
Check shapes before operations:
a.shape,a.dtype,a.ndim. Mismatched shapes are the #1 cause ofValueError: operands could not be broadcast.
Array creation
NumPy provides a family of constructors for the common cases: np.array() converts Python lists; np.zeros/np.ones/np.full fill with a constant; np.arange mirrors Python's range but returns an array; np.linspace gives evenly spaced floats inclusive of the endpoint. Always specify dtype explicitly when the default (float64 or int64) isn't what you need.
import numpy as np
np.array([1, 2, 3]) # from list
np.zeros((3, 4)) # 3×4 zeros (float64)
np.ones((2, 2), dtype=int) # 2×2 ones (int)
np.eye(3) # 3×3 identity
np.arange(0, 10, 2) # [0, 2, 4, 6, 8]
np.linspace(0, 1, 5) # [0, 0.25, 0.5, 0.75, 1.0]
np.random.default_rng(42).random((3, 3)) # random 3×3 (seeded)
Indexing and slicing
Basic indexing (a[1, 2], a[:, 1]) returns a view — modifying it modifies the original. Boolean masks (a[a > 5]) and fancy indexing (integer arrays) return copies. For multi-dimensional arrays, each comma-separated expression selects along one axis.
a = np.arange(12).reshape(3, 4)
print(a)
print(a[1, 2]) # row 1, col 2
print(a[:, 1]) # all rows, col 1
print(a[a > 5]) # boolean mask → 1-D array
Output:
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
6
[1 5 9]
[ 6 7 8 9 10 11]
Richer example — matrix operations and broadcasting
import numpy as np
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print("Matrix product:")
print(A @ B)
print("\nElement-wise product:")
print(A * B)
# Broadcasting: add a row vector to every row of a matrix
offsets = np.array([10, 20])
print("\nBroadcast add:")
print(A + offsets)
# Trig over evenly spaced angles
x = np.linspace(0, 2 * np.pi, 5)
print("\nsin values:")
print(np.round(np.sin(x), 4))
Output:
Matrix product:
[[19 22]
[43 50]]
Element-wise product:
[[ 5 12]
[21 32]]
Broadcast add:
[[11 22]
[13 24]]
sin values:
[ 0. 1. 0. -1. -0.]
Useful functions quick reference
| Function | Purpose |
|---|---|
np.concatenate([a, b], axis=0) | Join arrays along an axis |
np.stack([a, b], axis=0) | Stack along a new axis |
np.split(a, 3) | Split into N equal parts |
np.sort(a) | Sorted copy |
np.argsort(a) | Indices that would sort the array |
np.unique(a) | Unique values |
np.where(a > 0, a, 0) | Conditional element selection |
np.clip(a, 0, 1) | Clamp values to [0, 1] |
np.linalg.norm(a) | Euclidean norm |
np.linalg.eig(A) | Eigenvalues and eigenvectors |
np.fft.fft(signal) | Fast Fourier transform |
dtypes in depth
A NumPy dtype records the binary layout of each element: kind (int, float, complex, bool, bytes, datetime), width in bytes, and byte order. Mismatched dtypes silently upcast (an int8 + float64 returns float64) and can blow up memory by 8x on big arrays, so set the dtype explicitly whenever the default doesn't match the data.
import numpy as np
print(np.array([1, 2, 3]).dtype) # platform default int (usually int64)
print(np.array([1.0, 2.0]).dtype) # float64
print(np.array([True, False]).dtype) # bool, 1 byte each
print(np.array(["abc", "de"]).dtype) # <U3 (Unicode, width 3)
print(np.zeros(3, dtype=np.float32).itemsize) # 4 bytes per element
print(np.zeros(3, dtype=np.float64).itemsize) # 8 bytes per element
Output:
int64
float64
bool
<U3
4
8
Cast with .astype() — it always returns a copy:
a = np.array([1.7, 2.3, 3.9])
print(a.astype(np.int32)) # truncates toward zero
print(np.round(a).astype(int)) # round-half-to-even, then cast
Output:
[1 2 3]
[2 2 4]
For 100M+ element arrays consider
float32overfloat64andint8/int16overint64— half/quarter the RAM and faster on SIMD hardware. The downside is reduced range and precision.
Broadcasting rules
Broadcasting is how NumPy applies an operation to arrays of different shapes without materialising an explicit copy. The rules: starting from the right, two dimensions are compatible if they are equal or one of them is 1. A length-1 axis is virtually "stretched" along the other operand's matching axis. If any non-1 pair disagrees, you get ValueError: operands could not be broadcast together.
import numpy as np
a = np.arange(12).reshape(3, 4) # (3, 4)
row = np.array([10, 20, 30, 40]) # (4,) — stretched along axis 0
col = np.array([[100], [200], [300]]) # (3, 1) — stretched along axis 1
print(a + row)
print()
print(a + col)
Output:
[[10 21 32 43]
[14 25 36 47]
[18 29 40 51]]
[[100 101 102 103]
[204 205 206 207]
[308 309 310 311]]
Use np.newaxis (or None) to introduce a length-1 axis when you need to broadcast against a different axis:
a = np.array([1, 2, 3])
b = np.array([10, 20])
# outer-product style (3,) * (2,) -> (3, 2)
print(a[:, np.newaxis] * b)
Output:
[[10 20]
[20 40]
[30 60]]
Fancy indexing and boolean masks
Fancy indexing uses integer arrays to pick arbitrary elements; boolean indexing uses a same-shape bool array as a mask. Both return copies — unlike basic slicing, you cannot use them to assign through a view in chained form. Combined indexing (a[idx, :]) can be cryptic, so name your index arrays.
import numpy as np
a = np.arange(10) * 10
# Fancy indexing — pick specific positions
idx = np.array([0, 3, 7, 9])
print(a[idx])
# Boolean mask
mask = (a > 20) & (a < 80)
print(a[mask])
# 2-D fancy indexing — pair rows with cols
m = np.arange(16).reshape(4, 4)
print(m[[0, 2, 3], [1, 1, 3]]) # picks (0,1), (2,1), (3,3)
Output:
[ 0 30 70 90]
[30 40 50 60 70]
[ 1 9 15]
a[mask] = 0modifiesain place even thougha[mask]returns a copy, because Python translates this intoa.__setitem__(mask, 0). The same does NOT work fora[mask][:] = 0— the intermediate copy is what gets mutated.
Vectorisation vs Python loops
A Python-level for loop on an ndarray is 50–500x slower than the equivalent vectorised expression because each iteration crosses the C/Python boundary. Almost any element-wise operation has a ufunc form (+, np.sin, np.where); use it. When you absolutely cannot vectorise, drop into Numba, Cython, or a list comprehension over scalars — never for i in range(len(a)): a[i] = ....
import numpy as np
import time
x = np.arange(1_000_000, dtype=np.float64)
# Slow: Python loop
t0 = time.perf_counter()
out = np.empty_like(x)
for i in range(len(x)):
out[i] = x[i] ** 2 + 1
slow = time.perf_counter() - t0
# Fast: vectorised
t0 = time.perf_counter()
out = x ** 2 + 1
fast = time.perf_counter() - t0
print(f"loop: {slow*1000:.1f} ms")
print(f"vec: {fast*1000:.1f} ms")
print(f"speedup: {slow/fast:.0f}x")
Output (typical):
loop: 240.0 ms
vec: 3.2 ms
speedup: 75x
Linear algebra (np.linalg)
np.linalg covers the common matrix routines: solve (Ax=b), inv, det, eig, svd, qr, cholesky, and matrix_rank. Always prefer solve(A, b) over inv(A) @ b — it is faster and numerically more stable. For larger problems or specialised decompositions, switch to scipy.linalg, which links to system LAPACK/BLAS.
import numpy as np
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)
print("x =", x)
print("det =", np.linalg.det(A))
print("rank =", np.linalg.matrix_rank(A))
# Singular Value Decomposition
U, S, Vt = np.linalg.svd(A)
print("singular values =", S.round(4))
Output:
x = [2. 3.]
det = 5.0
rank = 2
singular values = [3.6180 1.382 ]
Random numbers — modern Generator API
The legacy np.random.rand / np.random.seed functions use a global, non-thread-safe state and are deprecated for new code. Use np.random.default_rng(seed) instead — it returns a Generator with explicit state, better statistical quality (PCG64), and faster sampling. Always seed in tests and pipelines so results are reproducible.
import numpy as np
rng = np.random.default_rng(42)
print(rng.random(3)) # uniform [0, 1)
print(rng.integers(0, 10, size=5)) # uniform ints [0, 10)
print(rng.normal(loc=0.0, scale=1.0, size=4)) # N(0, 1)
print(rng.choice(["A", "B", "C"], size=5, p=[0.5, 0.3, 0.2]))
print(rng.permutation(np.arange(5))) # shuffled copy
Output:
[0.77395605 0.43887844 0.85859792]
[1 7 6 4 4]
[ 0.42175859 -1.06560158 0.54470015 0.0717954 ]
['A' 'A' 'B' 'A' 'C']
[2 4 0 3 1]
Structured arrays and record arrays
A structured dtype lets one ndarray hold heterogeneous fields per element — like a fixed-schema, in-memory database row. Useful for parsing binary file formats and interop with C structs, but for general tabular data, pandas or polars is almost always a better choice.
import numpy as np
dt = np.dtype([("name", "U10"), ("age", "i4"), ("weight", "f4")])
people = np.array(
[("Alice", 30, 65.5), ("Bob", 25, 78.2), ("Carol", 35, 60.1)],
dtype=dt,
)
print(people)
print(people["name"])
print(people["age"].mean())
Output:
[('Alice', 30, 65.5) ('Bob', 25, 78.2) ('Carol', 35, 60.1)]
['Alice' 'Bob' 'Carol']
30.0
Memory layout — C vs Fortran order
Multi-dimensional arrays store elements as a flat memory buffer; C order (the default) is row-major (last axis varies fastest), F order is column-major (first axis varies fastest, like MATLAB/Fortran). Choosing the right order can speed up cache-friendly access patterns 2–10x; mismatches force a copy when handing arrays to libraries that demand one or the other (e.g. BLAS).
import numpy as np
a = np.arange(12).reshape(3, 4) # C-order (default)
print(a.flags["C_CONTIGUOUS"], a.flags["F_CONTIGUOUS"])
print(a.strides) # (32, 8): step 32 bytes per row, 8 per col
b = np.asfortranarray(a)
print(b.flags["C_CONTIGUOUS"], b.flags["F_CONTIGUOUS"])
print(b.strides) # (8, 24): step 8 bytes per row, 24 per col
# Transpose is free — it just re-labels strides, no copy
print(a.T.flags["F_CONTIGUOUS"])
Output:
True False
(32, 8)
False True
(8, 24)
True
Saving and loading
NumPy's native binary formats round-trip dtypes and shapes exactly, unlike CSV. Use .npy for a single array, .npz for multiple, and memmap when an array is too big for RAM but you only need to touch a slice at a time. For interop with other languages, prefer HDF5 (via h5py) or Parquet.
import numpy as np
a = np.arange(12).reshape(3, 4)
b = np.linspace(0, 1, 5)
# Single array
np.save("a.npy", a)
loaded = np.load("a.npy")
# Multiple arrays in one file
np.savez("bundle.npz", first=a, second=b)
data = np.load("bundle.npz")
print(data["first"].shape, data["second"].shape)
# Compressed
np.savez_compressed("bundle.npz", first=a, second=b)
# Memory-mapped — read/write without loading into RAM
mm = np.memmap("big.dat", dtype="float32", mode="w+", shape=(10_000, 10_000))
mm[0, 0] = 3.14
mm.flush() # write to disk
del mm # close
mm2 = np.memmap("big.dat", dtype="float32", mode="r", shape=(10_000, 10_000))
print(mm2[0, 0])
Output:
(3, 4) (5,)
3.14
Interop with pandas, polars, and Arrow
A pandas Series or DataFrame exposes its underlying buffer through .to_numpy(); numeric DataFrames give a 2-D array with no copy when dtypes are uniform. Polars uses Arrow internally so df.to_numpy() is zero-copy for fixed-width numeric columns and a quick memory-buffer copy otherwise. Going the other way is just pd.DataFrame(arr) or pl.from_numpy(arr).
import numpy as np
import pandas as pd
a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float64)
df = pd.DataFrame(a, columns=["x", "y", "z"])
back = df.to_numpy()
print(back.dtype, back.shape)
# polars
import polars as pl
pl_df = pl.from_numpy(a, schema=["x", "y", "z"])
print(pl_df.to_numpy().shape)
Output:
float64 (2, 3)
(2, 3)
GPU and JIT alternatives
When CPU-bound NumPy code is the bottleneck, three drop-in replacements expose nearly identical APIs:
- CuPy — NumPy on NVIDIA GPUs.
import cupy as cpand usecp.array(...); most functions matchnp.*. Best for large arrays and operations that fit in GPU memory. - JAX —
jax.numpy as jnpprovides NumPy semantics plus automatic differentiation, JIT compilation via XLA, and seamless GPU/TPU execution. Arrays are immutable; in-place mutation becomesx.at[i].set(v). - Numba —
@numba.njitdecorates a Python function and compiles it to native code at first call. Best for tight numeric loops that don't vectorise cleanly.
# Numba example
from numba import njit
import numpy as np
@njit
def sum_squares(x):
total = 0.0
for i in range(len(x)):
total += x[i] * x[i]
return total
x = np.arange(1_000_000, dtype=np.float64)
print(sum_squares(x))
Output:
3.3333328333335e+17
Before reaching for GPU/JIT, confirm you have actually vectorised the NumPy version. A clean NumPy expression often outperforms a hand-rolled Numba loop on small or medium arrays because of better cache behaviour.
Real-world recipes
Image processing: invert and normalise
Images are just H × W × 3 uint8 arrays. Convert to float, normalise to [0, 1], invert, save back.
import numpy as np
from PIL import Image
img = np.asarray(Image.open("photo.jpg")) # uint8, shape (H, W, 3)
arr = img.astype(np.float32) / 255.0 # normalise
inverted = 1.0 - arr
out = (inverted * 255).astype(np.uint8)
Image.fromarray(out).save("inverted.jpg")
Audio: detect zero-crossings
A zero-crossing is where the signal sign flips; counting them gives a rough frequency estimate.
import numpy as np
signal = np.sin(np.linspace(0, 10 * np.pi, 5000))
sign = np.sign(signal)
crossings = np.where(np.diff(sign) != 0)[0]
print(f"zero-crossings: {len(crossings)}")
Output:
zero-crossings: 10
Sliding-window mean (cumulative-sum trick)
For a flat numeric series, a sliding window mean via cumsum is O(n) and beats np.convolve for large windows.
import numpy as np
x = np.arange(1, 11, dtype=float)
w = 3
cs = np.cumsum(np.insert(x, 0, 0))
windowed = (cs[w:] - cs[:-w]) / w
print(windowed)
Output:
[2. 3. 4. 5. 6. 7. 8. 9.]
One-hot encode integer labels
Useful before feeding labels into a neural net or sklearn.linear_model.
import numpy as np
labels = np.array([0, 2, 1, 2, 0])
n_classes = labels.max() + 1
one_hot = np.eye(n_classes, dtype=np.int8)[labels]
print(one_hot)
Output:
[[1 0 0]
[0 0 1]
[0 1 0]
[0 0 1]
[1 0 0]]
Pairwise Euclidean distance matrix
Broadcasting computes a full pairwise distance matrix without an explicit loop.
import numpy as np
pts = np.array([[0, 0], [3, 4], [6, 8]], dtype=float)
diff = pts[:, np.newaxis, :] - pts[np.newaxis, :, :] # (N, N, 2)
dist = np.linalg.norm(diff, axis=-1)
print(dist)
Output:
[[ 0. 5. 10.]
[ 5. 0. 5.]
[10. 5. 0.]]
For larger point clouds use scipy.spatial.distance_matrix or scipy.spatial.KDTree — they avoid the full N×N intermediate.
See also
- pandas and polars — DataFrame libraries that wrap NumPy/Arrow buffers.
- scipy — domain-specific algorithms (stats, optimize, signal, sparse) on top of NumPy.
- matplotlib — pairs naturally with NumPy for plotting arrays.
- scikit-learn — ML models consume NumPy arrays directly.