APIs

A versioned contract between two pieces of software — endpoints, verbs, payload shapes, errors, and auth — that decouples a caller from an implementation.

Definition

An API (Application Programming Interface) is a stable, versioned contract that lets one program call another without depending on its internals. In the engineering sense used across this site, an API is almost always a networked contract — REST/JSON over HTTP, gRPC over HTTP/2, GraphQL over HTTP, or a typed RPC layer like tRPC — that fixes the endpoints, verbs, payload schemas, error envelope, authentication, and rate-limit semantics that both sides have to honour. It is the thing that does not change while the implementation behind it churns.

Why it matters

The API is the boundary where teams, processes, languages, and trust domains meet. Get it right and clients can be written by anyone, in any language, against documentation alone; get it wrong and every release becomes a coordinated rollout. The choice of style is where most of the design tradeoffs live:

  • REST over HTTP/JSON — universal client support, cache-friendly (CDNs cache GET responses by URL), and standard HTTP semantics for verbs, status codes, and headers. The default for public APIs (Stripe, GitHub, SendGrid) because every language already speaks it. Cost: over- and under-fetching is common; clients often need multiple round-trips.
  • GraphQL — one endpoint, a query language that lets the client ask for exactly the fields it wants. Wins for complex UIs that aggregate many resources. Cost: standard HTTP caching mostly stops working because queries are typically sent as POST to a single /graphql endpoint, and N+1 query problems on the server are common unless resolvers batch their lookups.
  • gRPC (protobuf over HTTP/2) — binary payloads ~4× smaller than JSON, streaming in both directions, code-generated clients in 10+ languages. The default for service-to-service traffic inside a datacenter. Cost: browsers cannot speak it directly (gRPC-Web is a separate transport), and protobuf changes need disciplined versioning.
  • tRPC — end-to-end TypeScript types with no schema language; the server's function signature is the client's type. Excellent for full-stack TypeScript monorepos (Next.js, Remix). Cost: TypeScript-only — useless as a public API.
  • OpenAPI / AsyncAPI — not API styles but description languages: OpenAPI 3.1 describes synchronous HTTP APIs (now fully aligned with JSON Schema 2020-12), AsyncAPI 3.0 describes message-driven systems (Kafka, MQTT, WebSocket, SSE). Systems that expose both request/response and event-driven interfaces can publish both side by side.

Beyond style, the same five concerns recur on every API regardless of transport: idempotency, versioning, errors, rate limiting, and auth. Skipping any of them is what turns a working integration into a 3 a.m. page.

How it works

A well-behaved HTTP API is a small handful of disciplines that compose. RFC 9110 (the current HTTP semantics spec, published June 2022) is the foundation; everything below either implements it or extends it.

Verbs carry meaning. GET, HEAD, OPTIONS, TRACE, PUT, and DELETE are idempotent — repeating the same request has the same net effect as a single one, which is what makes safe retry-on-timeout possible. POST and PATCH are not idempotent; if a client retries a POST after a network blip it may create two resources. The standard fix is an Idempotency-Key header (popularised by Stripe): the server stores the result of the first request keyed by the header and replays it on subsequent retries within a TTL.
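
The replay behaviour behind an Idempotency-Key can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how Stripe or any particular server implements it: the class name IdempotencyStore, the create_charge handler, and the 60-second TTL are all hypothetical, and the clock is injected so the expiry can be exercised without sleeping.

```python
import time
from typing import Any, Callable, Dict, Tuple


class IdempotencyStore:
    """In-memory replay cache keyed by the Idempotency-Key header value."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (time the result was stored, stored result)
        self._results: Dict[str, Tuple[float, Any]] = {}

    def run(self, key: str, handler: Callable[[], Any]) -> Any:
        now = self._clock()
        cached = self._results.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]       # retry within the TTL: replay, no side effect
        result = handler()         # first request (or expired key): do the work
        self._results[key] = (now, result)
        return result


created = []


def create_charge() -> dict:
    """Hypothetical non-idempotent operation: each call creates a resource."""
    created.append("charge")
    return {"id": f"ch_{len(created)}", "status": "created"}


fake_now = [0.0]  # mutable fake clock for the demonstration
store = IdempotencyStore(ttl_seconds=60, clock=lambda: fake_now[0])
first = store.run("key-123", create_charge)
retry = store.run("key-123", create_charge)  # same key: replayed, not re-run
```

A production version would need shared, atomic storage and a policy for requests that arrive while the first one is still in flight; neither is shown here.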

Status codes are part of the contract. 2xx for success, 3xx for redirection, 4xx for client mistakes the client must fix, 5xx for server faults the client may retry, ideally with backoff. 429 Too Many Requests specifically signals rate-limiting and should carry a Retry-After header (seconds, or an HTTP-date). 409 Conflict and 412 Precondition Failed are the right answers for optimistic-concurrency disputes; do not return 500.
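
Because Retry-After comes in two forms, a client has to handle both. The following is a small sketch using only the Python standard library; the function name retry_after_seconds is hypothetical, and the current time is passed in explicitly so the result is deterministic.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: str, now: datetime) -> Optional[float]:
    """Return the delay a Retry-After value asks for, or None if unparseable."""
    value = value.strip()
    if value.isascii() and value.isdigit():
        return float(value)                    # delay-seconds form, e.g. "120"
    try:
        when = parsedate_to_datetime(value)    # HTTP-date form
    except (TypeError, ValueError):            # exception type varies by Python version
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    # A date in the past means "retry now", so never return a negative delay.
    return max(0.0, (when - now).total_seconds())


now = datetime(2015, 10, 21, 7, 27, 0, tzinfo=timezone.utc)
delay_from_seconds = retry_after_seconds("120", now)
delay_from_date = retry_after_seconds("Wed, 21 Oct 2015 07:28:00 GMT", now)
```

Returning None for a malformed value leaves the fallback policy (for example, exponential backoff) to the caller.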

Errors get an envelope. RFC 9457 Problem Details for HTTP APIs (July 2023, obsoletes RFC 7807) defines a single JSON shape — type, title, status, detail, instance — served as application/problem+json. Extension members carry domain-specific fields (validation errors, retry hints, trace IDs). One parser handles every API that follows it; ad-hoc {error: "..."} envelopes do not compose.
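
As an illustration, a Problem Details body can be assembled with a small helper. This is a sketch, not a library API: the function name problem_response, the example type URI, and the trace_id extension member are assumptions made for the example.

```python
import json
from typing import Any, Dict, Optional, Tuple

RESERVED = {"type", "title", "status", "detail", "instance"}


def problem_response(status: int, title: str, detail: str,
                     type_uri: str = "about:blank",
                     instance: Optional[str] = None,
                     **extensions: Any) -> Tuple[int, Dict[str, str], str]:
    """Build (status, headers, body) for an RFC 9457 problem response."""
    clash = RESERVED & extensions.keys()
    if clash:
        # Extension members must not shadow the standard ones.
        raise ValueError(f"reserved member names used as extensions: {sorted(clash)}")
    body: Dict[str, Any] = {
        "type": type_uri,
        "title": title,
        "status": status,
        "detail": detail,
    }
    if instance is not None:
        body["instance"] = instance
    body.update(extensions)  # domain-specific fields ride alongside
    headers = {"Content-Type": "application/problem+json"}
    return status, headers, json.dumps(body)


status, headers, body = problem_response(
    422,
    "Validation failed",
    "email is not a valid address",
    type_uri="https://example.com/problems/validation",  # hypothetical URI
    trace_id="abc123",                                    # hypothetical extension
)
```

A client that understands the five standard members can still display this error even if it has never seen the trace_id extension.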

Rate limits are advertised, not just enforced. Return X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset (or the standard RateLimit / RateLimit-Policy headers from the IETF draft) on every response so clients can self-regulate. Token-bucket is the most common server-side algorithm — it absorbs short bursts while capping the long-term rate.
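
The burst-then-cap behaviour of a token bucket is easiest to see in code. Below is a minimal single-process sketch; the class name TokenBucket and the rate and capacity values are illustrative assumptions, and the clock is injected so the refill can be demonstrated without waiting.

```python
import time
from typing import Callable


class TokenBucket:
    """Token bucket: holds up to `capacity` tokens, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_sec
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start full so an initial burst is allowed
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self._tokens = min(self.capacity,
                           self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False  # the caller would answer 429 with Retry-After

    @property
    def remaining(self) -> int:
        """Whole tokens left, suitable for an X-RateLimit-Remaining header."""
        return int(self._tokens)


t = [0.0]  # fake clock
bucket = TokenBucket(rate_per_sec=1.0, capacity=3, clock=lambda: t[0])
burst = [bucket.allow() for _ in range(4)]  # three pass, the fourth is rejected
```

The capacity sets the size of the tolerated burst and the refill rate sets the sustained limit, which is why the two are usually tuned separately.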

Versioning is forever. Many public APIs pin the major version in the URL (/v1/users, /v2/users) and roll minor/patch changes in place under the same path. Inside the org, SemVer (MAJOR.MINOR.PATCH) is the right mental model: bug fixes bump patch, additive changes bump minor, breaking changes force a new major and a deprecation window. Header-based versioning (Accept: application/vnd.acme.v2+json) keeps URLs clean but hurts discoverability; most teams find the URL approach simpler.
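
Both versioning styles reduce to the same question on the server: which major version did the caller ask for? The sketch below resolves it from either the path or the Accept header. The function name requested_major is hypothetical, and the vendor media type reuses the acme example from the text.

```python
import re
from typing import Optional

# /v1/users, /v2/users, ... (the major version is the first path segment)
_PATH_VERSION = re.compile(r"^/v(\d+)(?:/|$)")
# Accept: application/vnd.acme.v2+json
_ACCEPT_VERSION = re.compile(r"application/vnd\.acme\.v(\d+)\+json")


def requested_major(path: str, accept: str = "") -> Optional[int]:
    """Return the major version named by the URL path, else by Accept, else None."""
    match = _PATH_VERSION.match(path)
    if match:
        return int(match.group(1))
    match = _ACCEPT_VERSION.search(accept)
    if match:
        return int(match.group(1))
    return None  # the caller decides: default version, or reject the request


from_path = requested_major("/v2/users")
from_header = requested_major("/users", "application/vnd.acme.v2+json")
```

Giving the path precedence over the header is a design choice made for this example; an API that supports both styles should document which one wins.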

Streaming is a first-class response shape. When the result is incremental (LLM tokens, log tails, progress updates), Server-Sent Events (SSE) is the simplest pattern: the server keeps one HTTP response open and writes data: frames as they become available. In browsers, a plain GET stream can be consumed with EventSource. The Anthropic and OpenAI APIs both use SSE for token streaming but send the request as a POST, so clients read the response body with a streaming HTTP client rather than EventSource. OpenAPI 3.2 (Sept 2025) added native media-type support for SSE, JSON Lines, and JSON Sequences. For bidirectional streaming, WebSockets or gRPC streams are the alternatives.
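
To make the SSE wire format concrete, here is a small parser sketch. It handles only the event and data fields and comment lines; the id and retry fields are ignored for brevity. The function name parse_sse and the sample frames are illustrative.

```python
from typing import Dict, Iterable, Iterator, List


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per SSE event from an iterable of text lines."""
    event = "message"          # default event type
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the event; events with no data are dropped.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue           # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # exactly one leading space is stripped
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)  # multiple data lines join with newlines


stream = [
    "event: delta\n",
    'data: {"text": "Hel"}\n',
    "\n",
    ": keep-alive\n",
    "data: line one\n",
    "data: line two\n",
    "\n",
]
events = list(parse_sse(stream))
```

In practice the lines would come from a streaming HTTP response body rather than a list.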

Auth travels in headers, not in the URL. Bearer tokens in Authorization: Bearer <token> are the modern default; never put secrets in query strings (they leak into logs, referrers, and browser history). For machine-to-machine, OAuth 2.0 client credentials or short-lived signed tokens (JWT with rotating keys) are standard; for interactive flows, the authorization code flow with PKCE (the baseline that the OAuth 2.1 draft consolidates) is the current recommendation. Some cloud-vendor APIs require a signed-request scheme instead of a plain bearer token; AWS Signature V4 is the canonical example, and modern CLIs now sign natively (curl --aws-sigv4 … from curl 7.75+, OAuth2 bearer via curl --oauth2-bearer … from 7.33+) so scripts no longer need a separate signing helper. For browser-facing APIs that handle PII, post-quantum hybrid key exchange (X25519MLKEM768) is being rolled out at the TLS layer by Cloudflare, Google, AWS, and Anthropic during 2025–2026. The change is transparent to API code, but curl/openssl builds need to be modern (curl ≥ 8.10, OpenSSL ≥ 3.5) for clients to negotiate it.
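
One way a client library can enforce the "no secrets in query strings" rule is to refuse such URLs before a request is ever sent. The following is a sketch under stated assumptions: the function name build_request, the list of suspicious parameter names, and the example host are all hypothetical.

```python
from typing import Dict, Tuple
from urllib.parse import parse_qsl, urlsplit

# Assumed list of parameter names that usually carry credentials.
SECRET_PARAMS = {"token", "access_token", "api_key", "key", "secret"}


def build_request(url: str, token: str) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) with the token in the Authorization header.

    Raises ValueError if the URL itself appears to carry a credential.
    """
    query = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    leaked = sorted(name for name, _ in query if name.lower() in SECRET_PARAMS)
    if leaked:
        raise ValueError(f"secret-looking query parameters: {', '.join(leaked)}")
    return url, {"Authorization": f"Bearer {token}"}


url, headers = build_request("https://api.example.com/v1/users?limit=10", "abc")
```

The check is a heuristic on parameter names, so it complements rather than replaces log scrubbing on the server side.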

Supply-chain integrity is part of the API surface. Once an API ships SDKs, container images, or downloadable binaries, the integrity of those artifacts is part of the contract: a tampered SDK can exfiltrate the very bearer tokens the API issues. Sigstore-backed build attestations are now the platform default — gh attestation verify oci://ghcr.io/owner/repo:tag --owner owner enforces the SLSA v1 provenance predicate and confirms the artifact really came from the GitHub repo, commit, and workflow it claims. For high-trust APIs, publishing attestations alongside SDK releases (and verifying them in install scripts) closes the gap between "the API is authenticated" and "the code calling it has not been swapped out".

HTTP/3 in production

HTTP/3 over QUIC is now the default at the major edges: Cloudflare, Google, and Meta all serve HTTP/3 to clients that support it, typically after the server advertises h3 in an Alt-Svc response header. QUIC avoids the transport-level head-of-line blocking that HTTP/2 inherits from TCP: all streams share one UDP-based connection, but they are delivered independently, so a lost packet stalls only the streams whose data it carried. For an API in front of a CDN, this means fewer tail-latency spikes on mobile networks with packet loss and faster recovery from network changes (the QUIC connection ID survives an IP change, so a phone switching from Wi-Fi to LTE keeps its connection and TLS state instead of handshaking again).

The client side has caught up too. curl --http3 and curl --http3-only are supported in modern builds (ngtcp2 backend is non-experimental; quiche and OpenSSL-QUIC are still flagged experimental as of curl 8.20); for batch jobs where HTTP/3 isn't an option, wget2 offers HTTP/2 multiplexing, and its --compression=br,zstd option requests modern compression. For an API team, the practical implications are: (1) advertise HTTP/3 via Alt-Svc: h3=":443"; ma=86400; (2) test with curl --http3-only to catch regressions where the edge silently downgrades; and (3) confirm any proxy or WAF in front of the API actually speaks QUIC end-to-end rather than terminating it at the edge.
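
A monitoring check for the Alt-Svc advertisement can be as simple as parsing the header value. The sketch below is a simplified parser, not a full implementation of the Alt-Svc grammar: it assumes no commas or semicolons appear inside quoted strings, and the function names are hypothetical.

```python
from typing import Dict, List


def parse_alt_svc(value: str) -> List[Dict[str, str]]:
    """Parse an Alt-Svc header value into one dict per advertised alternative."""
    value = value.strip()
    if not value or value == "clear":
        return []  # "clear" tells clients to forget earlier advertisements
    entries = []
    for alternative in value.split(","):
        parts = [p.strip() for p in alternative.split(";")]
        protocol, _, authority = parts[0].partition("=")
        entry = {
            "protocol": protocol.strip(),
            "authority": authority.strip().strip('"'),
        }
        for param in parts[1:]:
            if not param:
                continue  # tolerate a trailing semicolon
            name, _, val = param.partition("=")
            entry[name.strip()] = val.strip().strip('"')
        entries.append(entry)
    return entries


def advertises_h3(value: str) -> bool:
    """True if any alternative in the header names the h3 protocol."""
    return any(e["protocol"] == "h3" for e in parse_alt_svc(value))


parsed = parse_alt_svc('h3=":443"; ma=86400, h2=":443"; ma=3600')
```

This only confirms what the edge advertises; whether a request actually completes over HTTP/3 still has to be checked with a client such as curl --http3-only.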

AI / LLM APIs as a new shape

LLM APIs (Anthropic, OpenAI, Google Gemini, GitHub Models) share a stable shape that has settled in 2025–2026: a small REST surface for inference, SSE for token streaming, JSON-schema-typed tool use (function calling) as the canonical agentic primitive, and an artifact-like resource for long-lived context (uploaded files, persistent memory, prompt caches). The contract is REST in form but the semantics are different — requests are expensive (seconds to minutes, not milliseconds), idempotency keys deduplicate billable retries, and a single response can contain interleaved text, tool-call requests, and citations.

Two shifts are worth calling out:

  • Agentic API surfaces. Tool use, MCP (Model Context Protocol), and CLI integrations like gh copilot and gh models run <model> "<prompt>" mean an API is increasingly consumed by an agent loop rather than a single client request. The agent reads the OpenAPI/JSON-schema description, decides which endpoint to call, executes, and feeds the response back into the next LLM turn. APIs designed for this loop document tool names and parameter schemas as carefully as the REST surface itself.
  • Inference-platform CLIs as a new caller class. GitHub Models (gh models list, gh models eval ./prompts/foo.prompt.yml) brings model evaluation into the same CLI used for issues and PRs, authenticated with the same GITHUB_TOKEN. The pattern — one CLI, many model providers, one authentication context — is becoming the standard developer surface and is what API teams should benchmark their own clients against.
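
The server side of an agent loop can be sketched as a registry of tools, each described by a JSON-schema-style parameter object, plus a dispatcher for incoming tool calls. Everything named here (the tool decorator, get_invoice, the shape of the call and result dicts) is hypothetical and not tied to any provider's wire format.

```python
import json
from typing import Any, Callable, Dict

TOOLS: Dict[str, Dict[str, Any]] = {}


def tool(name: str, description: str, parameters: Dict[str, Any]) -> Callable:
    """Register a function as a tool with a JSON-schema-style parameter spec."""
    def register(fn: Callable) -> Callable:
        TOOLS[name] = {"description": description,
                       "parameters": parameters,
                       "handler": fn}
        return fn
    return register


@tool(
    "get_invoice",
    "Fetch one invoice by ID.",
    {
        "type": "object",
        "properties": {"invoice_id": {"type": "string"}},
        "required": ["invoice_id"],
    },
)
def get_invoice(invoice_id: str) -> dict:
    # Stand-in for a real API call.
    return {"invoice_id": invoice_id, "status": "paid"}


def dispatch(call: Dict[str, Any]) -> Dict[str, Any]:
    """Run one tool call and return a result to feed into the next model turn."""
    spec = TOOLS.get(call["name"])
    if spec is None:
        return {"is_error": True, "content": f"unknown tool: {call['name']}"}
    args = call.get("arguments", {})
    # Only required-ness is checked here; a real server would validate types too.
    missing = [k for k in spec["parameters"].get("required", []) if k not in args]
    if missing:
        return {"is_error": True,
                "content": f"missing arguments: {', '.join(missing)}"}
    return {"is_error": False, "content": json.dumps(spec["handler"](**args))}


result = dispatch({"name": "get_invoice", "arguments": {"invoice_id": "inv_1"}})
```

Returning errors as data rather than raising lets the model see what went wrong and correct its next call.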

Common pitfalls

  1. Retrying non-idempotent POST on timeout — you do not know whether the server saw the request. Use an Idempotency-Key so the retry is safe, or convert the operation to PUT against a client-chosen resource ID.
  2. Returning 200 OK with an error body — breaks every generic HTTP client and CDN. Use the right status code and a Problem Details envelope.
  3. No versioning until v1.1 — once a single external client exists, breaking changes cost money and trust. Ship /v1 on day one even if v2 feels years away.
  4. Caching POST responses — browsers and CDNs will not cache them by default, which is one reason GraphQL-over-POST gives up the caching that REST GET requests get for free. For read-heavy GraphQL, persisted queries + GET is the workaround.
  5. Hiding rate limits from the client — clients that cannot see X-RateLimit-Remaining will hammer until they hit 429, then back off blindly. Publish the budget.
  6. No Retry-After on 429 or 503 — without it, every client invents its own backoff, usually badly. Set the header.
  7. OpenAPI as an afterthought — spec generated from code is fine; spec hand-written to match approximate behaviour is a lie. Either drive the spec from a contract-test suite or generate it from typed handlers (FastAPI, Litestar, NestJS all do this).
  8. gRPC for browser clients — browsers cannot speak gRPC directly. Use gRPC-Web, ConnectRPC, or an HTTP/JSON gateway in front.
  9. Leaking internal error detail — stack traces, SQL fragments, and internal hostnames in 5xx responses are an information-disclosure vulnerability. The detail field should be safe for end users; debugging info goes to logs keyed by a trace ID returned to the client.
  10. PATCH without a media type — JSON Merge Patch (RFC 7396) and JSON Patch (RFC 6902) are different formats with different semantics. Pick one and advertise it in Content-Type.
  11. Assuming HTTP/3 is end-to-end — the edge may advertise Alt-Svc: h3=… while the WAF, load balancer, or origin behind it speak only HTTP/2. Test with curl --http3-only and check %{http_version} via --write-out instead of trusting the headline rollout.
  12. Shipping SDKs without build attestations — a tampered SDK is an authenticated client by definition. Publish Sigstore-backed attestations alongside every release and document gh attestation verify (or the language-specific equivalent) in the install instructions.
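
To illustrate pitfall 10, the JSON Merge Patch algorithm from RFC 7396 fits in a few lines, which makes its semantics easy to compare with JSON Patch. The function name merge_patch is this example's own; the first sample document follows the example in the RFC.

```python
from typing import Any


def merge_patch(target: Any, patch: Any) -> Any:
    """Apply a JSON Merge Patch (RFC 7396) and return the patched value."""
    if not isinstance(patch, dict):
        return patch  # non-object patches (including arrays) replace wholesale
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null means "remove this member"
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


original = {"a": "b", "c": {"d": "e", "f": "g"}}
patched = merge_patch(original, {"a": "z", "c": {"f": None}})
replaced = merge_patch({"tags": ["a", "b"]}, {"tags": ["c"]})
```

Two consequences follow from the algorithm: a merge patch cannot set a member to null, and it cannot edit a single array element. JSON Patch (RFC 6902) can do both, because it is a list of explicit operations rather than a partial document, which is why the Content-Type has to say which format the body uses.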

Sources

References consulted while writing this concept page.

  • RFC 9110 — HTTP Semantics — Authoritative spec (June 2022) for HTTP methods, status codes, headers, and the formal definitions of safe and idempotent methods that anchor the "How it works" section.
  • RFC 9457 — Problem Details for HTTP APIs — The standard JSON error envelope (type, title, status, detail, instance); obsoletes RFC 7807 and is now the recommended error format.
  • curl — HTTP/3 in curl — Backend matrix (ngtcp2 non-experimental; quiche / OpenSSL-QUIC experimental) and --http3 / --http3-only semantics used in the HTTP/3 production subsection.
  • Cloudflare — Post-quantum TLS rollout — X25519MLKEM768 hybrid key exchange now default at the Cloudflare edge; underpins the post-quantum note in the auth subsection.
  • gh attestation verify — GitHub CLI manual — SLSA-v1-by-default Sigstore verification of release artifacts and OCI images cited in the supply-chain paragraph and pitfall #12.
  • SLSA v1 Provenance specification — Predicate format the gh attestation flow enforces; ground truth for the "what an attestation actually contains" framing.
  • OpenAPI release notes (Speakeasy) — Source for OpenAPI 3.1 / 3.2 facts (JSON Schema 2020-12 alignment; SSE / JSON Lines media-type support added Sept 2025).
  • Stripe — Rate limits — Industry-standard reference for 429, Retry-After, and token-bucket-with-burst behaviour cited in the rate-limit guidance and pitfalls list.