cheat sheet

requests

Make HTTP requests in Python with the requests library. Covers GET/POST, JSON, sessions, authentication, retries, and common pitfalls.

requests — HTTP for Humans

What it is

Requests is a Python HTTP client library created by Kenneth Reitz that wraps urllib3 with a human-friendly API: get(), post(), sessions, auth helpers, automatic JSON decoding, and streaming. It is consistently among the most-downloaded packages on PyPI and the usual starting point for synchronous HTTP work in Python. For projects that need async/await or HTTP/2, consider httpx, which offers a largely compatible API with async support.

Install

bash
pip install requests

Output: pip prints its download and install log and exits 0 on success.

Quick example

python
import requests

resp = requests.get("https://httpbin.org/json", timeout=10)
resp.raise_for_status()           # raises HTTPError for 4xx/5xx
data = resp.json()
print(resp.status_code)
print(data["slideshow"]["title"])

Output:

text
200
Sample Slide Show

When / why to use it

  • Any synchronous HTTP call — REST APIs, scraping, file downloads.
  • When you need a battle-tested, widely-supported client with broad documentation.
  • When you don't need async (httpx is the async-capable alternative).

Common pitfalls

No timeout by default: requests.get(url) will hang indefinitely if the server stalls. Always pass a timeout, either a single number or a (connect, read) tuple:

python
requests.get(url, timeout=(3.05, 27))

SSL verification: never set verify=False in production. It turns off certificate validation and leaves every request open to MITM attacks. If a corporate proxy breaks SSL, install the proxy's CA cert instead.

raise_for_status() placement — call it before trying to parse .json(). A 4xx/5xx response body may not be valid JSON.

Richer example — sessions and retries

python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retry = Retry(total=3, backoff_factor=0.5, status_forcelist=[500, 502, 503, 504])
session.mount("https://", HTTPAdapter(max_retries=retry))

resp = session.post(
    "https://httpbin.org/post",
    json={"user": "alice", "action": "login"},
    headers={"X-App-Version": "1.0"},
    timeout=(3.05, 10),
)
resp.raise_for_status()
body = resp.json()
print(body["json"])
print(body["headers"]["Content-Type"])

Output:

text
{'action': 'login', 'user': 'alice'}
application/json

Essential options reference

Parameter | Example | Notes
params | params={"page": 1} | Appended as query string
json | json={"k": "v"} | Encodes body as JSON, sets Content-Type header
data | data={"field": "val"} | Form-encoded body
headers | headers={"Authorization": "Bearer tok"} | Merged with session headers
timeout | timeout=(3, 10) | (connect timeout, read timeout) in seconds
auth | auth=("user", "pass") | HTTP Basic auth
stream | stream=True | Stream large responses without buffering
verify | verify="/path/to/ca.pem" | CA bundle for TLS verification
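
To see what the body-related options actually put on the wire, a request can be prepared without being sent. The following is a minimal sketch; the URL is a placeholder that is never contacted, and the parameter values are made up for illustration.

```python
import json
import requests

URL = "https://api.example.com/items"  # placeholder, never contacted

# params= is URL-encoded and appended as the query string
get_req = requests.Request("GET", URL, params={"page": 1, "q": "a b"}).prepare()
print(get_req.url)

# json= serialises the body and sets the Content-Type header
json_req = requests.Request("POST", URL, json={"k": "v"}).prepare()
print(json_req.headers["Content-Type"], json.loads(json_req.body))

# data= with a dict produces a form-encoded body
form_req = requests.Request("POST", URL, data={"field": "val"}).prepare()
print(form_req.headers["Content-Type"], form_req.body)
```

Preparing a request this way is also a convenient debugging step when a server rejects a payload and you want to inspect exactly what would be sent.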

Streaming large downloads

stream=True keeps the response body on the socket instead of pulling it all into memory. Iterate with iter_content(chunk_size) for binary downloads or iter_lines() for line-oriented streams (logs, NDJSON). Always wrap the response in a with block so the connection is released back to the pool when you're done — otherwise it leaks until garbage collection.

python
with requests.get("https://example.com/large.zip", stream=True, timeout=30) as r:
    r.raise_for_status()
    with open("large.zip", "wb") as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)
print("Download complete")

Output:

text
Download complete

Sessions — the right default

A Session is the right default for any program that makes more than one request. It reuses the underlying TCP/TLS connection (huge win for HTTPS — TLS handshakes are expensive), persists cookies and headers across calls, and exposes per-protocol adapters for retry policies, proxies, and TLS settings. Treat a one-shot requests.get(url) as a convenience for scripts; for libraries, services, and CLIs, build a Session at startup.

python
import requests
from requests.adapters import HTTPAdapter

session = requests.Session()

# Headers + auth that apply to every call on this session
session.headers.update({
    "User-Agent": "myapp/1.0",
    "Accept": "application/json",
})
session.auth = ("alicedev", "api-token")        # HTTP Basic on every call

# Tune connection pooling for high concurrency / many hosts
adapter = HTTPAdapter(pool_connections=20, pool_maxsize=50, pool_block=False)
session.mount("https://", adapter)
session.mount("http://", adapter)

# Per-call options override the session defaults
resp = session.get("https://api.example.com/users", params={"page": 1}, timeout=10)

Always close sessions you own: call session.close() or use with requests.Session() as session: .... An abandoned session keeps its pooled connections (and their file descriptors) open until it is garbage-collected.
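
How session defaults combine with per-call options can be checked offline with Session.prepare_request. This is a sketch under assumptions: the URL is a placeholder, and the api_version default parameter is hypothetical.

```python
import requests

with requests.Session() as session:          # closed automatically on exit
    session.headers.update({"User-Agent": "myapp/1.0", "Accept": "application/json"})
    session.params = {"api_version": "2"}    # hypothetical default query parameter

    req = requests.Request(
        "GET",
        "https://api.example.com/users",     # placeholder, never contacted
        headers={"Accept": "text/csv"},      # per-call value wins over the session default
        params={"page": 1},
    )
    prepared = session.prepare_request(req)

print(prepared.headers["User-Agent"])   # inherited from the session
print(prepared.headers["Accept"])       # overridden by the call
print(prepared.url)                     # session and per-call params are both present
```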

Retries with urllib3.util.Retry

urllib3 ships a battle-tested Retry class. Pass an instance to HTTPAdapter(max_retries=...) and mount the adapter on a session; every request through that session then inherits the retry behavior: total attempts, backoff schedule, status codes to retry on, methods to consider idempotent, and whether to honor Retry-After headers from the server.

python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

retry_strategy = Retry(
    total=5,                                          # max retries (any reason)
    connect=3, read=3,                                # per-error-class caps
    status=5,                                         # retry on bad statuses
    backoff_factor=0.5,                               # sleeps 0, 1.0, 2.0, 4.0, … (first retry is immediate)
    status_forcelist=(429, 500, 502, 503, 504),
    allowed_methods=("GET", "HEAD", "PUT", "DELETE", "OPTIONS", "POST"),  # POST is opt-in; see the note below
    respect_retry_after_header=True,                  # honor server Retry-After
    raise_on_status=False,                            # let raise_for_status() decide
)
adapter = HTTPAdapter(max_retries=retry_strategy)

session = requests.Session()
session.mount("https://", adapter)
session.mount("http://", adapter)

# Anything you call through `session` now retries automatically
resp = session.post("https://api.example.com/events", json={"event": "click"}, timeout=10)
resp.raise_for_status()

By default, Retry does not retry POST — it's treated as non-idempotent. Add "POST" to allowed_methods only if your endpoint is genuinely idempotent (e.g. an Idempotency-Key-aware API), otherwise you risk double-writes.
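
To estimate how long a retry policy can stall a call, the backoff schedule can be worked out by hand. The sketch below assumes urllib3's documented formula (backoff_factor * 2 ** (retry_number - 1), no sleep before the first retry, capped at backoff_max, which defaults to 120 seconds) and ignores jitter and Retry-After headers. The helper name backoff_schedule is made up for this example.

```python
def backoff_schedule(retries, backoff_factor, backoff_max=120.0):
    """Sleep before each retry, following urllib3's documented backoff formula."""
    delays = []
    for n in range(1, retries + 1):
        # urllib3 does not sleep before the first retry
        delay = 0.0 if n == 1 else backoff_factor * (2 ** (n - 1))
        delays.append(min(backoff_max, delay))
    return delays

schedule = backoff_schedule(retries=5, backoff_factor=0.5)
print(schedule)        # [0.0, 1.0, 2.0, 4.0, 8.0]
print(sum(schedule))   # 15.0 seconds of sleeping in the worst case
```

Add the per-attempt timeouts on top of this total to get a rough upper bound for one logical call.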

Authentication patterns

requests ships HTTP Basic and HTTP Digest auth out of the box (auth=). For bearer tokens, signed requests, or rotating credentials, subclass requests.auth.AuthBase — your __call__(request) method sets headers (or rewrites the URL) right before the request is sent. The same hook is how third-party libraries plug OAuth1, OAuth2, AWS SigV4, and HMAC into the same session API.

python
from requests.auth import HTTPBasicAuth, HTTPDigestAuth, AuthBase
import hmac, hashlib, time

# Built-ins
session.auth = HTTPBasicAuth("alicedev", "pw")
session.auth = HTTPDigestAuth("alicedev", "pw")

# Bearer token
class BearerAuth(AuthBase):
    def __init__(self, token: str):
        self.token = token
    def __call__(self, r):
        r.headers["Authorization"] = f"Bearer {self.token}"
        return r

# HMAC-signed requests
class HMACAuth(AuthBase):
    def __init__(self, key_id: str, secret: bytes):
        self.key_id, self.secret = key_id, secret
    def __call__(self, r):
        ts = str(int(time.time()))
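        body = r.body or b""
        if isinstance(body, str):          # form bodies are str; json= bodies are bytes
            body = body.encode()
        payload = (r.method + r.path_url + ts).encode() + body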
        sig = hmac.new(self.secret, payload, hashlib.sha256).hexdigest()
        r.headers["X-Key-Id"] = self.key_id
        r.headers["X-Timestamp"] = ts
        r.headers["X-Signature"] = sig
        return r

session.auth = HMACAuth("kid_123", b"shared-secret")
resp = session.post("https://api.example.com/events", json={"event": "click"})
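
For context, here is a sketch of how a server might check such a signature. The signing scheme in HMACAuth above is illustrative rather than a standard, so everything here (the payload layout of method + path + timestamp + body, the 300-second replay window, and the function names) is an assumption for this example.

```python
import hashlib
import hmac
import time

MAX_SKEW_SECONDS = 300  # hypothetical replay window

def sign(secret: bytes, method: str, path_url: str, ts: str, body: bytes) -> str:
    """Recreate the client's signature from the request parts."""
    payload = (method + path_url + ts).encode() + body
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()

def verify(secret, method, path_url, ts, body, signature, now=None) -> bool:
    now = time.time() if now is None else now
    if abs(now - int(ts)) > MAX_SKEW_SECONDS:
        return False                                   # stale or future-dated request
    expected = sign(secret, method, path_url, ts, body)
    return hmac.compare_digest(expected, signature)    # constant-time comparison

secret = b"shared-secret"
ts = "1700000000"
sig = sign(secret, "POST", "/events", ts, b'{"event": "click"}')
print(verify(secret, "POST", "/events", ts, b'{"event": "click"}', sig, now=1700000010))     # True
print(verify(secret, "POST", "/events", ts, b'{"event": "tampered"}', sig, now=1700000010))  # False
```

Including the timestamp in the signed payload is what lets the server reject replays of an old, otherwise valid request.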

File uploads — multipart and chunked

requests handles multipart uploads through the files= parameter: pass a dict (or a list of tuples for multiple values with the same field name). By default the whole multipart body is built in memory, so for huge files use requests-toolbelt's MultipartEncoder, which streams the file from disk as the request is sent.

python
# Single file
with open("photo.jpg", "rb") as f:
    resp = session.post(
        "https://api.example.com/uploads",
        files={"photo": ("photo.jpg", f, "image/jpeg")},
        data={"caption": "Sunset"},   # form fields alongside the file
    )

# Multiple files (same field name); the with block closes both handles
with open("a.jpg", "rb") as a, open("b.jpg", "rb") as b:
    files = [
        ("photos", ("a.jpg", a, "image/jpeg")),
        ("photos", ("b.jpg", b, "image/jpeg")),
    ]
    session.post("https://api.example.com/album", files=files)

# True streaming multipart for huge files
# pip install requests-toolbelt
from requests_toolbelt.multipart.encoder import MultipartEncoder, MultipartEncoderMonitor

def show_progress(monitor):
    pct = monitor.bytes_read / monitor.len * 100
    print(f"\r{pct:5.1f}%  ({monitor.bytes_read:,}/{monitor.len:,})", end="")

# Keep the file open for the duration of the upload, then close it
with open("big-video.mp4", "rb") as video:
    encoder = MultipartEncoder(fields={
        "name": "big-video.mp4",
        "file": ("big-video.mp4", video, "video/mp4"),
    })
    monitor = MultipartEncoderMonitor(encoder, show_progress)
    session.post(
        "https://api.example.com/uploads",
        data=monitor,
        headers={"Content-Type": monitor.content_type},
    )

Request and response hooks

hooks is a per-session or per-request callback dict. The response hook fires after every response and is the cleanest place to add logging, metrics, or last-mile error translation without changing the call sites. Hooks return either None (the response passes through unchanged) or a new response object.

python
import logging
import requests

log = logging.getLogger(__name__)

def log_response(resp, *args, **kwargs):
    log.info("%s %s -> %d %dms",
             resp.request.method, resp.url, resp.status_code,
             int(resp.elapsed.total_seconds() * 1000))

def assert_2xx(resp, *args, **kwargs):
    # Centralise raise_for_status() — every call gets it
    resp.raise_for_status()

session = requests.Session()
session.hooks["response"] = [log_response, assert_2xx]

resp = session.get("https://api.example.com/health", timeout=5)

Timeouts in depth

A timeout in requests is not a wall-clock budget for the whole call — it's a per-stage limit. timeout=5 means "5 seconds to connect and 5 seconds between socket reads". A response that drips a byte every 4 seconds will never time out. For a hard wall-clock budget, either supply both halves explicitly and check elapsed time afterward, or wrap the call in concurrent.futures with a deadline.

python
# Tuple form: (connect timeout, read timeout) in seconds
requests.get(url, timeout=(3.05, 27))

# Both halves the same
requests.get(url, timeout=10)

# No timeout (BAD — only in scripts you'll babysit)
requests.get(url, timeout=None)

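# Hard wall-clock budget
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FTimeout

def fetch():
    return requests.get("https://slow.example.com/", timeout=10)

# No `with` block here: leaving it would call shutdown(wait=True) and block
# until the slow request finishes, which defeats the deadline.
ex = ThreadPoolExecutor(max_workers=1)
try:
    resp = ex.submit(fetch).result(timeout=15)
except FTimeout:
    log.error("Total budget exceeded")
finally:
    ex.shutdown(wait=False)   # the worker thread still runs until its own timeout fires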

A common pattern is timeout=(3.05, 27) — the 3.05 is a little more than a typical TCP retransmission window (3 s), and 27 s leaves headroom under most load balancers' 30 s read budget.
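
For streamed responses, another way to approximate a wall-clock budget is to check a deadline between chunks. This is a minimal sketch: with_deadline is a hypothetical helper, not part of requests, and the slow source below simulates a dripping server so the example runs without a network.

```python
import time
from typing import Iterable, Iterator

def with_deadline(chunks: Iterable[bytes], budget_s: float) -> Iterator[bytes]:
    """Yield chunks until the total wall-clock budget is spent, then raise."""
    deadline = time.monotonic() + budget_s
    for chunk in chunks:
        if time.monotonic() > deadline:
            raise TimeoutError(f"exceeded total budget of {budget_s}s")
        yield chunk

# Usage with requests (not run here):
#   with requests.get(url, stream=True, timeout=(3.05, 10)) as r:
#       r.raise_for_status()
#       body = b"".join(with_deadline(r.iter_content(8192), budget_s=30))

def slow_source():
    """Simulated server that drips one byte every 50 ms."""
    for _ in range(5):
        time.sleep(0.05)
        yield b"x"

aborted = False
try:
    b"".join(with_deadline(slow_source(), budget_s=0.12))
except TimeoutError as exc:
    aborted = True
    print("aborted:", exc)
```

The deadline is only checked when a chunk arrives, so the real limit is the budget plus at most one read timeout. Keep a read timeout set as well.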

TLS, certificates, and proxies

requests uses certifi's CA bundle by default — that's why HTTPS calls "just work" without a system CA store. Override with verify= (path to a custom CA bundle) or cert= (client certificate for mTLS). For proxies, set HTTPS_PROXY / HTTP_PROXY in the environment or pass proxies= explicitly. Never disable verification (verify=False) on a production-bound code path.

python
# Custom CA bundle (e.g. corporate proxy issuing its own certs)
session.verify = "/etc/ssl/certs/internal-ca.pem"

# Mutual TLS (client cert + key)
session.cert = ("/etc/myapp/client.crt", "/etc/myapp/client.key")

# Enforce a minimum TLS version via a custom adapter (rejects TLS < 1.2)
import ssl
from requests.adapters import HTTPAdapter

class TLS12Adapter(HTTPAdapter):
    def init_poolmanager(self, *args, **kwargs):
        ctx = ssl.create_default_context()
        ctx.minimum_version = ssl.TLSVersion.TLSv1_2
        kwargs["ssl_context"] = ctx
        # Delegate to the parent so its positional pool arguments are mapped correctly
        return super().init_poolmanager(*args, **kwargs)

session.mount("https://", TLS12Adapter())

# Proxies (per-request or per-session)
session.proxies = {
    "http":  "http://proxy.internal:3128",
    "https": "http://proxy.internal:3128",
    # Bypass list:
    "no_proxy": "localhost,127.0.0.1,.internal.example.com",
}

# SOCKS5 proxy — pip install "requests[socks]"
session.proxies = {"https": "socks5h://localhost:1080"}

verify=False disables certificate validation and announces it only with an InsecureRequestWarning, which is easy to filter out and easy to forget. If a corporate proxy intercepts TLS, point REQUESTS_CA_BUNDLE (or verify=) at a CA bundle that includes the proxy's certificate rather than setting verify=False.

Exception hierarchy

requests raises subclasses of requests.exceptions.RequestException. Catch the specific subclass when you can — a connection refused (ConnectionError) is recoverable with a retry; a malformed URL (MissingSchema) is a code bug.

Exception | Triggered by
RequestException | Base class; catches everything else
ConnectionError | DNS failure, connection refused, network drop
ConnectTimeout | Could not establish a TCP connection in time
ReadTimeout | Server did not send data within the read timeout
Timeout | Either ConnectTimeout or ReadTimeout
HTTPError | Raised by raise_for_status() on 4xx/5xx
TooManyRedirects | Redirect chain exceeded max_redirects
SSLError | TLS handshake or verification failure
ProxyError | Could not reach proxy
URLRequired / MissingSchema / InvalidURL | Bad URL passed in
ChunkedEncodingError | Bad Transfer-Encoding: chunked from server

python
from requests.exceptions import (
    ConnectionError, ConnectTimeout, ReadTimeout, HTTPError, RequestException,
)

def fetch_json(session, url):
    """Return the decoded body, or None if the request failed."""
    try:
        resp = session.get(url, timeout=(3, 10))
        resp.raise_for_status()
        return resp.json()
    except ConnectTimeout:
        log.warning("Connect timeout: host unreachable or firewalled")
    except ReadTimeout:
        log.warning("Read timeout: slow server")
    except HTTPError as e:
        log.error("HTTP %d: %s", e.response.status_code, e.response.text[:200])
    except RequestException as e:
        log.exception("Unexpected request error: %s", e)
    return None
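
The order of except clauses matters because some of these classes have more than one parent. The sketch below inspects the hierarchy directly, with no network access; classify is a hypothetical helper that mirrors the most-specific-first ordering.

```python
from requests.exceptions import (
    ConnectionError, ConnectTimeout, ReadTimeout, Timeout, RequestException,
)

# ConnectTimeout inherits from both ConnectionError and Timeout
print(issubclass(ConnectTimeout, ConnectionError))  # True
print(issubclass(ConnectTimeout, Timeout))          # True
# ReadTimeout is a Timeout but not a ConnectionError
print(issubclass(ReadTimeout, Timeout))             # True
print(issubclass(ReadTimeout, ConnectionError))     # False

def classify(exc: RequestException) -> str:
    """Most specific checks first, mirroring the order of except clauses."""
    if isinstance(exc, ConnectTimeout):
        return "connect-timeout"
    if isinstance(exc, Timeout):
        return "timeout"
    if isinstance(exc, ConnectionError):
        return "connection-error"
    return "other"

print(classify(ConnectTimeout()))   # connect-timeout
print(classify(ReadTimeout()))      # timeout
print(classify(ConnectionError()))  # connection-error
```

An except ConnectionError clause placed before except ConnectTimeout would swallow connect timeouts, so list the narrower class first.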

requests vs httpx vs aiohttp vs curl

Aspect | requests | httpx | aiohttp | curl / httpie (CLI)
Sync API | Yes | Yes (largely requests-compatible API) | No (async only) | N/A
Async API | No | Yes (AsyncClient) | Yes (ClientSession) | N/A
HTTP/2 | No | Yes (extras) | No | Yes
HTTP/3 | No | No (yet) | No | curl (with extras)
Built-in retries | Via urllib3 Retry on an HTTPAdapter | Connection retries only (HTTPTransport(retries=N)); use tenacity for more | No | curl --retry
Connection pooling | Yes (urllib3) | Yes | Yes | N/A
Streaming | stream=True | client.stream(...) | async for chunk in resp.content.iter_chunked(n) | curl -o / --no-buffer
Type hints | Partial | Full | Full | N/A
Best for | Sync scripts, libraries | Modern code (sync and async, HTTP/2) | Pure async (FastAPI-adjacent) | Shell debugging, smoke tests

Cross-link: see httpx for the async-capable replacement and curl for the CLI equivalent.

python
# Same call, three libraries
import requests
requests.get("https://httpbin.org/get").json()

import httpx
httpx.get("https://httpbin.org/get").json()

import aiohttp, asyncio
async def go():
    async with aiohttp.ClientSession() as s:
        async with s.get("https://httpbin.org/get") as r:
            return await r.json()
asyncio.run(go())

Performance tips

The biggest wins for requests-heavy code are reusing a session (TCP/TLS connection reuse), setting a sane connection pool size, and parallelising independent calls with a thread pool. CPython's GIL is released during network I/O, so threading scales close to linearly for HTTP work.

python
import requests
from requests.adapters import HTTPAdapter

# Reusing the session skips the TCP and TLS handshakes on every request after the first to a host
session = requests.Session()
adapter = HTTPAdapter(pool_connections=20, pool_maxsize=100)
session.mount("https://", adapter)

# Parallel fan-out with a thread pool
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch(url: str) -> dict:
    r = session.get(url, timeout=10)
    r.raise_for_status()
    return r.json()

urls = [f"https://api.example.com/users/{i}" for i in range(100)]
with ThreadPoolExecutor(max_workers=20) as ex:
    futures = {ex.submit(fetch, u): u for u in urls}
    for fut in as_completed(futures):
        url = futures[fut]
        try:
            data = fut.result()
        except Exception as e:
            log.warning("Failed %s: %s", url, e)

For more than a few hundred concurrent calls, switch to httpx.AsyncClient or aiohttp. Each thread reserves its own stack (8 MB of virtual address space by default on Linux) and adds scheduling overhead, while an async task costs a few kilobytes.

Common pitfalls (extended)

Forgetting to call raise_for_status(): requests does not raise on 4xx/5xx by default. A failed login returns {"error": "..."} with a 401, and resp.json() happily parses it as a success unless you check the status first.

resp.text trusts the server's declared charset: if the server lies (sets Content-Type: text/html; charset=iso-8859-1 for a UTF-8 page), resp.text mangles non-ASCII characters. When you know the real encoding, set resp.encoding = "utf-8" before reading resp.text, or decode the bytes yourself with resp.content.decode("utf-8").

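Calling resp.json() without checking Content-Type: resp.json() ignores the Content-Type header and tries to parse whatever body arrived. An HTML error page from a misconfigured CDN raises requests.exceptions.JSONDecodeError far from the real cause, and a JSON error body parses without complaint. Check resp.headers.get("Content-Type") first if the API isn't fully under your control.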

Prefer resp.json() to json.loads(resp.text). It is json.loads underneath, but when the response declares no charset it detects the UTF-8/16/32 encoding from the raw bytes instead of relying on the fallback guess that resp.text uses.
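
The status and Content-Type checks above can be folded into one helper. This is a sketch, not a library feature: parse_json and UnexpectedContentType are hypothetical names, and FakeResponse is a stand-in so the example runs without a network. A real requests.Response offers the same three members used here.

```python
class UnexpectedContentType(ValueError):
    """Raised when a response that should be JSON is something else."""

def parse_json(resp):
    """Validate status and Content-Type before decoding the body."""
    resp.raise_for_status()                       # 4xx/5xx never reach the parser
    content_type = resp.headers.get("Content-Type", "")
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "application/json" and not media_type.endswith("+json"):
        raise UnexpectedContentType(f"expected JSON, got {content_type!r}")
    return resp.json()

class FakeResponse:
    """Stand-in exposing the members of requests.Response that parse_json uses."""
    def __init__(self, headers, payload):
        self.headers, self.payload = headers, payload
    def raise_for_status(self):
        pass
    def json(self):
        return self.payload

ok = FakeResponse({"Content-Type": "application/json; charset=utf-8"}, {"ready": True})
print(parse_json(ok))

html = FakeResponse({"Content-Type": "text/html"}, None)
try:
    parse_json(html)
except UnexpectedContentType as exc:
    print("rejected:", exc)
```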

Real-world recipes

Polling with exponential backoff

python
import time, random
from requests.exceptions import RequestException

def poll(url: str, *, max_attempts: int = 10) -> dict:
    delay = 1.0
    for attempt in range(1, max_attempts + 1):
        try:
            r = session.get(url, timeout=10)
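            if r.status_code == 200:
                body = r.json()            # parse once
                if body.get("ready"):
                    return body
            elif r.status_code in (429, 503):
                retry_after = r.headers.get("Retry-After", "")
                if retry_after.isdigit():  # skip the HTTP-date form of Retry-After
                    delay = float(retry_after)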
        except RequestException as e:
            log.warning("attempt %d failed: %s", attempt, e)
        # Exponential backoff with jitter
        time.sleep(delay + random.random())
        delay = min(delay * 2, 60)
    raise TimeoutError(f"{url} did not become ready in {max_attempts} attempts")
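
Retry-After can carry either a number of seconds or an HTTP date, so a poller that wants to honor both needs a small parser. A minimal sketch using only the standard library; retry_after_seconds is a hypothetical helper name.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def retry_after_seconds(value, now=None):
    """Return the delay a Retry-After header asks for, or None if unusable."""
    if not value:
        return None
    value = value.strip()
    if value.isdigit():                       # delta-seconds form, e.g. "120"
        return float(value)
    try:
        when = parsedate_to_datetime(value)   # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:                   # treat a naive date as UTC
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())

fixed_now = datetime(2015, 10, 21, 7, 27, 0, tzinfo=timezone.utc)
print(retry_after_seconds("120"))                                       # 120.0
print(retry_after_seconds("Wed, 21 Oct 2015 07:28:00 GMT", fixed_now))  # 60.0
print(retry_after_seconds("soon"))                                      # None
```

In a polling loop, it is worth capping the returned value so that a misbehaving server cannot park the client for hours.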

Paginating a JSON API

python
def all_users(session, url="https://api.example.com/users"):
    params = {"per_page": 100}           # only needed for the first request
    while url:
        r = session.get(url, params=params, timeout=10)
        r.raise_for_status()
        yield from r.json()["items"]
        # GitHub-style Link header pagination; the next URL already carries its query string
        url = r.links.get("next", {}).get("url")
        params = None
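
The r.links lookup relies on the response's Link header. To see what that parsing yields, requests.utils.parse_header_links can be run on a header string directly. The header value below is made up for illustration.

```python
from requests.utils import parse_header_links

# Hypothetical Link header as a paginated API might return it
link_header = (
    '<https://api.example.com/users?page=2>; rel="next", '
    '<https://api.example.com/users?page=9>; rel="last"'
)

# Index the parsed entries by their rel value
links = {link["rel"]: link["url"] for link in parse_header_links(link_header)}
print(links["next"])
print(links.get("prev"))   # None: there is no previous page on page 1
```

When the "next" entry is missing, the generator's while url: loop ends, which is how pagination terminates.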

Downloading with resume support

python
import os

def download_with_resume(url: str, dest: str) -> None:
    head = session.head(url, timeout=10)
    total = int(head.headers.get("Content-Length", 0))
    pos = os.path.getsize(dest) if os.path.exists(dest) else 0
    if pos >= total > 0:
        return  # already complete
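    headers = {"Range": f"bytes={pos}-"} if pos else {}
    with session.get(url, headers=headers, stream=True, timeout=30) as r:
        r.raise_for_status()
        # 206 means the server honoured Range; a plain 200 is the whole file, so start over
        mode = "ab" if r.status_code == 206 else "wb"
        with open(dest, mode) as f:
            for chunk in r.iter_content(chunk_size=64 * 1024):
                f.write(chunk)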

Testing code that uses requests

The standard recipe is responses or requests-mock — both monkey-patch the transport so your code under test never hits the network. They round-trip headers, query strings, and bodies for assertion-friendly tests.

python
# pip install responses
import responses

@responses.activate
def test_create_user():
    responses.add(
        responses.POST,
        "https://api.example.com/users",
        json={"id": 1, "email": "alice@example.com"},
        status=201,
        match=[responses.matchers.json_params_matcher({"email": "alice@example.com"})],
    )
    user = create_user(email="alice@example.com")    # the code under test
    assert user["id"] == 1
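
When adding a test dependency is not an option, the standard library's unittest.mock can stand in for the session, provided the code under test receives the session as an argument. A sketch under that assumption: this create_user is a hypothetical implementation, not the one referenced above.

```python
from unittest.mock import Mock

def create_user(session, email):
    """Hypothetical code under test: POST a user and return the decoded body."""
    resp = session.post("https://api.example.com/users", json={"email": email}, timeout=10)
    resp.raise_for_status()
    return resp.json()

def test_create_user_with_mock():
    fake_resp = Mock()
    fake_resp.json.return_value = {"id": 1, "email": "alice@example.com"}
    session = Mock()
    session.post.return_value = fake_resp

    user = create_user(session, "alice@example.com")

    assert user["id"] == 1
    fake_resp.raise_for_status.assert_called_once()
    session.post.assert_called_once_with(
        "https://api.example.com/users",
        json={"email": "alice@example.com"},
        timeout=10,
    )

test_create_user_with_mock()
```

The trade-off is that a Mock accepts any call, so it will not catch a misspelled keyword argument the way a transport-level fake such as responses does.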

Quick reference

Task | Code
GET JSON | requests.get(url, timeout=10).json()
POST JSON | requests.post(url, json={...}, timeout=10)
POST form | requests.post(url, data={...}, timeout=10)
Upload file | with open("a.png", "rb") as f: requests.post(url, files={"f": f})
Query params | requests.get(url, params={"q": "x"})
Custom headers | requests.get(url, headers={"Authorization": "Bearer t"})
Bearer auth | session.auth = BearerAuth(token)
Cookies | requests.get(url, cookies={"k": "v"})
Timeout | requests.get(url, timeout=(3, 10))
Redirects off | requests.get(url, allow_redirects=False)
Stream download | with requests.get(url, stream=True) as r: r.iter_content(...)
Session | s = requests.Session(); s.headers.update({...})
Retries | s.mount("https://", HTTPAdapter(max_retries=Retry(...)))
Hooks | s.hooks["response"] = [fn]
Raise on 4xx/5xx | resp.raise_for_status()
Status code | resp.status_code
Body bytes | resp.content
Body text | resp.text
Decoded JSON | resp.json()
Response headers | resp.headers["Content-Type"]
Elapsed time | resp.elapsed.total_seconds()
Iterate lines | for line in resp.iter_lines(decode_unicode=True): ...
mTLS cert | session.cert = ("client.crt", "client.key")
Custom CA bundle | session.verify = "/path/ca.pem"
Proxy | session.proxies = {"https": "http://..."}