Make HTTP requests in Python with the requests library. Covers GET/POST, JSON, sessions, authentication, retries, and common pitfalls.
requests — HTTP for Humans
What it is
Requests is a Python HTTP client library created by Kenneth Reitz that wraps urllib3 with a human-friendly API: get(), post(), sessions, auth helpers, automatic JSON decoding, and streaming. It is consistently among the most-downloaded packages on PyPI and the usual starting point for synchronous HTTP work in Python. For projects that need async/await or HTTP/2, use httpx instead, which offers a largely compatible API with async support.
Install
pip install requests
Output: pip's installation log (exits 0 on success)
Quick example
import requests
resp = requests.get("https://httpbin.org/json", timeout=10)
resp.raise_for_status() # raises HTTPError for 4xx/5xx
data = resp.json()
print(resp.status_code)
print(data["slideshow"]["title"])
Output:
200
Sample Slide Show
When / why to use it
- Any synchronous HTTP call — REST APIs, scraping, file downloads.
- When you need a battle-tested, widely-supported client with broad documentation.
- When you don't need async (httpx is the async-capable alternative).
Common pitfalls
- No timeout by default: requests.get(url) will hang indefinitely if the server stalls. Always pass timeout=(connect, read), for example requests.get(url, timeout=(3.05, 27)).
- SSL verification: never set verify=False in production. It leaves every request open to MITM attacks. If a corporate proxy breaks SSL, install the proxy's CA cert instead.
- raise_for_status() placement: call it before trying to parse .json(). A 4xx/5xx response body may not be valid JSON.
Richer example — sessions and retries
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
session = requests.Session()
retry = Retry(total=3, backoff_factor=0.5, status_forcelist=[500, 502, 503, 504])
session.mount("https://", HTTPAdapter(max_retries=retry))
resp = session.post(
    "https://httpbin.org/post",
    json={"user": "alice", "action": "login"},
    headers={"X-App-Version": "1.0"},
    timeout=(3.05, 10),
)
resp.raise_for_status()
body = resp.json()
print(body["json"])
print(body["headers"]["Content-Type"])
Output:
{'action': 'login', 'user': 'alice'}
application/json
Essential options reference
| Parameter | Example | Notes |
|---|---|---|
| params | params={"page": 1} | Appended as query string |
| json | json={"k": "v"} | Encodes body as JSON, sets Content-Type header |
| data | data={"field": "val"} | Form-encoded body |
| headers | headers={"Authorization": "Bearer tok"} | Merged with session headers |
| timeout | timeout=(3, 10) | (connect timeout, read timeout) in seconds |
| auth | auth=("user", "pass") | HTTP Basic auth |
| stream | stream=True | Stream large responses without buffering |
| verify | verify="/path/to/ca.pem" | CA bundle for TLS verification |
Streaming large downloads
stream=True keeps the response body on the socket instead of pulling it all into memory. Iterate with iter_content(chunk_size) for binary downloads or iter_lines() for line-oriented streams (logs, NDJSON). Always wrap the response in a with block so the connection is released back to the pool when you're done — otherwise it leaks until garbage collection.
with requests.get("https://example.com/large.zip", stream=True, timeout=30) as r:
    r.raise_for_status()
    with open("large.zip", "wb") as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)
print("Download complete")
Output:
Download complete
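Because iter_content() hands the body over chunk by chunk, the same loop can verify a download while writing it. The sketch below is one way to do that; write_chunks_with_sha256 is a hypothetical helper name, and it accepts any iterable of bytes so it can be tried without a network connection.

```python
import hashlib
import io
from typing import BinaryIO, Iterable


def write_chunks_with_sha256(chunks: Iterable[bytes], out: BinaryIO) -> str:
    """Write each chunk to `out` and return the SHA-256 hex digest of the data."""
    digest = hashlib.sha256()
    for chunk in chunks:
        if not chunk:  # ignore empty chunks
            continue
        out.write(chunk)
        digest.update(chunk)
    return digest.hexdigest()


# Offline demonstration with an in-memory "file" and hand-made chunks
buffer = io.BytesIO()
checksum = write_chunks_with_sha256([b"ab", b"", b"c"], buffer)
print(buffer.getvalue(), checksum == hashlib.sha256(b"abc").hexdigest())
```

With a real response, the call would presumably look like write_chunks_with_sha256(r.iter_content(chunk_size=8192), f) inside the with blocks shown above, followed by a comparison against a checksum published by the server.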
Sessions — the right default
A Session is the right default for any program that makes more than one request. It reuses the underlying TCP/TLS connection (huge win for HTTPS — TLS handshakes are expensive), persists cookies and headers across calls, and exposes per-protocol adapters for retry policies, proxies, and TLS settings. Treat a one-shot requests.get(url) as a convenience for scripts; for libraries, services, and CLIs, build a Session at startup.
import requests
from requests.adapters import HTTPAdapter
session = requests.Session()
# Headers + auth that apply to every call on this session
session.headers.update({
    "User-Agent": "myapp/1.0",
    "Accept": "application/json",
})
session.auth = ("alicedev", "api-token") # HTTP Basic on every call
# Tune connection pooling for high concurrency / many hosts
adapter = HTTPAdapter(pool_connections=20, pool_maxsize=50, pool_block=False)
session.mount("https://", adapter)
session.mount("http://", adapter)
# Per-call options override the session defaults
resp = session.get("https://api.example.com/users", params={"page": 1}, timeout=10)
Always close sessions you own: call session.close() or use with requests.Session() as session: .... An abandoned session keeps its pooled sockets (file descriptors) open until it is garbage-collected.
Retries with urllib3.util.Retry
urllib3 ships a battle-tested Retry class that the HTTPAdapter mounts onto a session. Configure it once and every request through that session inherits retry behavior — total attempts, backoff schedule, status codes to retry on, methods to consider idempotent, and whether to honor Retry-After headers from the server.
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
retry_strategy = Retry(
    total=5,  # max retries (any reason)
    connect=3, read=3,  # per-error-class caps
    status=5,  # retry on bad statuses
    backoff_factor=0.5,  # sleeps roughly 0, 1.0, 2.0, 4.0, ... seconds between retries
    status_forcelist=(429, 500, 502, 503, 504),
    # POST is listed on the assumption that this endpoint is idempotent
    allowed_methods=("GET", "HEAD", "PUT", "DELETE", "OPTIONS", "POST"),
    respect_retry_after_header=True,  # honor server Retry-After
    raise_on_status=False,  # let raise_for_status() decide
)
adapter = HTTPAdapter(max_retries=retry_strategy)
session = requests.Session()
session.mount("https://", adapter)
session.mount("http://", adapter)
# Anything you call through `session` now retries automatically
resp = session.post("https://api.example.com/events", json={"event": "click"}, timeout=10)
resp.raise_for_status()
By default, Retry does not retry POST because it is treated as non-idempotent. Add "POST" to allowed_methods only if your endpoint is genuinely idempotent (e.g. an Idempotency-Key-aware API); otherwise you risk double-writes.
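To get a feel for what backoff_factor does, the sleep schedule can be reproduced in a few lines. This is a sketch that assumes the formula given in urllib3's documentation (no delay before the first retry, then backoff_factor * 2 ** (n - 1), capped at a maximum that is assumed here to default to 120 seconds). It is not urllib3's own code, and backoff_schedule is a hypothetical name.

```python
def backoff_schedule(backoff_factor: float, retries: int, backoff_max: float = 120.0) -> list:
    """Return the sleep (in seconds) taken before each of `retries` retries."""
    delays = []
    for n in range(1, retries + 1):  # n = consecutive failures so far
        if n <= 1:
            delays.append(0.0)  # the first retry happens immediately
        else:
            delays.append(min(backoff_max, backoff_factor * 2 ** (n - 1)))
    return delays


print(backoff_schedule(0.5, 5))  # [0.0, 1.0, 2.0, 4.0, 8.0]
```

A Retry-After header honored by the server-side setting above would take precedence over these computed delays, so treat the list as the fallback schedule.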
Authentication patterns
requests ships HTTP Basic and HTTP Digest auth out of the box (auth=). For bearer tokens, signed requests, or rotating credentials, subclass requests.auth.AuthBase — your __call__(request) method sets headers (or rewrites the URL) right before the request is sent. The same hook is how third-party libraries plug OAuth1, OAuth2, AWS SigV4, and HMAC into the same session API.
from requests.auth import HTTPBasicAuth, HTTPDigestAuth, AuthBase
import hmac, hashlib, time
# Built-ins
session.auth = HTTPBasicAuth("alicedev", "pw")
session.auth = HTTPDigestAuth("alicedev", "pw")
# Bearer token
class BearerAuth(AuthBase):
    def __init__(self, token: str):
        self.token = token

    def __call__(self, r):
        r.headers["Authorization"] = f"Bearer {self.token}"
        return r

# HMAC-signed requests
class HMACAuth(AuthBase):
    def __init__(self, key_id: str, secret: bytes):
        self.key_id, self.secret = key_id, secret

    def __call__(self, r):
        ts = str(int(time.time()))
        # r.body is bytes for json= payloads, may be str for data=, or None
        body = r.body or b""
        if isinstance(body, str):
            body = body.encode()
        payload = (r.method + r.path_url + ts).encode() + body
        sig = hmac.new(self.secret, payload, hashlib.sha256).hexdigest()
        r.headers["X-Key-Id"] = self.key_id
        r.headers["X-Timestamp"] = ts
        r.headers["X-Signature"] = sig
        return r
session.auth = HMACAuth("kid_123", b"shared-secret")
resp = session.post("https://api.example.com/events", json={"event": "click"})
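The signing step in an HMAC scheme is easier to test when it lives in a pure function. The sketch below mirrors the method + path_url + timestamp + body layout used by HMACAuth above and adds a matching check with hmac.compare_digest. The function names sign and verify and the payload layout are illustrative assumptions, not a standard; a real API defines its own canonical string.

```python
import hashlib
import hmac


def sign(secret: bytes, method: str, path_url: str, ts: str, body=b"") -> str:
    """Return the hex HMAC-SHA256 of method + path_url + ts + body."""
    if body is None:
        body = b""
    if isinstance(body, str):
        body = body.encode("utf-8")
    payload = (method + path_url + ts).encode("utf-8") + body
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify(secret: bytes, method: str, path_url: str, ts: str, body, signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign(secret, method, path_url, ts, body)
    return hmac.compare_digest(expected, signature)


sig = sign(b"shared-secret", "POST", "/events", "1700000000", b'{"event": "click"}')
print(verify(b"shared-secret", "POST", "/events", "1700000000", b'{"event": "click"}', sig))  # True
print(verify(b"shared-secret", "POST", "/events", "1700000000", b'{"event": "tap"}', sig))    # False
```

Keeping the body as bytes throughout avoids the str/bytes mismatch that appears when a request is built with json=.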
File uploads — multipart and chunked
requests handles multipart uploads through the files= parameter: pass a dict (or a list of tuples for multiple values with the same field name). By default the whole multipart body is built in memory, so for huge files use requests-toolbelt's MultipartEncoder, which streams the file from disk as the request is sent.
# Single file
with open("photo.jpg", "rb") as f:
    resp = session.post(
        "https://api.example.com/uploads",
        files={"photo": ("photo.jpg", f, "image/jpeg")},
        data={"caption": "Sunset"},  # form fields alongside the file
    )
# Multiple files (same field name)
files = [
    ("photos", ("a.jpg", open("a.jpg", "rb"), "image/jpeg")),
    ("photos", ("b.jpg", open("b.jpg", "rb"), "image/jpeg")),
]
session.post("https://api.example.com/album", files=files)
# True streaming multipart for huge files
# pip install requests-toolbelt
from requests_toolbelt.multipart.encoder import MultipartEncoder, MultipartEncoderMonitor
def show_progress(monitor):
    pct = monitor.bytes_read / monitor.len * 100
    print(f"\r{pct:5.1f}% ({monitor.bytes_read:,}/{monitor.len:,})", end="")
encoder = MultipartEncoder(fields={
    "name": "big-video.mp4",
    "file": ("big-video.mp4", open("big-video.mp4", "rb"), "video/mp4"),
})
monitor = MultipartEncoderMonitor(encoder, show_progress)
session.post(
    "https://api.example.com/uploads",
    data=monitor,
    headers={"Content-Type": monitor.content_type},
)
Request and response hooks
hooks is a per-session or per-request callback dict. requests currently defines a single hook event, response, which fires after every response and is the cleanest place to add logging, metrics, or last-mile error translation without changing the call sites. Hooks return either None (the response passes through unchanged) or a new response object.
import logging
import requests
log = logging.getLogger(__name__)
def log_response(resp, *args, **kwargs):
    log.info("%s %s -> %d %dms",
             resp.request.method, resp.url, resp.status_code,
             int(resp.elapsed.total_seconds() * 1000))
def assert_2xx(resp, *args, **kwargs):
    # Centralise raise_for_status(): every call gets it
    resp.raise_for_status()
session = requests.Session()
session.hooks["response"] = [log_response, assert_2xx]
resp = session.get("https://api.example.com/health", timeout=5)
Timeouts in depth
A timeout in requests is not a wall-clock budget for the whole call — it's a per-stage limit. timeout=5 means "5 seconds to connect and 5 seconds between socket reads". A response that drips a byte every 4 seconds will never time out. For a hard wall-clock budget, either supply both halves explicitly and check elapsed time afterward, or wrap the call in concurrent.futures with a deadline.
# Tuple form: (connect timeout, read timeout) in seconds
requests.get(url, timeout=(3.05, 27))
# Both halves the same
requests.get(url, timeout=10)
# No timeout (BAD — only in scripts you'll babysit)
requests.get(url, timeout=None)
# Hard wall-clock budget
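from concurrent.futures import ThreadPoolExecutor, TimeoutError as FTimeout
def fetch():
    return requests.get("https://slow.example.com/", timeout=10)
# Not a `with` block: leaving one waits for the worker thread, which would defeat the deadline
ex = ThreadPoolExecutor(max_workers=1)
try:
    resp = ex.submit(fetch).result(timeout=15)
except FTimeout:
    log.error("Total budget exceeded")
finally:
    # Return immediately; the abandoned request keeps running until its own timeouts fire
    ex.shutdown(wait=False)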
A common pattern is timeout=(3.05, 27): the 3.05 is a little more than a typical TCP retransmission window (3 s), and 27 s leaves headroom under the 30 s read budget that many load balancers use.
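Since each timeout is a per-stage limit, a fixed tuple cannot cap a sequence of calls. One possible sketch, not part of requests, is a small Deadline helper (a hypothetical name) that shrinks the (connect, read) tuple as a shared budget runs out. It bounds the total across several calls but still cannot stop a single slowly dripping response, so it complements the thread-based budget rather than replacing it.

```python
import time


class Deadline:
    """Hypothetical helper: tracks a wall-clock budget shared by several calls."""

    def __init__(self, budget_s: float):
        self._end = time.monotonic() + budget_s

    def remaining(self) -> float:
        """Seconds left in the budget, never negative."""
        return max(0.0, self._end - time.monotonic())

    def timeout(self, connect: float = 3.05, read: float = 27.0):
        """Return a (connect, read) tuple clipped to the time left."""
        left = self.remaining()
        if left <= 0:
            raise TimeoutError("request budget exhausted")
        return (min(connect, left), min(read, left))


deadline = Deadline(budget_s=15)
connect_s, read_s = deadline.timeout()
print(connect_s, read_s <= 15)
# With requests this would be passed as: session.get(url, timeout=deadline.timeout())
```

Calling deadline.timeout() before every request in a loop means later calls get progressively shorter limits, and the loop stops with TimeoutError once the budget is spent.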
TLS, certificates, and proxies
requests uses certifi's CA bundle by default — that's why HTTPS calls "just work" without a system CA store. Override with verify= (path to a custom CA bundle) or cert= (client certificate for mTLS). For proxies, set HTTPS_PROXY / HTTP_PROXY in the environment or pass proxies= explicitly. Never disable verification (verify=False) on a production-bound code path.
# Custom CA bundle (e.g. corporate proxy issuing its own certs)
session.verify = "/etc/ssl/certs/internal-ca.pem"
# Mutual TLS (client cert + key)
session.cert = ("/etc/myapp/client.crt", "/etc/myapp/client.key")
# Pin to a specific protocol via a custom adapter (drops TLS < 1.2)
import ssl
from requests.adapters import HTTPAdapter
class TLS12Adapter(HTTPAdapter):
    def init_poolmanager(self, *a, **kw):
        ctx = ssl.create_default_context()
        ctx.minimum_version = ssl.TLSVersion.TLSv1_2
        kw["ssl_context"] = ctx
        # Delegate to HTTPAdapter so its pool arguments are mapped correctly
        return super().init_poolmanager(*a, **kw)
session.mount("https://", TLS12Adapter())
# Proxies (per-request or per-session)
session.proxies = {
    "http": "http://proxy.internal:3128",
    "https": "http://proxy.internal:3128",
    # Bypass list:
    "no_proxy": "localhost,127.0.0.1,.internal.example.com",
}
# SOCKS5 proxy — pip install "requests[socks]"
session.proxies = {"https": "socks5h://localhost:1080"}
verify=False disables certificate validation and emits only an InsecureRequestWarning, which is easy to filter out and easy to forget. If a corporate proxy intercepts TLS, point REQUESTS_CA_BUNDLE (or session.verify) at a CA bundle that contains the proxy's CA cert, such as your distro's CA store once the cert has been added to it, instead of setting verify=False.
Exception hierarchy
requests raises subclasses of requests.exceptions.RequestException. Catch the specific subclass when you can — a connection refused (ConnectionError) is recoverable with a retry; a malformed URL (MissingSchema) is a code bug.
| Exception | Triggered by |
|---|---|
| RequestException | Base class; catches all of the exceptions below |
| ConnectionError | DNS failure, connection refused, network drop |
| ConnectTimeout | Could not establish a TCP connection in time |
| ReadTimeout | Server sent no data within the read timeout |
| Timeout | Either ConnectTimeout or ReadTimeout |
| HTTPError | Raised by raise_for_status() on 4xx/5xx |
| TooManyRedirects | Redirect chain exceeded max_redirects |
| SSLError | TLS handshake or verification failure |
| ProxyError | Could not reach proxy |
| URLRequired / MissingSchema / InvalidURL | Bad URL passed in |
| ChunkedEncodingError | Bad Transfer-Encoding: chunked from server |
from requests.exceptions import (
    ConnectionError, ConnectTimeout, ReadTimeout, HTTPError, RequestException,
)
def fetch_json(session, url):
    """Return the decoded JSON body, or None if the request failed."""
    try:
        resp = session.get(url, timeout=(3, 10))
        resp.raise_for_status()
        return resp.json()
    except ConnectTimeout:
        log.warning("Connect timeout: DNS/firewall issue")
    except ReadTimeout:
        log.warning("Read timeout: slow server, will retry")
    except HTTPError as e:
        log.error("HTTP %d: %s", e.response.status_code, e.response.text[:200])
    except RequestException as e:
        log.exception("Unexpected request error: %s", e)
    return None
requests vs httpx vs aiohttp vs curl
| Aspect | requests | httpx | aiohttp | curl / httpie (CLI) |
|---|---|---|---|---|
| Sync API | Yes | Yes (largely requests-compatible) | No (async only) | N/A |
| Async API | No | Yes (AsyncClient) | Yes (ClientSession) | N/A |
| HTTP/2 | No | Yes (extras) | No | Yes |
| HTTP/3 | No | No (yet) | No | curl (with extras) |
| Built-in retries | No (via urllib3.Retry) | No (use tenacity) | No | curl --retry |
| Connection pooling | Yes (urllib3) | Yes | Yes | N/A |
| Streaming | stream=True | client.stream(...) | async for chunk in resp.content.iter_chunked(n) | curl -o / --no-buffer |
| Type hints | Partial | Full | Full | N/A |
| Best for | Sync scripts, libraries | Modern code (sync and async, HTTP/2) | Pure async (FastAPI-adjacent) | Shell debugging, smoke tests |
Cross-link: see httpx for the async-capable replacement and curl for the CLI equivalent.
# Same call, three libraries
import requests
requests.get("https://httpbin.org/get").json()
import httpx
httpx.get("https://httpbin.org/get").json()
import aiohttp, asyncio
async def go():
    async with aiohttp.ClientSession() as s:
        async with s.get("https://httpbin.org/get") as r:
            return await r.json()
asyncio.run(go())
Performance tips
The biggest wins for requests-heavy code are reusing a session (TCP/TLS connection reuse), setting a sane connection pool size, and parallelising independent calls with a thread pool. CPython's GIL is released during network I/O, so threading scales close to linearly for HTTP work.
# Reusing the session matters — same-host TLS handshakes drop from ~200ms to ~0
session = requests.Session()
adapter = HTTPAdapter(pool_connections=20, pool_maxsize=100)
session.mount("https://", adapter)
# Parallel fan-out with a thread pool
from concurrent.futures import ThreadPoolExecutor, as_completed
def fetch(url: str) -> dict:
    r = session.get(url, timeout=10)
    r.raise_for_status()
    return r.json()
urls = [f"https://api.example.com/users/{i}" for i in range(100)]
with ThreadPoolExecutor(max_workers=20) as ex:
    futures = {ex.submit(fetch, u): u for u in urls}
    for fut in as_completed(futures):
        url = futures[fut]
        try:
            data = fut.result()
        except Exception as e:
            log.warning("Failed %s: %s", url, e)
For more than a few hundred concurrent calls, switch to httpx.AsyncClient or aiohttp: each thread reserves its own stack (8 MB of virtual memory by default on many Linux systems), while async tasks cost kilobytes.
Common pitfalls (extended)
- Forgetting to call raise_for_status(): requests does not raise on 4xx/5xx by default. A failed login returns {"error": "..."} with a 401, and resp.json() happily parses it as if it were a success unless you check the status first.
- resp.text decodes with the server's declared charset: if the server lies (sets Content-Type: text/html; charset=iso-8859-1 for a UTF-8 page), resp.text mangles non-ASCII characters. Use resp.content.decode("utf-8") when you know the real encoding.
- resp.json() without checking Content-Type: resp.json() ignores the Content-Type header and tries to parse whatever body arrives, so an HTML error page from a misconfigured CDN surfaces as a confusing JSONDecodeError. Check resp.headers.get("Content-Type") first if the API isn't fully under your control.
- Decode JSON responses with resp.json() rather than by hand. It is essentially json.loads() on the body, but it also picks the right text encoding for you.
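Two of the pitfalls above are easy to reproduce offline. The sketch below shows a Content-Type guard to run before resp.json(), and the mojibake that results when UTF-8 bytes are decoded with a wrongly declared charset. is_json_content_type is a hypothetical helper name, and treating any media type ending in +json as JSON is an assumption that suits most APIs.

```python
def is_json_content_type(content_type) -> bool:
    """True for application/json and structured-syntax types such as application/problem+json."""
    if not content_type:
        return False
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "application/json" or media_type.endswith("+json")


print(is_json_content_type("application/json; charset=utf-8"))  # True
print(is_json_content_type("text/html"))                        # False

# What resp.text does when the server declares iso-8859-1 for a UTF-8 page
raw = "café".encode("utf-8")
mangled = raw.decode("iso-8859-1")
print(mangled == "cafÃ©")            # True: the accent became two characters
print(raw.decode("utf-8") == "café")  # True: decoding with the real encoding
```

In calling code the guard would typically wrap the parse, for example by checking is_json_content_type(resp.headers.get("Content-Type")) and falling back to logging resp.text[:200] when it returns False.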
Real-world recipes
Polling with exponential backoff
import time, random
from requests.exceptions import RequestException
def poll(url: str, *, max_attempts: int = 10) -> dict:
    delay = 1.0
    for attempt in range(1, max_attempts + 1):
        try:
            r = session.get(url, timeout=10)
            if r.status_code == 200:
                body = r.json()  # parse once and reuse
                if body.get("ready"):
                    return body
            if r.status_code in (429, 503):
                # Retry-After can also be an HTTP date; only the seconds form is handled here
                retry_after = r.headers.get("Retry-After", "")
                if retry_after.isdigit():
                    delay = float(retry_after)
        except RequestException as e:
            log.warning("attempt %d failed: %s", attempt, e)
        # Exponential backoff with jitter
        time.sleep(delay + random.random())
        delay = min(delay * 2, 60)
    raise TimeoutError(f"{url} did not become ready in {max_attempts} attempts")
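The polling loop reads Retry-After, which HTTP allows to be either a number of seconds or an HTTP date. A minimal sketch of a parser that accepts both forms, using only the standard library, might look like this; retry_after_seconds is a hypothetical name, and returning None for unparseable values is a design choice that lets the caller fall back to its own backoff.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Convert a Retry-After header value to seconds, or None if it cannot be parsed."""
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():  # delta-seconds form, e.g. "120"
        return float(value)
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


fixed_now = datetime(2015, 10, 21, 7, 27, 0, tzinfo=timezone.utc)
print(retry_after_seconds("120"))                                       # 120.0
print(retry_after_seconds("Wed, 21 Oct 2015 07:28:00 GMT", fixed_now))  # 60.0
print(retry_after_seconds("soon"))                                      # None
```

Dates in the past clamp to 0.0, so a stale header never produces a negative sleep.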
Paginating a JSON API
def all_users(session, url="https://api.example.com/users"):
    params = {"per_page": 100}
    while url:
        r = session.get(url, params=params, timeout=10)
        r.raise_for_status()
        yield from r.json()["items"]
        # GitHub-style Link header pagination
        url = r.links.get("next", {}).get("url")
        params = None  # the next URL already carries its own query string
Downloading with resume support
import os
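def download_with_resume(url: str, dest: str) -> None:
    # Session.head() does not follow redirects unless asked to
    head = session.head(url, allow_redirects=True, timeout=10)
    head.raise_for_status()
    total = int(head.headers.get("Content-Length", 0))
    pos = os.path.getsize(dest) if os.path.exists(dest) else 0
    if pos >= total > 0:
        return  # already complete
    headers = {"Range": f"bytes={pos}-"} if pos else {}
    with session.get(url, headers=headers, stream=True, timeout=30) as r:
        r.raise_for_status()
        # 206 means the server honored Range; a plain 200 resends the whole
        # file, so overwrite instead of appending
        mode = "ab" if r.status_code == 206 else "wb"
        with open(dest, mode) as f:
            for chunk in r.iter_content(chunk_size=64 * 1024):
                f.write(chunk)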
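A resumed download is only safe to append when the server actually started from the requested offset. As a sketch of that check, the helper below reads the status code and the Content-Range header and reports where the body begins. resume_offset is a hypothetical name, and the regular expression covers only the common bytes start-end/size form.

```python
import re

_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def resume_offset(status_code: int, content_range) -> int:
    """Return the byte offset at which the response body starts.

    A 200 response carries the whole file, so the offset is 0.
    A 206 response must carry a parseable Content-Range header.
    """
    if status_code != 206:
        return 0
    match = _CONTENT_RANGE.match((content_range or "").strip())
    if match is None:
        raise ValueError(f"unusable Content-Range: {content_range!r}")
    return int(match.group(1))


print(resume_offset(200, None))                 # 0
print(resume_offset(206, "bytes 100-199/200"))  # 100
```

A caller would compare the result with the local file size and restart the download from scratch if the two differ.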
Testing code that uses requests
The standard recipe is responses or requests-mock — both monkey-patch the transport so your code under test never hits the network. They round-trip headers, query strings, and bodies for assertion-friendly tests.
# pip install responses
import responses
@responses.activate
def test_create_user():
    responses.add(
        responses.POST,
        "https://api.example.com/users",
        json={"id": 1, "email": "alice@example.com"},
        status=201,
        match=[responses.matchers.json_params_matcher({"email": "alice@example.com"})],
    )
    user = create_user(email="alice@example.com")  # the code under test
    assert user["id"] == 1
Quick reference
| Task | Code |
|---|---|
| GET JSON | requests.get(url, timeout=10).json() |
| POST JSON | requests.post(url, json={...}, timeout=10) |
| POST form | requests.post(url, data={...}, timeout=10) |
| Upload file | requests.post(url, files={"f": open("a.png", "rb")}) |
| Query params | requests.get(url, params={"q": "x"}) |
| Custom headers | requests.get(url, headers={"Authorization": "Bearer t"}) |
| Bearer auth | session.auth = BearerAuth(token) |
| Cookies | requests.get(url, cookies={"k": "v"}) |
| Timeout | requests.get(url, timeout=(3, 10)) |
| Redirects off | requests.get(url, allow_redirects=False) |
| Stream download | with requests.get(url, stream=True) as r: r.iter_content(...) |
| Session | s = requests.Session(); s.headers.update({...}) |
| Retries | s.mount("https://", HTTPAdapter(max_retries=Retry(...))) |
| Hooks | s.hooks["response"] = [fn] |
| Raise on 4xx/5xx | resp.raise_for_status() |
| Status code | resp.status_code |
| Body bytes | resp.content |
| Body text | resp.text |
| Decoded JSON | resp.json() |
| Response headers | resp.headers["Content-Type"] |
| Elapsed time | resp.elapsed.total_seconds() |
| Iterate lines | for line in resp.iter_lines(decode_unicode=True): ... |
| mTLS cert | session.cert = ("client.crt", "client.key") |
| Custom CA bundle | session.verify = "/path/ca.pem" |
| Proxy | session.proxies = {"https": "http://..."} |