cheat sheet
openai
Package-level reference for openai on npm — Chat Completions, the Responses API, streaming, tool calls, structured outputs, embeddings, and the v4→v5 migration.
What it is
openai is the official JavaScript/TypeScript SDK for the OpenAI API — chat completions, the Responses API (the current preferred surface as of 2025+), streaming, tool/function calling, structured outputs (JSON Schema-enforced), embeddings, image generation (DALL-E / gpt-image-1), speech-to-text (Whisper), text-to-speech, vision input, fine-tuning, batch, files, and assistants.
It works on Node 18+, the browser (with dangerouslyAllowBrowser: true and proper key handling), Cloudflare Workers, Vercel Edge, Bun, and Deno. The v5 major (mid-2025) reorganised the API surface around the Responses primitive — a unified "give me a response" surface that subsumes Chat Completions, Tools, and Files into one consistent flow.
In 2026, the canonical pattern is client.responses.create({ model: "gpt-...", input: "..." }) — the Chat Completions endpoint (client.chat.completions.create) still works and is supported, but the Responses API is what new OpenAI features ship in first.
Install
# npm / pnpm / yarn / bun
npm install openai
pnpm add openai
yarn add openai
bun add openai
Output: runtime dep. ~600 KB unpacked (large — includes all model/endpoint type definitions and shapes).
# Optional — zod for runtime validation of structured outputs
npm install zod
Output: Zod schemas plug into the SDK's structured-output helpers.
# CLI (community, not official) — only the SDK is published officially
npx openai --help # NOT a real subcommand; refer to docs
Output: no official CLI; the package is a library only.
Versioning & Node support
- Current major line is 5.x (stable since mid-2025): introduces the Responses API as the headline, with the older Chat Completions still supported. Provides expanded streaming events, native browser support, and many type-shape changes.
- 4.x is the previous major (the rewrite from the axios-based ^3 to native fetch). Still supported via security backports; many existing codebases live here.
- Node ≥18 required. Uses built-in fetch; older Node requires a polyfill.
- Pure TS source compiled to dual ESM + CJS. Types bundled.
- Always a runtime dependency: your code calls the API at runtime.
- API versions are separate from SDK versions. OpenAI's API has its own date versioning (api-version: 2024-12-01). Pin the SDK against the API version expected by your features.
Package metadata
- Maintainer: OpenAI (@openai), official SDK
- Project home: github.com/openai/openai-node
- Docs: platform.openai.com/docs
- npm: npmjs.com/package/openai
- License: Apache-2.0
- First released: 2022 (v1 thin wrapper); rewrites in 2023 (v4) and 2025 (v5).
- Downloads: ~5 million per week, top AI-SDK by far on npm
Peer dependencies & extras
| Package | Purpose |
|---|---|
| zod | Structured-output schemas; the openai SDK has helpers for Zod. |
| @anthropic-ai/sdk | Sibling SDK; pattern is similar. |
| ai (Vercel AI SDK) | Higher-level abstraction over openai + others. Pick when you want a unified multi-provider interface. |
| langchain | Heavier orchestration layer; uses openai under the hood for OpenAI calls. |
| llamaindex | RAG-focused; also uses openai. |
| tiktoken | OpenAI's tokenizer for counting tokens client-side. |
| gpt-tokenizer | Pure-JS tokenizer alternative. |
Alternatives
| Library | Trade-off |
|---|---|
| Bare fetch | Zero-dep. Manual streaming, manual error handling. Pick for tiny scripts that call one endpoint. |
| Vercel ai SDK | Provider-agnostic (OpenAI, Anthropic, Google, etc.). Streaming primitives, React hooks. Pick for full-stack AI apps. |
| LangChain JS | Heavyweight RAG/agent framework. Pick for complex multi-step pipelines. |
| LlamaIndex | RAG-centric. Pick when retrieval is the focus. |
| Anthropic SDK | For Claude models. Different provider, same SDK pattern. |
| @google/genai | Gemini. |
| openai-fetch | Third-party tiny fetch wrapper around the OpenAI API. Pick to avoid the 600 KB bundle. |
Common gotchas
- Don't put your API key in client-side code. Even with dangerouslyAllowBrowser: true, exposing the key is a key-exfiltration vector. Use server-side proxying for browser apps.
- Streaming requires for await consumption. const stream = await client.responses.create({ stream: true, ... }) resolves to an async iterable of events, not to the finished response. Looping over it with for await is the standard consumption pattern.
- Token limits aren't enforced client-side. Sending too many tokens returns an API error. Count with tiktoken (or a heuristic) before sending if you need predictable behaviour.
- responses ≠ chat.completions. The Responses API is a different surface: different request shape, different event types in streaming. Code written for chat.completions doesn't drop in.
- Network errors get auto-retried by default. The SDK retries 2× on 5xx / connection errors. Set maxRetries: 0 to disable; turn it up for flaky integrations.
- tool_calls arrive in chunks during streaming. Reassembly is your responsibility when you read raw events. The SDK provides streaming helpers (for example client.chat.completions.stream() with finalChatCompletion()), but the raw event flow is fine-grained.
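The chunked tool-call gotcha above can be sketched as follows. This is a minimal illustration, assuming the Chat Completions streaming shape in which each fragment carries an index and optional id, name, and argument text; the ToolCallDelta and AssembledToolCall types and the assembleToolCalls helper are hypothetical local names, not SDK exports.

```typescript
// Local stand-in for a streamed tool-call fragment (assumed shape;
// the SDK ships richer types for the same data).
interface ToolCallDelta {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
}

interface AssembledToolCall {
  id: string;
  name: string;
  arguments: string; // JSON text, complete only once the stream has ended
}

// Merge fragments by `index`; argument text is concatenated in arrival order.
function assembleToolCalls(deltas: ToolCallDelta[]): AssembledToolCall[] {
  const calls: AssembledToolCall[] = [];
  for (const d of deltas) {
    const call = (calls[d.index] ??= { id: "", name: "", arguments: "" });
    if (d.id) call.id = d.id;
    if (d.function?.name) call.name += d.function.name;
    if (d.function?.arguments) call.arguments += d.function.arguments;
  }
  return calls;
}

// Hypothetical fragments, as they might arrive across three chunks.
const fragments: ToolCallDelta[] = [
  { index: 0, id: "call_1", function: { name: "get_weather", arguments: "" } },
  { index: 0, function: { arguments: '{"city":' } },
  { index: 0, function: { arguments: '"Tokyo"}' } },
];
const assembled = assembleToolCalls(fragments);
console.log(assembled[0]?.name, JSON.parse(assembled[0]?.arguments ?? "{}"));
```

Parse the arguments only after the stream finishes; a partial JSON string will not parse.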
Real-world recipes
Responses API — the canonical recipe (v5+)
import OpenAI from "openai";
const client = new OpenAI(); // reads OPENAI_API_KEY from env
const response = await client.responses.create({
model: "gpt-4.1",
input: "Write a haiku about TypeScript.",
});
console.log(response.output_text);
Output:
Types guide each step taken,
Compiler whispers gently,
Bugs found before run.
The Responses API is the simplest entry point in v5. output_text is a convenience field; the full structured output lives in response.output.
Chat Completions — still supported
import OpenAI from "openai";
const client = new OpenAI();
const completion = await client.chat.completions.create({
model: "gpt-4.1",
messages: [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "What's 2 + 2?" },
],
});
console.log(completion.choices[0].message.content);
Output:
2 + 2 = 4.
Chat Completions remains fully supported. New apps prefer responses; existing apps don't need to migrate.
Streaming response
import OpenAI from "openai";
const client = new OpenAI();
const stream = await client.responses.create({
model: "gpt-4.1",
input: "Count to 5 slowly.",
stream: true,
});
for await (const event of stream) {
if (event.type === "response.output_text.delta") {
process.stdout.write(event.delta);
}
}
Output:
One.
Two.
Three.
Four.
Five.
Streaming with the Responses API uses a typed event stream: response.output_text.delta for text chunks, response.function_call_arguments.delta while a function call's arguments stream in, and response.completed at the end.
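If you want the full text rather than live output, the same event loop can accumulate deltas. The sketch below assumes only the two event types it names; StreamEvent, collectText, and mockStream are hypothetical local names, and the mock stands in for a real stream so the logic can be exercised without a network call.

```typescript
// Local stand-in for the two event types used here; the SDK's union is wider.
type StreamEvent =
  | { type: "response.output_text.delta"; delta: string }
  | { type: "response.completed" };

// Concatenate text deltas until the stream ends or signals completion.
async function collectText(events: AsyncIterable<StreamEvent>): Promise<string> {
  let text = "";
  for await (const event of events) {
    if (event.type === "response.output_text.delta") {
      text += event.delta;
    } else if (event.type === "response.completed") {
      break;
    }
  }
  return text;
}

// Hypothetical mock stream for local testing.
async function* mockStream(): AsyncGenerator<StreamEvent> {
  yield { type: "response.output_text.delta", delta: "One. " };
  yield { type: "response.output_text.delta", delta: "Two." };
  yield { type: "response.completed" };
}

collectText(mockStream()).then((text) => console.log(text));
```

Because collectText accepts any AsyncIterable, the same function should work with a real stream whose events include these two shapes, though that pairing is an assumption to verify against the SDK types.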
For Chat Completions streaming (legacy):
const stream = await client.chat.completions.create({
model: "gpt-4.1",
messages: [{ role: "user", content: "Count to 5." }],
stream: true,
});
for await (const chunk of stream) {
// Optional chaining guards against chunks that carry no choices or no content
process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
Tool / function calling
import OpenAI from "openai";
const client = new OpenAI();
const response = await client.responses.create({
model: "gpt-4.1",
input: "What's the weather in Tokyo?",
tools: [
{
type: "function",
name: "get_weather",
description: "Get the current weather for a city.",
parameters: {
type: "object",
properties: { city: { type: "string" } },
required: ["city"],
},
},
],
});
// Check if the model called a tool
for (const output of response.output) {
if (output.type === "function_call") {
const args = JSON.parse(output.arguments);
const weather = await fetchWeather(args.city);
// Send the result back
const followUp = await client.responses.create({
model: "gpt-4.1",
input: [
{ type: "function_call_output", call_id: output.call_id, output: JSON.stringify(weather) },
],
previous_response_id: response.id,
});
console.log(followUp.output_text);
}
}
Output:
The current weather in Tokyo is 22°C and clear.
Two API round-trips: (1) model calls the tool, (2) you send the result back via previous_response_id, model produces the final answer.
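When the model can call several tools, the first half of that round-trip is easier to manage with a small dispatcher. The following is a sketch under stated assumptions: the item shapes mirror the fields used in the recipe above, while handlers, runToolCalls, and the stubbed weather result are hypothetical. It only runs functions from an allowlist and validates arguments before use.

```typescript
// Local shapes mirroring the fields used in the recipe above.
interface FunctionCallItem {
  type: "function_call";
  name: string;
  arguments: string; // JSON text produced by the model
  call_id: string;
}
interface FunctionCallOutput {
  type: "function_call_output";
  call_id: string;
  output: string;
}
type Handler = (args: Record<string, unknown>) => unknown;

// Hypothetical registry: only functions listed here can be invoked.
const handlers: Record<string, Handler> = {
  get_weather: (args) => {
    const city = args["city"];
    if (typeof city !== "string") throw new Error("city must be a string");
    return { city, tempC: 22 }; // stubbed result for illustration
  },
};

function runToolCalls(items: FunctionCallItem[]): FunctionCallOutput[] {
  return items.map((item): FunctionCallOutput => {
    let output: unknown;
    try {
      const handler = handlers[item.name];
      if (!handler) throw new Error(`unknown tool: ${item.name}`);
      output = handler(JSON.parse(item.arguments));
    } catch (err) {
      // Report failures back to the model instead of crashing the loop.
      output = { error: err instanceof Error ? err.message : String(err) };
    }
    return { type: "function_call_output", call_id: item.call_id, output: JSON.stringify(output) };
  });
}

const outputs = runToolCalls([
  { type: "function_call", name: "get_weather", arguments: '{"city":"Tokyo"}', call_id: "c1" },
  { type: "function_call", name: "delete_files", arguments: "{}", call_id: "c2" },
]);
console.log(outputs);
```

The returned array has the shape passed as input in the follow-up request. Real handlers are usually async; this sketch keeps them synchronous for brevity.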
Structured outputs (JSON Schema)
import OpenAI from "openai";
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";
const client = new OpenAI();
const Recipe = z.object({
title: z.string(),
ingredients: z.array(z.object({ name: z.string(), amount: z.string() })),
steps: z.array(z.string()),
});
const completion = await client.chat.completions.parse({
model: "gpt-4.1",
messages: [
{ role: "user", content: "Give me a recipe for pancakes." },
],
response_format: zodResponseFormat(Recipe, "recipe"),
});
const recipe = completion.choices[0].message.parsed;
console.log(recipe?.title);
console.log(recipe?.ingredients);
Output:
{
"title": "Classic Pancakes",
"ingredients": [
{ "name": "flour", "amount": "1.5 cups" },
{ "name": "milk", "amount": "1.25 cups" },
{ "name": "eggs", "amount": "1" }
],
"steps": ["Whisk dry ingredients.", "Add wet ingredients.", "Cook on a griddle."]
}
parsed is typed against the Zod schema — fully type-safe structured output. The model is constrained at decode time to produce JSON matching the schema.
Embeddings
import OpenAI from "openai";
const client = new OpenAI();
const response = await client.embeddings.create({
model: "text-embedding-3-small",
input: ["The quick brown fox", "jumps over the lazy dog"],
});
console.log(response.data.length); // 2
console.log(response.data[0].embedding.length); // 1536 (dimensions)
console.log(response.data[0].embedding.slice(0, 5)); // first 5 floats
Output:
2
1536
[0.013, -0.041, 0.022, 0.011, -0.008]
Embed in batches of up to ~2048 inputs per request: far fewer round-trips than one-at-a-time (billing is per token either way). Use text-embedding-3-small (1536-d, cheap) or text-embedding-3-large (3072-d, better quality).
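Embeddings are typically compared with cosine similarity. A minimal sketch, with cosineSimilarity as a hypothetical helper name and toy 2-d and 3-d vectors standing in for real 1536-d embeddings:

```typescript
// Cosine similarity between two equal-length vectors.
// Returns NaN if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0])); // identical direction
console.log(cosineSimilarity([1, 0], [0, 1])); // orthogonal
```

In practice you would pass response.data[i].embedding for two inputs and rank by the score.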
Vision input
import OpenAI from "openai";
const client = new OpenAI();
const response = await client.responses.create({
model: "gpt-4.1",
input: [
{
role: "user",
content: [
{ type: "input_text", text: "What's in this image?" },
{ type: "input_image", image_url: "https://example.com/photo.jpg" },
],
},
],
});
console.log(response.output_text);
Output:
The image shows a golden retriever sitting on a grassy field with mountains in the background.
Pass URLs (HTTPS only) or base64-encoded data: data:image/jpeg;base64,/9j/.... For local files, read and base64-encode.
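For the base64 route, a small helper can build the data URL from raw bytes. This is a sketch, assuming a runtime with the global btoa (modern Node, browsers, edge runtimes); toDataUrl is a hypothetical name. In Node, the bytes would typically come from fs.readFile, whose Buffer result is a Uint8Array.

```typescript
// Encode raw image bytes as a data URL suitable for an image_url field.
function toDataUrl(bytes: Uint8Array, mimeType: string): string {
  let binary = "";
  // Build a binary string byte by byte; avoids spreading large arrays.
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i] ?? 0);
  }
  return `data:${mimeType};base64,${btoa(binary)}`;
}

// JPEG files start with the bytes FF D8 FF, which encode to "/9j/".
console.log(toDataUrl(new Uint8Array([0xff, 0xd8, 0xff]), "image/jpeg"));
// prints "data:image/jpeg;base64,/9j/"
```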
Production deployment
API key handling
// Server-side — env var
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
Never ship the API key in client-side bundles. For browser apps, proxy through your server:
// Client
const res = await fetch("/api/llm", { method: "POST", body: JSON.stringify({ prompt }) });
// Server (Next.js route, Express, etc.)
import OpenAI from "openai";
const client = new OpenAI();
export async function POST(req: Request) {
const { prompt } = await req.json();
const response = await client.responses.create({ model: "gpt-4.1", input: prompt });
return Response.json({ text: response.output_text });
}
Timeouts and retries
const client = new OpenAI({
timeout: 60 * 1000, // 60s per request
maxRetries: 3, // retry 3× on 5xx / connection errors
});
// Per-request override
const response = await client.responses.create(
{ model: "gpt-4.1", input: "..." },
{ timeout: 120 * 1000, maxRetries: 0 }
);
For long-running streams, increase per-request timeout. The SDK uses exponential backoff between retries.
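To make the backoff behaviour concrete, here is an illustrative delay calculation. The base delay, cap, and jitter range are assumptions chosen for the example and are not taken from the SDK source; backoffDelayMs is a hypothetical helper.

```typescript
// Delay before retry number `attempt` (0-based), doubling each time and
// capped at `maxMs`. `jitter` in [0, 1] scales the result into 75%..100%
// of the exponential value; it defaults to a random draw.
function backoffDelayMs(
  attempt: number,
  baseMs: number = 500,
  maxMs: number = 8000,
  jitter: number = Math.random(),
): number {
  const exponential = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.round(exponential * (0.75 + 0.25 * jitter));
}

// With jitter pinned to 1 the schedule is 500, 1000, 2000, 4000, 8000, 8000.
for (let attempt = 0; attempt < 6; attempt++) {
  console.log(attempt, backoffDelayMs(attempt, 500, 8000, 1));
}
```

Jitter spreads retries from many clients so they do not all hit the API at the same instant.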
Edge runtime
The SDK works on Cloudflare Workers, Vercel Edge, Deno, Bun. Pass fetch explicitly if your runtime needs custom request handling:
const client = new OpenAI({
apiKey: env.OPENAI_API_KEY,
fetch: globalThis.fetch,
});
Rate limit handling
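The SDK retries on rate-limit (429) responses automatically. For batch jobs, use the Batch API (client.batches.create): 50% cheaper, with results returned within a 24h completion window.
import fs from "node:fs";
const file = await client.files.create({
  file: fs.createReadStream("requests.jsonl"), // placeholder path to your JSONL file
  purpose: "batch",
});
const batch = await client.batches.create({
  input_file_id: file.id,
  endpoint: "/v1/chat/completions",
  completion_window: "24h",
});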
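The batch input file is JSONL with one request per line. A sketch of building it follows; toBatchJsonl and the req-N custom_id scheme are hypothetical, and the body shown is a plain Chat Completions request.

```typescript
interface BatchLine {
  custom_id: string; // your own key for matching results to requests
  method: "POST";
  url: string;
  body: Record<string, unknown>;
}

// Turn a list of prompts into JSONL text, one request object per line.
function toBatchJsonl(prompts: string[], model: string): string {
  return prompts
    .map((prompt, i): BatchLine => ({
      custom_id: `req-${i}`,
      method: "POST",
      url: "/v1/chat/completions",
      body: { model, messages: [{ role: "user", content: prompt }] },
    }))
    .map((line) => JSON.stringify(line))
    .join("\n");
}

const jsonl = toBatchJsonl(["Summarise document A.", "Summarise document B."], "gpt-4.1-mini");
console.log(jsonl);
```

The url on each line should match the endpoint passed to client.batches.create.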
Performance tuning
Pick the right model
| Model | Latency | Cost (relative) | Use |
|---|---|---|---|
| gpt-4.1 / gpt-5 | High | Highest | Complex reasoning, tool use, vision |
| gpt-4.1-mini / gpt-5-mini | Medium | Mid | Most app workflows; great default |
| gpt-4.1-nano / gpt-5-nano | Low | Lowest | Simple classification, light tasks |
| text-embedding-3-small | Very low | Cheap | Embeddings (always use small unless you've measured) |
(Treat the specific model names as placeholders: lineups evolve quarterly, so check the OpenAI docs for current canonical names.)
Streaming reduces perceived latency
Streaming doesn't reduce total latency, but the first-token latency is far lower than waiting for the full response. Always stream for user-facing chat.
Batch embeddings
// Slow — 1 request per input
for (const text of texts) {
await client.embeddings.create({ model: "...", input: text });
}
// Fast — 1 request for all
await client.embeddings.create({ model: "...", input: texts });
Batch up to ~2048 inputs per call.
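When you have more inputs than fit in one call, split them first. A minimal sketch, with chunk as a hypothetical helper and 2048 taken from the approximate limit quoted above:

```typescript
// Split an array into consecutive slices of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("size must be >= 1");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

const texts = Array.from({ length: 5000 }, (_, i) => `doc ${i}`);
const batches = chunk(texts, 2048);
console.log(batches.map((b) => b.length)); // [ 2048, 2048, 904 ]
```

Each slice then becomes the input array of one embeddings request.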
Connection reuse
The SDK uses Node's keep-alive by default — no special config needed.
Token counting client-side
import { encoding_for_model } from "tiktoken";
const enc = encoding_for_model("gpt-4");
const tokens = enc.encode("hello world").length;
enc.free();
Useful for staying under context windows; saves a round-trip vs API trial-and-error.
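A token counter is most useful when paired with a trimming step. The sketch below keeps system messages and as many of the newest other messages as fit a budget. The 4-characters-per-token estimate is a rough assumption for English text; pass a tiktoken-based counter as the third argument for real counts. ChatMessage, estimateTokens, and trimHistory are hypothetical names.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Rough heuristic (assumption): about 4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Keep all system messages, then add the newest other messages while they fit.
function trimHistory(
  messages: ChatMessage[],
  budget: number,
  count: (text: string) => number = estimateTokens,
): ChatMessage[] {
  const system = messages.filter((m) => m.role === "system");
  let used = system.reduce((sum, m) => sum + count(m.content), 0);
  const kept: ChatMessage[] = [];
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    if (!m || m.role === "system") continue;
    const cost = count(m.content);
    if (used + cost > budget) break; // older messages are dropped
    used += cost;
    kept.unshift(m);
  }
  return [...system, ...kept];
}

const history: ChatMessage[] = [
  { role: "system", content: "Be brief." },
  { role: "user", content: "a".repeat(40) },
  { role: "assistant", content: "b".repeat(40) },
  { role: "user", content: "c".repeat(20) },
];
const trimmed = trimHistory(history, 20);
console.log(trimmed.map((m) => m.role)); // [ 'system', 'assistant', 'user' ]
```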
Version migration guide
v3 → v4 (2023) — the axios → fetch rewrite
- Drop the openai v3 axios shape.
- New TS-first API with full typed responses.
- ESM + CJS dual published.
// v3
const { Configuration, OpenAIApi } = require("openai");
const cfg = new Configuration({ apiKey: "..." });
const openai = new OpenAIApi(cfg);
const res = await openai.createChatCompletion({ model: "...", messages: [...] });
// v4
import OpenAI from "openai";
const client = new OpenAI({ apiKey: "..." });
const completion = await client.chat.completions.create({ model: "...", messages: [...] });
v4 → v5 (2025) — Responses API
The v5 migration is the important one in 2026. New canonical surface:
| Concept | v4 | v5 |
|---|---|---|
| Simple response | client.chat.completions.create({ messages: [...] }) | client.responses.create({ input: "..." }) (input also accepts an array of role/content messages) |
| Streaming | chunk.choices[0].delta.content | typed events: response.output_text.delta |
| Tools | messages: [{ role: "tool", tool_call_id, content }] | input: [{ type: "function_call_output", call_id, output }] |
| Structured output | response_format: { type: "json_schema", ... } | text: { format: { type: "json_schema", ... } } in Responses; response_format is unchanged in Chat Completions |
| Multi-turn | rebuild full history each time | previous_response_id chains |
| File inputs | Files API + assistants | input: [{ type: "input_file", file_id: "..." }] |
// v4 — Chat Completions
const c = await client.chat.completions.create({
model: "gpt-4",
messages: [
{ role: "system", content: "..." },
{ role: "user", content: "Hi" },
],
});
// v5 — Responses
const r = await client.responses.create({
model: "gpt-4.1",
instructions: "...",
input: "Hi",
});
Chat Completions still works in v5. Many teams stay on Chat Completions and adopt Responses incrementally. The migration is opt-in, not forced.
Things to watch when upgrading from v4 to v5:
- Type names changed for many response shapes (Response is a global DOM type now used in the SDK).
- Streaming event types are different: rewrite stream consumers.
- Tools: the request and result shapes differ; not a drop-in.
- File handling: Responses API has its own input/output file types separate from the legacy Files API.
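For incremental adoption, an existing messages array can be mapped onto the Responses request shape. This is a sketch under assumptions: it handles only plain-text system, user, and assistant messages, and it relies on input accepting role/content message objects. toResponsesParams and the local types are hypothetical.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}
interface ResponsesParams {
  instructions?: string;
  input: { role: "user" | "assistant"; content: string }[];
}

// System messages become `instructions`; everything else becomes `input`.
function toResponsesParams(messages: ChatMessage[]): ResponsesParams {
  const instructions = messages
    .filter((m) => m.role === "system")
    .map((m) => m.content)
    .join("\n\n");
  const input = messages
    .filter((m) => m.role !== "system")
    .map((m) => ({ role: m.role as "user" | "assistant", content: m.content }));
  return instructions ? { instructions, input } : { input };
}

const params = toResponsesParams([
  { role: "system", content: "You are terse." },
  { role: "user", content: "Hi" },
]);
console.log(params);
```

Spread the result into the request alongside model. Tool messages and multi-part content need their own mapping and are out of scope here.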
Stay on v4
If you're not adopting Responses-specific features (statefulness via previous_response_id, simpler streaming, image generation in-line), staying on v4 with Chat Completions is fine. Backports continue.
Security considerations
- API key exfiltration is the #1 risk. Never put OPENAI_API_KEY in client bundles, environment files committed to the repo, or browser localStorage. Use server-side proxying.
- Prompt injection. User input embedded in a system prompt can override instructions ("Ignore previous and ..."). Sanitise, or rely on careful prompt engineering, for any user-input → system-prompt flow.
- Data exfiltration via tools. A model with tool access can be tricked into calling tools with attacker-controlled args. Filter tool arguments server-side; don't let the model invoke arbitrary HTTP fetches.
- PII leakage. OpenAI's data-retention policy varies by tier; check the policy for your account. For PII-heavy workloads, use the zero-retention tier or run an on-prem model.
- Rate-limit abuse. A buggy frontend can drain your monthly quota. Always wrap API calls with per-user rate limits.
- Untrusted tool outputs. If a tool returns attacker-controlled content (e.g. web search results), the model may follow instructions in it. Treat tool outputs like untrusted input.
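The per-user rate limit mentioned above can be as simple as a fixed-window counter in front of the proxy route. A minimal in-memory sketch; FixedWindowLimiter is a hypothetical class, and a multi-instance deployment would need shared storage such as Redis instead of a local Map.

```typescript
// Fixed-window limiter: at most `limit` calls per user per `windowMs`.
class FixedWindowLimiter {
  private readonly counts = new Map<string, { windowStart: number; used: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and records the call if the user is under the limit.
  allow(userId: string, now: number = Date.now()): boolean {
    const entry = this.counts.get(userId);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.counts.set(userId, { windowStart: now, used: 1 });
      return true;
    }
    if (entry.used >= this.limit) return false;
    entry.used += 1;
    return true;
  }
}

const limiter = new FixedWindowLimiter(2, 60_000); // 2 calls per minute
console.log(limiter.allow("alice", 0), limiter.allow("alice", 10), limiter.allow("alice", 20));
// prints: true true false
```

Call allow(userId) before forwarding a request to the API and return a 429 to the caller when it is false.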
Testing & CI integration
import { describe, it, expect, vi } from "vitest";
import OpenAI from "openai";
describe("ai integration", () => {
it("calls responses API", async () => {
const mockCreate = vi.fn().mockResolvedValue({
output_text: "Hello!",
output: [{ type: "message", content: [{ type: "output_text", text: "Hello!" }] }],
});
const client = { responses: { create: mockCreate } } as unknown as OpenAI;
const r = await client.responses.create({ model: "gpt-4.1", input: "Hi" });
expect(r.output_text).toBe("Hello!");
});
});
Output: mock the SDK at the method level. For integration tests, use a separate test key with strict spend limits.
For CI, set spend caps in the OpenAI dashboard — never run the SDK against prod keys in CI without limits.
Ecosystem integrations
| Tool | Integration |
|---|---|
| zod | zodResponseFormat(schema) for structured outputs |
| ai (Vercel AI SDK) | Higher-level streaming, React hooks; uses openai for OpenAI calls |
| langchain | OpenAI provider for chains/agents |
| llamaindex | OpenAI provider for RAG |
| tiktoken | Token counting client-side |
| next.js | Server actions / route handlers; call the SDK server-side |
| cloudflare-workers | Pass fetch: globalThis.fetch |
| vercel edge | Works out of the box |
| mcp (Model Context Protocol) | Pair with OpenAI's MCP server APIs |
Troubleshooting common errors
- 401 Unauthorized. OPENAI_API_KEY not set or invalid. Check env loading.
- 429 Too Many Requests. Rate limit hit. The SDK auto-retries; for sustained load, upgrade tier or use the Batch API.
- 400: model does not exist. Model name typo, or the model was retired. Check the docs for current names.
- context_length_exceeded. Input + output > context window. Trim history, use a model with a larger window, or summarise.
- Stream hangs. The iterator never exits. Always have a timeout via AbortController or the timeout: option.
- TS: Cannot find module 'openai/helpers/zod'. The TypeScript config or bundler doesn't honour subpath exports. Ensure modern Node TS resolution ("moduleResolution": "node16" or "bundler").
- High latency on first request. Connection setup. Warm the connection (no-op embedding call) at app start if startup latency matters.
- output_text is empty or undefined. The model returned tool calls, not text. Inspect response.output[] for function_call items.
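For the last item, walking response.output directly shows what the model actually returned. A sketch, assuming only the two item types it declares; OutputItem and summarizeOutput are hypothetical local names and the real SDK union contains more variants.

```typescript
// Local stand-in for the two output item types handled here.
type OutputItem =
  | { type: "message"; content: { type: string; text?: string }[] }
  | { type: "function_call"; name: string; arguments: string; call_id: string };

// Collect any text parts and the names of any requested function calls.
function summarizeOutput(output: OutputItem[]): { text: string; toolCalls: string[] } {
  let text = "";
  const toolCalls: string[] = [];
  for (const item of output) {
    if (item.type === "function_call") {
      toolCalls.push(item.name);
    } else {
      for (const part of item.content) {
        if (part.type === "output_text" && part.text) text += part.text;
      }
    }
  }
  return { text, toolCalls };
}

const summary = summarizeOutput([
  { type: "function_call", name: "get_weather", arguments: '{"city":"Tokyo"}', call_id: "c1" },
]);
console.log(summary); // text is empty, toolCalls lists get_weather
```

An empty text with a non-empty toolCalls list means the next step is to run the tools and send their results back.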
When NOT to use this
- You need a multi-provider abstraction. Use Vercel's ai SDK: swap between OpenAI, Anthropic, Google with a config change.
- You only need one endpoint and want a tiny bundle. Use bare fetch against the REST API (~50 lines, zero deps). The SDK is ~600 KB.
- You're doing complex multi-step agent flows. Use LangChain JS or the Vercel AI SDK's agents API for more orchestration scaffolding.
- You're using a local model (Ollama, LM Studio). They expose an OpenAI-compatible API but the SDK adds nothing; use fetch.
- You need browser-side streaming with token auth. Use server-sent events from your backend; don't put the SDK in the browser.
See also
- Concept: agents — tool-using LLM patterns, multi-step flows
- Concept: API — API key handling, rate limiting, retries
- JavaScript: fetch — what the SDK uses under the hood