cheat sheet

openai

Package-level reference for openai on npm — Chat Completions, the Responses API, streaming, tool calls, structured outputs, embeddings, and the v4→v5 migration.

openai

What it is

openai is the official JavaScript/TypeScript SDK for the OpenAI API — chat completions, the Responses API (the current preferred surface as of 2025+), streaming, tool/function calling, structured outputs (JSON Schema-enforced), embeddings, image generation (DALL-E / gpt-image-1), speech-to-text (Whisper), text-to-speech, vision input, fine-tuning, batch, files, and assistants.

It works on Node 18+, the browser (with dangerouslyAllowBrowser: true and proper key handling), Cloudflare Workers, Vercel Edge, Bun, and Deno. The v5 major (mid-2025) reorganised the API surface around the Responses primitive — a unified "give me a response" surface that subsumes Chat Completions, Tools, and Files into one consistent flow.

In 2026, the canonical pattern is client.responses.create({ model: "gpt-...", input: "..." }) — the Chat Completions endpoint (client.chat.completions.create) still works and is supported, but the Responses API is what new OpenAI features ship in first.

Install

bash
# npm / pnpm / yarn / bun
npm install openai
pnpm add openai
yarn add openai
bun add openai

Output: runtime dep. ~600 KB unpacked (large — includes all model/endpoint type definitions and shapes).

bash
# Optional — zod for runtime validation of structured outputs
npm install zod

Output: Zod schemas plug into the SDK's structured-output helpers.

bash
# CLI (community, not official) — only the SDK is published officially
npx openai --help     # NOT a real subcommand; refer to docs

Output: no official CLI; the package is a library only.

Versioning & Node support

  • Current major line is 5.x (stable since mid-2025) — introduces the Responses API as the headline, with the older Chat Completions still supported. Provides expanded streaming events, native browser support, and many type-shape changes.
  • 4.x is the previous major (the rewrite from ^3 axios-based to native fetch). Still supported via security backports; many existing codebases live here.
  • Node ≥18 required. Uses built-in fetch; older Node requires a polyfill.
  • Pure TS source compiled to dual ESM + CJS. Types bundled.
  • Always a runtime dependency — your code calls the API at runtime.
  • API versions are separate from SDK versions. OpenAI's API has its own Date versioning (api-version: 2024-12-01). Pin the SDK against the API version expected by your features.

Package metadata

  • Maintainer: OpenAI (@openai) — official SDK
  • Project home: github.com/openai/openai-node
  • Docs: platform.openai.com/docs
  • npm: npmjs.com/package/openai
  • License: Apache-2.0
  • First released: 2022 (v1 thin wrapper); rewrites in 2023 (v4) and 2025 (v5).
  • Downloads: ~5 million per week — top AI-SDK by far on npm

Peer dependencies & extras

PackagePurpose
zodStructured-output schemas — openai SDK has helpers for Zod.
@anthropic-ai/sdkSibling SDK; pattern is similar.
ai (Vercel AI SDK)Higher-level abstraction over openai + others. Pick when you want a unified multi-provider interface.
langchainHeavier orchestration layer; uses openai under the hood for OpenAI calls.
llamaindexRAG-focused; also uses openai.
tiktokenOpenAI's tokenizer for counting tokens client-side.
gpt-tokenizerPure-JS tokenizer alternative.

Alternatives

LibraryTrade-off
Bare fetchZero-dep. Manual streaming, manual error handling. Pick for tiny scripts that call one endpoint.
Vercel ai SDKProvider-agnostic (OpenAI, Anthropic, Google, etc.). Streaming primitives, React hooks. Pick for full-stack AI apps.
LangChain JSHeavyweight RAG/agent framework. Pick for complex multi-step pipelines.
LlamaIndexRAG-centric. Pick when retrieval is the focus.
Anthropic SDKFor Claude models. Different provider, same SDK pattern.
@google/genaiGemini.
openai-fetchThird-party tiny fetch wrapper around the OpenAI API. Pick to avoid the 600 KB bundle.

Common gotchas

  1. Don't put your API key in client-side code. Even with dangerouslyAllowBrowser: true, exposing the key is a key-exfiltration vector. Use server-side proxying for browser apps.
  2. Streaming requires for await consumption. const stream = await client.responses.create({ stream: true, ... }) returns an async iterable, not a Promise of all events. Looping over it with for await is the only consumption pattern.
  3. Token limits aren't enforced client-side. Sending too many tokens returns an API error. Count with tiktoken (or a heuristic) before sending if you need predictable behaviour.
  4. responseschat.completions. The Responses API is a different surface — different request shape, different event types in streaming. Code written for chat.completions doesn't drop in.
  5. Network errors get auto-retried by default. SDK retries 2× on 5xx / connection errors. Set maxRetries: 0 to disable; turn it up for flaky integrations.
  6. tool_calls arrive in chunks during streaming. Reassembly is your responsibility. The SDK provides helpers (stream.finalRunStep() etc.) but the raw event flow is fine-grained.

Real-world recipes

Responses API — the canonical recipe (v5+)

typescript
import OpenAI from "openai";

const client = new OpenAI();   // reads OPENAI_API_KEY from env

const response = await client.responses.create({
  model: "gpt-4.1",
  input: "Write a haiku about TypeScript.",
});

console.log(response.output_text);

Output:

text
Types guide each step taken,
Compiler whispers gently,
Bugs found before run.

The Responses API is the simplest entry point in v5. output_text is a convenience field; the full structured output lives in response.output.

Chat Completions — still supported

typescript
import OpenAI from "openai";

const client = new OpenAI();

const completion = await client.chat.completions.create({
  model: "gpt-4.1",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What's 2 + 2?" },
  ],
});

console.log(completion.choices[0].message.content);

Output:

text
2 + 2 = 4.

Chat Completions remains fully supported. New apps prefer responses; existing apps don't need to migrate.

Streaming response

typescript
import OpenAI from "openai";

const client = new OpenAI();

const stream = await client.responses.create({
  model: "gpt-4.1",
  input: "Count to 5 slowly.",
  stream: true,
});

for await (const event of stream) {
  if (event.type === "response.output_text.delta") {
    process.stdout.write(event.delta);
  }
}

Output:

text
One.
Two.
Three.
Four.
Five.

Streaming with the Responses API uses a typed event stream — response.output_text.delta for token chunks, response.completed at end, response.tool_call.created when tools are invoked.

For Chat Completions streaming (legacy):

typescript
const stream = await client.chat.completions.create({
  model: "gpt-4.1",
  messages: [{ role: "user", content: "Count to 5." }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0].delta.content ?? "");
}

Tool / function calling

typescript
import OpenAI from "openai";

const client = new OpenAI();

const response = await client.responses.create({
  model: "gpt-4.1",
  input: "What's the weather in Tokyo?",
  tools: [
    {
      type: "function",
      name: "get_weather",
      description: "Get the current weather for a city.",
      parameters: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
    },
  ],
});

// Check if the model called a tool
for (const output of response.output) {
  if (output.type === "function_call") {
    const args = JSON.parse(output.arguments);
    const weather = await fetchWeather(args.city);

    // Send the result back
    const followUp = await client.responses.create({
      model: "gpt-4.1",
      input: [
        { type: "function_call_output", call_id: output.call_id, output: JSON.stringify(weather) },
      ],
      previous_response_id: response.id,
    });

    console.log(followUp.output_text);
  }
}

Output:

text
The current weather in Tokyo is 22°C and clear.

Two API round-trips: (1) model calls the tool, (2) you send the result back via previous_response_id, model produces the final answer.

Structured outputs (JSON Schema)

typescript
import OpenAI from "openai";
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";

const client = new OpenAI();

const Recipe = z.object({
  title: z.string(),
  ingredients: z.array(z.object({ name: z.string(), amount: z.string() })),
  steps: z.array(z.string()),
});

const completion = await client.chat.completions.parse({
  model: "gpt-4.1",
  messages: [
    { role: "user", content: "Give me a recipe for pancakes." },
  ],
  response_format: zodResponseFormat(Recipe, "recipe"),
});

const recipe = completion.choices[0].message.parsed;
console.log(recipe?.title);
console.log(recipe?.ingredients);

Output:

json
{
  "title": "Classic Pancakes",
  "ingredients": [
    { "name": "flour", "amount": "1.5 cups" },
    { "name": "milk", "amount": "1.25 cups" },
    { "name": "eggs", "amount": "1" }
  ],
  "steps": ["Whisk dry ingredients.", "Add wet ingredients.", "Cook on a griddle."]
}

parsed is typed against the Zod schema — fully type-safe structured output. The model is constrained at decode time to produce JSON matching the schema.

Embeddings

typescript
import OpenAI from "openai";

const client = new OpenAI();

const response = await client.embeddings.create({
  model: "text-embedding-3-small",
  input: ["The quick brown fox", "jumps over the lazy dog"],
});

console.log(response.data.length);                    // 2
console.log(response.data[0].embedding.length);       // 1536 (dimensions)
console.log(response.data[0].embedding.slice(0, 5));  // first 5 floats

Output:

text
2
1536
[0.013, -0.041, 0.022, 0.011, -0.008]

Embed in batches up to ~2048 inputs per request — much cheaper than one-at-a-time. Use text-embedding-3-small (1536-d, cheap) or text-embedding-3-large (3072-d, better quality).

Vision input

typescript
import OpenAI from "openai";

const client = new OpenAI();

const response = await client.responses.create({
  model: "gpt-4.1",
  input: [
    {
      role: "user",
      content: [
        { type: "input_text", text: "What's in this image?" },
        { type: "input_image", image_url: "https://example.com/photo.jpg" },
      ],
    },
  ],
});

console.log(response.output_text);

Output:

text
The image shows a golden retriever sitting on a grassy field with mountains in the background.

Pass URLs (HTTPS only) or base64-encoded data: data:image/jpeg;base64,/9j/.... For local files, read and base64-encode.

Production deployment

API key handling

typescript
// Server-side — env var
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

Never ship the API key in client-side bundles. For browser apps, proxy through your server:

typescript
// Client
const res = await fetch("/api/llm", { method: "POST", body: JSON.stringify({ prompt }) });

// Server (Next.js route, Express, etc.)
import OpenAI from "openai";
const client = new OpenAI();
export async function POST(req: Request) {
  const { prompt } = await req.json();
  const response = await client.responses.create({ model: "gpt-4.1", input: prompt });
  return Response.json({ text: response.output_text });
}

Timeouts and retries

typescript
const client = new OpenAI({
  timeout: 60 * 1000,          // 60s per request
  maxRetries: 3,                // retry 3× on 5xx / connection errors
});

// Per-request override
const response = await client.responses.create(
  { model: "gpt-4.1", input: "..." },
  { timeout: 120 * 1000, maxRetries: 0 }
);

For long-running streams, increase per-request timeout. The SDK uses exponential backoff between retries.

Edge runtime

The SDK works on Cloudflare Workers, Vercel Edge, Deno, Bun. Pass fetch explicitly if your runtime needs custom request handling:

typescript
const client = new OpenAI({
  apiKey: env.OPENAI_API_KEY,
  fetch: globalThis.fetch,
});

Rate limit handling

The SDK retries on rate-limit (429) automatically. For batch jobs, use the Batch API (client.batches.create) — 50% cheaper, 24h SLA.

typescript
const batch = await client.batches.create({
  input_file_id: file.id,
  endpoint: "/v1/chat/completions",
  completion_window: "24h",
});

Performance tuning

Pick the right model

ModelLatencyCost (relative)Use
gpt-4.1 / gpt-5HighHighestComplex reasoning, tool use, vision
gpt-4.1-mini / gpt-5-miniMediumMidMost app workflows; great default
gpt-4.1-nano / gpt-5-nanoLowLowestSimple classification, light tasks
text-embedding-3-smallVery lowCheapEmbeddings (always use small unless you've measured)

(Specific names hedge — model lineups evolve quarterly; check the OpenAI docs for current canonical names.)

Streaming reduces perceived latency

Streaming doesn't reduce total latency, but the first-token latency is far lower than waiting for the full response. Always stream for user-facing chat.

Batch embeddings

typescript
// Slow — 1 request per input
for (const text of texts) {
  await client.embeddings.create({ model: "...", input: text });
}

// Fast — 1 request for all
await client.embeddings.create({ model: "...", input: texts });

Batch up to ~2048 inputs per call.

Connection reuse

The SDK uses Node's keep-alive by default — no special config needed.

Token counting client-side

typescript
import { encoding_for_model } from "tiktoken";

const enc = encoding_for_model("gpt-4");
const tokens = enc.encode("hello world").length;
enc.free();

Useful for staying under context windows; saves a round-trip vs API trial-and-error.

Version migration guide

v3 → v4 (2023) — the axios → fetch rewrite

  • Drop openai v3 axios shape.
  • New TS-first API with full typed responses.
  • ESM + CJS dual published.
typescript
// v3
const { Configuration, OpenAIApi } = require("openai");
const cfg = new Configuration({ apiKey: "..." });
const openai = new OpenAIApi(cfg);
const res = await openai.createChatCompletion({ model: "...", messages: [...] });

// v4
import OpenAI from "openai";
const client = new OpenAI({ apiKey: "..." });
const completion = await client.chat.completions.create({ model: "...", messages: [...] });

v4 → v5 (2025) — Responses API

The v5 migration is the important one in 2026. New canonical surface:

Conceptv4v5
Simple responseclient.chat.completions.create({ messages: [...] })client.responses.create({ input: "..." }) (or messages still work)
Streamingchunk.choices[0].delta.contenttyped events: response.output_text.delta
Toolsmessages: [{ role: "tool", tool_call_id, content }]input: [{ type: "function_call_output", call_id, output }]
Structured outputresponse_format: { type: "json_schema" }same — works in both APIs
Multi-turnrebuild full history each timeprevious_response_id chains
File inputsFiles API + assistantsinput: [{ type: "input_file", file_id: "..." }]
typescript
// v4 — Chat Completions
const c = await client.chat.completions.create({
  model: "gpt-4",
  messages: [
    { role: "system", content: "..." },
    { role: "user", content: "Hi" },
  ],
});

// v5 — Responses
const r = await client.responses.create({
  model: "gpt-4.1",
  instructions: "...",
  input: "Hi",
});

Chat Completions still works in v5. Many teams stay on Chat Completions and adopt Responses incrementally. The migration is opt-in, not forced.

Things to watch when upgrading from v4 to v5:

  • Type names changed for many response shapes (Response is a global DOM type now used in the SDK).
  • Streaming event types are different — re-write stream consumers.
  • Tools: the request and result shapes differ; not a drop-in.
  • File handling: Responses API has its own input/output file types separate from the legacy Files API.

Stay on v4

If you're not adopting Responses-specific features (statefulness via previous_response_id, simpler streaming, image generation in-line), staying on v4 with Chat Completions is fine. Backports continue.

Security considerations

  1. API key exfiltration is the #1 risk. Never put OPENAI_API_KEY in client bundles, environment files in repo, or browser localStorage. Use server-side proxying.
  2. Prompt injection. User input embedded in a system prompt can override instructions ("Ignore previous and ..."). Sanitise — or rely on careful prompt engineering — for any user-input → system-prompt flow.
  3. Data exfiltration via tools. A model with tool access can be tricked into calling tools with attacker-controlled args. Filter tool arguments server-side; don't let the model invoke arbitrary HTTP fetches.
  4. PII leakage. OpenAI's data-retention policy varies by tier; check the policy for your account. For PII-heavy workloads, use the zero-retention tier or run an on-prem model.
  5. Rate-limit abuse. A buggy frontend can drain your monthly quota. Always wrap API calls with per-user rate limits.
  6. Untrusted tool outputs. If a tool returns attacker-controlled content (e.g. web search results), the model may follow instructions in it. Treat tool outputs like untrusted input.

Testing & CI integration

typescript
import { describe, it, expect, vi } from "vitest";
import OpenAI from "openai";

describe("ai integration", () => {
  it("calls responses API", async () => {
    const mockCreate = vi.fn().mockResolvedValue({
      output_text: "Hello!",
      output: [{ type: "message", content: [{ type: "output_text", text: "Hello!" }] }],
    });

    const client = { responses: { create: mockCreate } } as unknown as OpenAI;
    const r = await client.responses.create({ model: "gpt-4.1", input: "Hi" });
    expect(r.output_text).toBe("Hello!");
  });
});

Output: mock the SDK at the method level. For integration tests, use a separate test key with strict spend limits.

For CI, set spend caps in the OpenAI dashboard — never run the SDK against prod keys in CI without limits.

Ecosystem integrations

ToolIntegration
zodzodResponseFormat(schema) for structured outputs
ai (Vercel AI SDK)Higher-level streaming, React hooks; uses openai for OpenAI calls
langchainOpenAI provider for chains/agents
llamaindexOpenAI provider for RAG
tiktokenToken counting client-side
next.jsServer actions / route handlers — call SDK server-side
cloudflare-workersPass fetch: globalThis.fetch
vercel edgeWorks out of the box
mcp (Model Context Protocol)Pair with OpenAI's MCP server APIs

Troubleshooting common errors

  • 401 UnauthorizedOPENAI_API_KEY not set or invalid. Check env loading.
  • 429 Too Many Requests — rate limit hit. SDK auto-retries; for sustained load, upgrade tier or use Batch API.
  • 400: model does not exist — model name typo, or model was retired. Check the docs for current names.
  • context_length_exceeded — input + output > context window. Trim history, use a model with a larger window, or summarise.
  • Stream hangs — iterator never exits. Always have a timeout via AbortController or timeout: option.
  • TS: Cannot find module 'openai/helpers/zod' — bundler doesn't honour subpath exports. Ensure modern Node TS resolution ("moduleResolution": "node16" or "bundler").
  • High latency on first request — connection setup. Warm the connection (no-op embedding call) at app start if startup latency matters.
  • output_text is undefined — model returned tool calls, not text. Inspect response.output[] for function_call items.

When NOT to use this

  • You need a multi-provider abstraction. Use Vercel's ai SDK — swap between OpenAI, Anthropic, Google with a config change.
  • You only need one endpoint and want a tiny bundle. Use bare fetch against the REST API — ~50 lines, zero deps. The SDK is ~600 KB.
  • You're doing complex multi-step agent flows. Use LangChain JS or the Vercel AI SDK's agents API — more orchestration scaffolding.
  • You're using a local model (Ollama, LM Studio). They expose an OpenAI-compatible API but the SDK adds nothing — use fetch.
  • You need browser-side streaming with token auth. Use server-sent events from your backend; don't put the SDK in the browser.

See also