cheat sheet

openai

Package-level reference for openai on npm — Chat Completions, the Responses API, streaming, tool calls, structured outputs, embeddings, and the v4→v5 migration.

updated 05-31-2026

openai

What it is

openai is the official JavaScript/TypeScript SDK for the OpenAI API — chat completions, the Responses API (the current preferred surface as of 2025+), streaming, tool/function calling, structured outputs (JSON Schema-enforced), embeddings, image generation (DALL-E / gpt-image-1), speech-to-text (Whisper), text-to-speech, vision input, fine-tuning, batch, files, and assistants.

It works on Node 18+, the browser (with dangerouslyAllowBrowser: true and proper key handling), Cloudflare Workers, Vercel Edge, Bun, and Deno. The v5 major (mid-2025) reorganised the API surface around the Responses primitive — a unified "give me a response" surface that subsumes Chat Completions, Tools, and Files into one consistent flow.

In 2026, the canonical pattern is client.responses.create({ model: "gpt-...", input: "..." }) — the Chat Completions endpoint (client.chat.completions.create) still works and is supported, but the Responses API is what new OpenAI features ship in first.

Install

bash

# npm / pnpm / yarn / bun
npm install openai
pnpm add openai
yarn add openai
bun add openai

Output: runtime dep. ~600 KB unpacked (large — includes all model/endpoint type definitions and shapes).

bash

# Optional — zod for runtime validation of structured outputs
npm install zod

Output: Zod schemas plug into the SDK's structured-output helpers.

bash

# CLI (community, not official) — only the SDK is published officially
npx openai --help     # NOT a real subcommand; refer to docs

Output: no official CLI; the package is a library only.

Versioning & Node support

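Current major line is 5.x (stable since mid-2025). It puts the Responses API up front, with the older Chat Completions still supported, and brings expanded streaming events, native browser support, and many type-shape changes. The Responses API itself first appeared in late 4.x releases, so it is not exclusive to 5.x.
4.x is the previous major (the rewrite from the axios-based 3.x to a fetch-based client, which used node-fetch by default on Node). Still supported via security backports; many existing codebases live here.
Node ≥18 for 4.x. 5.x relies on the runtime's built-in fetch, and its README lists Node 20 LTS or later as supported, so check the README for the version you install.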
Pure TS source compiled to dual ESM + CJS. Types bundled.
Always a runtime dependency — your code calls the API at runtime.
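API versions are separate from SDK versions. The OpenAI REST API is versioned by path (/v1). Date-based api-version values such as 2024-12-01 belong to Azure OpenAI, where you pass apiVersion to the AzureOpenAI client. Pin the SDK version against the API features you depend on.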

Package metadata

Maintainer: OpenAI (@openai) — official SDK
Project home: github.com/openai/openai-node
Docs: platform.openai.com/docs
npm: npmjs.com/package/openai
License: Apache-2.0
First released: 2022 (v1 thin wrapper); rewrites in 2023 (v4) and 2025 (v5).
Downloads: ~5 million per week — top AI-SDK by far on npm

Peer dependencies & extras

Package	Purpose
`zod`	Structured-output schemas — `openai` SDK has helpers for Zod.
`@anthropic-ai/sdk`	Sibling SDK; pattern is similar.
`ai` (Vercel AI SDK)	Higher-level abstraction over `openai` + others. Pick when you want a unified multi-provider interface.
`langchain`	Heavier orchestration layer; uses `openai` under the hood for OpenAI calls.
`llamaindex`	RAG-focused; also uses `openai`.
`tiktoken`	OpenAI's tokenizer for counting tokens client-side.
`gpt-tokenizer`	Pure-JS tokenizer alternative.

Alternatives

Library	Trade-off
Bare `fetch`	Zero-dep. Manual streaming, manual error handling. Pick for tiny scripts that call one endpoint.
Vercel `ai` SDK	Provider-agnostic (OpenAI, Anthropic, Google, etc.). Streaming primitives, React hooks. Pick for full-stack AI apps.
LangChain JS	Heavyweight RAG/agent framework. Pick for complex multi-step pipelines.
LlamaIndex	RAG-centric. Pick when retrieval is the focus.
Anthropic SDK	For Claude models. Different provider, same SDK pattern.
`@google/genai`	Gemini.
`openai-fetch`	Third-party tiny fetch wrapper around the OpenAI API. Pick to avoid the 600 KB bundle.

Common gotchas

Don't put your API key in client-side code. Even with dangerouslyAllowBrowser: true, exposing the key is a key-exfiltration vector. Use server-side proxying for browser apps.
Streams are async iterables. const stream = await client.responses.create({ stream: true, ... }) resolves to a stream object that yields events one at a time, not to an array of all events. A for await loop is the simplest way to consume it; the SDK also offers higher-level helpers such as client.responses.stream().
Token limits aren't enforced client-side. Sending too many tokens returns an API error. Count with tiktoken (or a heuristic) before sending if you need predictable behaviour.
responses ≠ chat.completions. The Responses API is a different surface — different request shape, different event types in streaming. Code written for chat.completions doesn't drop in.
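Network errors get auto-retried by default. The SDK retries up to 2 times on connection errors, 408, 409, 429, and 5xx responses. Set maxRetries: 0 to disable; turn it up for flaky integrations.
tool_calls arrive in chunks during streaming. Reassembly is your responsibility if you consume raw events. The SDK's streaming helpers (for example finalChatCompletion() on client.chat.completions.stream(), or finalResponse() on client.responses.stream()) return the assembled result, but the raw event flow is fine-grained.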
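
The reassembly mentioned in the last gotcha can be sketched as below for Chat Completions streaming, where each chunk's delta.tool_calls entry carries an index plus a fragment of the id, name, or arguments. This is a minimal illustration, not the SDK's own helper: the ToolCallDelta and AssembledToolCall types and the assembleToolCalls function are hypothetical names, and the fragments array is made-up sample data.

```typescript
// Minimal shape of the tool-call fragments carried on chat.completions stream deltas.
// (Hypothetical local types; the SDK ships richer ones.)
interface ToolCallDelta {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
}

interface AssembledToolCall {
  id: string;
  name: string;
  arguments: string; // raw JSON text; parse only once the stream has finished
}

// Merge fragments by `index`: the id and name arrive once, the arguments arrive in pieces.
function assembleToolCalls(deltas: ToolCallDelta[]): AssembledToolCall[] {
  const calls: AssembledToolCall[] = [];
  for (const d of deltas) {
    const call = (calls[d.index] ??= { id: "", name: "", arguments: "" });
    if (d.id) call.id = d.id;
    if (d.function?.name) call.name += d.function.name;
    if (d.function?.arguments) call.arguments += d.function.arguments;
  }
  return calls.filter(Boolean);
}

// Made-up fragments, in the order a stream might deliver them.
const fragments: ToolCallDelta[] = [
  { index: 0, id: "call_1", function: { name: "get_weather", arguments: "" } },
  { index: 0, function: { arguments: '{"city":' } },
  { index: 0, function: { arguments: '"Tokyo"}' } },
];
const [call] = assembleToolCalls(fragments);
```

In a real consumer you would push chunk.choices[0]?.delta?.tool_calls entries into the list as they arrive and call JSON.parse on arguments only after the stream ends, since partial JSON does not parse.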

Real-world recipes

Responses API — the canonical recipe (v5+)

typescript

import OpenAI from "openai";

const client = new OpenAI();   // reads OPENAI_API_KEY from env

const response = await client.responses.create({
  model: "gpt-4.1",
  input: "Write a haiku about TypeScript.",
});

console.log(response.output_text);

Output:

text

Types guide each step taken,
Compiler whispers gently,
Bugs found before run.

The Responses API is the simplest entry point in v5. output_text is a convenience field; the full structured output lives in response.output.

Chat Completions — still supported

typescript

import OpenAI from "openai";

const client = new OpenAI();

const completion = await client.chat.completions.create({
  model: "gpt-4.1",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What's 2 + 2?" },
  ],
});

console.log(completion.choices[0].message.content);

Output:

text

2 + 2 = 4.

Chat Completions remains fully supported. New apps prefer responses; existing apps don't need to migrate.

Streaming response

typescript

import OpenAI from "openai";

const client = new OpenAI();

const stream = await client.responses.create({
  model: "gpt-4.1",
  input: "Count to 5 slowly.",
  stream: true,
});

for await (const event of stream) {
  if (event.type === "response.output_text.delta") {
    process.stdout.write(event.delta);
  }
}

Output:

text

One.
Two.
Three.
Four.
Five.

Streaming with the Responses API uses a typed event stream: response.output_text.delta for text chunks, response.function_call_arguments.delta while a tool call's arguments are being generated, and response.completed at the end.
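
When you need the full text as well as the live chunks, the event stream can be folded into a single result. The sketch below assumes only the two event types named above and reads just the type and delta fields; StreamEventLike, collectText, and fakeStream are hypothetical names, and fakeStream stands in for a live stream so the example runs offline.

```typescript
// Only the fields this sketch reads; the SDK's real event union is much larger.
interface StreamEventLike {
  type: string;
  delta?: string;
}

// Fold a Responses-style event stream into the final text plus a completion flag.
async function collectText(
  events: AsyncIterable<StreamEventLike>,
): Promise<{ text: string; completed: boolean }> {
  let text = "";
  let completed = false;
  for await (const event of events) {
    if (event.type === "response.output_text.delta") {
      text += event.delta ?? "";
    } else if (event.type === "response.completed") {
      completed = true;
    }
    // Every other event type is ignored in this sketch.
  }
  return { text, completed };
}

// Hypothetical stand-in for a live stream.
async function* fakeStream(): AsyncGenerator<StreamEventLike> {
  yield { type: "response.created" };
  yield { type: "response.output_text.delta", delta: "One. " };
  yield { type: "response.output_text.delta", delta: "Two." };
  yield { type: "response.completed" };
}
```

A completed flag that stays false after the loop ends is a useful signal that the stream was cut short.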

For Chat Completions streaming (legacy):

typescript

const stream = await client.chat.completions.create({
  model: "gpt-4.1",
  messages: [{ role: "user", content: "Count to 5." }],
  stream: true,
});

for await (const chunk of stream) {
  // Some chunks (for example a trailing usage chunk) carry no choices, so guard the access
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

Tool / function calling

typescript

import OpenAI from "openai";

const client = new OpenAI();

// Placeholder lookup so the example is self-contained; swap in a real weather service.
async function fetchWeather(city: string) {
  return { city, tempC: 22, conditions: "clear" };
}

const response = await client.responses.create({
  model: "gpt-4.1",
  input: "What's the weather in Tokyo?",
  tools: [
    {
      type: "function",
      name: "get_weather",
      description: "Get the current weather for a city.",
      strict: true, // strict mode needs every property required and additionalProperties: false
      parameters: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
        additionalProperties: false,
      },
    },
  ],
});

// Check if the model called a tool
for (const output of response.output) {
  if (output.type === "function_call") {
    // arguments is a JSON string produced by the model
    const args = JSON.parse(output.arguments) as { city: string };
    const weather = await fetchWeather(args.city);

    // Send the result back, chained to the first response
    const followUp = await client.responses.create({
      model: "gpt-4.1",
      input: [
        { type: "function_call_output", call_id: output.call_id, output: JSON.stringify(weather) },
      ],
      previous_response_id: response.id,
    });

    console.log(followUp.output_text);
  }
}

Output:

text

The current weather in Tokyo is 22°C and clear.

Two API round-trips: (1) model calls the tool, (2) you send the result back via previous_response_id, model produces the final answer.
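
With more than one tool, the loop over response.output usually turns into a small dispatcher. The sketch below assumes function_call items shaped as in the recipe (call_id, name, arguments) and produces the matching function_call_output items; OutputItemLike, ToolHandler, runToolCalls, and the handlers map are hypothetical names, and errors are returned to the model as JSON rather than thrown.

```typescript
// Just the fields this sketch reads from response.output items.
interface OutputItemLike {
  type: string;
  call_id?: string;
  name?: string;
  arguments?: string;
}

interface FunctionCallOutput {
  type: "function_call_output";
  call_id: string;
  output: string;
}

type ToolHandler = (args: Record<string, unknown>) => unknown | Promise<unknown>;

// Run every function_call item through its handler and build the follow-up input items.
async function runToolCalls(
  output: OutputItemLike[],
  handlers: Record<string, ToolHandler>,
): Promise<FunctionCallOutput[]> {
  const results: FunctionCallOutput[] = [];
  for (const item of output) {
    if (item.type !== "function_call" || !item.call_id || !item.name) continue;
    let result: unknown;
    try {
      const handler = handlers[item.name];
      if (!handler) throw new Error(`unknown tool: ${item.name}`);
      result = await handler(JSON.parse(item.arguments ?? "{}"));
    } catch (err) {
      // Report the failure to the model instead of crashing the request.
      result = { error: err instanceof Error ? err.message : String(err) };
    }
    results.push({
      type: "function_call_output",
      call_id: item.call_id,
      output: JSON.stringify(result),
    });
  }
  return results;
}

// Hypothetical handler table; the weather values are made up.
const handlers: Record<string, ToolHandler> = {
  get_weather: (args) => ({ city: args.city, tempC: 22, conditions: "clear" }),
};
```

The returned array can be passed as input on the follow-up request alongside previous_response_id, as in the recipe above.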

Structured outputs (JSON Schema)

typescript

import OpenAI from "openai";
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";

const client = new OpenAI();

const Recipe = z.object({
  title: z.string(),
  ingredients: z.array(z.object({ name: z.string(), amount: z.string() })),
  steps: z.array(z.string()),
});

const completion = await client.chat.completions.parse({
  model: "gpt-4.1",
  messages: [
    { role: "user", content: "Give me a recipe for pancakes." },
  ],
  response_format: zodResponseFormat(Recipe, "recipe"),
});

const recipe = completion.choices[0].message.parsed;
console.log(recipe?.title);
console.log(recipe?.ingredients);

Output:

json

{
  "title": "Classic Pancakes",
  "ingredients": [
    { "name": "flour", "amount": "1.5 cups" },
    { "name": "milk", "amount": "1.25 cups" },
    { "name": "eggs", "amount": "1" }
  ],
  "steps": ["Whisk dry ingredients.", "Add wet ingredients.", "Cook on a griddle."]
}

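parsed is typed against the Zod schema, so the structured output is type-safe end to end; it is null when the model refuses. The model is constrained at decode time to produce JSON matching the schema. In 4.x the same helper lives under client.beta.chat.completions.parse.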
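
If you would rather not depend on Zod, a hand-written JSON Schema can be passed as response_format on Chat Completions. The sketch below is an assumed equivalent of the Recipe schema above, not the exact output of zodResponseFormat; recipeResponseFormat is a hypothetical name. Strict mode expects every property to be listed in required and additionalProperties to be false at each object level.

```typescript
// Hand-written counterpart of the Zod Recipe schema (an approximation, not helper output).
const recipeResponseFormat = {
  type: "json_schema",
  json_schema: {
    name: "recipe",
    strict: true,
    schema: {
      type: "object",
      properties: {
        title: { type: "string" },
        ingredients: {
          type: "array",
          items: {
            type: "object",
            properties: {
              name: { type: "string" },
              amount: { type: "string" },
            },
            required: ["name", "amount"],
            additionalProperties: false,
          },
        },
        steps: { type: "array", items: { type: "string" } },
      },
      required: ["title", "ingredients", "steps"],
      additionalProperties: false,
    },
  },
} as const;
```

With this form there is no parsed field to lean on: you would call chat.completions.create and run JSON.parse on message.content yourself.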

Embeddings

typescript

import OpenAI from "openai";

const client = new OpenAI();

const response = await client.embeddings.create({
  model: "text-embedding-3-small",
  input: ["The quick brown fox", "jumps over the lazy dog"],
});

console.log(response.data.length);                    // 2
console.log(response.data[0].embedding.length);       // 1536 (dimensions)
console.log(response.data[0].embedding.slice(0, 5));  // first 5 floats

Output:

text

2
1536
[0.013, -0.041, 0.022, 0.011, -0.008]

Embed in batches of up to 2048 inputs per request. Billing is per token either way, but one batched call avoids per-request latency and rate-limit overhead. Use text-embedding-3-small (1536-d, cheap) or text-embedding-3-large (3072-d, better quality).
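
Once you have the vectors, the usual next step is comparing them. The sketch below shows plain cosine similarity and a small ranking helper over number arrays such as response.data[i].embedding; cosineSimilarity and rankBySimilarity are hypothetical names, and the tiny two-dimensional vectors in the comment are for illustration only.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return candidate indices ordered from most to least similar to the query vector.
function rankBySimilarity(query: number[], candidates: number[][]): number[] {
  return candidates
    .map((vector, index) => ({ index, score: cosineSimilarity(query, vector) }))
    .sort((x, y) => y.score - x.score)
    .map((entry) => entry.index);
}

// Example with toy vectors: rankBySimilarity([1, 0], [[0, 1], [1, 0], [1, 1]])
```

For a handful of documents this in-memory ranking is enough; larger corpora normally move to a vector index.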

Vision input

typescript

import OpenAI from "openai";

const client = new OpenAI();

const response = await client.responses.create({
  model: "gpt-4.1",
  input: [
    {
      role: "user",
      content: [
        { type: "input_text", text: "What's in this image?" },
        { type: "input_image", image_url: "https://example.com/photo.jpg" },
      ],
    },
  ],
});

console.log(response.output_text);

Output:

text

The image shows a golden retriever sitting on a grassy field with mountains in the background.

Pass URLs (HTTPS only) or base64-encoded data: data:image/jpeg;base64,/9j/.... For local files, read and base64-encode.
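
For the local-file case, the data URL can be built with Node's standard library. This is a minimal sketch assuming a Node runtime (it uses Buffer and node:fs); toDataUrl and fileToDataUrl are hypothetical helper names, and the MIME type is supplied by the caller rather than detected.

```typescript
import { readFileSync } from "node:fs";

// Encode raw bytes as a data URL suitable for an input_image image_url value.
function toDataUrl(bytes: Uint8Array, mimeType: string): string {
  return `data:${mimeType};base64,${Buffer.from(bytes).toString("base64")}`;
}

// Convenience wrapper for a file on disk; the caller picks the MIME type.
function fileToDataUrl(path: string, mimeType: string): string {
  return toDataUrl(readFileSync(path), mimeType);
}

// Usage (the path is a placeholder):
// { type: "input_image", image_url: fileToDataUrl("./photo.jpg", "image/jpeg") }
```

Base64 inflates the payload by roughly a third, so large images are usually better served from a URL.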

Production deployment

API key handling

typescript

// Server-side — env var
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

Never ship the API key in client-side bundles. For browser apps, proxy through your server:

typescript

// Client
const res = await fetch("/api/llm", { method: "POST", body: JSON.stringify({ prompt }) });

// Server (Next.js route, Express, etc.)
import OpenAI from "openai";
const client = new OpenAI();
export async function POST(req: Request) {
  const { prompt } = await req.json();
  const response = await client.responses.create({ model: "gpt-4.1", input: prompt });
  return Response.json({ text: response.output_text });
}

Timeouts and retries

typescript

const client = new OpenAI({
  timeout: 60 * 1000,          // 60s per request
  maxRetries: 3,                // retry 3× on 5xx / connection errors
});

// Per-request override
const response = await client.responses.create(
  { model: "gpt-4.1", input: "..." },
  { timeout: 120 * 1000, maxRetries: 0 }
);

For long-running streams, increase per-request timeout. The SDK uses exponential backoff between retries.
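
To make the backoff idea concrete, here is a generic sketch of an exponential delay with a cap and jitter. It illustrates the technique only: backoffDelayMs is a hypothetical function, and the default base, cap, and jitter values are assumptions for the example, not the SDK's documented internals.

```typescript
// Delay before retry number `attempt` (0-based): doubles each time, capped at maxMs,
// then shortened by up to `jitter` (a fraction) so clients do not retry in lockstep.
function backoffDelayMs(
  attempt: number,
  baseMs = 500,
  maxMs = 8000,
  jitter = 0.25,
  random: () => number = Math.random,
): number {
  const capped = Math.min(baseMs * 2 ** attempt, maxMs);
  return capped * (1 - jitter * random());
}

// Upper bounds for the first few attempts (jitter disabled by passing () => 0):
// attempt 0 -> 500 ms, attempt 1 -> 1000 ms, attempt 2 -> 2000 ms, attempt 5 -> 8000 ms
```

Passing the random source as a parameter keeps the function deterministic in tests.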

Edge runtime

The SDK works on Cloudflare Workers, Vercel Edge, Deno, Bun. Pass fetch explicitly if your runtime needs custom request handling:

typescript

const client = new OpenAI({
  apiKey: env.OPENAI_API_KEY,
  fetch: globalThis.fetch,
});

Rate limit handling

The SDK retries on rate-limit (429) responses automatically. For batch jobs, use the Batch API (client.batches.create): roughly 50% cheaper, with results returned within a 24-hour completion window rather than immediately.

typescript

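import fs from "node:fs";

// Upload the JSONL request file first ("requests.jsonl" is a placeholder path)
const file = await client.files.create({ file: fs.createReadStream("requests.jsonl"), purpose: "batch" });

const batch = await client.batches.create({
  input_file_id: file.id,
  endpoint: "/v1/chat/completions",
  completion_window: "24h",
});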
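
The input file for a batch is JSONL: one request per line, each with a custom_id, method, url, and body. The sketch below builds such a file body for the /v1/chat/completions endpoint used above; BatchLine and toBatchJsonl are hypothetical names and the request-N id scheme is an arbitrary choice.

```typescript
// One line of a Batch API input file targeting Chat Completions.
interface BatchLine {
  custom_id: string;
  method: "POST";
  url: "/v1/chat/completions";
  body: { model: string; messages: Array<{ role: "user"; content: string }> };
}

// Turn a list of prompts into JSONL text, one request per line.
function toBatchJsonl(prompts: string[], model: string): string {
  const lines = prompts.map(
    (prompt, i): BatchLine => ({
      custom_id: `request-${i + 1}`, // used to match results back to inputs
      method: "POST",
      url: "/v1/chat/completions",
      body: { model, messages: [{ role: "user", content: prompt }] },
    }),
  );
  return lines.map((line) => JSON.stringify(line)).join("\n");
}

const jsonl = toBatchJsonl(["Summarise A", "Summarise B"], "gpt-4.1-mini");
```

Results come back in an output file that is not guaranteed to preserve order, which is why each line carries its own custom_id.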

Performance tuning

Pick the right model

Model	Latency	Cost (relative)	Use
`gpt-4.1` / `gpt-5`	High	Highest	Complex reasoning, tool use, vision
`gpt-4.1-mini` / `gpt-5-mini`	Medium	Mid	Most app workflows; great default
`gpt-4.1-nano` / `gpt-5-nano`	Low	Lowest	Simple classification, light tasks
`text-embedding-3-small`	Very low	Cheap	Embeddings (always use small unless you've measured)

(Treat the specific names as examples: model lineups change often, so check the OpenAI docs for the current canonical names.)

Streaming reduces perceived latency

Streaming doesn't reduce total latency, but the first-token latency is far lower than waiting for the full response. Always stream for user-facing chat.

Batch embeddings

typescript

// Slow — 1 request per input
for (const text of texts) {
  await client.embeddings.create({ model: "...", input: text });
}

// Fast — 1 request for all
await client.embeddings.create({ model: "...", input: texts });

Batch up to ~2048 inputs per call.
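
When the input list is longer than the per-call limit, split it first. A minimal sketch, with chunk as a hypothetical helper and 2048 taken from the limit quoted above:

```typescript
// Split an array into consecutive slices of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) throw new Error("size must be a positive integer");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Usage sketch: one embeddings request per slice
// for (const batch of chunk(texts, 2048)) {
//   await client.embeddings.create({ model: "text-embedding-3-small", input: batch });
// }
```

Per-request token limits still apply, so very long texts may need smaller slices than the item limit alone suggests.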

Connection reuse

The SDK uses Node's keep-alive by default — no special config needed.

Token counting client-side

typescript

import { encoding_for_model } from "tiktoken";

const enc = encoding_for_model("gpt-4");
const tokens = enc.encode("hello world").length;
enc.free();

Useful for staying under context windows; saves a round-trip vs API trial-and-error.
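
Where an exact count is not needed, a rough estimate can drive history trimming without loading a tokenizer. The sketch below uses the common rule of thumb of about four characters per token for English text, which is an approximation and not a tokenizer; ChatMessage, estimateTokens, and trimHistory are hypothetical names.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Rough heuristic: about 4 characters per token for English prose.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Keep all system messages, then as many of the newest other messages as fit the budget.
function trimHistory(messages: ChatMessage[], budget: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === "system");
  let used = system.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  const kept: ChatMessage[] = [];
  for (let i = messages.length - 1; i >= 0; i--) {
    const message = messages[i];
    if (message.role === "system") continue;
    const cost = estimateTokens(message.content);
    if (used + cost > budget) break; // stop at the first message that no longer fits
    used += cost;
    kept.unshift(message);
  }
  return [...system, ...kept];
}
```

Leave headroom below the real context window when using an estimate like this, since code, non-English text, and long numbers tokenize less favourably.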

Version migration guide

v3 → v4 (2023) — the axios → fetch rewrite

Drop openai v3 axios shape.
New TS-first API with full typed responses.
ESM + CJS dual published.

typescript

// v3
const { Configuration, OpenAIApi } = require("openai");
const cfg = new Configuration({ apiKey: "..." });
const openai = new OpenAIApi(cfg);
const res = await openai.createChatCompletion({ model: "...", messages: [...] });

// v4
import OpenAI from "openai";
const client = new OpenAI({ apiKey: "..." });
const completion = await client.chat.completions.create({ model: "...", messages: [...] });

v4 → v5 (2025) — Responses API

The v5 migration is the important one in 2026. New canonical surface:

Concept	v4	v5
Simple response	`client.chat.completions.create({ messages: [...] })`	`client.responses.create({ input: "..." })` (`chat.completions.create` with `messages` still works)
Streaming	`chunk.choices[0].delta.content`	typed events: `response.output_text.delta`
Tools	`messages: [{ role: "tool", tool_call_id, content }]`	`input: [{ type: "function_call_output", call_id, output }]`
Structured output	`response_format: { type: "json_schema", ... }`	Chat Completions: unchanged. Responses: `text: { format: { type: "json_schema", ... } }`
Multi-turn	rebuild full history each time	`previous_response_id` chains
File inputs	Files API + assistants	`input: [{ type: "input_file", file_id: "..." }]`

typescript

// v4 — Chat Completions
const c = await client.chat.completions.create({
  model: "gpt-4",
  messages: [
    { role: "system", content: "..." },
    { role: "user", content: "Hi" },
  ],
});

// v5 — Responses
const r = await client.responses.create({
  model: "gpt-4.1",
  instructions: "...",
  input: "Hi",
});

Chat Completions still works in v5. Many teams stay on Chat Completions and adopt Responses incrementally. The migration is opt-in, not forced.

Things to watch when upgrading from v4 to v5:

Type names changed for many response shapes, and the SDK now relies on the runtime's global fetch types (such as Response) rather than a bundled polyfill's.
Streaming event types are different — re-write stream consumers.
Tools: the request and result shapes differ; not a drop-in.
File handling: Responses API has its own input/output file types separate from the legacy Files API.
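
For the simplest case, text-only conversations, the request-shape change can be handled by a small adapter. The sketch below moves system messages into instructions and passes the remaining turns as input; it ignores tool messages and multimodal content, and ChatMessage, ResponsesParams, and toResponsesParams are hypothetical names rather than SDK types.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ResponsesParams {
  model: string;
  instructions?: string;
  input: Array<{ role: "user" | "assistant"; content: string }>;
}

// Map a Chat Completions message list onto the Responses request shape.
function toResponsesParams(model: string, messages: ChatMessage[]): ResponsesParams {
  const system: string[] = [];
  const input: ResponsesParams["input"] = [];
  for (const m of messages) {
    if (m.role === "system") system.push(m.content);
    else input.push({ role: m.role, content: m.content });
  }
  // Only set instructions when there was at least one system message.
  return system.length > 0
    ? { model, instructions: system.join("\n\n"), input }
    : { model, input };
}

const params = toResponsesParams("gpt-4.1", [
  { role: "system", content: "Be brief." },
  { role: "user", content: "Hi" },
]);
```

The result can be passed to client.responses.create; streaming consumers and tool handling still need the separate changes listed above.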

Stay on v4

If you're not adopting Responses-specific features (statefulness via previous_response_id, simpler streaming, image generation in-line), staying on v4 with Chat Completions is fine. Backports continue.

Security considerations

API key exfiltration is the #1 risk. Never put OPENAI_API_KEY in client bundles, .env files committed to the repo, or browser localStorage. Use server-side proxying.
Prompt injection. User input embedded in a system prompt can override instructions ("Ignore previous and ..."). Sanitise — or rely on careful prompt engineering — for any user-input → system-prompt flow.
Data exfiltration via tools. A model with tool access can be tricked into calling tools with attacker-controlled args. Filter tool arguments server-side; don't let the model invoke arbitrary HTTP fetches.
PII leakage. OpenAI's data-retention policy varies by tier; check the policy for your account. For PII-heavy workloads, use the zero-retention tier or run an on-prem model.
Rate-limit abuse. A buggy frontend can drain your monthly quota. Always wrap API calls with per-user rate limits.
Untrusted tool outputs. If a tool returns attacker-controlled content (e.g. web search results), the model may follow instructions in it. Treat tool outputs like untrusted input.
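
The per-user rate limit mentioned above can be as simple as a fixed-window counter in front of the proxy route. This is an in-memory sketch for a single process, with FixedWindowLimiter as a hypothetical class; a multi-instance deployment would need shared storage instead of a Map.

```typescript
// Allow at most `limit` calls per user within each window of `windowMs` milliseconds.
class FixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(limit: number, windowMs: number, now: () => number = Date.now) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true and counts the call if the user is under the limit, false otherwise.
  allow(userId: string): boolean {
    const t = this.now();
    const current = this.windows.get(userId);
    if (!current || t - current.start >= this.windowMs) {
      this.windows.set(userId, { start: t, count: 1 });
      return true;
    }
    if (current.count >= this.limit) return false;
    current.count += 1;
    return true;
  }
}

// Usage sketch inside a route handler (the numbers are arbitrary):
// const limiter = new FixedWindowLimiter(20, 60_000);
// if (!limiter.allow(userId)) return new Response("Too many requests", { status: 429 });
```

Check the limiter before calling the SDK so rejected requests never reach the API or your quota.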

Testing & CI integration

typescript

import { describe, it, expect, vi } from "vitest";
import OpenAI from "openai";

describe("ai integration", () => {
  it("calls responses API", async () => {
    const mockCreate = vi.fn().mockResolvedValue({
      output_text: "Hello!",
      output: [{ type: "message", content: [{ type: "output_text", text: "Hello!" }] }],
    });

    const client = { responses: { create: mockCreate } } as unknown as OpenAI;
    const r = await client.responses.create({ model: "gpt-4.1", input: "Hi" });
    expect(r.output_text).toBe("Hello!");
  });
});

Output: mock the SDK at the method level. For integration tests, use a separate test key with strict spend limits.

For CI, set spend caps in the OpenAI dashboard — never run the SDK against prod keys in CI without limits.

Ecosystem integrations

Tool	Integration
`zod`	`zodResponseFormat(schema)` for structured outputs
`ai` (Vercel AI SDK)	Higher-level streaming, React hooks; uses `openai` for OpenAI calls
`langchain`	OpenAI provider for chains/agents
`llamaindex`	OpenAI provider for RAG
`tiktoken`	Token counting client-side
`next.js`	Server actions / route handlers — call SDK server-side
`cloudflare-workers`	Pass `fetch: globalThis.fetch`
`vercel edge`	Works out of the box
`mcp` (Model Context Protocol)	The Responses API can call remote MCP servers through a tool of `type: "mcp"`

Troubleshooting common errors

401 Unauthorized — OPENAI_API_KEY not set or invalid. Check env loading.
429 Too Many Requests — rate limit hit. SDK auto-retries; for sustained load, upgrade tier or use Batch API.
404 model_not_found ("The model ... does not exist"): model name typo, no access to the model, or the model was retired. Check the docs for current names.
context_length_exceeded — input + output > context window. Trim history, use a model with a larger window, or summarise.
Stream hangs — iterator never exits. Always have a timeout via AbortController or timeout: option.
TS: Cannot find module 'openai/helpers/zod' — bundler doesn't honour subpath exports. Ensure modern Node TS resolution ("moduleResolution": "node16" or "bundler").
High latency on first request — connection setup. Warm the connection (no-op embedding call) at app start if startup latency matters.
output_text is empty: the model returned tool calls, not text. Inspect response.output[] for function_call items.
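
For the hanging-stream case, one option is an idle timeout that fails the loop when no event arrives for a while. The sketch below wraps any async iterable; withIdleTimeout is a hypothetical helper, and in a real consumer you would also abort the underlying request (for example through an AbortController signal passed in the request options) when the error fires.

```typescript
// Re-yield items from `source`, but throw if the next item takes longer than idleMs.
async function* withIdleTimeout<T>(
  source: AsyncIterable<T>,
  idleMs: number,
): AsyncGenerator<T> {
  const iterator = source[Symbol.asyncIterator]();
  try {
    while (true) {
      let timer: ReturnType<typeof setTimeout> | undefined;
      const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new Error(`no event for ${idleMs} ms`)), idleMs);
      });
      try {
        const result = await Promise.race([iterator.next(), timeout]);
        if (result.done) return;
        yield result.value;
      } finally {
        clearTimeout(timer); // never leave a timer running between items
      }
    }
  } finally {
    // Ask the source to clean up without waiting on it, since it may be the part that hung.
    void iterator.return?.().catch(() => {});
  }
}

// Usage sketch:
// for await (const event of withIdleTimeout(stream, 30_000)) { ... }
```

An idle timeout suits streams better than a total timeout, because a long but healthy response keeps resetting the clock.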

When NOT to use this

You need a multi-provider abstraction. Use Vercel's ai SDK — swap between OpenAI, Anthropic, Google with a config change.
You only need one endpoint and want a tiny bundle. Use bare fetch against the REST API — ~50 lines, zero deps. The SDK is ~600 KB.
You're doing complex multi-step agent flows. Use LangChain JS or the Vercel AI SDK's agents API — more orchestration scaffolding.
You're using a local model (Ollama, LM Studio). They expose an OpenAI-compatible API but the SDK adds nothing — use fetch.
You need browser-side streaming with token auth. Use server-sent events from your backend; don't put the SDK in the browser.
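
As a rough picture of the bare fetch route, the sketch below builds a request for the REST /v1/responses endpoint and pulls the text out of the raw JSON. It assumes the raw payload carries the text inside output[].content[] parts of type output_text, as in the mock used in the testing section, since the output_text convenience field comes from the SDK; buildResponsesRequest, extractText, and RawResponse are hypothetical names.

```typescript
// Just the parts of the raw REST payload this sketch reads.
interface RawResponse {
  output: Array<{ type: string; content?: Array<{ type: string; text?: string }> }>;
}

// Build the URL and fetch options for a single Responses API call.
function buildResponsesRequest(apiKey: string, model: string, input: string) {
  return {
    url: "https://api.openai.com/v1/responses",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model, input }),
    },
  };
}

// Concatenate every output_text part found in message items.
function extractText(json: RawResponse): string {
  let text = "";
  for (const item of json.output) {
    if (item.type !== "message") continue;
    for (const part of item.content ?? []) {
      if (part.type === "output_text") text += part.text ?? "";
    }
  }
  return text;
}

// Usage sketch (server-side only; error handling and retries are up to you):
// const { url, init } = buildResponsesRequest(process.env.OPENAI_API_KEY!, "gpt-4.1", "Hi");
// const res = await fetch(url, init);
// if (!res.ok) throw new Error(`OpenAI request failed: ${res.status}`);
// console.log(extractText(await res.json()));
```

Everything the SDK adds on top of this (retries, timeouts, typed events, streaming parsing) becomes your code to write, which is the trade-off behind the small bundle.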
