Overview

The calado SDK returns a Proxy over your existing LLM client. Every call to messages.create (Anthropic) or chat.completions.create (OpenAI) is tapped, serialized, and batched for ingestion. Your call-sites don’t change.
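To make the "Proxy over your existing client" idea concrete, here is a minimal sketch of how a nested method such as messages.create could be tapped without touching call sites. It is not calado's actual implementation: tapMethod, CapturedCall, and the fake client are hypothetical names used only for illustration, and the sketch assumes the tapped method can be treated as async.

```typescript
// Hypothetical sketch, not calado's real internals.
type CapturedCall = { path: string; args: unknown[]; result: unknown };

// Return a Proxy over `target` that records calls to the function found at
// `path` (e.g. ["messages", "create"]) and passes everything else through.
function tapMethod<T extends object>(
  target: T,
  path: string[],
  onCall: (call: CapturedCall) => void,
  prefix: string[] = [],
): T {
  return new Proxy(target, {
    get(obj, prop, receiver) {
      const value = Reflect.get(obj, prop, receiver);
      if (typeof prop !== "string") return value;
      const here = [...prefix, prop];
      const onPath = here.every((segment, i) => path[i] === segment);
      if (!onPath) return value;
      if (here.length === path.length && typeof value === "function") {
        // Final segment: call through, then record the arguments and result.
        return async (...args: unknown[]) => {
          const result = await value.apply(obj, args);
          onCall({ path: here.join("."), args, result });
          return result;
        };
      }
      if (typeof value === "object" && value !== null) {
        // Intermediate segment: keep proxying further down the path.
        return tapMethod(value as object, path, onCall, here);
      }
      return value;
    },
  });
}

// A stand-in client; a real Anthropic or OpenAI client is not needed here.
const captured: CapturedCall[] = [];
const fakeClient = {
  messages: {
    create: async (params: { prompt: string }) => ({ text: `echo: ${params.prompt}` }),
  },
};
const tapped = tapMethod(fakeClient, ["messages", "create"], (call) => captured.push(call));
```

Calling tapped.messages.create(...) returns the same result as the original method while the call is also pushed onto captured, which is the property that lets call sites stay unchanged.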

Installation

npm install calado
# or: pnpm add calado / yarn add calado
Requirements: Node.js 18+, Bun, or Deno (via the npm: specifier). Vercel Edge Runtime works for the basic wrap but doesn’t support withConversation (no AsyncLocalStorage). Your API key is generated in your agent’s Settings page on app.calado.ai. See Quickstart for the step-by-step.

Quick start

Wrap your LLM client and pass a conversation id so calado can group turns into sessions. Any stable string works — a chat session id, a support ticket number, a DB row id.
import Anthropic from "@anthropic-ai/sdk";
import { calado } from "calado";

calado.init(process.env.CALADO_API_KEY!);
const anthropic = calado.wrap(new Anthropic());

await calado.withConversation(`session_${sessionId}`, async () => {
  await anthropic.messages.create({
    model: "claude-sonnet-4-5",
    max_tokens: 1024,
    system: "You are a helpful assistant.",
    messages: [{ role: "user", content: "hello" }],
  });
});

await calado.flush();
The conversation id is optional — calado still captures calls without it — but strongly recommended. Without it, every LLM call is treated as a one-off and patterns are much harder to surface.
flush() sends the queued events immediately instead of waiting for the 5-second timer. Call it in scripts and serverless functions, where the process can exit or freeze before the timer fires; it is harmless in long-running servers. See Quickstart for the full walkthrough, including API key generation.

Conversation tracking

A single messages.create call is one turn. Real conversations span many turns. Group them so calado can analyze them as one session. Wrap your multi-turn logic in calado.withConversation. Every SDK call inside inherits the same id.
await calado.withConversation(`session_${sessionId}`, async () => {
  await anthropic.messages.create({ /* turn 1 */ });
  await anthropic.messages.create({ /* turn 2 */ });
  await anthropic.messages.create({ /* turn 3 */ });
});
Use any stable string: a session token, a database row id, a UUID.

Setting a user ID

withContext is the longer form. Use it to also attribute a conversation to an end user.
await calado.withContext(`session_${sessionId}`, `user_${userId}`, async () => {
  await anthropic.messages.create({ /* ... */ });
});
withConversation(id, fn) is sugar for withContext(id, undefined, fn).

Edge runtime fallback

Vercel Edge, Cloudflare Workers, and similar environments don’t support AsyncLocalStorage. Use the direct setter and reset it manually.
calado.conversationId = `session_${sessionId}`;
try {
  await anthropic.messages.create({ /* ... */ });
} finally {
  calado.conversationId = undefined;
}
The setter is global to the isolate, so concurrent handlers can overwrite each other’s id and attribute calls to the wrong conversation. Prefer withConversation whenever AsyncLocalStorage is available.
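The difference between a global setter and AsyncLocalStorage is easiest to see with two overlapping handlers. The sketch below uses only Node's node:async_hooks module and does not call calado; handleWithGlobal, handleWithAls, and sleep are hypothetical helpers, and the sleep stands in for an awaited LLM call.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Global-setter style: one shared slot for every handler in the isolate.
let globalConversationId: string | undefined;

async function handleWithGlobal(id: string, delayMs: number): Promise<string | undefined> {
  globalConversationId = id;
  try {
    await sleep(delayMs); // stands in for the awaited LLM call
    return globalConversationId; // the id a capture would observe at this point
  } finally {
    globalConversationId = undefined;
  }
}

// AsyncLocalStorage style: each async call chain sees its own value.
const conversationStore = new AsyncLocalStorage<string>();

function handleWithAls(id: string, delayMs: number): Promise<string | undefined> {
  return conversationStore.run(id, async () => {
    await sleep(delayMs);
    return conversationStore.getStore();
  });
}

// Run two overlapping "requests" through each style and compare what they observe.
async function compareStyles() {
  const viaGlobal = await Promise.all([handleWithGlobal("A", 5), handleWithGlobal("B", 20)]);
  const viaAls = await Promise.all([handleWithAls("A", 5), handleWithAls("B", 20)]);
  return { viaGlobal, viaAls };
}
```

With the global slot, handler A no longer observes its own id once handler B has started, while the AsyncLocalStorage version keeps the two ids separate.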

Multi-agent orchestrators

If your app runs an orchestrator that dispatches sub-agents, wrap each sub-agent boundary with runStep. Every SDK call inside the boundary attaches that step to its ingestion events, so calado can render the run as a tree and attribute behavior to the right sub-agent.
import { calado, runStep } from "calado";

const anthropic = calado.wrap(new Anthropic());

await runStep({ id: "orchestrator-root", roleName: "orchestrator" }, async () => {
  // The orchestrator decides which sub-agent to call.
  await anthropic.messages.create({ /* ... */ });

  // Dispatch to a flight_search sub-agent. The inner step's parentId
  // auto-inherits the outer step's id ("orchestrator-root").
  await runStep({ id: "flight-search-1", roleName: "flight_search" }, async () => {
    await anthropic.messages.create({ /* ... */ });
  });
});
Nested runStep calls inherit parentId from the enclosing step. Set parentId explicitly to override inheritance — for example, when the parent lives in another process or you’re stitching a step into an existing trace. Each wrapped call’s system prompt is captured as that step’s inline definition, so every sub-agent is analyzed against its own prompt rather than the orchestrator’s. A caller-supplied inlineDefinition on the step always wins.
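The inheritance rule can be sketched with AsyncLocalStorage. This is an illustration of the described behavior, not calado's source: runStepSketch, currentStep, and the StepInput / ResolvedStep shapes are hypothetical, and the real step type may carry more fields.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical shapes for illustration only.
interface StepInput {
  id: string;
  roleName: string;
  parentId?: string;
}
interface ResolvedStep {
  id: string;
  roleName: string;
  parentId: string | undefined;
}

const stepStore = new AsyncLocalStorage<ResolvedStep>();

// Run `fn` inside a step context. An explicit parentId wins; otherwise the
// step inherits the id of whichever step encloses it, if any.
function runStepSketch<T>(step: StepInput, fn: () => Promise<T>): Promise<T> {
  const enclosing = stepStore.getStore();
  const resolved: ResolvedStep = {
    id: step.id,
    roleName: step.roleName,
    parentId: step.parentId ?? enclosing?.id,
  };
  return stepStore.run(resolved, fn);
}

// What a wrapped SDK call would read when it attaches step info to an event.
function currentStep(): ResolvedStep | undefined {
  return stepStore.getStore();
}

// Nest two steps under a root: one inherits its parent, one overrides it.
async function demoSteps() {
  return runStepSketch({ id: "orchestrator-root", roleName: "orchestrator" }, async () => {
    const inherited = await runStepSketch(
      { id: "flight-search-1", roleName: "flight_search" },
      async () => currentStep(),
    );
    const overridden = await runStepSketch(
      { id: "stitched-1", roleName: "flight_search", parentId: "external-trace" },
      async () => currentStep(),
    );
    return { inherited, overridden };
  });
}
```

In demoSteps, the first inner step resolves its parentId to "orchestrator-root", and the second keeps the explicit "external-trace" value.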

Per-call override

For one-off calls, runtimes without AsyncLocalStorage, or when you’d rather not wrap the call site, pass step through the calado namespace on the request:
await anthropic.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 1024, // required by the Anthropic Messages API
  messages: [{ role: "user", content: "..." }],
  calado: {
    step: { id: "explicit-step", parentId: "orchestrator-root", roleName: "flight_search" },
  },
});
The override beats any surrounding runStep context. The calado field is stripped before the request reaches the provider SDK.
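A rough sketch of the two rules in that paragraph: the calado field is split off before the request is forwarded, and a per-call step takes precedence over the surrounding context. The helper names (splitCaladoField, resolveStep) and types are hypothetical and only model the behavior described here.

```typescript
// Hypothetical types for illustration.
interface StepOverride {
  id: string;
  parentId?: string;
  roleName?: string;
}
type RequestParams = Record<string, unknown> & { calado?: { step?: StepOverride } };

// Separate the calado namespace from the params the provider SDK should see.
function splitCaladoField(params: RequestParams) {
  const { calado, ...providerParams } = params;
  return { providerParams, step: calado?.step };
}

// A per-call override beats the step from any surrounding context.
function resolveStep(
  override: StepOverride | undefined,
  contextStep: StepOverride | undefined,
): StepOverride | undefined {
  return override ?? contextStep;
}

const split = splitCaladoField({
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  messages: [{ role: "user", content: "..." }],
  calado: { step: { id: "explicit-step", parentId: "orchestrator-root", roleName: "flight_search" } },
});
```

Here split.providerParams no longer contains a calado key, and split.step carries the override that would be attached to the captured event.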

Provider support

calado.wrap() only captures Anthropic (.messages.create / .messages.stream) and OpenAI (.chat.completions.create) clients. Anything else (Vercel AI SDK, LangChain.js, Google Gemini, AWS Bedrock, Mistral, Cohere, raw fetch) passes through unchanged and nothing is captured. For those, send events with the Direct API; its payload accepts the same anthropic / openai_chat shapes the SDK emits, so it plugs into any framework’s completion callback (e.g. the Vercel AI SDK’s onFinish).
Running on LangChain or LangGraph in Python? Use the LangChain callback handler instead. It captures chain hierarchy, tool boundaries, and retrievals that wrap() can’t see.

Anthropic

calado.wrap(anthropicClient) taps client.messages.create and client.messages.stream. Both streaming and non-streaming calls are captured. Aborted streams are stored with metadata.partial = true.

OpenAI

calado.wrap(openaiClient) taps client.chat.completions.create. Streaming and non-streaming both work. The SDK does not yet wrap the responses.create endpoint automatically — use the Direct API for that endpoint, or for any non-Anthropic / non-OpenAI client.

What wrap detects

| Client has | SDK treats it as |
| --- | --- |
| .messages.create is a function | Anthropic |
| .chat.completions.create is a function | OpenAI |
| Neither | Returns the client unchanged. No capture. |
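The detection rules amount to duck typing on the client object. A minimal sketch, assuming the checks run in the order the rules are listed (the source does not state the order); detectProvider and isFunctionAt are hypothetical names, not part of the calado API.

```typescript
type Provider = "anthropic" | "openai" | "unknown";

// True when walking `path` from `root` ends at a function.
function isFunctionAt(root: unknown, path: string[]): boolean {
  let current: unknown = root;
  for (const key of path) {
    if (typeof current !== "object" || current === null) return false;
    current = (current as Record<string, unknown>)[key];
  }
  return typeof current === "function";
}

// Classify a client by the methods it exposes, not by its class.
function detectProvider(client: unknown): Provider {
  if (isFunctionAt(client, ["messages", "create"])) return "anthropic";
  if (isFunctionAt(client, ["chat", "completions", "create"])) return "openai";
  return "unknown"; // wrap() would return such a client unchanged
}

const anthropicLike = { messages: { create: async () => ({}) } };
const openaiLike = { chat: { completions: { create: async () => ({}) } } };
const somethingElse = { generate: async () => ({}) };
```

Because the check looks only at method shape, any object exposing .messages.create as a function would be classified as Anthropic under this sketch.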

Streaming

Both Anthropic and OpenAI streaming work without extra code. calado reconstructs the final response from the stream and captures it on the stream’s completion.
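const stream = await anthropic.messages.stream({
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  messages: [{ role: "user", content: "tell me a joke" }],
});

for await (const event of stream) {
  // Stream events are objects, so print only the text deltas.
  if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
    process.stdout.write(event.delta.text);
  }
}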
// calado captures the reconstructed response automatically.
If the stream is aborted or errors mid-flight, the partial response is captured with metadata.partial = true.
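One way to picture the reconstruction is a pass-through async generator that accumulates text while the consumer iterates and reports once at the end. This is a sketch under simplifying assumptions: tapStream, TextChunk, and Captured are hypothetical, and real provider events are richer than a plain { text } chunk.

```typescript
// Hypothetical, simplified shapes.
type TextChunk = { text: string };
type Captured = { text: string; metadata: { partial: boolean } };

// Yield every chunk unchanged while accumulating the full text. The capture
// fires exactly once, whether the stream completes, errors, or is abandoned.
async function* tapStream(
  source: AsyncIterable<TextChunk>,
  onCapture: (captured: Captured) => void,
): AsyncGenerator<TextChunk> {
  let text = "";
  let completed = false;
  try {
    for await (const chunk of source) {
      text += chunk.text;
      yield chunk;
    }
    completed = true;
  } finally {
    // Also runs when the source throws or the consumer stops iterating early.
    onCapture({ text, metadata: { partial: !completed } });
  }
}
```

The finally block is what lets a mid-flight failure still produce a record, flagged as partial, without swallowing the error the caller needs to see.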

Serverless

Serverless runtimes freeze between invocations, so the 5-second flush timer often never fires. You must flush explicitly before your handler returns. See Serverless patterns for the full set of runtime recipes.
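A small wrapper can make the flush hard to forget. The sketch below is generic: withFlush and Flushable are hypothetical names, and the only assumption about calado is that it exposes a flush() method returning a Promise, as listed in the API reference.

```typescript
// Anything with a flush() method fits; the calado object is assumed to.
interface Flushable {
  flush(): Promise<unknown>;
}

// Wrap a handler so queued events are flushed before the runtime can freeze,
// whether the handler returns normally or throws.
function withFlush<Args extends unknown[], R>(
  client: Flushable,
  handler: (...args: Args) => Promise<R>,
): (...args: Args) => Promise<R> {
  return async (...args: Args): Promise<R> => {
    try {
      return await handler(...args);
    } finally {
      // Defensive: a failed flush should never fail the request itself.
      await client.flush().catch(() => undefined);
    }
  };
}
```

A handler would then be exported as withFlush(calado, async (req) => { ... }), with the exact handler signature depending on the runtime.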

Runtime behavior

  • Batching. Events accumulate in memory until batchSize is reached or flushInterval fires.
  • Overflow. When the queue exceeds maxQueueEvents or maxQueueBytes, oldest events are dropped first.
  • Retry. 5xx responses and network errors retry twice with jittered backoff. 4xx responses drop immediately.
  • 401 auto-disable. After three consecutive 401 responses, the transport disables itself to stop filling your logs. Fix the key and call init() again to re-enable.
  • Never throws at runtime. Your LLM calls always succeed even if calado’s transport is failing. Init is the only place calado can throw.
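The overflow and retry rules above can be modeled in a few lines. This is a sketch of the stated policy, not the SDK's code: BoundedQueue, shouldRetry, and backoffMs are hypothetical, only the event-count limit is modeled (not maxQueueBytes), and the full-jitter formula and 250 ms base are assumptions, since the source says only "jittered backoff".

```typescript
// Queue that drops the oldest events once it exceeds its capacity.
class BoundedQueue<T> {
  private items: T[] = [];
  private readonly maxEvents: number;
  dropped = 0;

  constructor(maxEvents: number) {
    this.maxEvents = maxEvents;
  }

  push(item: T): void {
    this.items.push(item);
    while (this.items.length > this.maxEvents) {
      this.items.shift(); // oldest first
      this.dropped += 1;
    }
  }

  // Remove and return up to `batchSize` events, oldest first.
  takeBatch(batchSize: number): T[] {
    return this.items.splice(0, batchSize);
  }
}

type Outcome = { status: number } | { networkError: true };

// 5xx and network errors retry up to `maxRetries` times; 4xx never retries.
function shouldRetry(outcome: Outcome, retriesSoFar: number, maxRetries = 2): boolean {
  if (retriesSoFar >= maxRetries) return false;
  if ("networkError" in outcome) return true;
  return outcome.status >= 500 && outcome.status <= 599;
}

// Assumed strategy: full jitter, a random delay in [0, baseMs * 2^retry).
function backoffMs(retry: number, baseMs = 250, random: () => number = Math.random): number {
  return Math.floor(random() * baseMs * 2 ** retry);
}
```

Injecting the random source into backoffMs keeps the delay calculation deterministic in tests.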

API reference

| Method | Purpose |
| --- | --- |
| calado.init(apiKey, options?) | Initialize. Throws on bad config. |
| calado.wrap<T>(client) | Return a Proxy over an Anthropic or OpenAI client. Unknown clients pass through. |
| calado.flush() | Send queued data now. Returns a Promise. |
| calado.shutdown() | Flush and clear timers. Use on graceful shutdown. |
| calado.status() | Return { enabled, queued, lastIngestAt, lastError, consecutive401s }. |
| calado.withContext(convId, userId, fn) | Run fn with conversation and user context (Node only). |
| calado.withConversation(id, fn) | Shorthand for withContext(id, undefined, fn). |
| calado.conversationId | Get or set the active conversation id (scripts and Edge runtimes). |
| runStep(step, fn) | Run fn inside a step context for multi-agent orchestrators. Nested calls auto-inherit parentId. See Multi-agent orchestrators. |

Redacting sensitive data

Pass a mask callback at init and the SDK runs it synchronously on every captured event before transport. Return the redacted event, or return null to drop the event entirely. The function runs inside your process, so raw PII never leaves the machine. The full guide lives at Redacting sensitive data. It covers a runnable email + phone regex, the createPlaceholderTracker helper for stable cross-turn placeholders, fail-closed semantics, and a pairing recipe with Presidio.
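As a rough illustration of the callback contract (return the redacted event, or null to drop it), here is a sketch of a mask function. The event shape, the sensitive flag, and the regexes are assumptions made for this example; the linked guide remains the reference for the real event type and the recommended patterns.

```typescript
// Deliberately simple patterns; production redaction usually needs more care.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const PHONE = /\+?\d[\d\s().-]{7,}\d/g;

function redactText(text: string): string {
  return text.replace(EMAIL, "[EMAIL]").replace(PHONE, "[PHONE]");
}

// Walk any JSON-like value and redact every string it contains.
function redactDeep(value: unknown): unknown {
  if (typeof value === "string") return redactText(value);
  if (Array.isArray(value)) return value.map((item) => redactDeep(item));
  if (typeof value === "object" && value !== null) {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) out[key] = redactDeep(inner);
    return out;
  }
  return value;
}

// Hypothetical mask callback: synchronous, returns the redacted event or null.
// The top-level `sensitive` flag is an invented convention for this sketch.
function mask(event: Record<string, unknown>): Record<string, unknown> | null {
  if (event["sensitive"] === true) return null; // drop the event entirely
  return redactDeep(event) as Record<string, unknown>;
}
```

A function like this would be supplied at init, as described above; the exact option name and event type are documented in the redaction guide.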

Testing

Don’t call calado.init() in your test environment. When the SDK isn’t initialized, wrap() returns the client unchanged and nothing is sent anywhere. For explicit teardown after init:
await calado.shutdown();

Requirements

  • Node.js 18 or later
  • TypeScript 5+ (types ship with the package)
