Overview
The calado SDK returns a Proxy over your existing LLM client. Every call to messages.create (Anthropic) or chat.completions.create (OpenAI) is tapped, serialized, and batched for ingestion. Your call-sites don’t change.
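To make the interception concrete, here is a minimal sketch of how a Proxy can tap a method without touching call sites. It is not calado's implementation: `tapClient`, `CapturedEvent`, and the fake client are hypothetical names used only for illustration.

```typescript
// Illustrative only: a generic Proxy tap, not calado's actual internals.
type CapturedEvent = { method: string; request: unknown; response?: unknown };

const TAPPED = new Set(["messages.create", "chat.completions.create"]);

function tapClient<T extends object>(client: T, sink: CapturedEvent[], path: string[] = []): T {
  return new Proxy(client, {
    get(target, prop, receiver) {
      const value: unknown = Reflect.get(target, prop, receiver);
      if (typeof prop !== "string") return value;
      const here = [...path, prop].join(".");
      if (typeof value === "function" && TAPPED.has(here)) {
        const original = value as (...args: unknown[]) => unknown;
        return (...args: unknown[]) => {
          const event: CapturedEvent = { method: here, request: args[0] };
          sink.push(event);
          const result = original.apply(target, args);
          // Record the response once it settles; the caller still gets the original promise.
          Promise.resolve(result).then(
            (response) => { event.response = response; },
            () => { /* provider errors still reach the caller through `result` */ },
          );
          return result;
        };
      }
      // Recurse into nested namespaces such as `client.chat.completions`.
      if (typeof value === "object" && value !== null) {
        return tapClient(value, sink, [...path, prop]);
      }
      return value;
    },
  });
}

// Usage with a fake client: the call site looks exactly like an unwrapped call.
const events: CapturedEvent[] = [];
const fakeClient = {
  messages: { create: async (req: { model: string }) => ({ id: "msg_1", model: req.model }) },
};
const tapped = tapClient(fakeClient, events);
void tapped.messages.create({ model: "example-model" });
console.log(events.length); // 1
```

The wrapped client keeps the original type `T`, which is why existing call sites compile unchanged.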
Installation
npm: specifier). Vercel Edge Runtime works for the basic wrap but doesn’t support withConversation (no AsyncLocalStorage).
Your API key is generated in your agent’s Settings page on app.calado.ai. See Quickstart for the step-by-step.
Quick start
Wrap your LLM client and pass a conversation id so calado can group turns into sessions. Any stable string works: a chat session id, a support ticket number, a DB row id.
The conversation id is optional (calado still captures calls without it) but strongly recommended. Without it, every LLM call is treated as a one-off and patterns are much harder to surface.
flush() sends the queued events now instead of waiting for the 5-second timer. Required in scripts and serverless (the timer won’t fire before the process exits or freezes). Harmless in long-running servers.
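In scripts and serverless handlers, one way to make sure the flush always runs is a try/finally wrapper. The sketch below uses a hypothetical helper, `withFlush`, that accepts any flush function so it runs without the SDK installed; in real code you would pass `() => calado.flush()`.

```typescript
// Hypothetical helper: run a handler, then flush, even if the handler throws.
function withFlush<A extends unknown[], R>(
  handler: (...args: A) => Promise<R>,
  flush: () => Promise<unknown>,
): (...args: A) => Promise<R> {
  return async (...args: A) => {
    try {
      return await handler(...args);
    } finally {
      // A failed flush should not mask the handler's own result or error.
      await flush().catch(() => undefined);
    }
  };
}

// Assumed wiring with the SDK:
//   export const handler = withFlush(myHandler, () => calado.flush());
```

Putting the flush in `finally` means queued events are sent on both the success and the error path, before the process exits or freezes.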
See Quickstart for the full walkthrough including API key generation.
Conversation tracking
A single messages.create call is one turn. Real conversations span many turns. Group them so calado can analyze them as one session.
Setting a conversation ID (recommended)
Wrap your multi-turn logic in calado.withConversation. Every SDK call inside inherits the same id.
Setting a user ID
withContext is the longer form. Use it to also attribute a conversation to an end user.
withConversation(id, fn) is sugar for withContext(id, undefined, fn).
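The relationship between the two helpers, and the way context follows async work, can be sketched with Node's `AsyncLocalStorage`. This is an illustration of the mechanism, not the SDK's source: `withContextSketch`, `withConversationSketch`, `currentContext`, and the `Ctx` shape are hypothetical.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Illustrative stand-ins; the real SDK's internals may differ.
type Ctx = { conversationId: string; userId: string | undefined };
const contextStore = new AsyncLocalStorage<Ctx>();

function withContextSketch<T>(conversationId: string, userId: string | undefined, fn: () => T): T {
  return contextStore.run({ conversationId, userId }, fn);
}

// The shorthand simply omits the user id.
function withConversationSketch<T>(id: string, fn: () => T): T {
  return withContextSketch(id, undefined, fn);
}

// What a tapped call could read when building its ingestion event.
function currentContext(): Ctx | undefined {
  return contextStore.getStore();
}

// Usage: the context survives awaits inside the callback.
async function handleChat(): Promise<string | undefined> {
  await Promise.resolve();
  return currentContext()?.conversationId;
}
withContextSketch("ticket-4821", "user-17", handleChat).then((id) => console.log(id)); // ticket-4821
```

Outside any wrapper, `currentContext()` returns `undefined`, which matches the documented behavior of calls being captured as one-offs.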
Edge runtime fallback
Vercel Edge, Cloudflare Workers, and similar environments don’t support AsyncLocalStorage. Use the direct setter and reset it manually.
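A set-then-reset pattern is easiest to get right with try/finally. The sketch below is written against a structural type, `ConversationHolder`, so it runs without the SDK; it assumes `calado.conversationId` accepts `string | undefined`, which this page does not confirm. `withManualConversation` is a hypothetical helper name.

```typescript
// Structural stand-in for the object exposing the documented `conversationId` property.
type ConversationHolder = { conversationId: string | undefined };

async function withManualConversation<T>(
  holder: ConversationHolder,
  id: string,
  fn: () => Promise<T>,
): Promise<T> {
  const previous = holder.conversationId;
  holder.conversationId = id;
  try {
    return await fn();
  } finally {
    holder.conversationId = previous; // reset even if fn throws
  }
}

// Assumed wiring with the SDK:
//   await withManualConversation(calado, sessionId, () => runTurn(request));
```

If the setter is module-level state (an assumption), overlapping requests in the same isolate could observe each other's id, so keep the wrapped section as short as practical.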
Multi-agent orchestrators
If your app runs an orchestrator that dispatches sub-agents, wrap each sub-agent boundary with runStep. Every SDK call inside the boundary attaches that step to its ingestion events, so calado can render the run as a tree and attribute behavior to the right sub-agent.
runStep calls inherit parentId from the enclosing step. Set parentId explicitly to override inheritance — for example, when the parent lives in another process or you’re stitching a step into an existing trace.
Each wrapped call’s system prompt is captured as that step’s inline definition, so every sub-agent is analyzed against its own prompt rather than the orchestrator’s. A caller-supplied inlineDefinition on the step always wins.
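The parentId inheritance rule described above can be sketched as follows. This is a model of the rule, not the SDK's code: `runStepSketch` and `currentStep` are hypothetical, and the `id` field on the step is an assumption (only `parentId` and `inlineDefinition` are named on this page).

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical step shape; `id` is an assumed field.
type Step = { id: string; parentId?: string | undefined; inlineDefinition?: string | undefined };
const stepStore = new AsyncLocalStorage<Step>();

function runStepSketch<T>(step: Step, fn: () => T): T {
  const enclosing = stepStore.getStore();
  // An explicit parentId wins; otherwise inherit from the enclosing step, if any.
  const resolved: Step = { ...step, parentId: step.parentId ?? enclosing?.id };
  return stepStore.run(resolved, fn);
}

function currentStep(): Step | undefined {
  return stepStore.getStore();
}

// Usage: one inherited parent, one explicit override.
runStepSketch({ id: "orchestrator" }, () => {
  runStepSketch({ id: "researcher" }, () => {
    console.log(currentStep()?.parentId); // orchestrator
  });
  runStepSketch({ id: "writer", parentId: "trace-from-other-process" }, () => {
    console.log(currentStep()?.parentId); // trace-from-other-process
  });
});
```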
Per-call override
For one-off calls, runtimes without AsyncLocalStorage, or when you’d rather not wrap the call site, pass step through the calado namespace on the request:
runStep context. The calado field is stripped before the request reaches the provider SDK.
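Stripping the field before forwarding can be pictured as a simple object split. The sketch below is illustrative: `splitCaladoField`, `StepInfo`, and `TaggedRequest` are hypothetical names, and the step's `id` field is an assumption.

```typescript
// Hypothetical shapes: this page only says `step` is passed under a `calado` field.
type StepInfo = { id: string; parentId?: string | undefined };
type TaggedRequest = Record<string, unknown> & { calado?: { step?: StepInfo } };

function splitCaladoField(request: TaggedRequest): {
  step: StepInfo | undefined;
  providerRequest: Record<string, unknown>;
} {
  // Rest destructuring copies every key except `calado`.
  const { calado, ...providerRequest } = request;
  return { step: calado?.step, providerRequest };
}

// Usage: the provider never sees the extra field.
const { step, providerRequest } = splitCaladoField({
  model: "example-model",
  messages: [{ role: "user", content: "hi" }],
  calado: { step: { id: "summarizer" } },
});
console.log(step?.id, "calado" in providerRequest); // summarizer false
```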
Provider support
Anthropic
calado.wrap(anthropicClient) taps client.messages.create and client.messages.stream. Both streaming and non-streaming calls are captured. Aborted streams are stored with metadata.partial = true.
OpenAI
calado.wrap(openaiClient) taps client.chat.completions.create. Streaming and non-streaming both work. The SDK does not yet wrap the responses.create endpoint automatically — use the Direct API for that endpoint, or for any non-Anthropic / non-OpenAI client.
What wrap detects
| Client has | SDK treats it as |
|---|---|
| .messages.create is a function | Anthropic |
| .chat.completions.create is a function | OpenAI |
| Neither | Returns the client unchanged. No capture. |
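The table describes duck typing on the client's shape. A minimal sketch of that check, with the hypothetical names `detectProvider` and `Provider`, might look like this. Checking Anthropic first mirrors the table order and is an assumption about precedence when a client has both methods.

```typescript
type Provider = "anthropic" | "openai" | null;

function isFn(value: unknown): boolean {
  return typeof value === "function";
}

function detectProvider(client: unknown): Provider {
  const c = client as
    | { messages?: { create?: unknown }; chat?: { completions?: { create?: unknown } } }
    | null
    | undefined;
  if (isFn(c?.messages?.create)) return "anthropic";
  if (isFn(c?.chat?.completions?.create)) return "openai";
  return null; // in this case wrap returns the client unchanged
}

console.log(detectProvider({ messages: { create() {} } })); // anthropic
console.log(detectProvider({})); // null
```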
Streaming
Both Anthropic and OpenAI streaming work without extra code. calado reconstructs the final response from the stream and captures it on the stream’s completion. Aborted streams are stored with metadata.partial = true.
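One way to capture a stream without changing what the caller receives is a pass-through async generator. The sketch below is a simplified model: chunks are plain strings rather than provider events, and `tapStream` and `onCapture` are hypothetical names.

```typescript
// Illustrative: real provider streams yield structured events, not bare strings.
async function* tapStream(
  source: AsyncIterable<string>,
  onCapture: (text: string, partial: boolean) => void,
): AsyncGenerator<string, void, undefined> {
  let text = "";
  let completed = false;
  try {
    for await (const chunk of source) {
      text += chunk;
      yield chunk; // the caller sees every chunk unchanged
    }
    completed = true;
  } finally {
    // Runs on normal completion, on errors, and when the consumer stops early.
    onCapture(text, !completed);
  }
}
```

Because the capture happens in `finally`, a consumer that breaks out of its loop early still produces a record, flagged as partial.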
Serverless
Serverless runtimes freeze between invocations, so the 5-second flush timer often never fires. You must flush explicitly before your handler returns. See Serverless patterns for the full set of runtime recipes.
Runtime behavior
- Batching. Events accumulate in memory until batchSize is reached or flushInterval fires.
- Overflow. When the queue exceeds maxQueueEvents or maxQueueBytes, oldest events are dropped first.
- Retry. 5xx responses and network errors retry twice with jittered backoff. 4xx responses drop immediately.
- 401 auto-disable. After three consecutive 401 responses, the transport disables itself to stop filling your logs. Fix the key and call init() again to re-enable.
- Never throws at runtime. Your LLM calls always succeed even if calado’s transport is failing. Init is the only place calado can throw.
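The overflow and retry rules in the list above can be modeled in a few lines. This is a sketch under stated assumptions, not the SDK's transport: `BoundedQueue`, `classify`, and `backoffMs` are hypothetical names, the base delay is an invented value, and the byte limit (maxQueueBytes) is not modeled.

```typescript
// Drop-oldest queue bounded by event count.
class BoundedQueue<T> {
  private items: T[] = [];
  private readonly maxEvents: number;
  dropped = 0;

  constructor(maxEvents: number) {
    this.maxEvents = maxEvents;
  }

  push(item: T): void {
    this.items.push(item);
    while (this.items.length > this.maxEvents) {
      this.items.shift(); // oldest events go first
      this.dropped++;
    }
  }

  // Remove and return up to batchSize events for one ingestion request.
  drain(batchSize: number): T[] {
    return this.items.splice(0, batchSize);
  }

  get size(): number {
    return this.items.length;
  }
}

type Verdict = "ok" | "retry" | "drop";

// `status === undefined` models a network error; `attempt` counts retries already made.
function classify(status: number | undefined, attempt: number, maxRetries = 2): Verdict {
  if (status !== undefined && status >= 200 && status < 300) return "ok";
  const retryable = status === undefined || status >= 500;
  return retryable && attempt < maxRetries ? "retry" : "drop";
}

// Full-jitter exponential backoff; baseMs is an assumed default.
function backoffMs(attempt: number, baseMs = 250, random: () => number = Math.random): number {
  return Math.floor(random() * baseMs * 2 ** attempt);
}
```

Dropping the oldest events keeps memory bounded while preserving the most recent activity, which is usually what you want when diagnosing a live incident.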
API reference
| Method | Purpose |
|---|---|
| calado.init(apiKey, options?) | Initialize. Throws on bad config. |
| calado.wrap<T>(client) | Return a Proxy over an Anthropic or OpenAI client. Unknown clients pass through. |
| calado.flush() | Send queued data now. Returns a Promise. |
| calado.shutdown() | Flush and clear timers. Use on graceful shutdown. |
| calado.status() | Return { enabled, queued, lastIngestAt, lastError, consecutive401s }. |
| calado.withContext(convId, userId, fn) | Run fn with conversation and user context (Node only). |
| calado.withConversation(id, fn) | Shorthand for withContext(id, undefined, fn). |
| calado.conversationId | Get or set the active conversation id (scripts and Edge runtimes). |
| runStep(step, fn) | Run fn inside a step context for multi-agent orchestrators. Nested calls auto-inherit parentId. See Multi-agent orchestrators. |
Redacting sensitive data
Pass a mask callback at init and the SDK runs it synchronously on every captured event before transport. Return the redacted event, or return null to drop the event entirely. The function runs inside your process, so raw PII never leaves the machine.
The full guide lives at Redacting sensitive data. It covers a runnable email + phone regex, the createPlaceholderTracker helper for stable cross-turn placeholders, fail-closed semantics, and a pairing recipe with Presidio.
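As a rough illustration of the callback contract (return the redacted event, or null to drop it), here is a minimal sketch. The event shape `CapturedTurn`, the regexes, and the drop rule are all assumptions for illustration; the linked guide is the authority on the real event type and recommended patterns.

```typescript
// Hypothetical event shape; the SDK's real event type may carry different fields.
type CapturedTurn = { input: string; output: string; userId?: string | undefined };

// Deliberately simple patterns; production redaction usually needs stricter rules.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const PHONE = /\+?\d[\d\s().-]{7,}\d/g;

function redact(text: string): string {
  return text.replace(EMAIL, "[EMAIL]").replace(PHONE, "[PHONE]");
}

// Synchronous, as the text above requires: redact in place or drop with null.
function mask(event: CapturedTurn): CapturedTurn | null {
  if (event.userId === "internal-test") return null; // example drop rule (assumption)
  return { ...event, input: redact(event.input), output: redact(event.output) };
}

// Assumed wiring with the SDK: calado.init(apiKey, { mask });
console.log(mask({ input: "Mail ana@example.com", output: "ok" })?.input); // Mail [EMAIL]
```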
Testing
Don’t call calado.init() in your test environment. When the SDK isn’t initialized, wrap() returns the client unchanged and nothing is sent anywhere.
For explicit teardown after init, call calado.shutdown(), which flushes the queue and clears timers.
Requirements
- Node.js 18 or later
- TypeScript 5+ (types ship with the package)