Redacting sensitive data

Why this exists

calado analyzes how your agent behaves, not the literal content your users type. When an email or a phone number gets replaced with a placeholder like [EMAIL_1] before the SDK ships the event, the analysis still sees the same agent decisions and the same failure modes. You keep the signal, and the raw value never leaves your process. The mask callback is the supported way to do that. It runs synchronously inside your runtime, before transport, on every event the SDK is about to send. Bring your own detector: anything that can synchronously return a redacted copy of the event will do.

The `mask` callback

import { calado, type IngestionEvent } from "calado";

calado.init(process.env.CALADO_API_KEY!, {
  mask: (event: IngestionEvent): IngestionEvent | null | undefined => {
    // return a redacted copy, or null/undefined to drop the event entirely
    return event;
  },
});

It runs synchronously. The SDK calls mask during enqueue, which sits on the LLM call’s hot path, so mask cannot be async. If you need a network call, for example to a vault, do that scrubbing before the wrapped LLM call.

One callback covers everything the SDK captures: conversation events, system-prompt definitions, and tool-schema definitions. Branch on event.kind if you need different rules per kind.

Returning an IngestionEvent replaces the original event in the queue. Returning null or undefined drops the event; doing that for every event of a conversation is how you exclude the whole conversation rather than redact it.
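A minimal sketch of that contract, written against simplified local stand-ins for the event types so it runs on its own. EventSketch, maskSketch, and the per-kind rules are illustrative assumptions, not SDK exports: conversation text gets email and phone redaction, definitions get email redaction only, and any kind the mask does not recognize is dropped.

```typescript
// Simplified local stand-ins for the SDK's event union (assumption: the real
// IngestionEvent carries a full payload; only `kind` and one text field are modeled).
type ConversationSketch = { kind: "conversation"; text: string };
type DefinitionSketch = { kind: "definition"; content: string };
type EventSketch = ConversationSketch | DefinitionSketch;

const EMAIL = /[\w.+-]+@[\w-]+\.[\w.-]+/g;
const PHONE = /\+?\d[\d\s().-]{7,}\d/g;

// Returns a redacted copy (the input is not mutated), or null to drop.
function maskSketch(event: EventSketch): EventSketch | null {
  switch (event.kind) {
    case "conversation":
      // User-facing text: redact emails and phone-like digit runs.
      return {
        ...event,
        text: event.text.replace(EMAIL, "[EMAIL]").replace(PHONE, "[PHONE]"),
      };
    case "definition":
      // Prompts and schemas: emails only, so version numbers and numeric IDs survive.
      return { ...event, content: event.content.replace(EMAIL, "[EMAIL]") };
    default:
      // A kind this mask does not recognize (e.g. after an SDK upgrade):
      // drop it rather than forward it unredacted.
      return null;
  }
}
```

Dropping unrecognized kinds is a conservative choice: if the event union ever grows, nothing of the new kind reaches calado until the mask is updated to handle it.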

A real regex example

This one is fully runnable. It scrubs emails and a few common phone-number shapes from request messages and from the model’s reply, branching on event.format so the Anthropic and OpenAI response shapes are both handled.

import { calado, type IngestionEvent } from "calado";

const EMAIL = /[\w.+-]+@[\w-]+\.[\w.-]+/g;
const PHONE = /\+?\d[\d\s().-]{7,}\d/g;

function scrub(value: string): string {
  return value.replace(EMAIL, "[EMAIL]").replace(PHONE, "[PHONE]");
}

// Walks a content field that may be a plain string OR an array of content
// blocks (Anthropic content blocks, OpenAI multimodal parts, tool results).
// Mutates blocks in place; returns a possibly-replaced value for the string
// case so callers can reassign.
function scrubContent(value: unknown): unknown {
  if (typeof value === "string") return scrub(value);
  if (Array.isArray(value)) {
    for (const block of value) {
      if (block && typeof block === "object") {
        const b = block as { text?: unknown; content?: unknown };
        if (typeof b.text === "string") b.text = scrub(b.text);
        if (b.content !== undefined) b.content = scrubContent(b.content);
      }
    }
  }
  return value;
}

calado.init(process.env.CALADO_API_KEY!, {
  mask: (event: IngestionEvent) => {
    if (event.kind === "definition") {
      event.payload.content = scrub(event.payload.content);
      return event;
    }
    if (event.kind !== "conversation") return event;

    const req = event.payload.request as {
      messages?: Array<{ content?: unknown }>;
    };
    for (const m of req.messages ?? []) m.content = scrubContent(m.content);

    if (event.format === "anthropic") {
      const res = event.payload.response as { content?: unknown };
      res.content = scrubContent(res.content);
    } else if (event.format === "openai_chat") {
      const res = event.payload.response as {
        choices?: Array<{ message?: { content?: unknown } }>;
      };
      for (const c of res.choices ?? []) {
        if (c.message) c.message.content = scrubContent(c.message.content);
      }
    }
    return event;
  },
});

Open your agent in the calado dashboard, pick a recent conversation, and confirm the request body shows [EMAIL] and [PHONE] instead of the raw values. That round-trip is the proof that the mask is wired up.
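Before that round-trip, the two patterns can be checked locally. The sketch below copies EMAIL, PHONE, and scrub from the example above so it runs on its own; the sample strings are made up. It also shows the limits of the starter patterns: a seven-digit local number is shorter than PHONE’s nine-character minimum and slips through, while a date-time stamp is over-matched as a phone number.

```typescript
const EMAIL = /[\w.+-]+@[\w-]+\.[\w.-]+/g;
const PHONE = /\+?\d[\d\s().-]{7,}\d/g;

function scrub(value: string): string {
  return value.replace(EMAIL, "[EMAIL]").replace(PHONE, "[PHONE]");
}

const samples = [
  "Reach me at jane.doe+work@example.co.uk or +1 (415) 555-0100.",
  // Too short for PHONE's 9-character minimum, so it is NOT redacted.
  "Call 555-0100 after lunch.",
  // A timestamp looks enough like a phone number to be (wrongly) redacted.
  "Order placed 2024-11-05 12:30",
];

for (const s of samples) console.log(scrub(s));
// Reach me at [EMAIL] or [PHONE].
// Call 555-0100 after lunch.
// Order placed [PHONE]:30
```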

Pair with a detector

calado has no opinion on which detector you use. The only contract is that mask returns the redacted event synchronously. A few options that fit different stacks:

Microsoft Presidio. Python. One of the most capable open-source PII detectors, with named-entity recognition, regex, and rule-based recognizers across multiple languages. It does not run in Node, so the integration pattern is a sidecar: deploy Presidio as a service, run detection during request handling, attach the redaction map to AsyncLocalStorage, and have mask read the pre-computed labels synchronously. Worth the infra if you already run Python services or need multi-language coverage.
LLM Guard. Python, by Protect AI. Same sidecar pattern as Presidio. Goes beyond PII with input and output scanners covering prompt injection, toxicity, secret leakage, malicious URLs, and bias. Pick this if you want one tool for sanitization and trust scoring instead of bolting two libraries together.
Custom regex. The starter example above is a real production pattern for stable entity formats (email, phone, common ID shapes). Cheap to run, no infra, deterministic. Combine it with createPlaceholderTracker (see “Stable placeholders across a conversation” below) when you want stable per-conversation IDs. Don’t ship this alone if you process arbitrary free text where new entity formats keep appearing.
Node-native PII libraries. A handful exist on npm (redact-pii, compromise-pii, etc.), mostly regex-based and not actively maintained at the moment. Useful as a faster start than writing the regex yourself, not a substitute for Presidio-grade coverage.
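The sidecar pattern described for Presidio and LLM Guard splits the work in two: slow, asynchronous detection happens in your request handler, and mask only does a synchronous lookup. A minimal sketch using Node’s AsyncLocalStorage follows. detectWithSidecar is a hypothetical stand-in for the HTTP call to your detector service, stubbed with an email regex so the sketch runs; applyRedactions is what your mask would call on each string field. It assumes mask runs inside the async context of the request that made the wrapped call.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// raw value -> placeholder, computed before the wrapped LLM call.
type RedactionMap = Map<string, string>;

const redactionContext = new AsyncLocalStorage<RedactionMap>();

// Hypothetical stand-in for an HTTP call to a Presidio or LLM Guard sidecar.
// Stubbed with an email regex so the sketch runs without a service.
async function detectWithSidecar(text: string): Promise<RedactionMap> {
  const map: RedactionMap = new Map();
  for (const raw of text.match(/[\w.+-]+@[\w-]+\.[\w.-]+/g) ?? []) {
    if (!map.has(raw)) map.set(raw, `[EMAIL_${map.size + 1}]`);
  }
  return map;
}

// Synchronous, so it is safe to call from inside mask.
function applyRedactions(text: string): string {
  const map = redactionContext.getStore();
  // No map means detection never ran for this request. Throwing makes the
  // SDK drop the event (fail-closed) instead of sending raw text.
  if (!map) throw new Error("no redaction map in the current async context");
  // Longer values first, so a value that contains another is replaced whole.
  const entries = Array.from(map.entries()).sort((a, b) => b[0].length - a[0].length);
  let out = text;
  for (const [raw, label] of entries) out = out.split(raw).join(label);
  return out;
}

// Request handler: detect first (async), then run the wrapped LLM call, and
// with it mask, inside a context that carries the map.
async function handle(userText: string): Promise<string> {
  const map = await detectWithSidecar(userText);
  return redactionContext.run(map, async () => {
    // ...the wrapped LLM call goes here; mask would call applyRedactions...
    return applyRedactions(userText);
  });
}
```

Throwing when the map is missing leans on the SDK’s fail-closed contract: a request that skipped detection loses its event instead of leaking it.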

Whatever you pick, the SDK contract stays the same: mask runs synchronously, the failure path is fail-closed, and stable placeholders are available via createPlaceholderTracker. If your scrubber is heavy, see “Measuring mask cost” below before shipping.

Fields you probably want to scrub

calado does not pre-process or hash any of the values below. Whatever your mask returns is what reaches the server.

event.payload.request.messages[].content. Every user and assistant turn on conversation events. May be a string or an array of content blocks (multimodal, tool results, Anthropic content blocks).
The model’s reply on conversation events. The exact path depends on event.format. Anthropic: event.payload.response.content is an array of content blocks like [{ type: 'text', text: '...' }]. OpenAI: event.payload.response.choices[].message.content is a string (or null on tool-call-only turns).
event.payload.metadata.userId. Populated only when you wrap calls with calado.withContext(convId, userId, ...). If you forward a raw user id without scrubbing it, you have effectively defeated the rest of your DPA work.
event.payload.metadata.conversationId. Same caveat. Stable, but often derived from a session token or a PII-bearing identifier.
Provider-supplied message IDs in the request and response payload, if those contain anything you treat as sensitive.
For definition events: event.payload.content carries the system prompt or tool schema verbatim.

userId and conversationId only appear when your code path calls withContext or withConversation. If you never call those, the metadata stays empty and you can skip those branches.
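When you do populate them, one option for userId and conversationId is a keyed hash (HMAC): the same raw id always maps to the same opaque token, so traffic still groups per user and per conversation, but without your secret key nobody can recompute the token from a guessed id, which a plain unkeyed hash of a short or predictable id does not guarantee. The sketch assumes calado only needs these ids to be stable, not readable; makePseudonymizer, pseudonymizeMetadata, the prefixes, and the 16-hex-character truncation are illustrative choices, not part of the SDK.

```typescript
import { createHmac } from "node:crypto";

// Returns a function mapping a raw id to a stable, opaque token.
// Keep `secret` in your own secret store; it must never be sent to calado.
function makePseudonymizer(secret: string, prefix: string): (id: string) => string {
  return (id) =>
    prefix + createHmac("sha256", secret).update(id).digest("hex").slice(0, 16);
}

// Simplified stand-in for event.payload.metadata (assumption).
type MetadataSketch = { userId?: string; conversationId?: string };

// Hypothetical helper a mask could call on conversation events.
function pseudonymizeMetadata(metadata: MetadataSketch | undefined, secret: string): void {
  if (!metadata) return;
  const user = makePseudonymizer(secret, "u_");
  const conv = makePseudonymizer(secret, "c_");
  if (metadata.userId) metadata.userId = user(metadata.userId);
  if (metadata.conversationId) metadata.conversationId = conv(metadata.conversationId);
}
```

Sixteen hex characters is 64 bits of the digest, which keeps tokens short while making accidental collisions negligible at typical volumes.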

Stable placeholders across a conversation

Replacing every email with the literal string [EMAIL] works, but it loses information the analysis can still use. If the customer pasted the same email twice in a five-turn support thread, [EMAIL_1] in both turns lets calado see that the agent is referring back to the same value. Two distinct emails that resolve to [EMAIL_1] and [EMAIL_2] keep that distinction. createPlaceholderTracker gives you a counter that memoizes per (category, raw) and survives across mask invocations. One instance has one flat counter namespace, so the typical pattern is one tracker per conversation, kept in a Map keyed by conversation id.

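import { calado, createPlaceholderTracker, type PlaceholderTracker } from "calado";

const EMAIL = /[\w.+-]+@[\w-]+\.[\w.-]+/g;
const PHONE = /\+?\d[\d\s().-]{7,}\d/g;

// One tracker per conversation. Evict entries when a conversation ends.
const trackers = new Map<string, PlaceholderTracker>();

function getTracker(convId: string): PlaceholderTracker {
  let t = trackers.get(convId);
  if (!t) {
    t = createPlaceholderTracker();
    trackers.set(convId, t);
  }
  return t;
}

calado.init(process.env.CALADO_API_KEY!, {
  mask: (event) => {
    if (event.kind !== "conversation") return event;
    const convId = event.payload.metadata?.conversationId;
    // Without a conversation id, use a throwaway tracker so unrelated
    // events never share numbering.
    const tracker = convId ? getTracker(convId) : createPlaceholderTracker();
    // Same (category, raw) pair -> same label, e.g. "[EMAIL_1]" in every turn.
    const redact = (s: string) =>
      s
        .replace(EMAIL, (raw) => tracker.placeholder("EMAIL", raw))
        .replace(PHONE, (raw) => tracker.placeholder("PHONE", raw));
    const req = event.payload.request as { messages?: Array<{ content?: unknown }> };
    for (const m of req.messages ?? []) {
      if (typeof m.content === "string") m.content = redact(m.content);
    }
    // Apply `redact` to content blocks and to the response the same way
    // the regex example applies `scrub`.
    return event;
  },
});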

The contract:

tracker.placeholder(category, raw) returns [CATEGORY_N]. The same (category, raw) pair returns the same label every time. The counter is per category, so emails and phones get independent numbering.
tracker.categories() returns the categories this tracker has seen so far. Useful when forwarding the placeholder vocabulary to downstream analysis.
tracker.reset() clears memoization and counters. Useful in tests, or as an alternative to a per-conversation Map if you only ever have one conversation in flight at a time.

A single tracker has one flat counter namespace, so reusing the same instance across conversations would mean [EMAIL_1] in conversation A and [EMAIL_1] in conversation B refer to the same person. Keeping one tracker per conversation (as the example above does) is what gives you “different people get different numbers, the same person gets the same number” inside a thread without leaking identifiers across threads. Evict entries from the Map when a conversation ends; otherwise the trackers Map grows for the lifetime of the process.
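If your application has no clear “conversation ended” hook, one way to bound the Map is idle eviction: drop a conversation’s tracker once it has seen no events for a while. The sketch below is generic over the stored value, so you would construct it with createPlaceholderTracker as the factory. IdleEvictingMap and the TTL policy are illustrative assumptions, and it assumes a clock that does not run backwards. A conversation that resumes after eviction simply starts numbering from [EMAIL_1] again.

```typescript
// Per-key values that are evicted after `ttlMs` without access.
// Re-inserting on every access keeps the Map ordered from least to most
// recently used, so eviction only has to look at the front.
class IdleEvictingMap<V> {
  private readonly entries = new Map<string, { value: V; lastSeen: number }>();
  private readonly ttlMs: number;
  private readonly create: () => V;
  private readonly now: () => number;

  constructor(ttlMs: number, create: () => V, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.create = create;
    this.now = now;
  }

  get(key: string): V {
    const t = this.now();
    // Evict idle entries from the front; stop at the first live one.
    for (const [k, e] of this.entries) {
      if (t - e.lastSeen <= this.ttlMs) break;
      this.entries.delete(k);
    }
    const existing = this.entries.get(key);
    const value = existing ? existing.value : this.create();
    // Move the key to the back (most recently used).
    this.entries.delete(key);
    this.entries.set(key, { value, lastSeen: t });
    return value;
  }

  // Call this when you know a conversation has ended.
  delete(key: string): void {
    this.entries.delete(key);
  }

  get size(): number {
    return this.entries.size;
  }
}
```

In the mask, `new IdleEvictingMap(30 * 60_000, createPlaceholderTracker)` would replace the plain Map, with `trackers.get(convId)` taking the place of getTracker; the 30-minute TTL is an arbitrary example value.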

Asserting the mask is configured in CI

If your team’s deploy contract is “no calado traffic without a mask,” put a one-line guard in your boot path or your test suite.

import { calado } from "calado";

if (!calado.status().maskConfigured) {
  throw new Error("calado mask must be configured in this environment");
}

maskConfigured flips to true the moment you pass a mask to init. It does not validate what the mask does; it only confirms that one is present.
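Because maskConfigured only proves a function is present, a unit test that runs your mask on a fixture is a useful complement. A minimal sketch: assertRedacts, a hypothetical helper rather than an SDK export, passes a deep copy of a fixture to the mask and fails if any known raw value survives anywhere in the serialized result. In your suite you would call it with your real mask and a representative event.

```typescript
// Hypothetical test helper, not an SDK export: fails if any raw value
// survives the mask. Works on any JSON-serializable event shape.
function assertRedacts<E>(
  mask: (event: E) => E | null | undefined,
  fixture: E,
  rawValues: string[],
): void {
  // Deep copy so a mask that edits in place cannot change the shared fixture.
  const copy = JSON.parse(JSON.stringify(fixture)) as E;
  const result = mask(copy);
  if (result === null || result === undefined) return; // dropped: nothing leaks
  const serialized = JSON.stringify(result);
  for (const raw of rawValues) {
    // Compare against the JSON-escaped form so quotes and backslashes match.
    const needle = JSON.stringify(raw).slice(1, -1);
    if (serialized.includes(needle)) {
      throw new Error(`mask leaked a raw value: ${raw}`);
    }
  }
}
```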

Measuring mask cost in production

mask runs on every captured LLM call. A heavy scrubbing library can add 100 to 300 milliseconds to your time-to-first-byte before the stream even starts. The 50 ms warning the SDK logs in debug mode is off by default in production, so a slow mask goes unnoticed unless you measure it. Wrap your function once and instrument it yourself.

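import { calado, type MaskFn, type IngestionEvent } from "calado";

// Placeholders: swap in your own metrics client and your real mask function.
declare const yourMetrics: { histogram(name: string, value: number): void };
declare const scrubFn: MaskFn;

function timed(name: string, fn: MaskFn): MaskFn {
  return (event: IngestionEvent) => {
    const start = performance.now();
    try {
      return fn(event);
    } finally {
      // Record every call, not only slow ones, so percentiles stay meaningful.
      yourMetrics.histogram(`calado.mask.${name}_ms`, performance.now() - start);
    }
  };
}

calado.init(process.env.CALADO_API_KEY!, {
  mask: timed("scrub", scrubFn),
});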

Aim for under 5 ms per call. Past that, every captured request pays your scrubber’s cost on top of the LLM call, and your end users start to feel it in TTFB.
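To check the 5 ms target before shipping, a rough local benchmark over representative events helps. The sketch below, with benchmarkMask as a hypothetical helper, runs a mask over sample events, copying each one first because many masks edit the event in place, and reports the median and 95th percentile. Local numbers are a lower bound: in production the same process is also serving real traffic.

```typescript
import { performance } from "node:perf_hooks";

// Rough local latency check for a mask function; not a substitute for
// production metrics.
function benchmarkMask<E>(
  mask: (event: E) => unknown,
  samples: E[],
  iterations = 1000,
): { p50: number; p95: number } {
  if (samples.length === 0 || iterations < 1) {
    throw new Error("need at least one sample and one iteration");
  }
  const timings: number[] = [];
  for (let i = 0; i < iterations; i++) {
    // Fresh JSON copy per run (made before timing starts), because many
    // masks edit the event in place.
    const event = JSON.parse(JSON.stringify(samples[i % samples.length])) as E;
    const start = performance.now();
    mask(event);
    timings.push(performance.now() - start);
  }
  timings.sort((a, b) => a - b);
  // Nearest-rank style percentile over the sorted timings.
  const at = (q: number): number =>
    timings[Math.min(timings.length - 1, Math.floor(q * timings.length))]!;
  return { p50: at(0.5), p95: at(0.95) };
}
```

Feeding it a few dozen events captured from your own traffic gives a more honest picture than a single synthetic one, since scrubber cost usually scales with message length.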

Handling streaming partial captures

When a stream is aborted mid-flight, the SDK still emits the captured-so-far envelope with metadata.partial = true. Run your mask unconditionally on it. The partial flag is preserved on whatever event you return.

calado.init(process.env.CALADO_API_KEY!, {
  mask: (event) => {
    if (event.kind === "conversation") {
      // ...same scrubbing as the full case...
    }
    return event;
  },
});

A common mistake is to gate the scrub with if (!event.payload.metadata?.partial). That skips redaction on exactly the events most likely to leak unexpected content, which is the wrong direction.

Beyond redaction: dropping events entirely

mask is also the way to keep specific conversations out of calado without unwrapping the client. Return null or undefined to drop the event before it enters the queue.

const blockedUsers = new Set(["user_42", "user_99"]);

calado.init(process.env.CALADO_API_KEY!, {
  mask: (event) => {
    if (event.kind === "conversation") {
      const userId = event.payload.metadata?.userId;
      if (userId && blockedUsers.has(userId)) return null;
    }
    return event;
  },
});

The same shape covers per-tenant allowlists, sampled capture (drop a percentage), and “never analyze conversations tagged sensitive.”
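For sampled capture, sampling each event independently would keep some turns of a conversation and drop others. A sketch that decides per conversation instead, by hashing the conversation id into a bucket, follows. The FNV-1a hash and the keepConversation name are illustrative choices; any stable hash works. Events without a conversation id fall back to an independent random draw.

```typescript
// 32-bit FNV-1a: a small, fast, non-cryptographic string hash.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Keep roughly `rate` (0..1) of conversations. Every event of a given
// conversation gets the same answer, so kept conversations stay complete.
function keepConversation(conversationId: string | undefined, rate: number): boolean {
  if (conversationId === undefined) return Math.random() < rate;
  return (fnv1a(conversationId) % 10_000) / 10_000 < rate;
}
```

In the mask, you would return null for conversation events where keepConversation(event.payload.metadata?.conversationId, 0.1) is false, and let definition events through untouched.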

Failure mode is fail-closed

If your mask throws, the SDK drops the event. It does not fall back to sending the unredacted version. That contract is the whole point of having a mask. When something goes wrong:

The event is dropped.
status().maskFailures increments (cumulative: unlike consecutiveMaskFailures, a successful mask call does not reset it).
status().consecutiveMaskFailures increments (resets the moment a mask call returns successfully, including an intentional null drop).
If debug: true is set, the failure is logged with the error message.

Returning a Promise, a string, or a malformed object is treated the same way. mask must synchronously return an IngestionEvent, null, or undefined; anything else is a contract violation, and the event is dropped.

Auto-disable and recovery

After 100 consecutive mask failures, the SDK auto-disables to prevent silent retry loops on a broken function. Once that happens:

status().enabled becomes false.
status().lastError is set to the literal string mask_disabled_after_100_failures: call calado.init() to recover.
All subsequent enqueues are no-ops until you call calado.init() again.

Calling init again replaces the transport and resets all counters, including consecutiveMaskFailures and lastError. That is the recovery path. The lastError string is stable and machine-parseable on purpose. Wire it into whatever alerting you already run.

setInterval(() => {
  const s = calado.status();
  if (s.enabled === false) {
    fetch(process.env.ALERT_WEBHOOK_URL!, {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({
        text: `calado disabled: lastError=${s.lastError ?? ""} maskFailures=${s.maskFailures}`,
      }),
    }).catch(() => {});
  }
}, 60_000);

enabled === false is the load-bearing signal. A non-zero maskFailures count means something has been breaking, but the SDK is still trying. enabled === false means it has stopped trying.
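The polling loop above only fires once the SDK has given up, and then on every poll. To also catch the earlier signal, failures that are accumulating while the SDK keeps trying, you can compare successive status snapshots. A sketch, assuming only the status() fields named on this page; StatusSnapshot, classify, and the alert levels are hypothetical. It pages once on the transition to disabled and warns whenever new mask failures appear between polls.

```typescript
// The subset of calado.status() this sketch relies on (assumed shape).
type StatusSnapshot = {
  enabled: boolean;
  maskFailures: number;
  consecutiveMaskFailures: number;
  lastError?: string | null;
};

type Alert = { level: "page" | "warn"; message: string } | null;

// Compare two polls: page when capture stops, warn on new mask failures.
function classify(prev: StatusSnapshot | undefined, curr: StatusSnapshot): Alert {
  if (!curr.enabled) {
    // Page only on the transition, not on every poll while disabled.
    if (prev?.enabled === false) return null;
    return { level: "page", message: `calado disabled: ${curr.lastError ?? "unknown"}` };
  }
  const newFailures = curr.maskFailures - (prev?.maskFailures ?? 0);
  if (newFailures > 0) {
    return {
      level: "warn",
      message: `${newFailures} mask failure(s) since last check (${curr.consecutiveMaskFailures} consecutive)`,
    };
  }
  return null;
}
```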

What the mask sees

The event your function receives is a regular IngestionEvent. Two shapes:

type IngestionEvent =
  | { kind: "conversation"; format: "anthropic" | "openai_chat"; payload: ConversationEnvelope }
  | { kind: "definition"; payload: AgentDefinitionPayload };

On conversation events, the request that reaches the mask already has system and tools stripped out; those move to definition events. So if a sensitive value can appear in a system prompt or tool schema, handle both kind === "conversation" and kind === "definition" to catch every occurrence.
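Because system prompts and tool schemas arrive as definition events, and tool arguments and provider IDs sit in fields the targeted walker in the regex example does not visit, one option is a blunt catch-all pass that applies the same scrubber to every string value, whatever the event kind. A sketch follows; deepScrub is hypothetical and returns a new object rather than mutating. The trade-off is over-redaction: a broad PHONE pattern will also rewrite long numeric IDs, and can turn a number inside a JSON-encoded string, such as tool-call arguments, into invalid JSON.

```typescript
// Recursively copies a JSON-like value, applying `scrub` to every string
// value. Object keys are left as-is.
function deepScrub(value: unknown, scrub: (s: string) => string): unknown {
  if (typeof value === "string") return scrub(value);
  if (Array.isArray(value)) return value.map((item) => deepScrub(item, scrub));
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, item] of Object.entries(value)) {
      out[key] = deepScrub(item, scrub);
    }
    return out;
  }
  return value; // numbers, booleans, null, undefined
}
```

In a mask, applying it to event.payload only, for example `{ ...event, payload: deepScrub(event.payload, scrub) }` with a type assertion back to the payload type, keeps kind and format from ever being rewritten.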

IngestionEvent is part of the SDK’s public API. Its shape follows semantic versioning: breaking changes ship only in a new major version, with the change called out in the changelog. Pin a major version, write your mask against the type, and you can rely on it.

Verify in the dashboard

Once your mask is wired up, deploy it, run a wrapped LLM call against your normal flow, and open the conversation in the calado dashboard. The request body that calado received is the masked one. If the placeholders are there and the raw values are not, you are done. If the dashboard still shows raw values, check calado.status().maskConfigured first, then maskedEvents to confirm the function is actually running on traffic.