The LangChain handler is in preview. Pin exactly (calado-langchain==0.0.1) until 1.0; see Upgrading.
Overview
calado-langchain is a BaseCallbackHandler subclass. You attach it to a chain and LangChain itself dispatches the events calado needs — chain inputs and outputs, LLM messages, tool calls, retrievals, errors, latency, and the parent-child run tree. There is nothing to call from your own code.
Compared to wrapping the LLM client directly, the callback handler captures the layer above:
| Captured | Wrap-SDK | Callback handler |
|---|---|---|
| LLM messages | yes | yes |
| Tool boundaries | no | yes |
| Retrievals | no | yes |
| Chain hierarchy | no | yes |
| Retries and intermediate reasoning | no | yes |
If you call the anthropic / openai Python clients directly rather than through LangChain, send events with the Direct API.
LangGraph
LangGraph reuses LangChain’s callback system, so the same handler works: node transitions arrive as nested chain runs and are rendered as a tree in the dashboard. No extra configuration is needed.
Installation
The package requires langchain-core>=0.2.10,<0.5.
Your API key is generated in your agent’s Settings page on app.calado.ai. See Quickstart for the step-by-step.
Quick start
Construct the handler with no arguments (it reads CALADO_API_KEY from the environment) and pass it on the chain’s config.
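As a rough sketch of that wiring: the helper below builds the config dict that LangChain accepts on invoke(). The callbacks and configurable keys are standard RunnableConfig keys; the helper name run_config, the session id value, and the calado_langchain import path shown in the comments are assumptions for illustration. A stand-in object replaces the real handler so the snippet runs without the package installed.

```python
from typing import Any, Optional


def run_config(handler: Any, session_id: Optional[str] = None) -> dict:
    """Build a LangChain RunnableConfig dict that attaches the handler."""
    config: dict = {"callbacks": [handler]}
    if session_id is not None:
        # Runs invoked with the same session_id are grouped into one session.
        config["configurable"] = {"session_id": session_id}
    return config


# Stand-in so the sketch runs anywhere. In real code (assumed import path):
#   from calado_langchain import CaladoCallbackHandler
#   handler = CaladoCallbackHandler()  # reads CALADO_API_KEY from the environment
handler = object()

config = run_config(handler, session_id="support-chat-42")
# chain.invoke({"question": "..."}, config=config)
```

Passing the handler per call, rather than globally, keeps reporting limited to the chains you choose.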
The session_id is optional (calado still captures runs without it) but strongly recommended. Without it, every top-level chain invocation is treated as a one-off and patterns are harder to surface.
Verifying it works
On the first successful POST, calado prints a one-line message so you know the wire is up. The line is emitted through Python’s logging module (logging.getLogger("calado")) and mirrored to stderr when no handler is attached, so it is visible by default and easy to silence or redirect.
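Because the output goes through the standard logging module, redirecting or quieting it needs only stdlib calls. The sketch below shows one way to do each; the helper names and the log format are illustrative choices, not part of the package.

```python
import io
import logging


def redirect_calado(stream) -> logging.Handler:
    """Attach our own handler so calado's lines go to `stream`."""
    log = logging.getLogger("calado")
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))
    log.addHandler(handler)
    log.propagate = False  # keep the lines out of the root logger's handlers
    return handler


def silence_calado() -> None:
    """Drop everything below ERROR from the calado logger."""
    logging.getLogger("calado").setLevel(logging.ERROR)


# Demo: capture a line in memory instead of a real log sink.
buffer = io.StringIO()
redirect_calado(buffer)
logging.getLogger("calado").warning("example line")
```

In production the stream would typically be replaced by whatever handler your log pipeline already uses.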
For programmatic checks, call status():
Pass debug=True at construction for per-batch stderr lines:
If you saw the Connected line and events_sent > 0, the integration is working. Open your agent page on app.calado.ai to see the conversation.
Zero-touch mode
Set environment variables and skip the per-chain wiring entirely:
Scoping to one or more chains
Pass the handler on the chains you want reported. Group turns into a session by passing configurable.session_id:
What gets captured
| calado field | LangChain source |
|---|---|
| conversation.messages | Root chain inputs and outputs |
| Step name | Run.name |
| Step hierarchy | run_id / parent_run_id |
| System prompt | Serialized prompt template (serialized.kwargs) |
| Tool schemas | Serialized tool definitions (serialized.kwargs.tools) |
| Per-step inline definition | A child run’s system-role input message, attached as that step’s inlineDefinition so the sub-agent is analyzed against its own prompt |
| Tool calls | on_tool_start / on_tool_end |
| Retrievals | on_retriever_end |
| Errors | on_*_error |
| Latency | Computed from start/end timestamps |
| Session id | RunnableConfig.configurable.session_id |
| Tags and metadata | Pass-through from RunnableConfig |
Redacting sensitive data
Pass a mask callable at construction. It runs in-process, per child run, before transport. Return a modified dict, or return None to drop just that step (the rest of the tree is captured).
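A minimal sketch of such a mask, assuming the dict it receives contains nested strings, lists, and dicts. The exact payload shape is not documented here, so the "name" key and the "internal_lookup" step name are hypothetical; adapt them to what your runs actually contain.

```python
import re
from typing import Any, Optional

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def _scrub(value: Any) -> Any:
    """Recursively replace email addresses in strings, lists, and dicts."""
    if isinstance(value, str):
        return EMAIL.sub("[email]", value)
    if isinstance(value, list):
        return [_scrub(item) for item in value]
    if isinstance(value, dict):
        return {key: _scrub(item) for key, item in value.items()}
    return value


def mask(step: dict) -> Optional[dict]:
    """Redact one child run before transport; None drops just that step."""
    # Hypothetical key and step name: the real payload may differ.
    if step.get("name") == "internal_lookup":
        return None
    return _scrub(step)


# Passed at construction, e.g. CaladoCallbackHandler(mask=mask)
```

Returning a new dict instead of mutating the input keeps the mask free of side effects on the objects your chain is still using.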
The mask may be sync or async; an async mask is awaited. If the mask raises 100 times in a row, the transport disables itself fail-closed and prints a one-line notice.
Runtime behavior
- Sandboxed. LangChain catches and logs handler exceptions. calado cannot break your chain.
- Streaming. Per-token events buffer in the accumulator and materialize on on_llm_end. One row per LLM call, not per token.
- Batching. Events accumulate until batch_size (default 10) or flush_interval_s (default 30s) is reached. The first event of a process force-flushes immediately so you don’t wait 30 seconds for the Connected line.
- Root-end flush. Each conversation is materialized on the root on_chain_end and queued atomically.
- Crash safety. atexit drains the queue synchronously on the foreground thread with a 5s bounded timeout. Hard crashes and Ctrl-C may lose in-flight trees.
- Retry. 5xx and network errors retry with backoff. 4xx responses drop immediately.
- 401 auto-disable. After three consecutive 401 responses, the transport disables itself and prints one line.
- Child-run cap. A single root run may have at most 950 child runs. Above the cap the server returns a structured 400 and the handler prints a message.
  In practice the 5 MB request body cap is often the binding constraint and is reached before 950 runs when payloads are large. Whichever limit you hit first, the fix is the same: split via configurable.session_id or reduce trace verbosity.
- Hung runs. Accumulator entries older than max_run_age_s (default 1h) are evicted, counted in events_dropped, and logged at WARNING.
- Logging. All output routes through logging.getLogger("calado"). Attach your own handler to route to Datadog, Sentry, structured logs, or silence it entirely. Stderr is the default mirror.
- Threading. Transport runs on a dedicated daemon thread; it does not block your asyncio event loop.
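To make the batching rule concrete, here is a small illustration of size-or-interval flushing with a forced first flush. It is a simplified model, not the package's implementation: the Batcher class is hypothetical, and it checks the interval only when an event arrives, whereas the real transport flushes from a background thread.

```python
import time
from typing import Callable, List


class Batcher:
    """Flush on the first event, when the batch is full, or when the interval elapses."""

    def __init__(self, send: Callable[[List[dict]], None], *, batch_size: int = 10,
                 flush_interval_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send
        self._batch_size = batch_size
        self._flush_interval_s = flush_interval_s
        self._clock = clock
        self._pending: List[dict] = []
        self._last_flush = clock()
        self._first = True  # the first event of the process is sent immediately

    def add(self, event: dict) -> None:
        self._pending.append(event)
        interval_due = self._clock() - self._last_flush >= self._flush_interval_s
        if self._first or len(self._pending) >= self._batch_size or interval_due:
            self.flush()

    def flush(self) -> None:
        if self._pending:
            self._send(self._pending)
            self._pending = []
        self._first = False
        self._last_flush = self._clock()
```

The injectable clock is there only so the interval path can be exercised in tests without sleeping.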
API reference
Constructor
CaladoCallbackHandler is keyword-only. Positional construction is forbidden.
- Agent API key. Generated in Settings on the agent page.
- base_url: Override for self-hosted or staging environments.
- batch_size: Events accumulated before an HTTP flush.
- flush_interval_s: Seconds before the background thread flushes a partial batch.
- max_run_age_s: Evict in-flight accumulator entries older than this many seconds.
- Hard ceiling on queued bytes. On overflow the oldest event is dropped.
- debug: Print one stderr line per batch flush.
- mask: Sync or async callable. See Redacting sensitive data.
Methods
| Method | Purpose |
|---|---|
| flush() | Drain the queue synchronously. |
| shutdown() | Flush and stop the background thread. |
| status() | Return the status dict. |
| astatus() | Async wrapper over status() (uses asyncio.to_thread). Use this from a FastAPI async endpoint. |
Type aliases
For mypy --strict users (the package ships py.typed):
Ingest response shape
The POST /api/ingest endpoint returns:
The handler uses agent_id to render the deep link in the Connected log line and warns when the installed package version is below min_sdk_version. Older servers may omit these fields; the handler falls back to the generic View at https://app.calado.ai/agents URL.
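The version check can be pictured roughly as follows. This is a sketch under stated assumptions, not the handler's actual code: it treats the response as a plain dict, assumes versions are purely numeric dotted strings, and the function names are hypothetical.

```python
from typing import Optional, Tuple


def _parse(version: str) -> Tuple[int, ...]:
    """Turn '0.1.2' into (0, 1, 2). Assumes plain numeric dotted versions."""
    return tuple(int(part) for part in version.split("."))


def below_minimum(installed: str, response: dict) -> bool:
    """True when the server's min_sdk_version is newer than the installed package."""
    minimum: Optional[str] = response.get("min_sdk_version")
    if minimum is None:
        # Older servers may omit the field, so there is nothing to compare.
        return False
    return _parse(installed) < _parse(minimum)
```

Treating a missing field as "no warning" mirrors the fallback behavior described above for older servers.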
Testing
Don’t construct the handler in tests. Without CALADO_API_KEY, construction logs a warning and transport stays disabled.
For explicit teardown after construction:
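One way to make teardown hard to forget is a small context manager around shutdown(), which the Methods table documents as flushing and stopping the background thread. The wrapper name below is hypothetical, and it accepts any object with a shutdown() method.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def calado_handler(handler: Any) -> Iterator[Any]:
    """Yield the handler and always shut it down afterwards."""
    try:
        yield handler
    finally:
        # Runs even if the body raises, so no events are left queued.
        handler.shutdown()


# with calado_handler(CaladoCallbackHandler()) as handler:
#     chain.invoke(inputs, config={"callbacks": [handler]})
```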
Troubleshooting
[calado] Connected. never prints
- enabled: False → see the 401 section below.
- mask_failures > 0 → your mask is throwing. Fix and restart.
- events_sent == 0 and last_error set → inspect last_error. Often a 4xx from a malformed payload or an unreachable base_url.
- events_sent == 0 and no error → the chain hasn’t completed a root run yet, or CALADO_API_KEY was empty at construction.
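The checks above can be folded into a small helper that takes the dict returned by status(). This is a sketch: it uses only the keys named in this section (enabled, mask_failures, events_sent, last_error), and the function name and message wording are illustrative.

```python
def diagnose(status: dict) -> str:
    """Map a status() dict to the first matching troubleshooting hint."""
    if status.get("enabled") is False:
        return "transport disabled: see the 401 section"
    if status.get("mask_failures", 0) > 0:
        return "mask is raising: fix it and restart"
    if status.get("events_sent", 0) == 0:
        if status.get("last_error"):
            return f"nothing sent: inspect last_error ({status['last_error']})"
        return "nothing sent yet: no completed root run, or CALADO_API_KEY was empty"
    return "ok"


# print(diagnose(handler.status()))
```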
401 from the server
calado auto-disables after three consecutive 401 responses. Check that CALADO_API_KEY starts with cl_ and matches an agent you own, then restart the process.
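A cheap startup check can catch the most common cause before any request is made. The sketch below verifies only that the variable is set and carries the cl_ prefix; it cannot confirm that the key matches an agent you own. The function name is hypothetical.

```python
import os
from typing import Mapping


def api_key_looks_valid(env: Mapping[str, str] = os.environ) -> bool:
    """True when CALADO_API_KEY is set and starts with the cl_ prefix."""
    return env.get("CALADO_API_KEY", "").startswith("cl_")


# if not api_key_looks_valid():
#     raise SystemExit("CALADO_API_KEY is missing or does not start with cl_")
```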
Inspect from a FastAPI endpoint
Mount the async status dict on a hidden route so on-call can read it without a deploy. Use astatus() from async code; it wraps status() via asyncio.to_thread and does not block the event loop.
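The pattern behind astatus() can be shown with the standard library alone. In this sketch a stand-in class plays the handler, and read_status is a hypothetical helper doing what the text describes: running the blocking status() call in a worker thread via asyncio.to_thread (Python 3.9+). The route path in the trailing comment is also an assumption.

```python
import asyncio
import time


class SlowStatus:
    """Stand-in for a handler whose status() blocks briefly."""

    def status(self) -> dict:
        time.sleep(0.05)
        return {"enabled": True, "events_sent": 3}


async def read_status(handler) -> dict:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(handler.status)


result = asyncio.run(read_status(SlowStatus()))

# In a FastAPI app the endpoint body would be a single await, for example:
#   @app.get("/internal/calado-status")
#   async def calado_status():
#       return await handler.astatus()
```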
Run the doctor
The doctor command (python -m calado_langchain doctor) prints the calado-langchain version, the installed langchain-core version, the supported range, the Mask import status, and the fix command if pip resolution fails.
Upgrading
calado-langchain follows a pre-1.0 semver policy: any 0.x.y minor bump may introduce breaking changes. Pin exactly until 1.0:
The server returns min_sdk_version on every ingest response (see Ingest response shape). When your installed version is below it, the handler logs a WARNING via the calado logger.
Migrating after a langchain-core major bump
langchain-core is pinned to >=0.2.10,<0.5. When the next major lands, follow this template:
- Pin langchain-core and calado-langchain exactly in a feature branch.
- Run python -m calado_langchain doctor and confirm the supported range covers your target.
- Update both pins, then run the doctor again.
- Run your chain locally. Confirm the Connected line and one ingested conversation in the dashboard.
- Roll out behind the same flag as the langchain-core bump.
Production deployment
Uvicorn and Gunicorn
preload_app=False is the supported mode. The handler lazy-registers its atexit hook on the first event in each worker, so each worker drains its own queue cleanly on shutdown.
AWS Lambda
Lambda sends SIGKILL on freeze, which skips atexit. Disable the hook and flush from your handler:
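A sketch of the flush-from-your-handler part, written as a decorator so every invocation drains the queue before returning. It assumes the atexit hook has been disabled separately (for example with the CALADO_DISABLE_ATEXIT=true variable mentioned for Cloud Run); the decorator name flush_after is hypothetical, and calado is any object exposing flush().

```python
import functools
from typing import Any, Callable


def flush_after(calado: Any) -> Callable:
    """Decorator: call calado.flush() after every invocation, even on error."""
    def decorate(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(event, context):
            try:
                return fn(event, context)
            finally:
                # Drain the queue before Lambda freezes the sandbox.
                calado.flush()
        return wrapper
    return decorate


# calado = CaladoCallbackHandler()
#
# @flush_after(calado)
# def lambda_handler(event, context):
#     return chain.invoke(event, config={"callbacks": [calado]})
```

The synchronous flush adds its latency to each invocation, which is the price of not losing events on freeze.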
Google Cloud Run
Cloud Run sends SIGTERM (10s grace) then SIGKILL. Trap SIGTERM and call flush() before exit, or set CALADO_DISABLE_ATEXIT=true and rely on an explicit per-request flush.
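Trapping SIGTERM might look like the sketch below, using only the standard signal module. The function names are hypothetical, and calado stands for any object with a flush() method. If your web server installs its own SIGTERM handler, its shutdown hook is probably the better place for the flush.

```python
import signal
import sys
from typing import Any, Callable


def make_sigterm_handler(calado: Any, exit_fn: Callable[[int], Any] = sys.exit):
    """Return a signal handler that flushes calado, then exits."""
    def on_sigterm(signum, frame):
        calado.flush()  # should finish well within the 10s grace period
        exit_fn(0)
    return on_sigterm


def install(calado: Any) -> None:
    # signal.signal must be called from the main thread.
    signal.signal(signal.SIGTERM, make_sigterm_handler(calado))


# install(CaladoCallbackHandler())
```

The exit function is injectable only so the handler can be exercised in tests without terminating the process.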
Forked workers
The handler registers an os.register_at_fork(after_in_child=...) hook that clears the parent accumulator in the child. You don’t need to do anything: child workers start with a clean state.
Caps
| Limit | Value |
|---|---|
| Request body size | 5 MB |
| Child runs per root | 950 |
| Conversations + definitions per request | 1,000 |
| Per-message content size | 1 MB |