feat(closes OPEN-10634): Claude Agent SDK Python integration by viniciusdsmello · Pull Request #641 · openlayer-ai/openlayer-python

viniciusdsmello · 2026-05-12T16:24:08Z

Summary

First-party Openlayer tracing for the Claude Agent SDK (claude-agent-sdk on PyPI).

trace_claude_agent_sdk(): one-shot init that monkey-patches claude_agent_sdk.query and ClaudeSDKClient so every subsequent call is traced automatically, with no per-call code change.
traced_query(): per-call wrapper for explicit scoping.
Hooks compose with user-provided hooks (PreToolUse / PostToolUse / PostToolUseFailure); user hooks are never replaced.
Captures: a root AGENT step per query(), a nested CHAT_COMPLETION step per assistant turn, and a TOOL step per tool call. Subagents nest via parent_tool_use_id. MCP tool names (mcp__server__tool) are parsed. Cost, tokens, duration, session_id, and agent_config are recorded on the root step. The stream wrapper is a pure observer: it yields identical messages in identical order.
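
The mcp__server__tool naming convention mentioned above can be split with plain string handling. The following is a minimal sketch, not the code in this PR: parse_mcp_tool_name is a hypothetical helper, and it assumes the server segment never contains a double underscore.

```python
from typing import Optional, Tuple


def parse_mcp_tool_name(tool_name: str) -> Optional[Tuple[str, str]]:
    """Split 'mcp__<server>__<tool>' into (server, tool); return None otherwise.

    Hypothetical helper for illustration only.
    """
    prefix = "mcp__"
    if not tool_name.startswith(prefix):
        return None  # built-in tools such as Read or Glob
    # Split once, so any later double underscores stay in the tool name.
    server, sep, tool = tool_name[len(prefix):].partition("__")
    if not sep or not server or not tool:
        return None  # malformed name, e.g. 'mcp__server'
    return server, tool
```

The two returned values would map naturally onto metadata.mcp_server and metadata.mcp_tool_name as described in the commit messages.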

Companion TypeScript PR at openlayer-ai/openlayer-ts#208 (OPEN-10635).

Test plan

14 unit tests pass: pytest tests/integrations/test_claude_agent_sdk.py -v
Live integration test runs end-to-end against the real SDK (gated on ANTHROPIC_API_KEY)
Example notebook examples/tracing/claude_agent_sdk/claude_agent_sdk_tracing.ipynb runs end-to-end
ruff clean on touched files

Plan deviations (documented in individual commits)

Live-test publish hook is _upload_and_publish_trace, not _publish_trace_async as initially planned.
AgentStep has no cost/tokens fields — values land in root_step.metadata and surface at trace level via post_process_trace metadata spreading.
Tracer exposes no _start_step_manually / _end_step_manually; used a context-manager-handle fallback.

Closes OPEN-10634.

🤖 Generated with Claude Code

…tokens Plan deviation: the plan referenced patching '_publish_trace_async', but the actual tracer function is '_upload_and_publish_trace' (see src/openlayer/lib/tracing/tracer.py:1745). Tests patch that one. Also: AgentStep does not carry cost/tokens fields directly, so we log them into root_step.metadata. post_process_trace spreads root metadata into the trace_data payload, which surfaces them at the trace level for ingestion.

Each AssistantMessage in the stream produces a CHAT_COMPLETION child of the root AGENT step, with text content as output, ThinkingBlock content in metadata.thinking, and ToolUseBlock IDs in metadata.tool_calls. Subagent assistant messages (parent_tool_use_id set) push the corresponding Agent ToolStep onto the contextvar stack so they nest beneath it.

…se hooks Adds composed PreToolUse / PostToolUse / PostToolUseFailure hooks. They open a TOOL step on PreToolUse and finalize it on Post-(success|failure). The hooks are appended to user-provided hooks (never replace them) so user hook decisions like permissionDecision still take effect. Uses a _ToolStepHandle helper that owns the create_step context manager and calls __enter__/__exit__ manually so the step can span the two hook callbacks. Plan deviation: tracer.py exposes no _start_step_manually / _end_step_manually primitives; this CM-handle pattern is the workable alternative the plan flagged in its risk callouts. Also parses mcp__<server>__<tool> tool names into metadata.mcp_server and metadata.mcp_tool_name in the same pass.
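
The context-manager-handle pattern and the append-only hook composition described above could look roughly like the sketch below. Everything here is a simplified stand-in: create_step, ToolStepHandle, compose_hooks, and the plain synchronous callbacks are hypothetical and do not mirror the tracer's or the SDK's real signatures.

```python
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator, List, Optional

events: List[str] = []  # stand-in for a tracer backend


@contextmanager
def create_step(name: str) -> Iterator[Dict[str, Any]]:
    """Hypothetical stand-in for the tracer's step context manager."""
    step: Dict[str, Any] = {"name": name, "output": None, "error": None}
    events.append(f"open:{name}")
    try:
        yield step
    finally:
        events.append(f"close:{name}")


class ToolStepHandle:
    """Owns a step context manager so it can span two separate callbacks."""

    def __init__(self, name: str) -> None:
        self._cm = create_step(name)
        self.step = self._cm.__enter__()  # opened from the pre-tool callback

    def finish(self, output: Any = None, error: Optional[str] = None) -> None:
        self.step["output"] = output
        self.step["error"] = error
        self._cm.__exit__(None, None, None)  # closed from a post-tool callback


pending: Dict[str, ToolStepHandle] = {}  # open tool steps keyed by tool_use_id


def on_pre_tool_use(tool_use_id: str, tool_name: str) -> None:
    pending[tool_use_id] = ToolStepHandle(tool_name)


def on_post_tool_use(tool_use_id: str, output: Any) -> None:
    pending.pop(tool_use_id).finish(output=output)


def on_post_tool_use_failure(tool_use_id: str, error: str) -> None:
    pending.pop(tool_use_id).finish(error=error)


HookMap = Dict[str, List[Callable[..., Any]]]


def compose_hooks(user_hooks: HookMap, tracing_hooks: HookMap) -> HookMap:
    """Return a new mapping with tracing hooks appended after the user's hooks."""
    merged = {event: list(callbacks) for event, callbacks in user_hooks.items()}
    for event, callbacks in tracing_hooks.items():
        merged.setdefault(event, []).extend(callbacks)
    return merged
```

Copying the user's lists before extending them keeps the caller's hook configuration untouched, which is one way to honor the "never replace" guarantee.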

…Step The _resolve_subagent_parent helper looks up the pending Agent ToolStep by parent_tool_use_id and pushes it onto the contextvar stack before opening the subagent's chat-completion step. The Agent ToolStep is kept open across the subagent's stream by virtue of PostToolUse firing only after the subagent returns.

Tests both error paths exercised by the implementation from A2 and A4:
- ResultMessage(subtype='error_max_turns', is_error=True) on the root step's metadata
- PostToolUseFailure marking the tool step as errored with the error message

…data

Adds the one-shot init helper that monkey-patches claude_agent_sdk.query. Idempotent — subsequent calls update the module-level _config but don't re-wrap the function. Stashes the original query on the patched callable as _openlayer_original so traced_query can call it without recursing back through the patch.

…de_agent_sdk() Patches ClaudeSDKClient.__init__ (to compose hooks), .query (to open the root AGENT step), and .receive_response (to observe streamed messages and close the step when the generator exhausts). Idempotent via _openlayer_patched sentinel on the class.
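
The "observe and close on exhaustion" behavior of the patched receive_response can be approximated with an async generator wrapper. The sketch below uses hypothetical names (observe_stream, fake_receive_response) and plain strings in place of SDK message objects.

```python
import asyncio
from typing import Any, AsyncIterator, Callable, List, Tuple


async def observe_stream(
    stream: AsyncIterator[Any],
    on_message: Callable[[Any], None],
    on_close: Callable[[], None],
) -> AsyncIterator[Any]:
    """Yield every message unchanged; notify the observer and close when done."""
    try:
        async for message in stream:
            on_message(message)  # observe only, never mutate or drop
            yield message
    finally:
        on_close()  # runs on exhaustion, on error, or if the consumer stops early


async def fake_receive_response() -> AsyncIterator[str]:
    """Stand-in for the SDK's message stream."""
    for message in ("system", "assistant", "result"):
        yield message


async def demo() -> Tuple[List[str], List[str], List[bool]]:
    seen: List[str] = []
    closed: List[bool] = []
    wrapped = observe_stream(
        fake_receive_response(), seen.append, lambda: closed.append(True)
    )
    delivered = [message async for message in wrapped]
    return delivered, seen, closed
```

Running asyncio.run(demo()) returns the delivered messages alongside what the observer saw, which makes the "identical messages in identical order" property easy to assert in a test.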

…_API_KEY) The test runs a real query() against claude-haiku-4-5 and verifies the wrapper observes the SDK's message stream. It tolerates a trailing Exception from the SDK (which raises after delivering ResultMessage when the API returns is_error=True, e.g. on an invalid Anthropic API key) so the test still exercises trace publishing end-to-end.
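
One way to tolerate a trailing exception while still failing on earlier errors is to swallow it only once a result message has been seen. The sketch below is an assumption about how such a test helper could be written; the dict messages and function names are hypothetical stand-ins for the SDK's message classes.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List


async def fake_query_with_api_error() -> AsyncIterator[Dict[str, Any]]:
    """Stand-in stream that raises only after delivering its result message."""
    yield {"type": "assistant"}
    yield {"type": "result", "is_error": True}
    raise RuntimeError("API returned an error")


async def fake_query_failing_early() -> AsyncIterator[Dict[str, Any]]:
    """Stand-in stream that fails before any result message arrives."""
    yield {"type": "assistant"}
    raise RuntimeError("connection lost")


async def collect_tolerating_trailing_error(
    stream: AsyncIterator[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Collect messages; ignore an exception only if a result already arrived."""
    messages: List[Dict[str, Any]] = []
    try:
        async for message in stream:
            messages.append(message)
    except Exception:
        if not any(m.get("type") == "result" for m in messages):
            raise  # a failure before the result message is a real failure
    return messages
```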

Adds:
- examples/tracing/claude_agent_sdk/claude_agent_sdk_tracing.ipynb: a concise 12-cell notebook covering install, init, simple query, and a subagent example
- openlayer.lib.trace_claude_agent_sdk and openlayer.lib.traced_claude_agent_sdk_query as public re-exports from the integration module, matching the pattern used by trace_google_adk and other integrations

- Imports sorted by ruff --fix in test/mock helpers and the live test
- File-level ruff: noqa: ARG001 in the test file (the fake_query helpers match the SDK signature but don't use prompt/options/kwargs)
- # noqa: T201 on the example notebook's print() lines, matching how the sibling google-adk example notebook handles the same lint

… title The root AGENT step title was being built from the prompt content ("claude-agent-sdk: Say the word 'banana'..."), making the trace sidebar in Openlayer noisy and inconsistent across runs. Use the stable name "Claude Agent SDK query" instead. The prompt content is still captured in root_step.inputs.prompt where it belongs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…nts on root metadata The root AGENT step was missing the user-provided system prompt and subagent definitions that drove the run. Spec called for both, but the initial implementation only captured the SDK's runtime-resolved agent_config from SystemMessage(init), not the user's input options.

Capture, on the root step's metadata:
- system_prompt (truncated to 4096 chars; supports string, preset dict, and SystemPromptFile dataclass shapes)
- agents_defined: { name -> { description, prompt, tools, model } }
- options: { model, fallback_model, max_turns, max_budget_usd, permission_mode, cwd, allowed_tools, disallowed_tools, continue_conversation, resume, fork_session }

For ClaudeSDKClient, stash the original options at patched_init() time so they're available to patched_query() before our hook injection mutates them. New test: test_options_metadata_captured_on_root_step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
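
A minimal sketch of the options capture described in this commit, under stated assumptions: FakeOptions is a hypothetical stand-in for the SDK's options object, capture_options_metadata is an illustrative name, and stringifying non-string system prompts with repr is a simplification rather than the PR's handling of preset dicts or SystemPromptFile.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

MAX_SYSTEM_PROMPT_CHARS = 4096

# Option names taken from the commit message above.
OPTION_KEYS = (
    "model", "fallback_model", "max_turns", "max_budget_usd",
    "permission_mode", "cwd", "allowed_tools", "disallowed_tools",
    "continue_conversation", "resume", "fork_session",
)


@dataclass
class FakeOptions:
    """Hypothetical stand-in exposing only a few of the option fields."""

    system_prompt: Any = None
    model: Optional[str] = None
    max_turns: Optional[int] = None
    allowed_tools: Optional[List[str]] = None


def capture_options_metadata(options: Any) -> Dict[str, Any]:
    """Collect the user's configuration for the root step's metadata."""
    metadata: Dict[str, Any] = {}
    prompt = getattr(options, "system_prompt", None)
    if prompt is not None:
        # Assumption: non-string shapes are stringified; the real code may differ.
        text = prompt if isinstance(prompt, str) else repr(prompt)
        metadata["system_prompt"] = text[:MAX_SYSTEM_PROMPT_CHARS]
    # Keep only options that exist on the object and were actually set.
    metadata["options"] = {
        key: getattr(options, key)
        for key in OPTION_KEYS
        if getattr(options, key, None) is not None
    }
    return metadata
```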

…ns to exercise options-metadata capture The live test now passes system_prompt and max_turns so the published Openlayer trace surfaces the new metadata captured on the root step (system_prompt, options.max_turns), proving end-to-end that the wrapper captures the user's configuration in addition to the SDK's runtime-resolved agent_config. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ase token aliases on every step Previously the trace published to Openlayer was missing key visibility:
- Assistant turns with empty text content (e.g. thinking-only turns or pure tool-call turns) appeared blank in the UI. Fall back to a tool call summary, thinking text, or '[no content]' marker so reviewers always see something useful.
- Top-level assistant turns had no 'inputs' set, so the UI couldn't show what prompt triggered them. Surface the user's prompt as the step input for top-level turns (subagent turns are driven by their parent's Agent tool call, not a user prompt).
- The raw assistant message content was never serialized. Set the ChatCompletionStep's raw_output field with the full block array so reviewers can inspect every text/thinking/tool_use block.
- ToolUseBlocks were captured by id only; widen to { id, name, input } so tool calls are inspectable from the assistant turn.
- Root step had no rawOutput surface; stash a JSON-serialized ResultMessage in metadata.rawOutput.
- promptTokens / completionTokens were stuck in metadata under snake_case keys that the trace UI doesn't render. Add camelCase aliases so the Metrics 'Prompt' and 'Completion' boxes populate.
- Capture state.model from SystemMessage(init) and surface it on the root step so reviewers see the resolved model.
- Capture state.user_prompt so assistant turn steps can show the triggering prompt as their input.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
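
The output fallback chain and the camelCase aliasing from this commit could be sketched as below. The helper names, the dict-based content blocks, the summary format, and the snake_case key names (prompt_tokens, completion_tokens) are assumptions made for illustration.

```python
from typing import Any, Dict, List


def summarize_turn(blocks: List[Dict[str, Any]]) -> str:
    """Pick display text: text first, then tool calls, then thinking."""
    text = "".join(b["text"] for b in blocks if b["type"] == "text").strip()
    if text:
        return text
    tool_names = [b["name"] for b in blocks if b["type"] == "tool_use"]
    if tool_names:
        return "[tool calls: " + ", ".join(tool_names) + "]"
    thinking = "".join(
        b["thinking"] for b in blocks if b["type"] == "thinking"
    ).strip()
    if thinking:
        return thinking
    return "[no content]"


def add_camel_case_token_aliases(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy with camelCase aliases next to the snake_case token keys."""
    aliased = dict(metadata)
    for snake, camel in (
        ("prompt_tokens", "promptTokens"),
        ("completion_tokens", "completionTokens"),
    ):
        if snake in aliased:
            aliased[camel] = aliased[snake]
    return aliased
```

Keeping the snake_case keys alongside the aliases means existing consumers of the metadata continue to work.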

Adds a richer example that exercises every step type the Openlayer wrapper captures, so users can see a full trace tree:
- Root AGENT step with system_prompt, agent_config, agents_defined, options, rawOutput.
- Per-turn CHAT_COMPLETION steps with prompt/completion tokens, thinking blocks, tool_calls, raw_output.
- TOOL steps for:
  - mcp__file-stats__count_files (custom in-process MCP tool)
  - Glob and Read (built-in)
  - Agent (twice: code-reviewer and summary-writer subagents)
- Nested subagent steps correlated via parent_tool_use_id.

The 'codebase analyzer' scenario walks the agent through three steps: count files by extension, dispatch a code-reviewer subagent to review one file, then dispatch a summary-writer subagent to wrap up. Result is a 4-line markdown report.

Verified end-to-end against the live API as a plain Python script (5 turns, real subagent dispatch, real Openlayer trace upload).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

viniciusdsmello added 15 commits May 12, 2026 12:55

test(claude-agent-sdk): add test scaffold and mock helpers

c2c16d2

test(claude-agent-sdk): MCP tool name parsing

1ecd6dc

test(claude-agent-sdk): user hooks compose with Openlayer hooks

ec60af9

test(claude-agent-sdk): redact MCP server env/headers from trace meta…

bcdbd81

…data

test(claude-agent-sdk): wrapper preserves stream identity and order

2f55d09

viniciusdsmello mentioned this pull request May 12, 2026

feat(closes OPEN-10635): Claude Agent SDK TypeScript integration openlayer-ai/openlayer-ts#208

Open

5 tasks

viniciusdsmello and others added 5 commits May 12, 2026 13:34

viniciusdsmello wants to merge 20 commits into main from vini/open-10633-integration-add-claude-agent-sdk-support
