feat(closes OPEN-10634): Claude Agent SDK Python integration #641

Open
viniciusdsmello wants to merge 20 commits into main from vini/open-10633-integration-add-claude-agent-sdk-support

viniciusdsmello commented May 12, 2026

Summary

First-party Openlayer tracing for the Claude Agent SDK (claude-agent-sdk on PyPI).

  • trace_claude_agent_sdk(): one-shot init that monkey-patches claude_agent_sdk.query and ClaudeSDKClient so every subsequent call is traced automatically, with no per-call code changes.
  • traced_query(): per-call wrapper for explicit scoping.
  • Hooks compose with user-provided hooks (PreToolUse / PostToolUse / PostToolUseFailure): ours are appended and never replace the user's.
  • Captures: a root AGENT step per query(), a nested CHAT_COMPLETION step per assistant turn, and a TOOL step per tool call. Subagents nest via parent_tool_use_id. MCP tool names (mcp__server__tool) are parsed. Cost, tokens, duration, session_id, and agent_config are recorded on the root step. The stream wrapper is a pure observer: the caller receives identical messages in identical order.

Companion TypeScript PR at openlayer-ai/openlayer-ts#208 (OPEN-10635).

Test plan

  • 14 unit tests pass: pytest tests/integrations/test_claude_agent_sdk.py -v
  • Live integration test runs end-to-end against the real SDK (gated on ANTHROPIC_API_KEY)
  • Example notebook examples/tracing/claude_agent_sdk/claude_agent_sdk_tracing.ipynb runs end-to-end
  • ruff clean on touched files

Plan deviations (documented in individual commits)

  1. Live-test publish hook is _upload_and_publish_trace, not _publish_trace_async as initially planned.
  2. AgentStep has no cost/tokens fields — values land in root_step.metadata and surface at trace level via post_process_trace metadata spreading.
  3. Tracer exposes no _start_step_manually / _end_step_manually; used a context-manager-handle fallback.

Closes OPEN-10634.

🤖 Generated with Claude Code

…tokens

Plan deviation: the plan referenced patching '_publish_trace_async', but the
actual tracer function is '_upload_and_publish_trace' (see
src/openlayer/lib/tracing/tracer.py:1745). Tests patch that one.

Also: AgentStep does not carry cost/tokens fields directly, so we log them
into root_step.metadata. post_process_trace spreads root metadata into the
trace_data payload, which surfaces them at the trace level for ingestion.

Each AssistantMessage in the stream produces a CHAT_COMPLETION child of the
root AGENT step, with text content as output, ThinkingBlock content in
metadata.thinking, and ToolUseBlock IDs in metadata.tool_calls.

Subagent assistant messages (parent_tool_use_id set) push the corresponding
Agent ToolStep onto the contextvar stack so they nest beneath it.

…se hooks

Adds composed PreToolUse / PostToolUse / PostToolUseFailure hooks. They open
a TOOL step on PreToolUse and finalize it on Post-(success|failure). The hooks
are appended to user-provided hooks (never replace them) so user hook decisions
like permissionDecision still take effect.
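
The append-only composition described above can be sketched as follows. This is a minimal illustration, not the integration's actual code: `compose_hooks` is a hypothetical name, and the hook map is modelled as a plain dict of lists, whereas the real SDK options may wrap entries in matcher objects.

```python
from typing import Any, Dict, List, Optional

# Assumption: hooks are modelled as {event_name: [hook entries]}.
HookMap = Dict[str, List[Any]]


def compose_hooks(user_hooks: Optional[HookMap], tracing_hooks: HookMap) -> HookMap:
    """Return a new hook map with tracing hooks appended after the user's hooks."""
    # Copy each list so the caller's hook map is never mutated.
    composed: HookMap = {
        event: list(entries) for event, entries in (user_hooks or {}).items()
    }
    for event, entries in tracing_hooks.items():
        # Append after any user entries for the same event; never replace them.
        composed.setdefault(event, []).extend(entries)
    return composed
```

Because user entries stay first in each list and are left untouched, whatever decision a user hook returns is unaffected by the tracing hooks that follow it.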

Uses a _ToolStepHandle helper that owns the create_step context manager and
calls __enter__/__exit__ manually so the step can span the two hook callbacks.
Plan deviation: tracer.py exposes no _start_step_manually / _end_step_manually
primitives; this CM-handle pattern is the workable alternative the plan
flagged in its risk callouts.
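
A rough sketch of the context-manager-handle pattern, assuming a stand-in `create_step` (the real one lives in the tracer and has a different signature) and hypothetical callback names `on_pre_tool_use` / `on_post_tool_use`:

```python
from contextlib import contextmanager

events = []  # records open/close order for illustration only


@contextmanager
def create_step(name):
    """Stand-in for the tracer's create_step context manager (hypothetical)."""
    step = {"name": name, "error": None}
    events.append(("open", name))
    try:
        yield step
    finally:
        events.append(("close", name))


class ToolStepHandle:
    """Owns a context manager so the step can stay open across two callbacks."""

    def __init__(self, name):
        self._cm = create_step(name)
        self.step = self._cm.__enter__()  # open the step now

    def finish(self, error=None):
        if self._cm is None:  # already closed; make finish() idempotent
            return
        self.step["error"] = error
        cm, self._cm = self._cm, None
        cm.__exit__(None, None, None)  # close the step later


pending = {}  # tool_use_id -> ToolStepHandle


def on_pre_tool_use(tool_use_id, tool_name):
    pending[tool_use_id] = ToolStepHandle(tool_name)


def on_post_tool_use(tool_use_id, error=None):
    handle = pending.pop(tool_use_id, None)
    if handle is not None:
        handle.finish(error)
```

Keying the pending handles by tool-use id lets the success and failure callbacks find the step that the pre-call hook opened, and a second completion for the same id is a no-op.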

Also parses mcp__<server>__<tool> tool names into metadata.mcp_server and
metadata.mcp_tool_name in the same pass.
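
For illustration, the name split might look like the sketch below. `parse_mcp_tool_name` is a hypothetical helper, and it assumes server names never contain a double underscore, so the first `__` after the prefix separates server from tool.

```python
from typing import Optional, Tuple

MCP_PREFIX = "mcp__"


def parse_mcp_tool_name(tool_name: str) -> Optional[Tuple[str, str]]:
    """Split 'mcp__<server>__<tool>' into (server, tool); None for other tools."""
    if not tool_name.startswith(MCP_PREFIX):
        return None
    # Assumption: the first '__' after the prefix ends the server name.
    server, sep, tool = tool_name[len(MCP_PREFIX):].partition("__")
    if not sep or not server or not tool:
        return None
    return server, tool
```

Built-in tools such as Glob or Read yield `None`, so callers can skip the MCP metadata fields for them.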

…Step

The _resolve_subagent_parent helper looks up the pending Agent ToolStep by
parent_tool_use_id and pushes it onto the contextvar stack before opening
the subagent's chat-completion step. The Agent ToolStep is kept open across
the subagent's stream by virtue of PostToolUse firing only after the subagent
returns.

Tests both error paths exercised by the implementation from A2 and A4:
- ResultMessage(subtype='error_max_turns', is_error=True) is recorded on the
  root step's metadata
- PostToolUseFailure marks the tool step as errored with the error message

Adds the one-shot init helper that monkey-patches claude_agent_sdk.query.
Idempotent — subsequent calls update the module-level _config but don't
re-wrap the function. Stashes the original query on the patched callable
as _openlayer_original so traced_query can call it without recursing back
through the patch.

…de_agent_sdk()

Patches ClaudeSDKClient.__init__ (to compose hooks), .query (to open the root
AGENT step), and .receive_response (to observe streamed messages and close
the step when the generator exhausts). Idempotent via _openlayer_patched
sentinel on the class.
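
A pass-through observer of the kind described for receive_response could look roughly like this. `observe`, `on_message`, and `on_done` are illustrative names; the `finally` clause is an assumption that the step should also close if the stream raises.

```python
import asyncio


async def observe(stream, on_message, on_done):
    """Yield every message unchanged, notifying callbacks along the way."""
    try:
        async for message in stream:
            on_message(message)  # record the message on the trace
            yield message        # hand the same object to the caller
    finally:
        on_done()  # close the root step when the stream ends or raises
```

Since each message is yielded exactly as received, the caller sees the same objects in the same order as it would without the wrapper.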

…_API_KEY)

The test runs a real query() against claude-haiku-4-5 and verifies the
wrapper observes the SDK's message stream. It tolerates a trailing
Exception from the SDK (which raises after delivering ResultMessage when
the API returns is_error=True, e.g. on an invalid Anthropic API key) so
the test still exercises trace publishing end-to-end.

Adds:
- examples/tracing/claude_agent_sdk/claude_agent_sdk_tracing.ipynb — a
  concise 12-cell notebook covering install, init, simple query, and a
  subagent example
- openlayer.lib.trace_claude_agent_sdk and
  openlayer.lib.traced_claude_agent_sdk_query as public re-exports from
  the integration module, matching the pattern used by trace_google_adk
  and other integrations
- Imports sorted by ruff --fix in test/mock helpers and the live test
- File-level ruff: noqa: ARG001 in the test file (the fake_query helpers
  match the SDK signature but don't use prompt/options/kwargs)
- # noqa: T201 on the example notebook's print() lines, matching how the
  sibling google-adk example notebook handles the same lint

viniciusdsmello and others added 5 commits May 12, 2026 13:34
… title

The root AGENT step title was being built from the prompt content
("claude-agent-sdk: Say the word 'banana'..."), making the trace
sidebar in Openlayer noisy and inconsistent across runs. Use the
stable name "Claude Agent SDK query" instead. The prompt content is
still captured in root_step.inputs.prompt where it belongs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…nts on root metadata

The root AGENT step was missing the user-provided system prompt and
subagent definitions that drove the run. The spec called for both, but
the initial implementation captured only the SDK's runtime-resolved
agent_config from SystemMessage(init), not the user's input options.

Capture, on the root step's metadata:
  - system_prompt (truncated to 4096 chars; supports string, preset
    dict, and SystemPromptFile dataclass shapes)
  - agents_defined: { name -> { description, prompt, tools, model } }
  - options: { model, fallback_model, max_turns, max_budget_usd,
    permission_mode, cwd, allowed_tools, disallowed_tools,
    continue_conversation, resume, fork_session }

For ClaudeSDKClient, stash the original options at patched_init() time
so they're available to patched_query() before our hook injection
mutates them.
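
As a sketch of the capture step, assuming a stand-in options dataclass (`FakeOptions`) with only a few of the fields listed above, and a hypothetical `options_metadata` helper; non-string system prompts are simply rendered with `repr` here, which is a simplification of the three shapes mentioned:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

MAX_SYSTEM_PROMPT_CHARS = 4096
# Assumption: a subset of the option names listed in the commit message.
OPTION_FIELDS = ("model", "max_turns", "permission_mode", "allowed_tools")


@dataclass
class FakeOptions:
    """Stand-in for the SDK's options object (hypothetical)."""
    system_prompt: Any = None
    model: Optional[str] = None
    max_turns: Optional[int] = None
    permission_mode: Optional[str] = None
    allowed_tools: List[str] = field(default_factory=list)


def options_metadata(options: Any) -> Dict[str, Any]:
    """Snapshot user-provided options into root-step metadata."""
    metadata: Dict[str, Any] = {}
    prompt = getattr(options, "system_prompt", None)
    if prompt is not None:
        text = prompt if isinstance(prompt, str) else repr(prompt)
        metadata["system_prompt"] = text[:MAX_SYSTEM_PROMPT_CHARS]  # truncate
    metadata["options"] = {
        name: getattr(options, name, None) for name in OPTION_FIELDS
    }
    return metadata
```

Taking this snapshot at construction time, before any hooks are injected, keeps the recorded options identical to what the user passed in.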

New test: test_options_metadata_captured_on_root_step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ns to exercise options-metadata capture

The live test now passes system_prompt and max_turns so the published
Openlayer trace surfaces the new metadata captured on the root step
(system_prompt, options.max_turns), proving end-to-end that the
wrapper captures the user's configuration in addition to the SDK's
runtime-resolved agent_config.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ase token aliases on every step

Previously, the trace published to Openlayer lacked several details reviewers need:

- Assistant turns with empty text content (e.g. thinking-only turns or
  pure tool-call turns) appeared blank in the UI. Fall back to a tool
  call summary, thinking text, or '[no content]' marker so reviewers
  always see something useful.
- Top-level assistant turns had no 'inputs' set, so the UI couldn't
  show what prompt triggered them. Surface the user's prompt as the
  step input for top-level turns (subagent turns are driven by their
  parent's Agent tool call, not a user prompt).
- The raw assistant message content was never serialized. Set the
  ChatCompletionStep's raw_output field with the full block array so
  reviewers can inspect every text/thinking/tool_use block.
- ToolUseBlocks were captured by id only; widen to { id, name, input }
  so tool calls are inspectable from the assistant turn.
- Root step had no rawOutput surface; stash a JSON-serialized
  ResultMessage in metadata.rawOutput.
- promptTokens / completionTokens were stuck in metadata under
  snake_case keys that the trace UI doesn't render. Add camelCase
  aliases so the Metrics 'Prompt' and 'Completion' boxes populate.
- Capture state.model from SystemMessage(init) and surface it on the
  root step so reviewers see the resolved model.
- Capture state.user_prompt so assistant turn steps can show the
  triggering prompt as their input.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds a richer example that exercises every step type the Openlayer
wrapper captures, so users can see a full trace tree:

- Root AGENT step with system_prompt, agent_config, agents_defined,
  options, rawOutput.
- Per-turn CHAT_COMPLETION steps with prompt/completion tokens,
  thinking blocks, tool_calls, raw_output.
- TOOL steps for:
    - mcp__file-stats__count_files (custom in-process MCP tool)
    - Glob and Read (built-in)
    - Agent (twice: code-reviewer and summary-writer subagents)
- Nested subagent steps correlated via parent_tool_use_id.

The 'codebase analyzer' scenario walks the agent through three steps:
count files by extension, dispatch a code-reviewer subagent to review
one file, then dispatch a summary-writer subagent to wrap up. The result
is a 4-line markdown report.

Verified end-to-end against the live API as a plain Python script
(5 turns, real subagent dispatch, real Openlayer trace upload).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>