feat(closes OPEN-10634): Claude Agent SDK Python integration #641
Open
viniciusdsmello wants to merge 20 commits into
…tokens
Plan deviation: the plan referenced patching '_publish_trace_async', but the actual tracer function is '_upload_and_publish_trace' (see src/openlayer/lib/tracing/tracer.py:1745). Tests patch that one.
Also: AgentStep does not carry cost/tokens fields directly, so we log them into root_step.metadata. post_process_trace spreads root metadata into the trace_data payload, which surfaces them at the trace level for ingestion.
Each AssistantMessage in the stream produces a CHAT_COMPLETION child of the root AGENT step, with text content as output, ThinkingBlock content in metadata.thinking, and ToolUseBlock IDs in metadata.tool_calls. Subagent assistant messages (parent_tool_use_id set) push the corresponding Agent ToolStep onto the contextvar stack so they nest beneath it.
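The block-to-field mapping described above can be sketched roughly as follows. The three dataclasses are simplified stand-ins for the SDK's block types, and `summarize_assistant_content` is a hypothetical helper name, not the integration's actual function.

```python
from dataclasses import dataclass
from typing import Any


# Stand-in block types for illustration only; the real classes live in
# claude_agent_sdk and may carry additional fields.
@dataclass
class TextBlock:
    text: str


@dataclass
class ThinkingBlock:
    thinking: str


@dataclass
class ToolUseBlock:
    id: str
    name: str
    input: dict


def summarize_assistant_content(blocks: list) -> dict:
    """Split content blocks into a step output and step metadata (hypothetical helper)."""
    texts = [b.text for b in blocks if isinstance(b, TextBlock)]
    thinking = [b.thinking for b in blocks if isinstance(b, ThinkingBlock)]
    tool_ids = [b.id for b in blocks if isinstance(b, ToolUseBlock)]

    metadata: dict[str, Any] = {}
    if thinking:
        metadata["thinking"] = "\n".join(thinking)
    if tool_ids:
        metadata["tool_calls"] = tool_ids  # IDs only, as in this commit
    return {"output": "\n".join(texts), "metadata": metadata}
```

Keeping thinking and tool-call data in metadata leaves the step output as plain text, which is what a reviewer would expect to read first.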
…se hooks
Adds composed PreToolUse / PostToolUse / PostToolUseFailure hooks. They open a TOOL step on PreToolUse and finalize it on Post-(success|failure). The hooks are appended to user-provided hooks (never replace them) so user hook decisions like permissionDecision still take effect.
Uses a _ToolStepHandle helper that owns the create_step context manager and calls __enter__/__exit__ manually so the step can span the two hook callbacks. Plan deviation: tracer.py exposes no _start_step_manually / _end_step_manually primitives; this CM-handle pattern is the workable alternative the plan flagged in its risk callouts.
Also parses mcp__<server>__<tool> tool names into metadata.mcp_server and metadata.mcp_tool_name in the same pass.
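A minimal sketch of the CM-handle pattern, assuming `create_step` is an ordinary context manager. Here `create_step` is a local stand-in rather than the tracer's real function, and the hook signatures are simplified (the SDK's hooks are async and receive more arguments).

```python
from contextlib import contextmanager

events = []  # records open/close order for the demo


@contextmanager
def create_step(name):
    """Stand-in for the tracer's create_step context manager (assumed shape)."""
    step = {"name": name, "output": None, "error": None}
    events.append(("open", name))
    try:
        yield step
    finally:
        events.append(("close", name))


class ToolStepHandle:
    """Owns a context manager so one step can stay open across two callbacks."""

    def __init__(self, name):
        self._cm = create_step(name)
        self.step = self._cm.__enter__()  # opened from the PreToolUse hook

    def finish(self, output=None, error=None):
        self.step["output"] = output
        self.step["error"] = error
        self._cm.__exit__(None, None, None)  # closed from a Post* hook


pending = {}  # tool_use_id -> ToolStepHandle


def on_pre_tool_use(tool_use_id, tool_name):
    pending[tool_use_id] = ToolStepHandle(tool_name)


def on_post_tool_use(tool_use_id, output):
    handle = pending.pop(tool_use_id)
    handle.finish(output=output)
    return handle.step


def on_post_tool_use_failure(tool_use_id, error):
    handle = pending.pop(tool_use_id)
    handle.finish(error=error)
    return handle.step
```

Keying the pending handles by tool-use ID lets several tool calls be in flight at once, each closing its own step.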
…Step
The _resolve_subagent_parent helper looks up the pending Agent ToolStep by parent_tool_use_id and pushes it onto the contextvar stack before opening the subagent's chat-completion step. The Agent ToolStep is kept open across the subagent's stream by virtue of PostToolUse firing only after the subagent returns.
Tests both error paths exercised by the implementation from A2 and A4:
- ResultMessage(subtype='error_max_turns', is_error=True) on the root step's metadata
- PostToolUseFailure marking the tool step as errored with the error message
Adds the one-shot init helper that monkey-patches claude_agent_sdk.query. Idempotent — subsequent calls update the module-level _config but don't re-wrap the function. Stashes the original query on the patched callable as _openlayer_original so traced_query can call it without recursing back through the patch.
…de_agent_sdk()
Patches ClaudeSDKClient.__init__ (to compose hooks), .query (to open the root AGENT step), and .receive_response (to observe streamed messages and close the step when the generator exhausts). Idempotent via _openlayer_patched sentinel on the class.
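One way to observe a stream without altering it is a pass-through async generator, sketched below. `observe_stream` and `fake_receive_response` are illustrative names, and plain strings stand in for the SDK's message objects.

```python
import asyncio


async def observe_stream(stream, on_message, on_close):
    """Yield every message unchanged and in order; run on_close when the stream ends."""
    try:
        async for message in stream:
            on_message(message)  # the tracer would record the message here
            yield message
    finally:
        on_close()  # the root step would be closed here


async def fake_receive_response():
    # Stand-in for ClaudeSDKClient.receive_response().
    for message in ["system:init", "assistant:hello", "result:success"]:
        yield message


async def demo():
    seen, closed = [], []
    out = [
        m
        async for m in observe_stream(
            fake_receive_response(), seen.append, lambda: closed.append(True)
        )
    ]
    return out, seen, closed
```

Running `asyncio.run(demo())` returns the same three messages to the caller that the observer recorded, plus a single close notification. The `finally` clause also runs if the underlying stream raises, so the step is not left open on errors.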
…_API_KEY)
The test runs a real query() against claude-haiku-4-5 and verifies the wrapper observes the SDK's message stream. It tolerates a trailing Exception from the SDK (which raises after delivering ResultMessage when the API returns is_error=True, e.g. on an invalid Anthropic API key) so the test still exercises trace publishing end-to-end.
Adds:
- examples/tracing/claude_agent_sdk/claude_agent_sdk_tracing.ipynb: a concise 12-cell notebook covering install, init, simple query, and a subagent example
- openlayer.lib.trace_claude_agent_sdk and openlayer.lib.traced_claude_agent_sdk_query as public re-exports from the integration module, matching the pattern used by trace_google_adk and other integrations

- Imports sorted by ruff --fix in test/mock helpers and the live test
- File-level ruff: noqa: ARG001 in the test file (the fake_query helpers match the SDK signature but don't use prompt/options/kwargs)
- # noqa: T201 on the example notebook's print() lines, matching how the sibling google-adk example notebook handles the same lint
… title
The root AGENT step title was being built from the prompt content
("claude-agent-sdk: Say the word 'banana'..."), making the trace
sidebar in Openlayer noisy and inconsistent across runs. Use the
stable name "Claude Agent SDK query" instead. The prompt content is
still captured in root_step.inputs.prompt where it belongs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nts on root metadata
The root AGENT step was missing the user-provided system prompt and
subagent definitions that drove the run. Spec called for both, but the
initial implementation only captured the SDK's runtime-resolved
agent_config from SystemMessage(init), not the user's input options.
Capture, on the root step's metadata:
- system_prompt (truncated to 4096 chars; supports string, preset
dict, and SystemPromptFile dataclass shapes)
- agents_defined: { name -> { description, prompt, tools, model } }
- options: { model, fallback_model, max_turns, max_budget_usd,
permission_mode, cwd, allowed_tools, disallowed_tools,
continue_conversation, resume, fork_session }
For ClaudeSDKClient, stash the original options at patched_init() time
so they're available to patched_query() before our hook injection
mutates them.
New test: test_options_metadata_captured_on_root_step.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
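The system-prompt capture described in the commit above might be normalized along these lines. This is a sketch under assumptions: `normalize_system_prompt` is a hypothetical helper, `PromptFile` is a stand-in for the SDK's SystemPromptFile dataclass (whose real fields are not shown in this PR), and serializing non-string shapes as JSON is an illustrative choice.

```python
import dataclasses
import json

MAX_SYSTEM_PROMPT_CHARS = 4096  # truncation limit named in the commit message


@dataclasses.dataclass
class PromptFile:
    """Hypothetical stand-in for the SDK's SystemPromptFile dataclass."""

    path: str


def normalize_system_prompt(value):
    """Turn a string, preset dict, or dataclass into a truncated string."""
    if value is None:
        return None
    if isinstance(value, str):
        text = value
    elif dataclasses.is_dataclass(value) and not isinstance(value, type):
        text = json.dumps(dataclasses.asdict(value), default=str)
    elif isinstance(value, dict):
        text = json.dumps(value, default=str)
    else:
        text = str(value)  # unknown shape: fall back to its string form
    return text[:MAX_SYSTEM_PROMPT_CHARS]
```

Truncating after serialization keeps the metadata size bounded regardless of which shape the user passed.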
…ns to exercise options-metadata capture
The live test now passes system_prompt and max_turns so the published Openlayer trace surfaces the new metadata captured on the root step (system_prompt, options.max_turns), proving end-to-end that the wrapper captures the user's configuration in addition to the SDK's runtime-resolved agent_config.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ase token aliases on every step
Previously, the trace published to Openlayer was missing key details:
- Assistant turns with empty text content (e.g. thinking-only turns or
pure tool-call turns) appeared blank in the UI. Fall back to a tool
call summary, thinking text, or '[no content]' marker so reviewers
always see something useful.
- Top-level assistant turns had no 'inputs' set, so the UI couldn't
show what prompt triggered them. Surface the user's prompt as the
step input for top-level turns (subagent turns are driven by their
parent's Agent tool call, not a user prompt).
- The raw assistant message content was never serialized. Set the
ChatCompletionStep's raw_output field with the full block array so
reviewers can inspect every text/thinking/tool_use block.
- ToolUseBlocks were captured by id only; widen to { id, name, input }
so tool calls are inspectable from the assistant turn.
- Root step had no rawOutput surface; stash a JSON-serialized
ResultMessage in metadata.rawOutput.
- promptTokens / completionTokens were stuck in metadata under
snake_case keys that the trace UI doesn't render. Add camelCase
aliases so the Metrics 'Prompt' and 'Completion' boxes populate.
- Capture state.model from SystemMessage(init) and surface it on the
root step so reviewers see the resolved model.
- Capture state.user_prompt so assistant turn steps can show the
triggering prompt as their input.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
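Two of the fixes in the commit above, the empty-output fallback and the camelCase token aliases, can be sketched as small pure functions. The function names, the exact summary format, and the fallback order (tool calls, then thinking, then the marker) are assumptions based on the wording of the commit message.

```python
def display_output(text, tool_calls, thinking):
    """Pick something readable to show when an assistant turn has no text."""
    if text.strip():
        return text
    if tool_calls:
        names = ", ".join(call["name"] for call in tool_calls)
        return f"[tool calls: {names}]"  # summary format is illustrative
    if thinking.strip():
        return thinking
    return "[no content]"


def with_camel_case_aliases(metadata):
    """Copy metadata and add camelCase aliases next to the snake_case token keys."""
    aliases = {
        "prompt_tokens": "promptTokens",
        "completion_tokens": "completionTokens",
    }
    out = dict(metadata)
    for snake, camel in aliases.items():
        if snake in metadata:
            out[camel] = metadata[snake]
    return out
```

The aliases are added rather than renamed, so anything already reading the snake_case keys keeps working.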
Adds a richer example that exercises every step type the Openlayer
wrapper captures, so users can see a full trace tree:
- Root AGENT step with system_prompt, agent_config, agents_defined,
options, rawOutput.
- Per-turn CHAT_COMPLETION steps with prompt/completion tokens,
thinking blocks, tool_calls, raw_output.
- TOOL steps for:
- mcp__file-stats__count_files (custom in-process MCP tool)
- Glob and Read (built-in)
- Agent (twice: code-reviewer and summary-writer subagents)
- Nested subagent steps correlated via parent_tool_use_id.
The 'codebase analyzer' scenario walks the agent through three steps:
count files by extension, dispatch a code-reviewer subagent to review
one file, then dispatch a summary-writer subagent to wrap up. Result
is a 4-line markdown report.
Verified end-to-end against the live API as a plain Python script
(5 turns, real subagent dispatch, real Openlayer trace upload).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
First-party Openlayer tracing for the Claude Agent SDK (`claude-agent-sdk` on PyPI).
- `trace_claude_agent_sdk()` one-shot init monkey-patches `claude_agent_sdk.query` and `ClaudeSDKClient` so every subsequent call auto-traces, no per-call code change required.
- `traced_query()` per-call wrapper for explicit scoping.
- Hooks are composed with user-provided hooks (`PreToolUse`/`PostToolUse`/`PostToolUseFailure`); we never replace.
- `AGENT` step per `query()`, nested `CHAT_COMPLETION` per assistant turn, `TOOL` per tool call. Subagents nest via `parent_tool_use_id`. MCP tools parsed (`mcp__server__tool`). Cost/tokens/duration/`session_id`/`agent_config` on root. Stream is a pure observer: identical messages in identical order.

Companion TypeScript PR at openlayer-ai/openlayer-ts#208 (OPEN-10635).
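For the MCP tool-name parsing mentioned in the summary, a minimal sketch might look like this. `parse_mcp_tool_name` is a hypothetical helper, and splitting on the first `__` after the prefix assumes server names never contain a double underscore.

```python
def parse_mcp_tool_name(tool_name):
    """Return (server, tool) for names shaped like mcp__<server>__<tool>, else None."""
    prefix = "mcp__"
    if not tool_name.startswith(prefix):
        return None  # built-in tools such as Glob or Read
    server, sep, tool = tool_name[len(prefix):].partition("__")
    if not sep or not server or not tool:
        return None  # malformed name: leave the metadata untouched
    return server, tool
```

A name such as `mcp__file-stats__count_files` splits into the server `file-stats` and the tool `count_files`, which would map to `metadata.mcp_server` and `metadata.mcp_tool_name`.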
Test plan
- `pytest tests/integrations/test_claude_agent_sdk.py -v`
- Live test (requires `ANTHROPIC_API_KEY`)
- `examples/tracing/claude_agent_sdk/claude_agent_sdk_tracing.ipynb` runs end-to-end
- `ruff` clean on touched files

Plan deviations (documented in individual commits)
- Tests patch `_upload_and_publish_trace`, not `_publish_trace_async` as initially planned.
- `AgentStep` has no `cost`/`tokens` fields; values land in `root_step.metadata` and surface at trace level via `post_process_trace` metadata spreading.
- The tracer exposes no `_start_step_manually`/`_end_step_manually`; used a context-manager-handle fallback.

Closes OPEN-10634.
🤖 Generated with Claude Code