The open-source MultiAgentOps evaluation and verification harness for business workflows in any industry.
Updated May 14, 2026 - Python
🔍 AI observability skill for Claude Code. Debug LangChain/LangGraph agents by fetching execution traces from LangSmith Studio directly in your terminal.
Local open-source dev tool to debug, secure, and evaluate LLM agents. Provides static analysis, dynamic security checks, and runtime monitoring - integrates with Cursor and Claude Code.
Cut your OpenClaw / ZeroClaw token bill. Find which model earns its cost. Prove whether optimizations actually work. Local, no upload.
Local replay debugger for Browser Use failures with screenshots, model I/O, failed-step timelines, and public-safe HTML exports.
Diagnose your AI agents in production. Extract policies from prompts, evaluate traces, generate diagnostic reports.
A real-time observability and debugging layer for AI agents.
Visual debugging, tracing, and replay for agent workflows.
Explain why your agent failed — root-cause debugging, memory attribution, and run divergence for LLM agents.
ChainWatch is a flight data recorder for multi-step AI systems. It's a CLI-based tool that records every step in an AI decision chain, links them together in order, prevents tampering, and allows you to verify the chain's integrity and replay the full decision flow.
🔍 A beautiful web viewer for AI agent session files. Browse Claude Code & OpenClaw conversations with chat-style UI, timeline visualization, and zero setup.
RunLens helps teams compare and debug AI agent runs with step timelines, run diffs, and cost analysis.
Enforce communication discipline & execution hygiene for agent teams. Detect loops, route violations, stale work, and missing ownership.
Local recorder and replay verifier for AI-agent command runs.
Preprint paper package — Agent Trajectory Replay for Debugging Tool-Using AI Workflow Regressions (Zenodo DOI 10.5281/zenodo.20073574)
TDD for AI agents — watch world state morph step-by-step. Drop-in for Vercel AI SDK / Anthropic SDK / LangChain. Scrubbable trajectories + bulk grid view.
Android Agent Reliability Runtime: a debugging and safety runtime for mobile GUI agents. Detect readiness, block unsafe actions, verify progress, diagnose failures, and save reproducible traces.
MCP/tool call flight recorder | transparent STDIO proxy that logs every AI agent tool call for inspection, debugging, and research
Free self-serve diagnostic for AI coding agents (Claude Code, Cursor, Aider, Codex, custom Agent SDK). 32-rule library detects silent failures, deadlocks, runaway cost, prompt injection, hallucinated tool calls, frozen state, infinite loops, eval drift. Built by an autonomous AI agent.
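Several entries above describe "flight recorder" tools that link recorded steps in order and let you verify the record afterwards (ChainWatch is described this way). One common way to get that property is a hash chain, where each record stores the hash of the previous one. The sketch below is illustrative only: it is not taken from any of the listed projects, and the names `append_step`, `verify_chain`, and `GENESIS` are hypothetical. A hash chain makes edits detectable; it does not by itself stop someone from rewriting the whole log.

```python
import hashlib
import json

# Hypothetical placeholder used as the "previous hash" of the first record.
GENESIS = "0" * 64


def _digest(step: dict, prev_hash: str) -> str:
    # Serialize with sorted keys so the same step always produces the same hash.
    payload = json.dumps({"step": step, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_step(chain: list, step: dict) -> None:
    """Append a step, linking it to the hash of the previous record."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"step": step, "prev": prev_hash, "hash": _digest(step, prev_hash)})


def verify_chain(chain: list) -> bool:
    """Recompute every hash in order; any edited or reordered record fails."""
    prev_hash = GENESIS
    for record in chain:
        if record["prev"] != prev_hash:
            return False
        if record["hash"] != _digest(record["step"], prev_hash):
            return False
        prev_hash = record["hash"]
    return True
```

Replaying a run under this scheme amounts to iterating over the records in order after `verify_chain` returns `True`.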