Skip to content

fix: raise ResourceNotExistError in delete_sandbox when data plane returns "sandbox not found"#89

Merged
OhYee merged 3 commits intomainfrom
copilot/fix-terminated-sandbox-issue
May 6, 2026
Merged

fix: raise ResourceNotExistError in delete_sandbox when data plane returns "sandbox not found"#89
OhYee merged 3 commits intomainfrom
copilot/fix-terminated-sandbox-issue

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Apr 20, 2026

Control plane (list_sandboxes) and data plane (delete_sandbox) can diverge: a TERMINATED sandbox remains visible in list results, but the data plane has already removed it, causing delete_sandbox to raise ClientError('Failed to stop sandbox: sandbox not found') instead of a recognisable "not found" signal.

Changes

  • agentrun/sandbox/client.py — In both delete_sandbox and delete_sandbox_async, when the data plane returns a non-SUCCESS response with a message containing "sandbox not found", raise ResourceNotExistError("Sandbox", sandbox_id) instead of ClientError. This is consistent with the existing HTTP 404 path and avoids returning an incomplete Sandbox object. Updated docstrings document that ResourceNotExistError covers both HTTP 404 and data-plane business-level not-found. A code comment tracks the long-term goal of switching to a stable server-side error_code (e.g. SandboxNotFound).

  • tests/unittests/sandbox/test_client.py — Updated tests to assert ResourceNotExistError is raised for the data-plane not-found case; added boundary and negative tests: case-insensitive message variants, unrelated failure messages still raise ClientError, and empty message raises ClientError.

Before / After

# Before — raises generic ClientError on already-gone TERMINATED sandbox
client.delete_sandbox("91d970881b8d4dcaa5636df64577545b--18ee62652b")
# ClientError: Failed to stop sandbox: sandbox not found

# After — raises ResourceNotExistError (consistent with HTTP 404 path)
# Callers can implement idempotent deletion with a simple except clause:
try:
    client.delete_sandbox("91d970881b8d4dcaa5636df64577545b--18ee62652b")
except ResourceNotExistError:
    pass  # already gone — idempotent

Copilot AI changed the title [WIP] Fix list_sandboxes returning TERMINATED instances when deleting fix: make delete_sandbox idempotent when data plane returns "sandbox not found" Apr 20, 2026
Copilot AI requested a review from OhYee April 20, 2026 16:08
@OhYee
Copy link
Copy Markdown
Member

OhYee commented May 6, 2026

@copilot 修复方向正确(解决 issue #83 的 TERMINATED 残留场景),但建议对实现做一个关键调整。

问题:当前实现把 not-found 静默吞为成功

if "sandbox not found" in message.lower():
    return Sandbox.model_validate(
        {"sandboxId": sandbox_id}, by_alias=True
    )

这有三个隐患:

  1. 吞掉真错误:用户传错 sandbox_id(笔误、跨账号、复制错)时也会"删除成功",bug 隐藏
  2. 返回的 Sandbox 残缺:只有 sandbox_id,其他字段(status/template_name 等)拿默认值,与正常成功响应 shape 不一致
  3. 行为分裂:HTTP 404 路径已经抛 ResourceNotExistError("Sandbox", id)(见 client.pyexcept ClientError as e: if e.status_code == 404),业务码 not-found 静默成功,同语义两种返回

建议:改抛 ResourceNotExistError,与 HTTP 404 路径统一

代码改动(sync + async 两处)

                 message = result.get("message", "")
                 if "sandbox not found" in message.lower():
-                    return Sandbox.model_validate(
-                        {"sandboxId": sandbox_id}, by_alias=True
-                    )
+                    raise ResourceNotExistError("Sandbox", sandbox_id)
                 raise ClientError(
                     status_code=0,
                     message=(
                         "Failed to stop sandbox:"
                         f" {message or 'Unknown error'}"
                     ),
                 )

调用方写法(依然能实现幂等)

try:
    client.delete_sandbox(sid)
except ResourceNotExistError:
    pass  # 已经不在了,幂等达成

优点

还需要做的事

1. 更新 docstring(sync + async)

Raises: 段补充说明:

Raises:
    ResourceNotExistError: Sandbox 不存在(包括 HTTP 404 与数据面业务层 not-found
        两种情形)。调用方可 catch 此异常实现幂等删除。
    ClientError: 客户端错误
    ServerError: 服务器错误

2. 改造现有测试

test_delete_sandbox_not_found_in_response_is_idempotent 改为断言抛 ResourceNotExistError

def test_delete_sandbox_not_found_in_response_raises_resource_not_exist(...):
    mock_data_api.delete_sandbox.return_value = {
        "code": "FAILED",
        "message": "sandbox not found",
    }
    client = SandboxClient()
    with pytest.raises(ResourceNotExistError):
        client.delete_sandbox("sandbox-123")

async 版同理。

3. 补负样本与边界测试

  • 负样本:message 含 "sandbox not running""sandbox is busy" 等其他文本,应仍抛 ClientError(不该被误吞)
  • 大小写:message 为 "Sandbox NOT FOUND""SANDBOX Not Found" 等变体,验证 .lower() 工作正常
  • 边界:message 为空字符串或字段缺失时,应抛 ClientError(不该匹配)

4. 转 Ready for review

base 已能 cleanly merge 到当前 main(含 #95),无需 rebase;merge 后 71 个 sandbox client 测试本地全过。


关于服务端

字符串匹配 "sandbox not found" in message.lower() 是技术债——服务端文案变更即失效。建议在 PR 描述或 issue #83显式跟踪:"长期方案是服务端返回稳定 error_code(如 SandboxNotFound),SDK 切换到按 code 判断"。目前 SDK 字符串补丁是合理的短期容错。

修复以上 4 项后即可合入。

Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 6, 2026

@copilot 修复方向正确(解决 issue #83 的 TERMINATED 残留场景),但建议对实现做一个关键调整。

问题:当前实现把 not-found 静默吞为成功...

Done in eec1443. Changes made:

  1. client.py (sync + async) — replaced the stub-return with raise ResourceNotExistError("Sandbox", sandbox_id), aligning the data-plane not-found path with the existing HTTP 404 path. Updated docstrings to document both cases under Raises:, and added a comment noting the long-term plan of switching to a stable server-side error_code.

  2. Tests — renamed/rewrote the idempotent tests to assert ResourceNotExistError; added:

    • case-insensitive variant ("Sandbox NOT FOUND"ResourceNotExistError)
    • negative sample ("sandbox is busy"ClientError)
    • empty message edge case (""ClientError)
    • async equivalents for failure/not-found paths

All 11 TestSandboxClientDeleteSandbox tests pass; type check clean (326 files, 0 errors).

Copilot AI changed the title fix: make delete_sandbox idempotent when data plane returns "sandbox not found" fix: raise ResourceNotExistError in delete_sandbox when data plane returns "sandbox not found" May 6, 2026
Copilot finished work on behalf of OhYee May 6, 2026 10:08
@OhYee OhYee marked this pull request as ready for review May 6, 2026 11:07
Copilot AI review requested due to automatic review settings May 6, 2026 11:07
@OhYee OhYee merged commit eee4e8b into main May 6, 2026
3 checks passed
@OhYee OhYee deleted the copilot/fix-terminated-sandbox-issue branch May 6, 2026 11:07
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR makes SandboxClient.delete_sandbox() / delete_sandbox_async() raise ResourceNotExistError("Sandbox", sandbox_id) when the data plane returns a business-level failure containing “sandbox not found”, aligning it with the existing HTTP 404 behavior so callers can implement idempotent deletion reliably.

Changes:

  • Detect data-plane “sandbox not found” failures (case-insensitive) and raise ResourceNotExistError instead of a generic ClientError (sync + async).
  • Update method docstrings to clarify ResourceNotExistError covers both HTTP 404 and data-plane not-found.
  • Extend unit tests to cover not-found message variants and ensure unrelated failures still raise ClientError.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.

File Description
agentrun/sandbox/client.py Adds message-based not-found detection in delete APIs and updates docstrings/error handling.
tests/unittests/sandbox/test_client.py Adds/updates unit tests for data-plane not-found vs other failure messages (sync + async).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 734 to +746
if result.get("code") != "SUCCESS":
message = result.get("message", "")
# 数据面报告 sandbox 不存在时,与 HTTP 404 路径保持一致,
# 统一抛出 ResourceNotExistError,方便调用方幂等处理。
# When the data plane reports sandbox not found, raise
# ResourceNotExistError for consistency with the HTTP 404 path.
# Callers can catch ResourceNotExistError to implement idempotent
# deletion (e.g. when TERMINATED instances still appear in list
# results but have already been removed from the data plane).
# Note: long-term the server should return a stable error_code
# (e.g. SandboxNotFound) so the SDK can match on that instead
# of a message string.
if "sandbox not found" in message.lower():
Comment on lines 734 to +747
if result.get("code") != "SUCCESS":
message = result.get("message", "")
# 数据面报告 sandbox 不存在时,与 HTTP 404 路径保持一致,
# 统一抛出 ResourceNotExistError,方便调用方幂等处理。
# When the data plane reports sandbox not found, raise
# ResourceNotExistError for consistency with the HTTP 404 path.
# Callers can catch ResourceNotExistError to implement idempotent
# deletion (e.g. when TERMINATED instances still appear in list
# results but have already been removed from the data plane).
# Note: long-term the server should return a stable error_code
# (e.g. SandboxNotFound) so the SDK can match on that instead
# of a message string.
if "sandbox not found" in message.lower():
raise ResourceNotExistError("Sandbox", sandbox_id)
Comment on lines 791 to +804
if result.get("code") != "SUCCESS":
message = result.get("message", "")
# 数据面报告 sandbox 不存在时,与 HTTP 404 路径保持一致,
# 统一抛出 ResourceNotExistError,方便调用方幂等处理。
# When the data plane reports sandbox not found, raise
# ResourceNotExistError for consistency with the HTTP 404 path.
# Callers can catch ResourceNotExistError to implement idempotent
# deletion (e.g. when TERMINATED instances still appear in list
# results but have already been removed from the data plane).
# Note: long-term the server should return a stable error_code
# (e.g. SandboxNotFound) so the SDK can match on that instead
# of a message string.
if "sandbox not found" in message.lower():
raise ResourceNotExistError("Sandbox", sandbox_id)
raise ClientError(
status_code=0,
message=(
"Failed to stop sandbox:"
Comment on lines 716 to +724
Returns:
Sandbox: 停止后的 Sandbox 对象

Raises:
ResourceNotExistError: Sandbox 不存在
ResourceNotExistError: Sandbox 不存在(包括 HTTP 404 与数据面业务层
not-found 两种情形)。调用方可 catch 此异常实现幂等删除。
Sandbox does not exist (covers both HTTP 404 and data-plane
business-level not-found). Callers can catch this exception
for idempotent delete logic.
Comment on lines +871 to +884
def test_delete_sandbox_empty_message_raises_client_error(
self, mock_data_api_class, mock_control_api_class
):
"""message 为空时不应误触 not-found 逻辑,应抛 ClientError。"""
mock_data_api = MagicMock()
mock_data_api.delete_sandbox.return_value = {
"code": "FAILED",
"message": "",
}
mock_data_api_class.return_value = mock_data_api

client = SandboxClient()
with pytest.raises(ClientError, match="Failed to stop sandbox"):
client.delete_sandbox("sandbox-123")
Comment on lines 772 to +781

Returns:
Sandbox: 停止后的 Sandbox 对象

Raises:
ResourceNotExistError: Sandbox 不存在
ResourceNotExistError: Sandbox 不存在(包括 HTTP 404 与数据面业务层
not-found 两种情形)。调用方可 catch 此异常实现幂等删除。
Sandbox does not exist (covers both HTTP 404 and data-plane
business-level not-found). Callers can catch this exception
for idempotent delete logic.
raise ClientError(
status_code=0,
message=(
"Failed to stop sandbox:"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

【bug】AgentRun Python SDK 中 list_sandboxes 仍能查到 TERMINATED 实例,但 delete_sandbox 删除同一 sandbox_id 时返回 sandbox not found

3 participants