fix(cua): correct Gemini key/drag arg names + safer session cleanup#155
Conversation
- Gemini key_combination reads 'keys' (was: 'key_combination'), matching the standalone gemini-computer-use template and Gemini's actual schema. Without this, all keyboard shortcuts silently fail. - Gemini drag_and_drop reads x/y + destination_x/destination_y (was: start_x/start_y/end_x/end_y), matching Gemini's schema. Without this, every drag resolves to (0,0)->(0,0). - Session.stop() resets state inside finally so a thrown replay/delete error doesn't leave _sessionId set, which would otherwise cause a follow-up stop() in the caller's error path to delete the already- destroyed browser and mask the original error. Applied symmetrically in TS and Python. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Firetiger deploy monitoring skipped This PR didn't match the auto-monitor filter configured on your GitHub connection:
Reason: PR modifies CUA (Cursor UI Agent) template code, not kernel API endpoints or Temporal workflows in packages/api/ To monitor this PR anyway, reply with |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 1c6b554. Configure here.
| const info = this.info; | ||
|
|
||
| if (this._sessionId) { | ||
| const sessionId = this._sessionId; |
There was a problem hiding this comment.
Second stop() call throws instead of being no-op
Medium Severity
The stop() method calls this.info (TS) / self.info (Python) unconditionally at the top, before the if (this._sessionId) guard. The info getter delegates to the sessionId getter, which throws when _sessionId is null. After the newly-added state reset in finally sets _sessionId to null, a follow-up stop() call from the caller's error path will throw "Session not started" instead of being the safe no-op described in the comment and PR description. The if guard that makes it a no-op is never reached.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 1c6b554. Configure here.
## Summary Stacks on top of the unified CUA template branch. Addresses the new High-Severity Cursor Bugbot finding on PR #143's latest scan: `stop()` throws on repeated call instead of being a no-op. ## Background PR #155 moved the session-state reset (`_sessionId = null`, …) into the `finally` so that a thrown replay-stop or browser-delete error wouldn't leave stale state behind. That fix was correct, but it exposed a latent bug: `stop()` opens with `const info = this.info` (TS) / `info = self.info` (Py), and the `info` getter delegates to the `sessionId` property, which raises when `_sessionId` is null. So once PR #155 reliably cleared `_sessionId` on the first call, the caller's error-path retry would hit the throwing getter and mask the original exception — exactly the failure mode PR #155 was meant to prevent. ## Fix `stop()` now: 1. Short-circuits at the top with a sentinel `SessionInfo` when no session is active — never touches the throwing getter. 2. Builds the live-session `info` from local fields directly so the body never depends on `this.info` / `self.info` either. Symmetric in TS (`session.ts`) and Python (`session.py`). ## Test plan - [x] `go build ./...` - [x] `go test ./pkg/create/...` - [ ] Force a replay-stop failure (e.g. delete the replay out-of-band, or pass an invalid replay id) and confirm calling `session.stop()` twice from the caller's error path no longer raises and the original error surfaces. <!-- CURSOR_SUMMARY --> --- > [!NOTE] > **Medium Risk** > Touches session shutdown/cleanup logic in both Python and TypeScript templates; while small, mistakes could lead to leaked browser sessions or masked errors during teardown. > > **Overview** > `KernelBrowserSession.stop()` in both `pkg/templates/python/cua/session.py` and `pkg/templates/typescript/cua/session.ts` is updated to be **idempotent**. > > It now short-circuits when no active session exists (returning a sentinel `SessionInfo` without reading `info`/`sessionId`), and builds the returned `SessionInfo` directly from internal fields so teardown can’t fail due to accessing a getter after state has been cleared. > > <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit db0bdea. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup> <!-- /CURSOR_SUMMARY --> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>


Summary
Stacks on top of the unified CUA template branch. Addresses three of the outstanding Cursor Bugbot findings on PR #143.
Gemini argument-name mismatches
Both bugs cause the affected actions to be silently/structurally broken — fixed in TS and Python in lockstep with the standalone
gemini-computer-usetemplates.key_combinationread fromargs.key_combination→ now readsargs.keys. Gemini's schema sends the combo inkeys(a single+-joined string), so the previous code always saw an empty combo and silently dropped every shortcut.drag_and_dropread fromstart_x / start_y / end_x / end_y→ now readsx / y / destination_x / destination_y. Gemini's schema uses the latter; the previous names always resolved to0, so every drag went(0,0) -> (0,0).Session cleanup ordering (TS + Python)
session.stop()previously placed the state-reset (_sessionId = null, …) after thetry / finallythat performs replay-stop + browser-delete. IfstopReplay()threw, thefinallydeleted the browser, the exception then propagated past the cleanup lines, and a follow-upstop()from the caller's error path would attempt to delete the already-destroyed session — masking the original error.Moved the state-reset into the
finally, so a secondstop()is a safe no-op regardless of how the first attempt unwound.Bugbot findings I checked but did not change
These are flagged on the PR but already fixed in the current branch (probably resolved between scans):
getSystemPrompt()function that readsnew Date()per task.session.pyusingbrowsers.delete— already usesbrowsers.delete_by_id.screenshotaction — already returns[]forscreenshot.inline_dataPartper response whenresult["screenshot"]is set.PREDEFINED_ACTIONSunused — leftover constant; harmless. Left as-is to keep the diff focused.Test plan
go build ./...go test ./pkg/create/...Note
Medium Risk
Moderate risk: changes input parsing for Gemini actions and alters session teardown ordering, which could affect browser lifecycle and replay cleanup if edge cases were missed.
Overview
Fixes Gemini CUA action argument mismatches in both TS and Python so
key_combinationreads thekeys"+"-joined string anddrag_and_dropusesx/yanddestination_x/destination_y.Makes
KernelBrowserSession.stop()safer by resetting session/replay state inside thefinallyblock and deleting the browser using a capturedsessionId, preventing double-deletes when replay stopping throws.Reviewed by Cursor Bugbot for commit 1c6b554. Bugbot is set up for automated code reviews on this repo. Configure here.