Add unified CUA template with multi-provider fallback by masnwilliams · Pull Request #143 · kernel/cli

masnwilliams · 2026-04-08T18:42:43Z

Summary

Adds a new cua template (TypeScript + Python) that consolidates the separate anthropic-computer-use, openai-computer-use, and gemini-computer-use templates into a single multi-provider template
Provider selection via CUA_PROVIDER env var, with automatic fallback via CUA_FALLBACK_PROVIDERS
Each provider adapter is self-contained with full agent loop implementation
Shared browser session lifecycle with replay recording support
Registered as a new template in templates.go for both TypeScript and Python
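
The selection and fallback behavior described above can be sketched roughly as follows. This is a minimal illustration, not the template's actual code: the names `resolveProviderOrder` and `runWithFallback` are hypothetical, and the comma-separated format for `CUA_FALLBACK_PROVIDERS` is an assumption.

```typescript
// Hypothetical sketch: names and the comma-separated env format are assumptions.
type ProviderName = "anthropic" | "openai" | "gemini";
const KNOWN_PROVIDERS: readonly string[] = ["anthropic", "openai", "gemini"];

function isProviderName(value: string): value is ProviderName {
  return KNOWN_PROVIDERS.includes(value);
}

// Build the ordered list: primary first, then fallbacks, skipping unknown names and duplicates.
function resolveProviderOrder(env: Record<string, string | undefined>): ProviderName[] {
  const primary = (env["CUA_PROVIDER"] ?? "").trim().toLowerCase();
  const fallbacks = (env["CUA_FALLBACK_PROVIDERS"] ?? "")
    .split(",")
    .map((s) => s.trim().toLowerCase())
    .filter((s) => s.length > 0);
  const order: ProviderName[] = [];
  for (const name of [primary, ...fallbacks]) {
    if (isProviderName(name) && !order.includes(name)) order.push(name);
  }
  if (order.length === 0) throw new Error("No valid provider configured");
  return order;
}

type RunTask = (provider: ProviderName, query: string) => Promise<string>;

// Try each provider in order; collect errors so the final failure explains every attempt.
async function runWithFallback(order: ProviderName[], query: string, run: RunTask): Promise<string> {
  const errors: string[] = [];
  for (const provider of order) {
    try {
      return await run(provider, query);
    } catch (err) {
      errors.push(`${provider}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join("; ")}`);
}
```

Passing the environment in as a plain record keeps the sketch testable without touching `process.env`.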

Structure

pkg/templates/{typescript,python}/cua/
  index.ts / main.py          — Kernel app entrypoint
  session.ts / session.py     — Browser session lifecycle
  providers/
    index.ts / __init__.py    — Provider factory + fallback
    anthropic.ts / .py        — Anthropic Claude adapter
    openai.ts / .py           — OpenAI GPT adapter
    gemini.ts / .py           — Google Gemini adapter

Test plan

go build ./... passes
go test ./pkg/create/... passes
kernel create shows "Unified CUA" template for both TS and Python
Deploy TS template with Anthropic provider and run a task
Deploy TS template with OpenAI provider and run a task
Deploy TS template with Gemini provider and run a task
Test fallback by setting an invalid primary key with valid fallback
Repeat above for Python template

🤖 Generated with Claude Code

Note

Medium Risk
Adds a sizable new template with provider-selection/fallback logic and browser session lifecycle/replay handling, which may affect new-user flows and external API integrations. Existing templates are mostly untouched aside from template registry/sorting updates.

Overview
Adds a new cua (“Unified CUA”) template for both TypeScript and Python that runs a computer-use agent against Anthropic/OpenAI/Gemini with CUA_PROVIDER selection and optional CUA_FALLBACK_PROVIDERS automatic fallback.

Registers the new template in pkg/create/templates.go (including deploy/invoke samples and template ordering), and introduces new template projects under pkg/templates/{typescript,python}/cua with provider-specific adapters, a shared browser session manager (including optional replay recording), and accompanying .env.example/README/dependency files.

Reviewed by Cursor Bugbot for commit 51b69fb.

kernel-internal · 2026-04-08T18:46:58Z

🔧 CI Fix Available
I've pushed a fix for the CI failure by adding the missing _gitignore files for the new CUA templates.

Consolidates the separate anthropic-computer-use, openai-computer-use, and gemini-computer-use templates into a single "cua" template that supports all three providers with automatic fallback.

- TypeScript and Python templates with identical structure
- Provider selection via CUA_PROVIDER env var
- Optional fallback chain via CUA_FALLBACK_PROVIDERS
- Shared browser session lifecycle with replay support
- Each provider adapter is self-contained and customizable
- Registered as "cua" template in templates.go

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Provider resolution at module load crashes during Hypeman's build/discovery phase when env vars aren't available. Use lazy initialization so providers are resolved on first invocation instead. Also fix TS type errors: narrow candidate.content in Gemini provider, cast input items in OpenAI provider, simplify computer_call_output construction. Made-with: Cursor
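
The lazy-initialization idea in this commit can be illustrated with a small memoizing wrapper. This is a sketch only: `lazy` and `getProviders` are hypothetical names, and the real template may structure this differently.

```typescript
// Hypothetical helper: runs the factory on first call instead of at import time.
function lazy<T>(factory: () => T): () => T {
  let cached: { value: T } | undefined;
  return () => {
    // If the factory throws, nothing is cached, so the next call retries.
    if (cached === undefined) cached = { value: factory() };
    return cached.value;
  };
}

let resolveCalls = 0;

// Nothing is resolved when this module loads; env vars would be read inside the factory.
const getProviders = lazy(() => {
  resolveCalls += 1;
  return ["anthropic"];
});
```

Because the factory only runs on the first `getProviders()` call, a build or discovery phase that merely imports the module never needs the env vars to be present.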

…odel inputs

- Bump all TS and Python deps to latest versions
- Fix Anthropic computer use: use computer_20251124 with computer-use-2025-11-24 beta flag (claude-sonnet-4-6 requires the newer tool version)
- Fix OpenAI: add missing screenshot action handler
- Fix Python: correct SDK API (kernel.App), fix session.delete call, add missing openai dependency
- Restore provider and model as per-request payload overrides (were dropped in rewrite). Provider uses a typed enum (anthropic | openai | gemini).

Made-with: Cursor

… API, session delete

- Add missing screenshot action handler in Python OpenAI provider
- Use Part.from_function_response() instead of FunctionResponsePart() in Python Gemini provider (pydantic extra_forbidden in google-genai >=1.71)
- Fix session cleanup: use delete_by_id() instead of delete()

Made-with: Cursor

…n Gemini

- TS Anthropic: move SYSTEM_PROMPT to getSystemPrompt() function so the date is computed per-request instead of freezing at module load
- Python Gemini: include screenshot data as inline_data Part alongside function responses so the model can see action results
- Remove unused PREDEFINED_ACTIONS list from Python Gemini

Made-with: Cursor
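
The difference between a module-level constant and a per-request function can be sketched as below. Only the function name `getSystemPrompt` comes from the commit message; the prompt wording and the optional `now` parameter are illustrative assumptions.

```typescript
// A module-level constant like this would freeze the date when the module is first loaded:
//   const SYSTEM_PROMPT = `... The current date is ${new Date().toISOString()} ...`;

// Computing the prompt inside a function re-reads the clock on every call.
// The prompt text here is a placeholder, not the template's real prompt.
function getSystemPrompt(now: Date = new Date()): string {
  const date = now.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  return `You are operating a browser. The current date is ${date}.`;
}
```

Accepting `now` as a parameter is just a convenience for testing; callers would normally invoke `getSystemPrompt()` with no arguments.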

…eout) Add optional `browser` field to CUA payload for per-request browser session configuration. Supports proxy_id, profile (id/name/save_changes), extensions, and timeout_seconds. Viewport and stealth remain deploy-time defaults since CUA providers depend on consistent viewport dimensions. Made-with: Cursor

When session_id is provided in the payload, the CUA task uses that existing browser session directly instead of creating a new one. The caller is responsible for the session lifecycle. This lets users pre-configure browsers with any settings and reuse sessions across tasks. Made-with: Cursor
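
One way to picture the create-versus-reuse decision is the sketch below. The field names (`session_id`, `browser`, `proxy_id`, `profile`, `extensions`, `timeout_seconds`) come from the commit messages, but the exact types, the `planSession` helper, and the choice to ignore `browser` when `session_id` is set are assumptions.

```typescript
// Hypothetical payload shape assembled from the fields named in the commit messages.
interface BrowserOptions {
  proxy_id?: string;
  profile?: { id?: string; name?: string; save_changes?: boolean };
  extensions?: unknown[]; // element shape is not specified in this PR
  timeout_seconds?: number;
}

interface CuaPayload {
  query: string;
  session_id?: string;
  browser?: BrowserOptions;
}

type SessionPlan =
  | { mode: "reuse"; sessionId: string; ownsLifecycle: false }
  | { mode: "create"; options: BrowserOptions; ownsLifecycle: true };

// Decide whether the task attaches to a caller-owned session or creates its own.
function planSession(payload: CuaPayload): SessionPlan {
  if (payload.session_id !== undefined && payload.session_id !== "") {
    // Caller-provided session: the task must not delete it afterwards.
    return { mode: "reuse", sessionId: payload.session_id, ownsLifecycle: false };
  }
  return { mode: "create", options: payload.browser ?? {}, ownsLifecycle: true };
}
```

The `ownsLifecycle` flag makes the cleanup responsibility explicit, so teardown code can skip deleting a browser it did not create.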

When an external session_id is provided, retrieve the browser's real viewport dimensions via browsers.retrieve() instead of hardcoding 1280x800. This ensures coordinate mapping is correct regardless of how the browser was created. Made-with: Cursor

dprevoznik · 2026-04-09T21:29:14Z

+// Shared interface every provider adapter must implement.
+export interface TaskOptions {
+  query: string;
+  model?: string;


Each provider hardcodes model-specific API features that will break if you swap in a different model. Sharing Cursor's analysis, which is pretty well aligned with what I experienced building other templates. I think it would be smart to at least document in the template which models from which providers are compatible. I don't think we need to lock it down to specific models, particularly if we have defaults.

Cursor Reco
The model field should either be removed from the public payload (keep it as an env var only) or the template should validate model compatibility per provider before calling the API. At minimum, the README and input description should warn that only specific computer-use-capable models work. Right now a user seeing that free-text field in the dashboard will absolutely try claude-haiku-3-5 or gpt-4o and get a confusing 400 error.
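
A per-provider compatibility check along the lines suggested here might look like the sketch below. Everything in it is hypothetical: the `checkModel` helper, the result shape, and the allowlist contents. Only `claude-sonnet-4-6` is named in this PR; the OpenAI and Gemini entries are placeholders.

```typescript
type ProviderName = "anthropic" | "openai" | "gemini";

// Hypothetical allowlist. Only claude-sonnet-4-6 appears in this PR;
// the other two entries are placeholders to be filled in by the template author.
const COMPATIBLE_MODELS: Record<ProviderName, readonly string[]> = {
  anthropic: ["claude-sonnet-4-6"],
  openai: ["openai-model-placeholder"],
  gemini: ["gemini-model-placeholder"],
};

interface ModelCheck {
  ok: boolean;
  model: string;
  message?: string;
}

// Returns the provider default when no model is requested, and a readable
// message (instead of an opaque API 400) when the requested model is not listed.
function checkModel(provider: ProviderName, requested?: string): ModelCheck {
  const allowed = COMPATIBLE_MODELS[provider];
  if (requested === undefined || requested === "") {
    return { ok: true, model: allowed[0] ?? "" };
  }
  if (allowed.includes(requested)) {
    return { ok: true, model: requested };
  }
  return {
    ok: false,
    model: requested,
    message: `Model "${requested}" is not a known computer-use model for ${provider}; known: ${allowed.join(", ")}`,
  };
}
```

Returning a result object rather than throwing leaves the policy open: a caller could reject the request, or only log a warning and proceed, which matches the preference above for not locking the field down.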

dprevoznik · 2026-04-09T21:45:46Z

+          break;
+        }
+        case 'scroll_document':
+        case 'scroll_at': {


The Gemini provider in both ts-cua and python-cua is likely not optimized for scroll behavior. In the standalone templates, we have logic in place (though not perfect) to handle the fact that Gemini reports its scroll magnitude in pixels.

Cursor summary of issue:
It's missing the magnitude ÷ 60 pixel-to-notch conversion and the max(1, min(17, ...)) clamp. When Gemini asks to scroll with its default magnitude (~400 pixels), the unified template will fire 400 wheel notches instead of 7. Both TS and Python have the same bug — they're consistent with each other, but both wrong compared to the standalone template.

Getting API exhaustion errors with Gemini right now, but wanted to surface this. I put the ÷ 60 and clamp logic in the standalone templates. It could likely be improved.

The behavior in anthropic and openai providers matches the standalone templates right now.

dprevoznik

Left some comments. All three providers are working on both the TS and Python versions, though the Gemini API exhaustion error meant I couldn't test the scroll logic comment I made.

Prefer PageUp and PageDown in the provider prompts so long-page navigation is more reliable across the unified CUA templates. Made-with: Cursor

Bring in main branch template updates and resolve the registry overlap by keeping both the unified CUA and Tzafon template entries. Made-with: Cursor

socket-security · 2026-04-13T22:16:21Z

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff	Package	Supply Chain Security	Vulnerability	Quality	Maintenance	License
	npm/@onkernel/sdk@0.47.0
	npm/@anthropic-ai/sdk@0.86.1
	npm/openai@6.36.0
	pypi/anthropic@0.99.0

View full report

dprevoznik

Other models look good. Everything working.

Gemini feedback:

The system-prompt-level scroll behavior change makes sense. We may still want to change the default scroll action handler logic as I suggested, since if the model chooses to use a regular scroll it will currently make large jumps. Thoughts?

## Summary

Stacks on top of the unified CUA template branch. Fixes the Gemini scroll handler bug Danny flagged in review.

Gemini's computer-use API reports scroll `magnitude` in **pixels** (default ~400), but `computer.scroll`'s `delta_x` / `delta_y` expects **wheel notches**. The cua adapter was passing `magnitude` through unchanged, so a default Gemini scroll fired ~400 notches instead of ~7.

The standalone `gemini-computer-use` template already does the right thing. This just brings the unified adapter in line:

- default magnitude: `3` → `400` (pixels, matching Gemini's spec)
- divide by `PX_PER_NOTCH` (60) and clamp to `MAX_NOTCHES_PER_ACTION` (17)
- applied symmetrically in TS (`providers/gemini.ts`) and Python (`providers/gemini.py`)

The `anthropic` and `openai` adapters already match their standalone equivalents, so no changes are needed there.

## Test plan

- [ ] `go build ./...` passes (verified locally)
- [ ] `go test ./pkg/create/...` passes (verified locally)
- [ ] Deploy CUA template with Gemini provider, ask it to scroll a long page; confirm scroll distance is page-sized, not catastrophic
- [ ] Repeat for Python template

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
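
The pixel-to-notch conversion described in this fix can be sketched as follows. The constants `PX_PER_NOTCH` (60) and `MAX_NOTCHES_PER_ACTION` (17) and the 400-pixel default come from the description above; the function names, the use of `Math.round`, the direction values, and the sign convention for `delta_x` / `delta_y` are assumptions.

```typescript
const PX_PER_NOTCH = 60;
const MAX_NOTCHES_PER_ACTION = 17;
const DEFAULT_MAGNITUDE_PX = 400;

// Convert a pixel magnitude into wheel notches, clamped to [1, MAX_NOTCHES_PER_ACTION].
// Rounding to the nearest notch is an assumption; the source only says "divide and clamp".
function magnitudeToNotches(magnitudePx: number = DEFAULT_MAGNITUDE_PX): number {
  const notches = Math.round(magnitudePx / PX_PER_NOTCH);
  return Math.max(1, Math.min(MAX_NOTCHES_PER_ACTION, notches));
}

type ScrollDirection = "up" | "down" | "left" | "right";

// Map a direction plus magnitude onto signed deltas.
// Positive delta_y meaning "down" is an assumed convention.
function scrollDeltas(direction: ScrollDirection, magnitudePx?: number): { delta_x: number; delta_y: number } {
  const n = magnitudeToNotches(magnitudePx);
  switch (direction) {
    case "down":
      return { delta_x: 0, delta_y: n };
    case "up":
      return { delta_x: 0, delta_y: -n };
    case "right":
      return { delta_x: n, delta_y: 0 };
    case "left":
      return { delta_x: -n, delta_y: 0 };
  }
}
```

With these constants, the default 400-pixel magnitude maps to 7 notches (400 ÷ 60 ≈ 6.67, rounded), and very large magnitudes are capped at 17.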

…155)

## Summary

Stacks on top of the unified CUA template branch. Addresses three of the outstanding Cursor Bugbot findings on PR #143.

### Gemini argument-name mismatches

Both bugs cause the affected actions to be silently/structurally broken. Fixed in TS and Python in lockstep with the standalone `gemini-computer-use` templates.

- **`key_combination`** read from `args.key_combination` → now reads `args.keys`. Gemini's schema sends the combo in `keys` (a single `+`-joined string), so the previous code always saw an empty combo and silently dropped every shortcut.
- **`drag_and_drop`** read from `start_x / start_y / end_x / end_y` → now reads `x / y / destination_x / destination_y`. Gemini's schema uses the latter; the previous names always resolved to `0`, so every drag went `(0,0) -> (0,0)`.

### Session cleanup ordering (TS + Python)

`session.stop()` previously placed the state-reset (`_sessionId = null`, …) **after** the `try / finally` that performs replay-stop + browser-delete. If `stopReplay()` threw, the `finally` deleted the browser, the exception then propagated past the cleanup lines, and a follow-up `stop()` from the caller's error path would attempt to delete the already-destroyed session, masking the original error. Moved the state-reset into the `finally`, so a second `stop()` is a safe no-op regardless of how the first attempt unwound.

### Bugbot findings I checked but did *not* change

These are flagged on the PR but already fixed in the current branch (probably resolved between scans):

- Anthropic system prompt date freezing: already a `getSystemPrompt()` function that reads `new Date()` per task.
- Python `session.py` using `browsers.delete`: already uses `browsers.delete_by_id`.
- Python OpenAI provider missing `screenshot` action: already returns `[]` for `screenshot`.
- Python Gemini dropping screenshots: already appends an `inline_data` `Part` per response when `result["screenshot"]` is set.
- `PREDEFINED_ACTIONS` unused: leftover constant; harmless. Left as-is to keep the diff focused.

## Test plan

- [x] `go build ./...`
- [x] `go test ./pkg/create/...`
- [ ] Deploy CUA template with Gemini provider, ask it to perform a drag, then a keyboard shortcut (e.g. ctrl+a, ctrl+l); confirm both succeed
- [ ] Force a replay-stop failure (e.g. invalid replay state) and confirm session.stop() can be called twice without crashing

---

> [!NOTE]
> [Cursor Bugbot](https://cursor.com/bugbot) is generating a summary for commit 1c6b554. Configure [here](https://www.cursor.com/dashboard/bugbot).

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
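
The corrected argument names can be illustrated with two small parsers. The field names (`keys`, `x`, `y`, `destination_x`, `destination_y`) come from the description above; the helper names and the choice to return `null` for malformed drags are assumptions rather than the template's actual behavior.

```typescript
type GeminiArgs = Record<string, unknown>;

// Gemini sends the combo as one "+"-joined string in `keys`, e.g. "ctrl+a".
function parseKeyCombination(args: GeminiArgs): string[] {
  const raw = args["keys"];
  const keys = typeof raw === "string" ? raw : "";
  return keys
    .split("+")
    .map((k) => k.trim())
    .filter((k) => k.length > 0);
}

interface Drag {
  from: { x: number; y: number };
  to: { x: number; y: number };
}

function asFiniteNumber(value: unknown): number | null {
  return typeof value === "number" && Number.isFinite(value) ? value : null;
}

// Reads x / y / destination_x / destination_y. Returning null on a missing field
// (rather than defaulting to 0) surfaces the mismatch instead of dragging (0,0) -> (0,0).
function parseDragAndDrop(args: GeminiArgs): Drag | null {
  const x = asFiniteNumber(args["x"]);
  const y = asFiniteNumber(args["y"]);
  const dx = asFiniteNumber(args["destination_x"]);
  const dy = asFiniteNumber(args["destination_y"]);
  if (x === null || y === null || dx === null || dy === null) return null;
  return { from: { x, y }, to: { x: dx, y: dy } };
}
```

Failing loudly on missing fields is a design choice for the sketch: a silent default of `0` is exactly what hid the original bug.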

## Summary

Stacks on top of the unified CUA template branch. Addresses the new High-Severity Cursor Bugbot finding on PR #143's latest scan: `stop()` throws on repeated call instead of being a no-op.

## Background

PR #155 moved the session-state reset (`_sessionId = null`, …) into the `finally` so that a thrown replay-stop or browser-delete error wouldn't leave stale state behind. That fix was correct, but it exposed a latent bug: `stop()` opens with `const info = this.info` (TS) / `info = self.info` (Py), and the `info` getter delegates to the `sessionId` property, which raises when `_sessionId` is null. So once PR #155 reliably cleared `_sessionId` on the first call, the caller's error-path retry would hit the throwing getter and mask the original exception, which is exactly the failure mode PR #155 was meant to prevent.

## Fix

`stop()` now:

1. Short-circuits at the top with a sentinel `SessionInfo` when no session is active, and never touches the throwing getter.
2. Builds the live-session `info` from local fields directly so the body never depends on `this.info` / `self.info` either.

Symmetric in TS (`session.ts`) and Python (`session.py`).

## Test plan

- [x] `go build ./...`
- [x] `go test ./pkg/create/...`
- [ ] Force a replay-stop failure (e.g. delete the replay out-of-band, or pass an invalid replay id) and confirm calling `session.stop()` twice from the caller's error path no longer raises and the original error surfaces.

---

> [!NOTE]
> **Medium Risk**
> Touches session shutdown/cleanup logic in both Python and TypeScript templates; while small, mistakes could lead to leaked browser sessions or masked errors during teardown.
>
> **Overview**
> `KernelBrowserSession.stop()` in both `pkg/templates/python/cua/session.py` and `pkg/templates/typescript/cua/session.ts` is updated to be **idempotent**.
>
> It now short-circuits when no active session exists (returning a sentinel `SessionInfo` without reading `info`/`sessionId`), and builds the returned `SessionInfo` directly from internal fields so teardown can't fail due to accessing a getter after state has been cleared.
>
> Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit db0bdea. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
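
The idempotent `stop()` pattern described above, combined with the state reset in `finally` from PR #155, can be sketched like this. The class is a stand-in, not `KernelBrowserSession`: the injected `stopReplay` / `deleteBrowser` callbacks and the one-field `SessionInfo` shape are assumptions made so the sketch runs without the Kernel SDK.

```typescript
interface SessionInfo {
  sessionId: string | null; // null marks the "no active session" sentinel
}

// Hypothetical stand-in for the template's session class.
class SketchSession {
  private _sessionId: string | null = null;
  private readonly stopReplay: () => Promise<void>;
  private readonly deleteBrowser: (id: string) => Promise<void>;

  constructor(stopReplay: () => Promise<void>, deleteBrowser: (id: string) => Promise<void>) {
    this.stopReplay = stopReplay;
    this.deleteBrowser = deleteBrowser;
  }

  start(id: string): void {
    this._sessionId = id;
  }

  async stop(): Promise<SessionInfo> {
    // 1. Short-circuit: a repeated call returns a sentinel and touches no getter.
    if (this._sessionId === null) return { sessionId: null };

    // 2. Build the result from local fields, not from a getter that may throw later.
    const id = this._sessionId;
    const info: SessionInfo = { sessionId: id };
    try {
      await this.stopReplay();
    } finally {
      try {
        await this.deleteBrowser(id);
      } finally {
        // State reset happens however the teardown unwound.
        this._sessionId = null;
      }
    }
    return info;
  }
}
```

If `stopReplay` rejects, the browser is still deleted and the state is still cleared, so the original error propagates from the first call and a second `stop()` from the caller's error path resolves quietly with the sentinel.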

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Reviewed by Cursor Bugbot for commit c684ca7.

…used openai dep (#157)

## Summary

Two bugbot findings on commit `c684ca7`:

1. **Medium**: Python Gemini provider sent screenshots as a separate `Part(inline_data=...)` entry in the user content after the `FunctionResponse` part. With multiple function calls per turn the model can't bind a screenshot to its originating call. The standalone `python/gemini-computer-use` template and the TS unified template both nest the screenshot as a `FunctionResponsePart` inside `FunctionResponse.parts`. This PR matches that structure and adds the predefined-actions allowlist that gates screenshot inclusion.
2. **Low**: `openai` was listed in `pyproject.toml` but never imported. The OpenAI provider uses raw `httpx` against the Responses API. Removed.

## Test plan

- [ ] Smoke run python cua with Gemini against a multi-call turn and confirm screenshot binds to the originating call
- [ ] `uv sync` after dep change

---

> [!NOTE]
> **Medium Risk**
> Moderate risk because it changes the structure of Gemini tool-call response parts, which could affect how multi-call turns are interpreted by the model or SDK. Dependency removal is low risk but may impact downstream installs if they relied on the extra package.
>
> **Overview**
> **Gemini Python CUA now nests screenshots inside each tool call response.** Instead of sending a standalone `Part(inline_data=...)` after the `FunctionResponse`, screenshots are attached as `FunctionResponse.parts` (as `FunctionResponsePart`/`FunctionResponseBlob`) so multi-call turns can reliably associate images with the correct action; screenshot inclusion is gated by a `PREDEFINED_ACTIONS` allowlist.
>
> **Template deps cleanup.** Removes the unused `openai` dependency from `pyproject.toml`.
>
> Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit ee48a5c. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).

masnwilliams marked this pull request as ready for review April 8, 2026 20:09

masnwilliams force-pushed the hypeship/unified-cua-template branch from 99891de to 73255f9 Compare April 8, 2026 20:16

cursor Bot reviewed Apr 8, 2026

Comment thread pkg/templates/typescript/cua/providers/anthropic.ts

Comment thread pkg/templates/python/cua/providers/gemini.py Outdated

Comment thread pkg/templates/python/cua/session.py Outdated

Comment thread pkg/templates/python/cua/providers/gemini.py Outdated

masnwilliams added 2 commits April 8, 2026 16:32

cursor Bot reviewed Apr 8, 2026

Comment thread pkg/templates/python/cua/providers/openai.py

masnwilliams requested a review from dprevoznik April 8, 2026 22:14

masnwilliams added 3 commits April 8, 2026 21:54

cursor Bot reviewed Apr 9, 2026

Comment thread pkg/templates/typescript/cua/session.ts

dprevoznik reviewed Apr 9, 2026

fix: guide CUA templates toward page scrolling

f9cb9c7

masnwilliams requested a review from dprevoznik April 13, 2026 22:00

cursor Bot reviewed Apr 13, 2026

Comment thread pkg/templates/typescript/cua/providers/gemini.ts

Merge origin/main into hypeship/unified-cua-template

143a6eb

cursor Bot reviewed Apr 13, 2026

Comment thread pkg/templates/typescript/cua/providers/gemini.ts

Comment thread pkg/templates/typescript/cua/providers/gemini.ts

dprevoznik approved these changes Apr 16, 2026

masnwilliams and others added 2 commits May 1, 2026 15:56

Merge branch 'main' into hypeship/unified-cua-template

822ebd0

masnwilliams mentioned this pull request May 2, 2026

fix(cua): correct Gemini key/drag arg names + safer session cleanup #155

Merged

4 tasks

cursor Bot reviewed May 4, 2026

Comment thread pkg/templates/typescript/cua/session.ts

masnwilliams mentioned this pull request May 5, 2026

fix(cua): make session.stop() idempotent on repeated call #156

Merged

3 tasks

cursor Bot reviewed May 5, 2026

Comment thread pkg/templates/python/cua/providers/gemini.py Outdated

Comment thread pkg/templates/python/cua/pyproject.toml Outdated
