
feat(tools): queue hosted-key tool calls instead of failing with 429 #4416

Draft
TheodoreSpeaks wants to merge 2 commits into staging from feat/queued-hosted-key
TheodoreSpeaks commented May 3, 2026

Summary

  • Hosted-key tool calls (Sim-provided keys, not BYOK) now enqueue onto a per-workspace+provider FIFO queue. Only the head of the queue consumes from the token bucket — strict ordering, no racing.
  • Different workspaces have independent queues. BYOK paths short-circuit before any of this and are unaffected.
  • Total wait (queue position + bucket refill) is capped at 5 minutes; a caller that exceeds the cap receives the existing 429 result.
  • Crash-tolerant: each ticket has a heartbeat key (TTL 30s, refreshed every 10s while waiting). Dead heads are reaped lazily by the next caller. Queue list TTL is 10 minutes for fully abandoned queues.
  • One Lua script per poll performs the reap, head check, and self-presence check atomically, which keeps Redis traffic low under contention.
  • Bump Exa search hosted RPM from 5 → 60.
  • New telemetry: platform.hosted_key.queue_waited (with queuePosition field) and platform.hosted_key.queue_wait_exceeded.
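
The head-only consume rule from the summary can be illustrated with a small in-memory model. This is only a sketch: the PR stores the queue in Redis, and `TokenBucket`, `FifoGate`, and their methods are hypothetical names used here for illustration, not identifiers from the change.

```typescript
// Hypothetical in-memory model of the queue described above (the PR itself uses Redis).
class TokenBucket {
  private tokens: number
  private readonly capacity: number

  constructor(capacity: number) {
    this.capacity = capacity
    this.tokens = capacity
  }

  // Take one token if any remain.
  tryConsume(): boolean {
    if (this.tokens <= 0) return false
    this.tokens -= 1
    return true
  }

  // Simplified refill: restore the bucket to full capacity.
  refill(): void {
    this.tokens = this.capacity
  }
}

class FifoGate {
  // One queue per workspace+provider pair, so workspaces never block each other.
  private readonly queues = new Map<string, string[]>()

  enqueue(workspaceId: string, provider: string, ticketId: string): number {
    const key = `${workspaceId}:${provider}`
    const queue = this.queues.get(key) ?? []
    queue.push(ticketId)
    this.queues.set(key, queue)
    return queue.length - 1 // zero-based queue position
  }

  // Only the ticket at the head may draw from the bucket; everyone else keeps waiting.
  tryAcquire(workspaceId: string, provider: string, ticketId: string, bucket: TokenBucket): boolean {
    const queue = this.queues.get(`${workspaceId}:${provider}`) ?? []
    if (queue[0] !== ticketId) return false
    if (!bucket.tryConsume()) return false
    queue.shift() // the head leaves the queue once it holds a token
    return true
  }
}
```

In this model a caller behind the head never touches the bucket, which is what removes the racing described in the first bullet.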

Type of Change

  • New feature

Testing

  • 39 hosted-key tests pass (15 queue + 24 rate-limiter, including FIFO ordering, head-only consume, dead-head reap, cap-exceeded, missing-ticket fall-through)
  • 141/141 tests pass across the rate-limiter and tools regression suites
  • Manually verified in dev: queue depth, head rotation, and heartbeat refresh behave as expected, and the drain rate matches the bucket config
  • bun run lint clean
  • bun run check:api-validation:strict passes

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

vercel Bot commented May 3, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped May 5, 2026 0:30am


Replace the per-call distributed lock with a Redis-backed FIFO queue so
callers within a workspace get strict ordering instead of racing the
bucket. Adds heartbeat-based crash recovery and dead-head reaping in a
single Lua script. Bumps Exa search hosted RPM from 5 to 60.
@TheodoreSpeaks
Collaborator Author

@BugBot review


cursor Bot commented May 5, 2026

PR Summary

Medium Risk
Introduces blocking FIFO queueing and wait loops around hosted-key acquisition (Redis + polling/heartbeats) and changes tool retry behavior to re-enter the queue after upstream 429s, which could impact throughput/latency and failure modes under contention or Redis issues.

Overview
Hosted-key acquisition is changed from immediate token-bucket racing/429s to a per-workspace+provider FIFO queue: callers enqueue, wait until they reach the head (with heartbeat refresh), then wait for actor and (custom) dimension capacity up to a 5-minute cap before returning the existing 429-style error.

Adds a new Redis-backed HostedKeyQueue (Lua-based checkHead with dead-head reaping, TTLs, and fail-open behavior when Redis is unavailable) plus new telemetry events platform.hosted_key.queue_waited and platform.hosted_key.queue_wait_exceeded.
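
As a rough illustration of what a `checkHead` poll with dead-head reaping might do, here is an in-memory TypeScript model of the logic. It is a sketch under assumptions: the real implementation is a Lua script running against Redis, and the `HeadCheck` type, the `heartbeatExpiry` map, and the choice never to reap the caller's own ticket are guesses made for this example.

```typescript
// Hypothetical result shape; the PR's actual return values are not shown in this thread.
type HeadCheck =
  | { status: "head" }
  | { status: "waiting"; position: number }
  | { status: "missing" }

// In-memory stand-in for the atomic poll: reap dead heads, then report where we stand.
function checkHead(
  queue: string[],
  heartbeatExpiry: Map<string, number>, // ticketId -> heartbeat expiry time in ms
  ticketId: string,
  nowMs: number
): HeadCheck {
  for (;;) {
    const head = queue[0]
    // Stop at an empty queue or at our own ticket (assumption: callers do not reap themselves).
    if (head === undefined || head === ticketId) break
    const expiry = heartbeatExpiry.get(head)
    // A live heartbeat means the head is still waiting, so the caller stays behind it.
    if (expiry !== undefined && expiry > nowMs) break
    // Missing or expired heartbeat: treat the head as crashed and drop it.
    heartbeatExpiry.delete(head)
    queue.shift()
  }

  const position = queue.indexOf(ticketId)
  if (position === -1) return { status: "missing" } // self-presence check failed
  return position === 0 ? { status: "head" } : { status: "waiting", position }
}
```

Doing all three steps in one call mirrors the single-script-per-poll design: the caller learns its position and cleans up abandoned heads in the same round trip.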

Tool execution now optionally re-acquires a hosted key and retries once after upstream 429 backoff is exhausted, and Exa search hosted RPM is increased from 5 to 60; tests are expanded to cover queue ordering, heartbeat, cap timeouts, and wait-then-succeed flows.
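
The retry behaviour described above might look roughly like the following. This is a sketch only: `executeWithRequeue`, `callUpstream`, `reacquireHostedKey`, and the default backoff delays are hypothetical, and whether the real code retries exactly once after re-acquiring is taken from the summary rather than from the diff.

```typescript
type UpstreamResult = { status: number }

// Hypothetical helper: back off on upstream 429s, then re-enter the hosted-key
// queue and retry one final time if the backoff schedule is exhausted.
async function executeWithRequeue(
  callUpstream: () => Promise<UpstreamResult>,
  reacquireHostedKey: () => Promise<boolean>, // resolves false if the wait cap is exceeded
  backoffDelaysMs: number[] = [250, 500, 1000] // assumed schedule, not from the PR
): Promise<UpstreamResult> {
  let result = await callUpstream()

  for (const delayMs of backoffDelaysMs) {
    if (result.status !== 429) return result
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs))
    result = await callUpstream()
  }
  if (result.status !== 429) return result

  // Backoff exhausted: queue again for a hosted key, then make one last attempt.
  const acquired = await reacquireHostedKey()
  return acquired ? callUpstream() : result
}
```

Because the final attempt goes back through the queue, a sustained upstream 429 adds queue wait time to the call's latency, which is the throughput risk the summary flags.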

Reviewed by Cursor Bugbot for commit 0b80ed3. Bugbot is set up for automated code reviews on this repo. Configure here.


cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


}

const sleepMs = Math.max(MIN_QUEUE_RETRY_DELAY_MS, result.retryAfterMs)
await sleep(sleepMs)

Sleep duration can exceed heartbeat TTL, breaking FIFO

Medium Severity

In waitForActorCapacity and waitForDimensionCapacity, the sleep duration is Math.max(MIN_QUEUE_RETRY_DELAY_MS, result.retryAfterMs). The token bucket's retryAfterMs can be up to ~60 seconds (equal to refillIntervalMs), but TICKET_HEARTBEAT_TTL_SECONDS is only 30 seconds. The heartbeat is refreshed before the sleep, so during a 60-second sleep the heartbeat expires at the 30-second mark. Another caller's checkHead Lua script then reaps the "dead" ticket, allowing a second caller to simultaneously act as queue head — breaking the strict FIFO ordering guarantee that the queue is designed to provide.
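
One possible way to address this, sketched under assumptions, is to split a long sleep into chunks no longer than the heartbeat refresh interval and refresh before each chunk. `planSleepChunks`, `sleepWithHeartbeat`, and the `refreshHeartbeat` callback are hypothetical names; the 30-second TTL and 10-second refresh interval come from the PR description.

```typescript
const TICKET_HEARTBEAT_TTL_MS = 30000 // TICKET_HEARTBEAT_TTL_SECONDS = 30 in the PR
const HEARTBEAT_REFRESH_MS = 10000 // "refreshed every 10s" per the PR summary

// Split a sleep into chunks that each fit comfortably inside the heartbeat TTL.
function planSleepChunks(totalMs: number, maxChunkMs: number = HEARTBEAT_REFRESH_MS): number[] {
  if (maxChunkMs <= 0 || maxChunkMs >= TICKET_HEARTBEAT_TTL_MS) {
    throw new Error("chunk size must be positive and shorter than the heartbeat TTL")
  }
  const chunks: number[] = []
  let remaining = Math.max(0, totalMs)
  while (remaining > 0) {
    const chunk = Math.min(remaining, maxChunkMs)
    chunks.push(chunk)
    remaining -= chunk
  }
  return chunks
}

// Hypothetical replacement for a bare `await sleep(sleepMs)` in the wait loops.
async function sleepWithHeartbeat(
  totalMs: number,
  refreshHeartbeat: () => Promise<void>
): Promise<void> {
  for (const chunkMs of planSleepChunks(totalMs)) {
    await refreshHeartbeat() // keep the ticket alive before every chunk
    await new Promise<void>((resolve) => setTimeout(resolve, chunkMs))
  }
}
```

With these values a 60-second `retryAfterMs` becomes six 10-second chunks, so the heartbeat is never older than about 10 seconds when another caller polls. Capping the sleep at a value below the TTL would be a simpler alternative, at the cost of extra bucket checks.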

Additional Locations (1)
