fix(orchestrator): capture sidecar stderr + raise default boot timeout (closes #1) #2
Open
evannadeau wants to merge 1 commit into SpawnBox-dev:main from
(closes SpawnBox-dev#1)

Three changes to mcp/server.ts make the sidecar boot path self-diagnosing on slow connections and configurable for users who need more time:

1. Capture spawned-sidecar stderr to <pluginRoot>/.sidecar.log instead of stdio: "ignore". The file is opened (truncated) once per startSidecar call and shared across all uvx/python/python3 fallback attempts. Falls back to "ignore" if openSync throws, preserving prior behavior on open failure.
2. Raise the default uvx boot timeout from 60s to 180s; expose an ORCH_SIDECAR_BOOT_TIMEOUT_MS env override applied to all spawn attempts. This eliminates the residential-broadband failure where downloading ~2 GB of bge-m3 model + onnxruntime wheels exceeds the prior budget. Fast-link users see no behavior change (boot still completes well inside 180s on gigabit).
3. system_status and install_embeddings now reference the sidecar log path and, when the bge-m3 model isn't yet cached, suggest the timeout env var. install_embeddings(check) adds a "bge-m3 model cache: present / not yet downloaded" line that disambiguates first-run from broken-run.

.gitignore picks up .sidecar.log. dist/server.js rebuilt via `bun run build`.

bun run typecheck: clean. bun test: 330 pass / 1 fail (pre-existing: the hooks.test.ts session-activity nudge text mismatch reproduces against upstream/main without this patch).
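The spawn path described above can be sketched roughly as follows. This is a minimal, non-authoritative sketch assuming Node/Bun's `node:child_process` and `node:fs`: the names `trySpawn` and `startSidecar` come from the PR, but their signatures, the `SpawnAttempt` shape, the `waitReady` health-check callback (standing in for the real port-file check), and the rule that empty or non-numeric env values fall back to the default are all hypothetical.

```typescript
import { spawn, type ChildProcess, type StdioOptions } from "node:child_process";
import { closeSync, openSync } from "node:fs";
import { join } from "node:path";

// Defaults as stated in the PR: 180 s for uvx, unchanged 30 s for python/python3.
export const UVX_DEFAULT_TIMEOUT_MS = 180_000;
export const PYTHON_DEFAULT_TIMEOUT_MS = 30_000;

// ORCH_SIDECAR_BOOT_TIMEOUT_MS, when set, overrides the default for every attempt.
// Assumption: empty, non-numeric, or non-positive values fall back to the default.
export function bootTimeoutMs(
  defaultMs: number,
  env: Record<string, string | undefined> = process.env,
): number {
  const raw = env["ORCH_SIDECAR_BOOT_TIMEOUT_MS"]?.trim();
  if (!raw) return defaultMs;
  const parsed = Number(raw);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : defaultMs;
}

// Open <pluginRoot>/.sidecar.log once, truncating it ("w").
// If the open throws, return "ignore" so spawning behaves as it did before.
export function openSidecarLog(pluginRoot: string): number | "ignore" {
  try {
    return openSync(join(pluginRoot, ".sidecar.log"), "w");
  } catch {
    return "ignore";
  }
}

// Hypothetical trySpawn: stdin and stdout ignored, stderr written to the shared log.
function trySpawn(cmd: string, args: string[], log: number | "ignore"): ChildProcess {
  const stdio: StdioOptions = ["ignore", "ignore", log];
  return spawn(cmd, args, { stdio });
}

export interface SpawnAttempt {
  cmd: string; // e.g. "uvx", "python", "python3"
  args: string[]; // launch arguments (not shown in the PR)
  defaultTimeoutMs: number;
}

// Hypothetical startSidecar: tries each attempt in order, all sharing one log FD.
export async function startSidecar(
  pluginRoot: string,
  attempts: SpawnAttempt[],
  waitReady: (child: ChildProcess, timeoutMs: number) => Promise<boolean>,
): Promise<ChildProcess | null> {
  const log = openSidecarLog(pluginRoot);
  try {
    for (const { cmd, args, defaultTimeoutMs } of attempts) {
      const child = trySpawn(cmd, args, log);
      // A missing binary emits "error" (e.g. ENOENT); swallow it so the loop can fall back.
      child.on("error", () => {});
      if (await waitReady(child, bootTimeoutMs(defaultTimeoutMs))) return child;
      child.kill();
    }
    return null;
  } finally {
    // Each child got its own duplicate of the FD at spawn time,
    // so closing the parent's copy does not cut off the child's stderr.
    if (typeof log === "number") closeSync(log);
  }
}
```

Callers would pass the attempts in uvx → python → python3 order, using `UVX_DEFAULT_TIMEOUT_MS` for the first and `PYTHON_DEFAULT_TIMEOUT_MS` for the fallbacks. Opening the log once outside the loop is what lets every fallback attempt append to the same file.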
Closes #1.
Three changes against `mcp/server.ts`, plus the rebuilt `dist/server.js`:

1. Sidecar stderr → `<pluginRoot>/.sidecar.log` (truncated each boot). `trySpawn` accepts a log FD, replacing `stdio: "ignore"`. The FD is opened once in `startSidecar` and shared across all uvx → python → python3 fallback attempts. Falls back to `"ignore"` if `openSync` throws, preserving prior behavior on open failure.
2. `ORCH_SIDECAR_BOOT_TIMEOUT_MS` env override + uvx default 60 s → 180 s. Eliminates the residential-link failure described in #1 ("orchestrator: silent sidecar boot failure on first install_embeddings — 60s timeout exceeded by cold-start downloads"). Python/python3 fallback timeouts honor the env var, but their default stays at 30 s (on the assumption that uvx has already cached the dependencies).
3. `system_status` and `install_embeddings` now reference the log path and, when the bge-m3 model isn't yet cached, suggest the timeout env var. `install_embeddings(check)` adds a `bge-m3 model cache: present (~10s boot expected) | not yet downloaded (~2 GB on first boot)` line that disambiguates first-run from broken-run.

`.gitignore` picks up `.sidecar.log`. `dist/server.js` rebuilt via `bun run build`.

Tested
- `bun run typecheck`: clean.
- `bun test`: 330 pass / 1 fail. The single failure is in `tests/hooks/hooks.test.ts` (session-activity-nudge text mismatch) and reproduces against `upstream/main` without this patch, so it is pre-existing, not introduced here.
- `lstart` + HF cache blob mtimes:
  - `10:00:41`: cold sidecar boot started (HF cache empty)
  - `10:01:54`: 2.27 GB `model.onnx_data` finished downloading (73 s in)
  - `~10:02:02`: model ready, port file written, sidecar healthy (~80 s total)
  - `10:01:41`, mid-download.
- `.sidecar.log` captured the previously discarded boot diagnostics (download progress, HF rate-limit warning, ONNX load, port write, "Listening on …").
- `install_embeddings(check)` with the cache populated returned the expected `bge-m3 model cache: present (~10s boot expected)` line.

Known minor limitation
`.sidecar.log` opens in `"w"` mode per `startSidecar` call, so a second `startSidecar` (e.g. on `install_embeddings(install)` after an automatic session-start spawn) truncates the prior attempt's log. This is strictly better than `stdio: "ignore"`, but consider switching to `"a"` (append) or rotate-on-open if maintainers want multi-attempt forensics. Happy to follow up.

Not changed
`install_embeddings(install)` failure output, which is where users hit it.
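The append and rotate-on-open options raised under Known minor limitation could look roughly like this. It is a sketch only, assuming `node:fs`: the function names `openSidecarLogAppend` and `openSidecarLogRotating`, the per-attempt separator format, and the `.sidecar.log.1` backup name are hypothetical and not part of this patch. Both keep the existing fallback to `"ignore"` when the log cannot be opened.

```typescript
import { existsSync, openSync, renameSync, writeSync } from "node:fs";
import { join } from "node:path";

// Hypothetical append variant: every startSidecar call adds to the same file,
// with a timestamped separator so individual attempts stay distinguishable.
export function openSidecarLogAppend(pluginRoot: string, now: Date = new Date()): number | "ignore" {
  let fd: number;
  try {
    fd = openSync(join(pluginRoot, ".sidecar.log"), "a");
  } catch {
    return "ignore"; // same fallback as the truncating version
  }
  try {
    writeSync(fd, `--- startSidecar ${now.toISOString()} ---\n`);
  } catch {
    // A failed header write is not fatal; the child can still write to the FD.
  }
  return fd;
}

// Hypothetical rotate-on-open variant: keep exactly one previous log as .sidecar.log.1,
// then truncate the current file as the patch already does.
export function openSidecarLogRotating(pluginRoot: string): number | "ignore" {
  const current = join(pluginRoot, ".sidecar.log");
  const previous = `${current}.1`;
  try {
    if (existsSync(current)) {
      renameSync(current, previous); // overwrites any older backup
    }
    return openSync(current, "w");
  } catch {
    return "ignore";
  }
}
```

Append mode preserves every attempt in one file at the cost of unbounded growth; rotation bounds disk use but keeps only the immediately preceding attempt, which covers the session-start spawn followed by `install_embeddings(install)` case.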