explorer: interim recall fix — search description (#168 Direction A, knowingly slower)#177
Conversation
…rection A) Knowingly accepts the latency regression measured in the isamplesorg#167 baseline to fix false-zero results: queries like 'pottery Cyprus' returned 0 results in the previous live, even though ~7,124 samples actually match (Cyprus appears only in description, which samples_map_lite doesn't carry). 'Searching but broken' is worse UX than 'searching slowly but correctly.' This is **interim**, not the future search backend: - The latency regression (cold pottery 8.7s → 12.0s; multi-term 5.1s → 14.8s) violates the locked thresholds in isamplesorg#168. - The proper substrate work in isamplesorg#169-isamplesorg#172 fixes both recall AND latency via a sample-centric document projection with hash- partitioned BM25 indexes. - This code path goes away when isamplesorg#171 lands. isamplesorg#170 is unblocked by this PR. Implementation: - doSearch() swapped from samples_map_lite to sample_facets_v2 for search; CTE-then-keyed-join shape (NOT naive LEFT JOIN — native benchmarks showed 8x penalty for the naive form on `pottery`, enough to time out in browser). - Field weights: label=3, place_name=2, description=1. - query-spec.qmd updated to honestly describe the interim state. - Search-help UI updated: cold-search latency caveat, links to isamplesorg#169 as the path that lifts both gaps. - Placeholder restored to `pottery Cyprus` — a query that *now works*. Refs isamplesorg#165, isamplesorg#167, isamplesorg#168, isamplesorg#169. Closes isamplesorg#168. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Local-preview verification (per Codex review residual ask)Ran against the post-merge local preview. Codex's two eyeball checks both clean: `?perf=1` panel updates per search ✓After running `pottery Cyprus` in the search bar, the panel grew from 8 boot-time rows to 9, and row 9 reads: ``` The per-search append code in `doSearch()` finally block (#173 round-2 fix) wires through. Subsequent searches add rows. Search-help line reads cleanly ✓DOM-extracted text:
Visible (helpVisible=true; bounding-box height > 0). Placeholder shows the new `pottery Cyprus` example. Recall recovery confirmed ✓`pottery Cyprus` returns 50 results (was 0 in the pre-#177 baseline). Camera flies to first result at 34.99, 33.71 — Cyprus coordinates. Cold elapsed 11.69 s — matches the #167 baseline measurement (12 s for multi-term) within noise. The latency cost is real and was knowingly accepted in the PR framing. Ready to merge. |
…subset (isamplesorg#179) Run after the round-2 SQL fix (commit cc79ec0). All 10 canonical queries pass cleanly in 4m6s including the new area-scope case. Highlights: - single-common (pottery): 10.5s cold, 4.6s warm, 50 results - multi-term (pottery Cyprus): 10.0s cold, 4.5s warm, 50 results (was 0 before isamplesorg#177 Direction A) - diacritic (Çatalhöyük): 13.2s cold, 4.9s warm, 50 results (was 0 before isamplesorg#177) - area-scope (pottery × Cyprus camera): 10.5s cold, 4.2s warm, 50 results — confirms the round-2 fix (was 0 before cc79ec0) - composed-source / composed-source-material: ~6s cold, faster because the source filter dramatically reduces the candidate set Latency profile: 10-13s cold, 4-5s warm. Within the same envelope as the pre-area-scope baseline; the new SQL doesn't materially change cold/warm timings vs the world path. field_subset string in test + JSON was stale (still said "label+place_name samples_map_lite") — landed in test edit that was abandoned with the honesty-fix branch when Direction B shipped first. Corrected now. Refs isamplesorg#167, isamplesorg#168, isamplesorg#178, isamplesorg#179. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* explorer: two-button search semantics (#178 Light path) Per Hana's mockup discussed in 2026-05-08 tech call: replace the single 'Search' button with two scope buttons. Light extension of option C — same backend, viewport-scoped variant adds an outer-query lat/lng BETWEEN predicate. UI: - '.search-bar' loses the inline button (input only). - New '.search-actions' row below the input: 'Search Selected Areas' (orange #ef6c00) and 'Search Entire World' (blue #1565c0). Match the mockup's color/intent coding. - Search-help line unchanged (still warns about cold-search latency). Backend (explorer.qmd doSearch): - doSearch(scope) accepts 'area' or 'world'. - For 'area', computeViewRectangle() → outer-query predicate `AND l.latitude BETWEEN ... AND l.longitude BETWEEN ...`. Dateline-crossing handled by splitting longitude into two ranges when west > east. - The viewport predicate goes on the OUTER query (post-join), not the inner CTE, because lat/lng live in samples_map_lite, not sample_facets_v2. Implication: area-scoped searches can return < 50 results when the inner top-50 don't all satisfy viewport — users widen by panning. Acceptable v1 behavior. - Auto-fly to first result suppressed for area-scope (the user is already at the area they care about; flying would zoom in and disorient). URL state: - New ?search_scope=area|world param. Default 'world' (omitted from URL). Hydrated on boot from URL; persisted by persistSearchScope() (separate from writeQueryState which doesn't know about scope). - Enter key uses the last-clicked scope (or URL-hydrated scope on cold boot, defaulting to world). Instrumentation: - isamples.search structured log gains 'scope' field. - ?perf=1 panel row format: 'search #N <scope>: "<term>" (<count>)'. Tests: - New 'area-scope' canonical query in test_search_perf.py uses url_hash to set the camera before clicking 'Search Selected Areas'. - _run_search takes a scope param routing to #searchAreaBtn or #searchWorldBtn. - _measure_one_query honors query['url_hash'] and query['filters']['scope']. Doc: - EXPLORER_STATE.md §6 gains a 'Light-path addendum' explaining the two-button design as an extension of option C, NOT a revisit of A/B/C. Heavy revisit deferred until #170-#172 land. Verified locally: area click at lat=35,lng=33,alt=2Mm → 0 results (confirmed natively: no top-50 pottery in that rect), camera stays put. World click → 50+ results, camera flies to top-1 (Italy). URL hydration round-trips ?search_scope=area correctly. Closes #178. Refs #163, #165, PR #166, PR #177. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * explorer: fix area-mode false zeroes — viewport before top-50 (#179 review) Codex round-2 review caught that the previous shape applied the viewport predicate AFTER the global top-50 selection. Effect: 'Search Selected Areas' was actually 'current viewport among the global top 50,' not 'top 50 within the current viewport.' For broad terms like `pottery`, the global top-50 happens to all live in one Alaska collection (label='Pottery AM662:...', score=3 each, all at lat=57.7 lng=-152.4); a Cyprus-area query would return 0 even though Cyprus genuinely has 50+ pottery hits. This was the original false-zero problem in disguise. Fix: split into two SQL shapes. - World mode: unchanged. CTE over sample_facets_v2 → top-50 → LEFT JOIN samples_map_lite. Coord-less samples still appear (lat/lng null) since they're legitimate text matches. - Area mode: INNER JOIN samples_map_lite inside the candidate selection, viewport BETWEEN predicate applied BEFORE ORDER BY ... LIMIT 50. Drops coord-less samples (area-scoped search by definition requires coords). Top-50 within area, not within global. Verified natively (TIGHT Cyprus rect lat 30-40 lng 25-40): - Old SQL: 0 of top-50 pass viewport - New SQL: 50 of top-50 Verified in browser at Cyprus camera (lat=35, lng=33, alt=1Mm): 'Search Selected Areas' for `pottery` returns 50+ results, all at the Dead Sea pottery site (31.13, 35.53) — exactly what the user expects. Both SQL shapes use f.-qualified column names so the same searchWhere/score strings work for both. EXPLORER_STATE.md §6 Light-path addendum updated to describe the two shapes and why area mode requires coords. Refs #178, #179. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * tests: refresh perf-smoke baseline with area-scope + corrected field_subset (#179) Run after the round-2 SQL fix (commit cc79ec0). All 10 canonical queries pass cleanly in 4m6s including the new area-scope case. Highlights: - single-common (pottery): 10.5s cold, 4.6s warm, 50 results - multi-term (pottery Cyprus): 10.0s cold, 4.5s warm, 50 results (was 0 before #177 Direction A) - diacritic (Çatalhöyük): 13.2s cold, 4.9s warm, 50 results (was 0 before #177) - area-scope (pottery × Cyprus camera): 10.5s cold, 4.2s warm, 50 results — confirms the round-2 fix (was 0 before cc79ec0) - composed-source / composed-source-material: ~6s cold, faster because the source filter dramatically reduces the candidate set Latency profile: 10-13s cold, 4-5s warm. Within the same envelope as the pre-area-scope baseline; the new SQL doesn't materially change cold/warm timings vs the world path. field_subset string in test + JSON was stale (still said "label+place_name samples_map_lite") — landed in test edit that was abandoned with the honesty-fix branch when Direction B shipped first. Corrected now. Refs #167, #168, #178, #179. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary — interim recall fix, NOT the future FTS backend
`doSearch()` swapped from `samples_map_lite.parquet` to `sample_facets_v2.parquet` so live search now covers `label + description + place_name` instead of `label + place_name`. This trades latency for honest recall, knowingly violating the latency thresholds locked in #168.
The previous live search returned zero results for `pottery Cyprus` even though ~7,124 samples actually match (Cyprus appears only in description). The #168 honesty-fix landed in #176 documented this gap; this PR closes it functionally as an interim state while the proper substrate work in #169-#172 builds the real FTS that fixes both recall and latency.
Why this is shipping despite the threshold failure
This is the kind of decision the threshold rule was a guess about, not an answer to. The data flipped the answer.
What this is NOT
What changes
Cold-search caveat
The new `.search-help` UI line warns users that the first cold search can take 10-15 seconds. This is honest. The substrate work fixes it. Users who hit that latency see a "Searching..." indicator (existing behavior) plus the new explanatory line.
A future PR could add an explicit progress spinner or shimmer; out of scope here.
What this unblocks
Test plan
Closes #168 (functionally — the recall gap that PR #176 documented). Refs #165, #167, #169-#172, PR #166, PR #173, PR #175, PR #176.
🤖 Generated with Claude Code