
explorer: interim recall fix — search description (#168 Direction A, knowingly slower) #177

Merged
rdhyee merged 1 commit into isamplesorg:main from rdhyee:explorer-search-interim-recall on May 8, 2026


rdhyee commented May 8, 2026

Summary — interim recall fix, NOT the future FTS backend

`doSearch()` swapped from `samples_map_lite.parquet` to `sample_facets_v2.parquet` so live search now covers `label + description + place_name` instead of `label + place_name`. This trades latency for honest recall, knowingly violating the latency thresholds locked in #168.

The previous live search returned zero results for `pottery Cyprus` even though ~7,124 samples actually match (Cyprus appears only in the description). The #168 honesty fix, landed in #176, documented this gap; this PR closes it functionally as an interim state while the proper substrate work in #169-#172 builds the real FTS that fixes both recall and latency.

Why this is shipping despite the threshold failure

This is the kind of decision the threshold rule was a guess about, not an answer to. The data flipped the answer.

What this is NOT

What changes

  • `explorer.qmd` `doSearch()` SQL: CTE-then-keyed-join over `sample_facets_v2` (filter to top-50 first, then keyed JOIN to `samples_map_lite` for coordinates). A native DuckDB benchmark showed the naive LEFT JOIN form is 8× slower (4.2 s vs 0.5 s for `pottery`) and times out in the browser; the CTE form is the only shippable shape.
  • `explorer.qmd` search-help line: updated to "Searches labels, descriptions, and place names. First search can take 10-15 seconds while data loads; subsequent searches are faster." Adds a forward link to #169 (Explorer FTS Track 2: search_index_v1 contract doc).
  • `explorer.qmd` placeholder example: `pottery Cyprus` (a query that now works).
  • `query-spec.qmd`: §3.2 wording reflects the interim state with explicit "interim" framing; §5.1 binding describes the new query shape.
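A minimal sketch of the CTE-then-keyed-join order of operations, using Python's built-in `sqlite3` as a stand-in for DuckDB. The schema (`pid`, `label`, `description`, `place_name`, `latitude`, `longitude`), the toy rows, and the substring-match scoring are assumptions for illustration, not the shipped `doSearch()` SQL. What it shows is the shape: rank and `LIMIT` inside the CTE over the text table, then join only those survivors to the coordinate table.

```python
import sqlite3

TOP_N = 50  # the top-50 cap described above

def build_db() -> sqlite3.Connection:
    """Tiny in-memory stand-ins for the two parquet files (hypothetical schema)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE sample_facets_v2 (
            pid TEXT PRIMARY KEY, label TEXT, description TEXT, place_name TEXT);
        CREATE TABLE samples_map_lite (
            pid TEXT PRIMARY KEY, latitude REAL, longitude REAL);
    """)
    con.executemany(
        "INSERT INTO sample_facets_v2 VALUES (?, ?, ?, ?)",
        [
            ("s1", "Sherd 12", "pottery fragment, Cyprus survey", "Site A"),
            ("s2", "Pottery AM662", "bowl rim", "Kodiak"),
            ("s3", "Bone 4", "faunal remains", "Site B"),
        ],
    )
    # s2 deliberately has no coordinate row, so the LEFT JOIN yields NULL lat/lng.
    con.executemany(
        "INSERT INTO samples_map_lite VALUES (?, ?, ?)",
        [("s1", 34.99, 33.71), ("s3", 40.0, 20.0)],
    )
    return con

# 1) Filter and rank on the text table, keeping only the top N rows.
# 2) Keyed LEFT JOIN of just those rows to the coordinate table.
SEARCH_SQL = """
WITH hits AS (
    SELECT f.pid, f.label,
           3 * (instr(lower(coalesce(f.label, '')), :term) > 0)
         + 2 * (instr(lower(coalesce(f.place_name, '')), :term) > 0)
         + 1 * (instr(lower(coalesce(f.description, '')), :term) > 0) AS score
    FROM sample_facets_v2 AS f
    WHERE instr(lower(coalesce(f.label, '') || ' ' || coalesce(f.description, '')
                      || ' ' || coalesce(f.place_name, '')), :term) > 0
    ORDER BY score DESC, f.pid
    LIMIT :top_n
)
SELECT h.pid, h.label, h.score, l.latitude, l.longitude
FROM hits AS h
LEFT JOIN samples_map_lite AS l ON l.pid = h.pid
ORDER BY h.score DESC, h.pid
"""

def search(con: sqlite3.Connection, term: str, top_n: int = TOP_N):
    """Single-term search; multi-term handling is out of scope for this sketch."""
    return con.execute(SEARCH_SQL, {"term": term.lower(), "top_n": top_n}).fetchall()
```

Because the `LIMIT` runs inside the CTE, the join touches at most N rows no matter how many samples match. The coordinate-less `s2` row also illustrates why the test plan checks that results with null lat/lng still render.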

Cold-search caveat

The new `.search-help` UI line warns users that the first cold search can take 10-15 seconds. This is honest. The substrate work fixes it. Users who hit that latency see a "Searching..." indicator (existing behavior) plus the new explanatory line.

A future PR could add an explicit progress spinner or shimmer; out of scope here.

What this unblocks

Test plan

  • Open Explorer; verify `pottery Cyprus` now returns 50 results in ~10-15 s cold, ~3-5 s warm.
  • Verify `Çatalhöyük` returns 50 results.
  • Verify source filter still works (`pottery` × OPENCONTEXT only).
  • Verify a result with no lat/lng still renders in the side panel without crashing (the LEFT JOIN can produce null coords for samples not in samples_map_lite).
  • Verify the `?perf=1` panel shows the new search timings, per the #173 instrumentation (explorer: search perf-smoke baseline (#167)).
  • Verify the search-help line is visible and reads honestly.

Closes #168 (functionally — the recall gap that PR #176 documented). Refs #165, #167, #169-#172, PR #166, PR #173, PR #175, PR #176.

🤖 Generated with Claude Code

…rection A)

Knowingly accepts the latency regression measured in the isamplesorg#167 baseline
to fix false-zero results: queries like 'pottery Cyprus' returned 0
results in the previous live search, even though ~7,124 samples actually
match (Cyprus appears only in description, which samples_map_lite
doesn't carry). 'Searching but broken' is worse UX than 'searching
slowly but correctly.'

This is **interim**, not the future search backend:
- The latency regression (cold pottery 8.7s → 12.0s; multi-term
  5.1s → 14.8s) violates the locked thresholds in isamplesorg#168.
- The proper substrate work in isamplesorg#169-isamplesorg#172 fixes both recall AND
  latency via a sample-centric document projection with hash-
  partitioned BM25 indexes.
- This code path goes away when isamplesorg#171 lands. isamplesorg#170 is unblocked
  by this PR.

Implementation:
- doSearch() swapped from samples_map_lite to sample_facets_v2 for
  search; CTE-then-keyed-join shape (NOT naive LEFT JOIN — native
  benchmarks showed 8x penalty for the naive form on `pottery`,
  enough to time out in browser).
- Field weights: label=3, place_name=2, description=1.
- query-spec.qmd updated to honestly describe the interim state.
- Search-help UI updated: cold-search latency caveat, links to isamplesorg#169
  as the path that lifts both gaps.
- Placeholder restored to `pottery Cyprus` — a query that *now works*.
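A minimal pure-Python sketch of how the label=3, place_name=2, description=1 weights could combine for a multi-term query such as `pottery Cyprus`. The matching rules here (whitespace-split terms, case-insensitive substring match, every term must hit at least one field, per-field weights summed across terms) are assumptions for illustration; the commit message states only the weights, and the real scoring happens in SQL inside `doSearch()`. `score_sample` is a hypothetical helper.

```python
from typing import Optional

# Field weights from the commit message: label=3, place_name=2, description=1.
FIELD_WEIGHTS = {"label": 3, "place_name": 2, "description": 1}

def score_sample(sample: dict, query: str) -> Optional[int]:
    """Return a weighted score, or None if some query term matches no field.

    Assumed rules (hypothetical, not the shipped SQL): each term earns the
    weight of every field that contains it; a term with no hits excludes the
    sample entirely (AND across terms).
    """
    total = 0
    for term in query.lower().split():
        hit = 0
        for field, weight in FIELD_WEIGHTS.items():
            text = (sample.get(field) or "").lower()  # treat missing/None as empty
            if term in text:
                hit += weight
        if hit == 0:
            return None
        total += hit
    return total
```

Under these assumptions a sample whose description mentions both pottery and Cyprus scores 2, which is exactly the case that scored nothing when only label and place_name were searched.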

Refs isamplesorg#165, isamplesorg#167, isamplesorg#168, isamplesorg#169.
Closes isamplesorg#168.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

rdhyee commented May 8, 2026

Local-preview verification (per Codex review residual ask)

Ran against the post-merge local preview. Both of Codex's eyeball checks came back clean:

`?perf=1` panel updates per search ✓

After running `pottery Cyprus` in the search bar, the panel grew from 8 boot-time rows to 9, and row 9 reads:

```
search #1: "pottery Cyprus" (50) 11.69 s
```

The per-search append code in the `finally` block of `doSearch()` (#173 round-2 fix) is wired through correctly. Subsequent searches add further rows.

Search-help line reads cleanly ✓

DOM-extracted text:

Searches labels, descriptions, and place names. First search can take 10-15 seconds while data loads; subsequent searches are faster. Tracking issue: faster substrate FTS in progress.

Visible (helpVisible=true; bounding-box height > 0). Placeholder shows the new `pottery Cyprus` example.

Recall recovery confirmed ✓

`pottery Cyprus` returns 50 results (was 0 in the pre-#177 baseline). Camera flies to first result at 34.99, 33.71 — Cyprus coordinates.

Cold elapsed 11.69 s — matches the #167 baseline measurement (12 s for multi-term) within noise. The latency cost is real and was knowingly accepted in the PR framing.

Ready to merge.

rdhyee merged commit 51f7523 into isamplesorg:main on May 8, 2026
1 check passed
rdhyee added a commit to rdhyee/isamplesorg.github.io that referenced this pull request May 8, 2026
…subset (isamplesorg#179)

Run after the round-2 SQL fix (commit cc79ec0). All 10 canonical
queries pass cleanly in 4m6s including the new area-scope case.

Highlights:
- single-common (pottery): 10.5s cold, 4.6s warm, 50 results
- multi-term (pottery Cyprus): 10.0s cold, 4.5s warm, 50 results
  (was 0 before isamplesorg#177 Direction A)
- diacritic (Çatalhöyük): 13.2s cold, 4.9s warm, 50 results
  (was 0 before isamplesorg#177)
- area-scope (pottery × Cyprus camera): 10.5s cold, 4.2s warm,
  50 results — confirms the round-2 fix (was 0 before cc79ec0)
- composed-source / composed-source-material: ~6s cold, faster
  because the source filter dramatically reduces the candidate set

Latency profile: 10-13s cold, 4-5s warm. Within the same envelope
as the pre-area-scope baseline; the new SQL doesn't materially
change cold/warm timings vs the world path.

The field_subset string in the test and JSON was stale (it still said
"label+place_name samples_map_lite"); the correction had landed in a test
edit that was abandoned along with the honesty-fix branch when Direction B
shipped first. Corrected now.

Refs isamplesorg#167, isamplesorg#168, isamplesorg#178, isamplesorg#179.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rdhyee added a commit that referenced this pull request May 8, 2026
* explorer: two-button search semantics (#178 Light path)

Per Hana's mockup discussed in the 2026-05-08 tech call: replace the
single 'Search' button with two scope buttons. Light extension of
option C — same backend, viewport-scoped variant adds an outer-query
lat/lng BETWEEN predicate.

UI:
- '.search-bar' loses the inline button (input only).
- New '.search-actions' row below the input: 'Search Selected Areas'
  (orange #ef6c00) and 'Search Entire World' (blue #1565c0). Match
  the mockup's color/intent coding.
- Search-help line unchanged (still warns about cold-search latency).

Backend (explorer.qmd doSearch):
- doSearch(scope) accepts 'area' or 'world'.
- For 'area', computeViewRectangle() → outer-query predicate
  `AND l.latitude BETWEEN ... AND l.longitude BETWEEN ...`.
  Dateline-crossing handled by splitting longitude into two ranges
  when west > east.
- The viewport predicate goes on the OUTER query (post-join), not the
  inner CTE, because lat/lng live in samples_map_lite, not
  sample_facets_v2. Implication: area-scoped searches can return < 50
  results when the inner top-50 don't all satisfy viewport — users
  widen by panning. Acceptable v1 behavior.
- Auto-fly to first result suppressed for area-scope (the user is
  already at the area they care about; flying would zoom in and
  disorient).
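A minimal sketch of the dateline split described above, assuming the view rectangle has already been converted to degrees (`south`, `west`, `north`, `east`). `lng_ranges` and `viewport_predicate` are hypothetical helpers, not functions from `explorer.qmd`; they return a SQL fragment with `?` placeholders plus its bound parameters, so the same fragment could be appended wherever the predicate ends up.

```python
from typing import List, Tuple

def lng_ranges(west: float, east: float) -> List[Tuple[float, float]]:
    """Split a viewport's longitude span into one or two BETWEEN ranges (degrees)."""
    if west <= east:
        return [(west, east)]
    # west > east means the viewport crosses the antimeridian (180 / -180).
    return [(west, 180.0), (-180.0, east)]

def viewport_predicate(south: float, west: float, north: float, east: float,
                       alias: str = "l") -> Tuple[str, list]:
    """Build an outer-query viewport predicate and its parameters (hypothetical)."""
    params = [south, north]
    clauses = []
    for lo, hi in lng_ranges(west, east):
        clauses.append(f"{alias}.longitude BETWEEN ? AND ?")
        params.extend([lo, hi])
    sql = f"AND {alias}.latitude BETWEEN ? AND ? AND ({' OR '.join(clauses)})"
    return sql, params
```

Binding values as parameters rather than interpolating them keeps the camera-derived numbers out of the SQL text; only the number of longitude clauses changes between the two cases.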

URL state:
- New ?search_scope=area|world param. Default 'world' (omitted from
  URL). Hydrated on boot from URL; persisted by persistSearchScope()
  (separate from writeQueryState which doesn't know about scope).
- Enter key uses the last-clicked scope (or URL-hydrated scope on
  cold boot, defaulting to world).
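A minimal sketch of the hydrate/persist round trip for `?search_scope=`, written in Python with `urllib.parse` for illustration (the explorer itself does this in browser code). `read_scope` and `write_scope` are hypothetical stand-ins for the boot-time hydration and `persistSearchScope()`; the sketch assumes the scope lives in the query string, falls back to `world` for missing or unknown values, and omits the parameter when it equals the default.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

VALID_SCOPES = ("area", "world")
DEFAULT_SCOPE = "world"

def read_scope(url: str) -> str:
    """Hydrate scope from ?search_scope=..., falling back to the default."""
    value = parse_qs(urlsplit(url).query).get("search_scope", [DEFAULT_SCOPE])[0]
    return value if value in VALID_SCOPES else DEFAULT_SCOPE

def write_scope(url: str, scope: str) -> str:
    """Persist scope, dropping the parameter entirely when it is the default."""
    parts = urlsplit(url)
    qs = parse_qs(parts.query, keep_blank_values=True)
    if scope == DEFAULT_SCOPE:
        qs.pop("search_scope", None)
    else:
        qs["search_scope"] = [scope]
    return urlunsplit(parts._replace(query=urlencode(qs, doseq=True)))
```

Omitting the default keeps shared world-scope URLs identical to the ones produced before the two-button change.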

Instrumentation:
- isamples.search structured log gains 'scope' field.
- ?perf=1 panel row format: 'search #N <scope>: "<term>" (<count>)'.
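A tiny formatter pins down the new row shape. `perf_row` is a hypothetical Python helper, not a function in the codebase; the trailing elapsed-seconds suffix is taken from the row observed during verification (`search #1: "pottery Cyprus" (50) 11.69 s`) and is assumed to carry over once the scope is inserted.

```python
def perf_row(n: int, scope: str, term: str, count: int, elapsed_s: float) -> str:
    """Format one ?perf=1 row as 'search #N <scope>: "<term>" (<count>) <secs> s'."""
    if scope not in ("area", "world"):
        raise ValueError(f"unexpected scope: {scope!r}")
    return f'search #{n} {scope}: "{term}" ({count}) {elapsed_s:.2f} s'
```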

Tests:
- New 'area-scope' canonical query in test_search_perf.py uses
  url_hash to set the camera before clicking 'Search Selected Areas'.
- _run_search takes a scope param routing to #searchAreaBtn or
  #searchWorldBtn.
- _measure_one_query honors query['url_hash'] and query['filters']['scope'].

Doc:
- EXPLORER_STATE.md §6 gains a 'Light-path addendum' explaining the
  two-button design as an extension of option C, NOT a revisit of
  A/B/C. Heavy revisit deferred until #170-#172 land.

Verified locally: area click at lat=35, lng=33, alt=2Mm → 0 results
(confirmed natively: no top-50 pottery in that rect), camera stays
put. World click → 50+ results, camera flies to top-1 (Italy).
URL hydration round-trips ?search_scope=area correctly.

Closes #178. Refs #163, #165, PR #166, PR #177.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* explorer: fix area-mode false zeroes — viewport before top-50 (#179 review)

Codex round-2 review caught that the previous shape applied the
viewport predicate AFTER the global top-50 selection. Effect: 'Search
Selected Areas' was actually 'current viewport among the global top
50,' not 'top 50 within the current viewport.' For broad terms like
`pottery`, the global top-50 happens to all live in one Alaska
collection (label='Pottery AM662:...', score=3 each, all at lat=57.7
lng=-152.4); a Cyprus-area query would return 0 even though Cyprus
genuinely has 50+ pottery hits. This was the original false-zero
problem in disguise.

Fix: split into two SQL shapes.

- World mode: unchanged. CTE over sample_facets_v2 → top-50 → LEFT
  JOIN samples_map_lite. Coord-less samples still appear (lat/lng
  null) since they're legitimate text matches.
- Area mode: INNER JOIN samples_map_lite inside the candidate
  selection, viewport BETWEEN predicate applied BEFORE
  ORDER BY ... LIMIT 50. Drops coord-less samples (area-scoped search
  by definition requires coords). Top-50 within area, not within global.
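The difference between the pre-fix and fixed area shapes can be reproduced on toy data with Python's built-in `sqlite3` standing in for DuckDB. Table names follow the commit message, but the schema is reduced to what matters here and is an assumption: `score` is a precomputed placeholder for the weighted text score, the `ak*` rows mimic the score-3 Alaska cluster, `nc0` has no coordinates, and `TOP_N` is 3 rather than 50 so the data stays tiny. None of this is the shipped SQL.

```python
import sqlite3

RECT = {"south": 30.0, "north": 40.0, "west": 25.0, "east": 40.0}  # tight Cyprus rect

def make_db() -> sqlite3.Connection:
    """Toy stand-ins; `score` replaces the real weighted text score."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE sample_facets_v2 (pid TEXT PRIMARY KEY, score INTEGER);
        CREATE TABLE samples_map_lite (pid TEXT PRIMARY KEY, latitude REAL, longitude REAL);
    """)
    facets = [("ak0", 3), ("ak1", 3), ("ak2", 3), ("nc0", 2), ("ds0", 1), ("ds1", 1)]
    coords = [("ak0", 57.7, -152.4), ("ak1", 57.7, -152.4), ("ak2", 57.7, -152.4),
              ("ds0", 31.13, 35.53), ("ds1", 31.13, 35.53)]  # nc0 has no coordinates
    con.executemany("INSERT INTO sample_facets_v2 VALUES (?, ?)", facets)
    con.executemany("INSERT INTO samples_map_lite VALUES (?, ?, ?)", coords)
    return con

# World mode: global top-N first, then LEFT JOIN (coord-less hits keep NULL lat/lng).
WORLD_SQL = """
WITH hits AS (
    SELECT f.pid, f.score FROM sample_facets_v2 AS f
    ORDER BY f.score DESC, f.pid LIMIT :top_n
)
SELECT h.pid, l.latitude, l.longitude FROM hits AS h
LEFT JOIN samples_map_lite AS l ON l.pid = h.pid
ORDER BY h.score DESC, h.pid
"""

# Pre-fix area mode: viewport applied AFTER the global top-N (false zeroes).
OLD_AREA_SQL = """
WITH hits AS (
    SELECT f.pid, f.score FROM sample_facets_v2 AS f
    ORDER BY f.score DESC, f.pid LIMIT :top_n
)
SELECT h.pid FROM hits AS h
LEFT JOIN samples_map_lite AS l ON l.pid = h.pid
WHERE l.latitude BETWEEN :south AND :north
  AND l.longitude BETWEEN :west AND :east
ORDER BY h.score DESC, h.pid
"""

# Fixed area mode: INNER JOIN, viewport filter BEFORE ORDER BY ... LIMIT.
AREA_SQL = """
SELECT f.pid FROM sample_facets_v2 AS f
JOIN samples_map_lite AS l ON l.pid = f.pid
WHERE l.latitude BETWEEN :south AND :north
  AND l.longitude BETWEEN :west AND :east
ORDER BY f.score DESC, f.pid LIMIT :top_n
"""

def run(con: sqlite3.Connection, sql: str, top_n: int, rect: dict = None):
    """Execute one shape with named parameters (extra keys are ignored by sqlite3)."""
    return con.execute(sql, {"top_n": top_n, **(rect or {})}).fetchall()
```

With `TOP_N=3`, the old area shape returns nothing because the three global winners all sit in Alaska, while the fixed shape returns the two in-rectangle hits. That is the same 0-versus-full pattern the native check below reports at real scale.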

Verified natively (TIGHT Cyprus rect lat 30-40 lng 25-40):
- Old SQL: 0 of top-50 pass viewport
- New SQL: 50 of top-50

Verified in browser at Cyprus camera (lat=35, lng=33, alt=1Mm):
'Search Selected Areas' for `pottery` returns 50+ results, all at
the Dead Sea pottery site (31.13, 35.53) — exactly what the user
expects.

Both SQL shapes use f.-qualified column names so the same
searchWhere/score strings work for both. EXPLORER_STATE.md §6
Light-path addendum updated to describe the two shapes and why
area mode requires coords.

Refs #178, #179.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* tests: refresh perf-smoke baseline with area-scope + corrected field_subset (#179)

Run after the round-2 SQL fix (commit cc79ec0). All 10 canonical
queries pass cleanly in 4m6s including the new area-scope case.

Highlights:
- single-common (pottery): 10.5s cold, 4.6s warm, 50 results
- multi-term (pottery Cyprus): 10.0s cold, 4.5s warm, 50 results
  (was 0 before #177 Direction A)
- diacritic (Çatalhöyük): 13.2s cold, 4.9s warm, 50 results
  (was 0 before #177)
- area-scope (pottery × Cyprus camera): 10.5s cold, 4.2s warm,
  50 results — confirms the round-2 fix (was 0 before cc79ec0)
- composed-source / composed-source-material: ~6s cold, faster
  because the source filter dramatically reduces the candidate set

Latency profile: 10-13s cold, 4-5s warm. Within the same envelope
as the pre-area-scope baseline; the new SQL doesn't materially
change cold/warm timings vs the world path.

The field_subset string in the test and JSON was stale (it still said
"label+place_name samples_map_lite"); the correction had landed in a test
edit that was abandoned along with the honesty-fix branch when Direction B
shipped first. Corrected now.

Refs #167, #168, #178, #179.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Explorer FTS Track 1b: Honesty fix for query-spec / live mismatch

1 participant