diff --git a/AGENTS.md b/AGENTS.md index 54bde2a05..9bdb876b0 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -143,8 +143,9 @@ src-layout reorg): - `src/browser_harness/*.py` (`daemon.py`, `admin.py`, `helpers.py`, `run.py`, `_ipc.py`) — protected. Pull verbatim. If behavior change is needed, upstream a PR to `browser-use/browser-harness`. -- `interaction-skills/`, `agent-workspace/domain-skills/` — verbatim. - Never edit. +- `interaction-skills/` — verbatim. Never edit. +- `(agent-workspace/)?domain-skills/` — **excluded** from vendored tree. + Sync agents skip these paths; see UPSTREAM.md §3 "Excluded paths". Sync workflow lives in `harness-sync.md`. diff --git a/UPSTREAM.md b/UPSTREAM.md index 8f418a3d5..940ade3cb 100644 --- a/UPSTREAM.md +++ b/UPSTREAM.md @@ -91,21 +91,38 @@ Each upstream has its own append-only table. Add a row every time you pull. --- -## 3. Harness divergences +## 3. Harness divergences and excluded paths -Per-file record of where `packages/bcode-browser/harness/` deliberately differs from upstream. Read this *before* a sync diff so intentional differences aren't mistaken for missing features. +Per-file record of where `packages/bcode-browser/harness/` deliberately differs from upstream, plus the list of paths excluded from the vendored tree entirely. Read this *before* a sync diff so intentional differences aren't mistaken for missing features and excluded paths aren't accidentally re-imported. Path-allowlist policy (decisions.md §3.7, §4.5; updated for upstream PR #229 src-layout reorg): - `agent-workspace/agent_helpers.py` — editable; primary BrowserCode extension surface. Divergences expected. - `src/browser_harness/*.py` (`daemon.py`, `admin.py`, `helpers.py`, `run.py`, `_ipc.py`) — protected. Pulled verbatim from upstream. If behavior change is needed, upstream a PR to `browser-use/browser-harness`. -- `interaction-skills/`, `agent-workspace/domain-skills/` — verbatim from upstream. We never edit these. +- `interaction-skills/` — verbatim from upstream. We never edit these. +- `(agent-workspace/)?domain-skills/` — **excluded.** See "Excluded paths" below. - Other files (`pyproject.toml`, `LICENSE`, `README.md`, etc.) — divergence allowed but discouraged. +### Excluded paths + +Upstream paths the vendored tree treats as if they don't exist. Sync agents skip them; the diff checker filters them out. The runtime guard in `helpers.py` (`if d.is_dir():` in `goto_url`) means absence is a clean no-op. + +| Pattern | Reason | +|---|---| +| `(agent-workspace/)?domain-skills/**` | User-contributed site recipes. Quality, maintenance, and prompt-injection concerns. Browsercode (cloud-first, performance-focused) curates its own skills server-side; OSS users get the harness without bundled recipes. Both upstream paths covered: post-PR-#229 `agent-workspace/domain-skills/` and the legacy/PR-#247 top-level `domain-skills/`. The exclusion is enforced in three places that all reference this row: `script/check-harness-diff.sh` (`IGNORED_PATHS_REGEX`), `harness-sync.md` step 5 ("Excluded paths" row), and the absence of these directories from the vendored tree. | + +### Modified files + | File | Section | Direction | Reason | |---|---|---|---| | `.gitignore` | venv entry | added `.venv/` | smoke-test workflow creates `.venv/` in the harness dir; we ignore it. Upstream uses CWD-level venv so doesn't need this. | +The vendored harness's `SKILL.md`, `README.md`, and `install.md` reference `agent-workspace/domain-skills/`, but we keep them verbatim from upstream. 
Rationale: + +- `README.md` and `install.md` are not referenced by any browsercode prompt or TS code — the agent never reads them. Their content is dead weight in the extracted cache, not agent-visible. +- `SKILL.md` is referenced by `packages/opencode/src/tool/browser-execute.txt` today, but the long-term plan (see ROADMAP) is to replace that pointer with a browsercode-owned prompt file, making vendored `SKILL.md` inert too. +- Trimming these files would generate per-sync drift forever for zero agent-behavior benefit. Keeping them verbatim costs nothing and keeps future syncs mechanical. + --- ## Drift checker diff --git a/harness-sync.md b/harness-sync.md index cdf0cba52..f9073fe65 100644 --- a/harness-sync.md +++ b/harness-sync.md @@ -28,7 +28,7 @@ git pull origin main Two things to read before touching anything: - **`UPSTREAM.md`** — the latest `To SHA` row under `### browser-use/browser-harness`. That is the last commit we synced to. It is the only source of truth for "what version is vendored." -- **`UPSTREAM.md` §3 Harness divergences** — the table of files where we deliberately differ from upstream, with reasons. Read this *before* the diff so you know which differences are intentional and not "missing features." +- **`UPSTREAM.md` §3 Harness divergences and excluded paths** — the table of files where we deliberately differ from upstream, plus the list of paths excluded from the vendored tree entirely. Read both *before* the diff so you know which differences are intentional and not "missing features," and which paths to skip outright. If the divergences table is empty (initial vendor state), every difference between us and upstream is unintentional drift; flag any in the PR. @@ -65,14 +65,16 @@ This is where the agent earns its keep. For each file changed in ` | File category | Action | |---|---| -| Files not in our divergences table (incl. `src/browser_harness/*.py`, `agent-workspace/domain-skills/`, `interaction-skills/`, `tests/`, `pyproject.toml`, `LICENSE`, etc.) | Take upstream verbatim — `cp temp/browser-harness/ packages/bcode-browser/harness/`. | +| **Excluded paths** (`(agent-workspace/)?domain-skills/...`) | **Skip entirely.** Never copy in, never resurrect. See UPSTREAM.md §3 "Excluded paths". `script/check-harness-diff.sh` filters these out automatically. | +| Files not in our divergences table (incl. `src/browser_harness/*.py`, `interaction-skills/`, `tests/`, `pyproject.toml`, `LICENSE`, etc.) | Take upstream verbatim — `cp temp/browser-harness/ packages/bcode-browser/harness/`. | | Files in our divergences table | Read each upstream hunk. For each, decide: **take** (apply upstream change to our file), **skip** (our divergence wins, ignore upstream change), or **adapt** (rewrite our divergence to coexist with the upstream change). Update the divergences row if its reason or scope shifts. | -| New upstream files | Copy in. | +| New upstream files | Copy in (unless under an excluded path). | | Files we have but upstream removed | Decide: keep ours (record in divergences) or delete. | Path-allowlist policy stays in force during sync resolution as well as normal development: - `agent-workspace/agent_helpers.py` — editable, agent's primary extension surface (post PR #229). - `src/browser_harness/*.py` (`daemon.py`, `admin.py`, `helpers.py`, `run.py`, `_ipc.py`) — protected. Always take upstream verbatim. If upstream regresses, file an issue at `browser-use/browser-harness` and pin to the prior SHA, do not patch locally. 
+- `(agent-workspace/)?domain-skills/` — **excluded.** Treat as if not in the upstream tree. Quality + prompt-injection concerns; user-contributed site recipes do not ship with browsercode. The runtime guard in `helpers.py` (`if d.is_dir():`) means this is a clean no-op. ### 6. Smoke test diff --git a/packages/bcode-browser/harness/agent-workspace/domain-skills/amazon/product-search.md b/packages/bcode-browser/harness/agent-workspace/domain-skills/amazon/product-search.md deleted file mode 100644 index 3deb07186..000000000 --- a/packages/bcode-browser/harness/agent-workspace/domain-skills/amazon/product-search.md +++ /dev/null @@ -1,198 +0,0 @@ -# Amazon — Product Search & Data Extraction - -Field-tested against amazon.com on 2025-04-18 using a logged-in Chrome session. -No CAPTCHA or bot detection was triggered during any test run. - -## Navigation - -### Direct search URL (fastest, always use this) -```python -goto_url("https://www.amazon.com/s?k=mechanical+keyboard") -wait_for_load() -wait(2) # dynamic content needs ~2s after readyState=complete -``` - -### Search box typing (use when you need category filtering) -```python -goto_url("https://www.amazon.com") -wait_for_load() -wait(1) -js("document.querySelector('#twotabsearchtextbox').focus()") -js("document.querySelector('#twotabsearchtextbox').click()") -wait(0.3) -type_text("wireless mouse") -wait(0.3) -press_key("Enter") -wait_for_load() -wait(2) -``` - -### Direct product page -```python -# URL pattern: /dp/{ASIN} or /dp/{ASIN}?th=1 (Amazon may redirect to add ?th=1) -goto_url("https://www.amazon.com/dp/B08Z6X4NK3") -wait_for_load() -wait(2) -``` - -## Session Gotcha - -**Always use `new_tab()` when opening Amazon for the first time in a harness session.** -`goto_url()` can silently fail to navigate if the current tab resists the navigation -(observed when the daemon attached to a different real tab). The safe pattern: - -```python -tid = new_tab("https://www.amazon.com/s?k=mechanical+keyboard") -wait_for_load() -wait(2) -``` - -After that, `goto_url()` works fine within the same Amazon session. - -## Search Results Extraction - -### Container selector -`[data-component-type="s-search-result"]` — confirmed working, yields ~22 results per page. - -### Full extraction (field-tested) -```python -results = js(""" - Array.from(document.querySelectorAll('[data-component-type="s-search-result"]')).map(el => ({ - asin: el.getAttribute('data-asin'), - title: el.querySelector('h2 span')?.innerText?.trim(), - price: el.querySelector('.a-price .a-offscreen')?.innerText, - list_price: el.querySelector('.a-text-price .a-offscreen')?.innerText, - rating: el.querySelector('[aria-label*="out of 5 stars"]')?.getAttribute('aria-label')?.split(' ')[0], - reviews: el.querySelector('[aria-label*="ratings"]')?.getAttribute('aria-label'), - is_sponsored: !!el.querySelector('.puis-sponsored-label-text'), - url: el.querySelector('h2 a')?.href - })) -""") -``` - -### Field notes -- **`asin`**: `data-asin` attribute on the container div — always present, matches the `/dp/{ASIN}` URL. -- **`title`**: `h2 span` works consistently. `h2 a.a-link-normal span` also works. -- **`price`**: `.a-price .a-offscreen` returns the formatted string e.g. `"$69.99"`. Use this, not `.a-price-whole`. -- **`list_price`**: `.a-text-price .a-offscreen` — only present when item is on sale (was/now pricing). -- **`rating`**: Use `aria-label` on `[aria-label*="out of 5 stars"]` — gives `"4.5 out of 5 stars, rating details"`, split on space for the number. 
-- **`reviews`**: Use `[aria-label*="ratings"]` attribute — gives `"1,514 ratings"`. Do NOT use `.a-size-base.s-underline-text` — that element exists on sponsored results and shows "Xbox" (a cross-sell widget text). -- **`is_sponsored`**: `.puis-sponsored-label-text` is present on sponsored listings; first 2-3 results are usually sponsored. -- **`url`**: `h2 a` href — contains the full `/dp/{ASIN}/...` URL. - -## Product Detail Page Extraction - -### Confirmed selectors (field-tested on B08Z6X4NK3) -```python -detail = js(""" - ({ - title: document.querySelector('#productTitle')?.innerText?.trim(), - price: (function() { - var whole = document.querySelector('.a-price-whole')?.innerText?.replace(/[\\n.]/g,''); - var frac = document.querySelector('.a-price-fraction')?.innerText; - return (whole && frac) ? '$' + whole + '.' + frac - : document.querySelector('.a-price .a-offscreen')?.innerText || null; - })(), - list_price: document.querySelector('.basisPrice .a-offscreen')?.innerText, - rating: document.querySelector('#acrPopover')?.getAttribute('title'), - review_count: document.querySelector('#acrCustomerReviewText')?.innerText, - availability: document.querySelector('#availability span')?.innerText?.trim(), - brand: document.querySelector('#bylineInfo')?.innerText?.trim(), - asin: document.querySelector('input[name="ASIN"]')?.value, - bullet_points: Array.from(document.querySelectorAll('#feature-bullets li span.a-list-item')) - .map(e => e.innerText?.trim()).filter(t => t) - }) -""") -``` - -### Price field notes -- `#priceblock_ourprice` and `#priceblock_dealprice` are **legacy** — they return `null` on modern product pages. -- Construct price from `.a-price-whole` + `.a-price-fraction` (both stripped of `\n` and `.`). -- As a fallback: first `.a-price .a-offscreen` on the page also works (confirmed `$69.99`). -- `list_price` from `.basisPrice .a-offscreen` shows the crossed-out "was" price when a discount exists. - -## Best Sellers Page - -URL: `https://www.amazon.com/Best-Sellers-{Category}/zgbs/{slug}/` -e.g. `https://www.amazon.com/Best-Sellers-Electronics/zgbs/electronics/` - -### DOM structure (2025) -`.zg-item-immersion` **does not exist** — Amazon migrated to CSS modules. Use `[data-asin]` anchored on `[id="gridItemRoot"]`: - -```python -goto_url("https://www.amazon.com/Best-Sellers-Electronics/zgbs/electronics/") -wait_for_load() -wait(2) - -items = js(""" - Array.from(document.querySelectorAll('[data-asin]')).map(el => { - var container = el.closest('[id="gridItemRoot"]') || el; - return { - asin: el.getAttribute('data-asin'), - rank: container.querySelector('[class*="zg-bdg-text"]')?.innerText, - title: container.querySelector('img[alt]')?.getAttribute('alt'), - price: container.querySelector('.p13n-sc-price, .a-size-base.a-color-price')?.innerText, - url: 'https://www.amazon.com/dp/' + el.getAttribute('data-asin') - } - }).filter(r => r.rank) -""") -``` - -Note: Title comes from the product image `alt` attribute — the text title elements use obfuscated CSS module class names that change between deployments. 
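A small post-processing sketch for the extraction above. It assumes `items` is the list returned by the `js()` call and that rank badges render as `"#1"`-style strings (adjust if the live text differs); rows with an empty `data-asin` are dropped, per the Gotchas below.

```python
def clean_best_sellers(items):
    """Drop non-product rows and turn rank badges into ints (assumed '#N' format)."""
    cleaned = []
    for row in items or []:
        if not row.get("asin"):  # data-asin can be an empty string on non-product rows
            continue
        rank_text = (row.get("rank") or "").lstrip("#")
        row["rank"] = int(rank_text) if rank_text.isdigit() else None
        cleaned.append(row)
    # rows whose badge didn't parse sort last
    return sorted(cleaned, key=lambda r: r["rank"] if r["rank"] is not None else 10**9)

top_items = clean_best_sellers(items)
```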
- -## Pagination - -```python -# Get next page URL directly -next_url = js("document.querySelector('.s-pagination-next')?.href") -if next_url: - goto_url(next_url) - wait_for_load() - wait(2) - -# Or construct by page number -goto_url("https://www.amazon.com/s?k=wireless+mouse&page=2") -``` - -## Result Count - -```python -count_text = js("document.querySelector('[data-component-type=\"s-result-info-bar\"] h1')?.innerText?.trim()") -# Returns e.g.: '1-16 of over 40,000 results for "wireless mouse"\nSort by:\n...' -# Extract just the count: count_text.split('\n')[0] -``` - -## CAPTCHA Detection - -No CAPTCHA was encountered during testing with a logged-in Chrome session. To detect defensively: - -```python -def check_captcha(): - text = js("document.body.innerText.slice(0,500)") or "" - url = page_info()["url"] - return ( - "captcha" in text.lower() - or "enter the characters" in text.lower() - or "sorry, we just need to make sure" in text.lower() - or "captcha" in url.lower() - or "validateCaptcha" in url - ) - -if check_captcha(): - raise RuntimeError("Amazon CAPTCHA hit — stop and notify user") -``` - -Amazon may serve a CAPTCHA on fresh/anonymous sessions. Using the browser's existing logged-in session avoids this in practice. - -## Gotchas - -- **`goto_url()` silent failure**: On first visit, use `new_tab(url)` instead. After the tab is on Amazon, `goto_url()` works. -- **`.zg-item-immersion` is gone**: Best Sellers page uses CSS module classes (obfuscated). Use `[data-asin]` + `img[alt]` for title. -- **`.a-size-base.s-underline-text` is unreliable for review count**: On sponsored results it shows unrelated text (e.g. "Xbox"). Use `[aria-label*="ratings"]` instead. -- **`#priceblock_ourprice` is legacy**: Returns `null` on modern pages. Construct from `.a-price-whole` + `.a-price-fraction`. -- **Sponsored results appear first**: First 2-3 results are almost always `is_sponsored: true`. Filter them out with `!el.querySelector('.puis-sponsored-label-text')` when you need organic results. -- **`data-asin` can be empty string on non-product rows**: Filter with `.filter(r => r.asin)`. -- **Price split DOM**: `.a-price-whole` innerText includes a trailing `\n.` — strip it: `.replace(/[\n.]/g,'')`. -- **ASIN from URL**: Use `/dp/([A-Z0-9]{10})/` regex on the product URL. `data-asin` on search results is always the canonical ASIN. -- **`?th=1` redirect**: Amazon appends `?th=1` (and sometimes `?psc=1`) to product URLs after redirect. This is normal — `input[name="ASIN"]` always has the clean ASIN. -- **Wait 2s after `wait_for_load()`**: Amazon search results load the listing cards asynchronously. `readyState=complete` fires before cards render. A hard 2s wait is required. diff --git a/packages/bcode-browser/harness/agent-workspace/domain-skills/archive-org/scraping.md b/packages/bcode-browser/harness/agent-workspace/domain-skills/archive-org/scraping.md deleted file mode 100644 index 692a00aae..000000000 --- a/packages/bcode-browser/harness/agent-workspace/domain-skills/archive-org/scraping.md +++ /dev/null @@ -1,341 +0,0 @@ -# Internet Archive / Wayback Machine — Scraping & Data Extraction - -`https://archive.org` / `https://web.archive.org` — all public data, no auth required. Every workflow here is pure `http_get` — no browser needed. - -## Do this first - -**Use the CDX API for anything Wayback-related — it is the reliable workhorse. 
The Wayback Availability API (`/wayback/available`) is known to return empty `archived_snapshots` even for well-archived URLs and should not be used as a primary mechanism.** - -```python -import json - -# Find snapshots of any URL — primary entry point for Wayback data -r = http_get( - "https://web.archive.org/cdx/search/cdx" - "?url=iana.org&output=json&limit=5" - "&fl=timestamp,original,statuscode,mimetype,length", - timeout=40.0 -) -rows = json.loads(r) -headers = rows[0] # ['timestamp', 'original', 'statuscode', 'mimetype', 'length'] -for row in rows[1:]: - ts, orig, status, mime, length = row - snap_url = f"https://web.archive.org/web/{ts}/{orig}" - print(f"{ts} {status} {snap_url}") -``` - -For item metadata (books, video, audio, software), go straight to: - -```python -data = json.loads(http_get("https://archive.org/metadata/{identifier}", timeout=30.0)) -``` - -## Common workflows - -### Find the nearest archived snapshot to a target date - -```python -import json - -# CDX sort=closest returns the single snapshot nearest to the given timestamp -r = http_get( - "https://web.archive.org/cdx/search/cdx" - "?url=iana.org&output=json&limit=1" - "&fl=timestamp,original,statuscode" - "&closest=20230601120000&sort=closest", - timeout=60.0 # CDX can be slow — always use timeout >= 40s -) -rows = json.loads(r) -# rows[0] = header, rows[1] = closest snapshot -ts, orig, status = rows[1] -snap_url = f"https://web.archive.org/web/{ts}/{orig}" -# Result: ts='20230601114925', orig='https://www.iana.org/', status='200' -# snap_url: https://web.archive.org/web/20230601114925/https://www.iana.org/ -``` - -Timestamp format is always 14-digit `YYYYMMDDHHMMSS`. Pass any prefix — `20230601` (day), `202306` (month), `2023` (year) — and CDX will match. - -### List all monthly snapshots for a URL (collapsed) - -```python -import json - -r = http_get( - "https://web.archive.org/cdx/search/cdx" - "?url=iana.org&output=json" - "&collapse=timestamp:6" # :6 = dedupe by YYYYMM (one per month) - "&from=20230101&to=20231231" - "&fl=timestamp,original", - timeout=60.0 -) -rows = json.loads(r) -# rows[0] = header ['timestamp', 'original'] -# rows[1:] = one row per month: -# ['20230101103807', 'https://www.iana.org/'] -# ['20230201144829', 'https://www.iana.org/'] -# ...12 rows for 2023 - -for ts, orig in rows[1:]: - print(f"{ts[:4]}-{ts[4:6]} https://web.archive.org/web/{ts}/{orig}") -``` - -`collapse=timestamp:N` deduplicates by the first N digits of the timestamp: -- `:4` = one per year, `:6` = one per month, `:8` = one per day - -### List snapshots for an entire domain (all pages) - -```python -import json - -# matchType=domain captures all URLs under that domain -r = http_get( - "https://web.archive.org/cdx/search/cdx" - "?url=iana.org&matchType=domain&output=json" - "&limit=10&fl=timestamp,original,statuscode" - "&collapse=timestamp:8", # one capture per URL per day - timeout=60.0 -) -rows = json.loads(r) -for row in rows[1:]: - print(row) -# ['19971210061738', 'http://www.iana.org:80/', '200'] -# ['19980211065537', 'http://www.iana.org:80/', '200'] -# ... -``` - -`matchType` options: `exact` (default), `prefix` (URL + subpaths), `host` (all subdomains), `domain` (host + all subdomains). 
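### Filter snapshots by status code

CDX also supports server-side `filter=` expressions (`field:regex`), which the examples above don't use. The parameter name and syntax here follow the public CDX server docs rather than a fresh field test; treat this as a sketch and confirm against a live response.

```python
import json

# Keep only successful captures; filter= takes field:regex pairs
# (e.g. statuscode:200, mimetype:text/html) (syntax assumed, not re-verified here)
r = http_get(
    "https://web.archive.org/cdx/search/cdx"
    "?url=iana.org&output=json&limit=10"
    "&fl=timestamp,original,statuscode"
    "&filter=statuscode:200",
    timeout=60.0
)
rows = json.loads(r)
for ts, orig, status in rows[1:]:
    print(ts, status, f"https://web.archive.org/web/{ts}/{orig}")
```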
- -### Filter snapshots by prefix path - -```python -import json - -# All archived pages under /domains/ path -r = http_get( - "https://web.archive.org/cdx/search/cdx" - "?url=iana.org/domains/&matchType=prefix&output=json" - "&limit=5&fl=timestamp,original,statuscode", - timeout=40.0 -) -rows = json.loads(r) -for row in rows[1:]: - print(row) -# ['20080509121811', 'http://www.iana.org/domains/', '200'] -# ['20080704174537', 'http://iana.org/domains/', '200'] -``` - -### Paginate CDX results with resumeKey - -```python -import json -from urllib.parse import quote - -def cdx_all_snapshots(url, fl="timestamp,original,statuscode", page_size=500): - """Iterate all CDX records for a URL, yielding rows (excluding header).""" - base = ( - f"https://web.archive.org/cdx/search/cdx" - f"?url={quote(url, safe='')}&output=json" - f"&fl={fl}&limit={page_size}&showResumeKey=true" - ) - resume_key = None - while True: - endpoint = base if resume_key is None else f"{base}&resumeKey={quote(resume_key)}" - rows = json.loads(http_get(endpoint, timeout=60.0)) - # rows structure with showResumeKey=true: - # [header, row1, row2, ..., [], [resume_key_string]] - # The second-to-last row is [] (separator), last row is [resume_key] - has_resume = len(rows) >= 2 and rows[-1] != [] and rows[-2] == [] - data_rows = rows[1:-2] if has_resume else rows[1:] - for row in data_rows: - yield row - if not has_resume: - break - resume_key = rows[-1][0] - -for row in cdx_all_snapshots("iana.org", fl="timestamp,original"): - ts, orig = row - # process... -``` - -### Retrieve the actual archived page - -```python -# Direct snapshot URL: /web/{14-digit-timestamp}/{original-url} -snap_url = "https://web.archive.org/web/19971210061738/http://www.iana.org:80/" -content = http_get(snap_url, timeout=30.0) -# Returns the archived HTML with Wayback toolbar injected at top -# The toolbar is inside comments - -# The calendar view URL pattern (for browser navigation, not http_get): -# https://web.archive.org/web/20230101000000*/python.org -# The * tells Wayback to show the calendar — returns HTML, not raw page -``` - -### Item metadata (books, video, audio, software, collections) - -```python -import json -from urllib.parse import quote - -identifier = "HardWonWisdomTrailer" -data = json.loads(http_get(f"https://archive.org/metadata/{identifier}", timeout=30.0)) - -# Top-level keys: -# alternate_locations, created, d1, d2, dir, files, files_count, -# is_collection, item_last_updated, item_size, metadata, server, uniq, workable_servers - -meta = data['metadata'] -# Common metadata fields (not all present on every item): -print(meta.get('identifier')) # 'HardWonWisdomTrailer' -print(meta.get('title')) # 'Hard Won Wisdom Trailer' -print(meta.get('mediatype')) # 'movies' | 'texts' | 'audio' | 'software' | 'collection' -print(meta.get('creator')) # 'jakemauz' -print(meta.get('date')) # '2017-02-18' -print(meta.get('description')) # HTML string — strip tags if needed -print(meta.get('subject')) # str OR list of str depending on item -print(meta.get('publicdate')) # '2017-02-18 11:51:16' -print(meta.get('collection')) # parent collection identifier - -files = data['files'] -# Each file entry: -# name, source ('original'|'derivative'|'metadata'), format, size (bytes as str), -# md5, sha1, crc32, mtime -# For video/audio: length (seconds as str), height, width -# For derivative: original (name of source file) - -# Find the primary original file -orig_files = [f for f in files if f.get('source') == 'original'] -# orig_files[0]: {'name': 'Hard-won 
wisdom trailer.mp4', 'source': 'original', -# 'format': 'MPEG4', 'size': '7532153', 'length': '94.13', -# 'height': '360', 'width': '640', 'md5': 'aaeebe0481...', ...} - -# Build download URL — two equivalent forms: -server = data['server'] # 'ia601405.us.archive.org' -dir_path = data['dir'] # '/2/items/HardWonWisdomTrailer' -fname = orig_files[0]['name'] -from urllib.parse import quote as urlquote -# Form 1: direct storage server (fastest) -url1 = f"https://{server}{dir_path}/{urlquote(fname)}" -# Form 2: standard redirect URL (always works, resolved by CDN) -url2 = f"https://archive.org/download/{identifier}/{urlquote(fname)}" -# Both confirmed status 200, Content-Type: video/mp4 -``` - -### Search items (books, audio, video, software) - -```python -import json - -# advancedsearch.php is the correct API — /search returns HTML -r = http_get( - "https://archive.org/advancedsearch.php" - "?q=artificial+intelligence+AND+mediatype:texts" - "&fl[]=identifier&fl[]=title&fl[]=creator&fl[]=date&fl[]=downloads" - "&rows=5&sort[]=downloads+desc&output=json", - timeout=30.0 -) -data = json.loads(r) -# data['responseHeader']['status'] = 0 (success) -# data['responseHeader']['QTime'] = query time ms -# data['response']['numFound'] = 25911 (total matches) -# data['response']['start'] = 0 (offset) -# data['response']['docs'] = list of item dicts - -resp = data['response'] -print(f"Total: {resp['numFound']}, showing: {len(resp['docs'])}") -for doc in resp['docs']: - print(f" {doc['identifier']} {doc.get('title', '')[:50]}") - # doc fields are only present if they have values — always use .get() -``` - -Pagination: use `start=` offset (not `page=`). Max `rows=` is not documented but 100 works reliably. - -### Search with all supported parameters - -```python -import json - -r = http_get( - "https://archive.org/advancedsearch.php" - "?q=machine+learning+AND+mediatype:texts" # Lucene query syntax - "&fl[]=identifier&fl[]=title&fl[]=date&fl[]=year" - "&fl[]=creator&fl[]=subject&fl[]=description&fl[]=downloads" - "&rows=3" - "&start=0" # pagination offset - "&sort[]=date+desc" # sort field + direction - "&output=json", - timeout=30.0 -) -data = json.loads(r) -# Confirmed fields in fl[]: -# identifier, title, date, year, creator, subject, description, -# downloads, mediatype, collection, language, avg_rating, num_reviews - -# mediatype values: texts, audio, movies, software, image, etree, data, collection, account -# Sort fields: date, downloads, avg_rating, num_reviews, publicdate, addeddate -``` - -## API reference - -| Endpoint | What it returns | Auth | -|---|---|---| -| `web.archive.org/cdx/search/cdx?url=...&output=json` | Snapshot index: all captures of a URL | None | -| `archive.org/wayback/available?url=...` | Nearest snapshot (DEGRADED — see gotchas) | None | -| `archive.org/metadata/{identifier}` | Item metadata + files list | None | -| `archive.org/advancedsearch.php?q=...&output=json` | Full-text + metadata search | None | -| `archive.org/download/{identifier}/{filename}` | Direct file download | None | -| `web.archive.org/web/{timestamp}/{url}` | Archived page HTML | None | - -## CDX field reference - -The CDX API returns a JSON array of arrays. The first row is always the header when `output=json`. 
- -| Field | Description | Example | -|---|---|---| -| `urlkey` | SURT-format URL (reversed domain, path in parens) | `org,iana)/` | -| `timestamp` | Capture time, 14-digit `YYYYMMDDHHMMSS` | `19971210061738` | -| `original` | Original crawled URL (exact, including port) | `http://www.iana.org:80/` | -| `mimetype` | Content-Type of the archived response | `text/html` | -| `statuscode` | HTTP status at crawl time | `200` | -| `digest` | SHA-1 of response body, base32-encoded | `I4YBMQ6PHPWE2TD6TIXNWHZB6MXRNTSR` | -| `length` | Content length in bytes (as string) | `1418` | - -Default `fl=` when omitted: `urlkey,timestamp,original,mimetype,statuscode,digest,length` (all 7 fields in that order). - -## Rate limits - -No auth, no API key. In practice: -- CDX API: **intermittently slow** — individual queries time out at 20s and succeed at 40–60s. Always use `timeout=40.0` minimum. 3 rapid sequential CDX calls in ~10s completed; 10 rapid calls produced 3 timeouts. -- Metadata API: Fast and reliable — 5 sequential calls completed in 3.0s with no errors. -- Search API: Fast — typically responds in 30–65ms (`QTime` in response header). -- No documented per-second or per-day limits. Archive.org's policy is to be respectful: add `time.sleep(1)` between CDX calls in loops. - -## Gotchas - -- **CDX times out — always set `timeout=40.0` or higher.** The default 20s is often too short for CDX. Metadata and search APIs are fine at 20–30s. CDX slowness is backend-side and unpredictable; add retry logic for production use. - -- **Wayback Availability API is unreliable.** `GET /wayback/available?url=iana.org` returns `{"url": "iana.org", "archived_snapshots": {}}` even for URLs confirmed archived via CDX. Tested 2026-04-18 across many URLs and timestamp combinations — consistently empty. Use `CDX ?sort=closest&limit=1` instead (confirmed working). - -- **CDX first row is always the header when `output=json`.** `rows[0]` is `['timestamp', 'original', ...]`, not a data row. Always slice `rows[1:]` for data. When `showResumeKey=true`, the last two rows are `[]` (separator) and `['']`. - -- **CDX `fl=` must match exactly what you iterate.** If you request `&fl=timestamp,original` you get 2-element rows; forgetting a field breaks destructuring. When in doubt, omit `fl=` entirely and get all 7 fields. - -- **`output=json` is required — there is no default JSON mode.** Omitting `output=json` returns space-separated text. `output=text` also works and is slightly faster for simple queries. - -- **`timestamp` is a string, not an integer.** Even in JSON, CDX returns all fields as strings: `'1418'` not `1418`, `'200'` not `200`. Cast explicitly: `int(row[4])`, `int(row[6])`. - -- **The `original` field preserves port numbers.** Old crawls captured `http://www.iana.org:80/` — the `:80` is part of the URL. When building a playback URL, use `original` verbatim: `f"https://web.archive.org/web/{ts}/{orig}"` works correctly with the port included. - -- **Metadata `{}` means the item doesn't exist or is private.** `http_get("https://archive.org/metadata/nonexistent")` returns `'{}'` (2-byte response) with HTTP 200. Always check `if not data` or `if not data.get('metadata')` before accessing fields. - -- **Metadata `subject` can be a string or a list.** When a single subject tag is set, the API returns `"subject": "short film"`. When multiple, it returns `"subject": ["short film", "spoken word"]`. Normalize with: `subjects = [meta['subject']] if isinstance(meta.get('subject'), str) else meta.get('subject', [])`. 
- -- **File `size` and `length` are strings, not numbers.** `files[0]['size']` is `'7532153'` (bytes). `files[0]['length']` is `'94.13'` (seconds for video/audio). Cast with `int()` and `float()` respectively. - -- **Use `archive.org/download/` not the raw storage server URL for reliability.** The raw URL (`ia601405.us.archive.org/2/items/...`) is faster but server-specific. `archive.org/download/{id}/{file}` redirects to the correct storage node and remains stable as items migrate. - -- **`/search?output=json` returns HTML, not JSON.** The `/search` endpoint is a React SPA — it ignores `output=json`. Always use `advancedsearch.php` for programmatic access. - -- **`collapse=timestamp:6` gives one row per month, but it keeps the FIRST capture of that month.** If you want the last, you'd need to reverse and re-collapse, or fetch all and filter client-side. The `collapse` parameter de-duplicates by truncating the timestamp to N digits and keeping the first matching row. - -- **CDX `from=` / `to=` accept partial timestamps.** `from=20230101` means `20230101000000`. `to=20231231` means `20231231000000` (exclusive). To include all of 2023, use `to=20240101`. diff --git a/packages/bcode-browser/harness/agent-workspace/domain-skills/arxiv-bulk/scraping.md b/packages/bcode-browser/harness/agent-workspace/domain-skills/arxiv-bulk/scraping.md deleted file mode 100644 index d10adc117..000000000 --- a/packages/bcode-browser/harness/agent-workspace/domain-skills/arxiv-bulk/scraping.md +++ /dev/null @@ -1,333 +0,0 @@ -# arXiv Bulk Harvest + Semantic Scholar — OAI-PMH & Citation Enrichment - -Companion to `domain-skills/arxiv/scraping.md`. Use the **arxiv** skill for search-and-fetch workflows. Use **this skill** when you need: - -- Bulk-harvesting all papers in a subject area or date window (OAI-PMH) -- Citation counts, influential-citation scores, and cross-database IDs (Semantic Scholar) -- Per-paper version history and submitter info (`arXivRaw` metadata) - -No API key required for either endpoint. Both return JSON or XML over plain HTTP. - ---- - -## OAI-PMH bulk harvest - -### Endpoint (confirmed 2026-04-19) - -``` -https://oaipmh.arxiv.org/oai -``` - -`https://export.arxiv.org/oai2` is the old URL — it 301-redirects to the new one. Use the new URL directly to avoid the extra round-trip. 
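Before kicking off a long harvest, a quick `Identify` call (see the verbs table below) confirms the endpoint is reachable and reports the repository's earliest datestamp. A minimal sketch using the same `http_get` helper:

```python
import xml.etree.ElementTree as ET
from helpers import http_get

OAI_NS = {'oai': 'http://www.openarchives.org/OAI/2.0/'}

# Sanity-check the endpoint before a long ListRecords run.
xml = http_get("https://oaipmh.arxiv.org/oai?verb=Identify")
root = ET.fromstring(xml)
print(root.findtext('.//oai:repositoryName', namespaces=OAI_NS))
print(root.findtext('.//oai:earliestDatestamp', namespaces=OAI_NS))  # 2005-09-16 per the verbs table
```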
- -### Harvest all cs papers from a date window - -```python -import xml.etree.ElementTree as ET -from helpers import http_get - -OAI_NS = { - 'oai': 'http://www.openarchives.org/OAI/2.0/', - 'arXiv': 'http://arxiv.org/OAI/arXiv/', -} - -def fetch_oai_page(url): - """Fetch one OAI-PMH page; return (records_xml_list, next_token_or_None).""" - xml = http_get(url) - root = ET.fromstring(xml) - records = root.findall('.//oai:record', OAI_NS) - token_el = root.find('.//oai:resumptionToken', OAI_NS) - token = token_el.text if token_el is not None and token_el.text else None - return records, token - -def parse_arxiv_record(rec): - """Extract fields from one element (metadataPrefix=arXiv).""" - header = rec.find('oai:header', OAI_NS) - meta = rec.find('.//arXiv:arXiv', OAI_NS) - if meta is None: - return None # deleted record (header has status="deleted") - authors_el = meta.findall('arXiv:authors/arXiv:author', OAI_NS) - authors = [] - for a in authors_el: - fn = (a.findtext('arXiv:forenames', namespaces=OAI_NS) or '').strip() - ln = (a.findtext('arXiv:keyname', namespaces=OAI_NS) or '').strip() - authors.append(f"{fn} {ln}".strip()) - return { - 'id': meta.findtext('arXiv:id', namespaces=OAI_NS), - 'datestamp': header.findtext('oai:datestamp', namespaces=OAI_NS), - 'created': meta.findtext('arXiv:created', namespaces=OAI_NS), - 'updated': meta.findtext('arXiv:updated', namespaces=OAI_NS), - 'title': (meta.findtext('arXiv:title', namespaces=OAI_NS) or '').strip(), - 'authors': authors, - 'categories': (meta.findtext('arXiv:categories', namespaces=OAI_NS) or '').split(), - 'abstract': (meta.findtext('arXiv:abstract', namespaces=OAI_NS) or '').strip(), - 'doi': meta.findtext('arXiv:doi', namespaces=OAI_NS), - 'journal_ref': meta.findtext('arXiv:journal-ref', namespaces=OAI_NS), - 'license': meta.findtext('arXiv:license', namespaces=OAI_NS), - } - -# --- Main harvest loop --- -import time - -BASE = 'https://oaipmh.arxiv.org/oai' -first_url = ( - f"{BASE}?verb=ListRecords" - f"&metadataPrefix=arXiv" - f"&set=cs" - f"&from=2024-01-01" - f"&until=2024-01-02" -) - -papers = [] -url = first_url -while url: - records, token = fetch_oai_page(url) - for rec in records: - p = parse_arxiv_record(rec) - if p: - papers.append(p) - print(f" fetched {len(records)} records, total so far: {len(papers)}") - if token: - url = f"{BASE}?verb=ListRecords&resumptionToken={token}" - time.sleep(5) # OAI-PMH policy: >=5s between pages - else: - url = None - -print(f"Done. {len(papers)} papers harvested.") -# Confirmed output for cs, 2024-01-01 to 2024-01-02: -# fetched 44 records, total so far: 44 -# Done. 44 papers harvested. 
-# For 2024-01-01 to 2024-01-07 (cs): multiple pages, resumptionToken issued when >~200 records -``` - -### Available verbs - -| Verb | Purpose | Key params | -|---|---|---| -| `Identify` | Repository info, earliest datestamp (`2005-09-16`) | — | -| `ListSets` | All harvestable sets (see table below) | — | -| `ListMetadataFormats` | `oai_dc`, `arXiv`, `arXivOld`, `arXivRaw` | — | -| `ListRecords` | Bulk harvest with date/set filter | `metadataPrefix`, `set`, `from`, `until` | -| `GetRecord` | Single record by OAI identifier | `identifier`, `metadataPrefix` | - -### Top-level sets (confirmed) - -| setSpec | Name | -|---|---| -| `cs` | Computer Science (all) | -| `cs:cs` | Computer Science (subset notation — same scope) | -| `math` | Mathematics | -| `physics` | Physics | -| `stat` | Statistics | -| `eess` | Electrical Engineering and Systems Science | -| `econ` | Economics | -| `q-bio` | Quantitative Biology | -| `q-fin` | Quantitative Finance | - -Subset sets use `topic:topic:SUBCATEGORY` notation, e.g. `cs:cs:LG` for Machine Learning. List all with `verb=ListSets`. - -### Available metadata formats - -- `arXiv` — rich: id, created/updated dates, authors (keyname + forenames separately), categories, abstract, doi, journal-ref, license. **Use this.** -- `arXivRaw` — adds ``, per-version history (`` with date and file size), author list as flat string. Use when you need version history. -- `oai_dc` — Dublin Core, minimal. Skip unless you need cross-system compatibility. -- `arXivOld` — legacy format pre-2007. Skip. - -### GetRecord + arXivRaw (version history) - -```python -import xml.etree.ElementTree as ET -from helpers import http_get - -RAW_NS = { - 'oai': 'http://www.openarchives.org/OAI/2.0/', - 'raw': 'http://arxiv.org/OAI/arXivRaw/', -} - -xml = http_get( - "https://oaipmh.arxiv.org/oai" - "?verb=GetRecord" - "&metadataPrefix=arXivRaw" - "&identifier=oai:arXiv.org:1706.03762" -) -root = ET.fromstring(xml) -meta = root.find('.//raw:arXivRaw', RAW_NS) - -title = meta.findtext('raw:title', namespaces=RAW_NS) -submitter = meta.findtext('raw:submitter', namespaces=RAW_NS) -versions = meta.findall('raw:version', RAW_NS) -for v in versions: - print(v.get('version'), v.findtext('raw:date', namespaces=RAW_NS)) -# Confirmed output for 1706.03762 ("Attention Is All You Need"): -# v1 Mon, 12 Jun 2017 17:57:34 GMT -# v2 Mon, 19 Jun 2017 16:49:45 GMT -# ... -# v7 Wed, 02 Aug 2023 00:41:18 GMT -# submitter: Llion Jones -``` - ---- - -## Semantic Scholar — citation enrichment for arXiv papers - -No API key required (unauthenticated: 1 req/s, 5000 req/day). With a free key the limit rises to 100 req/s. - -Base URL: `https://api.semanticscholar.org/graph/v1/` - -### Single paper lookup by arXiv ID - -```python -import json -from helpers import http_get - -paper = json.loads(http_get( - "https://api.semanticscholar.org/graph/v1/paper/arXiv:1706.03762" - "?fields=title,year,venue,publicationDate,citationCount," - "influentialCitationCount,authors,abstract,externalIds" -)) -print(paper['title']) # "Attention is All you Need" -print(paper['citationCount']) # 173155 (confirmed 2026-04-19) -print(paper['influentialCitationCount']) # 19629 -print(paper['venue']) # "Neural Information Processing Systems" -print(paper['externalIds']['ArXiv']) # "1706.03762" -print(paper['externalIds']['DOI']) # missing if no DOI -for a in paper['authors']: - print(a['name'], a['authorId']) -``` - -The ID format `arXiv:NNNN.NNNNN` is accepted directly — no conversion needed. 
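To bolt citation counts onto the OAI-PMH harvest above, a minimal single-lookup sketch. It assumes `papers` is the list built by the harvest loop, sleeps 1 s per call to stay under the unauthenticated limit, and omits error handling (papers missing from Semantic Scholar return a 404). For more than a few dozen IDs, use the batch endpoint in the next section instead.

```python
import json
import time
from helpers import http_get

for p in papers[:10]:
    # Old-style IDs containing '/' (e.g. cs/0501001) may need URL-encoding.
    data = json.loads(http_get(
        f"https://api.semanticscholar.org/graph/v1/paper/arXiv:{p['id']}"
        "?fields=citationCount,influentialCitationCount"
    ))
    p['citationCount'] = data.get('citationCount')
    p['influentialCitationCount'] = data.get('influentialCitationCount')
    time.sleep(1)  # unauthenticated limit is ~1 req/s (see Gotchas)
```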
- -### Batch lookup (up to 500 IDs per POST) - -```python -import json -from helpers import http_get -import urllib.request - -ids = ["arXiv:1706.03762", "arXiv:1810.04805", "arXiv:2005.14165"] -fields = "paperId,externalIds,title,year,citationCount,influentialCitationCount" - -body = json.dumps({"ids": ids}).encode() -req = urllib.request.Request( - f"https://api.semanticscholar.org/graph/v1/paper/batch?fields={fields}", - data=body, - headers={"Content-Type": "application/json"}, - method="POST", -) -with urllib.request.urlopen(req, timeout=20) as r: - results = json.loads(r.read()) - -for p in results: - print(p['externalIds'].get('ArXiv'), p['citationCount'], p['title'][:50]) -# Confirmed output (2026-04-19): -# 1706.03762 173155 Attention is All you Need -# 1810.04805 113138 BERT: Pre-training of Deep Bidirectional Tran... -# 2005.14165 (varies) Language Models are Few-Shot Learners -``` - -Note: `helpers.http_get` only does GET. For POST use `urllib.request.Request` directly as above. - -### Paper search - -```python -import json -from helpers import http_get - -results = json.loads(http_get( - "https://api.semanticscholar.org/graph/v1/paper/search" - "?query=large+language+model" - "&fields=paperId,externalIds,title,year,citationCount" - "&limit=5" -)) -total = results['total'] # e.g. 3473582 for "large language model" -for p in results['data']: - arxiv_id = p['externalIds'].get('ArXiv', 'no-arxiv') - print(arxiv_id, p['year'], p['citationCount'], p['title'][:50]) -# next page: use offset=5, offset=10, etc. -``` - -### Available fields (pass as comma-separated `fields=` query param) - -| Field | Type | Notes | -|---|---|---| -| `paperId` | str | Semantic Scholar internal ID | -| `externalIds` | dict | Keys: `ArXiv`, `DOI`, `DBLP`, `MAG`, `ACL`, `CorpusId` | -| `title` | str | | -| `abstract` | str | | -| `year` | int | Publication year | -| `publicationDate` | str | `YYYY-MM-DD` | -| `venue` | str | Conference/journal name | -| `citationCount` | int | Total citations | -| `influentialCitationCount` | int | Citations deemed highly influential | -| `authors` | list | Each: `{authorId, name}` | -| `references` | list | List of paper objects (needs own `fields`) | -| `citations` | list | Citing papers (needs own `fields`) | -| `openAccessPdf` | dict | `{url, status, license}` | - ---- - -## Downloading PDFs - -Direct PDF download — no auth, no redirect for versionless URLs (returns 200 + PDF body directly). - -```python -import urllib.request - -def download_pdf(arxiv_id, dest_path, version=None): - """ - arxiv_id: bare ID like '1706.03762' or versioned '1706.03762v7' - version: if given, appended as 'v{version}' — ignored if arxiv_id already has version - dest_path: where to save, e.g. '/tmp/paper.pdf' - """ - if 'v' not in arxiv_id.split('.')[-1] and version: - arxiv_id = f"{arxiv_id}v{version}" - url = f"https://arxiv.org/pdf/{arxiv_id}" - req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"}) - with urllib.request.urlopen(req, timeout=60) as r: - with open(dest_path, 'wb') as f: - f.write(r.read()) - print(f"Saved {r.headers.get('content-length', '?')} bytes to {dest_path}") - -download_pdf('1706.03762', '/tmp/attention.pdf') -# Confirmed: saves 2215244 bytes, filename hint in header: '1706.03762v7.pdf' -# Versionless URL resolves to latest version server-side (no redirect, 200 direct) -``` - ---- - -## Gotchas - -- **OAI-PMH endpoint moved.** `https://export.arxiv.org/oai2` 301-redirects to `https://oaipmh.arxiv.org/oai`. Use the new URL. 
`helpers.http_get` (which uses `urllib`) does NOT follow redirects — you'll get an empty string or error. Either use `urllib.request.urlopen` with `follow_redirects` logic, or just use the canonical URL directly. - -- **OAI-PMH rate limit: 5 seconds between pages.** The protocol requires a `Retry-After` interval. The server embeds an `expirationDate` on the resumptionToken. Violating the rate limit causes the token to be invalidated and the harvest fails silently. Always `time.sleep(5)` between pages. - -- **Resumption token is opaque but URL-encoded.** The token looks like `verb%3DListRecords%26...%26skip%3D247`. Pass it verbatim as `&resumptionToken=` — do not URL-encode it again. - -- **`datestamp` in OAI-PMH is last-modified date, not submission date.** A paper submitted in 2008 can appear in a 2024 harvest window if it was revised then. The `` and `` fields inside `` metadata are the actual submission/revision dates. - -- **Deleted records have no `` element.** The `
` will carry `status="deleted"`. Always check `meta is None` after `find('.//arXiv:arXiv', ...)`. - -- **Author structure differs between OAI-PMH formats.** In `arXiv` metadata, authors are structured: `VaswaniAshish`. In `arXivRaw`, they're a flat comma-separated string: `Ashish Vaswani, Noam Shazeer, ...`. In the Atom API, it's `Ashish Vaswani` (first-last order). Pick the source that matches your downstream use. - -- **Semantic Scholar 429 under unauthenticated bursts.** The unauthenticated limit is ~1 req/s. Rapid parallel calls return `{"code": "429"}`. Add `time.sleep(1)` between single lookups or use the batch POST endpoint (up to 500 IDs, single request) to stay under the limit. The batch endpoint itself counts as 1 request. - -- **Semantic Scholar `externalIds` may lack `ArXiv` key.** Not all papers have an arXiv preprint. When enriching an arXiv list with S2 data, always use `.get('ArXiv')` not `['ArXiv']`. - -- **Atom API rate limit: 1 request per 3 seconds for sustained crawls.** The API returns HTTP 429 `"Rate exceeded."` on rapid-fire requests. The OAI-PMH endpoint is designed for bulk and is more tolerant, but still requires the 5s sleep between resumption pages. - -- **OAI-PMH `set` param uses colon-separated hierarchy, not dot.** The Atom API uses `cat:cs.LG`; OAI-PMH uses `set=cs:cs:LG`. Using `set=cs.LG` returns zero results. - -- **`http_get` in helpers.py does NOT follow HTTP redirects.** If you must use it with the old OAI URL, you'll get an empty body. Either update the URL to the canonical one or use `urllib.request.urlopen` with a redirect handler. - ---- - -## How this complements the existing arxiv skill - -| Task | Use | -|---|---| -| Search by keyword, author, or category | `arxiv` skill — Atom API | -| Fetch 1–2000 specific papers by ID | `arxiv` skill — `id_list` batch | -| Harvest all papers in a subject over a date range | **this skill** — OAI-PMH | -| Get citation counts / influential citations | **this skill** — Semantic Scholar | -| Get per-version history and submitter name | **this skill** — OAI-PMH `arXivRaw` | -| Download a PDF | either skill (same URL structure) | diff --git a/packages/bcode-browser/harness/agent-workspace/domain-skills/arxiv/scraping.md b/packages/bcode-browser/harness/agent-workspace/domain-skills/arxiv/scraping.md deleted file mode 100644 index c731cd824..000000000 --- a/packages/bcode-browser/harness/agent-workspace/domain-skills/arxiv/scraping.md +++ /dev/null @@ -1,311 +0,0 @@ -# ArXiv — Scraping & Data Extraction - -`https://arxiv.org` — open-access preprint server. **Never use the browser for ArXiv.** All data is reachable via `http_get` using the Atom API or HTML meta tags. No API key required. - -## Do this first - -**Use the Atom API for any paper search or metadata fetch — one call, XML response, no auth.** - -```python -import xml.etree.ElementTree as ET -from helpers import http_get - -NS = {'atom': 'http://www.w3.org/2005/Atom', 'arxiv': 'http://arxiv.org/schemas/atom'} - -xml = http_get("http://export.arxiv.org/api/query?search_query=ti:transformer+AND+cat:cs.LG&max_results=5&sortBy=submittedDate&sortOrder=descending") -root = ET.fromstring(xml) -entries = root.findall('atom:entry', NS) -``` - -Use `id_list` for known paper IDs — supports comma-separated batch fetch in a single call. - -Use `http_get` on `https://arxiv.org/abs/{id}` + regex for `citation_*` meta tags when you need the full abstract from an HTML page. 
- -## Common workflows - -### Search papers (API) - -```python -import xml.etree.ElementTree as ET -from helpers import http_get - -NS = {'atom': 'http://www.w3.org/2005/Atom', 'arxiv': 'http://arxiv.org/schemas/atom'} - -xml = http_get( - "http://export.arxiv.org/api/query" - "?search_query=ti:transformer+AND+cat:cs.LG" - "&max_results=5&sortBy=submittedDate&sortOrder=descending" -) -root = ET.fromstring(xml) -entries = root.findall('atom:entry', NS) -for e in entries: - title = e.find('atom:title', NS).text.strip().replace('\n', ' ') - arxiv_id = e.find('atom:id', NS).text.split('/')[-1] # e.g. '2604.15259v1' - published = e.find('atom:published', NS).text[:10] # '2026-04-16' - updated = e.find('atom:updated', NS).text[:10] - abstract = e.find('atom:summary', NS).text.strip() - authors = [a.find('atom:name', NS).text for a in e.findall('atom:author', NS)] - cats = [c.get('term') for c in e.findall('atom:category', NS)] - primary = e.find('arxiv:primary_category', NS).get('term') - comment = e.find('arxiv:comment', NS) - pdf_link = next((l.get('href') for l in e.findall('atom:link', NS) if l.get('title') == 'pdf'), None) - abs_link = next((l.get('href') for l in e.findall('atom:link', NS) if l.get('rel') == 'alternate'), None) - print(arxiv_id, published, title[:60]) - print(" Authors:", authors[:2]) - print(" PDF:", pdf_link) -# Confirmed output (2026-04-18): -# 2604.15259v1 2026-04-16 Stability and Generalization in Looped Transformers -# Authors: ['Asher Labovich'] -# PDF: https://arxiv.org/pdf/2604.15259v1 -``` - -### Fetch single paper by ID (API) - -```python -import xml.etree.ElementTree as ET -from helpers import http_get - -NS = {'atom': 'http://www.w3.org/2005/Atom', 'arxiv': 'http://arxiv.org/schemas/atom'} - -xml = http_get("http://export.arxiv.org/api/query?id_list=1706.03762") -root = ET.fromstring(xml) -e = root.find('atom:entry', NS) -title = e.find('atom:title', NS).text.strip() -abstract = e.find('atom:summary', NS).text.strip() -categories = [c.get('term') for c in e.findall('atom:category', NS)] -pdf_link = next((l.get('href') for l in e.findall('atom:link', NS) if l.get('title') == 'pdf'), None) -print("Title:", title) -print("Categories:", categories) -print("PDF:", pdf_link) -print("Abstract:", abstract[:200]) -# Confirmed output: -# Title: Attention Is All You Need -# Categories: ['cs.CL', 'cs.LG'] -# PDF: https://arxiv.org/pdf/1706.03762v7 -# Abstract: The dominant sequence transduction models are based on complex recurrent... -``` - -### Batch fetch by comma-separated IDs (single call — fast) - -Fetching 10 IDs in one call takes ~2s. Prefer this over parallel single-ID fetches. - -```python -import xml.etree.ElementTree as ET -from helpers import http_get - -NS = {'atom': 'http://www.w3.org/2005/Atom'} - -ids = ['1706.03762', '1810.04805', '2005.14165'] # Transformer, BERT, GPT-3 -xml = http_get(f"http://export.arxiv.org/api/query?id_list={','.join(ids)}&max_results={len(ids)}") -root = ET.fromstring(xml) -for e in root.findall('atom:entry', NS): - arxiv_id = e.find('atom:id', NS).text.split('/')[-1] - title = e.find('atom:title', NS).text.strip() - published = e.find('atom:published', NS).text[:10] - print(arxiv_id, published, title[:60]) -# Confirmed output: -# 1512.03385v1 2015-12-10 Deep Residual Learning for Image Recognition -# 1706.03762v7 2017-06-12 Attention Is All You Need -# 2005.14165v4 2020-05-28 Language Models are Few-Shot Learners -# 1810.04805v2 2018-10-11 BERT: Pre-training of Deep Bidirectional Transformers... 
-# Note: order returned may differ from order requested -``` - -### Parallel fetch (ThreadPoolExecutor for independent IDs) - -Use only when IDs are not known upfront or when mixing with other work. For pure batch, single comma-separated `id_list` call is faster. - -```python -import xml.etree.ElementTree as ET -from concurrent.futures import ThreadPoolExecutor -from helpers import http_get - -NS = {'atom': 'http://www.w3.org/2005/Atom'} - -def fetch_paper(arxiv_id): - xml = http_get(f"http://export.arxiv.org/api/query?id_list={arxiv_id}") - root = ET.fromstring(xml) - e = root.find('atom:entry', NS) - if e is None: - return None - return { - 'id': arxiv_id, - 'title': e.find('atom:title', NS).text.strip(), - 'published': e.find('atom:published', NS).text[:10], - } - -ids = ['1706.03762', '1810.04805', '2005.14165'] -with ThreadPoolExecutor(max_workers=3) as ex: - papers = list(ex.map(fetch_paper, ids)) -for p in papers: - print(p['id'], p['published'], p['title'][:60]) -# Confirmed working — max_workers=3 is safe; don't exceed 5 for continuous crawling -``` - -### HTML abstract page — citation_* meta tags - -Use this when you want the full abstract or the versionless PDF URL without parsing Atom XML. - -```python -import re -from helpers import http_get - -html = http_get("https://arxiv.org/abs/1706.03762", headers={"User-Agent": "Mozilla/5.0"}) -# HTML page is ~48 KB, fully static, no JS required - -title = re.search(r' 0` before accessing `entries[0]`. - -- **`arxiv:comment` and `arxiv:journal_ref` / `arxiv:doi` may be absent.** Not all papers have these fields. Use `e.find('arxiv:comment', NS)` and check `if el is not None and el.text`. - -- **Rate limit: 3 seconds between requests recommended for bulk crawling.** In practice, rapid bursts of 10 individual requests complete in ~6s (avg 0.63s/req) without being blocked. For sustained crawls over hundreds of papers, insert `time.sleep(3)` between requests. The API does not return rate limit headers — it just starts slowing responses or returns HTTP 503 silently. - -- **`citation_author` tags are in `"Last, First"` format**, not `"First Last"` like the Atom API. The Atom `atom:author/atom:name` field gives `"First Last"` order. Pick the format that matches your downstream use. - -- **The `arxiv:affiliation` sub-element of `atom:author` is rarely populated.** Most institutional affiliations are absent from the API response even when listed on the paper. The HTML abs page doesn't expose them in meta tags either. - -- **`sortBy=relevance` applies only with `search_query`.** Using `sortBy=relevance` with `id_list` has no effect — results still come back in date order. - -- **`max_results` cap is 2000 per call.** For bulk harvesting of a category, use `start` offset pagination and add 3s sleep between pages. `opensearch:totalResults` tells you the total so you can compute how many pages are needed. - -- **HTML `citation_abstract` meta tag contains the full abstract.** Unlike the Atom `atom:summary` which can have trailing whitespace and embedded newlines, the meta tag version is a single clean string — no `.strip()` needed. 
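A minimal sketch of the `citation_*` meta-tag extraction the last two bullets refer to. The tag names (`citation_title`, `citation_author`, `citation_abstract`, `citation_pdf_url`) are the standard ones arXiv abs pages expose, but confirm them against a live page before relying on this.

```python
import re
from helpers import http_get

html = http_get("https://arxiv.org/abs/1706.03762", headers={"User-Agent": "Mozilla/5.0"})

def meta(name, doc=html):
    m = re.search(rf'<meta name="{name}" content="([^"]*)"', doc)
    return m.group(1) if m else None

title = meta("citation_title")
authors = re.findall(r'<meta name="citation_author" content="([^"]*)"', html)  # "Last, First" order
abstract = meta("citation_abstract")  # single clean string, no .strip() needed
pdf_url = meta("citation_pdf_url")
```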
diff --git a/packages/bcode-browser/harness/agent-workspace/domain-skills/atlas/overview.md b/packages/bcode-browser/harness/agent-workspace/domain-skills/atlas/overview.md deleted file mode 100644 index 7b0236318..000000000 --- a/packages/bcode-browser/harness/agent-workspace/domain-skills/atlas/overview.md +++ /dev/null @@ -1,70 +0,0 @@ ---- -name: atlas-recruit -description: Atlas recruitment platform (my.recruitwithatlas.com) — routes, filters, GraphQL bootstrap for authenticated UI probes. ---- - -# Atlas — my.recruitwithatlas.com - -Gated recruitment SaaS. Auth via Google SSO (WebAuthn/passkey). GraphQL backend at `/graphql` (NextAuth session cookie, `credentials: 'include'` from the tab). - -## Routes - -| Route | What | -|---|---| -| `/home` | Dashboard (default landing after login) | -| `/sign-in` | Redirect target when unauthenticated | -| `/business-development/opportunities` | BD opportunities (kanban / list view) | -| `/business-development/leads` | Leads | -| `/business-development/prospects` | Prospects | -| `/business-development/playbook` | Playbook | -| `/candidates` | Candidate pipeline | -| `/projects/` | Specific job / project | -| `/graphql` | Authenticated GraphQL endpoint (POST) | - -## Filters in URL - -BD opportunities uses `?filters=[JSON]` (URL-encoded). Example "Me" filter: - -```json -[{"id":"opportunity_owner","selectedOptions":[{"id":"","title":"Me","excludeFromSearch":false}]}] -``` - -Filter IDs seen: `opportunity_owner`, `stage`, `industry`, `segment`, `conversion_probability`. - -## Finding your own user UUID - -- Apply a filter like "owner = Me" in `/business-development/opportunities`, then read `selectedOptions[0].id` out of the URL `filters=` param. -- Or: `query { me { id email } }` via the GraphQL endpoint (see below). -- User UUIDs are tenant-stable; keep them in a local secret store, not in this shared skill. - -## Stages (BD funnel) - -`Identified` → `Initial Outreach` → `Late Stage` → `Converted` → `Archived`. Seen as tab labels on `/business-development/opportunities`. - -## Auth quirks - -- Google SSO flows through `accounts.google.com/signin/oauth/id?...` — passkey / WebAuthn only, no password fallback visible. -- Session state lives in multiple cookies (JWE session + CSRF). Injecting only the JWE into a fresh Chrome profile is **not sufficient** for UI access — you land in a login loop. For UI work: log in once inside a persistent Chrome profile and let all cookies settle. For backend-only GraphQL calls: the `__Secure-authjs.session-token` JWE alone is enough when sent with `cookie: __Secure-authjs.session-token=` from an external HTTP client. - -## GraphQL endpoint - -POST `https://my.recruitwithatlas.com/graphql` using the tab's own cookies: - -```python -js(""" -fetch('/graphql', { - method: 'POST', - headers: {'Content-Type': 'application/json', 'apollo-require-preflight': 'true'}, - credentials: 'include', - body: JSON.stringify({query: 'query { me { id email } }'}) -}).then(r => r.json()).then(j => JSON.stringify(j)) -""") -``` - -This reuses the session cookies of the current tab — no JWE juggling needed when browsing from inside browser-harness. - -Known mutations (verified against production schema, April 2026): `opportunityCreate`, `opportunityUpdate`, `companyCreate`, `projectCreate`, `projectUpdate`, `opportunityAddLead`, `createOpportunityNote`. Create mutations return placeholder names; follow with an `opportunityUpdate` / `projectUpdate` to set the final name or description. 
`opportunityAddLead` side-effects `Project.company` onto `Opportunity.targetCompany` when the opp had none. - -## Page titles - -The app sets a green-dot emoji prefix on titles: `🟢 Atlas Agency` (sign-in), `🟢 Business development` (BD overview), etc. Useful for `wait_for` conditions — the emoji is consistent across routes. diff --git a/packages/bcode-browser/harness/agent-workspace/domain-skills/booking-com/scraping.md b/packages/bcode-browser/harness/agent-workspace/domain-skills/booking-com/scraping.md deleted file mode 100644 index 1e9eaaa9b..000000000 --- a/packages/bcode-browser/harness/agent-workspace/domain-skills/booking-com/scraping.md +++ /dev/null @@ -1,578 +0,0 @@ -# Booking.com — Scraping & Data Extraction - -Field-tested against booking.com on 2026-04-18 using `http_get` and the -`dml/graphql` JSON API. All tests run without a browser session. - ---- - -## TL;DR - -**`http_get` returns nothing useful from booking.com.** Every HTML page — -search results, hotel pages, city pages, the homepage — is intercepted by an -AWS WAF JS challenge before any content is served. The challenge requires -JavaScript execution to complete a cryptographic puzzle and set an -`aws-waf-token` cookie. Without a real browser, you get a ~4-8 KB stub page. - -**What you can do without a browser:** -- Enumerate hotel/city/region URLs from XML sitemaps (Googlebot UA required). -- Read `robots.txt` for URL pattern documentation. -- Query the GraphQL endpoint `https://www.booking.com/dml/graphql` for schema - exploration (no auth = internal errors, but validation errors reveal the - schema). - -**For all actual data extraction, use the browser (`goto` + `js`).** - ---- - -## AWS WAF JS Challenge — What It Is - -Every `http_get` request to `www.booking.com` receives one of two variants of -a WAF stub: - -**Variant A (~3,962 bytes) — modern SDK:** -```html - - -``` - -**Variant B (~8,410 bytes) — with AJAX error reporting:** -Same AWS WAF SDK, plus an `XMLHttpRequest`-based error reporter that POSTs to -`https://reports.booking.com/chal_report`. This variant is more common on -non-browser UA strings. - -**Detection in your code:** -```python -def is_waf_blocked(html: str) -> bool: - return ( - 'AwsWafIntegration' in html - or 'awsWafCookieDomainList' in html - or 'challenge.js' in html - or len(html) < 10_000 and '' in html - ) -``` - -**What the challenge does:** -1. Loads a 1.3 MB obfuscated JS file (`challenge.js`) from a path-keyed URL. -2. Executes a cryptographic proof-of-work puzzle client-side. -3. Sets an `aws-waf-token` cookie on the `booking.com` domain. -4. Redirects to the original URL with `?chal_t={timestamp}&force_referer=` - appended. - -This challenge **cannot be solved by `http_get`**. It requires a real JS -engine. A `bkng` session cookie is set on the first blocked response, but it -has no value without the WAF token. - -**User agents tested — all blocked:** -- Chrome desktop (`Mozilla/5.0 ... Chrome/120`) -- iPhone/Safari mobile -- `Googlebot/2.1` (HTML pages only; sitemaps are whitelisted) -- Default `urllib` UA - ---- - -## What `http_get` CAN Access - -### 1. XML Sitemaps (URL discovery) - -Booking.com whitelists sitemap paths for Googlebot. This lets you enumerate -millions of property, city, region, and attraction URLs without a browser. 
- -```python -import gzip, re, urllib.request - -GOOGLEBOT = {"User-Agent": "Googlebot/2.1 (+http://www.google.com/bot.html)"} - -def fetch_sitemap_index(url: str) -> list[str]: - """Returns list of child sitemap URLs from an index sitemap.""" - xml = http_get(url, headers=GOOGLEBOT) - return re.findall(r'(https://[^<]+)', xml) - -def fetch_sitemap_gz(gz_url: str) -> list[str]: - """Decompresses a gzipped sitemap and returns all URLs.""" - req = urllib.request.Request(gz_url, headers=GOOGLEBOT) - with urllib.request.urlopen(req, timeout=30) as r: - data = gzip.decompress(r.read()) - return re.findall(r'(https://[^<]+)', data.decode()) - -# Example: get all en-gb hotel URLs -hotel_idx = http_get( - "https://www.booking.com/sitembk-hotel-index.xml", - headers=GOOGLEBOT -) -# 74 shards for en-gb; each shard has ~45,000-50,000 property URLs -en_gb_shards = re.findall( - r'(https://www\.booking\.com/sitembk-hotel-en-gb\.\d+\.xml\.gz)', - hotel_idx -) -# hotel_urls = fetch_sitemap_gz(en_gb_shards[0]) # ~50K URLs per shard -``` - -**Available sitemap categories (confirmed, 275 total):** - -| Index URL | Content | -|-----------|---------| -| `sitembk-hotel-index.xml` | All properties (~74 en-gb shards, ~3.5M URLs) | -| `sitembk-city-index.xml` | City landing pages (~6 en-gb shards, ~44K cities) | -| `sitembk-region-index.xml` | Region landing pages | -| `sitembk-country-index.xml` | Country landing pages | -| `sitembk-attractions-index.xml` | Attractions | -| `sitembk-hotel-review-index.xml` | Review pages | -| `sitembk-themed-city-{type}-index.xml` | Category-specific city pages (70+ types: hostels, luxury, spa, ski, etc.) | - -### 2. `robots.txt` - -```python -robots = http_get("https://www.booking.com/robots.txt", headers={"User-Agent": "Mozilla/5.0"}) -``` - -- Returns immediately, no WAF. -- 136 Disallow entries, 275 Sitemap declarations. -- Documents all URL structures (search results, hotel pages, booking flow, etc.). - -### 3. GraphQL Schema Exploration (no auth) - -The endpoint `https://www.booking.com/dml/graphql` is **not WAF-protected**. -It accepts POST requests and returns JSON. Without a session, most queries -return `Internal Server Error` from the backend (`irene` service), but -**GraphQL validation errors fire before the backend** and reveal the schema. 
- -```python -import json, urllib.request, gzip - -GQL_URL = "https://www.booking.com/dml/graphql?lang=en-gb" -GQL_HEADERS = { - "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36", - "Accept": "application/json", - "Content-Type": "application/json", - "Origin": "https://www.booking.com", - "Referer": "https://www.booking.com/searchresults.html", - "x-booking-context-action-name": "searchresults", - "x-booking-context-aid": "376510", - "x-booking-site-type-id": "1", -} - -def gql(operation_name: str, query: str, variables: dict = None) -> dict: - payload = {"operationName": operation_name, "query": query} - if variables: - payload["variables"] = variables - req = urllib.request.Request( - GQL_URL, - data=json.dumps(payload).encode(), - headers=GQL_HEADERS, - method="POST" - ) - with urllib.request.urlopen(req, timeout=20) as r: - data = r.read() - if r.headers.get("Content-Encoding") == "gzip": - data = gzip.decompress(data) - return json.loads(data.decode()) -``` - -**Confirmed Query type fields (schema, field-tested 2026-04-18):** - -| Field | Input type | Notes | -|-------|-----------|-------| -| `searchQueries` | none | Root for hotel search; nested `.search(SearchQueryInput!)` | -| `searchBox` | `SearchBoxInput!` | Destination autocomplete / search form state | -| `searchProperties` | `SearchInput!` | Returns 500 without auth session | -| `propertyDetails` | `PropertyDetailsQueryInput!` | Returns 500 without auth session | -| `popularDestinations` | `PopularDestinationsInput!` | Returns validation error (type mismatch) | - -**Important:** Booking.com GraphQL uses an **operation name whitelist** for -some operations. If you get `GRAPHQL_UNKNOWN_OPERATION_NAME`, try any of the -following confirmed working names: `SearchResultsPage`, `SearchQuery`, -`HotelCardsList`, `SearchResultsList`, `PropertySearch`, `BookingSearch`. - -**Operation names that bypass the whitelist restriction** (all return -`{ data: { __typename: 'Query' } }` with `{ __typename }`): -- `SearchResultsPage` ✓ (confirmed, use this) - -**The search query structure** (known but returns 500 without session): -```graphql -query SearchResultsPage($input: SearchQueryInput!) 
{ - searchQueries { - search(input: $input) { - __typename # Returns SearchQueryResult type - } - } -} -``` - -With `SearchQueryInput` fields (inferred from URL parameters, confirmed -accepted by validation): -```json -{ - "dest_id": "-1456928", - "dest_type": "CITY", - "checkin": "2026-05-01", - "checkout": "2026-05-03", - "group_adults": "2", - "no_rooms": "1", - "group_children": "0", - "selected_currency": "USD" -} -``` - ---- - -## URL Parameter Reference - -### Search Results -`https://www.booking.com/searchresults.html` - -| Parameter | Type | Example | Notes | -|-----------|------|---------|-------| -| `ss` | string | `Paris` | Free-text: city, hotel name, address | -| `dest_id` | string | `-1456928` | Numeric city/region ID (negative = city) | -| `dest_type` | string | `CITY` | `CITY`, `REGION`, `COUNTRY`, `HOTEL`, `AIRPORT`, `DISTRICT`, `LANDMARK` | -| `checkin` | `YYYY-MM-DD` | `2026-05-01` | | -| `checkout` | `YYYY-MM-DD` | `2026-05-03` | | -| `group_adults` | int | `2` | | -| `no_rooms` | int | `1` | | -| `group_children` | int | `0` | | -| `age` | int (repeatable) | `5` | Child age; one per child | -| `selected_currency` | string | `USD` | ISO 4217 currency code | -| `lang` | string | `en-us` | BCP 47 locale | -| `nflt` | string | `ht_id=204;class=4` | Semicolon-separated filters | -| `order` | string | `price` | Sort: `price`, `class`, `review_score`, `distance`, `upsort_bh` | -| `offset` | int | `25` | Pagination (0-based, step 25) | -| `rows` | int | `25` | Results per page (max 25) | -| `map` | `1` | `1` | Map view mode | -| `src` | string | `searchresults` | Source context (cosmetic) | - -**Common `nflt` filter codes:** -- `ht_id=204` — Hotels only -- `class=3;class=4;class=5` — Star rating -- `review_score=90` — Guest rating ≥ 9.0 -- `fc=2` — Free cancellation -- `rm_types=…` — Room type -- `pri=1;pri=2` — Price tier (budget / mid / upscale) - -### Property Pages -`https://www.booking.com/hotel/{country_code}/{hotel_slug}.html` - -Confirmed from sitemap (74 shards, ~3.5M properties): -``` -https://www.booking.com/hotel/{cc}/{slug}.html -https://www.booking.com/hotel/{cc}/{slug}.en-gb.html -https://www.booking.com/hotel/{cc}/{slug}.{lang}.html -``` -- `cc` = 2-letter ISO country code (e.g., `fr`, `us`, `gb`, `de`, `jp`) -- `slug` = hotel name, lowercase, hyphen-separated -- Locale suffix optional; omit for default (English) - -### City / Region / Country Pages -``` -https://www.booking.com/city/{cc}/{city-slug}.html -https://www.booking.com/region/{cc}/{region-slug}.html -https://www.booking.com/country/{cc}.html -``` - ---- - -## Browser-Based Extraction (Required for All Data) - -Since `http_get` is blocked, all actual data extraction requires the browser -(`goto` + `js`). The WAF challenge resolves automatically in Chrome. - -### Initial Navigation - -```python -# Always use new_tab() for the first Booking.com load in a session -tid = new_tab("https://www.booking.com/searchresults.html?ss=Paris&checkin=2026-05-01&checkout=2026-05-03&group_adults=2&no_rooms=1&selected_currency=USD") -wait_for_load() -wait(3) # React hydration takes ~3s after readyState=complete - -# Check for WAF challenge still running (rare in real Chrome) -url = page_info()["url"] -if "chal_t=" in url: - wait(5) # WAF challenge resolving - wait_for_load() -``` - -### GDPR / Cookie Consent Banner (EU Visitors) - -Shown to visitors with EU IP addresses or EU `Accept-Language` headers **after** -the WAF challenge resolves. It blocks interaction until dismissed. 
- -```python -def dismiss_cookie_banner(): - # Booking.com uses data-testid="accept" on the Accept button - accepted = js(""" - (function() { - var btn = document.querySelector('[data-testid="accept"]') - || document.querySelector('#onetrust-accept-btn-handler') - || document.querySelector('[aria-label*="Accept"]'); - if (btn) { btn.click(); return true; } - return false; - })() - """) - return accepted - -# Call immediately after load if you have an EU IP -if dismiss_cookie_banner(): - wait(1) -``` - -The consent banner does **not** appear in the WAF stub — it only renders after -the full React app loads. Non-EU visitors (US IP, `Accept-Language: en-US`) -may not see it at all. - -### Search Results Page Extraction - -```python -results = js(""" - Array.from(document.querySelectorAll('[data-testid="property-card"]')).map(el => ({ - name: el.querySelector('[data-testid="title"]')?.innerText?.trim(), - url: el.querySelector('[data-testid="title-link"]')?.href, - price: el.querySelector('[data-testid="price-and-discounted-price"]')?.innerText?.trim(), - rating: el.querySelector('[data-testid="review-score"]')?.innerText?.trim(), - stars: el.querySelectorAll('[data-testid="rating-stars"] svg').length, - location: el.querySelector('[data-testid="address"]')?.innerText?.trim(), - availability_note: el.querySelector('[data-testid="availability-rate-information"]')?.innerText?.trim(), - is_genius: !!el.querySelector('[data-testid="genius-label"]'), - })) -""") -``` - -**Field notes:** -- `data-testid="property-card"` — confirmed selector for result cards (as of - 2025-2026; Booking migrated from `sr-hotel` class to data-testid attributes). -- `data-testid="price-and-discounted-price"` — contains the nightly rate; - may show original + discounted price together as text. -- `data-testid="review-score"` — contains both the numeric score (e.g., - `"9.2"`) and the label (e.g., `"Superb"`); use `.split('\n')[0]` for score. -- `data-testid="rating-stars"` — star rating icons; count SVG children for - star count. -- Results are loaded asynchronously; 3s wait after `wait_for_load()` is - required for all cards to render. - -### Pagination - -```python -# Method 1: Next page button -next_btn = js("document.querySelector('[data-testid=\"pagination-next\"]')?.href") -if next_btn: - goto_url(next_btn) - wait_for_load() - wait(3) - -# Method 2: Offset parameter (25 results per page) -current_url = page_info()["url"] -offset = 25 # next page -goto_url(current_url + f"&offset={offset}") -wait_for_load() -wait(3) -``` - -### Property / Hotel Page Extraction - -```python -detail = js(""" - ({ - name: document.querySelector('[data-testid="property-name"]')?.innerText?.trim() - || document.querySelector('h2.hp__hotel-name, h1.pp-hotel-name-title')?.innerText?.trim(), - rating: document.querySelector('[data-testid="rating-squares"]') - ? 
document.querySelectorAll('[data-testid="rating-squares"] svg').length - : null, - score: document.querySelector('[data-testid="review-score-right-component"] .ac4a7896c7')?.innerText - || document.querySelector('[aria-label*="Scored"]')?.getAttribute('aria-label'), - address: document.querySelector('[data-testid="PropertyHeaderAddressDesktop"]')?.innerText?.trim() - || document.querySelector('[id="hotel_address"]')?.innerText?.trim(), - description: document.querySelector('[data-testid="property-description-content"]')?.innerText?.trim() - || document.querySelector('#property_description_content')?.innerText?.trim(), - amenities: Array.from(document.querySelectorAll('[data-testid="facility-list-item"]')) - .map(e => e.innerText?.trim()).filter(Boolean), - room_types: Array.from(document.querySelectorAll('[data-testid="roomstable-accordion"]')) - .map(el => ({ - name: el.querySelector('[data-testid="room-type-name"]')?.innerText?.trim(), - price: el.querySelector('[data-testid="price-and-discounted-price"]')?.innerText?.trim(), - })), - lat: document.querySelector('a[href*="maps.google"]') - ?.href?.match(/[?&]q=([^&]+)/)?.[1]?.split(',')[0], - lon: document.querySelector('a[href*="maps.google"]') - ?.href?.match(/[?&]q=([^&]+)/)?.[1]?.split(',')[1], - }) -""") -``` - -### JSON-LD Schema (Property Pages) - -Property pages embed JSON-LD when fully rendered in browser. The schema type -is `Hotel`: - -```python -ld_json = js(""" - (function() { - for (var s of document.querySelectorAll('script[type="application/ld+json"]')) { - try { - var d = JSON.parse(s.textContent); - if (d['@type'] === 'Hotel' || d['@type'] === 'LodgingBusiness') return d; - } catch(e) {} - } - return null; - })() -""") -# Returns: -# { -# "@type": "Hotel", -# "name": "Hotel de Crillon", -# "aggregateRating": {"ratingValue": "9.2", "reviewCount": "1423"}, -# "address": {"streetAddress": "10 Place de la Concorde", "addressLocality": "Paris", ...}, -# "geo": {"latitude": 48.865, "longitude": 2.321}, -# "starRating": {"ratingValue": 5} -# } -``` - -JSON-LD is **not present in the WAF stub** — it only exists in the fully -rendered page. `http_get` will never see it. - -### Embedded JavaScript Data (`__NEXT_DATA__` / `b_hotel_data`) - -Booking.com's React app may embed search state in `window.__NEXT_DATA__` or -legacy `b_hotel_data` globals. Access via: - -```python -next_data = js("window.__NEXT_DATA__") # dict or None -b_hotel = js("window.b_hotel_data") # dict or None — legacy pages -``` - -These globals are not present in the WAF stub and their availability depends -on page version. Prefer data-testid selectors which are more stable. 
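A minimal sketch combining the two sources above: prefer JSON-LD when it is present, fall back to the data-testid selectors otherwise. The helper name `get_property_basics` is illustrative and the fallback selectors may shift across page versions — treat this as a starting point, not a field-tested recipe.

```python
def get_property_basics():
    # Prefer JSON-LD: one clean object once the page is fully rendered
    ld = js("""
      (function() {
        for (var s of document.querySelectorAll('script[type="application/ld+json"]')) {
          try {
            var d = JSON.parse(s.textContent);
            if (d['@type'] === 'Hotel' || d['@type'] === 'LodgingBusiness') return d;
          } catch (e) {}
        }
        return null;
      })()
    """)
    if ld:
        rating = ld.get('aggregateRating') or {}
        return {
            'name': ld.get('name'),
            'score': rating.get('ratingValue'),
            'review_count': rating.get('reviewCount'),
        }
    # Fallback: data-testid selectors (score string needs splitting)
    return js("""
      ({
        name: document.querySelector('[data-testid="property-name"]')?.innerText?.trim(),
        score: document.querySelector('[data-testid="review-score-right-component"]')
          ?.innerText?.split('\\n')[0],
        review_count: null,
      })
    """)
```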
- ---- - -## Pricing Extraction Patterns - -Booking.com shows prices per night with multiple formatting variants: - -```python -price_patterns = js(""" - ({ - // Search results card price - search_price: document.querySelector('[data-testid="price-and-discounted-price"]')?.innerText, - // Property page room price - room_price: document.querySelector('[data-testid="price-and-discounted-price"]')?.innerText, - // Original (crossed-out) price before discount - original_price: document.querySelector('[data-testid="recommended-units-price"] s')?.innerText - || document.querySelector('.prco-valign-middle-helper del')?.innerText, - // "Price for X nights" summary - total_price: document.querySelector('[data-testid="checkout-price-summary"]')?.innerText, - // Genius discount tag - genius_discount: document.querySelector('[data-testid="genius-rate-badge"]')?.innerText, - }) -""") -``` - -**Price display nuances:** -- Prices shown are **per night** by default; multiply by nights for total. -- Currency is controlled by `selected_currency` URL param or user account - setting. -- Taxes/fees may or may not be included; look for `"Includes taxes and fees"` - or `"+ taxes & fees"` text adjacent to the price element. -- The `data-testid="price-and-discounted-price"` element returns a single - string that may contain both original and discounted price - (e.g., `"US$400\nUS$320"`). - ---- - -## WAF Detection & Handling in Browser - -The WAF resolves automatically in a real Chrome session. To detect if -something went wrong: - -```python -def check_booking_waf(): - url = page_info()["url"] - html_snippet = js("document.body?.innerHTML?.slice(0, 500)") or "" - return ( - "chal_t=" in url - or "AwsWafIntegration" in html_snippet - or "challenge-container" in html_snippet - ) - -def wait_past_waf(timeout=15): - import time - deadline = time.time() + timeout - while time.time() < deadline: - if not check_booking_waf(): - return True - wait(1) - return False # timed out — WAF didn't resolve - -# Use after goto_url(): -goto_url("https://www.booking.com/searchresults.html?ss=London&checkin=2026-06-01&checkout=2026-06-03&group_adults=2&no_rooms=1") -wait_for_load() -wait_past_waf() -wait(2) # hydration -``` - ---- - -## Sitemap-Based URL Discovery Workflow - -Use this when you need a list of property URLs for a given country or city, -without needing to scrape search results pages in the browser: - -```python -import gzip, re, urllib.request - -GOOGLEBOT = {"User-Agent": "Googlebot/2.1 (+http://www.google.com/bot.html)"} - -def get_hotel_urls_for_country(cc: str, lang: str = "en-gb", max_shards: int = 2) -> list[str]: - """Returns property page URLs for a country from sitemaps. 
No browser needed.""" - idx_url = f"https://www.booking.com/sitembk-hotel-index.xml" - idx = http_get(idx_url, headers=GOOGLEBOT) - pattern = rf'(https://www\.booking\.com/sitembk-hotel-{lang}\.\d+\.xml\.gz)' - shards = re.findall(pattern, idx)[:max_shards] - - urls = [] - for shard_url in shards: - req = urllib.request.Request(shard_url, headers=GOOGLEBOT) - with urllib.request.urlopen(req, timeout=60) as r: - xml = gzip.decompress(r.read()).decode() - all_urls = re.findall(r'(https://[^<]+)', xml) - # Filter by country code - country_urls = [u for u in all_urls if f"/hotel/{cc}/" in u] - urls.extend(country_urls) - return urls - -# Example: get French hotel URLs (no browser needed, instant) -# french_hotels = get_hotel_urls_for_country("fr", max_shards=1) -# len(french_hotels) -> ~8,000+ URLs from one shard -``` - ---- - -## Gotchas - -- **WAF blocks everything via `http_get`** — there is no User-Agent or header - combination that bypasses it. The challenge is cryptographic, not heuristic. -- **WAF has two page sizes** — ~3,962 bytes (newer SDK, no AJAX reporter) and - ~8,410 bytes (older with error reporting). Both are equally blocked. -- **Sitemaps whitelist Googlebot UA** — `Googlebot/2.1` UA works for sitemap - XML/GZ files but NOT for hotel/city/search HTML pages. -- **GraphQL endpoint is unprotected** but useless without a valid Booking.com - session (irene service requires authentication for all substantive queries). -- **GraphQL op-name whitelist**: introspection (`__schema`) is blocked by - operation name restriction. Use field validation errors to probe the schema. -- **GDPR consent banner**: shown after WAF resolves, before React renders - search results. Must be dismissed (click `[data-testid="accept"]`) before - interacting with EU sessions. Non-EU IPs may not see it. -- **React hydration delay**: `wait_for_load()` fires before card data renders. - Always add 2-3s of `wait()` after `wait_for_load()`. -- **`sr-hotel` class is legacy** — Booking.com migrated to data-testid - attributes. Use `[data-testid="property-card"]`, not `.sr-hotel`. -- **Price parsing**: the price element often contains the full string - `"US$400\nUS$320"` when a discount applies. Split on `\n` and take the last - item for current price. -- **Offset pagination cap**: Booking caps results at 1,000 properties per - search (offset 0–975, rows=25). For cities with >1,000 properties, use - filters (`nflt`) to segment results. -- **Currency must be set via URL param**: `selected_currency=USD` in the search - URL; the cookie-based currency selection may not persist across navigation. -- **`dest_id` for cities**: Paris = `-1456928`, Amsterdam = `-2140479`, - London = `-2601889`. Negative integers indicate city-level destinations. - Get the ID by reading it from the URL after using `ss=` search. diff --git a/packages/bcode-browser/harness/agent-workspace/domain-skills/capterra/scraping.md b/packages/bcode-browser/harness/agent-workspace/domain-skills/capterra/scraping.md deleted file mode 100644 index e6eae7dec..000000000 --- a/packages/bcode-browser/harness/agent-workspace/domain-skills/capterra/scraping.md +++ /dev/null @@ -1,440 +0,0 @@ -# Capterra — Scraping & Data Extraction - -Field-tested against capterra.com on 2026-04-18. All code blocks validated with live requests. - -## Do this first - -**Use `User-Agent: ClaudeBot` — Capterra explicitly allows it in robots.txt and returns clean, pre-rendered Markdown instead of JavaScript-heavy HTML. 
No browser needed.** - -Capterra serves a fully structured Markdown representation of every page to AI bots (`ClaudeBot`, `GPTBot`, `PerplexityBot`, `Anthropic-AI` are all listed as `Allow: /` in robots.txt). The Markdown format is far easier to parse than HTML. - -With the default `Mozilla/5.0` UA (or any realistic browser UA), Capterra returns HTTP 403 with `Cf-Mitigated: challenge` — Cloudflare blocks all browser UA requests. There is no bypass via HTTP; those pages require a real browser session. - -```python -from helpers import http_get -import re, json - -# Works everywhere: -html = http_get( - "https://www.capterra.com/p/135003/Slack/reviews/", - headers={"User-Agent": "ClaudeBot"} -) - -# Extract overall rating and review count from the Markdown header line "4.7 (24059)" -m = re.search(r'^([\d.]+)\s+\(([\d,]+)\)$', html, re.MULTILINE) -print(m.group(1), m.group(2)) # 4.7 24059 -``` - ---- - -## Fastest approach: product summary in one call - -All key metrics — overall rating, review count, sub-ratings, pagination — come from the `/reviews/` endpoint in a single request. - -```python -from helpers import http_get -import re, json - -def get_product_summary(product_id, slug): - """ - Returns overall rating, review count, sub-ratings. - product_id: Capterra numeric ID (e.g. 135003) - slug: URL slug (e.g. 'Slack') - """ - url = f"https://www.capterra.com/p/{product_id}/{slug}/reviews/" - html = http_get(url, headers={"User-Agent": "ClaudeBot"}) - - result = {"product_id": product_id, "slug": slug} - - # Overall rating + review count from header line "4.7 (24059)" - m = re.search(r'^([\d.]+)\s+\(([\d,]+)\)$', html, re.MULTILINE) - if m: - result["overall_rating"] = float(m.group(1)) - result["review_count"] = int(m.group(2).replace(",", "")) - - # Page size and total pages from "Showing 1-25 of 24059 Reviews" - showing = re.search(r"Showing\s+(\d+)[-–](\d+)\s+of\s+([\d,]+)\s+Reviews", html) - if showing: - result["per_page"] = int(showing.group(2)) - result["total_pages"] = (int(showing.group(3).replace(",", "")) + 24) // 25 - - # Sub-ratings: "Ease of use\n\n4.6" and "Customer Service\n\n4.4" - lines = html.split("\n") - for i, line in enumerate(lines): - for label, key in [("Ease of use", "ease_of_use"), ("Customer Service", "customer_service")]: - if line.strip() == label: - for j in range(i + 1, min(i + 5, len(lines))): - try: - val = float(lines[j].strip()) - if 0 < val <= 5.0: - result[key] = val - break - except ValueError: - pass - - return result - -summary = get_product_summary(135003, "Slack") -print(json.dumps(summary, indent=2)) -# { -# "product_id": 135003, -# "slug": "Slack", -# "overall_rating": 4.7, -# "review_count": 24059, -# "per_page": 25, -# "total_pages": 963, -# "ease_of_use": 4.6, -# "customer_service": 4.4 -# } -``` - ---- - -## Common workflows - -### Get reviews (paginated) - -25 reviews per page. Use `?page=N` for pagination. - -```python -from helpers import http_get -import re - -def get_reviews_page(product_id, slug, page=1): - """ - Returns up to 25 reviews for one page. - Total pages = ceil(review_count / 25). 
- """ - url = f"https://www.capterra.com/p/{product_id}/{slug}/reviews/?page={page}" - html = http_get(url, headers={"User-Agent": "ClaudeBot"}) - - # Total review count from header - m = re.search(r'^([\d.]+)\s+\(([\d,]+)\)$', html, re.MULTILINE) - total = int(m.group(2).replace(",", "")) if m else 0 - - # Showing X-Y of Z - showing = re.search(r"Showing\s+(\d+)[-–](\d+)\s+of\s+([\d,]+)\s+Reviews", html) - - # Split by review title markers "### "Title"" - blocks = re.split(r'\n### "', html) - reviews = [] - - for block in blocks[1:]: - r = {} - - # Title (up to closing quote) - t = re.match(r'([^"]+)"', block) - if t: - r["title"] = t.group(1).strip() - - # Date - d = re.search( - r"(January|February|March|April|May|June|July|August|September|October|November|December)\s+\d+,\s+\d{4}", - block - ) - if d: - r["date"] = d.group(0) - - # Overall rating for this review (first float 1.0–5.0 between blank lines) - rm = re.search(r"\n\n([\d.]+)\n\n", block) - if rm: - val = float(rm.group(1)) - if 1.0 <= val <= 5.0: - r["rating"] = val - - # Pros - pros = re.search(r"\nPros\n\n(.+?)(?=\n\nCons|\n\nReview Source|\n\nSwitched|\Z)", block, re.DOTALL) - if pros: - r["pros"] = pros.group(1).strip() - - # Cons - cons = re.search(r"\nCons\n\n(.+?)(?=\n\nReview Source|\n\nSwitched|\n\n##|\Z)", block, re.DOTALL) - if cons: - r["cons"] = cons.group(1).strip() - - if r.get("title"): - reviews.append(r) - - return { - "total": total, - "page": page, - "showing": f"{showing.group(1)}-{showing.group(2)} of {showing.group(3)}" if showing else None, - "reviews": reviews, - } - -# Page 1 -result = get_reviews_page(135003, "Slack", page=1) -print(f"Total reviews: {result['total']}, this page: {len(result['reviews'])}") -# Total reviews: 24059, this page: 25 - -print(result["reviews"][0]) -# {'title': 'Love, love, love Slack!', 'date': 'April 14, 2026', 'rating': 5.0, -# 'pros': '...', 'cons': '...'} -``` - -### Scrape all reviews in bulk (parallel) - -10 pages in ~2s with 5 workers. No rate limiting observed during testing. - -```python -from helpers import http_get -import re -from concurrent.futures import ThreadPoolExecutor - -UA = {"User-Agent": "ClaudeBot"} - -def _fetch_page(args): - product_id, slug, page = args - url = f"https://www.capterra.com/p/{product_id}/{slug}/reviews/?page={page}" - html = http_get(url, headers=UA) - blocks = re.split(r'\n### "', html) - reviews = [] - for block in blocks[1:]: - r = {} - t = re.match(r'([^"]+)"', block) - if t: r["title"] = t.group(1).strip() - d = re.search(r"(January|February|March|April|May|June|July|August|September|October|November|December)\s+\d+,\s+\d{4}", block) - if d: r["date"] = d.group(0) - rm = re.search(r"\n\n([\d.]+)\n\n", block) - if rm: - val = float(rm.group(1)) - if 1.0 <= val <= 5.0: r["rating"] = val - pros = re.search(r"\nPros\n\n(.+?)(?=\n\nCons|\n\nReview Source|\n\nSwitched|\Z)", block, re.DOTALL) - if pros: r["pros"] = pros.group(1).strip() - cons = re.search(r"\nCons\n\n(.+?)(?=\n\nReview Source|\n\nSwitched|\n\n##|\Z)", block, re.DOTALL) - if cons: r["cons"] = cons.group(1).strip() - if r.get("title"): reviews.append(r) - return reviews - -def get_all_reviews(product_id, slug, max_pages=None, workers=5): - """Fetch all reviews in parallel. 
max_pages=None fetches everything.""" - # First: get total pages - summary_html = http_get( - f"https://www.capterra.com/p/{product_id}/{slug}/reviews/", - headers=UA - ) - m = re.search(r'^([\d.]+)\s+\(([\d,]+)\)$', summary_html, re.MULTILINE) - total = int(m.group(2).replace(",", "")) if m else 0 - total_pages = (total + 24) // 25 - pages = range(1, (max_pages or total_pages) + 1) - - tasks = [(product_id, slug, p) for p in pages] - all_reviews = [] - with ThreadPoolExecutor(max_workers=workers) as ex: - for batch in ex.map(_fetch_page, tasks): - all_reviews.extend(batch) - return all_reviews - -# Fetch first 50 reviews (2 pages) in parallel -reviews = get_all_reviews(135003, "Slack", max_pages=2, workers=2) -print(f"Fetched {len(reviews)} reviews") -# Fetched 50 reviews -``` - -### Get a product's full overview (rating breakdown, sentiment, pricing) - -```python -from helpers import http_get -import re, json - -def get_product_overview(product_id, slug): - """Rating breakdown, sentiment, starting price from the product page.""" - url = f"https://www.capterra.com/p/{product_id}/{slug}/" - html = http_get(url, headers={"User-Agent": "ClaudeBot"}) - - result = {} - - # Overall rating and review count from the reviews section - # Appears as "\n4.7\n\nBased on 24,059 reviews\n" - m = re.search(r'\n([\d.]+)\n\nBased on ([\d,]+) reviews\n', html) - if m: - result["overall_rating"] = float(m.group(1)) - result["review_count"] = int(m.group(2).replace(",", "")) - - # Rating breakdown: "5(17268)\n\n4(5708)\n\n3(907)\n\n2(128)\n\n1(48)" - breakdown = re.findall(r'\b([1-5])\((\d+)\)', html) - if breakdown: - result["rating_breakdown"] = {int(s): int(c) for s, c in breakdown if 1 <= int(s) <= 5} - - # Sentiment: "Positive\n\n96%\n\nNeutral\n\n4%\n\nNegative\n\n1%" - for label, key in [("Positive", "sentiment_positive"), ("Neutral", "sentiment_neutral"), ("Negative", "sentiment_negative")]: - sm = re.search(rf'{label}\s*\n+\s*(\d+)%', html) - if sm: - result[key] = int(sm.group(1)) - - # Starting price ("Starting price\n\n$8.75\n\nPer User") - pm = re.search(r'Starting price\s*\n+\$?([\d.]+)', html) - if pm: - result["starting_price_usd"] = float(pm.group(1)) - - # Categories ("What is X used for?" links) - cats = re.findall(r'\[([^\]]+)\]\(https://www\.capterra\.com/([a-z-]+-software)/\)', html[:3000]) - if cats: - result["categories"] = [name for name, _ in cats] - - # Sub-ratings from product page - for label, key in [("Value for money", "value_for_money"), ("Features", "features_rating")]: - sub = re.search(rf'{label}\s*\n+\s*([\d.]+)', html) - if sub: - try: - val = float(sub.group(1)) - if 0 < val <= 5.0: - result[key] = val - except ValueError: - pass - - return result - -overview = get_product_overview(135003, "Slack") -print(json.dumps(overview, indent=2)) -# { -# "overall_rating": 4.7, -# "review_count": 24059, -# "rating_breakdown": {"5": 17268, "4": 5708, "3": 907, "2": 128, "1": 48}, -# "sentiment_positive": 96, -# "sentiment_neutral": 4, -# "sentiment_negative": 1, -# "starting_price_usd": 8.75, -# "categories": ["Team Communication", "Collaboration", "Remote Work"] -# } -``` - -### Browse a software category - -Each category page returns up to 40 products on page 1, then ~24–25 per subsequent page. Pagination works via `?page=N`. - -```python -from helpers import http_get -import re - -def get_category_products(category_slug, page=1): - """ - List products in a Capterra category. 
- category_slug examples: 'project-management-software', 'crm-software', 'accounting-software' - Full list: https://www.capterra.com/categories/ - """ - url = f"https://www.capterra.com/{category_slug}/" - if page > 1: - url = f"https://www.capterra.com/{category_slug}/?page={page}" - html = http_get(url, headers={"User-Agent": "ClaudeBot"}) - - # Ratings: [4.6 (5732)](https://www.capterra.com/p/147657/monday-com/reviews/) - raw = re.findall( - r'\[([\d.]+)\s+\(([\d,]+)\)\]\(https://www\.capterra\.com/p/(\d+)/([^/]+)/reviews/\)', - html - ) - # Product names from "Learn more about X" links - names = {pid: name for name, pid in re.findall( - r'\[Learn more about ([^\]]+)\]\(https://www\.capterra\.com/p/(\d+)/[^/]+/\)', html - )} - - items, seen = [], set() - for rating, review_count, pid, slug in raw: - if pid not in seen: - seen.add(pid) - items.append({ - "product_id": int(pid), - "name": names.get(pid, slug), - "slug": slug, - "overall_rating": float(rating), - "review_count": int(review_count.replace(",", "")), - "product_url": f"https://www.capterra.com/p/{pid}/{slug}/", - "reviews_url": f"https://www.capterra.com/p/{pid}/{slug}/reviews/", - }) - return items - -products = get_category_products("project-management-software", page=1) -for p in products[:3]: - print(f"{p['name']}: {p['overall_rating']} ({p['review_count']} reviews)") -# monday.com: 4.6 (5732 reviews) -# Jira: 4.4 (15325 reviews) -# Celoxis: 4.4 (327 reviews) -``` - -### Get all 1000+ software categories - -```python -from helpers import http_get -import re - -def get_all_categories(): - """Returns list of {name, slug} for all ~1003 Capterra software categories.""" - html = http_get("https://www.capterra.com/categories/", headers={"User-Agent": "ClaudeBot"}) - cats = re.findall(r'\[([^\]]+)\]\(https://www\.capterra\.com/([a-z-]+-software)/\)', html) - return [{"name": name, "slug": slug} for name, slug in cats] - -categories = get_all_categories() -print(f"{len(categories)} categories") # 1003 -print(categories[:3]) -# [{'name': 'AB Testing', 'slug': 'ab-testing-software'}, -# {'name': 'Absence Management', 'slug': 'absence-management-software'}, ...] -``` - ---- - -## URL patterns - -| Page type | URL pattern | -|-----------|-------------| -| Product overview | `https://www.capterra.com/p/{id}/{Slug}/` | -| Product reviews | `https://www.capterra.com/p/{id}/{Slug}/reviews/` | -| Reviews page N | `https://www.capterra.com/p/{id}/{Slug}/reviews/?page={N}` | -| Reviews (alt) | `https://www.capterra.com/reviews/{id}/{Slug}/` | -| Category listing | `https://www.capterra.com/{category}-software/` | -| Category page N | `https://www.capterra.com/{category}-software/?page={N}` | -| All categories | `https://www.capterra.com/categories/` | -| Product pricing | `https://www.capterra.com/p/{id}/{Slug}/pricing/` | -| Product alternatives | `https://www.capterra.com/p/{id}/{Slug}/alternatives/` | -| Compare A vs B | `https://www.capterra.com/compare/{id_a}-{id_b}/{Slug_a}-vs-{Slug_b}` | - -**Finding a product's ID:** Look in the URL of any product listing in a category page. The pattern `https://www.capterra.com/p/{id}/{Slug}/reviews/` appears in every category listing as the link target for each rating badge. The slug is case-sensitive in practice (e.g. `Slack`, not `slack`). - -Product IDs are stable numeric identifiers. Note that the same software vendor may have multiple product IDs under different names/versions. Always find the ID from a category search rather than guessing. 
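A minimal sketch of that ID-discovery step, reusing `get_category_products` from above. The helper name `find_product_id` is illustrative, not upstream code:

```python
def find_product_id(product_name, category_slug, max_pages=5):
    """Scan category listing pages for a product by case-insensitive name."""
    target = product_name.lower()
    for page in range(1, max_pages + 1):
        items = get_category_products(category_slug, page=page)
        if not items:
            break
        for item in items:
            if item["name"].lower() == target:
                return item  # contains product_id, slug, product_url, reviews_url
    return None

# slack = find_product_id("Slack", "team-communication-software")
# slack["product_id"], slack["slug"]  # -> 135003, 'Slack' per the category listing above
```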
- ---- - -## Anti-bot measures - -- **Cloudflare is active on all routes** (`Server: cloudflare`, `CF-RAY` present in all response headers). -- **Browser UAs (Chrome, Firefox, Safari) return HTTP 403** with `Cf-Mitigated: challenge` regardless of how complete the headers are. There is no HTTP-only bypass. -- **`ClaudeBot` UA bypasses Cloudflare** and receives clean pre-rendered Markdown. Capterra explicitly allows it in `robots.txt` via `User-agent: ClaudeBot / Allow: /`. This is a deliberate AI-accessibility feature. -- **Other AI bot UAs that also work**: `GPTBot`, `PerplexityBot` (also in `robots.txt` Allow list). `Anthropic-AI` was tested and returns 403 — only `ClaudeBot` is the correct UA. -- **The search endpoint (`/search/?q=...`) returns empty results** via ClaudeBot — the query parameter is not passed through. Use category browsing or direct product URLs instead. -- **No CAPTCHA observed** during testing with ClaudeBot. -- **No rate limiting observed**: 10 parallel requests across 5 workers completed in ~2s with all 200 responses. Sequential batches of 5 pages at 0.15–0.95s per request also worked cleanly. -- **The Markdown response has no JSON-LD, no `__NEXT_DATA__`** — these are HTML-only structures. The Markdown format is simpler to parse. -- **Disallowed paths** (from robots.txt): `/search`, `/ppc/clicks/`, `/sem-b/`, `/sem-compare-b/`, `/workspace/`, `/auth/login`. These 403 even with ClaudeBot. - ---- - -## Gotchas - -- **Old Capterra product IDs may be invalid.** The URL `https://www.capterra.com/p/56703/Slack/` (ID 56703) returns 404 even with ClaudeBot — this is a stale or merged product ID. Slack's current ID is 135003, found in the team-communication-software category listing. Always discover IDs by crawling category pages rather than hard-coding them. - -- **Slug is case-sensitive.** `Slack` works; `slack` returns 404. The slug is always in the category listing data. - -- **Response is Markdown, not HTML.** `http_get` returns pre-rendered Markdown with no HTML tags, no JSON-LD, and no `__NEXT_DATA__`. Do not attempt `BeautifulSoup` parsing. Use `re` on the text directly. - -- **`http_get` default UA is `Mozilla/5.0`** — this returns 403 from Capterra. Always pass `headers={"User-Agent": "ClaudeBot"}` explicitly. - -- **Reviews page vs product page**: The `/reviews/` page has a clean rating header (`4.7 (24059)`) on line 10. The product overview page (`/p/{id}/{Slug}/`) has the same number buried deeper in the page as `\n4.7\n\nBased on 24,059 reviews\n`. For rating extraction, the reviews page is simpler and more reliable. - -- **Category page 1 is larger than subsequent pages**: Page 1 includes editorial content (author bio, top-picks editorial) which can double the page size. Subsequent pages are ~20–30KB and contain only listings. - -- **Reviewer name is present in the text but not cleanly delimited**: The Markdown format for reviewer attribution uses plain text lines above the review body. It's easier to skip reviewer name extraction than to parse the ambiguous formatting. - -- **Sub-rating labels in reviews page**: "Ease of use" (lowercase 'u') and "Customer Service" (capitalized 'S') — match exactly. The product overview page may show additional sub-ratings like "Features" and "Value for money". - -- **`rating_breakdown` pattern caveat**: The pattern `[1-5]\(\d+\)` on the product page can also match feature ratings. 
To isolate the 5-star breakdown, find it within the "Filter by rating" section, which appears as a block like `5(17268)\n\n4(5708)\n\n3(907)\n\n2(128)\n\n1(48)`. - ---- - -## When to use the browser instead - -The browser is not needed for any common Capterra task — the ClaudeBot flow handles all of them. Use the browser only if: - -- You need to interact with a page element (e.g. submit a review, use the "fit-finder" wizard). -- You need to access a Capterra page that is explicitly blocked in robots.txt (e.g. `/workspace/`, `/auth/login/`). -- You need to simulate a logged-in user session with Capterra credentials. - -For read-only scraping of product data, reviews, and category listings, `http_get` with `ClaudeBot` UA is both faster and more reliable than a browser. diff --git a/packages/bcode-browser/harness/agent-workspace/domain-skills/centilebrain/generate-estimates.md b/packages/bcode-browser/harness/agent-workspace/domain-skills/centilebrain/generate-estimates.md deleted file mode 100644 index bdd55bcd0..000000000 --- a/packages/bcode-browser/harness/agent-workspace/domain-skills/centilebrain/generate-estimates.md +++ /dev/null @@ -1,110 +0,0 @@ -# CentileBrain — Generate Normative Deviation Values - -URL: `https://centilebrain.org/#/model` - -Generates z-scores for a single subject's FreeSurfer-derived morphometry -against the CentileBrain normative reference. Three separate modalities -(`SubcorticalVolume`, `CorticalThickness`, `SurfaceArea`), two sexes, -each a distinct Shiny app. Login/account not required. - -## Site shape - -- The `/#/model` page is a thin wrapper around per-modality/sex **Shiny - iframes** at `https://centilebrain-app.shinyapps.io/{SV|CT|SA}-{MALE|FEMALE}/`. -- Switching modality swaps the iframe; switching sex swaps the iframe. - The top-page buttons and toggles are not forms — they just replace - the iframe `src`. -- The upload form, compute button, and download link all live **inside the iframe**. - `iframe_target("shinyapps.io/SV-MALE")` (etc.) returns the session to use. -- Requires `upload_file(..., target_id=...)` — the iframe-aware upload helper. - -## Form elements (inside the iframe) - -| Selector | Purpose | -|---|---| -| `#email` | Required text input. Any valid-looking string works; it does not send mail. | -| `#file1` | The file input. Accepts the official `.xlsx` template for that modality/sex (download from the site to see the expected schema). | -| `#confirm` | The **Compute** button. Click exactly once after upload. | -| `#downloadData1` | **Download Results** link once compute is done. Produces a zip of CSVs + xlsx. | - -## Waits - -- Upload: after `upload_file`, wait ~3 s for the Shiny server to read the file; the data preview table populates in-place. -- Compute: poll the iframe body text for `"Computation complete"` — typically 30-90 s. `"Computing… This may take a few seconds to a couple of minutes."` is the in-progress marker. -- Download: click `#downloadData1`, then poll the Chrome download directory for a `{SV|CT|SA}_{male|female}_YYYY-MM-DD-HH-MM-SS.zip`. Set `Browser.setDownloadBehavior` with a known `downloadPath` before clicking so you can find it deterministically. - -## Traps - -- **Iframe target_id goes stale across modality swaps.** After clicking `CORTICAL THICKNESS` or `SURFACE AREA`, re-call `iframe_target("shinyapps.io/CT-MALE")` — the old id from SV-MALE will not work even though `Target.getTargets` may still list it briefly. Add a 2-3 s sleep after the modality-swap click before re-resolving. 
-- **Sex toggles are MUI switches, not radio buttons.** They are `input[type=checkbox]` with `name=female` / `name=male`. Clicking one does not automatically uncheck the other visibly, but the iframe src changes based on which is `checked`. Easiest: `js("document.querySelector('input[name=male]').click()")`. -- **Top-level buttons scroll off-screen after first interaction.** The modality buttons are at `y ≈ 226`, but after scrolling/iframe expansion they report `y < 0`. Use `js("window.scrollTo(0, 0)")` then click via JS by text (`Array.from(document.querySelectorAll('button')).find(b => b.innerText.trim() === 'CORTICAL THICKNESS').click()`) instead of fixed coordinates. - -## End-to-end example - -```python -import time, os - -DL = "/tmp/centilebrain" -os.makedirs(DL, exist_ok=True) -cdp("Browser.setDownloadBehavior", behavior="allow", downloadPath=DL, eventsEnabled=True) - -new_tab("https://centilebrain.org/#/model") -wait_for_load() -time.sleep(2) - -# Pick modality + sex (SV + male shown; repeat for CT and SA as needed) -js("""Array.from(document.querySelectorAll('button')) - .find(b => b.innerText.trim() === 'SUBCORTICAL VOLUME').click()""") -time.sleep(1) -js("document.querySelector('input[name=male]').click()") -time.sleep(2) - -t = iframe_target("shinyapps.io/SV-MALE") -upload_file("#file1", "/abs/path/JMT_subcortical_volume.xlsx", target_id=t) -time.sleep(3) - -js("""const e=document.querySelector('#email'); - e.value='user@example.com'; - e.dispatchEvent(new Event('input',{bubbles:true}));""", target_id=t) - -js("document.querySelector('#confirm').click()", target_id=t) -for _ in range(40): - time.sleep(3) - if "Computation complete" in js("document.body.innerText", target_id=t): - break - -before = set(os.listdir(DL)) -js("document.querySelector('#downloadData1').click()", target_id=t) -for _ in range(30): - time.sleep(2) - after = set(os.listdir(DL)) - new = after - before - if new and not any(f.endswith(".crdownload") for f in after): - print("downloaded:", new) - break -``` - -## Output zip - -Unzipped contents (SV example): - -``` -output_file_YYYY-MM-DD-HH-MM-SS/ - zscore_SubcorticalVolume_male.csv # per-ROI z-scores - prediction_SubcorticalVolume_male.csv # model-predicted raw values - centile_SubcorticalVolume_male.xlsx # centile ranks - MAE_SubcorticalVolume_male.csv # model accuracy (not per-subject) - RMSE_SubcorticalVolume_male.csv - Corr_SubcorticalVolume_male.csv - EV_SubcorticalVolume_male.csv -``` - -The `zscore_*.csv` is the file you almost always want. Columns are -`SITE, SubjectID, Vendor, FreeSurfer_Version, age, `. - -## Multi-subject / batch uploads - -The `.xlsx` template accepts many rows, and CentileBrain processes them -all in one compute. Same flow, same iframe; the z-score CSV will have -one row per subject. No concurrency needed across modality/sex for a -typical cohort. diff --git a/packages/bcode-browser/harness/agent-workspace/domain-skills/coingecko/scraping.md b/packages/bcode-browser/harness/agent-workspace/domain-skills/coingecko/scraping.md deleted file mode 100644 index c601393b5..000000000 --- a/packages/bcode-browser/harness/agent-workspace/domain-skills/coingecko/scraping.md +++ /dev/null @@ -1,325 +0,0 @@ -# CoinGecko — Data Extraction - -`https://api.coingecko.com/api/v3` — no API key needed for free tier. Pure JSON REST API, no browser required. 
- -## Do this first - -**Use the API directly with `http_get` — no browser, no parsing, fully structured JSON.** - -```python -import json -data = json.loads(http_get("https://api.coingecko.com/api/v3/simple/price?ids=bitcoin&vs_currencies=usd")) -print(data['bitcoin']['usd']) # 76286 -``` - -**Rate limit is tight: ~3 calls per minute on the free tier.** The API returns HTTP 429 with `Retry-After: 60` when you exceed it. Always add `time.sleep(5)` between calls in a loop. Confirmed: rapid-fire calls hit 429 on call 3-4 with no delay; with 5s gaps you stay safe. - -## Rate limits (confirmed live) - -- **Free tier**: ~3 calls/minute per IP (no API key) -- **429 response**: includes `Retry-After: 60` header — wait 60 seconds before retrying -- **Coin ID lookup** (`/coins/list`) counts against the limit — call it once and cache -- **`/ping`** still counts — don't use it as a keep-alive - -```python -import time, urllib.error, json - -def safe_get(url, retries=2): - for attempt in range(retries + 1): - try: - return json.loads(http_get(url)) - except urllib.error.HTTPError as e: - if e.code == 429 and attempt < retries: - print(f"Rate limited, sleeping 65s...") - time.sleep(65) - else: - raise -``` - -## Coin ID vs symbol — critical distinction - -**IDs are kebab-case strings, not ticker symbols.** The API ignores symbols entirely. - -| Intent | Wrong | Right | -|--------|-------|-------| -| Bitcoin price | `ids=BTC` | `ids=bitcoin` | -| Solana price | `ids=SOL` | `ids=solana` | -| Ethereum | `ids=ETH` | `ids=ethereum` | - -- Unknown or wrong IDs return an **empty `{}` dict** — no error, no warning -- Symbols are not unique: 17+ coins share the symbol `sol` (bridged versions, wrapped, etc.) -- Use `/coins/list` to resolve symbol → id, or just know the canonical id - -```python -# Resolve symbol to id -coins_list = json.loads(http_get("https://api.coingecko.com/api/v3/coins/list")) -# 17,564 entries as of April 2026 -# Each: {'id': 'bitcoin', 'symbol': 'btc', 'name': 'Bitcoin'} -sol_coins = [c for c in coins_list if c['symbol'].lower() == 'sol'] -# Returns 5+ entries — pick by name to get the real Solana: id='solana' -``` - -## Common workflows - -### Simple price (one or many coins) - -```python -import json -data = json.loads(http_get( - "https://api.coingecko.com/api/v3/simple/price" - "?ids=bitcoin,ethereum,solana" - "&vs_currencies=usd,eur" - "&include_market_cap=true" - "&include_24hr_change=true" -)) -for coin, info in data.items(): - print(f"{coin}: ${info['usd']:,.0f} | 24h: {info['usd_24h_change']:.1f}% | MCap: ${info['usd_market_cap']/1e9:.1f}B") -# bitcoin: $76,286 | 24h: 1.4% | MCap: $1528.0B -# ethereum: $2,361 | 24h: 0.8% | MCap: $284.9B -# solana: $87 | 24h: -1.0% | MCap: $50.2B -``` - -Response keys for each coin (when all flags enabled): -`usd`, `usd_market_cap`, `usd_24h_change`, `eur`, `eur_market_cap`, `eur_24h_change` - -### Top coins by market cap (paginated) - -```python -import json -data = json.loads(http_get( - "https://api.coingecko.com/api/v3/coins/markets" - "?vs_currency=usd" - "&order=market_cap_desc" - "&per_page=10" # max 250 - "&page=1" # 1-indexed; page=2 gives ranks 11-20 etc. 
- "&sparkline=false" - "&price_change_percentage=1h,7d,30d" # optional extra columns -)) -for c in data: - print(f"#{c['market_cap_rank']} {c['symbol'].upper()} ${c['current_price']:,.2f} | {c['price_change_percentage_24h']:.1f}%") -# #1 BTC $76,281.00 | 1.4% -# #2 ETH $2,360.45 | 0.8% -``` - -Full fields per entry: `id`, `symbol`, `name`, `image`, `current_price`, `market_cap`, `market_cap_rank`, `fully_diluted_valuation`, `total_volume`, `high_24h`, `low_24h`, `price_change_24h`, `price_change_percentage_24h`, `market_cap_change_24h`, `market_cap_change_percentage_24h`, `circulating_supply`, `total_supply`, `max_supply`, `ath`, `ath_change_percentage`, `ath_date`, `atl`, `atl_change_percentage`, `atl_date`, `roi`, `last_updated` - -Extra columns added by `price_change_percentage=1h,7d,30d`: `price_change_percentage_1h_in_currency`, `price_change_percentage_7d_in_currency`, `price_change_percentage_30d_in_currency` - -Pagination: use `page=2`, `page=3`, etc. with `per_page` up to 250. Results are 1-indexed — page 2 with per_page=5 returns ranks 6–10. - -### Coin detail (full metadata) - -```python -import json -data = json.loads(http_get( - "https://api.coingecko.com/api/v3/coins/bitcoin" - "?localization=false" # skip 40+ language translations - "&tickers=false" # skip exchange ticker list (can be huge) - "&market_data=true" - "&community_data=false" - "&developer_data=false" -)) -print(data['name']) # Bitcoin -print(data['symbol']) # btc -print(data['market_cap_rank']) # 1 -print(data['market_data']['current_price']['usd']) # 76279 -print(data['market_data']['ath']['usd']) # 126080 -print(data['market_data']['ath_date']['usd']) # 2025-10-06T18:57:42.558Z -print(data['market_data']['circulating_supply']) # 20017459.0 -print(data['description']['en'][:200]) -``` - -Top-level keys: `id`, `symbol`, `name`, `web_slug`, `asset_platform_id`, `platforms`, `categories`, `description`, `links`, `image`, `genesis_date`, `sentiment_votes_up_percentage`, `market_cap_rank`, `market_data`, `last_updated` - -`market_data` sub-keys include: `current_price`, `ath`, `ath_change_percentage`, `ath_date`, `atl`, `atl_change_percentage`, `atl_date`, `market_cap`, `fully_diluted_valuation`, `total_volume`, `high_24h`, `low_24h`, `price_change_percentage_24h`, `price_change_percentage_7d`, `price_change_percentage_14d`, `price_change_percentage_30d`, `price_change_percentage_60d`, `price_change_percentage_200d`, `price_change_percentage_1y`, `circulating_supply`, `total_supply`, `max_supply` - -All price/market fields are objects keyed by currency code: `data['market_data']['current_price']['usd']`, `['eur']`, `['btc']`, etc. 
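A minimal sketch for pulling detail payloads for several coins without tripping the free-tier limit, reusing the `safe_get` helper from the rate-limits section above. The wrapper name `get_coin_details` is illustrative only:

```python
import time

def get_coin_details(coin_ids):
    """Fetch /coins/{id} for each id, pacing calls for the free tier."""
    out = {}
    for i, coin_id in enumerate(coin_ids):
        if i:
            time.sleep(5)  # ~3 calls/min free tier; safe_get handles any stray 429
        out[coin_id] = safe_get(
            f"https://api.coingecko.com/api/v3/coins/{coin_id}"
            "?localization=false&tickers=false&market_data=true"
            "&community_data=false&developer_data=false"
        )
    return out

# details = get_coin_details(["bitcoin", "ethereum"])
# details["bitcoin"]["market_data"]["ath"]["usd"]
```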
- -### Historical OHLCV - -```python -import json -# OHLCV candles: granularity auto-determined by `days` -# 1d = 30-min candles, 7d = 4-hr candles, 14d+ = daily candles -data = json.loads(http_get( - "https://api.coingecko.com/api/v3/coins/ethereum/ohlc?vs_currency=usd&days=7" -)) -print(len(data)) # 42 candles for 7-day window -print(data[-1]) # [1776499200000, 2407.32, 2412.96, 2402.21, 2405.03] -# [timestamp_ms, open, high, low, close] - -# Convert timestamp: -import datetime -ts_ms = data[-1][0] -dt = datetime.datetime.fromtimestamp(ts_ms / 1000, tz=datetime.timezone.utc) -``` - -`days` options: `1`, `7`, `14`, `30`, `90`, `180`, `365`, `max` - -### Market chart (price + volume + market cap time series) - -```python -import json -# interval='daily' gives one point per day; omit for auto (hourly for <=90 days) -chart = json.loads(http_get( - "https://api.coingecko.com/api/v3/coins/bitcoin/market_chart" - "?vs_currency=usd&days=7&interval=daily" -)) -# Keys: 'prices', 'market_caps', 'total_volumes' -# Each is a list of [timestamp_ms, value] -print(len(chart['prices'])) # 8 points for 7-day daily -print(chart['prices'][-1]) # [1776508393000, 76286.699...] -print(chart['total_volumes'][-1]) # [1776508393000, 80459560788.47...] -``` - -### Market chart by date range - -```python -import json, time -now = int(time.time()) -thirty_days_ago = now - 30 * 86400 -chart = json.loads(http_get( - f"https://api.coingecko.com/api/v3/coins/bitcoin/market_chart/range" - f"?vs_currency=usd&from={thirty_days_ago}&to={now}" -)) -# Granularity: <1 day → minutely, 1-90 days → hourly, >90 days → daily -print(len(chart['prices'])) # 174 points for 7-day range (hourly) -``` - -### Search - -```python -import json -results = json.loads(http_get("https://api.coingecko.com/api/v3/search?query=solana")) -# Top-level keys: 'coins', 'exchanges', 'icos', 'categories', 'nfts' -for c in results['coins'][:3]: - print(f"{c['id']} | {c['symbol']} | rank {c['market_cap_rank']}") -# solana | SOL | rank 7 -# solana-name-service | SNS | rank 1902 -``` - -Search returns coins ordered by relevance, not market cap. First result is usually the canonical coin. - -### Trending (top 7 searched in last 24h) - -```python -import json -trending = json.loads(http_get("https://api.coingecko.com/api/v3/search/trending")) -# Top-level keys: 'coins', 'nfts', 'categories' -for item in trending['coins']: - c = item['item'] - print(f"{c['name']} ({c['symbol']}) #{c['market_cap_rank']}") -# Item keys: id, coin_id, name, symbol, market_cap_rank, thumb, small, large, -# slug, price_btc, score, data -``` - -`data` sub-object includes sparkline image URL, price/volume/market cap info if available. 
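A minimal sketch chaining trending into `/simple/price` for live quotes on the trending list. It assumes (as holds in practice) that `item['item']['id']` is the same id `/simple/price` expects; not field-tested here:

```python
import json, time

trending = json.loads(http_get("https://api.coingecko.com/api/v3/search/trending"))
ids = [item['item']['id'] for item in trending['coins']]

time.sleep(5)  # pace the second call to stay under the free-tier limit

prices = json.loads(http_get(
    "https://api.coingecko.com/api/v3/simple/price"
    f"?ids={','.join(ids)}&vs_currencies=usd&include_24hr_change=true"
))
for item in trending['coins']:
    c = item['item']
    q = prices.get(c['id'])
    if q:
        chg = q.get('usd_24h_change')
        chg_s = f"{chg:+.1f}%" if chg is not None else "n/a"
        print(f"{c['name']} ({c['symbol']}): ${q['usd']:,} | 24h {chg_s}")
```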
- -### Global market overview - -```python -import json -global_data = json.loads(http_get("https://api.coingecko.com/api/v3/global")) -gd = global_data['data'] -print(f"Total market cap: ${gd['total_market_cap']['usd']/1e12:.2f}T") # $2.66T -print(f"24h volume: ${gd['total_volume']['usd']/1e9:.1f}B") # $156.6B -print(f"BTC dominance: {gd['market_cap_percentage']['btc']:.1f}%") # 57.3% -print(f"Active coins: {gd['active_cryptocurrencies']}") # 17,564 -print(f"Active exchanges: {gd['markets']}") # 1,475 -``` - -### Coin categories (market cap by sector) - -```python -import json -cats = json.loads(http_get( - "https://api.coingecko.com/api/v3/coins/categories?order=market_cap_desc" -)) -# 691 categories as of April 2026 -for cat in cats[:5]: - print(f"{cat['name']}: ${cat['market_cap']/1e9:.1f}B | 24h: {cat['market_cap_change_24h']:.1f}%") -# Smart Contract Platform: $2204.8B | 24h: 0.9% -# Layer 1 (L1): $2171.5B | 24h: 1.1% - -# Category keys: id, name, market_cap, market_cap_change_24h, content, -# top_3_coins_id, top_3_coins, volume_24h, updated_at -``` - -### Token price by contract address (ERC-20 and other chains) - -```python -import json -# Platform IDs: ethereum, binance-smart-chain, polygon-pos, avalanche, solana, etc. -token = json.loads(http_get( - "https://api.coingecko.com/api/v3/simple/token_price/ethereum" - "?contract_addresses=0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48" # USDC - "&vs_currencies=usd" -)) -print(token) -# {'0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48': {'usd': 0.999861}} -# Key is the lowercased contract address -``` - -## vs_currencies options - -63 currencies supported (confirmed live). Common ones: - -**Fiat**: `usd`, `eur`, `gbp`, `jpy`, `aud`, `cad`, `chf`, `cny`, `inr`, `krw`, `brl`, `mxn`, `sgd`, `hkd`, `nok`, `sek`, `dkk`, `nzd`, `zar`, `thb`, `try`, `aed`, `sar`, `myr`, `php`, `idr`, `pln`, `czk`, `huf`, `ron` - -**Crypto**: `btc`, `eth`, `ltc`, `bch`, `bnb`, `eos`, `xrp`, `xlm`, `link`, `dot`, `yfi`, `sol` - -**Commodities**: `xag` (silver), `xau` (gold) - -Get the full list: -```python -currencies = json.loads(http_get("https://api.coingecko.com/api/v3/simple/supported_vs_currencies")) -# Returns list of 63 strings -``` - -## Endpoints that require Pro API (return HTTP 401) - -- `/coins/{id}/history?date=DD-MM-YYYY` — historical price on a specific date -- `/coins/markets` with `category=` filter (the parameter is silently ignored, not 401) -- `/coins/{id}/contract/{address}` — full contract token details - -Free tier alternatives: -- For historical price on date: use `/market_chart/range` with a narrow time window -- For category filtering: fetch `/coins/markets` unfiltered and filter client-side using `id` from `/coins/categories` - -## Ping / health check - -```python -import json -ping = json.loads(http_get("https://api.coingecko.com/api/v3/ping")) -print(ping) # {'gecko_says': '(V3) To the Moon!'} -``` - -Note: ping still counts against the rate limit. Don't use it to check if a 429 has resolved — just wait 65 seconds and retry your actual call. - -## Gotchas - -- **Rate limit is much stricter than advertised** — The official docs say "30 calls/min" but in practice you get 429 on call 3-4 with no delay between calls. Observed `Retry-After: 60` in the response header. Treat it as "3 calls/minute, wait 65s on 429." Using `time.sleep(5)` between calls in a loop is safe. - -- **Unknown coin IDs return `{}`, not an error** — `?ids=BTC` (uppercase) and `?ids=not_a_real_coin` both return an empty dict `{}`. 
Always check that the key you expect exists before accessing it. - -- **Symbol lookup requires `/coins/list` + client-side filter** — There's no "get by symbol" endpoint. Multiple coins share any given symbol. After fetching the list (17,564 entries), filter by `symbol` and pick by `name`. - -- **Coin ID casing matters** — IDs are always lowercase kebab-case: `bitcoin`, `ethereum`, `shiba-inu`. Uppercase or camelCase will silently return `{}`. - -- **OHLCV granularity is automatic** — The `days` parameter determines candle size automatically: `1` → 30-min candles, `7`/`14` → 4-hr candles, `30`+ → daily candles. You cannot override this on the free tier. - -- **`interval=daily` in market_chart affects point count** — Without `interval=daily`, a 7-day window returns hourly data (~168 points). With it, you get ~8 points. Choose based on whether you need resolution or summary. - -- **market_chart timestamps are in milliseconds** — Divide by 1000 for standard Unix time: `datetime.fromtimestamp(ts / 1000)`. - -- **`/coins/list` is expensive (rate-limit-wise)** — It returns 17,564 entries and costs one API call. Fetch once, store in a variable, filter locally. Don't call it in a loop. - -- **Pagination is 1-indexed** — `page=1` returns items 1–N, `page=2` returns N+1–2N. `page=0` returns the same as `page=1` (it doesn't error). - -- **`per_page` max is 250** — Requesting more than 250 per page silently returns 250. To get the full top-500, make two calls: `page=1&per_page=250` then `page=2&per_page=250`. - -- **Contract address keys are lowercased** — When using `/simple/token_price`, the response key is the lowercased contract address regardless of what case you sent. Always call `.lower()` before using addresses as dict keys. - -- **`tickers=false` is important for `/coins/{id}`** — Without it, the response includes a massive list of exchange tickers that can make the payload very large and slow to parse. Always set `tickers=false` unless you specifically need exchange data. - -- **ETH priced against BTC is supported** — `vs_currencies=btc` works: `ethereum` returns `{'btc': 0.03095861}`. Crypto-to-crypto pairs work the same as fiat pairs. diff --git a/packages/bcode-browser/harness/agent-workspace/domain-skills/coinmarketcap/scraping.md b/packages/bcode-browser/harness/agent-workspace/domain-skills/coinmarketcap/scraping.md deleted file mode 100644 index e6a381523..000000000 --- a/packages/bcode-browser/harness/agent-workspace/domain-skills/coinmarketcap/scraping.md +++ /dev/null @@ -1,463 +0,0 @@ -# CoinMarketCap — Data Extraction - -`https://coinmarketcap.com` — crypto market data. Three access paths tested: internal JSON API (fastest, no auth required), `__NEXT_DATA__` from HTML pages, and browser DOM. All real-money price data confirmed accurate against displayed UI values. 
- -## Do this first: pick your access path - -| Goal | Best approach | Latency | -|------|--------------|---------| -| Top N coins by market cap | Internal listing API | ~200ms | -| Single coin price/stats/ATH | Internal detail API | ~100ms | -| Global market metrics | Internal global-metrics API | ~65ms | -| All coins on homepage (101 items) | `__NEXT_DATA__` main page | ~700ms | -| Coin detail + full stats | `__NEXT_DATA__` currency page | ~700ms | -| Historical OHLCV | Internal historical API | ~160ms | -| Exchange pairs for a coin | Internal market-pairs API | ~200ms | -| News/articles | Internal content API | ~220ms | - -**Never use the browser for read-only CMC tasks.** The internal API at `api.coinmarketcap.com` is accessible with no API key, no special headers, no auth — plain `http_get` works. - -**Do NOT use `pro-api.coinmarketcap.com`** — that is the paid API requiring a key. - ---- - -## Path 1: Internal listing API (fastest for ranked coins) - -Returns CMC-ranked coins with full price data in one call. No auth needed. - -```python -import json - -resp = json.loads(http_get( - "https://api.coinmarketcap.com/data-api/v3/cryptocurrency/listing" - "?start=1&limit=100&sortBy=market_cap&sortType=desc&convert=USD" -)) - -coins = resp['data']['cryptoCurrencyList'] # list of coin objects -total_available = resp['data']['totalCount'] # 8374 as of 2026-04-18 - -for c in coins: - usd = next(q for q in c['quotes'] if q['name'] == 'USD') - print( - f"#{c['cmcRank']} {c['symbol']}: " - f"${usd['price']:,.2f} | " - f"MCap ${usd['marketCap']/1e9:.1f}B | " - f"Vol24h ${usd['volume24h']/1e9:.1f}B | " - f"24h {usd['percentChange24h']:+.2f}% | " - f"CS {c['circulatingSupply']:,.0f}" - ) -``` - -### Coin object fields - -Top-level (`c` in the loop above): -``` -id, name, symbol, slug, cmcRank, marketPairCount, -circulatingSupply, selfReportedCirculatingSupply, -totalSupply, maxSupply, isActive, lastUpdated, dateAdded, -quotes, isAudited, auditInfoList, badges -``` - -Per-quote fields (inside `c['quotes']`, filtered by `name == 'USD'`): -``` -name, price, volume24h, volumePercentChange, marketCap, -percentChange1h, percentChange24h, percentChange7d, -percentChange30d, percentChange60d, percentChange90d, -percentChange1y, ytdPriceChangePercentage, -fullyDilluttedMarketCap, marketCapByTotalSupply, -dominance, turnover, lastUpdated -``` - -### Query parameters - -```python -# Pagination -"?start=1&limit=100" # page 1 of 100 -"?start=101&limit=100" # page 2 - -# Sort -"sortBy=market_cap" # default -"sortBy=volume_24h" -"sortBy=percent_change_24h" -"sortBy=price" -"sortBy=circulating_supply" -"sortType=desc" # or asc - -# Currency conversion (affects quote prices returned) -"convert=USD" # USD prices -"convert=BTC" # BTC-denominated - -# Filter by type -"cryptoType=all" # default — coins + tokens -"cryptoType=coins" # layer-1s only (633 results) -"cryptoType=tokens" # ERC-20 etc. - -# Filter by tag (DeFi, NFT, etc.) -"tagSlugs=defi" # 2698 results -"tagSlugs=nft" -``` - ---- - -## Path 2: Internal detail API (single coin, full stats) - -Best for fetching one coin's complete data including ATH, ATL, 52-week high/low, volume ranks. 
- -```python -import json - -# Look up by CMC coin ID (BTC=1, ETH=1027, XRP=52, SOL=5426, BNB=1839) -resp = json.loads(http_get( - "https://api.coinmarketcap.com/data-api/v3/cryptocurrency/detail?id=1" -)) -data = resp['data'] -s = data['statistics'] - -print(f"Price: ${s['price']:,.2f}") -print(f"Rank: #{s['rank']}") -print(f"Market Cap: ${s['marketCap']:,.0f}") -print(f"Volume 24h: ${s['volume24h']:,.0f}") -print(f"Circulating Supply:{s['circulatingSupply']:,.0f}") -print(f"Total Supply: {s['totalSupply']:,.0f}") -print(f"Max Supply: {s['maxSupply']:,.0f}") -print(f"24h Change: {s['priceChangePercentage24h']:+.2f}%") -print(f"7d Change: {s['priceChangePercentage7d']:+.2f}%") -print(f"ATH: ${s['highAllTime']:,.2f} on {s['highAllTimeTimestamp']}") -print(f"ATL: ${s['lowAllTime']:,.4f} on {s['lowAllTimeTimestamp']}") -print(f"52w High: ${s['high52w']:,.2f}") -print(f"52w Low: ${s['low52w']:,.2f}") -print(f"MCap Dominance: {s['marketCapDominance']:.2f}%") -``` - -### All statistics fields - -``` -price, priceChangePercentage1h, priceChangePercentage24h, -priceChangePercentage7d, priceChangePercentage30d, -priceChangePercentage60d, priceChangePercentage90d, -priceChangePercentage1y, priceChangePercentageAll, -marketCap, marketCapChangePercentage24h, -fullyDilutedMarketCap, mintedMarketCap, -circulatingSupply, totalSupply, maxSupply, -marketCapDominance, rank, roi, -low24h, high24h, low7d, high7d, low30d, high30d, -low52w, high52w, low90d, high90d, -lowAllTime, highAllTime, -lowAllTimeChangePercentage, highAllTimeChangePercentage, -lowAllTimeTimestamp, highAllTimeTimestamp, -lowYesterday, highYesterday, openYesterday, closeYesterday, -priceChangePercentageYesterday, volumeYesterday, -ytdPriceChangePercentage, volumeRank, volumeMcRank, -volume24h, volume24hReported, volume7d, volume7d Reported, -volume30d, volume30dReported, turnover -``` - -### Top-level data fields (beyond statistics) - -``` -id, name, symbol, slug, category, description, dateAdded, -volume, volumeChangePercentage24h, cexVolume, dexVolume, -urls (website, explorer, twitter, reddit, etc.), -tags, platforms, relatedCoins, wallets, -holders, watchCount, watchListRanking -``` - ---- - -## Path 3: Global market metrics - -```python -import json - -resp = json.loads(http_get( - "https://api.coinmarketcap.com/data-api/v3/global-metrics/quotes/latest" -)) -data = resp['data'] -q = data['quotes'][0] # USD quote (cryptoId=2781) - -print(f"Total Market Cap: ${q['totalMarketCap']/1e12:.2f}T") -print(f"Total Volume 24h: ${q['totalVolume24H']/1e9:.1f}B") -print(f"Altcoin MCap: ${q['altcoinMarketCap']/1e12:.2f}T") -print(f"DeFi MCap: ${q['defiMarketCap']/1e9:.1f}B") -print(f"DeFi Vol 24h: ${q['defiVolume24H']/1e9:.1f}B") -print(f"Stablecoin MCap: ${q['stablecoinMarketCap']/1e9:.1f}B") -print(f"Derivatives Vol: ${q['derivativesVolume24H']/1e9:.1f}B") -print(f"BTC Dominance: {data['btcDominance']:.2f}%") -print(f"ETH Dominance: {data['ethDominance']:.2f}%") -print(f"Active Cryptos: {data['activeCryptoCurrencies']}") -print(f"Total Cryptos: {data['totalCryptoCurrencies']}") -print(f"Active Exchanges: {data['activeExchanges']}") -print(f"Active Pairs: {data['activeMarketPairs']}") - -# Yesterday comparison -print(f"\nMCap Yesterday: ${q['totalMarketCapYesterday']/1e12:.2f}T") -print(f"MCap Change: {q['totalMarketCapYesterdayPercentageChange']:+.2f}%") -``` - ---- - -## Path 4: Historical OHLCV (candlestick data) - -```python -import json, time - -now = int(time.time()) - -# Daily candles for BTC over last 7 days -resp = json.loads(http_get( - 
"https://api.coinmarketcap.com/data-api/v3/cryptocurrency/historical" - f"?id=1&convertId=2781&timeStart={now - 7*86400}&timeEnd={now}&interval=daily" -)) -candles = resp['data']['quotes'] # list of OHLCV dicts - -for candle in candles: - q = candle['quote'] - print( - f"{candle['timeOpen'][:10]} " - f"O={q['open']:,.0f} H={q['high']:,.0f} " - f"L={q['low']:,.0f} C={q['close']:,.0f} " - f"V=${q['volume']/1e9:.1f}B MCap=${q['marketCap']/1e12:.2f}T" - ) -``` - -Candle quote fields: `open, high, low, close, volume, marketCap, circulatingSupply, timestamp` - -Supported intervals: `daily`, `1h` (hourly). `5m` returns HTTP 500 — not supported. - -`convertId=2781` = USD. `timeStart`/`timeEnd` are Unix timestamps. - ---- - -## Path 5: Exchange market pairs for a coin - -```python -import json - -resp = json.loads(http_get( - "https://api.coinmarketcap.com/data-api/v3/cryptocurrency/market-pairs/latest" - "?id=1&start=1&limit=10&sort=volume" -)) -data = resp['data'] -print(f"Total pairs for {data['name']}: {data['numMarketPairs']}") - -for pair in data['marketPairs']: - print( - f" {pair['exchangeName']:20} {pair['marketPair']:12} " - f"${pair['price']:,.2f} Vol=${pair['volumeUsd']/1e6:.1f}M" - ) -``` - -Pair fields: `rank, exchangeId, exchangeName, exchangeSlug, marketId, marketPair, category (spot/futures), baseSymbol, quoteSymbol, baseCurrencyId, quoteCurrencyId, price, volumeUsd, effectiveLiquidity, lastUpdated, volumeBase, volumeQuote, depthUsdNegativeTwo, depthUsdPositiveTwo, feeType, isVerified, type (cex/dex)` - ---- - -## Path 6: Exchange listings - -```python -import json - -resp = json.loads(http_get( - "https://api.coinmarketcap.com/data-api/v3/exchange/listing" - "?start=1&limit=20&sortBy=score&sortType=desc" -)) -exchanges = resp['data']['exchanges'] -for ex in exchanges: - print(f" {ex['name']:30} score={ex.get('score')} trafficScore={ex.get('trafficScore')}") -``` - -Exchange fields: `id, name, slug, dexStatus, platformId, status, score, trafficScore, countries, fiats, filteredTotalVol24h` - ---- - -## Path 7: Price conversion (cross-rate) - -```python -import json - -# Convert 1 BTC → USD -resp = json.loads(http_get( - "https://api.coinmarketcap.com/data-api/v3/tools/price-conversion" - "?amount=1&id=1&convert_id=2781" -)) -result = resp['data'] -usd_price = result['quote'][0]['price'] -print(f"1 {result['symbol']} = ${usd_price:,.2f} USD") - -# Convert ETH → BTC -resp2 = json.loads(http_get( - "https://api.coinmarketcap.com/data-api/v3/tools/price-conversion" - "?amount=1&id=1027&convert_id=1" -)) -btc_price = resp2['data']['quote'][0]['price'] -print(f"1 ETH = {btc_price:.6f} BTC") -``` - -`id` = source coin CMC ID, `convert_id` = target currency CMC ID (2781=USD, 1=BTC, 1027=ETH, 825=USDT) - ---- - -## Path 8: News / articles - -```python -import json - -# News for a specific coin -resp = json.loads(http_get( - "https://api.coinmarketcap.com/content/v3/news?coins=1&limit=10" -)) -for article in resp['data']: - meta = article['meta'] - print(f" [{meta['sourceName']}] {meta['title']}") - print(f" {article['createdAt'][:10]} — {meta['sourceUrl']}") -``` - -Article fields: `slug, cover, assets, createdAt` + nested `meta` with `title, subtitle, sourceName, sourceUrl, language, type, status, id, createdAt, updatedAt, releasedAt` - -Omit `coins=` param for general crypto news. Supports `limit` up to observed 50+ without errors. - ---- - -## Path 9: __NEXT_DATA__ from HTML pages - -Use when you need data that isn't in the API (e.g. Fear & Greed index, CMC100 index, trending categories). 
### Main page (`coinmarketcap.com/`)

```python
import json, re

html = http_get("https://coinmarketcap.com/")
m = re.search(r'<script id="__NEXT_DATA__"[^>]+>(.*?)</script>', html)
nd = json.loads(m.group(1))
props = nd['props']

# Global market metrics (same data as global-metrics API, faster from HTML)
gm = props['pageProps']['globalMetrics']
print(f"Total cryptos: {gm['numCryptocurrencies']}")
print(f"BTC dominance: {gm['btcDominance']:.2f}%")
print(f"Total MCap: ${gm['marketCap']/1e12:.2f}T")
print(f"Total Vol 24h: ${gm['totalVol']/1e9:.1f}B")

# Spot prices for BTC/ETH/USD/SATS/BITS (the "ticker bar" data)
# props['quotesLatestData'] — 5 items with short field names
for q in props['quotesLatestData']:
    print(f"  {q['symbol']}: p={q['p']} p24h={q['p24h']:+.3f}%")
    # fields: id, symbol, p (price), p1h, p24h, p7d, p30d, p60d, p90d, pytd, t

# Top 101 coins with full USD quotes — from dehydratedState
queries = props['dehydratedState']['queries']
homepage_q = next(q for q in queries if q['queryKey'] == ['homepage-data', 1, 100])
listing = homepage_q['state']['data']['data']['listing']
coins = listing['cryptoCurrencyList']   # 101 coins
total = listing['totalCount']

for c in coins:
    if c['symbol'] == 'BTC':
        usd = next(q for q in c['quotes'] if q['name'] == 'USD')
        print(f"BTC: #{c['cmcRank']} ${usd['price']:,.2f}")
        break

# Page-level shared data (Fear & Greed index, CMC20, altcoin index)
psd = props['pageProps']['pageSharedData']
print("pageSharedData keys:", list(psd.keys()))
# keys: topCategories, fearGreedIndexData, cmc100, cmc20, faqData, altcoinIndex, halvingInfo, deviceInfo
```

**Gotcha — regex pattern**: Use `[^>]+` to match the `crossorigin="anonymous"` attribute on the script tag. `type="application/json"` alone will miss it:
```python
# CORRECT
m = re.search(r'<script id="__NEXT_DATA__"[^>]+>(.*?)</script>', html)

# WRONG — returns None because of crossorigin attr
m = re.search(r'<script id="__NEXT_DATA__" type="application/json">(.*?)</script>', html, re.DOTALL)
```

**`quotesLatestData` has only 5 entries** (SATS, BITS, BTC, ETH, USD) — it's the currency selector bar, not the full market ranking. For the full ranked listing use `dehydratedState`.

**`cmcRank` is at coin top level**, not inside the USD quote object. The `cmcRank` field inside the quote dict is `None`.

### Individual coin page (`/currencies/{slug}/`)

```python
import json, re

html = http_get("https://coinmarketcap.com/currencies/bitcoin/")
m = re.search(r'<script id="__NEXT_DATA__"[^>]+>(.*?)</script>', html)
nd = json.loads(m.group(1))

# All stats under props.pageProps.detailRes.detail.statistics
stats = nd['props']['pageProps']['detailRes']['detail']['statistics']

print(f"Price: ${stats['price']:,.2f}")
print(f"Rank: #{stats['rank']}")
print(f"MCap: ${stats['marketCap']:,.0f}")
print(f"Vol 24h: ${stats['volume24h']:,.0f}")
print(f"Circ Sup: {stats['circulatingSupply']:,.0f}")
print(f"24h: {stats['priceChangePercentage24h']:+.2f}%")
print(f"ATH: ${stats['highAllTime']:,.2f} ({stats['highAllTimeTimestamp']})")
print(f"ATL: ${stats['lowAllTime']:.4f}")
```

`detailRes.detail` also contains: `name, symbol, slug, description, tags, urls (website/explorer/twitter/reddit), platforms, relatedCoins, holders, watchCount`

**Note**: The currency page has no JSON-LD blocks — zero `<script type="application/ld+json">` blocks.

jsonld_blocks = re.findall(r'<script type="application/ld\+json"[^>]*>(.*?)</script>', html, re.DOTALL)
# Block 0: FAQPage schema (common Q&A about how courses work)
# Block 1: BreadcrumbList (category path, e.g.
Browse > Data Science > Machine Learning) -faq = json.loads(jsonld_blocks[0]) # {"@type": "FAQPage", "mainEntity": [...]} -crumb = json.loads(jsonld_blocks[1]) # {"@type": "BreadcrumbList", "itemListElement": [...]} - -# Extract breadcrumb categories -categories = [item["item"]["name"] for item in crumb["@graph"][0]["itemListElement"]] -# e.g. ["Browse", "Data Science", "Machine Learning"] -``` - -The HTML does NOT embed: description, rating, instructor names, enrollment count, -price, or any course-specific metadata as machine-readable fields. -Use the API (`courses.v1?ids=...`) to get those from the slug. - -### Slug-to-ID lookup pattern - -```python -# Get course data from slug (need ID first — get it from catalog or search) -# Pattern: enumerate catalog, match by slug -resp = http_get("https://api.coursera.org/api/courses.v1?fields=name,slug,description&limit=100&start=0") -data = json.loads(resp) -by_slug = {el["slug"]: el for el in data["elements"]} -course = by_slug.get("machine-learning") -``` - ---- - -## Endpoints Summary - -| Endpoint | Method | Result | -|---|---|---| -| `courses.v1` (list) | GET | 200 OK — full catalog, 20,659 courses | -| `courses.v1?ids=...` | GET | 200 OK — batch lookup by ID | -| `courses.v1?q=search&query=...` | GET | **405 Method Not Allowed** | -| `partners.v1` (list) | GET | 200 OK — 422 partners | -| `partners.v1?ids=...` | GET | 200 OK — with courseIds | -| `partners.v1?q=search&query=...` | GET | **405 Method Not Allowed** | -| `onDemandSpecializations.v1` (list) | GET | 200 OK — paginated (no total) | -| `onDemandSpecializations.v1?q=search&query=...` | GET | **405 Method Not Allowed** | -| `instructors.v1?ids=...` | GET | 200 OK — rich records by ID | -| `instructors.v1` (list) | GET | 200 OK — mostly empty records | -| `degrees.v1` | GET | 403 Forbidden | -| `/search?query=...` page HTML | GET | 200 OK — React shell only, no data | -| `/learn/{slug}` page HTML | GET | 200 OK — HTML with JSON-LD breadcrumb only | - ---- - -## Rate Limits - -No rate limiting observed in testing: -- 5 consecutive requests with no delay: all succeeded, avg 0.55s each. -- No `X-RateLimit-*` or `Retry-After` headers in responses. -- No auth headers needed for any working endpoint. - -Response headers that are present: `X-Coursera-Request-Id`, `X-Coursera-Trace-Id-Hex`, -`x-envoy-upstream-service-time`. No rate-limit indicators. - -Use a small delay (0.5s) between requests if doing bulk enumeration of the full 20K+ -catalog as a courtesy, but no hard cap was observed. - ---- - -## Gotchas - -- **`q=search` is POST-only**: All three resource types (courses, specializations, - partners) return 405 on GET when `q=search` is added. There is no documented public - POST endpoint. For keyword filtering, enumerate the catalog and filter client-side. - -- **`paging.total` absent after page 1**: Only the first page response includes - `paging.total`. Subsequent pages have only `paging.next`. Check for the `"next"` key - being absent to detect end-of-list. - -- **Specializations never include `paging.total`**: The `onDemandSpecializations.v1` - endpoint never returns `paging.total` in any page. Iterate until `"next"` is absent. - -- **`workload` is free-text, unnormalized**: Values include `"4-8 hours/week"`, - `"1 hour 30 minutes"`, `"4 weeks of study, 1-2 hours/week"`. Do not parse as a number - without normalization logic. - -- **`instructors.v1` list returns empty records**: The plain list endpoint returns many - instructors with empty `fullName`, `bio`, `title`. 
Always look up by `ids=` using - IDs from course records. - -- **`degrees.v1` is 403**: Degree programs are not accessible via the public API. - -- **HTML pages contain no embedded course data**: Both the search page and the course - detail page are React-rendered. `http_get` on `/search?query=...` returns an HTML - shell with no course listings. `http_get` on `/learn/{slug}` returns HTML with only - a FAQ JSON-LD and a breadcrumb JSON-LD — no course description, rating, price, or - enrollment data as machine-readable fields. - -- **`linked` resources don't populate**: Passing `includes=partners.v1` to the courses - endpoint returns an empty `linked: {}` object. Cross-resource joins require separate - requests by IDs. - -- **`previewLink` and `avgRating` fields**: These field names are accepted without error - but return no data in the response objects. Do not request them. diff --git a/packages/bcode-browser/harness/agent-workspace/domain-skills/craigslist/scraping.md b/packages/bcode-browser/harness/agent-workspace/domain-skills/craigslist/scraping.md deleted file mode 100644 index 4e93c5289..000000000 --- a/packages/bcode-browser/harness/agent-workspace/domain-skills/craigslist/scraping.md +++ /dev/null @@ -1,390 +0,0 @@ -# Craigslist — Scraping via http_get - -Field-tested against sfbay.craigslist.org and multiple city subdomains on 2026-04-18. -`http_get` works without any bot detection — no CAPTCHA, no block, no rate limit observed. -Craigslist serves a full server-rendered HTML fallback (the `
    ` block) -intended for no-JS browsers. This fallback contains **all matching results in one response** (300–360 -items typical), regardless of the `s=` offset parameter. No browser needed. - -## Key discovery: static HTML returns everything at once - -When you `http_get` a Craigslist search URL, the server includes a `
      ` -block that contains every matching listing (up to ~360) in a single HTML response. The `s=` pagination -parameter is ignored by the static renderer — it is only meaningful for the JS-driven XHR path used by -real browsers. For scraping purposes, this means: - -- One `http_get` call per search query returns the full result set (no pagination loop needed). -- For broader searches, narrow via `query=`, `min_price=`, `max_price=`, and category code in the URL. -- If you need more than ~360 results, you must use a headless browser with JS. For most tasks, - one request is sufficient. - -## URL patterns - -### City subdomains -``` -https://{city}.craigslist.org/search/{category_code}?query=... -``` - -Confirmed working cities (exact subdomain names): - -| City | Subdomain | -|----------------|------------------| -| SF Bay Area | `sfbay` | -| New York | `newyork` | -| Chicago | `chicago` | -| Los Angeles | `losangeles` | -| Seattle | `seattle` | -| Boston | `boston` | -| Miami | `miami` | -| Denver | `denver` | -| Austin | `austin` | -| Portland | `portland` | -| San Diego | `sandiego` | -| Phoenix | `phoenix` | - -### Category codes (confirmed working) - -| Code | Category | -|-------|---------------------------| -| `sss` | For Sale — all | -| `for` | For Sale — general | -| `ela` | Electronics (listings) | -| `ele` | Electronics (search) | -| `fua` | Furniture | -| `clo` | Clothing & accessories | -| `spo` | Sporting goods | -| `toy` | Toys & games | -| `cto` | Cars+trucks — by owner | -| `cta` | Cars+trucks — by dealer | -| `hhh` | Housing — all | -| `apa` | Apartments | -| `roo` | Rooms & shares | -| `sub` | Sublets & temporary | -| `jjj` | Jobs — all | -| `sof` | Software/QA/DBA jobs | -| `bbb` | Services — all | -| `ggg` | Gigs — all | -| `com` | Community | -| `eve` | Events | -| `vol` | Volunteers | - -### Query parameters - -| Parameter | Effect | -|---------------|------------------------------------------------| -| `query=` | Keyword search | -| `sort=rel` | Sort by relevance (default) | -| `sort=date` | Sort by newest first | -| `sort=priceasc` | Price low to high | -| `sort=pricedsc` | Price high to low | -| `min_price=` | Minimum price filter | -| `max_price=` | Maximum price filter | -| `condition=10` | New (for-sale listings) | -| `condition=20` | Like new | -| `condition=30` | Excellent | -| `condition=40` | Good | -| `condition=50` | Fair | -| `condition=60` | Salvage | -| `bedrooms=` | Number of bedrooms (housing only) | -| `auto_make_model=` | Car make/model filter (cars category) | -| `s=` | Pagination offset — **ignored in static HTML** | - -### Example URLs -```python -# For-sale keyword search -"https://sfbay.craigslist.org/search/sss?query=macbook&sort=rel" - -# Price-filtered electronics -"https://sfbay.craigslist.org/search/ela?query=iphone&min_price=100&max_price=500" - -# Apartments, 2 bedrooms, price range -"https://sfbay.craigslist.org/search/apa?bedrooms=2&min_price=1000&max_price=2500" - -# Cars by owner, Toyota -"https://sfbay.craigslist.org/search/cto?auto_make_model=toyota" - -# Jobs in another city -"https://chicago.craigslist.org/search/jjj?query=python+developer" -``` - -## Listing card HTML structure - -Each listing is an `
<li class="cl-static-search-result">` inside the results `<ol>`.

```html
<li class="cl-static-search-result" title="MacBook Air M2 256GB 8GB RAM">
  <a href="https://sfbay.craigslist.org/sby/sys/d/macbook-air-m2-256gb/POST_ID.html">
    <div class="title">MacBook Air M2 256GB 8GB RAM</div>
    <div class="details">
      <div class="price">$900</div>
      <div class="location">San Jose</div>
    </div>
  </a>
</li>
```

Fields available in the listing card:
- **Title**: `title` attribute on the `<li>` OR text inside `<div class="title">`
- **URL**: `href` on the inner `<a>`
- **Price**: text of `<div class="price">` (may be empty)
- **Location**: text of `<div class="location">`
- **Post ID**: the numeric segment before `.html` in the listing URL

## Search extraction function

```python
import re
from urllib.parse import quote_plus
from helpers import http_get

def search_craigslist(city, category, query, min_price=None, max_price=None, sort="rel"):
    url = f"https://{city}.craigslist.org/search/{category}?query={quote_plus(query)}&sort={sort}"
    if min_price is not None:
        url += f"&min_price={min_price}"
    if max_price is not None:
        url += f"&max_price={max_price}"
    html = http_get(url, headers={"User-Agent": "Mozilla/5.0"})

    listings = re.findall(
        r'<li class="cl-static-search-result" title="([^"]*)"[^>]*>\s*'
        r'<a href="([^"]+)"[^>]*>.*?'
        r'<div class="price">([^<]*)</div>.*?'
        r'<div class="location">\s*([^<]*?)\s*</div>',
        html, re.DOTALL
    )

    results = []
    for title, url, price, location in listings:
        pid_match = re.search(r'/(\d+)\.html$', url)
        results.append({
            "post_id": pid_match.group(1) if pid_match else None,
            "title": title,
            "url": url,
            "price": price.strip() or None,  # None if listing has no price
            "location": location.strip(),
        })
    return results

# Usage
results = search_craigslist("sfbay", "sss", "macbook pro", max_price=1000)
for r in results[:5]:
    print(r["post_id"], r["price"], r["location"], r["title"][:50])
```

### Handling missing price

Listings without a price have no `<div class="price">` element. The regex above returns an empty string
for `price`; the example converts that to `None`. A more robust extraction:

```python
def parse_listings(html):
    results = []
    for block in re.findall(r'<li class="cl-static-search-result".*?</li>', html, re.DOTALL):
        title = re.search(r'title="([^"]+)"', block)
        url = re.search(r'href="([^"]+)"', block)
        price = re.search(r'<div class="price">([^<]+)</div>', block)
        loc = re.search(r'<div class="location">\s*([^<]*?)\s*</div>
        ', block) - if not url: continue - url_str = url.group(1) - pid = re.search(r'/(\d+)\.html$', url_str) - results.append({ - "post_id": pid.group(1) if pid else None, - "title": title.group(1) if title else None, - "url": url_str, - "price": price.group(1).strip() if price else None, - "location": loc.group(1).strip() if loc else None, - }) - return results -``` - -## Individual listing page extraction - -Listing pages are also fully server-rendered. All fields are present in the raw HTML. - -```python -def get_listing(url): - headers = {"User-Agent": "Mozilla/5.0"} - html = http_get(url, headers=headers) - - title = re.search(r'([^<]+)', html) - price = re.search(r'(\$[\d,]+)', html) - # Location is in parentheses right after the price span - location = re.search( - r'[^<]+\s*\(([^)]+)\)\s*', html - ) - posted = re.search(r'class="date timeago"[^>]+datetime="([^"]+)"', html) - post_id = re.search(r'post id:\s*(\d+)', html) - - # Description body - body_block = re.search(r'section id="postingbody"[^>]*>(.*?)', html, re.DOTALL) - body_text = "" - if body_block: - raw = re.sub(r'<[^>]+>', '', body_block.group(1)).strip() - # Remove the "QR Code Link to This Post" print-only block - body_text = re.sub(r'QR Code Link to This Post\s*', '', raw).strip() - body_text = re.sub(r'\s+', ' ', body_text) - - # Images - images = re.findall(r'https://images\.craigslist\.org/[^\s"\']+_600x450\.jpg', html) - - # Attributes (condition, make, model, etc.) - attrs = {} - for labl, valu in re.findall( - r'([^<]+).*?\s*(?:<[^>]+>\s*)*([^<\n]+?)(?:\s*` blocks for bedrooms/bathrooms and square footage, -separate from the `
        ` attribute grid: - -```python -# BR/BA -br_ba = re.search(r'(\d+)BR\s*/\s*(\d+(?:\.\d+)?)Ba', html) -# Square footage -sqft = re.search(r'(\d+)ft2', html) - -if br_ba: bedrooms, bathrooms = br_ba.groups() -if sqft: sqft_val = sqft.group(1) -``` - -## JSON-LD structured data (alternative extraction path) - -Each search page includes an `ItemList` JSON-LD block with up to 330 items. Useful when you want -structured data (price as float, geo coordinates) without regex parsing of HTML: - -```python -import json, re -from helpers import http_get - -html = http_get("https://sfbay.craigslist.org/search/sss?query=laptop", headers={"User-Agent": "Mozilla/5.0"}) -ld_blocks = re.findall(r'', html, re.DOTALL) - -for raw in ld_blocks: - data = json.loads(raw) - if data.get('@type') == 'ItemList': - for item in data['itemListElement']: - listing = item['item'] - print( - listing.get('name'), - listing.get('offers', {}).get('price'), - listing.get('offers', {}).get('priceCurrency'), - listing.get('offers', {}).get('availableAtOrFrom', {}).get('address', {}).get('addressLocality'), - ) -``` - -JSON-LD item fields available: `name`, `description`, `image` (list of URLs), -`offers.price` (float string e.g. `"900.00"`), `offers.priceCurrency`, `offers.availableAtOrFrom.address`, -`offers.availableAtOrFrom.geo.latitude`, `offers.availableAtOrFrom.geo.longitude`. - -Note: JSON-LD items do not include the listing URL or post ID — use the HTML parser for those. -Combine both: use JSON-LD for price/geo, HTML for URL/post ID. - -## Pagination behavior - -The `s=` offset parameter in the URL is only respected by the JS-driven XHR layer in a real browser. -When accessed via `http_get`, the static HTML fallback renders all results regardless of `s=`: - -``` -s=0 → same 342 listings -s=120 → same 342 listings (confirmed identical URL sets) -s=300 → same 342 listings -``` - -**Recommendation**: Do not attempt pagination via `http_get`. Use search filters to narrow results: - -```python -# Instead of paginating, narrow by price range -under_500 = search_craigslist("sfbay", "sss", "macbook", max_price=500) -over_500 = search_craigslist("sfbay", "sss", "macbook", min_price=501) -``` - -If true pagination is required (e.g. you need more than 350 results), you must use a browser session -with `goto_url()` + `wait_for_load()`. - -## Bot detection - -None observed. Craigslist does not block `http_get` requests. During testing: -- All 6+ test cities returned full HTML (HTML size 174K–530K bytes per page) -- No CAPTCHA page, no redirect to `robot-check`, no `403` -- No cookie or session required -- Works with minimal `User-Agent` header: `"Mozilla/5.0"` is sufficient - -Defensive check (in case behavior changes): - -```python -def is_blocked(html): - return ( - len(html) < 5000 or - "blocked" in html[:2000].lower() or - "captcha" in html[:2000].lower() or - "cl-static-search-result" not in html - ) -``` - -## Gotchas - -- **`data-pid` does not exist in static HTML**: Old Craigslist used `data-pid` attributes. The current - static renderer uses `
<li class="cl-static-search-result">` items with a `title` attribute and an embedded `<a href>`.
  Do not search for `data-pid`, `result-row`, or `cl-search-result` — they are absent.

- **Post ID comes from the URL, not an attribute**: Extract it as the numeric segment before `.html`
  in the listing URL: `re.search(r'/(\d+)\.html$', url).group(1)`.

- **Price may be absent**: Free listings and "contact for price" listings have no `<div class="price">
        `. - The regex returns an empty string; convert to `None`. - -- **`s=` pagination is a no-op in static HTML**: The fallback renderer always returns the full result set. - Don't loop over pages — filter instead. - -- **HTML entities in titles**: Titles may contain `&`, `"`, etc. Use - `html.unescape(title)` from the standard library if you need clean text. - -- **URL structure varies by area**: The area code in the URL (`/sby/`, `/sfc/`, `/eby/`) is the sub-area - of the city (e.g. South Bay, San Francisco, East Bay). It is part of the listing URL but not needed - for constructing search URLs (which use the city subdomain only). - -- **`
<li>` without a `title` attribute in the results `<ol>`** is
  a "see also" block. The regex patterns above skip it automatically because it has no `title` attribute.

- **JSON-LD count < HTML count**: JSON-LD block may contain ~330 items while the HTML block shows ~350.
  The HTML parser is authoritative; JSON-LD is a secondary data source.

- **Body text contains print-only junk**: The `<section id="postingbody">
          ` starts with a - "QR Code Link to This Post" print-only element. Strip it with a simple string replacement - (shown in the extractor above). - -- **HTML-escaped body text**: Description bodies may contain `&`, `<`, etc. Unescape if needed: - ```python - import html as html_lib - body_clean = html_lib.unescape(body_text) - ``` diff --git a/packages/bcode-browser/harness/agent-workspace/domain-skills/crossref/scraping.md b/packages/bcode-browser/harness/agent-workspace/domain-skills/crossref/scraping.md deleted file mode 100644 index 9d03cfc50..000000000 --- a/packages/bcode-browser/harness/agent-workspace/domain-skills/crossref/scraping.md +++ /dev/null @@ -1,568 +0,0 @@ -# CrossRef — Scraping & Data Extraction - -`https://api.crossref.org` — scholarly DOI and citation metadata. **Never use the browser for CrossRef.** Completely free, no auth required. All workflows use `http_get`. - -## Do this first - -**Always add `mailto=your@email.com` to every request** — it moves you into the polite pool, which doubles the rate limit and concurrency allowance. The difference is measurable and the cost is zero. - -```python -from helpers import http_get -import json - -MAILTO = "mailto=your@email.com" # set once, append to every URL - -# Single DOI lookup — fastest way to get metadata for a known paper -data = json.loads(http_get(f"https://api.crossref.org/works/10.1038/s41586-021-03819-2?{MAILTO}")) -msg = data['message'] -# msg keys: DOI, title, author, published, type, container-title, volume, issue, -# page, is-referenced-by-count, references-count, abstract (optional), ... -``` - -## Common workflows - -### DOI lookup — single paper - -```python -from helpers import http_get -import json, re - -MAILTO = "mailto=your@email.com" - -def fetch_work(doi): - data = json.loads(http_get(f"https://api.crossref.org/works/{doi}?{MAILTO}")) - return data['message'] - -def parse_date(d): - """[[2021, 7, 15]] -> '2021-7-15'. Handles partial dates like [[2021]].""" - if not d: return None - parts = d.get('date-parts', [[]])[0] - return '-'.join(str(p) for p in parts if p is not None) - -def clean_abstract(raw): - """Strip JATS XML tags. Abstract field contains tags like , .""" - return re.sub(r'<[^>]+>', ' ', raw).strip() if raw else None - -w = fetch_work("10.1038/s41586-021-03819-2") # AlphaFold2 - -print("DOI:", w['DOI']) # 10.1038/s41586-021-03819-2 -print("Title:", w['title'][0]) # Highly accurate protein structure... 
-print("Type:", w['type']) # journal-article -print("Publisher:", w['publisher']) # Springer Science and Business Media LLC -print("Journal:", w.get('container-title', [''])[0]) # Nature -print("Volume:", w.get('volume')) # 596 -print("Issue:", w.get('issue')) # 7873 -print("Page:", w.get('page')) # 583-589 -print("published:", parse_date(w.get('published'))) # 2021-7-15 (online date) -print("published-online:", parse_date(w.get('published-online'))) # 2021-7-15 -print("published-print:", parse_date(w.get('published-print'))) # 2021-8-26 -print("Citations:", w.get('is-referenced-by-count')) # 40260 -print("References:", w.get('references-count')) # 84 -print("Abstract:", clean_abstract(w.get('abstract', ''))[:100] if w.get('abstract') else None) -# Confirmed output (2026-04-18): -# DOI: 10.1038/s41586-021-03819-2 -# Title: Highly accurate protein structure prediction with AlphaFold -# Type: journal-article -# Journal: Nature -# Volume: 596 | Issue: 7873 | Page: 583-589 -# published: 2021-7-15 | published-print: 2021-8-26 -# Citations: 40260 -``` - -### DOI lookup — extract authors with ORCID - -```python -from helpers import http_get -import json - -MAILTO = "mailto=your@email.com" -data = json.loads(http_get(f"https://api.crossref.org/works/10.1038/s41586-021-03819-2?{MAILTO}")) -authors = data['message'].get('author', []) - -for a in authors[:3]: - name = f"{a.get('given', '')} {a.get('family', '')}".strip() - # ORCID is a full URL, not a bare ID — strip the prefix - orcid_url = a.get('ORCID') # e.g. 'https://orcid.org/0000-0001-6169-6580' - orcid_id = orcid_url.replace('https://orcid.org/', '') if orcid_url else None - authenticated = a.get('authenticated-orcid', False) # False = self-reported, True = verified - affiliations = [aff.get('name', '') for aff in a.get('affiliation', [])] - print(f"{name} | ORCID: {orcid_id} | auth={authenticated} | seq={a['sequence']}") -# Confirmed output: -# John Jumper | ORCID: 0000-0001-6169-6580 | auth=False | seq=first -# Richard Evans | ORCID: None | auth=False | seq=additional -# Alexander Pritzel | ORCID: None | auth=False | seq=additional -``` - -### Batch DOI lookup (parallel — 5 calls in ~0.3s) - -```python -from helpers import http_get -from concurrent.futures import ThreadPoolExecutor -import json - -MAILTO = "mailto=your@email.com" - -def fetch_work(doi): - try: - data = json.loads(http_get(f"https://api.crossref.org/works/{doi}?{MAILTO}")) - msg = data['message'] - return { - 'doi': doi, - 'title': msg.get('title', [''])[0], - 'year': (msg.get('published', {}).get('date-parts') or [[None]])[0][0], - 'citations': msg.get('is-referenced-by-count'), - 'type': msg.get('type'), - } - except Exception as e: - return {'doi': doi, 'error': str(e)} - -dois = [ - "10.1038/nature12345", - "10.1038/s41586-021-03819-2", - "10.1056/NEJMoa2034577", - "10.1126/science.1260419", - "10.1038/s41586-024-07487-w", -] - -# max_workers=5 safe; polite pool: 10 req/s, concurrency=3 (see Rate limits) -with ThreadPoolExecutor(max_workers=5) as ex: - results = list(ex.map(fetch_work, dois)) - -for r in results: - print(r['year'], f"cites={r['citations']}", r['title'][:50]) -# Confirmed output (2026-04-18, ~0.296s total): -# 2013 cites=465 LRG1 promotes angiogenesis by modulating endotheli -# 2021 cites=40260 Highly accurate protein structure prediction with -# 2020 cites=13752 Safety and Efficacy of the BNT162b2 mRNA Covid-19 -# 2015 cites=13553 Tissue-based map of the human proteome -# 2024 cites=12037 Accurate structure prediction of biomolecular inte -``` - -### 
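### Quick citation string from a DOI

If you just need a one-line human-readable reference, the fields above are enough. A minimal sketch — the helper name `cite` and the formatting are ad hoc choices here (not a CrossRef or CSL convention); field access follows the work-object reference later in this file:

```python
from helpers import http_get
import json

MAILTO = "mailto=your@email.com"

def cite(doi):
    w = json.loads(http_get(f"https://api.crossref.org/works/{doi}?{MAILTO}"))['message']
    # First three authors as "Family Given"; organizations fall back to 'name'
    authors = ", ".join(
        (f"{a.get('family', '')} {a.get('given', '')}".strip() or a.get('name', ''))
        for a in w.get('author', [])[:3]
    )
    year = (w.get('published', {}).get('date-parts') or [[None]])[0][0]
    journal = (w.get('container-title') or [''])[0]
    return f"{authors} ({year}). {w['title'][0]}. {journal}. https://doi.org/{w['DOI']}"

print(cite("10.1038/s41586-021-03819-2"))
# e.g. "Jumper John, Evans Richard, Pritzel Alexander (2021). Highly accurate protein
#       structure prediction with AlphaFold. Nature. https://doi.org/10.1038/s41586-021-03819-2"
```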
Search works by keyword - -```python -from helpers import http_get -import json - -MAILTO = "mailto=your@email.com" - -# Broad keyword search -data = json.loads(http_get( - f"https://api.crossref.org/works?query=machine+learning&rows=5&{MAILTO}" -)) -msg = data['message'] -print("Total results:", msg['total-results']) # 2,805,391 -for item in msg['items']: - title = item.get('title', ['(no title)'])[0][:60] - doi = item.get('DOI', '') - year = (item.get('published', {}).get('date-parts') or [[None]])[0][0] - type_ = item.get('type', '') - print(f" [{type_}] {year} {title}") - print(f" DOI: {doi}") -``` - -### Search by author + title (targeted) - -```python -from helpers import http_get -import json - -MAILTO = "mailto=your@email.com" - -data = json.loads(http_get( - f"https://api.crossref.org/works?query.author=Lecun&query.title=deep+learning&rows=5&{MAILTO}" -)) -msg = data['message'] -print("Total results:", msg['total-results']) # 62 -for item in msg['items'][:3]: - title = item.get('title', [''])[0][:60] - authors = ', '.join(a.get('family', '') for a in item.get('author', [])[:2]) - year = (item.get('published', {}).get('date-parts') or [[None]])[0][0] - print(f" {year} {title}") - print(f" Authors: {authors} DOI: {item.get('DOI')}") -# Confirmed output: -# 2015 Deep learning & convolutional networks -# Authors: LeCun DOI: 10.1109/hotchips.2015.7477328 -``` - -### Filter by date, type, and sort by citations - -```python -from helpers import http_get -import json - -MAILTO = "mailto=your@email.com" - -data = json.loads(http_get( - f"https://api.crossref.org/works" - f"?filter=from-pub-date:2024-01-01,type:journal-article" - f"&rows=5&sort=is-referenced-by-count&order=desc&{MAILTO}" -)) -msg = data['message'] -print("Total 2024+ journal articles:", msg['total-results']) # 14,565,456 -for item in msg['items'][:3]: - title = item.get('title', [''])[0][:60] - cites = item.get('is-referenced-by-count', 0) - year = (item.get('published', {}).get('date-parts') or [[None]])[0][0] - print(f" {year} cites={cites} {title}") -# Confirmed output: -# 2024 cites=17371 Global cancer statistics 2022: GLOBOCAN estimates... -# 2024 cites=12037 Accurate structure prediction of biomolecular int... -``` - -### Filter with `has-abstract:true` - -```python -from helpers import http_get -import json - -MAILTO = "mailto=your@email.com" - -# Only return works that have an abstract (useful since ~30-70% do not) -data = json.loads(http_get( - f"https://api.crossref.org/works" - f"?filter=from-pub-date:2023-01-01,until-pub-date:2023-12-31" - f",type:journal-article,has-abstract:true" - f"&rows=3&sort=is-referenced-by-count&order=desc&{MAILTO}" -)) -msg = data['message'] -print("2023 journal articles with abstract:", msg['total-results']) # 3,041,841 -for item in msg['items']: - print(item.get('title', [''])[0][:60], '| cites:', item.get('is-referenced-by-count')) -# Confirmed output: -# Cancer statistics, 2023 | cites: 12919 -# Evolutionary-scale prediction of atomic-level protein struct | cites: 4352 -``` - -### Cursor pagination (large result sets) - -Standard offset pagination (`start=`) caps at a few thousand results. Use cursor for full sweeps. 
- -```python -from helpers import http_get -from urllib.parse import quote -import json - -MAILTO = "mailto=your@email.com" - -# First page: cursor=* -data = json.loads(http_get( - f"https://api.crossref.org/works?query=covid&rows=100&cursor=*&{MAILTO}" -)) -msg = data['message'] -print("Total results:", msg['total-results']) # 897,660 -items = msg['items'] -next_cursor = msg['next-cursor'] # base64 string like "DnF1ZXJ5VGhlbkZldGNoJA..." - -# Next pages: pass URL-encoded cursor -while next_cursor and items: - data = json.loads(http_get( - f"https://api.crossref.org/works?query=covid&rows=100" - f"&cursor={quote(next_cursor)}&{MAILTO}" - )) - msg = data['message'] - items = msg.get('items', []) - next_cursor = msg.get('next-cursor') - # process items... - break # remove for full sweep -``` - -### Fetch specific fields only (`select=`) - -Reduces response size significantly for bulk operations: - -```python -from helpers import http_get -import json - -MAILTO = "mailto=your@email.com" - -data = json.loads(http_get( - f"https://api.crossref.org/works?query=cancer&rows=5" - f"&select=DOI,title,author&{MAILTO}" -)) -# Warning: if a field is absent for a record, it simply won't appear in that item -for item in data['message']['items']: - print(list(item.keys())) # only ['DOI', 'title'] or ['DOI', 'title', 'author'] - # Note: select= does NOT guarantee the field appears — absent fields are just omitted -``` - -### Count by type using facets - -```python -from helpers import http_get -import json - -MAILTO = "mailto=your@email.com" - -data = json.loads(http_get( - f"https://api.crossref.org/works?query=machine+learning&rows=0" - f"&facet=type-name:*&{MAILTO}" -)) -msg = data['message'] -type_facet = msg['facets']['type-name'] -for k, v in sorted(type_facet['values'].items(), key=lambda x: -x[1]): - print(f" {k}: {v:,}") -# Confirmed output (all CrossRef, 2026-04-18): -# Journal Article: 1,628,997 (for query=machine+learning scope) -# Conference Paper: 501,433 -# Chapter: 455,907 -# Posted Content: 87,937 -# ... -``` - -### Journal info by ISSN - -```python -from helpers import http_get -import json - -MAILTO = "mailto=your@email.com" - -# Nature (ISSN 0028-0836) -data = json.loads(http_get(f"https://api.crossref.org/journals/0028-0836?{MAILTO}")) -msg = data['message'] -print("Title:", msg['title']) # Nature -print("Publisher:", msg['publisher']) # Springer Science and Business Media LLC -print("ISSN:", msg['ISSN']) # ['0028-0836', '1476-4687'] -print("Total DOIs:", msg['counts']['total-dois']) # 445,417 -print("Subjects:", msg.get('subjects', [])) # [] (not always populated) - -# Search journals by name -data2 = json.loads(http_get(f"https://api.crossref.org/journals?query=nature&rows=3&{MAILTO}")) -for j in data2['message']['items']: - print(f"{j.get('title')} | ISSN: {j.get('ISSN')} | DOIs: {j.get('counts', {}).get('total-dois')}") -# Confirmed output: -# NatureJobs | ISSN: [] | DOIs: 0 -# Naturen | ISSN: ['0028-0887', '1504-3118'] | DOIs: 1055 -``` - -### Funder search - -```python -from helpers import http_get -import json - -MAILTO = "mailto=your@email.com" - -data = json.loads(http_get( - f"https://api.crossref.org/funders?query=national+science+foundation&rows=3&{MAILTO}" -)) -msg = data['message'] -print("Total funders:", msg['total-results']) # 108 -for f in msg['items']: - print(f" ID: {f['id']} | {f['name']}") - print(f" Alt names: {f.get('alt-names', [])[:2]}") - print(f" URI: {f.get('uri')}") -# Confirmed output: -# ID: 501100001711 | Schweizerischer Nationalfonds zur Förderung... 
-# ID: 100000143 | Division of Computing and Communication Foundations -``` - -### DOI content negotiation (alternative, no CrossRef API needed) - -The `doi.org` resolver can return formatted metadata directly via `Accept` header: - -```python -import urllib.request, json - -def doi_to_csl(doi): - """Fetch CSL-JSON via DOI content negotiation. Same data as CrossRef API.""" - req = urllib.request.Request( - f"https://doi.org/{doi}", - headers={"Accept": "application/vnd.citationstyles.csl+json", - "User-Agent": "Mozilla/5.0"} - ) - with urllib.request.urlopen(req, timeout=20) as r: - return json.loads(r.read().decode()) - -def doi_to_bibtex(doi): - """Fetch BibTeX via DOI content negotiation.""" - req = urllib.request.Request( - f"https://doi.org/{doi}", - headers={"Accept": "application/x-bibtex", "User-Agent": "Mozilla/5.0"} - ) - with urllib.request.urlopen(req, timeout=20) as r: - return r.read().decode() - -csl = doi_to_csl("10.1038/nature12345") -print("Title:", csl['title']) # LRG1 promotes angiogenesis... -print("Type:", csl['type']) # journal-article - -bib = doi_to_bibtex("10.1038/nature12345") -print(bib[:200]) -# @article{Wang_2013, title={LRG1 promotes angiogenesis... -``` - -## Field reference - -### Work object — complete field list - -All fields are potentially absent unless marked required. Fields marked (R) are always present. - -| Field | Type | Notes | -|---|---|---| -| `DOI` (R) | string | e.g. `"10.1038/s41586-021-03819-2"` | -| `URL` (R) | string | `"https://doi.org/10.1038/s41586-021-03819-2"` | -| `title` (R) | list[str] | Always a list; access `title[0]` | -| `type` (R) | string | e.g. `"journal-article"` — see type table below | -| `publisher` | string | | -| `container-title` | list[str] | Journal name; access `[0]` | -| `short-container-title` | list[str] | Abbreviated journal name | -| `ISSN` | list[str] | May contain print and online ISSN | -| `volume` | string | Note: string not int (`"596"`) | -| `issue` | string | | -| `page` | string | e.g. `"583-589"` | -| `author` | list[object] | See author fields below | -| `published` | date-object | Best single date — use this | -| `published-online` | date-object | Online-first date | -| `published-print` | date-object | Print edition date | -| `issued` | date-object | Usually same as `published` | -| `is-referenced-by-count` | int | Inbound citations to this work | -| `references-count` | int | Outbound references from this work | -| `reference` | list[object] | Full reference list (when deposited) | -| `abstract` | string | JATS XML markup; ~30-70% of works; strip tags before use | -| `subject` | list[str] | Subject classification (often empty) | -| `language` | string | e.g. `"en"` | -| `license` | list[object] | Each: `{URL, start, delay-in-days, content-version}` | -| `funder` | list[object] | Each: `{name, DOI, award}` | -| `link` | list[object] | Full-text links | -| `relation` | object | Related DOIs (e.g. preprint → article) | -| `assertion` | list[object] | Publisher-specific metadata | -| `alternative-id` | list[str] | Publisher's internal IDs | -| `member` | string | CrossRef member ID | -| `prefix` | string | DOI prefix | -| `score` | float | Relevance score (search results only) | -| `source` | string | e.g. 
`"Crossref"` | -| `indexed` | date-object | When CrossRef indexed this record | -| `deposited` | date-object | When publisher last deposited metadata | -| `created` | date-object | When CrossRef record was first created | - -### Author object fields - -| Field | Notes | -|---|---| -| `given` | Given/first name | -| `family` | Family/last name | -| `sequence` | `"first"` or `"additional"` | -| `affiliation` | list of `{name, place}` — usually `[]` | -| `ORCID` | Full URL `"https://orcid.org/0000-0001-..."` — strip prefix to get bare ID | -| `authenticated-orcid` | `true` = verified via ORCID OAuth; `false` = self-reported | -| `name` | Used instead of given/family for organizations | - -### Date object structure - -```python -# All date fields share this structure: -date_obj = { - "date-parts": [[2021, 7, 15]], # [[year, month, day]] — month/day may be absent - "date-time": "2021-07-15T00:00:00Z", # not always present - "timestamp": 1626307200000 # not always present -} - -# Safe extraction (handles [[2021]] or [[2021, 7]] partial dates): -def parse_date(d): - if not d: return None - parts = (d.get('date-parts') or [[]])[0] - return '-'.join(str(p) for p in parts if p is not None) -``` - -### Type identifiers (filter param values vs facet display names) - -Use these exact strings in `filter=type:...`. The facet `type-name` values are display names only. - -| filter `type:` value | Facet display name | Count (all CrossRef) | -|---|---|---| -| `journal-article` | Journal Article | 121,030,194 | -| `book-chapter` | Chapter | 24,359,059 | -| `proceedings-article` | Conference Paper | 9,744,754 | -| `dataset` | Dataset | 3,424,142 | -| `posted-content` | Posted Content (preprints) | 3,203,320 | -| `dissertation` | Dissertation | 1,044,461 | -| `peer-review` | Peer Review | 1,028,287 | -| `report` | Report | 906,301 | -| `book` | Book | 870,949 | -| `monograph` | Monograph | 788,401 | - -### Query parameters reference - -| Parameter | Notes | -|---|---| -| `query` | Full-text keyword search across title, abstract, author | -| `query.author` | Author name search only | -| `query.title` | Title search only | -| `query.bibliographic` | Combined title + author + journal search | -| `rows` | Results per page (default 20, max 1000) | -| `offset` | Offset for pagination (max ~10,000 effective) | -| `cursor` | Use `cursor=*` for first page, then URL-encode `next-cursor` value | -| `sort` | `relevance`, `is-referenced-by-count`, `published`, `indexed` | -| `order` | `asc` or `desc` | -| `filter` | Comma-separated `key:value` pairs (see filters below) | -| `select` | Comma-separated field names to return | -| `facet` | `type-name:*` for type counts; `publisher-name:10` for top publishers | -| `mailto` | Your email — enables polite pool (higher limits) | - -### Filter keys reference - -| Filter key | Example | Notes | -|---|---|---| -| `doi` | `doi:10.1038/nature12345` | Exact DOI match | -| `type` | `type:journal-article` | See type table above for valid values | -| `from-pub-date` | `from-pub-date:2024-01-01` | ISO date or `YYYY` | -| `until-pub-date` | `until-pub-date:2024-12-31` | | -| `from-index-date` | `from-index-date:2024-01-01` | When CrossRef indexed it | -| `has-abstract` | `has-abstract:true` | Only works with deposited abstract | -| `has-orcid` | `has-orcid:true` | At least one author has ORCID | -| `has-full-text` | `has-full-text:true` | Has full-text link | -| `has-references` | `has-references:true` | Has deposited reference list | -| `is-update` | `is-update:true` | Corrections, 
retractions | -| `issn` | `issn:0028-0836` | Filter by journal ISSN | -| `publisher-name` | `publisher-name:elsevier` | Partial match | -| `funder` | `funder:100000001` | Funder DOI or CrossRef funder ID | - -## Rate limits - -CrossRef has two pools based on whether `mailto=` is present: - -| Pool | Triggered by | Rate limit | Concurrency | -|---|---|---|---| -| **polite** | `mailto=` param present | 10 req/s | 3 concurrent | -| **public** | no `mailto=` | 5 req/s | 1 concurrent | - -Headers returned: `x-rate-limit-limit`, `x-rate-limit-interval`, `x-concurrency-limit`, `x-api-pool`. - -In practice with polite pool: 10 rapid sequential calls complete in ~2.7s (avg 0.27s/req) with no throttling. 5 parallel calls complete in ~0.3s. Stay at `max_workers=5` to respect the concurrency limit. - -No per-day or per-hour cap. If you exceed limits, responses slow or return HTTP 429. No ban. Add `time.sleep(0.1)` between calls for sustained bulk crawls. - -## Gotchas - -- **`mailto=` doubles your rate limit and concurrency.** Public pool: 5 req/s, concurrency=1. Polite pool: 10 req/s, concurrency=3. Always add `?mailto=your@email.com` to every request — confirmed by reading `x-api-pool` response header. - -- **`title`, `container-title`, `ISSN` are always lists, not strings.** Access with `title[0]`, `container-title[0]` etc. Do not rely on there being only one entry — `container-title` can have multiple values. - -- **Abstract contains JATS XML markup.** The `abstract` field is not plain text — it contains tags like ``, ``, ``. Strip with `re.sub(r'<[^>]+>', ' ', abstract)`. About 30-70% of works have an abstract at all; journal articles 2023 with `has-abstract:true` filter: 3,041,841 / ~5.5M total = ~55%. - -- **ORCID is a full URL, not just the ID.** `a['ORCID']` = `"https://orcid.org/0000-0001-6169-6580"`. Strip with `.replace('https://orcid.org/', '')` to get the bare ID. `authenticated-orcid: false` means self-asserted (not verified via OAuth). - -- **`published` vs `published-print` vs `published-online`.** Online-first is common in journals — a paper may be online months before its print issue. `published` is CrossRef's best single date and equals `published-online` when both exist. For preprints (`posted-content` type), look for `posted` instead of `published-print` — it may only have `posted` and `published`. Partial dates like `[[2023]]` (year only) are valid — always use `parse_date()` to handle missing month/day. - -- **404 raises `HTTPError`, not a JSON error response.** An invalid DOI (e.g. `10.9999/doesnotexist`) raises `urllib.error.HTTPError: HTTP Error 404: Not Found`. Wrap `fetch_work()` in try/except for any untrusted DOI list. - -- **`volume` and `issue` are strings, not integers.** CrossRef stores them as strings — `"596"`, not `596`. Don't compare with `==` to an int. - -- **Filter type values are hyphenated lowercase, not the facet display names.** `filter=type:journal-article` works. `filter=type:journal article`, `filter=type:Journal Article`, and `filter=type:conference-paper` all return HTTP 400. Conference papers are `proceedings-article`. - -- **`select=` does not guarantee field presence.** When you `select=DOI,title,author`, a record that has no author still omits the `author` key — it doesn't return `author: []`. Always use `.get()`. - -- **Cursor pagination required for >10,000 results.** Offset pagination (`offset=`) is limited to around 10,000 results. 
For bulk sweeps, use `cursor=*` for the first page, then URL-encode the returned `next-cursor` value with `urllib.parse.quote()`. The cursor expires if unused for too long. - -- **`rows` max is 1000 per call.** Requesting more silently returns 1000. For cursor-based sweeps of large result sets (millions of records), `rows=1000` with cursor is the most efficient approach. - -- **HTML entities in titles.** Titles may contain HTML entities like `&` — `"Deep learning & convolutional networks"`. Decode with `html.unescape()` if needed. - -- **`funder` search `works-count` field is `None`.** The funder search result object has a `works-count` key that is always `None` in the search response. To get actual work counts for a funder, fetch the funder directly: `GET /funders/{id}`. - -- **`subject` is often an empty list.** The `subject` field in works is populated inconsistently — many journal articles have `subject: []` even for well-indexed journals like Nature. - -- **Affiliation is usually empty.** `author[i]['affiliation']` is `[]` for the majority of records, even for papers published in 2024. CrossRef has been working on affiliation deposit, but coverage is inconsistent. diff --git a/packages/bcode-browser/harness/agent-workspace/domain-skills/dev-to/scraping.md b/packages/bcode-browser/harness/agent-workspace/domain-skills/dev-to/scraping.md deleted file mode 100644 index c87c8509b..000000000 --- a/packages/bcode-browser/harness/agent-workspace/domain-skills/dev-to/scraping.md +++ /dev/null @@ -1,323 +0,0 @@ -# DEV Community (dev.to) — Data Extraction - -`https://dev.to` — developer blogging platform. Everything useful is available via a public REST API with no auth required. No browser needed for any read task. - -## Do this first - -**Use the REST API — it returns clean JSON in ~150–250ms with no browser, no login, no JS rendering.** - -```python -import json -articles = json.loads(http_get("https://dev.to/api/articles?per_page=10&tag=python")) -# Each article: id, title, description, url, cover_image, tag_list, tags, -# published_at, published_timestamp, readable_publish_date, -# reading_time_minutes, positive_reactions_count, -# public_reactions_count, comments_count, user, organization, -# flare_tag, collection_id, slug, path, canonical_url, -# social_image, language, subforem_id -``` - -The API serves **V0 (beta) by default** and emits a `Warning: 299` header on every response. Suppress it silently with the V1 `Accept` header (same data, no deprecated warning): - -```python -import json -import urllib.request, gzip - -def dev_get(url): - h = { - "User-Agent": "Mozilla/5.0", - "Accept-Encoding": "gzip", - "Accept": "application/vnd.forem.api-v1+json", - } - with urllib.request.urlopen(urllib.request.Request(url, headers=h), timeout=20) as r: - data = r.read() - if r.headers.get("Content-Encoding") == "gzip": - data = gzip.decompress(data) - return data.decode() - -articles = json.loads(dev_get("https://dev.to/api/articles?per_page=10&tag=python")) -``` - -Or just use `http_get` directly if you don't care about the warning header noise. - ---- - -## Common workflows - -### Articles by tag - -```python -import json -articles = json.loads(http_get("https://dev.to/api/articles?per_page=10&tag=python")) -# Paginate with &page=2, &page=3 etc. (1-indexed) -for a in articles: - print(a['id'], a['positive_reactions_count'], a['title'][:60]) -``` - -Confirmed working tags: `python`, `javascript`, `typescript`, `rust`, `go`, `webdev`, `tutorial`, `react`, `devops`, `ai`, `beginners`. 
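### Tag feed → full Markdown bodies

To go from the tag feed to full article text, chain the list call with the single-article lookup (covered below), which adds `body_markdown`. A minimal sketch — the tag, page size, and the three-article slice are arbitrary choices here, and each body fetch costs one extra request:

```python
import json

# Latest python-tagged articles, then full Markdown for the first three.
articles = json.loads(http_get("https://dev.to/api/articles?per_page=10&tag=python"))
for a in articles[:3]:
    full = json.loads(http_get(f"https://dev.to/api/articles/{a['id']}"))
    print(a['positive_reactions_count'], a['title'][:60])
    print(full['body_markdown'][:200].replace("\n", " "), "...")
```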
- -### Top articles by time window - -```python -import json -# top=N means "top articles from the last N days" -top_day = json.loads(http_get("https://dev.to/api/articles?per_page=10&top=1")) -top_week = json.loads(http_get("https://dev.to/api/articles?per_page=10&top=7")) -top_month = json.loads(http_get("https://dev.to/api/articles?per_page=10&top=30")) -top_year = json.loads(http_get("https://dev.to/api/articles?per_page=10&top=365")) -``` - -### Articles by username - -```python -import json -articles = json.loads(http_get("https://dev.to/api/articles?per_page=10&username=ben")) -# Paginates cleanly: page=1, page=2 etc. Return distinct IDs, no overlap. -``` - -### New and rising articles - -```python -import json -fresh = json.loads(http_get("https://dev.to/api/articles?per_page=10&state=fresh")) # very new -rising = json.loads(http_get("https://dev.to/api/articles?per_page=10&state=rising")) # gaining traction -# state=all returns 0 results (requires auth, not useful unauthenticated) -``` - -### Single article by ID (adds body_html and body_markdown) - -```python -import json -article = json.loads(http_get("https://dev.to/api/articles/3442047")) -# Full article adds two fields not in list response: -# body_html — rendered HTML (safe to display directly) -# body_markdown — raw Markdown source -print(len(article['body_html']), len(article['body_markdown'])) -``` - -### Single article by username/slug - -```python -import json -# path field from list response is "/username/slug" -article = json.loads(http_get("https://dev.to/api/articles/ben/some-article-slug")) -``` - -### Tags — popular list with colors - -```python -import json -tags = json.loads(http_get("https://dev.to/api/tags?per_page=10")) -# Fields: id, name, bg_color_hex, text_color_hex, short_summary -# Sorted by popularity. Paginate with &page=2 etc. -for t in tags: - print(t['name'], t['bg_color_hex'], t['text_color_hex']) -# e.g. webdev #562765 #ffffff -# javascript #f7df1e #000000 -# ai #17fd1a #ffffff -``` - -### User profile - -```python -import json -user = json.loads(http_get("https://dev.to/api/users/by_username?url=ben")) -# Fields: type_of, id, username, name, twitter_username, github_username, -# summary, location, website_url, joined_at, profile_image -print(user['id'], user['username'], user['summary']) -# e.g. 1 ben "A Canadian software developer who thinks he's funny." -``` - -`joined_at` is a human string like `"Dec 27, 2015"` — not ISO 8601. Parse with `datetime.strptime(user['joined_at'], "%b %d, %Y")`. 
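
A short worked example of that parse (username `ben` as above; the account-age computation is just illustrative):

```python
import json
from datetime import datetime
from helpers import http_get

user = json.loads(http_get("https://dev.to/api/users/by_username?url=ben"))
joined = datetime.strptime(user['joined_at'], "%b %d, %Y")  # "Dec 27, 2015" -> datetime
print(joined.date().isoformat())                            # 2015-12-27
print(f"~{(datetime.now() - joined).days / 365.25:.1f} years on DEV")
```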
- -### Comments on an article - -```python -import json -comments = json.loads(http_get("https://dev.to/api/comments?a_id=3442047")) -# Returns top-level comments only (replies nested under children key) -# Fields per comment: id_code (string, not int!), type_of, body_html, -# created_at, user (dict), children (list of same shape) -for c in comments: - print(c['id_code'], c['user']['username'], c['created_at']) - for reply in c.get('children', []): - print(" reply:", reply['id_code'], reply['user']['username']) -``` - -### Single comment by id_code - -```python -import json -comment = json.loads(http_get("https://dev.to/api/comments/36lnc")) -# Same fields as above: id_code, body_html, created_at, user, children -``` - -### Bulk tag fetch (parallel) - -```python -import json -from concurrent.futures import ThreadPoolExecutor - -tags = ['python', 'javascript', 'typescript', 'rust', 'go', - 'devops', 'webdev', 'tutorial', 'productivity', 'react'] - -def fetch_tag(tag): - data = json.loads(http_get(f"https://dev.to/api/articles?per_page=5&tag={tag}")) - return tag, data - -with ThreadPoolExecutor(max_workers=3) as ex: - results = dict(ex.map(lambda t: fetch_tag(t), tags)) -# 10 tags × 5 articles each: ~0.67s total with max_workers=3 -``` - ---- - -## Endpoint reference - -| Endpoint | Auth | Key params | Latency | -|----------|------|-----------|---------| -| `GET /api/articles` | None | `tag`, `username`, `top`, `state`, `page`, `per_page` | ~200ms | -| `GET /api/articles/{id}` | None | — | ~80ms | -| `GET /api/articles/{username}/{slug}` | None | — | ~200ms | -| `GET /api/tags` | None | `page`, `per_page` | ~190ms | -| `GET /api/users/by_username?url={username}` | None | — | ~190ms | -| `GET /api/comments?a_id={article_id}` | None | — | ~160ms | -| `GET /api/comments/{id_code}` | None | — | ~150ms | -| `GET /api/listings` | None | `category`, `page`, `per_page` | ~260ms (returns 0) | - -**Listings endpoint returns 0 results.** The `/api/listings` endpoint is documented but returns an empty array for all categories (`jobs`, `forsale`, `education`, `cfp`) without auth. Skip it. - ---- - -## Pagination - -All list endpoints paginate with `page=` (1-indexed) and `per_page=`: - -```python -import json - -def get_all_articles_by_tag(tag, max_pages=5): - results = [] - for page in range(1, max_pages + 1): - batch = json.loads(http_get( - f"https://dev.to/api/articles?per_page=30&tag={tag}&page={page}" - )) - if not batch: - break - results.extend(batch) - return results -``` - -- `per_page` supports up to **1000** (confirmed). No documented max, but 1000 works in testing. -- No `total_count` field in list responses — you paginate until an empty array. -- Page ordering is consistent — confirmed no ID overlap between page 1 and page 2. - ---- - -## Article field reference - -All fields returned in list responses (single article adds `body_html` and `body_markdown`): - -``` -id int — article ID, stable, use for single-article fetch -title str -description str — auto-excerpt, never null -slug str — URL slug component -path str — "/username/slug" -url str — full canonical URL -canonical_url str — same as url for native posts; author's site URL for cross-posts -cover_image str|null — CDN URL or null (~30% of articles have no cover image) -social_image str — always present (generated if no cover_image) -tag_list list — e.g. ['python', 'ai', 'tutorial'] ← use this for code -tags str — same tags as comma-separated string "python, ai, tutorial" -published_at str — ISO 8601 UTC e.g. 
"2026-04-18T03:49:36Z" -published_timestamp str — identical to published_at -readable_publish_date str — human string e.g. "Apr 18" -reading_time_minutes int -positive_reactions_count int — hearts/likes count -public_reactions_count int — total reactions (usually same as positive_reactions_count) -comments_count int -user dict — name, username, twitter_username, github_username, - user_id, website_url, profile_image, profile_image_90 -organization dict|null — present when posted under an org: name, username, slug, - profile_image, profile_image_90 -flare_tag dict|null — {name, bg_color_hex, text_color_hex} — discussion/challenge badge -collection_id int|null — series/collection ID if part of a series -language str — e.g. "en" -subforem_id int|null -crossposted_at str|null — ISO datetime if cross-posted -edited_at str|null -last_comment_at str|null -created_at str — ISO 8601 -type_of str — always "article" -``` - ---- - -## Rate limits - -- **Burst limit: ~6 rapid sequential requests**, then HTTP 429. -- **Recovery: `Retry-After: 1` second** — wait 1s after a 429 and you're good again. -- No `X-RateLimit-*` headers in 200 responses — you only see `Retry-After` on the 429 itself. -- With `ThreadPoolExecutor(max_workers=3)`, 10 concurrent requests succeed without hitting the limit. -- No difference in limits between V0 (default) and V1 (`Accept` header) — same underlying rate limit. -- **No auth token tested** — all endpoints above work without `api_key`. Authenticated requests likely have higher limits. - -Safe pattern for bulk fetching: - -```python -import json, time -from concurrent.futures import ThreadPoolExecutor - -def safe_fetch(url): - for attempt in range(3): - try: - return json.loads(http_get(url)) - except Exception as e: - if '429' in str(e): - time.sleep(1) # Retry-After is 1s - continue - raise - return [] - -urls = [ - f"https://dev.to/api/articles?per_page=10&tag={tag}" - for tag in ['python', 'javascript', 'typescript', 'rust'] -] -with ThreadPoolExecutor(max_workers=3) as ex: - results = list(ex.map(safe_fetch, urls)) -``` - ---- - -## Gotchas - -- **`tag_list` (list) vs `tags` (string)** — both fields always present. `tag_list` is a Python list; `tags` is the same data as a comma-separated string. Use `tag_list` in code. - -- **Comments have `id_code`, not `id`** — comment identifiers are alphanumeric strings like `"36lnc"`, not integers. The integer `id` field is absent from comment objects. Use `id_code` to fetch a specific comment via `GET /api/comments/{id_code}`. - -- **Comments endpoint returns top-level only** — replies are nested under `children` recursively, not returned as a flat list. A thread with 100 total comments may only show 60 top-level objects; walk `children` recursively to count all. - -- **`cover_image` can be null** — ~30% of articles have no cover image. Always guard: `a.get('cover_image') or a['social_image']` for a guaranteed image URL. - -- **`flare_tag` is null for most articles** — only discussion/challenge posts carry it. It's a dict `{name, bg_color_hex, text_color_hex}` when present. - -- **`published_at` == `published_timestamp`** — both fields contain identical ISO 8601 UTC strings. `readable_publish_date` is human-only (`"Apr 18"`, no year). - -- **`joined_at` on user profile is not ISO** — it's `"Dec 27, 2015"`. Parse: `datetime.strptime(u['joined_at'], "%b %d, %Y")`. - -- **`state=all` returns 0 results unauthenticated** — it's for the authenticated user's own feed. `state=fresh` and `state=rising` work without auth. 
- -- **`top=N` means last N days** — `top=1` is last 24h, `top=7` is last week, `top=30` is last month, `top=365` is last year. Results differ from the `state=` param. - -- **V0 warning header on every response** — `Warning: 299 - This endpoint is part of the V0 (beta) API…` appears on all responses without the `Accept` header. It's harmless but noisy. Suppress with `"Accept": "application/vnd.forem.api-v1+json"`. - -- **No `total_count` in list responses** — paginate until an empty array. There is no way to know upfront how many total results exist. - -- **Listings endpoint returns empty** — `GET /api/listings` and all category variants return `[]` without auth. Documented but non-functional publicly. - -- **`/api/articles/{id}/comments` returns 404** — comments must be fetched via `GET /api/comments?a_id={id}`, not as a sub-resource of articles. - -- **`canonical_url` may point off-site** — for cross-posted articles, `canonical_url` is the author's original blog URL, not dev.to. Use `url` for the dev.to link. - -- **`organization` field is null for personal posts** — only present when the article was posted under an org account. Check before accessing sub-fields. diff --git a/packages/bcode-browser/harness/agent-workspace/domain-skills/duckduckgo/scraping.md b/packages/bcode-browser/harness/agent-workspace/domain-skills/duckduckgo/scraping.md deleted file mode 100644 index 8c86144c7..000000000 --- a/packages/bcode-browser/harness/agent-workspace/domain-skills/duckduckgo/scraping.md +++ /dev/null @@ -1,349 +0,0 @@ -# DuckDuckGo — Instant Answer API - -`https://api.duckduckgo.com` — completely public, no auth, no API key. Returns Wikipedia-sourced abstracts, infoboxes, and instant answers for well-known entities, calculations, and utility queries. Not a search engine — it does not return a list of web results for arbitrary queries. - -## Do this first: pick your query type - -| Query type | Example | `Type` | Returns | -|------------|---------|--------|---------| -| Named entity (specific) | `apple inc` | A | Full abstract + infobox | -| Ambiguous term | `python` | D | Disambiguation list in `RelatedTopics` | -| Instant answer | `random number` | E | Direct answer in `Answer` field | -| No match | `how to cook pasta` | `""` | All fields empty | - -**Use `skip_disambig=1` and `no_html=1` in almost every call.** `skip_disambig=1` upgrades D→A when there's an obvious primary result (e.g., `elon musk` goes from disambiguation to full article). `no_html=1` removes `` tags from the `Answer` field and strips bold markup from `Result` HTML strings. - -**Never use a browser.** Everything is a single `http_get` JSON call, 183–320ms. - ---- - -## Fastest path: entity lookup - -```python -import json, urllib.parse -from helpers import http_get - -def ddg_instant(query: str) -> dict: - q = urllib.parse.quote(query) - raw = http_get( - f"https://api.duckduckgo.com/?q={q}&format=json&no_html=1&skip_disambig=1" - ) - return json.loads(raw) - -# Entity with Wikipedia abstract + infobox -data = ddg_instant("openai") -# data['Type'] == 'A' -print(data['Heading']) # 'OpenAI' -print(data['AbstractText']) # 'OpenAI is an American artificial intelligence research...' 
-print(data['AbstractURL']) # 'https://en.wikipedia.org/wiki/OpenAI' -print(data['OfficialWebsite'])# 'https://openai.com/' -print(data['Entity']) # 'company' - -# Person lookup (skip_disambig resolves D→A automatically) -data = ddg_instant("elon musk") -print(data['Type']) # 'A' (was 'D' without skip_disambig) -print(data['AbstractText'][:100]) # 'Elon Reeve Musk is a businessman...' -print(data['Image']) # '/i/be2a8644.jpg' — prepend https://duckduckgo.com - -# Full image URL -img_url = f"https://duckduckgo.com{data['Image']}" if data['Image'] else None -``` - ---- - -## Instant answers (Type = E) - -```python -import json, urllib.parse -from helpers import http_get - -def ddg_answer(query: str) -> tuple[str, str]: - """Returns (answer_text, answer_type). answer_text is '' if no result.""" - q = urllib.parse.quote(query) - raw = http_get( - f"https://api.duckduckgo.com/?q={q}&format=json&no_html=1&no_redirect=1" - ) - data = json.loads(raw) - ans = data.get('Answer', '') - # Answer can be a dict when it's a widget (calculator, converter) — only string Answers are usable - return (ans if isinstance(ans, str) else '', data.get('AnswerType', '')) - -# Confirmed working instant answers: -text, kind = ddg_answer("random number") -# text='0.245013228691281 (random number)', kind='rand' - -text, kind = ddg_answer("generate password") -# text='ZCsbe8iY (random password)', kind='pw' - -text, kind = ddg_answer("ip address") -# text='Your IP address is 73.158.74.222 in San Francisco, California, United States (94121)', kind='ip' - -text, kind = ddg_answer("base64 encode hello") -# text='Base64 encode d: aGVsbG8=', kind='base64_conversion' - -text, kind = ddg_answer("md5 hash hello") -# text='5d41402abc4b2a76b9719d911017c592', kind='md5' - -text, kind = ddg_answer("pi") -# text='3.14159', kind='constants' - -text, kind = ddg_answer("today date") -# text='\nS M T W T F S April 2026\n...|18|...', kind='calendar' - -text, kind = ddg_answer("timer 5 minutes") -# text='300', kind='timer' — returns raw seconds - -text, kind = ddg_answer("lorem ipsum") -# text='Ea hic quia corporis. Minus consequuntur...', kind='lorem_ipsum' - -# Color lookup — must URL-encode the # sign: -text, kind = ddg_answer("color #FF5733") -# text='Hex: #FF5733 ~ RGBA(255, 87, 51, 1) ~ RGB(100%, 34%, 20%) ~ HSL(11, 100%, 60%) ~ CMYB(0%, 66%, 80%, ...', kind='color_code' -``` - -**Widget answers return a dict, not a string** — `sqrt(144)`, `1 mile in km`, `100 USD in EUR`, and `stopwatch` all return `Answer` as a dict like `{'from': 'calculator', 'id': 'calculator', 'result': '', ...}`. The `result` key is empty — the actual computation happens client-side in a JS widget. Treat dict `Answer` values as "not usable via API". - ---- - -## Full response schema - -Every response has exactly these 21 top-level keys (all always present): - -``` -Abstract # same as AbstractText (redundant, use AbstractText) -AbstractSource # "Wikipedia" when present, "" otherwise -AbstractText # Wikipedia-sourced summary paragraph (up to ~1000 chars) -AbstractURL # Wikipedia article URL -Answer # string or dict — instant answer result (see above) -AnswerType # string key identifying the answer plugin (e.g. "rand", "ip") -Definition # almost always "" — not reliably populated -DefinitionSource # almost always "" -DefinitionURL # almost always "" -Entity # entity type: "company", "programming language", "person", etc. -Heading # entity display name -Image # relative path e.g. 
"/i/4d83768732377cf3.png" — prepend https://duckduckgo.com -ImageHeight # int or "" when no image -ImageIsLogo # 0 or 1 integer when image present; "" otherwise -ImageWidth # int or "" when no image -Infobox # dict with "content" and "meta" lists, or "" if no infobox -OfficialDomain # e.g. "openai.com" — only for entities with a known website -OfficialWebsite # e.g. "https://openai.com/" — only when DDG knows it -Redirect # target URL when query is a bang (e.g. !g python) with no_redirect=1 -RelatedTopics # list — see below -Results # list — official site links (usually 0 or 1 item) -Type # "A", "D", "C", "N", "E", or "" -meta # API plugin metadata — rarely needed -``` - -### `RelatedTopics` item structure - -Each item is one of two shapes: - -**Plain topic** (the common case): -```python -{ - "FirstURL": "https://duckduckgo.com/Deep_learning", - "Icon": {"Height": "", "URL": "/i/abc123.png", "Width": ""}, # URL often "" - "Result": "Deep learning— branch of ML...", # HTML - "Text": "Deep learning — branch of ML concerned with artificial neural networks." -} -``` - -**Section** (disambiguation pages only — when `Type` is `D` without `skip_disambig`): -```python -{ - "Name": "Science & Technology", # section heading - "Topics": [ # list of plain topic objects - {"FirstURL": "...", "Icon": {...}, "Result": "...", "Text": "..."}, - ... - ] -} -``` - -For A-type results, `RelatedTopics` are Wikipedia category links (e.g. `"American aerospace engineers"` pointing to `https://duckduckgo.com/c/...`). These are not web search results — they are DDG topic pages. - -### `Results` item structure - -Usually 0 or 1 item. When present, it's the official website: -```python -{ - "FirstURL": "https://www.apple.com/", - "Icon": {"Height": 16, "URL": "/i/apple.com.ico", "Width": 16}, - "Result": "Official site...", - "Text": "Official site" -} -``` -Icon URLs in `Results` are relative — prepend `https://duckduckgo.com`. - -### `Infobox` structure - -```python -ib = data['Infobox'] # dict or "" (empty string when absent) -if isinstance(ib, dict): - content = ib['content'] # list of structured fields - # Each content item: - # {"data_type": "string", "label": "Founded", "value": "December 08, 2015"} - # {"data_type": "string", "label": "Founders", "value": "Sam Altman, Elon Musk, ..."} - - meta = ib['meta'] # list of metadata items - # {"data_type": "string", "label": "article_title", "value": "OpenAI"} - # {"data_type": "string", "label": "template_name", "value": "infobox company"} - -# Extract infobox as flat dict: -if isinstance(data['Infobox'], dict): - fields = {item['label']: item['value'] for item in data['Infobox']['content']} - # fields['Founded'] == 'December 08, 2015' - # fields['Products'] == 'ChatGPT, GPT-5...' -``` - -`Infobox` is `""` (empty string, not `None`, not `{}`) when absent. Always check with `isinstance(data['Infobox'], dict)`. - ---- - -## Query parameters - -| Parameter | Values | Effect | -|-----------|--------|--------| -| `q` | URL-encoded query | The search query | -| `format` | `json` | Required — omit for HTML response | -| `no_redirect` | `1` | Returns redirect URL in `Redirect` field instead of HTTP 302; required for bang queries (`!g`, `!yt`) | -| `no_html` | `1` | Strips `` from `Answer`; strips bold markup from `Result` HTML; use in almost every call | -| `skip_disambig` | `1` | Resolves ambiguous D-type queries to the primary result; upgrades D→A when unambiguous | -| `t` | any string | Source identifier tag (e.g. 
`t=myapp`); has no effect on results | -| `callback` | function name | Wraps response in JSONP: `mycallback({...})` | - ---- - -## Type field values - -| Type | Meaning | AbstractText | RelatedTopics | -|------|---------|--------------|---------------| -| `A` | Article — specific Wikipedia entity | Full paragraph | Category links | -| `D` | Disambiguation — ambiguous term | Empty `""` | List of possible meanings (may include sections) | -| `C` | Categories | Varies | Category items | -| `N` | Name | Varies | Name-related items | -| `E` | Exclusive — instant answer widget | Empty `""` | Empty `[]` | -| `""` | No result | Empty `""` | Empty `[]` | - -In practice, C and N types are rare. A, D, E, and empty cover nearly all queries. - ---- - -## What returns useful results vs empty - -**Returns AbstractText (A type):** -- Named companies: `apple inc`, `openai`, `google` -- Specific technologies: `python programming language`, `javascript`, `linux kernel` -- Well-known people with `skip_disambig=1`: `elon musk`, `ada lovelace` -- Scientific concepts: `machine learning`, `photosynthesis`, `circumference` -- Specific software: `vim`, `postgresql`, `nginx` - -**Returns RelatedTopics only (D type):** -- Ambiguous single words: `python`, `linux`, `react`, `programming` -- Ambiguous names: `apple` (returns empty — too ambiguous even for D), `new york` - -**Returns empty (Type = ""):** -- How-to queries: `how to cook pasta`, `how to learn python` -- Opinion/listicle: `best laptops 2024`, `top 10 programming languages` -- Current events: `weather london`, `bitcoin price` -- Site search operators: `site:example.com` -- Multi-word specifics not in DDG's dataset: `numpy python library`, `javascript tutorial` - -**Returns instant answer (E type):** -- Random: `random number`, `generate password`, `lorem ipsum` -- Math: `pi`, `timer 5 minutes` -- Network: `ip address` -- Encoding: `base64 encode `, `md5 hash ` -- Color lookup: `color #RRGGBB` (must URL-encode the `#`) - ---- - -## Gotchas - -**`Infobox` is `""` not `None` when absent.** Always check with `isinstance(data['Infobox'], dict)` — `if data['Infobox']` also works since `""` is falsy. - -**Image and Icon URLs are relative.** `data['Image']` is `/i/abc123.png`. Prepend `https://duckduckgo.com` to make it absolute. Same for Icon URLs in `RelatedTopics` and `Results`. - -**`Answer` can be a dict (widget), not a string.** Queries like `1 mile in km`, `100 USD in EUR`, `sqrt(144)`, and `stopwatch` return `Answer` as a dict with `{'from': 'calculator', 'result': '', ...}`. The `result` key is empty — the widget computes client-side. Only string `Answer` values are usable via the API. - -**`color #RRGGBB` requires URL encoding of `#`.** Using `q=color+#FF5733` returns an HTML page (HTTP redirect). Use `urllib.parse.quote("color #FF5733")` which encodes to `color+%23FF5733`. - -**Bang queries without `no_redirect=1` return HTML, not JSON.** `!g python` (without `no_redirect=1`) causes an HTTP 302 to `google.com/search?q=python`. The `http_get` helper follows the redirect and returns Google's HTML — `json.loads` fails. Always add `no_redirect=1` when the query might contain bangs. - -**`skip_disambig=1` can add latency for truly ambiguous terms.** For `apple` (no "inc"), DDG returns Type `""` even with `skip_disambig=1` — it's so ambiguous it gives nothing. For `elon musk`, `skip_disambig=1` switches from D to A and adds `RelatedTopics` (39 items vs 4), which means a larger response (~5x). 
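
To see that D-to-A upgrade concretely, probe the same query with and without the flag. A small sketch (`ddg_type` is a throwaway helper, and the expected outputs in the comments are the counts quoted above, not fresh measurements):

```python
import json, urllib.parse
from helpers import http_get

def ddg_type(query, skip_disambig):
    # Same call as ddg_instant above, with skip_disambig made optional.
    q = urllib.parse.quote(query)
    flag = "&skip_disambig=1" if skip_disambig else ""
    data = json.loads(http_get(
        f"https://api.duckduckgo.com/?q={q}&format=json&no_html=1{flag}"
    ))
    return data['Type'], len(data['RelatedTopics'])

print(ddg_type("elon musk", False))  # ('D', 4)  per the numbers above
print(ddg_type("elon musk", True))   # ('A', 39) full article, larger response
print(ddg_type("apple", True))       # ('', 0)   too ambiguous even with the flag
```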
- -**`AbstractText` is empty for D-type results.** When `Type == 'D'`, DDG only returns `RelatedTopics` (the disambiguation list). The abstract is only filled for `Type == 'A'`. - -**`RelatedTopics` for A-type are Wikipedia categories, not related searches.** For `openai`, the 4 `RelatedTopics` are `"American artificial intelligence companies"`, `"Companies in San Francisco"`, etc. — these are DDG category page links, not useful web search results. - -**`Definition` / `DefinitionSource` / `DefinitionURL` are always empty** in observed responses. These fields are part of the schema but not reliably populated by any current DDG plugin. - -**No rate limiting observed.** 15 rapid sequential requests completed in 3.11s (~208ms avg) with no throttling, no 429, and consistent response structure throughout. DDG does not publish rate limits; the API is designed for "reasonable" use with a `t=` source identifier. - -**`OfficialWebsite` is only set for a subset of A-type results.** `machine learning` (Type A) has no `OfficialWebsite`. `openai`, `python programming language`, and `linux kernel` all have one. Always check with `data.get('OfficialWebsite', '')`. - -**No_html does not affect the `Result` HTML string.** `Results[0]['Result']` still contains `` tags with `no_html=1`. The `no_html` flag only removes `` bold tags. Use `Results[0]['Text']` for the plain-text version, or `Results[0]['FirstURL']` for just the URL. - ---- - -## Complete working example - -```python -import json, urllib.parse -from helpers import http_get - -def ddg_entity(query: str) -> dict | None: - """ - Fetch a DuckDuckGo Instant Answer for a named entity. - Returns structured data or None if no result. - """ - q = urllib.parse.quote(query) - raw = http_get( - f"https://api.duckduckgo.com/?q={q}&format=json&no_html=1&skip_disambig=1" - ) - data = json.loads(raw) - if not data.get('AbstractText') and not data.get('Answer'): - return None - - result = { - 'type': data['Type'], - 'heading': data['Heading'], - 'abstract': data['AbstractText'], - 'abstract_url': data['AbstractURL'], - 'entity': data['Entity'], - 'official_website': data['OfficialWebsite'], - 'image': f"https://duckduckgo.com{data['Image']}" if data['Image'] else None, - 'answer': data['Answer'] if isinstance(data['Answer'], str) else None, - 'answer_type': data['AnswerType'], - } - - # Extract infobox as flat dict - if isinstance(data['Infobox'], dict): - result['infobox'] = { - item['label']: item['value'] - for item in data['Infobox']['content'] - } - - # Official site URL (from Results) - if data['Results']: - result['official_site_url'] = data['Results'][0]['FirstURL'] - - return result - -# Example outputs (validated 2026-04-18): -r = ddg_entity("openai") -# r['type'] == 'A' -# r['heading'] == 'OpenAI' -# r['abstract'][:50] == 'OpenAI is an American artificial intelligence res' -# r['entity'] == 'company' -# r['official_website']== 'https://openai.com/' -# r['image'] == 'https://duckduckgo.com/i/fb410946942ab334.png' -# r['infobox']['Founded'] == 'December 08, 2015' -# r['infobox']['Products'] == 'ChatGPT, GPT-5...' - -r = ddg_entity("python programming language") -# r['type'] == 'A' -# r['entity'] == 'programming language' -# r['official_website']== 'https://www.python.org/' -# r['infobox']['Paradigm'] == 'Multi-paradigm: object-oriented,...' 
-``` diff --git a/packages/bcode-browser/harness/agent-workspace/domain-skills/ebay/scraping.md b/packages/bcode-browser/harness/agent-workspace/domain-skills/ebay/scraping.md deleted file mode 100644 index d75886730..000000000 --- a/packages/bcode-browser/harness/agent-workspace/domain-skills/ebay/scraping.md +++ /dev/null @@ -1,435 +0,0 @@ -# eBay — Scraping & Data Extraction - -Field-tested against ebay.com on 2026-04-18 using `uv run python` with `http_get`. -Chrome is NOT required — `http_get` returns full HTML on first access. - -## Critical: Bot Detection ("Pardon Our Interruption") - -eBay's bot detection fires after roughly **5–10 requests per IP in a short window**. -The block page is ~13 KB, title `"Pardon Our Interruption..."`, and contains no listing data. - -**Always check before parsing:** -```python -def is_blocked(html): - return 'Pardon Our Interruption' in html or len(html) < 20_000 - -html = http_get("https://www.ebay.com/sch/i.html?_nkw=laptop&LH_BIN=1", headers=HEADERS) -if is_blocked(html): - raise RuntimeError("eBay bot-detection triggered — back off and retry later") -``` - -**When blocked:** wait at minimum 60–120 seconds before retrying. The block is IP-session-scoped, -not a hard IP ban; it clears after inactivity. - -**Headers required (minimal UA gets blocked faster, full browser UA lasts longer):** -```python -HEADERS = { - "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36", - "Accept-Language": "en-US,en;q=0.9", - "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8", -} -``` - -A plain `"User-Agent": "Mozilla/5.0"` also works for the first few requests, -but the full Chrome UA lasts slightly longer before triggering the block. - -## Search URL Structure - -``` -https://www.ebay.com/sch/i.html?_nkw={query}&{filters} -``` - -Confirmed working URL examples: -```python -# Buy It Now only, sorted by lowest price -"https://www.ebay.com/sch/i.html?_nkw=mechanical+keyboard&LH_BIN=1&_sop=15" - -# Auctions only -"https://www.ebay.com/sch/i.html?_nkw=vintage+camera&LH_Auction=1" - -# New condition only, page 2 -"https://www.ebay.com/sch/i.html?_nkw=laptop&LH_ItemCondition=1000&_pgn=2" -``` - -### Filter Parameters (all confirmed working) - -| Parameter | Value | Effect | -|-----------|-------|--------| -| `LH_BIN` | `1` | Buy It Now only | -| `LH_Auction` | `1` | Auctions only | -| `LH_ItemCondition` | see below | Filter by condition | -| `_sop` | see below | Sort order | -| `_pgn` | `2`, `3`, … | Page number (confirmed: returns ~65–88 items/page) | -| `_ipg` | `25`, `50`, `100`, `200` | Items per page (unconfirmed, standard eBay param) | - -### Condition Codes for `LH_ItemCondition` - -| Code | Label | -|------|-------| -| `1000` | New | -| `1500` | New Other (open box, no original packaging) | -| `2000` | Manufacturer Refurbished | -| `2500` | Seller Refurbished | -| `2750` | Like New | -| `3000` | Used | -| `4000` | Very Good | -| `5000` | Good | -| `6000` | Acceptable | -| `7000` | For parts or not working | - -### Sort Codes for `_sop` - -| Code | Sort Order | -|------|-----------| -| `1` | Best Match (default) | -| `10` | Ending Soonest | -| `12` | Newly Listed | -| `15` | Lowest Price + Shipping | -| `16` | Highest Price | - -### Item Detail URL - -``` -https://www.ebay.com/itm/{listing_id} -``` - -The listing ID is a plain integer (e.g. `167040158614`). 
Always strip query parameters -from extracted URLs — tracking params bloat the URL and are not needed for navigation. - -## Search Results: HTML Structure (No JSON-LD) - -**JSON-LD is absent on search results pages.** The listing data is embedded in HTML -with eBay-specific class names. The response is large (~1.5–1.8 MB uncompressed). - -### Card Structure - -Each result is an `
<li>` element with `data-listingid=`. Key elements within each card: - -| Data | Pattern | -|------|---------| -| Listing ID | `data-listingid=(\d+)` on the `
        2. ` | -| Item URL | `href=(https://(?:www\.)?ebay\.com/itm/(\d+))` | -| Title | `s-card__title` > `su-styled-text primary` > text | -| Current price | `class=price">\$([0-9,\.]+)<` | -| Original/list price | `strikethrough[^>]*>\$([0-9,\.]+)` | -| Image | `class=s-card__image[^>]*src=([^\s>]+)` | -| Alt title | `img[alt]` in the card (same as product title) | - -### Confirmed Extractor (field-tested, 60 items from a single search) - -```python -import re - -def extract_search_results(html): - """ - Parse eBay search results HTML into a list of dicts. - Returns [] if blocked or no results. - """ - if 'Pardon Our Interruption' in html or len(html) < 20_000: - return [] - - cards = re.split(r'(?=]+data-listingid=)', html) - results = [] - seen_ids = set() - - for card in cards[1:]: # skip preamble before first card - # Listing ID (dedup) - lid_m = re.search(r'data-listingid=(\d+)', card) - if not lid_m: - continue - listing_id = lid_m.group(1) - if listing_id in seen_ids: - continue - seen_ids.add(listing_id) - - # Item URL (clean, no tracking params) - url_m = re.search(r'href=(https://(?:www\.)?ebay\.com/itm/(\d+))', card) - item_url = url_m.group(1).split('?')[0] if url_m else None - - # Title from s-card__title - title_m = re.search(r's-card__title[^>]*>.*?primary[^>]*>([^<]+)', card, re.DOTALL) - title = title_m.group(1).strip() if title_m else None - - # Skip placeholder "Shop on eBay" stub cards - if not title or title == 'Shop on eBay': - continue - - # Current price - price_m = re.search(r'class=(?:["\'])?[a-z- ]*price["\']?>\$([0-9,\.]+)<', card) - if not price_m: - price_m = re.search(r'price">\$([0-9,\.]+)<', card) - price = '$' + price_m.group(1) if price_m else None - - # Original / list price (strikethrough — present when discounted) - orig_m = re.search(r'strikethrough[^>]*>\$([0-9,\.]+)', card) - original_price = '$' + orig_m.group(1) if orig_m else None - - # Thumbnail image URL - img_m = re.search(r'class=s-card__image[^>]*src=([^\s>]+)', card) - image = img_m.group(1) if img_m else None - - results.append({ - 'listing_id': listing_id, - 'url': item_url, - 'title': title, - 'price': price, - 'original_price': original_price, # None if not on sale - 'image': image, - }) - - return results -``` - -**Usage:** -```python -from helpers import http_get -import re - -HEADERS = { - "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36", - "Accept-Language": "en-US,en;q=0.9", -} - -html = http_get("https://www.ebay.com/sch/i.html?_nkw=mechanical+keyboard&LH_BIN=1&_sop=15", headers=HEADERS) -items = extract_search_results(html) -print(f"{len(items)} items") -for item in items[:5]: - print(f" {item['listing_id']} | {item['title'][:50]} | {item['price']}") -# Output (confirmed): 60 items -# 168219240588 | One Plus Keyboard 81 Pro Winter Bonfire Mecha... | $159.00 -# 167461643107 | Logitech 920-012869 G515 TKL Wired Low Profil... | $49.99 -# 167040158614 | Logitech - PRO X TKL LIGHTSPEED Wireless Mech... | $74.99 -``` - -## Item Detail Pages: JSON-LD (Reliable) - -Item detail pages at `/itm/{id}` serve **two JSON-LD blocks**: `BreadcrumbList` and `Product`. -The `Product` schema is the most useful — it contains price, condition, availability, brand, images, and return policy. - -```python -import re, json - -def extract_item_detail(html): - """ - Extract structured data from an eBay item page. - Returns dict or None if blocked. 
- """ - if 'Pardon Our Interruption' in html: - return None - - ld_blocks = re.findall(r'application/ld\+json[^>]*>(.*?)', html, re.DOTALL) - product = None - breadcrumbs = [] - - for ld_str in ld_blocks: - try: - d = json.loads(ld_str.strip()) - except Exception: - continue - - if d.get('@type') == 'Product': - product = d - elif d.get('@type') == 'BreadcrumbList': - breadcrumbs = [i.get('name') for i in d.get('itemListElement', [])] - - if not product: - return None - - offers = product.get('offers', {}) - if isinstance(offers, list): - offers = offers[0] - - # Schema.org condition URL -> human label - CONDITION_MAP = { - 'NewCondition': 'New', - 'UsedCondition': 'Used', - 'RefurbishedCondition': 'Refurbished', - 'DamagedCondition': 'For Parts / Not Working', - 'LikeNewCondition': 'Like New', - 'VeryGoodCondition': 'Very Good', - 'GoodCondition': 'Good', - 'AcceptableCondition': 'Acceptable', - } - cond_url = offers.get('itemCondition', '') - cond_key = cond_url.split('/')[-1] # e.g. "RefurbishedCondition" - condition = CONDITION_MAP.get(cond_key, cond_key) - - # List price from priceSpecification (only present when there's a "was" price) - price_spec = offers.get('priceSpecification', {}) - list_price = price_spec.get('price') if price_spec.get('name') == 'List Price' else None - - # Shipping (first destination) - shipping_details = offers.get('shippingDetails', []) - if shipping_details: - shipping_val = shipping_details[0].get('shippingRate', {}).get('value', '') - shipping = 'Free' if str(shipping_val) in ('0', '0.0') else f"${shipping_val}" - else: - shipping = None - - # Return policy - return_policies = offers.get('hasMerchantReturnPolicy', []) - return_days = return_policies[0].get('merchantReturnDays') if return_policies else None - - return { - 'listing_id': offers.get('url', '').split('/itm/')[-1], - 'name': product.get('name'), - 'brand': product.get('brand', {}).get('name') if isinstance(product.get('brand'), dict) else product.get('brand'), - 'price': offers.get('price'), - 'list_price': list_price, # was-price, None if no discount shown - 'currency': offers.get('priceCurrency'), - 'availability': offers.get('availability', '').split('/')[-1], # e.g. "InStock" - 'condition': condition, - 'condition_url': cond_url, - 'shipping': shipping, - 'return_days': return_days, - 'images': product.get('image', []), - 'gtin13': product.get('gtin13'), - 'mpn': product.get('mpn'), - 'color': product.get('color'), - 'breadcrumbs': breadcrumbs, - } -``` - -**Field-tested on item 167040158614:** -```python -html = http_get("https://www.ebay.com/itm/167040158614", headers=HEADERS) -detail = extract_item_detail(html) -# { -# 'listing_id': '167040158614', -# 'name': 'Logitech - PRO X TKL LIGHTSPEED Wireless Mechanical Gaming Keyboard - 920-012118', -# 'brand': 'Logitech', -# 'price': 74.99, -# 'list_price': '219.99', -# 'currency': 'USD', -# 'availability': 'InStock', -# 'condition': 'Refurbished', -# 'shipping': 'Free', -# 'return_days': 30, -# 'images': ['https://i.ebayimg.com/images/g/vwsAAeSwEcFpw~hW/s-l1600.jpg', ...], # 5 images -# 'gtin13': '097855189066', -# 'mpn': '920-012118', -# 'color': 'Black', -# 'breadcrumbs': ['eBay', 'Electronics', 'Computers/Tablets & Networking', ...], -# } -``` - -### Item Specifics from `ux-textspans` (complementary to JSON-LD) - -The `ux-textspans` elements in item pages contain additional data not in JSON-LD, -including seller name, feedback %, items sold, detailed condition text, and all item specifics. 
- -```python -import re - -def extract_ux_textspans(html): - """Return list of all ux-textspans text values from an item page.""" - return [m.group(1) for m in re.finditer(r'ux-textspans[^>]*>([^<]+)', html)] - -# From item 167040158614 (confirmed): -# Index [3] -> item title -# Index [4] -> subtitle / seller tagline -# Index [5] -> seller name ("Logitech") -# Index [6] -> seller feedback count ("(20742)") -# Index [7] -> seller feedback % ("99.6% positive") -# Index [10] -> current price ("US $74.99") -# Index [12] -> list price ("US $219.99") -# Index [33] -> condition label ("Excellent - Refurbished") -# Index [36] -> quantity sold ("45 sold") -# Pairs from [105] onward: item specifics as label/value pairs -``` - -## Pagination - -Use `_pgn=N` (confirmed working, returns ~65–88 items per page): -```python -for page in range(1, 4): - url = f"https://www.ebay.com/sch/i.html?_nkw=laptop&LH_BIN=1&_sop=15&_pgn={page}" - html = http_get(url, headers=HEADERS) - if is_blocked(html): - break - items = extract_search_results(html) - print(f"Page {page}: {len(items)} items") - # IMPORTANT: add delay between pages to avoid bot detection - time.sleep(3) -``` - -**Rate-limit safe pattern**: 3–5 second delay between requests. Beyond ~10 rapid requests -in a session, eBay returns "Pardon Our Interruption" for all subsequent requests from that IP. - -## APIs (All Require Auth or Are Dead) - -| API | Status | Notes | -|-----|--------|-------| -| Finding API (svcs.ebay.com) | **Dead** — HTTP 500 | Was free/JSONP, no longer works | -| Browse API (api.ebay.com) | **Requires OAuth** — HTTP 400 | Needs eBay developer account + token | -| Shopping API (open.api.ebay.com) | **Requires token** | Returns `"Token not available"` error | -| RSS feed (`_rss=1`) | **Blocked same as HTML** | Returns "Pardon Our Interruption" when rate-limited | - -**Bottom line**: There is no public unauthenticated eBay API in 2026. Use HTML scraping. - -## Practical Workflow - -### Scrape a search and follow top items - -```python -import re, json, time -from helpers import http_get - -HEADERS = { - "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36", - "Accept-Language": "en-US,en;q=0.9", -} - -def is_blocked(html): - return 'Pardon Our Interruption' in html or len(html) < 20_000 - -# Step 1: Search -html = http_get( - "https://www.ebay.com/sch/i.html?_nkw=mechanical+keyboard&LH_BIN=1&_sop=15&LH_ItemCondition=1000", - headers=HEADERS -) -if is_blocked(html): - raise RuntimeError("Rate limited — wait 60-120s and retry") - -items = extract_search_results(html) -print(f"Found {len(items)} items") - -# Step 2: Fetch details for top results (with delay) -details = [] -for item in items[:5]: - time.sleep(3) - detail_html = http_get(item['url'], headers=HEADERS) - if is_blocked(detail_html): - print(f"Blocked on item {item['listing_id']}, stopping") - break - detail = extract_item_detail(detail_html) - if detail: - details.append(detail) - print(f" {detail['name'][:50]} | {detail['price']} {detail['currency']} | {detail['condition']}") -``` - -## Gotchas - -- **"Pardon Our Interruption" is not a CAPTCHA** — it's eBay's bot-detection interstitial. It doesn't require solving — just wait and back off. `'captcha'` does NOT appear in the blocked page. - -- **No JSON-LD on search results** — The `application/ld+json` blocks that Amazon and other sites embed are absent from eBay search pages. Parse the HTML using regex on `s-card` class names. 
- -- **JSON-LD IS on item pages** — Two blocks: `BreadcrumbList` and `Product`. The `Product` block is authoritative. Use the regex `r'application/ld\+json[^>]*>(.*?)'` (note the `[^>]*` before `>` — eBay doesn't use `type="..."` quote style consistently in all contexts). - -- **Duplicate listing IDs in the HTML** — Each card's listing ID appears 2–3 times (image link, title link, watch button). Always deduplicate using a `seen_ids` set when splitting on `data-listingid`. - -- **Placeholder cards ("Shop on eBay")** — The first card slot may be a promoted/placeholder card with title `"Shop on eBay"` and listing ID `"123456"`. Filter these out. - -- **Item URLs have tracking params** — Raw extracted URLs look like `https://www.ebay.com/itm/167040158614?_skw=...&epid=...&hash=...&itmprp=...`. Always strip to `itm/{id}` with `.split('?')[0]`. - -- **`www.ebay.com` vs `ebay.com`** — Some item URLs in search results omit `www.`. Normalize with `url.replace('//ebay.com/', '//www.ebay.com/')`. - -- **Search response is large** — Uncompressed HTML is 1.5–1.8 MB per page. The `http_get` helper handles gzip transparently, so the actual transfer is much smaller, but parsing a 1.8 MB string is slow. Use `re.split` on card boundaries rather than an HTML parser for speed. - -- **`_sop` sort and `LH_ItemCondition` require full browser-like UA** — Requests with just `"Mozilla/5.0"` (minimal UA) return empty results for these parameters more quickly than full Chrome UA. Always use the full UA string. - -- **Condition in JSON-LD is a schema.org URL** — `offers.itemCondition` returns `"https://schema.org/RefurbishedCondition"`, not a human label. Split on `/` and map the last segment using `CONDITION_MAP` (see `extract_item_detail` above). - -- **`list_price` only present when discounted** — `offers.priceSpecification` only appears in JSON-LD when eBay shows a "List Price" comparison. Check `price_spec.get('name') == 'List Price'` before using. - -- **Seller data is NOT in JSON-LD** — `d.get('seller')` returns `None` on item pages. The seller name, feedback %, and items sold count are only in `ux-textspans` elements in the HTML body. diff --git a/packages/bcode-browser/harness/agent-workspace/domain-skills/etsy/scraping.md b/packages/bcode-browser/harness/agent-workspace/domain-skills/etsy/scraping.md deleted file mode 100644 index 8b63370e7..000000000 --- a/packages/bcode-browser/harness/agent-workspace/domain-skills/etsy/scraping.md +++ /dev/null @@ -1,506 +0,0 @@ -# Etsy — Scraping & Data Extraction - -Field-tested against `www.etsy.com` on 2026-04-18 using `http_get` (no browser) and direct `urllib` probes. - -## Quick summary - -**`http_get` does NOT work on Etsy.** Every page type — search, listing, shop, category, market — returns HTTP 403 with DataDome bot protection. This is not negotiable: no header combination, User-Agent string, or cookie replay bypasses it. Etsy requires a real browser with JavaScript execution. - -- **All HTML pages (`/search`, `/listing/`, `/shop/`, `/c/`, `/market/`)** — HTTP 403, `Server: DataDome` -- **Official Etsy API v3 (`openapi.etsy.com/v3/`)** — requires a registered API key; returns JSON -- **`robots.txt`** — HTTP 200, plain text, no DataDome -- **Browser (Chrome CDP)** — works; Etsy is a React SPA with JSON-LD and `__NEXT_DATA__` embedded in SSR HTML - ---- - -## Bot detection: DataDome - -Etsy uses [DataDome](https://datadome.co/) for every user-facing HTML endpoint. 
- -### What you receive - -``` -HTTP 403 Forbidden -Server: DataDome -X-DataDome: protected -X-DataDome-riskscore: 0.14–0.95 (varies per request) -X-DD-B: 2 -Content-Type: text/html;charset=utf-8 -Set-Cookie: datadome=; Max-Age=31536000; Domain=.etsy.com; Secure; SameSite=Lax -``` - -Body (816 bytes — a JavaScript challenge, not a hard block): - -```html -etsy.com... - -

          Please enable JS and disable any ad blocker
          <!-- inline DataDome challenge script not reproduced here; see the 'rt' note below -->
          - - - -``` - -`'rt':'c'` means **challenge** (browser must run JS at `geo.captcha-delivery.com` to get a valid `datadome` cookie). `'rt':'b'` would be a hard block; `'rt':'i'` an interstitial. All tested requests returned `'rt':'c'` — the JS challenge variant. - -### What was tested (all 403) - -| URL pattern | Status | DataDome | -|---|---|---| -| `/search?q=handmade+candle&explicit=1` | **403** | JS challenge | -| `/search?q=handmade+candle&explicit=1&page=2` | **403** | JS challenge | -| `/listing/{id}/{slug}` | **403** | JS challenge | -| `/shop/{ShopName}` | **403** | JS challenge | -| `/c/home-living/candles-holders/candles` | **403** | JS challenge | -| `/market/handmade_candle` | **403** | JS challenge | - -### User-Agents tested (all blocked) - -- `Mozilla/5.0 (Macintosh; ...) Chrome/120` — **403** -- `facebookexternalhit/1.1` — **403** -- `Twitterbot/1.0` — **403** -- `LinkedInBot/1.0` — **403** -- `ia_archiver` — **403** -- `curl/7.68.0` — **403** -- `python-requests/2.28.0` — **403** -- `Googlebot/2.1` — **429** (rate-limited, different path) -- `Mozilla/5.0` (http_get default) — **403** - -**Conclusion**: No UA bypasses DataDome. The challenge requires TLS fingerprinting + JS execution that only a real browser provides. - ---- - -## What works without a browser - -### `robots.txt` (200 OK) - -```python -from helpers import http_get -text = http_get("https://www.etsy.com/robots.txt") -# Returns 51 KB plain-text file — no DataDome -``` - -The robots.txt reveals URL structure, disallowed parameters, and allowed paths. Etsy disallows `/search?*q=` (no-empty-q searches) and faceted search params (`attr_*`, `price_bucket`, `ship_to`, `search_type`). Basic search with `?q=keyword` is not explicitly disallowed by robots but is blocked by DataDome in practice. - -### Official Etsy API v3 (requires API key) - -The `openapi.etsy.com/v3/` endpoint is NOT DataDome-protected. It returns structured JSON but requires a free API key from [developer.etsy.com](https://developer.etsy.com/). - -```python -import json -from helpers import http_get - -API_KEY = "your_key_here" # from developer.etsy.com - -def etsy_api(path, **params): - from urllib.parse import urlencode - qs = urlencode(params) - url = f"https://openapi.etsy.com/v3/application/{path}?{qs}" - data = http_get(url, headers={"x-api-key": API_KEY}) - return json.loads(data) - -# Search listings -results = etsy_api("listings/active", limit=25, keywords="handmade candle", - sort_on="created", sort_order="desc") -# results['results'] is a list of listing dicts -# results['count'] is total match count - -# Get a single listing -listing = etsy_api("listings/1234567890") - -# Get all listings for a shop -shop_listings = etsy_api("shops/CandlesByNature/listings/active", limit=100) - -# Get shop info -shop = etsy_api("shops/CandlesByNature") -``` - -Error without a key: -``` -HTTP 403: {"error": "Invalid API key: should be in the format 'keystring:shared_secret'."} -``` - -Error with wrong key: -``` -HTTP 403: {"error": "API key not found or not active, or incorrect shared secret for API key."} -``` - -### API v3 key data fields - -``` -listings/active response: - results[i].listing_id → int (e.g. 1234567890) - results[i].title → string - results[i].description → string (full HTML, may be truncated by API) - results[i].price.amount → int (in currency subunit, e.g. 
2599 = $25.99) - results[i].price.divisor → int (100 for USD) - results[i].price.currency_code → "USD" - results[i].quantity → int (stock remaining) - results[i].tags → [string] (up to 13 tags) - results[i].materials → [string] - results[i].shipping_profile_id → int - results[i].shop_id → int - results[i].url → "https://www.etsy.com/listing/..." - results[i].views → int - results[i].num_favorers → int - results[i].featured_rank → int (-1 if not featured) - results[i].is_digital → bool - results[i].has_variations → bool - results[i].taxonomy_id → int (category) - results[i].state → "active" | "draft" | "expired" | "sold_out" - results[i].creation_timestamp → unix int - results[i].last_modified_timestamp → unix int -``` - ---- - -## Browser-based scraping (required for HTML data) - -Since http_get is blocked, all HTML scraping requires the Chrome browser via CDP. - -### Navigation pattern - -```python -from helpers import goto, wait_for_load, wait, js, new_tab - -# Always use new_tab() for the first Etsy navigation in a session -tid = new_tab("https://www.etsy.com/search?q=handmade+candle&explicit=1") -wait_for_load() -wait(3) # Etsy React SPA needs extra time after readyState=complete -``` - -### Search URL construction - -``` -https://www.etsy.com/search?q={query}&explicit=1 -``` - -Parameters: -- `q` — search query (URL-encoded, spaces as `+`) -- `explicit=1` — disables the "adult content" NSFW filter (safe to include always) -- `page=2`, `page=3` — pagination (confirmed from robots.txt URL patterns) -- `min_price=10.00&max_price=50.00` — price range filter -- `order=price_asc` / `order=price_desc` / `order=most_relevant` (default) / `order=newest` -- `ship_to=US` — filter by shipping destination (CAUTION: disallowed by robots.txt, use only with browser) -- `listing_type=handmade` / `listing_type=vintage` / `listing_type=supplies` - -**Disallowed URL params** (per robots.txt — avoid in automated crawls): -- `attr_*=*` — attribute filters -- `price_bucket=*` — price bucket filter -- `ship_to=*` — shipping destination -- `search_type=*` — search type - -### Search results extraction (browser) - -Etsy renders results as a React SPA. The listing cards use data attributes and consistent class patterns: - -```python -results = js(""" - Array.from(document.querySelectorAll('[data-listing-id]')).map(el => ({ - listing_id: el.getAttribute('data-listing-id'), - title: el.querySelector('h3, [class*="listing-link"]')?.innerText?.trim() - || el.querySelector('h2')?.innerText?.trim(), - price: el.querySelector('[class*="currency-value"]')?.innerText?.trim() - || el.querySelector('.currency-value')?.innerText?.trim(), - shop: el.querySelector('[class*="shop-name"], [data-shop-name]')?.innerText?.trim(), - url: el.querySelector('a[href*="/listing/"]')?.href, - thumbnail: el.querySelector('img[src*="etsystatic"]')?.src, - is_ad: !!el.querySelector('[class*="ad-label"], [class*="sponsored"]') - })).filter(r => r.listing_id) -""") -``` - -**Alternative — JSON-LD ItemList** (more reliable than DOM selectors): - -Etsy's SSR HTML embeds a `', html, re.DOTALL) -for block in ld_blocks: - parsed = json.loads(block) - if isinstance(parsed, dict) and parsed.get('@type') == 'ItemList': - for item in parsed['itemListElement']: - ev = item['item'] - print(ev['name'], ev['startDate'], ev['url']) - break -# Returns 18–40 events per page -``` - -**For a single event, fetch the detail page and extract the `Event` JSON-LD block.** It contains all fields including `offers` (pricing). 
There is also a richer `__NEXT_DATA__` block if you need venue coordinates, refund policy, or sales status. - -## URL structure - -### Search / listing pages - -``` -https://www.eventbrite.com/d/{location}/{category}/ -https://www.eventbrite.com/d/{location}/{category}/?page=2 -https://www.eventbrite.com/d/{location}/{category}/?start_date=2026-05-01&end_date=2026-05-31 -``` - -**Location format:** `{state-abbreviation}--{city}` (lowercase, hyphens for spaces) -- `ca--san-francisco` -- `ny--new-york` -- `ca--los-angeles` -- Use `online` for virtual events - -**Category slugs (confirmed working):** -- `tech` — Technology events -- `music` — Music -- `food--drink` — Food & Drink -- `health` — Health & Wellness -- `sports--fitness` — Sports & Fitness -- `arts--entertainment` — Arts & Entertainment -- `family--education` — Family & Education -- `business--professional` — Business & Networking -- `science--tech` — Science & Technology -- `community--culture` — Community & Culture -- `networking` — Networking -- `events` — All events (broadest, returns ~40/page) - -**Filter slugs (replace category):** -- `free--events` — Free events only -- `events--today` — Today -- `events--tomorrow` — Tomorrow -- `events--this-weekend` — This weekend - -**Query params:** -- `?page=N` — Pagination (page 2+ confirmed working, each returns 18–20 events) -- `?start_date=YYYY-MM-DD&end_date=YYYY-MM-DD` — Date range filter (confirmed, narrows results) - -### Event detail pages - -``` -https://www.eventbrite.com/e/{slug}-tickets-{event_id} -``` - -Example: `https://www.eventbrite.com/e/icontact-the-tactile-tech-opera-tickets-1982861003639` - -- `event_id` is a numeric string (10–13 digits) -- Extract with: `re.search(r'-tickets-(\d+)$', url).group(1)` -- Extract slug with: `re.search(r'/e/(.+)-tickets-\d+$', url).group(1)` - -Other TLDs (`.ca`, `.co.uk`, etc.) use the same structure — event IDs are globally unique across TLDs. 
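
Pulling those URL rules together, here is a small builder for listing URLs plus the slug/ID split from an `/e/` URL (a sketch: `listing_url` and `parse_event_url` are arbitrary names, the city and category values are examples, and combining `page=` with the date filters is assumed to behave the same as either filter alone):

```python
import re
from urllib.parse import urlencode

def listing_url(city_slug, category="events", page=1, start_date=None, end_date=None):
    """Build a /d/{location}/{category}/ listing URL, e.g. listing_url('ca--san-francisco', 'tech', 2)."""
    params = {"page": page}
    if start_date and end_date:
        params.update(start_date=start_date, end_date=end_date)
    return f"https://www.eventbrite.com/d/{city_slug}/{category}/?{urlencode(params)}"

def parse_event_url(url):
    """Split an /e/{slug}-tickets-{id} URL into (slug, event_id); (None, None) if it doesn't match."""
    m = re.search(r'/e/(.+)-tickets-(\d+)$', url)
    return (m.group(1), m.group(2)) if m else (None, None)

print(listing_url("ca--san-francisco", "tech", page=2))
print(parse_event_url("https://www.eventbrite.com/e/icontact-the-tactile-tech-opera-tickets-1982861003639"))
# -> ('icontact-the-tactile-tech-opera', '1982861003639')
```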
- -## Listing page: JSON-LD `ItemList` schema - -The first `', html, re.DOTALL) -event_data = None -for block in ld_blocks: - parsed = json.loads(block) - if isinstance(parsed, dict) and parsed.get('@type') in ('Event', 'BusinessEvent', 'MusicEvent', 'EducationEvent'): - event_data = parsed - break - -print(event_data['name']) # "iContact the tactile tech opera" -print(event_data['startDate']) # "2026-06-21T17:05:00-07:00" (ISO 8601 with TZ) -print(event_data['endDate']) # "2026-06-21T20:08:00-07:00" -print(event_data['eventStatus']) # "https://schema.org/EventScheduled" -print(event_data['eventAttendanceMode']) # "https://schema.org/OfflineEventAttendanceMode" -print(event_data['location']['name']) # "Little Boxes Theater" -print(event_data['location']['address']['streetAddress']) # "94107 1661 Tennessee Street, San Francisco, CA 94107" -print(event_data['organizer']['name']) # "Beth McNamara" -print(event_data['organizer']['url']) # "https://www.eventbrite.com/o/beth-mcnamara-120755148166" -``` - -Full confirmed schema on detail page: -``` -name str Event title -description str Short summary -url str Canonical event URL -image str Event banner image URL -startDate str ISO 8601 with timezone offset -endDate str ISO 8601 with timezone offset -eventStatus str URI: EventScheduled / EventCancelled / EventPostponed -eventAttendanceMode str URI: OfflineEventAttendanceMode / OnlineEventAttendanceMode / MixedEventAttendanceMode -location.@type str "Place" (in-person) or "VirtualLocation" (online) -location.name str Venue name -location.address.streetAddress str -location.address.addressLocality str City -location.address.addressRegion str State abbreviation -location.address.addressCountry str Country code -organizer.name str Organizer display name -organizer.url str Organizer profile URL -offers list AggregateOffer object(s) -``` - -### Offers / pricing - -```python -offers = event_data.get('offers', []) -if offers: - offer = offers[0] # always a list; typically one AggregateOffer - print(offer['@type']) # "AggregateOffer" - print(offer['lowPrice']) # "50.0" (string, not float) - print(offer['highPrice']) # "50.0" - print(offer['priceCurrency']) # "USD" - print(offer['availability']) # "InStock" / "SoldOut" - print(offer['availabilityStarts']) # ISO 8601 UTC - print(offer['availabilityEnds']) # ISO 8601 UTC - -# Free events: lowPrice="0.0", highPrice="0.0" -# Free check: float(offer['lowPrice']) == 0.0 -``` - -`@type` on the event itself varies by format (all scrape identically): -- `Event` — general -- `BusinessEvent` — networking, professional -- `MusicEvent` — concerts -- `EducationEvent` — classes, workshops - -## Detail page: `__NEXT_DATA__` (richer structured data) - -Every event detail page embeds a `', html, re.DOTALL) -nd = json.loads(nextjs.group(1)) -context = nd['props']['pageProps']['context'] - -bi = context['basicInfo'] -print(bi['id']) # "1982861003639" (event ID string) -print(bi['name']) # event title -print(bi['isFree']) # bool -print(bi['isOnline']) # bool -print(bi['currency']) # "USD" -print(bi['status']) # "live" / "completed" / "canceled" -print(bi['organizationId']) # numeric string -print(bi['formatId']) # numeric string (event format category) -print(bi['isProtected']) # bool — password-protected events -print(bi['isSeries']) # bool — recurring series -print(bi['created']) # ISO 8601 UTC creation timestamp - -# Venue with coordinates -venue = bi['venue'] -print(venue['name']) # "Little Boxes Theater" -print(venue['address']['city']) # "San Francisco" 
-print(venue['address']['region']) # "CA" -print(venue['address']['latitude']) # "37.7508806" -print(venue['address']['longitude']) # "-122.3881427" -print(venue['address']['localizedMultiLineAddressDisplay']) # list of strings - -# Organizer details -org = bi['organizer'] -print(org['name']) # "Beth McNamara" -print(org['url']) # organizer profile URL -print(org['numEvents']) # int -print(org['verified']) # bool - -# Sales status -ss = context['salesStatus'] -print(ss['salesStatus']) # "on_sale" / "sold_out" / "sales_ended" -print(ss['startSalesDate']['local']) # local datetime string - -# Good to know -gtk = context['goodToKnow']['highlights'] -print(gtk['ageRestriction']) # "18+" or null -print(gtk['durationInMinutes']) # int (e.g. 183) -print(gtk['doorTime']) # local datetime string or null -print(gtk['locationType']) # "in_person" or "online" - -# Refund policy -refund = context['goodToKnow']['refundPolicy'] -print(refund['policyType']) # "custom" / "no_refunds" / "standard" -print(refund['isRefundAllowed']) # bool -print(refund['validDays']) # int or null - -# Full event description (HTML) -for module in context['structuredContent']['modules']: - if module['type'] == 'text': - print(module['text']) # raw HTML, may need BeautifulSoup to strip tags -``` - -## Complete workflow: scrape events from a category - -```python -import re, json - -def get_events_from_listing(location, category, page=1): - """Returns list of event dicts with name, url, startDate, endDate, location.""" - headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"} - url = f"https://www.eventbrite.com/d/{location}/{category}/?page={page}" - html = http_get(url, headers=headers) - ld_blocks = re.findall(r'', html, re.DOTALL) - for block in ld_blocks: - parsed = json.loads(block) - if isinstance(parsed, dict) and parsed.get('@type') == 'ItemList': - return [item['item'] for item in parsed.get('itemListElement', [])] - return [] - -def get_event_detail(event_url): - """Returns full Event JSON-LD + NEXT_DATA context for a single event.""" - headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"} - html = http_get(event_url, headers=headers) - - # JSON-LD Event block - ld_blocks = re.findall(r'', html, re.DOTALL) - event_ld = None - for block in ld_blocks: - parsed = json.loads(block) - if isinstance(parsed, dict) and parsed.get('@type') in ('Event', 'BusinessEvent', 'MusicEvent', 'EducationEvent'): - event_ld = parsed - break - - # NEXT_DATA context - nextjs = re.search(r'', html, re.DOTALL) - context = None - if nextjs: - nd = json.loads(nextjs.group(1)) - context = nd['props']['pageProps']['context'] - - return event_ld, context - -# Usage -events = get_events_from_listing("ca--san-francisco", "tech", page=1) -print(f"Found {len(events)} events") # 18–20 typical - -for ev in events[:3]: - print(ev['name'], ev['startDate'], ev['url']) - -# Deep-fetch one event -ld, ctx = get_event_detail(events[0]['url']) -if ld and ld.get('offers'): - price = float(ld['offers'][0]['lowPrice']) - currency = ld['offers'][0]['priceCurrency'] - print(f"Price: {price} {currency}") # 0.0 USD (free) or e.g. 
50.0 USD -``` - -## Public API: requires auth - -The Eventbrite REST API (`https://www.eventbriteapi.com/v3/`) requires an OAuth token for all endpoints: - -- `GET /v3/events/{id}/` — HTTP 401 without auth -- `GET /v3/events/search/` — HTTP 404 (endpoint changed; auth also required) - -**Use HTML scraping instead** — the JSON-LD and `__NEXT_DATA__` data is equivalent to the API response and requires no credentials. - -If you have a token (`EVENTBRITE_TOKEN`): -```python -import os -token = os.environ.get('EVENTBRITE_TOKEN') -headers = { - "User-Agent": "Mozilla/5.0", - "Authorization": f"Bearer {token}" -} -data = json.loads(http_get(f"https://www.eventbriteapi.com/v3/events/{event_id}/", headers=headers)) -``` - -## Gotchas - -- **Event URLs in the HTML use relative `/e/` paths, not absolute URLs** — Search listing HTML contains `/e/slug-tickets-id?aff=...` relative paths (with tracking params). Extract event URLs from the JSON-LD `ItemList` instead — they are absolute, clean URLs without tracking params. - -- **`re.findall(r'href="https://www.eventbrite.com/e/...')` returns 0 results** — Confirmed: event cards in the HTML do not have `https://www.eventbrite.com/e/` in href attributes. Use JSON-LD extraction only. - -- **`__SERVER_DATA__` does not exist** — Both search and detail pages were checked. There is no `window.__SERVER_DATA__` or `window.__redux_state__`. The embedded data is in `', html, re.DOTALL) - ap = json.loads(nd.group(1))['props']['pageProps']['apolloState'] - - # The primary Book entity matches the URL's legacy ID - book = next(v for v in ap.values() - if v.get('__typename') == 'Book' and v.get('legacyId') == int(book_id)) - work = next((v for v in ap.values() if v.get('__typename') == 'Work'), {}) - author_ref = book['primaryContributorEdge']['node']['__ref'] - author = ap.get(author_ref, {}) - - stats = work.get('stats', {}) - work_details = work.get('details', {}) - book_details = book.get('details', {}) - - return { - 'title': book['title'], - 'title_complete': book['titleComplete'], - 'book_id': book['legacyId'], - 'url': book['webUrl'], - 'cover_url': book['imageUrl'], - # Strip HTML tags from description - 'description': re.sub(r'<[^>]+>', '', book.get('description({"stripped":true})', - book.get('description', ''))).strip(), - 'genres': [g['genre']['name'] for g in book.get('bookGenres', [])], - 'series': [{'name': s['series']['title'], 'position': s.get('userPosition')} - for s in book.get('bookSeries', [])], - # Author - 'author_name': author.get('name'), - 'author_url': author.get('webUrl'), - # Edition details - 'format': book_details.get('format'), - 'num_pages': book_details.get('numPages'), - 'publisher': book_details.get('publisher'), - 'language': (book_details.get('language') or {}).get('name'), - 'isbn': book_details.get('isbn'), - 'isbn13': book_details.get('isbn13'), - 'pub_timestamp_ms': book_details.get('publicationTime'), - # Ratings (from Work, not Book) - 'avg_rating': stats.get('averageRating'), - 'ratings_count': stats.get('ratingsCount'), - 'text_reviews': stats.get('textReviewsCount'), - # ratings_dist is list of counts for [1-star, 2-star, 3-star, 4-star, 5-star] - 'ratings_dist': stats.get('ratingsCountDist'), - # Awards - 'awards': [a['name'] + (' — ' + a['category'] if a.get('category') else '') - for a in work_details.get('awardsWon', [])], - } - -# Example -book = parse_book(149267) # The Stand by Stephen King -# book['title'] => "The Stand" -# book['avg_rating'] => 4.35 -# book['ratings_count']=> 845591 -# book['genres'] => ["Horror", 
"Fiction", "Fantasy", ...] -# book['awards'] => ["Locus Award — Best SF Novel", ...] -``` - -**Field notes:** -- `book['legacyId']` is the integer in the URL (e.g. `149267`). Use it to match the correct entity — the `apolloState` often contains 2-3 Book entries for different editions. -- Ratings and awards live in the `Work` entity, not `Book`. The `Work` is always `__typename == 'Work'`. -- `description` comes in two forms: `description` (HTML) and `description({"stripped":true})` (plain text). Prefer the stripped version. -- `pub_timestamp_ms` is a Unix timestamp in **milliseconds**. Convert: `datetime.fromtimestamp(ts/1000)`. -- `isbn` / `isbn13` are often `null` on older editions — the JSON-LD path (below) is no more reliable. - ---- - -## Book Page — Fast Path (JSON-LD) - -Use when you only need title, author, rating, page count, and awards. ~3× less parsing code. - -```python -import re, json -from helpers import http_get - -def parse_book_fast(book_id): - html = http_get(f"https://www.goodreads.com/book/show/{book_id}") - blocks = re.findall(r'', html, re.DOTALL) - if not blocks: - return None - ld = json.loads(blocks[0]) - return { - 'title': ld.get('name'), - 'author': ld['author'][0]['name'] if ld.get('author') else None, - 'avg_rating': ld.get('aggregateRating', {}).get('ratingValue'), - 'ratings_count':ld.get('aggregateRating', {}).get('ratingCount'), - 'review_count': ld.get('aggregateRating', {}).get('reviewCount'), - 'num_pages': ld.get('numberOfPages'), - 'isbn': ld.get('isbn'), - 'cover_url': ld.get('image'), - 'awards': ld.get('awards'), # single string, comma-separated - 'format': ld.get('bookFormat'), - } - -book = parse_book_fast(149267) -# book['avg_rating'] => 4.35 -# book['ratings_count']=> 845591 -``` - -**JSON-LD does NOT include:** description, genres, series membership, per-star rating distribution, publisher, language. -Use `parse_book()` (the `__NEXT_DATA__` path) when you need any of those. - ---- - -## Search Results - -URL: `https://www.goodreads.com/search?q={query}&search_type=books&page={n}` - -Search uses server-rendered HTML with schema.org microdata `` rows. No `__NEXT_DATA__`. - -```python -import re, json -from helpers import http_get - -def search_books(query, page=1): - from urllib.parse import quote_plus - url = f"https://www.goodreads.com/search?q={quote_plus(query)}&search_type=books&page={page}" - html = http_get(url) - - rows = re.findall( - r'(.*?)', - html, re.DOTALL - ) - - results = [] - for row in rows: - bid = re.search(r'
          ', row) - title = re.search(r"itemprop='name'[^>]*>([^<]+)", row) - author = re.search(r'class="authorName"[^>]*>]*>([^<]+)', row) - avg = re.search(r'(\d+\.\d+)\s*avg rating', row) - cnt = re.search(r'(\d[\d,]*)\s*rating', row) - cover = re.search(r'img alt="[^"]*" class="bookCover"[^>]*src="([^"]+)"', row) - if not (bid and title): - continue - results.append({ - 'book_id': bid.group(1), - 'title': title.group(1).strip(), - 'author': author.group(1).strip() if author else None, - 'avg_rating': float(avg.group(1)) if avg else None, - 'ratings_count':cnt.group(1).replace(',', '') if cnt else None, - 'cover_url': cover.group(1) if cover else None, - 'url': f"https://www.goodreads.com/book/show/{bid.group(1)}", - }) - - total_m = re.search(r'([\d,]+)\s+results', html) - total = int(total_m.group(1).replace(',', '')) if total_m else None - - return {'total': total, 'page': page, 'results': results} - -# Example -r = search_books("dune") -# r['total'] => 101026 -# r['results'] => [{'book_id':'44767458', 'title':'Dune (Dune, #1)', 'avg_rating':4.29, ...}, ...] -``` - -**Field notes:** -- Returns exactly 20 results per page. -- `total` is the result count shown in `"N results for…"` header. -- The `avg rating` regex uses `—` (HTML entity) in the raw HTML — the pattern above matches the decoded text. -- `ratings_count` regex hits the first occurrence of `\d+ rating` in the row, which is always the book's count (not a user review count). -- `cover_url` is a 75px thumbnail (`._SY75_.jpg`). Swap `_SY75_` → `_SX315_` for a larger image. - ---- - -## Author Page - -URL: `https://www.goodreads.com/author/show/{author_id}.{Slug}` - -Author pages are **not** Next.js — they use classic server-rendered HTML with OG meta tags and microdata. -The author ID and slug can be obtained from a book's `author_url` field. - -```python -import re, json -from helpers import http_get - -def parse_author(author_id_and_slug): - # author_id_and_slug e.g. "58.Frank_Patrick_Herbert" - html = http_get(f"https://www.goodreads.com/author/show/{author_id_and_slug}") - - # Name and basic info from OG/meta tags - name = re.search(r"", html) - img = re.search(r"", html) - website = re.search(r"Website\s*
          \s*]*>\s*]*href=\"([^\"]+)\"", html) - - # Full biography from hidden span (shown/hidden by "...more" toggle in browser) - bio_span = re.search( - r']*>(.*?)', - html, re.DOTALL - ) - bio = re.sub(r'<[^>]+>', '', bio_span.group(1)).strip() if bio_span else None - - # Top books listed on the page (10 rows, same microdata format as search) - rows = re.findall( - r'(.*?)', - html, re.DOTALL - ) - books = [] - for row in rows: - bid = re.search(r'
          ', row) - title = re.search(r"itemprop='name'[^>]*>([^<]+)", row) - avg = re.search(r'(\d+\.\d+)\s*avg rating', row) - cnt = re.search(r'(\d[\d,]*)\s*rating', row) - if bid and title: - books.append({ - 'book_id': bid.group(1), - 'title': title.group(1).strip(), - 'avg_rating': float(avg.group(1)) if avg else None, - 'ratings_count':cnt.group(1).replace(',', '') if cnt else None, - 'url': f"https://www.goodreads.com/book/show/{bid.group(1)}", - }) - - return { - 'name': name.group(1) if name else None, - 'profile_image':img.group(1) if img else None, - 'bio': bio, - 'website': website.group(1) if website else None, - 'top_books': books, - } - -# Example -author = parse_author("58.Frank_Patrick_Herbert") -# author['name'] => "Frank Patrick Herbert" -# author['bio'] => "Franklin Patrick Herbert Jr. was an American science fiction..." -# len(author['top_books']) => 10 -``` - -**Field notes:** -- Author IDs can be found in a book's `author_url` (from `__NEXT_DATA__` or JSON-LD). -- The slug is optional in the URL — numeric ID alone redirects correctly. -- `profile_image` from OG tag is a large portrait (p8 suffix = 800px). Swap to `p5` for 500px. -- The bio is server-rendered in a `` or `` — which variant appears depends on length. -- Follower count is **not** present in the static HTML — it requires JS execution to appear. -- Page lists exactly 10 books. To get all books, paginate `/author/list/{author_id}?page=N`. - ---- - -## Listopia List Page - -URL: `https://www.goodreads.com/list/show/{list_id}.{Slug}?page={n}` - -Returns 100 books per page with rank numbers. - -```python -import re, json -from helpers import http_get - -def parse_list(list_id_and_slug, page=1): - url = f"https://www.goodreads.com/list/show/{list_id_and_slug}?page={page}" - html = http_get(url) - - rows = re.findall( - r'(.*?)', - html, re.DOTALL - ) - - results = [] - for row in rows: - rank = re.search(r']*class="number"[^>]*>(\d+)', row) - bid = re.search(r'
          ', row) - title = re.search(r"itemprop='name'[^>]*>([^<]+)", row) - author = re.search(r'class="authorName"[^>]*>]*>([^<]+)', row) - avg = re.search(r'(\d+\.\d+)\s*avg rating', row) - cnt = re.search(r'(\d[\d,]*)\s*rating', row) - if not (bid and title): - continue - results.append({ - 'rank': int(rank.group(1)) if rank else None, - 'book_id': bid.group(1), - 'title': title.group(1).strip(), - 'author': author.group(1).strip() if author else None, - 'avg_rating': float(avg.group(1)) if avg else None, - 'ratings_count':cnt.group(1).replace(',', '') if cnt else None, - 'url': f"https://www.goodreads.com/book/show/{bid.group(1)}", - }) - - return {'page': page, 'results': results} - -# Example -lst = parse_list("1.Best_Books_Ever") -# lst['results'][0] => {'rank': 1, 'book_id': '2767052', -# 'title': 'The Hunger Games (The Hunger Games, #1)', -# 'author': 'Suzanne Collins', 'avg_rating': 4.35, ...} -``` - -**Field notes:** -- 100 rows per page. Ranks are sequential across pages (page 2 starts at rank 101). -- Paginate with `?page=2`, `?page=3` etc. -- List pages do not use `__NEXT_DATA__` — same classic HTML format as author pages. - ---- - -## Open Library API Fallback - -Use Open Library when you need structured JSON without HTML parsing, or when you want supplementary data (birth/death dates, ISBNs across editions, subjects). - -Open Library's ratings are from its own user base (~400 ratings vs. Goodreads' 800k+ for Dune) — use Goodreads ratings when accuracy matters. - -### Search - -```python -import json -from urllib.parse import quote_plus -from helpers import http_get - -def ol_search(query, limit=10): - url = f"https://openlibrary.org/search.json?q={quote_plus(query)}&limit={limit}" - data = json.loads(http_get(url)) - results = [] - for doc in data.get('docs', []): - cover_id = doc.get('cover_i') - results.append({ - 'ol_key': doc['key'], # e.g. "/works/OL893415W" - 'title': doc.get('title'), - 'author': (doc.get('author_name') or [''])[0], - 'author_key': (doc.get('author_key') or [''])[0], - 'first_pub_year': doc.get('first_publish_year'), - 'edition_count': doc.get('edition_count'), - 'series': doc.get('series_name'), - 'cover_url': f"https://covers.openlibrary.org/b/id/{cover_id}-M.jpg" if cover_id else None, - }) - return {'total': data.get('numFound'), 'results': results} - -r = ol_search("dune frank herbert", limit=5) -# r['results'][0]['ol_key'] => "/works/OL893415W" -# r['results'][0]['title'] => "Dune" -``` - -### Work (book details) - -```python -def ol_work(ol_key): - # ol_key like "/works/OL893415W" or just "OL893415W" - key = ol_key if ol_key.startswith('/') else f'/works/{ol_key}' - data = json.loads(http_get(f"https://openlibrary.org{key}.json")) - desc = data.get('description', '') - if isinstance(desc, dict): - desc = desc.get('value', '') - return { - 'title': data.get('title'), - 'subjects': data.get('subjects', []), - 'series': data.get('series', []), - 'description': desc, - 'covers': data.get('covers', []), - 'links': data.get('links', []), - } - -work = ol_work("OL893415W") -# work['title'] => "Dune" -# work['subjects'] => ["Dune (Imaginary place)", "Fiction", ...] 
-``` - -### Ratings for a work - -```python -def ol_ratings(ol_key): - key = ol_key if ol_key.startswith('/') else f'/works/{ol_key}' - data = json.loads(http_get(f"https://openlibrary.org{key}/ratings.json")) - return data.get('summary', {}) - -# {'average': 4.30, 'count': 414, 'sortable': 4.21} -``` - -### Author - -```python -def ol_author(author_key): - # author_key like "OL79034A" - data = json.loads(http_get(f"https://openlibrary.org/authors/{author_key}.json")) - bio = data.get('bio', '') - if isinstance(bio, dict): - bio = bio.get('value', '') - return { - 'name': data.get('name'), - 'birth_date': data.get('birth_date'), - 'death_date': data.get('death_date'), - 'bio': bio, - 'ol_key': data.get('key'), - } - -author = ol_author("OL79034A") -# author['name'] => "Frank Herbert" -# author['birth_date'] => "8 October 1920" -# author['death_date'] => "11 February 1986" -``` - ---- - -## Combining Goodreads + Open Library - -```python -# Get full book data: Goodreads for ratings/genres/description, OL for ISBNs/edition details -def get_book_full(goodreads_book_id, ol_work_key=None): - gr = parse_book(goodreads_book_id) - result = dict(gr) - if ol_work_key: - ol = ol_work(ol_work_key) - result['ol_subjects'] = ol['subjects'] - result['ol_description'] = ol['description'] - result['ol_covers'] = ol['covers'] - return result -``` - ---- - -## Gotchas - -- **Goodreads API is gone**: The official API was shut down in December 2020. All data must come from HTML scraping or the unofficial paths documented here. - -- **Book ID 5107 redirects**: The URL `goodreads.com/book/show/5107.The_Stand` actually resolves to *The Catcher in the Rye* (ID 5107). The Stand is ID `149267`. Always verify `book['legacyId']` matches the URL ID. - -- **Author page ID mismatch**: Author ID `10538` in the URL resolves to Carl Sagan, not Frank Herbert (ID `58`). Always obtain author IDs from the `author_url` field inside a book's data rather than guessing. - -- **Two Book entities in `apolloState`**: The `apolloState` contains multiple `Book:` entries — one is a stub (only has `legacyId` and `webUrl`), and one is full. Filter by `legacyId == int(book_id)` AND check that the entry has more than 3 fields. - -- **Ratings are on `Work`, not `Book`**: `avg_rating`, `ratingsCount`, and `ratingsCountDist` are in the `Work` entity's `stats` key. The `Book` entity has no rating fields. - -- **Author pages are old-style HTML**: Author pages (`/author/show/`) do not use Next.js or `__NEXT_DATA__`. Use OG meta tags and regex for extraction. The follower count only loads via JS — it will be missing from `http_get` responses. - -- **Search has no `__NEXT_DATA__`**: Search result pages (`/search`) are classic server-rendered HTML. JSON-LD is absent. Use the `` microdata rows. - -- **`ratings_count` regex order matters**: The pattern `r'(\d[\d,]*)\s*rating'` always matches the book's aggregate rating count first in each search row — this is reliable. Do not use `minirating` span text as it contains nested HTML. - -- **Open Library cover URLs return binary JPEG**: `http_get()` will raise a `UnicodeDecodeError` on cover image URLs. Use `urllib.request.urlopen()` directly and read bytes, or just store the URL string without fetching. - -- **Open Library ratings are sparse**: OL has ~400 community ratings for Dune vs. Goodreads' 1.6M. Use OL ratings only as a last resort. - -- **Search page `—` entity**: The raw HTML uses `—` (not `—`) between rating value and count in search and author pages. 
The regex patterns above match the decoded text because Python's `re` operates on the decoded string after `http_get()` decodes UTF-8. - -- **Book slug is optional**: `goodreads.com/book/show/44767458` (no slug) works identically to `goodreads.com/book/show/44767458-dune`. Redirects are transparent. diff --git a/packages/bcode-browser/harness/agent-workspace/domain-skills/gutenberg/scraping.md b/packages/bcode-browser/harness/agent-workspace/domain-skills/gutenberg/scraping.md deleted file mode 100644 index 8a4800e51..000000000 --- a/packages/bcode-browser/harness/agent-workspace/domain-skills/gutenberg/scraping.md +++ /dev/null @@ -1,383 +0,0 @@ -# Project Gutenberg — Scraping & Data Extraction - -`https://www.gutenberg.org` — 78 000+ free public-domain ebooks. Every workflow here is pure `http_get` — no browser needed. - -## Do this first - -**Use the Gutendex REST API (`gutendex.com`) for all search and discovery. It is one call, returns clean JSON, and requires no auth. Go to gutenberg.org URLs only to fetch actual file content.** - -```python -import json - -# Search by title/author keyword -data = json.loads(http_get("https://gutendex.com/books/?search=pride+and+prejudice")) -# data['count'] = 6 (total matches) -# data['results'] = list of up to 32 book objects -book = data['results'][0] -# book['id'] = 1342 ← use this ID for all further calls -# book['formats']['text/plain; charset=utf-8'] = direct txt URL - -# Fetch the plain-text content of that book -text = http_get(book['formats']['text/plain; charset=utf-8']) -# Returns 763 083 chars including Project Gutenberg header/footer boilerplate -``` - -For a known book ID, skip search entirely: - -```python -book = json.loads(http_get("https://gutendex.com/books/1342/")) -``` - -## Common workflows - -### Search by keyword and get the first result - -```python -import json - -data = json.loads(http_get("https://gutendex.com/books/?search=frankenstein")) -if data['results']: - b = data['results'][0] - print(b['id'], b['title'], b['authors'][0]['name']) - # 84 Frankenstein; or, the modern prometheus Shelley, Mary Wollstonecraft - txt_url = b['formats'].get('text/plain; charset=utf-8') - if txt_url: - text = http_get(txt_url) -``` - -### Get the most downloaded books (popularity ranking) - -```python -import json - -data = json.loads(http_get("https://gutendex.com/books/?sort=popular")) -for b in data['results'][:10]: - authors = ', '.join(a['name'] for a in b['authors']) - print(f"[{b['id']}] {b['title']} — {authors} ({b['download_count']:,} downloads)") -# [84] Frankenstein — Shelley, Mary Wollstonecraft (178,271) -# [45304] The City of God, Volume I — Augustine, of Hippo, Saint (147,663) -# [2701] Moby Dick; Or, The Whale — Melville, Herman (112,302) -# [1342] Pride and Prejudice — Austen, Jane (107,502) -# [768] Wuthering Heights — Brontë, Emily (72,775) -# [1513] Romeo and Juliet — Shakespeare, William (70,272) -# [11] Alice's Adventures in Wonderland — Carroll, Lewis (65,243) -# [64317] The Great Gatsby — Fitzgerald, F. 
Scott (60,632) -# [100] Complete Works of Shakespeare — Shakespeare, William (60,527) -# [1260] Jane Eyre: An Autobiography — Brontë, Charlotte (57,602) -``` - -### Browse by genre / topic - -```python -import json - -# 'topic' matches both subjects and bookshelves fields -data = json.loads(http_get("https://gutendex.com/books/?topic=science+fiction")) -# data['count'] = 3473 total results, 32 per page - -data = json.loads(http_get("https://gutendex.com/books/?topic=detective+fiction")) -# data['count'] = 111 -# data['results'][0]: id=1661 The Adventures of Sherlock Holmes — Doyle, Arthur Conan - -# Filter by language (ISO 639-1 code) -data = json.loads(http_get("https://gutendex.com/books/?languages=fr&topic=roman")) -# data['count'] = 254 French books with 'roman' in topic -``` - -### Paginate through results - -```python -import json - -url = "https://gutendex.com/books/?topic=science+fiction" -books = [] -while url: - data = json.loads(http_get(url)) - books.extend(data['results']) - url = data['next'] # None on last page - # data['previous'] is also populated after page 1 - # e.g. data['next'] = "https://gutendex.com/books/?page=3&topic=science+fiction" -# All 3473 sci-fi books loaded across ~109 pages of 32 each -``` - -### Fetch multiple specific books by ID - -```python -import json - -data = json.loads(http_get("https://gutendex.com/books/?ids=1342,11,84")) -# Returns exactly those 3 books, count=3 -for b in data['results']: - print(b['id'], b['title']) -# 84 Frankenstein; or, the modern prometheus -# 1342 Pride and Prejudice -# 11 Alice's Adventures in Wonderland -``` - -### Read the plain text of a book (boilerplate stripped) - -```python -raw = http_get("https://www.gutenberg.org/cache/epub/1342/pg1342.txt") -# 763 083 chars total including PG licence header and footer - -START = "*** START OF THE PROJECT GUTENBERG EBOOK" -END = "*** END OF THE PROJECT GUTENBERG EBOOK" -s = raw.find(START) -e = raw.find(END) -if s != -1: - content = raw[raw.index('\n', s) + 1 : e].strip() - # 743 241 chars of actual novel text -``` - -The cache URL is the most reliable direct path. The `formats` dict in Gutendex also provides a redirect URL that resolves to the same file: - -```python -# Both of these return identical content (763 083 chars): -http_get("https://www.gutenberg.org/ebooks/1342.txt.utf-8") # redirect -http_get("https://www.gutenberg.org/cache/epub/1342/pg1342.txt") # direct cache -``` - -### Download formats available per book - -Every book's `formats` dict maps MIME type to URL. All URLs resolve to `/cache/epub/{id}/` files via redirect. 
- -| MIME type | URL pattern (after redirect) | Typical size | -|---|---|---| -| `text/plain; charset=utf-8` | `pg{id}.txt` | ~750 KB | -| `text/html` | `pg{id}-images.html` | ~850 KB | -| `application/epub+zip` | `pg{id}-images-3.epub` | ~25 MB | -| `application/x-mobipocket-ebook` | `pg{id}-images-kf8.mobi` | ~25 MB | -| `application/rdf+xml` | `{id}.rdf` via gutenberg.org | metadata XML | -| `image/jpeg` | `pg{id}.cover.medium.jpg` | cover image | -| `application/octet-stream` | `pg{id}-h.zip` | HTML+images zip | - -```python -import json - -b = json.loads(http_get("https://gutendex.com/books/1342/")) -# Grab every downloadable format URL: -for mime, url in b['formats'].items(): - print(mime, '->', url) -# text/html -> https://www.gutenberg.org/ebooks/1342.html.images -# application/epub+zip -> https://www.gutenberg.org/ebooks/1342.epub3.images -# application/x-mobipocket-ebook -> https://www.gutenberg.org/ebooks/1342.kf8.images -# application/rdf+xml -> https://www.gutenberg.org/ebooks/1342.rdf -# image/jpeg -> https://www.gutenberg.org/cache/epub/1342/pg1342.cover.medium.jpg -# application/octet-stream -> https://www.gutenberg.org/cache/epub/1342/pg1342-h.zip -# text/plain; charset=utf-8 -> https://www.gutenberg.org/ebooks/1342.txt.utf-8 -``` - -### Fetch RDF/XML metadata for a book - -```python -import re - -rdf = http_get("https://www.gutenberg.org/cache/epub/1342/pg1342.rdf") -# Also available as: http_get("https://www.gutenberg.org/ebooks/1342.rdf") - -title = re.search(r'(.*?)', rdf, re.DOTALL) -creator = re.findall(r'(.*?)', rdf) -birth = re.findall(r']*>(\d+)', rdf) -death = re.findall(r']*>(\d+)', rdf) -issued = re.search(r']*>(.*?)', rdf) -rights = re.search(r'(.*?)', rdf) -downloads = re.search(r']*>(\d+)', rdf) -language = re.search(r'.*?(.*?)', rdf, re.DOTALL) -subjects = re.findall(r'.*?(.*?).*?', rdf, re.DOTALL) - -print(title.group(1)) # Pride and Prejudice -print(creator) # ['Austen, Jane'] -print(birth, death) # ['1775'] ['1817'] -print(issued.group(1)) # 1998-06-01 -print(rights.group(1)) # Public domain in the USA. -print(int(downloads.group(1))) # 107502 -print(subjects[:3]) # ['England -- Fiction', 'Young women -- Fiction', 'Love stories'] -``` - -Note: `` value is a subject string, not a language code. For language codes use the Gutendex `languages` field instead. - -### Search the HTML catalog (25 results per page) - -Use this only when you need to leverage Gutenberg's own search index (author:, title:, subject: prefix syntax). - -```python -import re, json - -html = http_get( - "https://www.gutenberg.org/ebooks/search/" - "?query=shakespeare&sort_order=downloads" -) -# sort_order options: downloads, title, release_date, last_update, random - -entries = re.findall(r'
        3. ', html, re.DOTALL) -books = [] -for e in entries: - book_id = re.search(r'/ebooks/(\d+)', e) - title = re.search(r'(.*?)', e) - author = re.search(r'(.*?)', e) - downloads = re.search(r'([^<]+)', e) - books.append({ - 'id': int(book_id.group(1)) if book_id else None, - 'title': title.group(1) if title else '', - 'author': author.group(1) if author else '', - 'downloads': downloads.group(1).strip() if downloads else '', - }) - -# books[0] = {'id': 1513, 'title': 'Romeo and Juliet', -# 'author': 'William Shakespeare', 'downloads': '74316 downloads'} - -# Paginate with start_index (25 per page) -html_p2 = http_get( - "https://www.gutenberg.org/ebooks/search/" - "?query=shakespeare&sort_order=downloads&start_index=26" -) -``` - -### Browse a bookshelf (curated genre list) - -```python -import re - -# Bookshelf 68 = Science Fiction -html = http_get("https://www.gutenberg.org/ebooks/bookshelf/68") -titles = re.findall(r'(.*?)', html) -# ['Twenty Thousand Leagues under the Sea', 'The War of the Worlds', -# 'The Time Machine', 'Thuvia, Maid of Mars', ...] -``` - -### OPDS catalog (machine-readable Atom feed) - -```python -import re - -feed = http_get("https://www.gutenberg.org/ebooks/search.opds/?query=dracula") -# Returns Atom XML, 7 entries per page (including 1 metadata entry) -entries = re.findall(r'(.*?)', feed, re.DOTALL) -for e in entries: - title = re.search(r'(.*?)', e) - entry_id = re.search(r'(.*?)', e) - if title and entry_id and 'opds' in entry_id.group(1): - book_id = re.search(r'/ebooks/(\d+)\.opds', entry_id.group(1)) - print(book_id.group(1), title.group(1)) -# 345 Dracula -``` - -## Gutendex API — full response schema - -Validated against a real call to `GET https://gutendex.com/books/1342/`: - -```json -{ - "id": 1342, - "title": "Pride and Prejudice", - "authors": [ - {"name": "Austen, Jane", "birth_year": 1775, "death_year": 1817} - ], - "summaries": ["...automatically generated summary..."], - "editors": [], - "translators": [], - "subjects": [ - "Courtship -- Fiction", - "Domestic fiction", - "England -- Fiction", - "Love stories", - "Sisters -- Fiction", - "Women -- England -- Fiction", - "Young women -- Fiction" - ], - "bookshelves": [ - "Best Books Ever Listings", - "Category: British Literature", - "Category: Classics of Literature", - "Category: Novels", - "Category: Romance", - "Harvard Classics" - ], - "languages": ["en"], - "copyright": false, - "media_type": "Text", - "formats": { - "text/html": "https://www.gutenberg.org/ebooks/1342.html.images", - "application/epub+zip": "https://www.gutenberg.org/ebooks/1342.epub3.images", - "application/x-mobipocket-ebook": "https://www.gutenberg.org/ebooks/1342.kf8.images", - "application/rdf+xml": "https://www.gutenberg.org/ebooks/1342.rdf", - "image/jpeg": "https://www.gutenberg.org/cache/epub/1342/pg1342.cover.medium.jpg", - "application/octet-stream": "https://www.gutenberg.org/cache/epub/1342/pg1342-h.zip", - "text/plain; charset=utf-8": "https://www.gutenberg.org/ebooks/1342.txt.utf-8" - }, - "download_count": 107502 -} -``` - -List response wrapper (from `GET /books/`): - -```json -{ - "count": 6, - "next": null, - "previous": null, - "results": [...] -} -``` - -`count` is the total across all pages. `next` / `previous` are fully-formed URLs ready to pass to `http_get`, or `null` when absent. - -## Gutendex query parameters - -All parameters combine freely. 
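For example, one combined query (a sketch; the filter values are arbitrary):

```python
import json

# public-domain English science fiction by authors born 1800 or later, most popular first
url = ("https://gutendex.com/books/"
       "?topic=science+fiction&languages=en"
       "&author_year_start=1800&copyright=false&sort=popular")
data = json.loads(http_get(url))
print(data['count'], len(data['results']))  # total matches, up to 32 on this page
```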
- -| Parameter | Example | Notes | -|---|---|---| -| `search` | `search=moby+dick` | Matches title and author | -| `ids` | `ids=1342,11,84` | Comma-separated; returns only those books | -| `languages` | `languages=fr` | ISO 639-1 code; comma-separated for multiple | -| `topic` | `topic=science+fiction` | Matches subjects + bookshelves | -| `author_year_start` | `author_year_start=1800` | Author born on/after year | -| `author_year_end` | `author_year_end=1850` | Author born on/before year | -| `copyright` | `copyright=false` | `false`=public domain, `true`=copyrighted | -| `sort` | `sort=popular` | `popular` (default), `ascending`, `descending` | -| `page` | `page=2` | 1-based; 32 results per page (not configurable) | - -`page_size` is not supported — always 32 results per page regardless. - -## Finding book IDs - -Three ways, in order of preference: - -1. **Gutendex search** — returns `id` directly in JSON. -2. **Gutenberg HTML catalog** — `book_id = re.search(r'/ebooks/(\d+)', entry)`. IDs in the URL. -3. **URL pattern** — `https://www.gutenberg.org/ebooks/{id}` — if you already know the ID from any source. - -Notable IDs validated in tests: `84` (Frankenstein), `1342` (Pride and Prejudice), `11` (Alice in Wonderland), `2701` (Moby Dick), `64317` (The Great Gatsby), `1513` (Romeo and Juliet), `100` (Complete Works of Shakespeare), `1661` (Adventures of Sherlock Holmes), `345` (Dracula). - -## Rate limits - -Gutendex (`gutendex.com`) returns no `X-RateLimit-*` headers. Server is Apache/2.4.58 on Ubuntu. Rapid sequential calls can trigger connection resets — observed a timeout on the second call in a tight loop. Add a small delay between calls when paginating: - -```python -import time, json - -url = "https://gutendex.com/books/?sort=popular" -while url: - data = json.loads(http_get(url)) - # ... process data['results'] ... - url = data['next'] - if url: - time.sleep(0.5) # be respectful — no published rate limit but timeouts observed -``` - -For gutenberg.org file downloads (txt, epub, etc.) there is no documented rate limit but Gutenberg asks not to use automated bulk downloading; use their [offline catalogs](https://www.gutenberg.org/ebooks/offline_catalogs.html) for bulk access. - -## Gotchas - -- **`.opf` 404**: `https://www.gutenberg.org/cache/epub/1342/pg1342.opf` returns 404. Use `.rdf` instead — same path prefix, same data in RDF/XML. -- **`formats` URLs redirect**: URLs like `https://www.gutenberg.org/ebooks/1342.txt.utf-8` are redirect endpoints that resolve to `/cache/epub/1342/pg1342.txt`. Either form works with `http_get` (urllib follows redirects automatically), but the `/cache/epub/` direct URL avoids an extra round trip. -- **Two text files**: `/files/1342/1342-0.txt` (older Project Gutenberg edition, 729 KB) and `/cache/epub/1342/pg1342.txt` (modern edition, 763 KB) contain different versions of the same book. The Gutendex `formats` entry always points to the cache/modern version. -- **Boilerplate**: Every `.txt` file opens with a PG licence header and closes with a footer. Strip with `START`/`END` markers (see "Read the plain text" section above). -- **`summaries` field is AI-generated**: The `summaries` array in Gutendex responses contains automatically generated summaries, not the author's original blurb. -- **`copyright: false`** means public domain in the USA. Non-US copyright status is not tracked. -- **`page_size` ignored**: Passing `?page_size=5` to Gutendex has no effect — always returns 32 results. 
-- **Gutendex `sort=ascending/descending`** sorts by ID (oldest/newest book in the catalog), not by title or author name. -- **Catalog search `author:` prefix**: `?query=author:dickens` searches within author names but Gutenberg's relevance ranking is fuzzy and can return unexpected results. For precise author lookup use Gutendex `?search=charles+dickens`. -- **OPDS pagination**: Only 7 entries per page (1 metadata + 6 books). Slow for bulk extraction — use Gutendex instead. -- **HTML catalog `start_index`**: Pagination is 25 per page. Next page = `start_index=26`, then `51`, `76`, etc. The value appears in the rendered HTML (`re.findall(r'start_index=(\d+)', html)` returns the next page's value). diff --git a/packages/bcode-browser/harness/agent-workspace/domain-skills/hackernews/scraping.md b/packages/bcode-browser/harness/agent-workspace/domain-skills/hackernews/scraping.md deleted file mode 100644 index 86ac6b785..000000000 --- a/packages/bcode-browser/harness/agent-workspace/domain-skills/hackernews/scraping.md +++ /dev/null @@ -1,243 +0,0 @@ -# Hacker News — Data Extraction - -`https://news.ycombinator.com` — YCombinator's link aggregator. Three access paths tested: `http_get` DOM scraping, Algolia search API, and the official HN Firebase API. All work without a browser. - -## Do this first: pick your access path - -| Goal | Best approach | Latency | -|------|--------------|---------| -| Current front page (30 stories, real-time) | `http_get` + regex | ~170ms | -| Historical / keyword search | Algolia search API | ~400ms | -| Full comment tree (nested) | Algolia items API | ~300ms | -| Specific item by ID | Firebase API | ~200ms | -| 500 ranked story IDs | Firebase topstories | ~200ms (+ ~190ms/item after) | - -**Never use a browser for read-only HN tasks.** Everything is accessible over HTTP with no auth, no JS rendering needed. - ---- - -## Path 1: http_get front page (fastest for real-time data) - -The front page HTML is ~34KB. Story order matches Firebase `/topstories.json` exactly — confirmed identical on 2026-04-18. - -```python -import re, html as htmllib - -page = http_get("https://news.ycombinator.com") - -# Extract all 30 story IDs (in rank order) -story_ids = re.findall(r'', page) - -# Extract titles + URLs (same order as IDs) -titles_urls = re.findall( - r'class="titleline"[^>]*>
          ]*>(.*?)', page -) - -# Extract scores keyed by story ID (job posts have no score row) -scores_by_id = { - m.group(1): int(m.group(2)) - for m in re.finditer( - r'(\d+) points', page - ) -} - -# Extract authors keyed by story ID (anchor on score span) -authors_by_id = {} -for m in re.finditer( - r'\d+ points' - r'.*?class="hnuser">(.*?)', - page, re.DOTALL -): - authors_by_id[m.group(1)] = m.group(2) - -# Extract comment counts keyed by story ID -comments_by_id = { - m.group(1): int(m.group(2)) - for m in re.finditer( - r'href="item\?id=(\d+)">(\d+) comments', page - ) -} - -stories = [] -for i, sid in enumerate(story_ids): - url, raw_title = titles_urls[i] if i < len(titles_urls) else ('', '') - stories.append({ - 'rank': i + 1, - 'id': sid, - 'title': htmllib.unescape(raw_title), # MUST unescape — titles contain ' etc. - 'url': url, - 'score': scores_by_id.get(sid), # None for job posts - 'author': authors_by_id.get(sid), - 'comments': comments_by_id.get(sid, 0), - }) -``` - -**Gotchas:** -- Titles contain HTML entities (`'` `&` `"` `>`). Always call `html.unescape()`. -- `` — the class is `athing submission`, not just `athing`. The `athing comtr` class is for comment rows. -- Job/hiring posts (YC ads) appear in the list but have no score or author. `scores_by_id.get(sid)` returns `None` for them — check before comparing. -- `re.DOTALL` multi-line patterns can cross story boundaries. Use ID-anchored patterns (as above) instead of positional zip for score/author. -- The page only serves page 1 (30 items). Pages 2–4 exist at `?p=2` etc. but require a login cookie for page 3+. - ---- - -## Path 2: Algolia search API (best for historical / keyword search) - -No rate limiting observed. Returns up to 1000 hits per query (`hitsPerPage` max is capped at ~1000 per Algolia plan). - -```python -import json - -# Keyword search — sorted by relevance -data = json.loads(http_get( - "https://hn.algolia.com/api/v1/search" - "?query=llm&tags=story&hitsPerPage=20" -)) - -# Date-sorted (most recent first) -data = json.loads(http_get( - "https://hn.algolia.com/api/v1/search_by_date" - "?tags=story&hitsPerPage=20" -)) - -# Paginate: add &page=N (0-indexed), up to data['nbPages']-1 -``` - -**Fields returned per story hit:** -``` -objectID, title, url, author, points, num_comments, -created_at (ISO 8601), created_at_i (unix ts), story_id, -children (list of comment IDs — flat, not tree), -_tags, _highlightResult -``` - -**Fields returned per comment hit:** -``` -objectID, comment_text, author, story_id, story_title, story_url, -parent_id, created_at, created_at_i, points -``` -Note: comment hits use `comment_text`, NOT `text`. Story hits use `story_text` for self-post body. 
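A small sketch that flattens story hits into plain dicts, assuming the standard Algolia `hits` array wrapper (the field names are exactly the ones listed above):

```python
import json

data = json.loads(http_get(
    "https://hn.algolia.com/api/v1/search?query=llm&tags=story&hitsPerPage=5"
))
stories = [
    {
        'id': hit['objectID'],
        'title': hit.get('title'),
        'url': hit.get('url'),          # absent for self-posts; body is in story_text
        'author': hit.get('author'),
        'points': hit.get('points'),
        'comments': hit.get('num_comments'),
        'created': hit.get('created_at'),
    }
    for hit in data.get('hits', [])
]
```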
- -### Tag filters - -Tags are AND by default, OR with parentheses: - -```python -# Story types -"tags=story" # regular link/self posts -"tags=show_hn" # Show HN -"tags=ask_hn" # Ask HN -"tags=poll" # polls -"tags=job" # job posts - -# Combined AND -"tags=story,front_page" # currently on front page -"tags=story,author_pg" # stories submitted by pg - -# OR -"tags=(ask_hn,show_hn),story" # Ask OR Show HN - -# By story ID (gets story + all its comments) -"tags=story_47806725" -``` - -### Numeric filters - -```python -# Date range (unix timestamps) -"numericFilters=created_at_i>1745000000" -"numericFilters=created_at_i>1700000000,created_at_i<1750000000" - -# Point threshold -"numericFilters=points>100" -"numericFilters=points>500,points<1000" -``` - -### Full Algolia items API (nested comment tree) - -```python -import json - -thread = json.loads(http_get( - "https://hn.algolia.com/api/v1/items/47806725" -)) -# thread['children'] = list of top-level comment objects -# Each comment: author, text (HTML), created_at, children (nested replies) -# Recursively walk children for full thread - -# Total comment count (recursive walk with stack): -stack = list(thread.get('children', [])) -total = 0 -while stack: - node = stack.pop() - total += 1 - stack.extend(node.get('children', [])) -``` - -Confirmed: Algolia items returns 653 total comments for a 659-comment thread (some deleted). `text` field in items API is HTML with `
<p>
          ` tags and `` links — may need to strip tags. - ---- - -## Path 3: Official HN Firebase API - -Clean JSON, no scraping. Use for fetching specific items or building live feeds. - -```python -import json - -# Ranked story ID lists (no metadata — just IDs) -top = json.loads(http_get("https://hacker-news.firebaseio.com/v0/topstories.json")) # 500 IDs -new = json.loads(http_get("https://hacker-news.firebaseio.com/v0/newstories.json")) # 500 IDs -best = json.loads(http_get("https://hacker-news.firebaseio.com/v0/beststories.json")) # 200 IDs -ask = json.loads(http_get("https://hacker-news.firebaseio.com/v0/askstories.json")) # ~32 IDs -show = json.loads(http_get("https://hacker-news.firebaseio.com/v0/showstories.json")) # ~119 IDs -jobs = json.loads(http_get("https://hacker-news.firebaseio.com/v0/jobstories.json")) # ~31 IDs - -# Fetch a single item -item = json.loads(http_get( - "https://hacker-news.firebaseio.com/v0/item/47806725.json" -)) -# Fields: id, type, by, title, url, score, descendants (total comment count), -# time (unix ts), kids (list of top-level comment IDs), text (self-post body) - -# Fetch a user profile -user = json.loads(http_get( - "https://hacker-news.firebaseio.com/v0/user/pg.json" -)) -# Fields: id, karma, created (unix ts), about (HTML), submitted (list of item IDs) - -# Highest current item ID (useful for polling new items) -maxid = json.loads(http_get("https://hacker-news.firebaseio.com/v0/maxitem.json")) -``` - -**Firebase vs Algolia tradeoff:** -- Firebase `topstories` gives you 500 IDs in one call but then requires one HTTP call per item (~190ms each). Fetching all 500 items sequentially would take ~100 seconds. -- Algolia returns full story data (title, points, author, comments) in one call for up to ~1000 results. -- For "top 30 stories with full metadata": use `http_get` front page scrape (170ms total). For "top 500 stories with full metadata": use Algolia with `tags=front_page` or loop pages. - ---- - -## Comment thread HTML (item page) - -For a large thread, the item page HTML (~1MB for 659 comments) loads ALL comments flat in a single request — no pagination, no JS required. - -```python -import re, html as htmllib - -page = http_get("https://news.ycombinator.com/item?id=47806725") - -# Count all comment IDs -comment_ids = re.findall(r'', page) -# len(comment_ids) matches total comment count - -# Extract comment texts (careful: text spans multiple lines with
<p>
          tags) -# Use Algolia items API instead for structured access -``` - -For structured comment access prefer Algolia items API — it returns a proper nested tree. The HTML item page is useful only when you need approximate comment count without an API call. - ---- - -## Do NOT use a browser for HN - -All data is in plain HTML or JSON APIs. `goto_url()` + `wait_for_load()` takes 3–8 seconds; `http_get` takes 170–400ms. The JS `querySelectorAll` approach works (tested, returns correct data) but is 20–50x slower with no benefit. diff --git a/packages/bcode-browser/harness/agent-workspace/domain-skills/howlongtobeat/scraping.md b/packages/bcode-browser/harness/agent-workspace/domain-skills/howlongtobeat/scraping.md deleted file mode 100644 index e93bba749..000000000 --- a/packages/bcode-browser/harness/agent-workspace/domain-skills/howlongtobeat/scraping.md +++ /dev/null @@ -1,473 +0,0 @@ -# HowLongToBeat — Scraping & Data Extraction - -Field-tested against howlongtobeat.com on 2026-04-18. All code blocks validated with live requests. - -## Do this first - -**Use the search API — it returns structured JSON with all completion times in one POST call.** - -HLTB runs a token-gated POST endpoint at `/api/find`. You must first fetch a session token from `/api/find/init`, then include it in the search request. Both steps are plain HTTP — no browser required. - -```python -import json, re, urllib.request, time -from helpers import http_get - -UA = "Mozilla/5.0" - -def get_token(): - """Fetch a fresh session token. Token encodes IP+UA+timestamp, reusable for ~15 min.""" - url = f"https://howlongtobeat.com/api/find/init?t={int(time.time()*1000)}" - data = http_get(url, headers={"Referer": "https://howlongtobeat.com/"}) - return json.loads(data) # {token, hpKey, hpVal} - -def search_hltb(title, size=20, page=1, token_data=None): - """ - Search HLTB for games. Returns raw API dict: - {count, pageCurrent, pageTotal, pageSize, data: [...]} - token_data can be reused across searches (fetch once, use many times). 
- """ - if token_data is None: - token_data = get_token() - hp_key, hp_val = token_data['hpKey'], token_data['hpVal'] - payload = { - "searchType": "games", - "searchTerms": title.split(), - "searchPage": page, - "size": size, - "searchOptions": { - "games": { - "userId": 0, "platform": "", "sortCategory": "popular", - "rangeCategory": "main", "rangeTime": {"min": None, "max": None}, - "gameplay": {"perspective": "", "flow": "", "genre": "", "difficulty": ""}, - "rangeYear": {"min": "", "max": ""}, "modifier": "" - }, - "users": {"sortCategory": "postcount"}, - "lists": {"sortCategory": "follows"}, - "filter": "", "sort": 0, "randomizer": 0 - }, - "useCache": True, - hp_key: hp_val # honeypot field — key and value vary per token - } - req = urllib.request.Request( - "https://howlongtobeat.com/api/find", - data=json.dumps(payload).encode(), - headers={ - "User-Agent": UA, - "Content-Type": "application/json", - "Origin": "https://howlongtobeat.com", - "Referer": "https://howlongtobeat.com/", - "x-auth-token": token_data['token'], - "x-hp-key": hp_key, - "x-hp-val": hp_val, - }, - method="POST" - ) - with urllib.request.urlopen(req, timeout=20) as r: - return json.loads(r.read().decode()) - -# Usage -tok = get_token() - -result = search_hltb("elden ring", token_data=tok, size=3) -for g in result['data']: - print(g['game_id'], g['game_name'], g['release_world']) - print(f" Main: {g['comp_main']/3600:.1f}h +Extras: {g['comp_plus']/3600:.1f}h 100%: {g['comp_100']/3600:.1f}h") - -# Confirmed output (2026-04-18): -# 68151 Elden Ring 2022 -# Main: 60.0h +Extras: 101.2h 100%: 135.5h -# 160589 Elden Ring: Nightreign 2025 -# Main: 28.1h +Extras: 40.1h 100%: 66.9h -# 139385 Elden Ring: Shadow of the Erdtree 2024 -# Main: 25.7h +Extras: 39.0h 100%: 51.1h -``` - -Token is reusable — fetch it once and pass it to multiple `search_hltb()` calls. No need to re-fetch per search. 
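For instance, a sketch that reuses one token across several lookups (the titles are taken from the verified examples further down):

```python
tok = get_token()                                        # fetch once
for title in ["the witcher 3", "hades", "celeste"]:
    res = search_hltb(title, size=1, token_data=tok)     # reuse for every search
    if res['data']:
        g = res['data'][0]
        print(g['game_name'], round(g['comp_main'] / 3600, 1), "h main")
```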
- ---- - -## Fastest approach: search + parse in one helper - -```python -import json, re, urllib.request, time -from helpers import http_get - -UA = "Mozilla/5.0" - -def hltb_search(title, size=5): - """One-shot: get token + search, return list of dicts with hours.""" - url = f"https://howlongtobeat.com/api/find/init?t={int(time.time()*1000)}" - tok = json.loads(http_get(url, headers={"Referer": "https://howlongtobeat.com/"})) - hp_key, hp_val = tok['hpKey'], tok['hpVal'] - payload = { - "searchType": "games", "searchTerms": title.split(), "searchPage": 1, "size": size, - "searchOptions": { - "games": {"userId": 0, "platform": "", "sortCategory": "popular", - "rangeCategory": "main", "rangeTime": {"min": None, "max": None}, - "gameplay": {"perspective": "", "flow": "", "genre": "", "difficulty": ""}, - "rangeYear": {"min": "", "max": ""}, "modifier": ""}, - "users": {"sortCategory": "postcount"}, "lists": {"sortCategory": "follows"}, - "filter": "", "sort": 0, "randomizer": 0 - }, - "useCache": True, hp_key: hp_val - } - req = urllib.request.Request( - "https://howlongtobeat.com/api/find", data=json.dumps(payload).encode(), - headers={"User-Agent": UA, "Content-Type": "application/json", - "Origin": "https://howlongtobeat.com", "Referer": "https://howlongtobeat.com/", - "x-auth-token": tok['token'], "x-hp-key": hp_key, "x-hp-val": hp_val}, - method="POST" - ) - with urllib.request.urlopen(req, timeout=20) as r: - data = json.loads(r.read().decode()) - - def h(secs): - return round(secs / 3600, 1) if secs else None - - return [ - { - "game_id": g["game_id"], - "name": g["game_name"], - "type": g["game_type"], # "game" | "dlc" | "expansion" | "hack" - "year": g["release_world"], - "platforms": g["profile_platform"], - "main": h(g["comp_main"]), # Main Story hours (polled average) - "main_plus": h(g["comp_plus"]), # Main + Extras hours - "completionist":h(g["comp_100"]), # Completionist hours - "all_styles": h(g["comp_all"]), # All playstyles combined - "main_count": g["comp_main_count"], # Number of submissions - "plus_count": g["comp_plus_count"], - "comp_count": g["comp_100_count"], - "review_score": g["review_score"], # 0–100 - "image_url": f"https://howlongtobeat.com/games/{g['game_image']}", - "page_url": f"https://howlongtobeat.com/game/{g['game_id']}", - } - for g in data["data"] - ] - -# Verified results (2026-04-18): -print(hltb_search("the witcher 3")[0]) -# {'game_id': 10270, 'name': 'The Witcher 3: Wild Hunt', 'type': 'game', 'year': 2015, -# 'main': 51.6, 'main_plus': 103.8, 'completionist': 174.4, 'all_styles': 103.8, -# 'main_count': 2681, 'plus_count': 6708, 'comp_count': 2327, 'review_score': 93, ...} - -print(hltb_search("gone home")[0]) -# {'game_id': 4010, 'name': 'Gone Home', 'main': 2.0, 'main_plus': 2.5, 'completionist': 3.1, ...} -``` - ---- - -## Game detail page (full stat breakdown, speedrun data, per-platform times) - -When you have a `game_id`, fetch the game page and extract `__NEXT_DATA__` for the complete dataset — includes median/avg/low/high times, speedrun data, co-op/multiplayer times, and per-platform breakdowns. - -```python -import json, re -from helpers import http_get - -def get_game_detail(game_id): - """ - Fetch complete game data from the HLTB game page. - Returns pageProps['game']['data'] with keys: 'game', 'individuality', 'relationships'. 
- """ - html = http_get(f"https://howlongtobeat.com/game/{game_id}") - nd = json.loads(re.search( - r'', html, re.DOTALL - ).group(1)) - return nd['props']['pageProps']['game']['data'] - -data = get_game_detail(10270) # Witcher 3 -g = data['game'][0] - -# Core completion times (all in seconds — divide by 3600 for hours) -print(g['comp_main'] / 3600) # 51.6 — Main Story (polled avg) -print(g['comp_main_med'] / 3600) # 50.0 — Main Story median -print(g['comp_main_l'] / 3600) # 32.7 — Main Story low -print(g['comp_main_h'] / 3600) # 85.8 — Main Story high -print(g['comp_main_count']) # 2681 — submission count - -print(g['comp_plus'] / 3600) # 103.8 — Main + Extras -print(g['comp_100'] / 3600) # 174.4 — Completionist -print(g['comp_all'] / 3600) # 103.8 — All Styles - -# Speedrun times -print(g['comp_lvl_spd']) # 1 if speedrun data exists, 0 if not -print(g['comp_speed'] / 3600) # 19.2 — any% (polled avg) -print(g['comp_speed_min'] / 3600) # 3.2 — fastest submission -print(g['comp_speed_max'] / 3600) # 30.0 — slowest speedrun -print(g['comp_speed_count']) # 15 — speedrun submissions - -print(g['comp_speed100'] / 3600) # 59.4 — 100% speedrun -print(g['comp_speed100_count']) # 4 - -# Multiplayer / co-op invested time -print(g['comp_lvl_co']) # 1 if co-op data exists -print(g['comp_lvl_mp']) # 1 if multiplayer data exists -print(g['invested_co'] / 3600) # hours in co-op mode -print(g['invested_mp'] / 3600) # hours in competitive multiplayer -print(g['invested_co_count']) # submission count - -# Metadata -print(g['profile_dev']) # "CD Projekt RED" -print(g['profile_pub']) # "CD Projekt, Warner Bros..." -print(g['profile_platform']) # "Nintendo Switch, PC, PlayStation 4, ..." -print(g['profile_genre']) # "Third-Person, Action, Open World, Role-Playing" -print(g['profile_steam']) # 292030 — Steam App ID (0 if not on Steam) -print(g['release_world']) # "2015-05-19" -print(g['rating_esrb']) # "M" -print(g['review_score']) # 93 (0–100) -print(g['count_comp']) # 26007 — times completed -print(g['count_backlog']) # 31083 - -# Per-platform breakdown (individuality) -for plat in data['individuality']: - print(plat['platform'], - int(plat['comp_main'])/3600, # main hours - int(plat['comp_plus'])/3600, # +extras hours - int(plat['comp_100'])/3600, # 100% hours - plat['count_comp']) # completions on this platform -# Example: -# Nintendo Switch 57.0h 112.3h 194.9h 236 -# PC, PS4, Xbox One 52.9h 110.0h 179.4h 11136 -# PS5, Xbox Series X/S 52.1h 92.5h 168.8h 343 - -# DLC / expansion completion times -for rel in data['relationships'][:3]: - print(rel['game_id'], rel['game_name'], rel['game_type'], - rel['comp_main']/3600 if rel['comp_main'] else None) -``` - ---- - -## Common workflows - -### Quick lookup: name → completion times - -```python -import json, re, urllib.request, time -from helpers import http_get - -UA = "Mozilla/5.0" - -def get_times(title): - """Return Main/+Extras/100% hours for the top search match.""" - tok_url = f"https://howlongtobeat.com/api/find/init?t={int(time.time()*1000)}" - tok = json.loads(http_get(tok_url, headers={"Referer": "https://howlongtobeat.com/"})) - hp_key, hp_val = tok['hpKey'], tok['hpVal'] - payload = { - "searchType": "games", "searchTerms": title.split(), "searchPage": 1, "size": 1, - "searchOptions": { - "games": {"userId": 0, "platform": "", "sortCategory": "popular", - "rangeCategory": "main", "rangeTime": {"min": None, "max": None}, - "gameplay": {"perspective": "", "flow": "", "genre": "", "difficulty": ""}, - "rangeYear": {"min": "", "max": ""}, "modifier": ""}, 
- "users": {"sortCategory": "postcount"}, "lists": {"sortCategory": "follows"}, - "filter": "", "sort": 0, "randomizer": 0 - }, - "useCache": True, hp_key: hp_val - } - req = urllib.request.Request( - "https://howlongtobeat.com/api/find", data=json.dumps(payload).encode(), - headers={"User-Agent": UA, "Content-Type": "application/json", - "Origin": "https://howlongtobeat.com", "Referer": "https://howlongtobeat.com/", - "x-auth-token": tok['token'], "x-hp-key": hp_key, "x-hp-val": hp_val}, - method="POST" - ) - with urllib.request.urlopen(req, timeout=20) as r: - data = json.loads(r.read().decode()) - if not data['data']: - return None - g = data['data'][0] - h = lambda s: round(s/3600, 1) if s else None - return { - "id": g['game_id'], "name": g['game_name'], - "main": h(g['comp_main']), "main_plus": h(g['comp_plus']), - "completionist": h(g['comp_100']) - } - -# Verified: -print(get_times("celeste")) -# {'id': 42818, 'name': 'Celeste', 'main': 8.3, 'main_plus': 14.6, 'completionist': 39.2} -print(get_times("stardew valley")) -# {'id': 34716, 'name': 'Stardew Valley', 'main': 53.4, 'main_plus': 94.6, 'completionist': 171.5} -print(get_times("hades")) -# {'id': 62941, 'name': 'Hades', 'main': 23.4, 'main_plus': 48.5, 'completionist': 95.0} -``` - -### Paginated search (all results for a query) - -`count` = total matches, `pageTotal` = total pages with current `size`. The same token works across all pages. - -```python -def search_all_pages(title, size=20): - """Yield every search result for a query across all pages.""" - tok_url = f"https://howlongtobeat.com/api/find/init?t={int(time.time()*1000)}" - tok = json.loads(http_get(tok_url, headers={"Referer": "https://howlongtobeat.com/"})) - hp_key, hp_val = tok['hpKey'], tok['hpVal'] - - page = 1 - while True: - payload = { - "searchType": "games", "searchTerms": title.split(), - "searchPage": page, "size": size, - "searchOptions": { - "games": {"userId": 0, "platform": "", "sortCategory": "popular", - "rangeCategory": "main", "rangeTime": {"min": None, "max": None}, - "gameplay": {"perspective": "", "flow": "", "genre": "", "difficulty": ""}, - "rangeYear": {"min": "", "max": ""}, "modifier": ""}, - "users": {"sortCategory": "postcount"}, "lists": {"sortCategory": "follows"}, - "filter": "", "sort": 0, "randomizer": 0 - }, - "useCache": True, hp_key: hp_val - } - req = urllib.request.Request( - "https://howlongtobeat.com/api/find", data=json.dumps(payload).encode(), - headers={"User-Agent": UA, "Content-Type": "application/json", - "Origin": "https://howlongtobeat.com", "Referer": "https://howlongtobeat.com/", - "x-auth-token": tok['token'], "x-hp-key": hp_key, "x-hp-val": hp_val}, - method="POST" - ) - with urllib.request.urlopen(req, timeout=20) as r: - data = json.loads(r.read().decode()) - yield from data['data'] - if page >= data['pageTotal']: - break - page += 1 - -# "mario" returns 308 results across 16 pages (size=20) -mario_games = list(search_all_pages("mario", size=20)) -print(len(mario_games)) # 308 -``` - -### Batch lookup by game ID (parallel) - -```python -import json, re, urllib.request -from concurrent.futures import ThreadPoolExecutor -from helpers import http_get - -def fetch_game(game_id): - html = http_get(f"https://howlongtobeat.com/game/{game_id}") - nd = json.loads(re.search( - r'', html, re.DOTALL - ).group(1)) - g = nd['props']['pageProps']['game']['data']['game'][0] - return { - "id": g['game_id'], "name": g['game_name'], - "main": round(g['comp_main']/3600, 1) if g['comp_main'] else None, - "main_plus": 
round(g['comp_plus']/3600, 1) if g['comp_plus'] else None, - "completionist": round(g['comp_100']/3600, 1) if g['comp_100'] else None, - } - -ids = [10270, 68151, 42818, 26803, 34716] # Witcher3, Elden Ring, Celeste, DS3, Stardew -with ThreadPoolExecutor(max_workers=5) as ex: - results = list(ex.map(fetch_game, ids)) - -for r in results: - print(f"[{r['id']}] {r['name']}: {r['main']}h / {r['main_plus']}h / {r['completionist']}h") - -# Confirmed output: -# [10270] The Witcher 3: Wild Hunt: 51.6h / 103.8h / 174.4h -# [68151] Elden Ring: 60.0h / 101.2h / 135.5h -# [42818] Celeste: 8.3h / 14.6h / 39.2h -# [26803] Dark Souls III: 31.2h / 48.4h / 100.5h -# [34716] Stardew Valley: 53.4h / 94.6h / 171.5h -``` - ---- - -## Search response field reference - -Every item in `data[]` from `/api/find`: - -| Field | Type | Description | -|-------|------|-------------| -| `game_id` | int | HLTB internal game ID | -| `game_name` | str | Full game title | -| `game_alias` | str | Alternate title / edition name | -| `game_type` | str | `"game"` \| `"dlc"` \| `"expansion"` \| `"hack"` | -| `game_image` | str | Image filename → `https://howlongtobeat.com/games/{game_image}` | -| `release_world` | int | Release year (just the year integer, not a date) | -| `profile_platform` | str | Comma-separated platform list | -| `comp_main` | int | Main Story seconds (polled average), 0 if no data | -| `comp_plus` | int | Main + Extras seconds | -| `comp_100` | int | Completionist seconds | -| `comp_all` | int | All Styles combined seconds | -| `comp_main_count` | int | Submission count for Main Story | -| `comp_plus_count` | int | Submission count for Main + Extras | -| `comp_100_count` | int | Submission count for Completionist | -| `comp_all_count` | int | Total submissions across all categories | -| `comp_lvl_sp` | int | 1 if single-player data exists | -| `comp_lvl_co` | int | 1 if co-op data exists | -| `comp_lvl_mp` | int | 1 if multiplayer data exists | -| `invested_co` | int | Average co-op time in seconds | -| `invested_mp` | int | Average multiplayer time in seconds | -| `count_comp` | int | Total completions logged | -| `count_backlog` | int | Users with game in backlog | -| `count_playing` | int | Currently playing | -| `count_speedrun` | int | Speedrun entries | -| `count_review` | int | Review count | -| `review_score` | int | Community review score 0–100 | -| `profile_popular` | int | Popularity rank | - -Additional fields in `__NEXT_DATA__` game page only: - -| Field | Description | -|-------|-------------| -| `comp_main_med/avg/l/h` | Median / average / low / high for main time | -| `comp_plus_med/avg/l/h` | Same for Main + Extras | -| `comp_100_med/avg/l/h` | Same for Completionist | -| `comp_speed` | Speedrun any% average seconds | -| `comp_speed_min/max/med` | Speedrun spread | -| `comp_speed100` | 100% speedrun average | -| `comp_speed_count` | Speedrun submission count | -| `comp_lvl_spd` | 1 if speedrun data exists | -| `profile_dev` | Developer name | -| `profile_pub` | Publisher name | -| `profile_genre` | Comma-separated genres | -| `profile_steam` | Steam App ID (0 if not on Steam) | -| `release_world` | Full release date `"YYYY-MM-DD"` | -| `rating_esrb` | ESRB rating string (may be empty) | -| `count_replay` | Times replayed | -| `count_total` | Total user entries | - ---- - -## Anti-bot measures - -- **Cloudflare** is present (confirmed by `CF-Ray` response header), but does not block plain HTTP with a browser UA. -- **Token system**: Every search requires a fresh token from `/api/find/init`. 
Token encodes `timestamp::IP|UA|hpKey|hmacHash`. The server validates that the UA used to fetch the token matches the UA used in the search POST. -- **Honeypot field**: `hpKey` and `hpVal` from the init response must appear as a top-level field in the POST body (e.g., `{"ign_7671546b": "a6679ea54598d502", ...}`). The key name rotates per request. -- **Required headers on search POST**: `Origin: https://howlongtobeat.com` AND `Referer: https://howlongtobeat.com/` — missing either causes HTTP 403 or 404. `x-auth-token`, `x-hp-key`, `x-hp-val` are also required. -- **Required header on init GET**: `Referer: https://howlongtobeat.com/` — missing causes HTTP 403. -- **Token reuse**: A single token works for multiple searches and multiple pages. No per-request token fetch needed. -- **No CAPTCHA** observed during testing with standard UA strings. -- **Rate limits**: Not triggered during testing (token fetches + 10+ searches sequentially). Fetching many game pages in parallel (5 workers) worked without 429s. - ---- - -## Gotchas - -- **Completion times are in seconds** — all `comp_*` fields are integer seconds. Divide by 3600 for hours. `0` means no data (not 0 hours). - -- **`release_world` is a year int in search, a full date in game page** — in the `/api/find` response, `release_world` is an integer year (e.g., `2015`). In `__NEXT_DATA__` on the game page, it's `"2015-05-19"`. - -- **UA fingerprinting** — the token from `/api/find/init` encodes the User-Agent. The search POST must use the identical UA that fetched the token, or you'll get HTTP 403. Since `http_get` sends `Mozilla/5.0`, use that same string for the search POST. - -- **Honeypot key name rotates** — `hpKey` is something like `ign_7671546b` (changes each token fetch). Always read it from the init response and use it dynamically. Never hardcode it. - -- **Both `x-hp-key`/`x-hp-val` headers AND the body field are required** — the server checks the request headers (`x-hp-key`, `x-hp-val`) against the dynamic key in the POST body. If either is wrong or missing, you get HTTP 404 (wrong body value) or HTTP 403 (missing/wrong header). - -- **`game_type` in search results** — can be `"game"`, `"dlc"`, `"expansion"`, or `"hack"`. Search results mix these by default. Filter with `if g['game_type'] == 'game'` if you only want base games. - -- **Games with no submission data** — `comp_main`, `comp_plus`, `comp_100` are `0` (not `None`) when no users have submitted times. Always check `if g['comp_main']:` before dividing. - -- **`individuality` (per-platform) data** — available only in `__NEXT_DATA__` on the game page, not in search results. `comp_main` etc. are strings, not ints, in this sub-object — cast with `int(plat['comp_main'])`. - -- **`profile_platform` in search** — a comma-separated string that HLTB displays. Not structured. Use game page `individuality` for per-platform time breakdowns. - -- **Token expiry** — if a long-running loop gets HTTP 403 with `{"error":"Session expired or invalid fingerprint"}`, call `get_token()` again and retry. Token lifetime appears to be ~15 minutes based on the timestamp embedded in the decoded value. - -- **No slug-based URLs** — HLTB uses integer `game_id` for all game pages, not slugs. There is no `title-to-slug` mapping; use search to find the `game_id` first. - -- **`sortCategory` options** — `"popular"` ranks by community engagement (best for "top result = intended game"). `"name"` sorts alphabetically. 
Other values (`"madnessTime"`, `"mainThenExtras"`) exist but return same results as `"name"` in testing. diff --git a/packages/bcode-browser/harness/agent-workspace/domain-skills/imdb/scraping.md b/packages/bcode-browser/harness/agent-workspace/domain-skills/imdb/scraping.md deleted file mode 100644 index 10e15f9de..000000000 --- a/packages/bcode-browser/harness/agent-workspace/domain-skills/imdb/scraping.md +++ /dev/null @@ -1,271 +0,0 @@ -# IMDb — Charts, Search, and "More Like This" Scraping - -`https://www.imdb.com` — the Internet Movie Database. Field-tested on 2026-04-24 against `chart/top`, `chart/moviemeter`, `find/?s=tt&q=`, and `title/tt{id}/` pages. - -IMDb's app shell is React with a shared design system (`ipc-*` classes). The same `li.ipc-metadata-list-summary-item` row primitive is reused across Top 250, MovieMeter, Search, and most other list pages — learn one selector set, scrape many pages. - -The `tt`-prefixed title ID in the URL (`/title/tt0111161/`) is IMDb's stable primary key. Titles, prefixes, rankings, and CSS class hashes change between releases; `tt`-ids do not. Always dedupe by `tt`-id. - ---- - -## Access path decision table - -| Goal | Method | Page | Notes | -|------|--------|------|-------| -| Top 250 films (ranked) | browser | `/chart/top` | 250 rows, fully rendered server-side | -| MovieMeter (trending top 100) | browser | `/chart/moviemeter` | 100 rows, same row structure as Top 250 | -| Keyword/title search | browser | `/find/?s=tt&q=KEYWORD` | `s=tt` restricts to titles | -| "More Like This" recommendations | browser | `/title/tt{id}/` | Lazy-loaded, requires scroll | -| Title metadata (year, runtime, genres) | `http_get` + JSON-LD | `/title/tt{id}/` | The `', - html, re.DOTALL - ): - ld = json.loads(block.strip()) - if ld.get('@type') == 'Product': - ld_product = ld - break - - # --- Info panel table (Status, Platforms, Genre, Tags, Author, etc.) --- - info = {} - panel_m = re.search( - r'class="game_info_panel_widget[^"]*"[^>]*>(.*?)
          ', - html, re.DOTALL - ) - if panel_m: - for row in re.finditer( - r'([^<]+)(.*?)', - panel_m.group(1), re.DOTALL - ): - key = row.group(1).strip() - val = re.sub(r'<[^>]+>', '', row.group(2)).strip() - # Multi-value fields become lists (Tags, Platforms, Genre, Links) - info[key] = [v.strip() for v in val.split(',')] if ',' in val else val - - # --- Cover image --- - cover_m = re.search(r'`. - -```python -import re -from helpers import http_get - -def paginate_listing(base_url, max_pages=10): - """ - Scrape multiple pages from any itch.io browse URL. - base_url: https://itch.io/games/top-rated (no ?page= suffix) - Returns flat list of game dicts. - Stops when HTTP 404 or no found. - """ - all_games = [] - page = 1 - while page <= max_pages: - url = base_url if page == 1 else f"{base_url}?page={page}" - try: - html = http_get(url) - except Exception: - break # 404 = past last page - all_games.extend(parse_game_cards(html)) - if not re.search(r']+rel="next"[^>]*/>', html): - break - page += 1 - return all_games - -# Confirmed: page 1 has -# page 2 has and -# past last page returns HTTP 404 -# top-rated has at least 200 pages (each 36 games); page 300+ -> 404 -``` - ---- - -## Browse URL patterns - -All confirmed working via `http_get`: - -```python -BASE = "https://itch.io/games" - -# Sort orders -f"{BASE}/top-rated" # all-time top rated (rated by community, 0–5 stars) -f"{BASE}/newest" # most recently published -f"{BASE}/featured" # itch.io staff picks -f"{BASE}/on-sale" # discounted games -f"{BASE}/free" # free games only - -# Genre/tag paths (append .xml for RSS) -f"{BASE}/tag-puzzle" # tag slug — prefix with 'tag-' -f"{BASE}/genre-action" # genre — prefix with 'genre-' (less common) - -# Combine: tag + sort via separate pages (no combined URL that survives http_get) -# Note: https://itch.io/games/top-rated/tag-puzzle -> HTTP 403 -# Note: ?tag= query param does NOT filter server-side (returns same games) - -# Pagination -f"{BASE}/top-rated?page=2" -f"{BASE}/tag-puzzle?page=3" - -# RSS equivalents (36 items, no pagination needed for small sets) -f"{BASE}/top-rated.xml" -f"{BASE}/tag-puzzle.xml" -f"{BASE}/tag-puzzle.xml?page=2" - -# Search (54 results/page, no server-side pagination beyond page 1 via http_get) -"https://itch.io/search?q=platformer" - -# Author profile -"https://.itch.io" -``` - ---- - -## API (requires key) - -itch.io has an official REST API. A free key is issued per-account with no rate limit published. -Get one at: `https://itch.io/user/settings/api-keys` - -Base URL: `https://itch.io/api/1//` - -```python -import json -from helpers import http_get - -ITCH_KEY = "your_api_key_here" # from https://itch.io/user/settings/api-keys - -def api(path): - return json.loads(http_get(f"https://itch.io/api/1/{ITCH_KEY}/{path}")) - -# Authenticated user info -api("me") -# -> {"user": {"id": ..., "username": "...", "url": "...", "display_name": "...", ...}} - -# Games owned by authenticated user -api("my-games") -# -> {"games": [{"id": ..., "title": "...", "url": "...", "created_at": "...", -# "published": true/false, "min_price": 0, ...}, ...]} - -# Download keys for a game (owner only) -api("game/434554/download_keys") - -# Credentials (for authenticated purchases) -api("game/434554/credentials") -``` - -**Error structure:** invalid/missing key returns `{"errors": ["invalid key"]}` with HTTP 200. -Non-existent endpoints return HTTP 404. - -**No unauthenticated game lookup API.** `https://itch.io/api/1/x/games` -> HTTP 404. 
-Use HTML scraping or RSS for unauthenticated game data.
-
----
-
-## Gotchas
-
-1. **Attribute order flips page 1 vs 2+.** On page 1, game cards use `class="game_cell ..." data-game_id="..."`. On pages 2+, the order is `data-game_id="..." class="game_cell ..."`. Always match `data-game_id` independently of class ordering.
-
-2. **Ratings absent on tag/genre listing pages.** The `data-tooltip` with rating is often missing from card HTML on `/games/tag-*` pages even though the game has ratings. Fetch the detail page for `aggregateRating` via JSON-LD.
-
-3. **`price_value` absent = Free.** Paid games have a `price_value` element in the card markup. Free games have no such element. Default to `'Free'` when absent.
-
-4. **Free-game JSON-LD has no `offers` block.** Only paid games include the `offers` object. For free games, use absence of `offers` as the signal, not presence of `price: 0`.
-
-5. **`/games/top-rated/tag-puzzle` returns HTTP 403.** Cannot combine sort + tag in a path. Use separate `/games/tag-puzzle` (top-rated is the default sort anyway).
-
-6. **`?tag=` query param is ignored server-side.** `https://itch.io/games/top-rated?tag=puzzle` returns the same games as plain `/games/top-rated`. Use the `/games/tag-puzzle` path instead.
-
-7. **Download/purchase counts are not public.** No count field appears anywhere in the public HTML, JSON-LD, RSS, or unauthenticated API. Game owners see their stats in the dashboard only.
-
-8. **Search beyond page 1 is AJAX-only.** `https://itch.io/search?q=X&page=2` via `http_get` returns the same 54 results as page 1. To get more search results use the browser and scroll/click "load more".
-
-9. **RSS is capped at 36 items per page.** Paginate with `?page=N`. Very high page numbers (300+) return HTTP 404 on browse pages.
-
-10. **Unicode zero-width space in some titles.** `\u200b` (zero-width space) appears at the start of certain titles (e.g. "Our Life: Beginnings & Always"). `.strip()` alone won't remove it; strip it explicitly with `title.replace('\u200b', '').strip()`.
diff --git a/packages/bcode-browser/harness/agent-workspace/domain-skills/job-boards/indeed-glassdoor.md b/packages/bcode-browser/harness/agent-workspace/domain-skills/job-boards/indeed-glassdoor.md
deleted file mode 100644
index 7d14bf5ba..000000000
--- a/packages/bcode-browser/harness/agent-workspace/domain-skills/job-boards/indeed-glassdoor.md
+++ /dev/null
@@ -1,1021 +0,0 @@
-# Job Boards — Indeed, Glassdoor, Stepstone
-
-Covers: `indeed.com`, `glassdoor.com`, `stepstone.de`
-
----
-
-## Do this first: construct search URLs directly
-
-Never type into the search box on the homepage — bot detection triggers immediately. Build search URLs directly and navigate straight to results.
- -```python -from urllib.parse import quote_plus - -# Indeed — English (US) -query, location = "Python developer", "San Francisco" -goto_url(f"https://www.indeed.com/jobs?q={quote_plus(query)}&l={quote_plus(location)}") -wait_for_load() -wait(2) - -# Indeed — last 24 hours -goto_url(f"https://www.indeed.com/jobs?q={quote_plus(query)}&l={quote_plus(location)}&fromage=1") -wait_for_load() -wait(2) - -# Glassdoor — public search (no login required for result cards) -goto_url(f"https://www.glassdoor.com/Job/jobs.htm?sc.keyword={quote_plus(query)}") -wait_for_load() -wait(2) - -# Stepstone (Germany) -keyword, city = "Data Scientist", "Berlin" -goto_url(f"https://www.stepstone.de/jobs/{quote_plus(keyword)}/in-{quote_plus(city)}.html") -wait_for_load() -wait(2) -``` - ---- - -## URL patterns - -### Indeed - -| Goal | URL pattern | -|---|---| -| Keyword + location | `/jobs?q={title}&l={location}` | -| Last 24 hours | `/jobs?q={title}&l={location}&fromage=1` | -| Last 3 days | `/jobs?q={title}&l={location}&fromage=3` | -| Last week | `/jobs?q={title}&l={location}&fromage=7` | -| Remote only | `/jobs?q={title}&remotejob=032b3046-06a3-4876-8dfd-474eb5e7ed11` | -| Full-time only | `/jobs?q={title}&l={location}&jt=fulltime` | -| Part-time | `/jobs?q={title}&l={location}&jt=parttime` | -| With salary | `/jobs?q={title}&l={location}&rbl=%24{min}%2B` | -| Page 2 (results 11-20) | append `&start=10` | -| Page 3 (results 21-30) | append `&start=20` | -| Job detail page | `https://www.indeed.com/viewjob?jk={job_key}` | - -**Indeed country variants**: `.co.uk`, `.de`, `.fr`, `.com.au` — same URL structure, different base domain. - -### Glassdoor - -| Goal | URL pattern | -|---|---| -| Keyword search | `/Job/jobs.htm?sc.keyword={title}` | -| Keyword + city name | `/Job/jobs.htm?sc.keyword={title}&locT=C&locKeyword={city}` | -| Remote filter | `/Job/jobs.htm?sc.keyword={title}&remoteWorkType=1` | -| Next page | append `&p=2`, `&p=3` | - -### Stepstone (Germany) - -| Goal | URL pattern | -|---|---| -| Keyword in city | `/jobs/{keyword}/in-{city}.html` | -| Page 2 | `/jobs/{keyword}/in-{city}/page-2.html` | -| Page 3 | `/jobs/{keyword}/in-{city}/page-3.html` | -| Full-time | `/jobs/{keyword}/in-{city}.html?of=1` | - -For Stepstone, keyword and city go directly in the path — encode spaces as `-`: -```python -kw_path = keyword.replace(" ", "-") -city_path = city.replace(" ", "-") -goto_url(f"https://www.stepstone.de/jobs/{kw_path}/in-{city_path}.html") -``` - ---- - -## Cookie / consent banner dismissal - -Indeed (EU/UK) and Glassdoor show GDPR consent overlays. Dismiss before extraction. - -```python -def dismiss_cookie_banner(): - """Try common consent button patterns. 
Safe to call even if no banner is present.""" - dismissed = js(""" - (function() { - // Indeed: "Accept all cookies" button - var selectors = [ - 'button[id*="onetrust-accept"]', - 'button[id*="accept-all"]', - '#onetrust-accept-btn-handler', - 'button[data-testid="cookie-consent-accept"]', - // Glassdoor: consent modal - 'button[data-test="accept-cookies"]', - // Generic patterns - 'button[class*="accept"]', - 'button[class*="consent"]', - ]; - for (var i = 0; i < selectors.length; i++) { - var btn = document.querySelector(selectors[i]); - if (btn && btn.offsetParent !== null) { - btn.click(); - return selectors[i]; - } - } - return null; - })() - """) - if dismissed: - wait(1) - return dismissed -``` - -Call immediately after `wait_for_load()` on `.co.uk`, `.de`, or `glassdoor.com`: - -```python -goto_url("https://www.indeed.co.uk/jobs?q=Python+developer&l=London") -wait_for_load() -wait(2) -dismiss_cookie_banner() -wait(1) -``` - ---- - -## Workflow 1: Indeed — search result card extraction - -Each result card on Indeed carries a `data-jk` attribute (the job key). Use it to construct direct URLs. - -```python -import json -from urllib.parse import quote_plus - -query, location = "machine learning engineer", "New York" -goto_url(f"https://www.indeed.com/jobs?q={quote_plus(query)}&l={quote_plus(location)}") -wait_for_load() -wait(2) -dismiss_cookie_banner() - -jobs = js(""" -(function() { - // Cards live in
          or
        4. with data-jk attribute - var cards = document.querySelectorAll('[data-jk]'); - var out = []; - for (var i = 0; i < cards.length; i++) { - var c = cards[i]; - var jk = c.getAttribute('data-jk') || ''; - if (!jk) continue; - - // Title - var titleEl = c.querySelector('h2.jobTitle span[title], h2.jobTitle span:not(.visually-hidden), [data-testid="job-title"]'); - var title = titleEl ? titleEl.innerText.trim() : ''; - - // Company name - var compEl = c.querySelector('[data-testid="company-name"], .companyName, span[data-testid="company-name"]'); - var company = compEl ? compEl.innerText.trim() : ''; - - // Location - var locEl = c.querySelector('[data-testid="text-location"], .companyLocation'); - var location = locEl ? locEl.innerText.trim() : ''; - - // Salary — may not always be present in the card - var salEl = c.querySelector('[data-testid="attribute_snippet_testid"], .salary-snippet-container, .metadata.salary-snippet'); - var salary = salEl ? salEl.innerText.trim() : ''; - - // Posting date / age - var dateEl = c.querySelector('[data-testid="myJobsStateDate"], span.date, .result-link-bar-container .date'); - var posted = dateEl ? dateEl.innerText.trim() : ''; - - // Direct URL via job key - var url = 'https://www.indeed.com/viewjob?jk=' + jk; - - if (title) { - out.push({jk, title, company, location, salary, posted, url}); - } - } - return JSON.stringify(out); -})() -""") - -results = json.loads(jobs) -for r in results: - print(r) -# Typically returns 10–15 cards per page -``` - ---- - -## Workflow 2: Indeed — pagination (multi-page extraction) - -Indeed paginates using `&start=N` where N increments by 10 per page. - -```python -import json -from urllib.parse import quote_plus - -query, location = "data scientist", "remote" -base_url = f"https://www.indeed.com/jobs?q={quote_plus(query)}&l={quote_plus(location)}" - -all_jobs = [] - -for page in range(3): # 3 pages = up to ~30 results - start = page * 10 - url = base_url if start == 0 else f"{base_url}&start={start}" - goto_url(url) - wait_for_load() - wait(2) # mandatory — bot detection is aggressive on rapid loads - - if page == 0: - dismiss_cookie_banner() - - batch_json = js(""" - (function() { - var cards = document.querySelectorAll('[data-jk]'); - var out = []; - for (var i = 0; i < cards.length; i++) { - var c = cards[i]; - var jk = c.getAttribute('data-jk') || ''; - if (!jk) continue; - var titleEl = c.querySelector('h2.jobTitle span[title], [data-testid="job-title"]'); - var compEl = c.querySelector('[data-testid="company-name"], .companyName'); - var locEl = c.querySelector('[data-testid="text-location"], .companyLocation'); - var salEl = c.querySelector('[data-testid="attribute_snippet_testid"], .salary-snippet-container'); - var dateEl = c.querySelector('[data-testid="myJobsStateDate"], span.date'); - out.push({ - jk, - title: titleEl ? titleEl.innerText.trim() : '', - company: compEl ? compEl.innerText.trim() : '', - location: locEl ? locEl.innerText.trim() : '', - salary: salEl ? salEl.innerText.trim() : '', - posted: dateEl ? 
dateEl.innerText.trim() : '', - url: 'https://www.indeed.com/viewjob?jk=' + jk, - }); - } - return JSON.stringify(out.filter(j => j.title)); - })() - """) - - batch = json.loads(batch_json) - if not batch: - break # no results on this page — stop - all_jobs.extend(batch) - -print(f"Collected {len(all_jobs)} jobs across {page+1} pages") -``` - -**For `fromage` (date filter) + pagination**: keep the `fromage` param in the base URL: -```python -base_url = f"https://www.indeed.com/jobs?q={quote_plus(query)}&l={quote_plus(location)}&fromage=1" -``` - ---- - -## Workflow 3: Indeed — job detail page extraction - -Fetch the full job description from the detail page. The `viewjob?jk=` URL is canonical and stable. - -```python -import json, re - -def get_indeed_job_detail(jk: str) -> dict: - """Fetch full job details from an Indeed job key.""" - goto_url(f"https://www.indeed.com/viewjob?jk={jk}") - wait_for_load() - wait(2) - - detail = js(""" - (function() { - // Title - var titleEl = document.querySelector('[data-testid="jobsearch-JobInfoHeader-title"], h1.jobsearch-JobInfoHeader-title'); - var title = titleEl ? titleEl.innerText.trim() : ''; - - // Company - var compEl = document.querySelector('[data-testid="inlineHeader-companyName"] a, [data-company-name="true"]'); - var company = compEl ? compEl.innerText.trim() : ''; - - // Location - var locEl = document.querySelector('[data-testid="inlineHeader-companyLocation"], [data-testid="job-location"]'); - var location = locEl ? locEl.innerText.trim() : ''; - - // Salary — shown when available in header - var salEl = document.querySelector('[data-testid="jobsearch-OtherJobDetailsContainer"] [aria-label*="alary"], #salaryInfoAndJobType span'); - var salary = salEl ? salEl.innerText.trim() : ''; - - // Full job description text - var descEl = document.getElementById('jobDescriptionText'); - var description = descEl ? descEl.innerText.trim() : ''; - - // Job type (Full-time, Part-time, Contract, etc.) - var typeEl = document.querySelector('[data-testid="attribute_snippet_testid"]'); - var jobType = typeEl ? typeEl.innerText.trim() : ''; - - // "Apply on company site" link — external application URL - var externalBtn = document.querySelector('[data-jk][href*="indeed.com/applystart"], a[href*="indeed.com/applystart"]'); - var externalUrl = externalBtn ? externalBtn.href : ''; - - return JSON.stringify({title, company, location, salary, jobType, description, externalUrl}); - })() - """) - return json.loads(detail) - -# Example -detail = get_indeed_job_detail("abc123def456xyz") -print(detail["title"], "—", detail["salary"]) -print(detail["description"][:500]) # first 500 chars -``` - ---- - -## Workflow 4: Glassdoor — search result extraction - -Glassdoor shows a login modal after a few scrolls. Extract cards from the first visible load before triggering that wall. 
- -```python -import json -from urllib.parse import quote_plus - -query = "product manager" -goto_url(f"https://www.glassdoor.com/Job/jobs.htm?sc.keyword={quote_plus(query)}") -wait_for_load() -wait(3) # Glassdoor JS rendering takes longer - -# Dismiss cookie banner if present -dismiss_cookie_banner() - -# Extract cards before any scroll (avoid triggering login modal) -jobs = js(""" -(function() { - // Glassdoor job cards: li[data-jobid] or article[data-id] - var cards = document.querySelectorAll('li[data-jobid], li[class*="JobsList_jobListItem"]'); - if (!cards.length) { - // Fallback: try generic article cards - cards = document.querySelectorAll('[data-test="jobListing"], [id^="job-listing-"]'); - } - var out = []; - for (var i = 0; i < cards.length; i++) { - var c = cards[i]; - - // Job ID (used for canonical URL) - var jobId = c.getAttribute('data-jobid') || c.getAttribute('data-id') || ''; - - // Title - var titleEl = c.querySelector('[data-test="job-title"], a[class*="JobCard_jobTitle"], .job-title'); - var title = titleEl ? titleEl.innerText.trim() : ''; - - // Company - var compEl = c.querySelector('[data-test="employer-name"], [class*="JobCard_employer"], .employer-name'); - var company = compEl ? compEl.innerText.trim() : ''; - - // Location - var locEl = c.querySelector('[data-test="emp-location"], [class*="JobCard_location"], .location'); - var location = locEl ? locEl.innerText.trim() : ''; - - // Salary estimate (not always shown in card) - var salEl = c.querySelector('[data-test="detailSalary"], [class*="salary"], .salaryEstimate'); - var salary = salEl ? salEl.innerText.trim() : ''; - - // Company rating - var ratingEl = c.querySelector('[data-test="rating"], [class*="ratingNumber"], .rating'); - var rating = ratingEl ? ratingEl.innerText.trim() : ''; - - // Canonical URL - var linkEl = c.querySelector('a[href*="/job-listing/"], a[href*="glassdoor.com/job"]'); - var url = linkEl ? linkEl.href : (jobId ? 'https://www.glassdoor.com/job-listing/glassdoor-jl' + jobId + '.htm' : ''); - - if (title) out.push({jobId, title, company, location, salary, rating, url}); - } - return JSON.stringify(out); -})() -""") - -results = json.loads(jobs) -for r in results: - print(r) -``` - -**If `jobs` returns an empty list**, Glassdoor has changed its DOM structure. Take a screenshot and inspect: - -```python -capture_screenshot() -# Look for the actual card selector, then update the querySelectorAll above -``` - ---- - -## Workflow 5: Glassdoor — handling the login wall - -Glassdoor increasingly shows a login modal after viewing a few listings. Detect and dismiss it. 
- -```python -def dismiss_glassdoor_login_modal(): - """Close the Glassdoor sign-in / register modal if it appears.""" - closed = js(""" - (function() { - // Close button on the modal - var closeBtn = document.querySelector( - '[alt="Close"], button[class*="modal_closeIcon"], [data-test="close-modal"]' - ); - if (closeBtn && closeBtn.offsetParent !== null) { - closeBtn.click(); - return 'closed'; - } - // Sometimes the modal has an X with aria-label - var ariaClose = document.querySelector('[aria-label="Close"]'); - if (ariaClose && ariaClose.offsetParent !== null) { - ariaClose.click(); - return 'aria-closed'; - } - return null; - })() - """) - if closed: - wait(1) - return closed - -# Strategy: extract as much as possible before the modal appears -# If the modal blocks results, dismiss it and try again -result = dismiss_glassdoor_login_modal() -if result: - wait(1) - # Re-run extraction after dismissal -``` - -If the modal is persistent and cannot be closed, switch to Indeed for the same search — it has more accessible public results. - ---- - -## Workflow 6: Stepstone (German) — job extraction - -Stepstone is server-rendered. Most data can be extracted with `http_get` for speed, or via `goto` + `js()` for dynamic content. - -```python -import json, re -from urllib.parse import quote_plus - -keyword = "Sachbearbeiter Einkauf" -city = "Regensburg" - -# Stepstone encodes keyword/city in the path -kw_path = keyword.replace(" ", "-") -city_path = city.replace(" ", "-") - -goto_url(f"https://www.stepstone.de/jobs/{kw_path}/in-{city_path}.html") -wait_for_load() -wait(2) -dismiss_cookie_banner() - -jobs = js(""" -(function() { - // Stepstone result cards - var cards = document.querySelectorAll( - 'article[data-at="job-item"], [data-genesis-element="JOB_CARD"], article.sc-fhzFiK' - ); - var out = []; - for (var i = 0; i < cards.length; i++) { - var c = cards[i]; - - // Title - var titleEl = c.querySelector('h2[data-at="job-item-title"] a, [data-at="job-title"], .listing__title a'); - var title = titleEl ? titleEl.innerText.trim() : ''; - var url = titleEl ? (titleEl.href || '') : ''; - - // Company - var compEl = c.querySelector('[data-at="job-item-company-name"], [data-at="company-name"], .listing__company'); - var company = compEl ? compEl.innerText.trim() : ''; - - // Location - var locEl = c.querySelector('[data-at="job-item-location"], .listing__location'); - var location = locEl ? locEl.innerText.trim() : ''; - - // Posting date - var dateEl = c.querySelector('[data-at="job-posting-date"], time, .listing__date'); - var posted = dateEl ? 
(dateEl.getAttribute('datetime') || dateEl.innerText.trim()) : ''; - - if (title) out.push({title, company, location, posted, url}); - } - return JSON.stringify(out); -})() -""") - -results = json.loads(jobs) -for r in results: - print(r) -``` - -### Stepstone pagination - -```python -import json - -all_jobs = [] -for page in range(1, 4): # pages 1-3 - if page == 1: - url = f"https://www.stepstone.de/jobs/{kw_path}/in-{city_path}.html" - else: - url = f"https://www.stepstone.de/jobs/{kw_path}/in-{city_path}/page-{page}.html" - - goto_url(url) - wait_for_load() - wait(2) - - if page == 1: - dismiss_cookie_banner() - - batch_json = js(""" - (function() { - var cards = document.querySelectorAll('article[data-at="job-item"], [data-genesis-element="JOB_CARD"]'); - var out = []; - for (var i = 0; i < cards.length; i++) { - var c = cards[i]; - var titleEl = c.querySelector('[data-at="job-item-title"] a, [data-at="job-title"]'); - var compEl = c.querySelector('[data-at="job-item-company-name"]'); - var locEl = c.querySelector('[data-at="job-item-location"]'); - var dateEl = c.querySelector('time'); - out.push({ - title: titleEl ? titleEl.innerText.trim() : '', - company: compEl ? compEl.innerText.trim() : '', - location: locEl ? locEl.innerText.trim() : '', - posted: dateEl ? dateEl.getAttribute('datetime') || dateEl.innerText.trim() : '', - url: titleEl ? titleEl.href : '', - }); - } - return JSON.stringify(out.filter(j => j.title)); - })() - """) - - batch = json.loads(batch_json) - if not batch: - break - all_jobs.extend(batch) - -print(f"Stepstone: {len(all_jobs)} jobs collected") -``` - ---- - -## Indeed job key (jk) — direct URL construction - -Indeed search result links go through a tracking redirect. **Do not use those redirect URLs.** Instead, extract the `data-jk` attribute directly for the stable canonical URL. - -```python -# Correct approach: extract data-jk from the card -job_keys = js(""" -JSON.stringify( - Array.from(document.querySelectorAll('[data-jk]')) - .map(el => el.getAttribute('data-jk')) - .filter(jk => jk && jk.length > 0) - .filter((jk, i, arr) => arr.indexOf(jk) === i) // dedupe -) -""") -import json -jks = json.loads(job_keys) - -# Canonical job detail URL for any job key: -for jk in jks: - direct_url = f"https://www.indeed.com/viewjob?jk={jk}" - print(direct_url) -``` - -If you already have a redirect URL and need to extract the `jk` from it: - -```python -import re -def extract_jk(url: str) -> str | None: - m = re.search(r'[?&]jk=([a-f0-9]+)', url) - return m.group(1) if m else None -``` - ---- - -## Salary extraction and normalization - -Salary appears in different places and formats depending on the job and site. - -### Indeed salary patterns - -```python -import re - -def parse_indeed_salary(raw: str) -> dict: - """ - Parse Indeed salary strings like: - "$85,000 - $110,000 a year" - "Up to $65 an hour" - "$25 - $30 an hour" - "From $120,000 a year" - "Employer est.: $90,000 - $120,000 a year" - Returns: {low, high, period, source} - """ - if not raw: - return {"raw": raw, "low": None, "high": None, "period": None, "source": None} - - source = None - if "Employer est." in raw: - source = "employer" - raw = raw.replace("Employer est.:", "").strip() - elif "Glassdoor est." 
in raw: - source = "glassdoor" - raw = raw.replace("Glassdoor est.:", "").strip() - - raw_clean = raw.replace(",", "") - - # Period - period = None - if "a year" in raw or "per year" in raw or "/yr" in raw: - period = "year" - elif "an hour" in raw or "per hour" in raw or "/hr" in raw: - period = "hour" - elif "a month" in raw or "per month" in raw: - period = "month" - - # Range: two dollar amounts - range_m = re.findall(r'\$?([\d]+(?:\.\d+)?)', raw_clean) - low = float(range_m[0]) if len(range_m) >= 1 else None - high = float(range_m[1]) if len(range_m) >= 2 else low - - return {"raw": raw, "low": low, "high": high, "period": period, "source": source} - -# Examples -parse_indeed_salary("$85,000 - $110,000 a year") -# -> {"low": 85000.0, "high": 110000.0, "period": "year", "source": None} - -parse_indeed_salary("Employer est.: $90,000 - $120,000 a year") -# -> {"low": 90000.0, "high": 120000.0, "period": "year", "source": "employer"} - -parse_indeed_salary("Up to $65 an hour") -# -> {"low": 65.0, "high": 65.0, "period": "hour", "source": None} -``` - -### Glassdoor salary note - -Glassdoor shows two types of salary estimates: -- **"Employer est."** — the company provided a range in the job post -- **"Glassdoor est."** — Glassdoor estimated based on similar roles; shown with "(est.)" in the card - -Both are shown as text inside the card. Parse the same way as Indeed. - -If the salary is absent in the search result card, it is only available on the job detail page (requires a click through to the individual listing). - ---- - -## Date normalization ("3 days ago" → actual date) - -All three sites use relative timestamps. Convert to absolute dates when needed. - -```python -import re -from datetime import datetime, timedelta - -def parse_relative_date(text: str, reference_date: datetime = None) -> datetime | None: - """ - Convert relative job posting dates to datetime objects. - Handles: "Just posted", "Today", "1 day ago", "3 days ago", "30+ days ago" - """ - if reference_date is None: - reference_date = datetime.utcnow() - - text = text.strip().lower() - - if not text or text in ("", "unknown"): - return None - if text in ("just posted", "today", "active today"): - return reference_date - if "hour" in text: - m = re.search(r'(\d+)', text) - hours = int(m.group(1)) if m else 1 - return reference_date - timedelta(hours=hours) - if "day" in text: - m = re.search(r'(\d+)', text) - days = int(m.group(1)) if m else 1 - return reference_date - timedelta(days=days) - if "week" in text: - m = re.search(r'(\d+)', text) - weeks = int(m.group(1)) if m else 1 - return reference_date - timedelta(weeks=weeks) - if "month" in text: - m = re.search(r'(\d+)', text) - months = int(m.group(1)) if m else 1 - return reference_date - timedelta(days=months * 30) - if "30+" in text: - return reference_date - timedelta(days=30) - - return None # unparseable - -# Examples -parse_relative_date("3 days ago") # datetime ~3 days before now -parse_relative_date("Just posted") # datetime.utcnow() -parse_relative_date("30+ days ago") # datetime 30 days ago -``` - ---- - -## Workflow 7: Fast bulk extraction with `http_get` (no browser) - -For Indeed, the raw HTML of search results contains structured JSON in a `window.mosaic.providerData` script tag. This is faster and more reliable than DOM extraction. - -```python -import json, re -from urllib.parse import quote_plus - -def indeed_http_search(query: str, location: str = "", fromage: int = 0, start: int = 0) -> list[dict]: - """ - Extract Indeed jobs via HTTP (no browser). 
Parses the embedded JSON payload. - Returns up to ~15 jobs per call. - """ - params = f"q={quote_plus(query)}&l={quote_plus(location)}&start={start}" - if fromage: - params += f"&fromage={fromage}" - - html = http_get( - f"https://www.indeed.com/jobs?{params}", - headers={ - "Accept-Language": "en-US,en;q=0.9", - "Accept": "text/html,application/xhtml+xml", - } - ) - - # Check for CAPTCHA before parsing - if "captcha" in html.lower() or "robot check" in html.lower(): - return [] # fall back to browser-based extraction - - # Indeed embeds job data in window.mosaic.providerData["mosaic-provider-jobcards"] - m = re.search( - r'window\.mosaic\.providerData\["mosaic-provider-jobcards"\]\s*=\s*(\{.*?\});', - html, re.DOTALL - ) - if not m: - return [] - - try: - data = json.loads(m.group(1)) - except json.JSONDecodeError: - return [] - - results_list = ( - data - .get("metaData", {}) - .get("mosaicProviderJobCardsModel", {}) - .get("results", []) - ) - - jobs = [] - for r in results_list: - jk = r.get("jobkey", "") - jobs.append({ - "jk": jk, - "title": r.get("title", ""), - "company": r.get("company", ""), - "location": r.get("formattedLocation", ""), - "salary": r.get("salarySnippet", {}).get("text", ""), - "posted": r.get("formattedRelativeTime", ""), - "url": f"https://www.indeed.com/viewjob?jk={jk}", - "snippet": r.get("snippet", ""), # short description preview - }) - return jobs - -# Example — last 24h remote jobs -jobs = indeed_http_search("software engineer", "remote", fromage=1) -for j in jobs: - print(j["title"], "|", j["company"], "|", j["salary"]) -``` - -If `http_get` returns 0 results (CAPTCHA or structure change), fall back to the `goto` + `js()` browser workflow above. - ---- - -## Workflow 8: "Easy Apply" vs external application detection - -Some Indeed listings apply on Indeed directly ("Easy Apply") while others redirect to the company site. Detect which type before deciding what to do. - -```python -def get_application_type(jk: str) -> dict: - """Returns {type: 'easy_apply'|'external'|'unknown', external_url: str|None}""" - goto_url(f"https://www.indeed.com/viewjob?jk={jk}") - wait_for_load() - wait(2) - - return js(""" - (function() { - // "Apply now" button pointing to /applystart = indeed-hosted Easy Apply - var easyBtn = document.querySelector( - 'button[data-testid="applyButton"], [id="indeedApplyButton"], button[class*="IndeedApplyButton"]' - ); - // "Apply on company site" button - var extBtn = document.querySelector( - 'a[data-testid="applyButton"][href*="indeed.com/applystart"], a[href*="indeed.com/applystart"]' - ); - // External redirect — check the main CTA - var mainCta = document.querySelector('[data-testid="applyButton"]'); - var ctaHref = mainCta ? mainCta.href : ''; - - if (easyBtn && !ctaHref.includes('apply.indeed')) { - return {type: 'easy_apply', externalUrl: null}; - } - if (extBtn || (ctaHref && !ctaHref.includes('indeed.com/viewjob'))) { - return {type: 'external', externalUrl: ctaHref || null}; - } - return {type: 'unknown', externalUrl: null}; - })() - """) -``` - ---- - -## Bot detection and rate limiting - -Indeed and Glassdoor have active bot detection. Violating these limits leads to CAPTCHA walls, IP blocks, or silently degraded results (cards with empty fields). 
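-
-A small wrapper can apply these rules on every navigation. This is a sketch, not upstream code: it reuses the `INTER_PAGE_WAIT` constant and `is_captcha_page()` helper defined in the next two subsections, and `polite_goto` and the 10-second back-off are illustrative names and values to tune.
-
-```python
-def polite_goto(url: str, retry_wait: float = 10.0) -> bool:
-    """Navigate with the safe cadence; retry once if a CAPTCHA/block page appears."""
-    goto_url(url)
-    wait_for_load()
-    wait(INTER_PAGE_WAIT)      # cadence constant from "Safe request cadence" below
-    if is_captcha_page():      # helper from "CAPTCHA detection" below
-        capture_screenshot()   # keep evidence of the block page
-        wait(retry_wait)       # back off once (assumed value), then retry
-        goto_url(url)
-        wait_for_load()
-        wait(INTER_PAGE_WAIT)
-    return not is_captcha_page()
-```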
- -### Safe request cadence - -```python -# Minimum wait between page loads -INTER_PAGE_WAIT = 2.5 # seconds — don't go below 2 - -# Between job detail page fetches -INTER_DETAIL_WAIT = 3.0 # seconds - -# http_get concurrency limit -MAX_HTTP_CONCURRENT = 2 # never more than 2 at once for Indeed/Glassdoor -``` - -### CAPTCHA detection - -```python -def is_captcha_page() -> bool: - """Check if the current page is a CAPTCHA or block page.""" - url = page_info()["url"] - title = js("document.title") or "" - body_text = js("document.body ? document.body.innerText.substring(0, 500) : ''") or "" - - signals = [ - "captcha" in url.lower(), - "robot" in title.lower(), - "are you a human" in body_text.lower(), - "verify you are human" in body_text.lower(), - "unusual traffic" in body_text.lower(), - "indeed.com/error" in url, - "sorry" in title.lower() and "indeed" in url, - ] - return any(signals) - -# Use after every goto: -goto_url(some_url) -wait_for_load() -wait(2) -if is_captcha_page(): - capture_screenshot() - # Wait longer and retry once - wait(10) - goto_url(some_url) - wait_for_load() - wait(3) -``` - -### Glassdoor session hygiene - -Glassdoor's bot detection is more fingerprint-based. If results stop loading: - -1. Take a `capture_screenshot()` — confirm whether it is a login modal vs a block page -2. Dismiss any login modal first (`dismiss_glassdoor_login_modal()`) -3. If a block page appears, pause 30+ seconds before retrying -4. Switch to Indeed for the same query — results are similar and bot tolerance is higher - ---- - -## Filtering by date, job type, and salary - -### Indeed URL filter parameters - -```python -from urllib.parse import quote_plus - -def build_indeed_url( - query: str, - location: str = "", - fromage: int = 0, # days: 1=last 24h, 3=last 3 days, 7=last week - job_type: str = "", # "fulltime", "parttime", "contract", "internship", "temporary" - remote: bool = False, - start: int = 0, -) -> str: - base = f"https://www.indeed.com/jobs?q={quote_plus(query)}&l={quote_plus(location)}" - if fromage: - base += f"&fromage={fromage}" - if job_type: - base += f"&jt={job_type}" - if remote: - base += "&remotejob=032b3046-06a3-4876-8dfd-474eb5e7ed11" - if start: - base += f"&start={start}" - return base - -# Examples -url = build_indeed_url("backend engineer", "Austin, TX", fromage=7, job_type="fulltime") -url = build_indeed_url("data analyst", remote=True, fromage=1) -``` - ---- - -## Collecting N results across pages - -```python -import json -from urllib.parse import quote_plus - -def collect_indeed_jobs(query: str, location: str = "", max_results: int = 20, - fromage: int = 0, job_type: str = "") -> list[dict]: - """ - Collect up to max_results jobs from Indeed across multiple pages. - Waits between pages to avoid bot detection. 
- """ - all_jobs = [] - seen_jks = set() - page = 0 - - while len(all_jobs) < max_results: - start = page * 10 - url = build_indeed_url(query, location, fromage=fromage, job_type=job_type, start=start) - goto_url(url) - wait_for_load() - wait(2.5) - - if page == 0: - dismiss_cookie_banner() - - if is_captcha_page(): - print(f"CAPTCHA on page {page+1}, stopping") - break - - batch_json = js(""" - (function() { - var cards = document.querySelectorAll('[data-jk]'); - var out = []; - for (var i = 0; i < cards.length; i++) { - var c = cards[i]; - var jk = c.getAttribute('data-jk') || ''; - if (!jk) continue; - var titleEl = c.querySelector('h2.jobTitle span[title], [data-testid="job-title"]'); - var compEl = c.querySelector('[data-testid="company-name"], .companyName'); - var locEl = c.querySelector('[data-testid="text-location"], .companyLocation'); - var salEl = c.querySelector('[data-testid="attribute_snippet_testid"], .salary-snippet-container'); - var dateEl = c.querySelector('[data-testid="myJobsStateDate"], span.date'); - out.push({ - jk, - title: titleEl ? titleEl.innerText.trim() : '', - company: compEl ? compEl.innerText.trim() : '', - location: locEl ? locEl.innerText.trim() : '', - salary: salEl ? salEl.innerText.trim() : '', - posted: dateEl ? dateEl.innerText.trim() : '', - url: 'https://www.indeed.com/viewjob?jk=' + jk, - }); - } - return JSON.stringify(out.filter(j => j.title && j.jk)); - })() - """) - - batch = json.loads(batch_json) - if not batch: - break # no more results - - new_jobs = [j for j in batch if j["jk"] not in seen_jks] - seen_jks.update(j["jk"] for j in new_jobs) - all_jobs.extend(new_jobs) - page += 1 - - return all_jobs[:max_results] - -# Examples -jobs = collect_indeed_jobs("Python developer", "San Francisco", max_results=20) -jobs = collect_indeed_jobs("remote software engineer", fromage=1, max_results=10) -jobs = collect_indeed_jobs("machine learning engineer", max_results=30, fromage=7, job_type="fulltime") -``` - ---- - -## Gotchas - -- **`data-jk` is the job key, not a DOM id** — Always use `[data-jk]` to select cards, not `#job-...` ids which vary by page layout and A/B test variant. - -- **Indeed redirect links are NOT stable URLs** — Anchor `href` values in search results go through `https://www.indeed.com/rc/clk?...` tracking redirects which expire. Always extract `data-jk` from the card and construct `https://www.indeed.com/viewjob?jk={jk}` yourself. - -- **Salary is on the detail page, not the card** — Many listings show no salary in the search result card. If salary is required, fetch the individual `viewjob?jk=` page and extract it there. Budget `wait(3)` per detail page and do not fetch more than 5 detail pages per minute. - -- **"Employer est." vs "Glassdoor est."** — These are two distinct data signals. Employer estimates come from the job post itself; Glassdoor estimates are crowd-sourced. The distinction matters when reporting salary accuracy to users. - -- **Glassdoor login modal appears after 2-3 scrolls** — Extract all visible cards immediately on load before scrolling. If you need to load more results via scroll/infinite scroll, dismiss the modal first. - -- **Glassdoor public results are limited** — Without login, Glassdoor shows ~10-15 cards. If the task requires 30+ results, use Indeed instead (no login required, up to ~15 per page with full pagination). - -- **Stepstone uses path-based URL routing, not query params** — Spaces in keyword or city must be replaced with `-` for the path, not `%20` or `+`. 
`quote_plus()` is wrong for path segments. Use `.replace(" ", "-")`. - -- **Stepstone pagination is in the path** — `/page-2.html`, `/page-3.html` — not `?page=2`. There is no `&start=N` param as in Indeed. - -- **`http_get` for Glassdoor fails more often** — Glassdoor requires JS to render job cards. Use the browser path for Glassdoor. `http_get` only works reliably for Indeed and Stepstone where server-rendered HTML contains structured data. - -- **Indeed embeds JSON in a `', html, re.DOTALL) - for block in jsonld_raw: - # Strip CDATA wrapper that Letterboxd wraps around JSON-LD - cleaned = re.sub(r'/\*\s*.*?\*/', '', cleaned, flags=re.DOTALL) - try: - data = json.loads(cleaned.strip()) - except json.JSONDecodeError: - continue - if data.get('@type') != 'Movie': - continue - - result['title'] = data['name'] - result['year'] = data['releasedEvent'][0]['startDate'] if data.get('releasedEvent') else None - result['directors'] = [d['name'] for d in data.get('director', [])] - result['genres'] = data.get('genre', []) - result['countries'] = [c['name'] for c in data.get('countryOfOrigin', [])] - result['studios'] = [s['name'] for s in data.get('productionCompany', [])] - result['actors'] = [a['name'] for a in data.get('actors', [])] - result['poster_url'] = data.get('image') - result['url'] = data.get('url') - r = data.get('aggregateRating', {}) - result['rating'] = r.get('ratingValue') # float 0.0–5.0 - result['rating_count'] = r.get('ratingCount') # int, total ratings cast - result['review_count'] = r.get('reviewCount') # int, written reviews only - - # --- OG / meta tags (fast fallback, redundant) --- - og = lambda prop: next(iter(re.findall( - rf']+property="og:{prop}"[^>]+content="([^"]*)"', html)), None) - result['og_title'] = og('title') # includes year: "The Godfather (1972)" - result['synopsis'] = htmllib.unescape(og('description') or '') - result['og_image'] = og('image') # large 1200x675 crop - - # --- Film ID (internal numeric ID) --- - m = re.search(r'data-film-id="(\d+)"', html) - result['film_id'] = m.group(1) if m else None - - # --- Tagline --- - m = re.search(r'

          ([^<]+)

          ', html) - result['tagline'] = htmllib.unescape(m.group(1)) if m else None - - # --- Themes (from tab-genres section) --- - m = re.search(r'

          Themes

          .*?

          (.*?)

          ', html, re.DOTALL) - result['themes'] = re.findall(r'class="text-slug">([^<]+)
          ', m.group(1)) if m else [] - - # --- Languages --- - result['languages'] = re.findall(r'href="/films/language/[^/]+/"[^>]*>([^<]+)', html) - - # --- Fans count --- - m = re.search(r'class="accessory"[^>]*>\s*([\d,KkMm]+)\s*fans', html) - result['fans'] = m.group(1) if m else None # e.g. "133K" - - # --- Popular reviews (top 12 inline on the page) --- - result['reviews'] = [] - for vid, person, block in re.findall( - r'
          ]*data-viewing-id="(\d+)"[^>]*data-person="([^"]+)">(.*?)
          ', - html, re.DOTALL - ): - dm = re.search(r'([^<]+)', block) - tm = re.search(r'class="body-text -prose -reset[^"]*"[^>]*>(.*?)
        5. ', block, re.DOTALL) - lm = re.search(r'data-count="(\d+)"', block) - result['reviews'].append({ - 'viewing_id': vid, - 'username': person, - 'display_name': dm.group(1) if dm else person, - 'review': re.sub(r'<[^>]+>', '', tm.group(1)).strip() if tm else '', - 'likes': int(lm.group(1)) if lm else 0, - }) - - return result -``` - -### Verified output (2026-04-18) - -```python -data = extract_film_data('the-godfather') -# { -# 'title': 'The Godfather', -# 'year': '1972', -# 'directors': ['Francis Ford Coppola'], -# 'genres': ['Crime', 'Drama'], -# 'countries': ['USA'], -# 'studios': ['Paramount Pictures', 'Alfran Productions'], -# 'actors': ['Marlon Brando', 'Al Pacino', 'James Caan', ...], # full cast list -# 'rating': 4.52, -# 'rating_count': 2619662, -# 'review_count': 372579, -# 'fans': '133K', -# 'film_id': '51818', -# 'tagline': "An offer you can't refuse.", -# 'genres': ['Crime', 'Drama'], -# 'themes': ['Crime, drugs and gangsters', 'Gritty crime and ruthless gangsters', ...], -# 'languages': ['English', 'Latin', 'English', 'Italian'], # may have dupes; deduplicate -# 'og_title': 'The Godfather (1972)', -# 'synopsis': 'Spanning the years 1945 to 1955...', -# 'poster_url': 'https://a.ltrbxd.com/resized/film-poster/.../51818-the-godfather-0-230-0-345-crop.jpg...', -# 'og_image': 'https://a.ltrbxd.com/resized/sm/upload/.../the-godfather-1200-1200-675-675-crop-000000.jpg...', -# 'reviews': [ -# {'username': 'wizardchurch', 'display_name': 'Hannah', 'likes': 30944, -# 'review': 'haha they made that scene from zootopia into a movie'}, -# ... # 12 total -# ] -# } - -data = extract_film_data('parasite-2019') -# title: 'Parasite', year: '2019', rating: 4.53, rating_count: 5264520, review_count: 690652 -# fans: '175K', directors: ['Bong Joon Ho'], countries: ['South Korea'] - -data = extract_film_data('inception') -# title: 'Inception', year: '2010', rating: 4.23, rating_count: 3913620 -``` - ---- - -## Path 2: User profile via http_get - -Only the user root page `letterboxd.com/{username}/` is accessible. Sub-pages (`/films/`, `/diary/`, `/lists/`) return 403. - -```python -import re, html as htmllib -from helpers import http_get - -def extract_user_profile(username): - html = http_get(f"https://letterboxd.com/{username}/") - - # Display name - dm = re.search(r'class="displayname tooltip"[^>]*>([^<]+)', html) - - # Stats block (Films / This year / Lists / Following / Followers) - stats = re.findall( - r'(\d[\d,]*)' - r'([^<]+)', - html - ) - - # Favorites from OG description - od = re.search(r']+property="og:description"[^>]+content="([^"]*)"', html) - favorites = [] - if od: - fm = re.search(r'Favorites:\s*([^.]+)\.', od.group(1)) - if fm: - favorites = [f.strip() for f in fm.group(1).split(',')] - - # Film IDs of films shown on profile page (recent activity) - film_ids_on_page = list(set(re.findall(r'data-film-id="(\d+)"', html))) - - return { - 'username': username, - 'display_name': dm.group(1) if dm else None, - 'stats': {label.strip(): int(val.replace(',', '')) for val, label in stats}, - 'favorites': favorites, - 'film_ids_on_page': film_ids_on_page, - } -``` - -### Verified output - -```python -data = extract_user_profile('dave') -# { -# 'username': 'dave', -# 'display_name': 'Dave Vis', -# 'stats': {'Films': 2553, 'This year': 63, 'Lists': 155, 'Following': 77, 'Followers': 34512}, -# 'favorites': ['High and Low (1963)', 'Burning (2018)', 'My Neighbor Totoro (1988)', 'Mulholland Drive (2001)'], -# 'film_ids_on_page': ['51818', '47756', ...] 
# ~32 film IDs from recent activity blocks -# } -``` - ---- - -## Path 3: Global activity stream from /films/ - -`letterboxd.com/films/` returns the recent global activity feed — approximately 6 full viewing entries, plus many more film slugs from the UI. Use this to discover recently-logged films. - -```python -import re, html as htmllib -from helpers import http_get - -def extract_activity_stream(): - html = http_get("https://letterboxd.com/films/") - entries = [] - for owner, obj_id, block in re.findall( - r'class="production-viewing[^"]*"[^>]*data-owner="([^"]+)"[^>]*data-object-id="([^"]+)"[^>]*>(.*?)', - html, re.DOTALL - ): - film_m = re.search( - r'data-item-name="([^"]*)".*?data-item-slug="([^"]*)".*?data-film-id="(\d+)"', - block, re.DOTALL - ) - if film_m: - entries.append({ - 'owner': owner, - 'film_name': htmllib.unescape(film_m.group(1)), - 'film_slug': film_m.group(2), - 'film_id': film_m.group(3), - }) - return entries - -# Returns ~6 entries. Film names are in "Title (Year)" format. -# Example: [{'owner': 'sidduww', 'film_name': 'The Drama (2026)', -# 'film_slug': 'the-drama', 'film_id': '1205494'}, ...] -``` - ---- - -## Path 4: Browser for list pages and sub-pages (403 via http_get) - -These pages require the browser — use `goto_url()` + `wait_for_load()` + `wait(2)`: - -```python -from helpers import goto, wait_for_load, wait, js -import json - -# Popular films -goto_url("https://letterboxd.com/films/popular/") -wait_for_load() -wait(2) - -films = json.loads(js(""" -(function() { - var items = Array.from(document.querySelectorAll('li.film-list-entry, li[class*="poster-container"]')); - return JSON.stringify(items.slice(0, 30).map(function(el) { - var poster = el.querySelector('[data-item-slug]') || el.querySelector('[data-film-slug]'); - return { - name: poster ? (poster.dataset.itemName || poster.dataset.filmName) : null, - slug: poster ? (poster.dataset.itemSlug || poster.dataset.filmSlug) : null, - film_id: poster ? poster.dataset.filmId : null - }; - }).filter(function(x){ return x.slug; })); -})() -""")) - -# User watched films list (paginated, 72/page) -goto_url("https://letterboxd.com/dave/films/") -wait_for_load() -wait(2) - -films = json.loads(js(""" -(function() { - var items = Array.from(document.querySelectorAll('li[data-film-id]')); - return JSON.stringify(items.map(function(el) { - return { - film_id: el.dataset.filmId, - film_slug: el.dataset.targetLink ? el.dataset.targetLink.replace(/\\/film\\/|\\/$/g,'') : null, - rating: el.dataset.ownerRating || null - }; - })); -})() -""")) - -# User diary entries -goto_url("https://letterboxd.com/dave/diary/") -wait_for_load() -wait(2) - -# For paginated browsing, check next page link -next_page_url = js(""" -(function() { - var a = document.querySelector('a.next'); - return a ? a.href : null; -})() -""") -# Returns URL for next page or null. Load it with goto_url(next_page_url). -``` - ---- - -## Gotchas - -**JSON-LD is wrapped in CDATA comments** — `json.loads(block)` will fail without stripping the wrapper. Always strip `/* */` first: -```python -cleaned = re.sub(r'/\*\s*.*?\*/', '', cleaned, flags=re.DOTALL) -data = json.loads(cleaned.strip()) -``` - -**JSON-LD `name` is bare title, not "Title (Year)"** — `data['name']` returns `'Parasite'`, not `'Parasite (2019)'`. Year is in `data['releasedEvent'][0]['startDate']`. The OG `og:title` meta tag does include the year. - -**OG description contains HTML entities** — `og:description` and `tagline` use `'` etc. Always call `html.unescape()` on them. 
-
-**`languages` list can have duplicates** — e.g. Parasite returns `['Korean', 'English', 'German', 'Korean']`. Call `list(dict.fromkeys(result['languages']))` to deduplicate while preserving order.
-
-**Disambiguation slugs** — when two films share a title, Letterboxd appends the year to the slug: `parasite-2019` (Bong's film) vs `parasite` (1982 film). If your slug 404s, try appending `-{year}`.
-
-**403 pages** — `/film/{slug}/reviews/`, `/film/{slug}/ratings/`, `/film/{slug}/cast/`, `/film/{slug}/details/`, `/{username}/films/`, `/films/popular/`, `/films/by/rating/`, `/genre/{slug}/`, `/director/{slug}/`, `/actor/{slug}/` all return 403 to `http_get`. These require the browser.
-
-**CSI endpoints are 403** — Letterboxd loads the ratings histogram via `/csi/film/{slug}/rating-histogram/`, which returns 403 without a session cookie. Access the ratings distribution via the browser on `/film/{slug}/ratings/`.
-
-**`/csi/` and `/ajax/` endpoints need session cookies** — these populate the ratings histogram, friend activity, and popular-review sections after page load. Only the inline HTML data (top 12 popular reviews) is available via `http_get`.
-
-**Cloudflare Turnstile is present but passive** — the `configuration.cloudflare.turnstile` object is in the page JS, but it only activates on the login form. It does not block unauthenticated reads on public film/user pages.
-
-**The official API requires OAuth** — `api.letterboxd.com/api/v0/` returns 401 on all endpoints. Apply for API access at letterboxd.com/api-beta/ to get client credentials.
-
-**Fans count is abbreviated** — `'133K'`, `'175K'`. Parse with:
-```python
-def parse_abbrev(s):
-    s = s.strip().upper()
-    if s.endswith('K'): return int(float(s[:-1]) * 1000)
-    if s.endswith('M'): return int(float(s[:-1]) * 1000000)
-    return int(s.replace(',', ''))
-```
-
-**Film slug from unknown title** — Letterboxd has no public search API. Construct the slug by lowercasing the title and replacing spaces with hyphens, then `http_get` and check for a 403/404 vs a valid JSON-LD block.
diff --git a/packages/bcode-browser/harness/agent-workspace/domain-skills/linkedin/invitation-manager.md b/packages/bcode-browser/harness/agent-workspace/domain-skills/linkedin/invitation-manager.md
deleted file mode 100644
index 1d0d8bb66..000000000
--- a/packages/bcode-browser/harness/agent-workspace/domain-skills/linkedin/invitation-manager.md
+++ /dev/null
@@ -1,109 +0,0 @@
-# LinkedIn — Invitation Manager
-
-Accept or ignore pending connection invitations in bulk from
-`https://www.linkedin.com/mynetwork/invitation-manager/received//`.
-
-## URL filters
-
-The trailing slug pre-filters the received invitations. Observed slugs:
-
-- `PEOPLE_WITH_MUTUAL_CONNECTION` — people who share a mutual connection
-- `PEOPLE_WITH_MUTUAL_SCHOOL` — people who share a school
-- omit the slug (`.../received/`) for all pending invitations
-
-The filter chip at the top of the page mirrors the URL and also renders
-`All (N)`, `Mutual Connections (N)`, `Your School (N)` — the `(N)` is the
-authoritative remaining-count for the active filter and is what you loop on.
-
-## Button selectors
-
-Each pending-invitation card contains an Accept and an Ignore control.
-**The aria-label formats are different** for the two buttons — don't derive
-one from the other:
-
-- Accept: `aria-label = "Accept 's invitation"` (note: curly `’`, not ASCII `'`)
-- Ignore: `aria-label = "Ignore an invitation to connect from "`
-
-```python
-# Match either — both are unique per card
-accepts = js("Array.from(document.querySelectorAll('button, a')).filter(b => (b.getAttribute('aria-label')||'').startsWith('Accept ')).length")
-ignores = js("Array.from(document.querySelectorAll('button')).filter(b => (b.getAttribute('aria-label')||'').toLowerCase().startsWith('ignore')).length")
-```
-
-## Trap: "follows you" cards render Accept as ``, not `