Small follow-ups from v4 audit by MaxGhenis · Pull Request #312 · PolicyEngine/policyengine.py

MaxGhenis · 2026-04-20T18:23:03Z

Three small quality-of-life items the v4 audit flagged:

1. `Simulation.save()` unhelpful error

Before: AttributeError: 'NoneType' object has no attribute 'save'
After: ValueError: Simulation.save() called with no output_dataset. Run simulation.run() or simulation.ensure() first so there is something to persist.

One-line fix in common/model_version.py.

2. `scripts/check_data_staleness.py`

~60 lines of Python. Prints one verdict line per country comparing the bundled release manifest's model_package.version against the installed country package. Exits non-zero when any country is stale so CI can gate on it. No network access; reads both sides locally via importlib.metadata + the bundled JSON.

Example output:

us: ok (policyengine-us 1.653.3 matches bundle pin)
uk: STALE (policyengine-uk installed=2.88.9, bundle pin=2.88.0; consider a release-bundle refresh)

Targeted fix for the "is our data stale?" pain flagged in #311. Doesn't require the larger storage-substrate design to ship.

3. Clarified `load()` docstring

The old docstring claimed save() + load() round-trip created_at / updated_at. They don't — save() doesn't persist them; load() reads filesystem ctime / mtime instead. Docstring now says so.

Tests

tests/test_small_follow_ups.py — 3 tests:

save() raises ValueError with "no output_dataset" in the message
check_data_staleness script emits one line per country
Direct check_country() test against a tmp manifest with a drifted pin

430/430 existing tests still pass.

🤖 Generated with Claude Code

Three things the audit flagged as small quality-of-life: 1. Simulation.save() raises a helpful ValueError when output_dataset is None rather than the raw AttributeError from None.save(). Names the missing step (run()/ensure()). 2. scripts/check_data_staleness.py — a ~60 line script that prints a one-line verdict per country comparing the bundled release manifest's model_package.version against the installed country package. Exits non-zero when any country is stale so CI can gate on it. No network access; reads both sides locally. 3. Clarified the load() docstring — the "created_at/updated_at round-trip" claim was misleading; these are filesystem ctime/mtime approximations because save() doesn't persist them. Regression tests in tests/test_small_follow_ups.py: - save() raises ValueError with "no output_dataset" in the message when output_dataset is None - check_data_staleness script runs and emits one verdict line per configured country - direct test of check_country() against a tmp manifest with a drifted pin, so we don't have to reinstall a country package to exercise the drift path All 430 existing tests still pass.

vahid-ahmadi

Three independently useful follow-ups, all correctly scoped. Approve in spirit; one substantive comment and a couple of nits.

Substantive: the script duplicates manifest discovery instead of reusing get_release_manifest.

provenance/manifest.py already exposes get_release_manifest(country_id), which loads the bundled JSON via importlib.resources.files("policyengine").joinpath("data", "release_manifests", ...). The script instead computes MANIFEST_DIR = Path(__file__).resolve().parent.parent / "src" / "policyengine" / ... and re-validates the JSON itself.

That works in a source checkout but won't work post-install — the src/ walk won't find anything in a wheel-installed package. Switching to get_release_manifest(country) makes the script work from anywhere and removes the duplicated discovery logic.

The reason it's structured this way is the drift test (test__check_data_staleness_script_detects_drift) reassigns module.MANIFEST_DIR = tmp_manifest_dir. If the script used get_release_manifest, the test would need to monkey-patch the helper instead — slightly more code, but the runtime behaviour gets a lot more robust. Worth the swap.

Smaller comments

COUNTRIES = ("us", "uk") is hardcoded. If a third country is ever bundled, this also needs editing. Could just list *.json under the manifest dir; not blocking.
The STALE verdict ends with "consider a release-bundle refresh" regardless of direction. If the installed package is older than the pinned one (engineer working in an older env), refresh is the wrong advice — the verdict ideally distinguishes installed > pin (refresh bundle) vs installed < pin (upgrade local package). Cheap to add, optional.
find_spec("policyengine_us") and metadata.version("policyengine-us") answer different questions (import vs distribution). Edge case where a vendored copy is on PYTHONPATH but no dist is installed will throw PackageNotFoundError from metadata.version. Wrap it or accept the rare crash; either is fine.
tests/test_small_follow_ups.py is a fine bin for these three but reads as a sediment file. If a fourth small follow-up shows up, prefer splitting by topic (test_simulation_save.py, test_check_data_staleness.py) rather than growing this file.

Verified correct

The save() fix is the right shape — explicit ValueError with the actionable next step.
The load() docstring honestly retracts the round-trip claim. Good.
Script exit code mirrors "any country stale", suitable for CI gating as advertised.
The drift test exercises the failure path cleanly without needing to break the installed environment.

LGTM once the manifest-discovery swap is in (or you've decided to keep the test-friendly version with a comment explaining why).

vahid-ahmadi reviewed May 6, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Small follow-ups from v4 audit#312

Small follow-ups from v4 audit#312
MaxGhenis wants to merge 1 commit intomainfrom
small-follow-ups

MaxGhenis commented Apr 20, 2026

Uh oh!

vahid-ahmadi left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

MaxGhenis commented Apr 20, 2026

1. Simulation.save() unhelpful error

2. scripts/check_data_staleness.py

3. Clarified load() docstring

Tests

Uh oh!

vahid-ahmadi left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

1. `Simulation.save()` unhelpful error

2. `scripts/check_data_staleness.py`

3. Clarified `load()` docstring