Fix US program statistics variable mappings by anth-volk · Pull Request #327 · PolicyEngine/policyengine.py

anth-volk · 2026-04-30T21:56:39Z

Fixes #325

This draft PR addresses only the immediate causes of the US economic_impact_analysis program-statistics breakage observed while following up on Vahid's JOSS PR (#264). It is not intended to solve the broader design problem of country package program variables changing over time; that follow-up is tracked in #326.

Changes:

Keep ProgramStatistics.program_name as the direct model/output variable to aggregate; no separate alias field is introduced.
Replace the broken US program-statistics mappings with currently valid policyengine-us variables:
- payroll_tax -> employee_payroll_tax
- medicare -> medicare_cost
- state_income_tax remains state_income_tax
Note: payroll_tax was configured as a person-level program, while employee_payroll_tax is a tax-unit-level variable. This means the payroll-tax program-statistics row now reports tax-unit recipient/winner/loser counts, using tax-unit weights, rather than person counts.
Add medicare_cost and state_income_tax to the US output variables materialized by policyengine.py.
Update US household snapshots for the newly exposed medicare_cost and state_income_tax output keys.
Validate US program-statistics configuration before expensive simulations run.
Add shared error helpers for constructing typed errors and conditional error detail lines, then use them in the US program-statistics validation error path.
Replace bare aggregate next(...) lookup failures with descriptive ValueErrors.
Add mocked US program-statistics, aggregate error, and error-helper tests that would have caught this issue.
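
For readers unfamiliar with the bare next(...) failure mode mentioned in the list above, here is a minimal sketch of the difference. The Variable record, the VARIABLES list, and both lookup functions are hypothetical stand-ins for illustration, not the code in this PR; only the "references missing variable" wording is taken from the review discussion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """Hypothetical stand-in for a model variable record."""

    name: str
    entity: str


# Illustrative variable table; the real model exposes many more.
VARIABLES = [
    Variable("employee_payroll_tax", "tax_unit"),
    Variable("medicare_cost", "person"),
]


def find_variable_bare(name: str) -> Variable:
    # Bare next(...): an unknown name raises StopIteration with an empty message.
    return next(v for v in VARIABLES if v.name == name)


def find_variable(name: str, context: str) -> Variable:
    # Giving next(...) a default lets the caller raise a descriptive error instead.
    variable = next((v for v in VARIABLES if v.name == name), None)
    if variable is None:
        known = ", ".join(sorted(v.name for v in VARIABLES))
        raise ValueError(
            f"{context} references missing variable '{name}'. "
            f"Known variables: {known}"
        )
    return variable
```

With the second form, a stale name such as payroll_tax produces a message that names both the offending field and the missing variable, rather than an empty StopIteration.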

Verification:

make format
uv run --python 3.13 --extra dev python -m pytest tests/test_household_calculator_snapshot.py::test_us_household_snapshot tests/test_errors.py tests/test_aggregate.py tests/test_change_aggregate.py tests/test_us_program_statistics.py
uv run --python 3.13 --extra dev pytest tests/test_change_aggregate.py tests/test_us_program_statistics.py
uv run --python 3.13 --extra dev ruff check tests/test_us_program_statistics.py tests/test_change_aggregate.py
git diff --check

vahid-ahmadi · 2026-05-01T10:25:07Z

Review

Core logic is correct — variable name swaps, validation, and StopIteration → ValueError upgrade all check out. Two things to fix before merge.

Blocking

CI failures are caused by this PR. Adding medicare_cost to entity_variables["person"] makes the household calculator materialize it, and 4 snapshots in test_household_calculator_snapshot.py fail with new key: person[0].medicare_cost=14500.0. Need to regenerate the snapshot files.
Negative test can silently pass on the wrong exception. test_us_program_statistics_config_fails_before_simulation_run uses try/except + manual raise AssertionError. Use pytest.raises(ValueError, match="US program statistics config is invalid") instead.
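
A rough sketch of the second blocking point, assuming the fragile test catches ValueError without inspecting the message. validate_config is a hypothetical validator written for this example; only the matched message text comes from the review.

```python
import pytest


def validate_config(program_names, available):
    # Hypothetical validator: rejects program names the model does not expose.
    missing = sorted(set(program_names) - set(available))
    if missing:
        raise ValueError(
            "US program statistics config is invalid: missing "
            + ", ".join(missing)
        )


def test_fragile():
    # Passes for ANY ValueError, including one raised for an unrelated reason.
    try:
        validate_config(["payroll_tax"], ["employee_payroll_tax"])
    except ValueError:
        return
    raise AssertionError("expected ValueError")


def test_strict():
    # match= is checked with re.search against str(exception),
    # so the message is verified as well as the exception type.
    with pytest.raises(
        ValueError, match="US program statistics config is invalid"
    ):
        validate_config(["payroll_tax"], ["employee_payroll_tax"])
```

Both tests pass here, but only test_strict would start failing if the ValueError came from somewhere other than the config validator.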

Worth addressing

ChangeAggregate has no invalid-variable regression test. test_aggregate.py got one for Aggregate, but ChangeAggregate had the same bare next(...) pattern and isn't covered.
Entity change not flagged in the PR description. payroll_tax was entity="person"; employee_payroll_tax is entity="tax_unit". Correct, but it changes recipient counts in the program-statistics table — worth noting.
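
To make the entity point concrete, here is a small sketch with made-up microdata (two tax units, three people; all values and weights are invented for illustration). The weighted dollar total is the same at either level, but the weighted recipient count is not.

```python
# Hypothetical microdata: person-level amounts and the tax unit each person belongs to.
people = [
    {"tax_unit": 0, "payroll_tax": 1000.0},
    {"tax_unit": 0, "payroll_tax": 500.0},
    {"tax_unit": 1, "payroll_tax": 0.0},
]
person_weights = [100.0, 100.0, 250.0]
tax_unit_weights = {0: 100.0, 1: 250.0}


def weighted_count(values, weights):
    # Sum the weights of every record with a positive amount.
    return sum(w for v, w in zip(values, weights) if v > 0)


def weighted_total(values, weights):
    return sum(v * w for v, w in zip(values, weights))


person_values = [p["payroll_tax"] for p in people]

# Roll person amounts up to their tax unit.
unit_totals = {}
for p in people:
    unit_totals[p["tax_unit"]] = unit_totals.get(p["tax_unit"], 0.0) + p["payroll_tax"]
unit_ids = sorted(unit_totals)
unit_values = [unit_totals[u] for u in unit_ids]
unit_weights = [tax_unit_weights[u] for u in unit_ids]

person_count = weighted_count(person_values, person_weights)  # 200.0
tax_unit_count = weighted_count(unit_values, unit_weights)  # 100.0
person_total = weighted_total(person_values, person_weights)  # 150000.0
tax_unit_total = weighted_total(unit_values, unit_weights)  # 150000.0
```

So a switch from a person-level to a tax-unit-level variable can leave the aggregate amount untouched while changing the recipient, winner, and loser counts.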

Verified correct

All three variable replacements match what policyengine-us actually exposes (employee_payroll_tax, medicare_cost, household_state_income_tax).
Validator runs cheap, before ensure() on either simulation.
Test arithmetic: employee_payroll_tax 1600, medicare_cost 300, state_income_tax 900 — all correct under the test weights.
Stays in scope for #325 (US economic_impact_analysis uses fragile hard-coded program variable names); no drift into #326 (Design a durable program-statistics mapping for changing country packages).
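
The "validator runs cheap, before ensure()" property can be sketched as follows. FakeSimulation, validate_program_names, and run_analysis are hypothetical names for this example; ensure() stands in for the expensive simulation run.

```python
class FakeSimulation:
    """Hypothetical stand-in; ensure() represents the expensive run."""

    def __init__(self, variables):
        self.variables = set(variables)
        self.ran = False

    def ensure(self):
        self.ran = True


def validate_program_names(program_names, simulations):
    # Cheap metadata check: only compares names, never runs a simulation.
    missing = set()
    for simulation in simulations:
        missing |= set(program_names) - simulation.variables
    if missing:
        raise ValueError(
            "US program statistics config is invalid: "
            + ", ".join(sorted(missing))
        )


def run_analysis(program_names, baseline, reform):
    # Validate first, so a bad config fails before any expensive work starts.
    validate_program_names(program_names, [baseline, reform])
    baseline.ensure()
    reform.ensure()
    return sorted(program_names)
```

With this ordering, a stale program name fails fast and neither simulation is touched.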

Minor suggestions

context= strings like "Aggregate.variable" are internal field names; user-facing wording reads better.
Missing return annotations on the three new helpers (-> None, -> Variable).
Validator dedupes errors across baseline/reform — asymmetric configs harder to triage.

vahid-ahmadi

Follow-up review on the new commits (fe78636 … 0c39e5a).

Still worth addressing

create_error is a thin wrapper that adds nothing. Body is return error_type(message), and the test (test_create_error_returns_requested_error_type) just re-asserts Python's exception constructor behavior. Both raise create_error(ValueError, msg) and raise ValueError(msg) produce identical results. The indirection makes the call site longer without helping callers. Suggest removing create_error (and its test) and inlining raise ValueError(...). format_conditional_error_detail should stay — it has real logic (sort + dedupe + None-on-empty).
Asymmetric baseline/reform configs still merge errors. _validate_program_statistics_config puts missing variables and missing outputs from both simulations into the same sets, so a config where baseline lacks medicare_cost and reform lacks social_security produces one message that doesn't say which simulation each is missing from. Was minor before, still minor — just noting it's unchanged.
Inconsistent return annotations on the new aggregate helpers. require_output_column has -> None; get_aggregate_variable and get_output_entity_data don't. Easy fix — return Variable and the entity data type respectively, or at least Any.
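
As a sketch of what a helper with "sort + dedupe + None-on-empty" behavior might look like, and of how per-simulation detail lines could address the asymmetric-config point: the signatures below are assumptions made for this example, and format_detail is a hypothetical analogue, not the actual format_conditional_error_detail.

```python
from typing import Dict, Iterable, List, Optional


def format_detail(label: str, values: Iterable[str]) -> Optional[str]:
    # Dedupe and sort so the message is stable; None means "nothing to report".
    unique = sorted(set(values))
    if not unique:
        return None
    return f"{label}: {', '.join(unique)}"


def build_message(missing_by_simulation: Dict[str, List[str]]) -> str:
    # One detail line per simulation, so it is clear which side lacks which variable.
    lines = [
        format_detail(f"Missing variables in {name}", missing)
        for name, missing in missing_by_simulation.items()
    ]
    details = [line for line in lines if line is not None]
    return "\n".join(["US program statistics config is invalid"] + details)
```

For example, build_message({"baseline": ["medicare_cost"], "reform": ["social_security"]}) yields a header followed by one line for baseline and one for reform, instead of a single merged list.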

New observation

The match="Aggregate.variable" / match="ChangeAggregate.filter_variable" test assertions now lock the internal context= strings into the test contract. If those labels ever get rewritten to be more user-facing (which was a prior minor suggestion), the tests need to update too. Not a blocker, just flagging the new coupling.

Looks mergeable once create_error is removed (or kept with a note explaining what it gives over ValueError(msg)).

vahid-ahmadi

Follow-up after ed1ab89.

Addressed

create_error removed everywhere (utils/errors.py, utils/__init__.py re-export, analysis.py import, the tautological test). Inline raise ValueError(...) used in the validator.
Return annotations added on the new aggregate helpers — get_aggregate_variable(...) -> Variable, get_output_entity_data(...) -> Any, plus data: Any on require_output_column.
Test contract loosened nicely: instead of match="Aggregate.variable" (which would have locked the internal context= label into the test), the tests now assert on the user-facing fragments "references missing variable" plus the variable name. That defuses the coupling I flagged last round and leaves room to reword context= later without churn.
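
A small illustration of why matching on user-facing fragments is less brittle. Both message strings are invented for this example; the patterns are checked with re.search, which is also how pytest applies match=.

```python
import re

# Hypothetical error messages: the current wording and a possible future rewording.
original = "Aggregate.variable references missing variable 'payroll_tax'"
reworded = "The aggregate's variable references missing variable 'payroll_tax'"

# Pattern tied to the internal context= label.
internal_label = r"Aggregate\.variable"
# Pattern tied to the user-facing fragment plus the variable name.
user_facing = r"references missing variable.*payroll_tax"


def matches(pattern: str, message: str) -> bool:
    return re.search(pattern, message) is not None


label_survives_rewording = matches(internal_label, reworded)  # False
fragment_survives_rewording = matches(user_facing, reworded)  # True
```

The label-based pattern breaks as soon as the context string is reworded, while the fragment-based pattern keeps passing as long as the message still names the missing variable.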

The asymmetric baseline/reform-config message is still merged, but that was minor and explicitly noted as such — fine to leave.

LGTM. Nothing else from me; ready to merge.

anth-volk added 2 commits April 30, 2026 23:55

Fix US program statistics variable mappings

d738290

Format US program statistics validation

e9a73dc

vahid-ahmadi mentioned this pull request May 1, 2026

Derive program-statistics entity from model metadata (refs #326) #334

Closed

3 tasks

anth-volk added 5 commits May 1, 2026 15:06

Restore direct US program statistic mappings

fe78636

Factor program statistics validation errors

2726503

Update US household snapshots for program outputs

0f665f2

Tighten aggregate error regression tests

21c7197

Document program statistics count units

0c39e5a

anth-volk marked this pull request as ready for review May 1, 2026 14:13

anth-volk requested a review from vahid-ahmadi May 1, 2026 14:13

vahid-ahmadi reviewed May 5, 2026

Address program statistics review cleanup

ed1ab89

anth-volk requested a review from vahid-ahmadi May 5, 2026 19:28

vahid-ahmadi reviewed May 6, 2026

vahid-ahmadi approved these changes May 6, 2026

vahid-ahmadi mentioned this pull request May 6, 2026

Add federal vs. state budgetary impact to economic_impact_analysis #296

Open

2 tasks

anth-volk merged commit af56b5c into main May 6, 2026
11 checks passed

vahid-ahmadi mentioned this pull request May 7, 2026

Derive US program-statistics entity from variable metadata #342

Open

2 tasks

anth-volk merged 8 commits into main from fix-us-program-statistics
