Skip to content

Fix US program statistics variable mappings#327

Merged
anth-volk merged 8 commits intomainfrom
fix-us-program-statistics
May 6, 2026
Merged

Fix US program statistics variable mappings#327
anth-volk merged 8 commits intomainfrom
fix-us-program-statistics

Conversation

@anth-volk
Copy link
Copy Markdown
Contributor

@anth-volk anth-volk commented Apr 30, 2026

Fixes #325

This draft PR addresses only the immediate causes of the US economic_impact_analysis program-statistics breakage observed while following up on Vahid's JOSS PR (#264). It is not intended to solve the broader design problem of country package program variables changing over time; that follow-up is tracked in #326.

Changes:

  • Keep ProgramStatistics.program_name as the direct model/output variable to aggregate; no separate alias field is introduced.
  • Replace the broken US program-statistics mappings with currently valid policyengine-us variables:
    • payroll_tax -> employee_payroll_tax
    • medicare -> medicare_cost
    • state_income_tax remains state_income_tax
  • Note: payroll_tax was configured as a person-level program, while employee_payroll_tax is a tax-unit-level variable. This means the payroll-tax program-statistics row now reports tax-unit recipient/winner/loser counts, using tax-unit weights, rather than person counts.
  • Add medicare_cost and state_income_tax to the US output variables materialized by .py.
  • Update US household snapshots for the newly exposed medicare_cost and state_income_tax output keys.
  • Validate US program-statistics configuration before expensive simulations run.
  • Add shared error helpers for constructing typed errors and conditional error detail lines, then use them in the US program-statistics validation error path.
  • Replace bare aggregate next(...) lookup failures with descriptive ValueErrors.
  • Add mocked US program-statistics, aggregate error, and error-helper tests that would have caught this issue.

Verification:

  • make format
  • uv run --python 3.13 --extra dev python -m pytest tests/test_household_calculator_snapshot.py::test_us_household_snapshot tests/test_errors.py tests/test_aggregate.py tests/test_change_aggregate.py tests/test_us_program_statistics.py
  • uv run --python 3.13 --extra dev pytest tests/test_change_aggregate.py tests/test_us_program_statistics.py
  • uv run --python 3.13 --extra dev ruff check tests/test_us_program_statistics.py tests/test_change_aggregate.py
  • git diff --check

@vahid-ahmadi
Copy link
Copy Markdown

Review

Core logic is correct — variable name swaps, validation, and StopIterationValueError upgrade all check out. Two things to fix before merge.

Blocking

  1. CI failures are caused by this PR. Adding medicare_cost to entity_variables["person"] makes the household calculator materialize it, and 4 snapshots in test_household_calculator_snapshot.py fail with new key: person[0].medicare_cost=14500.0. Need to regenerate the snapshot files.

  2. Negative test can silently pass on the wrong exception. test_us_program_statistics_config_fails_before_simulation_run uses try/except + manual raise AssertionError. Use pytest.raises(ValueError, match="US program statistics config is invalid") instead.

Worth addressing

  1. ChangeAggregate has no invalid-variable regression test. test_aggregate.py got one for Aggregate, but ChangeAggregate had the same bare next(...) pattern and isn't covered.

  2. Entity change not flagged in the PR description. payroll_tax was entity="person"; employee_payroll_tax is entity="tax_unit". Correct, but it changes recipient counts in the program-statistics table — worth noting.

Verified correct

Minor suggestions

  • context= strings like "Aggregate.variable" are internal field names; user-facing wording reads better.
  • Missing return annotations on the three new helpers (-> None, -> Variable).
  • Validator dedupes errors across baseline/reform — asymmetric configs harder to triage.

@anth-volk anth-volk marked this pull request as ready for review May 1, 2026 14:13
@anth-volk anth-volk requested a review from vahid-ahmadi May 1, 2026 14:13
Copy link
Copy Markdown

@vahid-ahmadi vahid-ahmadi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review on the new commits (fe786360c39e5a).

Still worth addressing

  1. create_error is a thin wrapper that adds nothing. Body is return error_type(message), and the test (test_create_error_returns_requested_error_type) just re-asserts Python's exception constructor behavior. Both raise create_error(ValueError, msg) and raise ValueError(msg) produce identical results. The indirection makes the call site longer without helping callers. Suggest removing create_error (and its test) and inlining raise ValueError(...). format_conditional_error_detail should stay — it has real logic (sort + dedupe + None-on-empty).

  2. Asymmetric baseline/reform configs still merge errors. _validate_program_statistics_config puts missing variables and missing outputs from both simulations into the same sets, so a config where baseline lacks medicare_cost and reform lacks social_security produces one message that doesn't say which simulation each is missing from. Was minor before, still minor — just noting it's unchanged.

  3. Inconsistent return annotations on the new aggregate helpers. require_output_column has -> None; get_aggregate_variable and get_output_entity_data don't. Easy fix — return Variable and the entity data type respectively, or at least Any.

New observation

The match=\"Aggregate.variable\" / match=\"ChangeAggregate.filter_variable\" test assertions now lock the internal context= strings into the test contract. If those labels ever get rewritten to be more user-facing (which was a prior minor suggestion), the tests need to update too. Not a blocker — just flagging the new coupling.

Looks mergeable once create_error is removed (or kept with a note explaining what it gives over ValueError(msg)).

@anth-volk anth-volk requested a review from vahid-ahmadi May 5, 2026 19:28
Copy link
Copy Markdown

@vahid-ahmadi vahid-ahmadi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up after ed1ab89.

Addressed

  • create_error removed everywhere (utils/errors.py, utils/init.py re-export, analysis.py import, the tautological test). Inline raise ValueError(...) used in the validator.
  • Return annotations added on the new aggregate helpers — get_aggregate_variable(...) -> Variable, get_output_entity_data(...) -> Any, plus data: Any on require_output_column.
  • Test contract loosened nicely: instead of match=\"Aggregate.variable\" (which would have locked the internal context= label into the test), the tests now assert on the user-facing fragments \"references missing variable\" plus the variable name. That defuses the coupling I flagged last round and leaves room to reword context= later without churn.

The asymmetric baseline/reform-config message is still merged, but that was minor and explicitly noted as such — fine to leave.

LGTM. Nothing else from me; ready to merge.

@anth-volk anth-volk merged commit af56b5c into main May 6, 2026
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

US economic_impact_analysis uses fragile hard-coded program variable names

2 participants