fix: thread-safety hardening of Configuration (cache and in-place mutation races)#620
Open
fgmacedo wants to merge 4 commits into
Open
fix: thread-safety hardening of Configuration (cache and in-place mutation races)#620fgmacedo wants to merge 4 commits into
fgmacedo wants to merge 4 commits into
Conversation
… reads Snapshot the cache fields locally before the freshness check. Without the snapshot, another thread invalidating the cache between the ``self._cached is not None`` check and the ``return self._cached`` could cause the getter to return ``None``, leading to ``TypeError`` in callers that iterate the result (e.g., ``list(machine.configuration)``). Surfaced intermittently by tests/test_threading.py ::TestThreadSafety::test_concurrent_send_and_read_configuration on slower CI runners. Signed-off-by: Fernando Macedo <fgmacedo@gmail.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #620 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 42 42
Lines 5027 5029 +2
Branches 812 812
=========================================
+ Hits 5027 5029 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
``Configuration.add()`` and ``discard()`` previously mutated the ``OrderedSet`` stored on the model in place, then rewrote the same reference. Two thread-safety problems followed: 1. A concurrent reader iterating ``.configuration`` (cache miss path inside the ``states`` getter) could crash with ``RuntimeError: Set changed size during iteration``. 2. Because ``setattr`` re-stored the *same* OrderedSet ref, the cache identity check ``self._cached_value is raw`` stayed True after the mutation, so a reader could briefly receive the stale cached ``OrderedSet[State]`` missing the new state. Both ``add`` and ``discard`` now copy the underlying set before mutating, producing a fresh ref each call. Readers either continue iterating the prior ref (no concurrent mutation) or recompute against the new ref (cache identity check naturally fails on the new object). Affects only ``StateChart``-style configurations where ``atomic_configuration_update=False`` (the default for parallel regions). The atomic update path used by ``StateMachine`` was never affected because it always builds a fresh OrderedSet. Regression tests in ``tests/test_threading.py::TestThreadSafety``: ``test_concurrent_parallel_region_send_and_read`` and the deterministic contract pin ``test_add_discard_produce_fresh_orderedset``. Signed-off-by: Fernando Macedo <fgmacedo@gmail.com>
Two cache-related races in ``Configuration`` introduced indirectly by the 3.1.0 performance optimisation are fixed in 3.1.1. Includes a performance comparison table; the 4.7x-7.7x speedup vs 3.0.0 declared in 3.1.0 release notes is unchanged. Signed-off-by: Fernando Macedo <fgmacedo@gmail.com>
- Bump version in pyproject.toml, ``statemachine.__version__``, and uv.lock - Update Project-Id-Version in locale catalogs (en, pt_BR, hi_IN, zh_CN) Signed-off-by: Fernando Macedo <fgmacedo@gmail.com>
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.



Summary
Fixes two thread-safety races in
Configurationintroduced indirectly by thecaching / no-copy fast paths added in 3.1.0 (#592). The sync engine is
documented as supporting concurrent reads of
.configurationwhile anotherthread sends events; both races violate that contract.
Race A — cached
statescould returnNoneConfiguration.statescheckedself._cached is not Noneand then returnedself._cached. Another thread invalidating between the check and returncaused
Noneto be returned and aTypeErrorinlist(machine.configuration).Fixed by snapshotting
_cachedand_cached_valuelocally before thefreshness check (commit
d2ddf1e).Race B — in-place mutation of the model's
OrderedSetConfiguration.add()/discard()mutated the sameOrderedSetreferencethe model holds, in place, then rewrote the same ref. A concurrent reader
iterating that reference inside
Configuration.statescould crash withRuntimeError: Set changed size during iteration, and the cache identitycheck could briefly hand back a stale resolved set. Fixed by copy-on-write
in
addanddiscard(commit38f8eef).Race B is only reachable when
atomic_configuration_update=False(the
StateChartdefault, where parallel regions need it). The atomic pathused by
StateMachinewas never affected.Tests
tests/test_threading.py::TestThreadSafety::test_concurrent_send_and_read_configurationcovers Race A.
test_concurrent_parallel_region_send_and_readexercises a parallelStateChartunder concurrent writes and reads.test_add_discard_produce_fresh_orderedsetis a deterministiccontract pin for copy-on-write (fails on pre-fix code without timing).
100% branch coverage on
statemachine/configuration.pypreserved.Performance
Copy-on-write in
add/discardreintroduces an O(n) shallow copy of theactive configuration on every state entry and exit. Median results
(macOS / Python 3.14, pytest-benchmark), 3.1.0 → 3.1.1:
test_parallel_region_eventstest_many_transitions_resettest_guarded_transitionstest_history_pause_resumetest_many_transitions_full_cycletest_flat_self_transitionOverall 4.7x–7.7x event throughput vs 3.0.0 (declared in 3.1.0) is
unchanged.
Release
Lands as 3.1.1. See
docs/releases/3.1.1.md.Notes for reviewers
StateMachinedefault) was never affected,_write_to_modelis already called with a fresh
OrderedSetfrom the engine.add/discardnow allocate a new typically 0–7-element
OrderedSetper call.