Codecov Report

Additional details and impacted files:

```
@@             Coverage Diff              @@
##             master    #17609     +/-   ##
============================================
+ Coverage     39.94%    40.09%    +0.14%
  Complexity     2554      2554
============================================
  Files          5176      5176
  Lines        348546    348658      +112
  Branches      44564     44586       +22
============================================
+ Hits         139242    139802      +560
+ Misses       209304    208856      -448
```

View full report in Codecov by Sentry.
CRZbulabula left a comment:
Overall
This PR fixes 6 real correctness bugs in ConfigNode (WAL ordering, log sorting, unchecked consensus writes, off-by-one, concurrent snapshot, pipe-listener failure masking). Changes are well-structured, rollback logic is carefully designed, and test coverage for core scenarios is solid.
Inline comments below flag a few items that deserve attention before merge.
```java
if (persistStatus.getCode() != TSStatusCode.SUCCESS_STATUS.getStatusCode()) {
  return persistStatus;
}
}
```
WAL replay idempotency requirement
With the new write-ahead ordering (persist → execute), if persistPlanForSimpleConsensus succeeds at line 125 but executor.executeNonQueryPlan fails at line 133, the plan is already persisted in WAL and will be replayed on restart.
This is correct WAL semantics, but it introduces a hard requirement: all ConfigPhysicalPlan implementations must be idempotent under replay. For example, a RegisterDataNodePlan replayed against an already-registered node must not fail or double-register.
Is this idempotency property currently guaranteed across all ConfigPhysicalPlan types? If not, this could cause issues on crash recovery in edge cases.
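The replay-safety requirement can be illustrated with a self-contained sketch (class and method names here are hypothetical, not IoTDB APIs): an idempotent apply step must tolerate a replayed plan without failing or duplicating state.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a replay-idempotent apply step: registering a
// node that is already present must succeed without double-registering.
class NodeRegistry {
  private final Map<Integer, String> nodes = new ConcurrentHashMap<>();

  // Returns true both on the first apply and on a WAL replay of the
  // same plan, because putIfAbsent makes the second apply a no-op.
  boolean applyRegister(int nodeId, String endpoint) {
    nodes.putIfAbsent(nodeId, endpoint);
    return endpoint.equals(nodes.get(nodeId));
  }

  int size() {
    return nodes.size();
  }
}
```

Replaying the same register twice leaves exactly one entry; that is the property crash recovery needs from every plan type.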
```java
ByteBuffer buffer = plan.serializeToByteBuffer();
buffer.position(buffer.limit());
simpleLogWriter.write(buffer);
simpleLogWriter.force();
```
force() on every write makes scheduled flush redundant
Now that every write is synchronously forced here, the scheduled flushWALForSimpleConsensus thread (lines 447-455) becomes redundant — it calls simpleLogWriter.force() periodically, but there's nothing left to flush.
This is harmless (double-force is idempotent) but worth either:
- Removing the scheduled thread to avoid confusion, or
- Adding a brief comment explaining the intentional belt-and-suspenders approach.
For ConfigNode metadata operations (low write frequency), the per-write force() is the right durability choice.
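As a minimal, self-contained illustration of that pattern (plain java.nio; the class and method names are illustrative, not the ConfigNode writer API), each append is forced to disk before the caller proceeds, so a separate periodic flush would have nothing left to do:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative write-ahead append: each record is written and force()d
// synchronously, trading throughput for per-record durability.
class SimpleWal implements AutoCloseable {
  private final FileChannel channel;

  SimpleWal(Path path) throws IOException {
    channel = FileChannel.open(path,
        StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
  }

  void append(byte[] record) throws IOException {
    ByteBuffer buffer = ByteBuffer.wrap(record);
    while (buffer.hasRemaining()) {
      channel.write(buffer);
    }
    channel.force(false); // durable before returning to the caller
  }

  @Override
  public void close() throws IOException {
    channel.close();
  }
}
```

For low-frequency metadata writes this per-append force is cheap; high-throughput WALs would instead batch and rely on the periodic flush.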
```java
"Persist current ConfigPhysicalPlan for ConfigNode SimpleConsensus mode failed", e);
return new TSStatus(TSStatusCode.EXECUTE_STATEMENT_ERROR.getStatusCode())
    .setMessage(
        "Persist ConfigNode SimpleConsensus log failed: " + String.valueOf(e.getMessage()));
```
Nit: `String.valueOf(e.getMessage())` is redundant; string concatenation with `+` already converts a null operand to "null". Can simplify to:

```java
"Persist ConfigNode SimpleConsensus log failed: " + e.getMessage()
```

```java
final List<RegionMaintainTask> copiedRegionMaintainTaskList;
synchronized (regionMaintainTaskList) {
  copiedRegionMaintainTaskList = new ArrayList<>(regionMaintainTaskList);
}
```
Good fix — synchronized copy before iterating. One minor observation: this copies the entire list under the lock. If regionMaintainTaskList can grow very large, this could cause a brief pause. In practice, for ConfigNode this list is typically small, so this is fine. Just noting for future awareness.
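The copy-under-lock idiom generalizes; here is a standalone sketch (names are illustrative, not the ConfigNode classes) showing why the snapshot lets iteration proceed safely outside the lock:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative copy-under-lock pattern: snapshot the shared list inside
// the synchronized block, then work on the copy without holding the lock.
class TaskQueue {
  private final List<String> tasks = new ArrayList<>();

  void add(String task) {
    synchronized (tasks) {
      tasks.add(task);
    }
  }

  int processAll() {
    final List<String> copied;
    synchronized (tasks) {
      copied = new ArrayList<>(tasks); // O(n) copy, lock held only briefly
    }
    // Concurrent add() can no longer cause a
    // ConcurrentModificationException: we iterate a private copy.
    return copied.size();
  }
}
```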
```diff
   }
 });
-return result.getAndIncrement();
+return result.get();
```
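The off-by-one is easy to see in isolation: `getAndIncrement()` returns the current value but also bumps the shared counter as a side effect, leaving it one too high for any later reader, while `get()` is a pure read:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Demonstrates why returning result.getAndIncrement() was wrong: the
// returned value looks right, but the counter is left one too high.
class CounterDemo {
  static int wrongReturn(AtomicInteger result) {
    return result.getAndIncrement(); // returns current value, then bumps state
  }

  static int correctReturn(AtomicInteger result) {
    return result.get(); // pure read, no side effect
  }
}
```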