fix(server): sync hstore schema cache clears#3011
Conversation
Register a JVM-wide schema cache clear listener for CachedSchemaTransactionV2 so HStore server nodes can invalidate local schema caches when another node changes schema. Clear V2 schema-id/name caches and the attached array cache by graph name, and publish schema cache clear events after schema add, update, and remove paths. Add unit coverage for graph-scoped V2 schema cache clearing. Fixes apache#2617
c1611c2 to
f6adb67
Compare
There was a problem hiding this comment.
Pull request overview
Fixes stale local schema caches in HStore multi-server deployments by ensuring CachedSchemaTransactionV2 listens for schema-cache-clear meta events and clears the appropriate per-graph V2 caches.
Changes:
- Register a JVM-wide
MetaManagerschema-cache-clear listener and clear V2schema-id/schema-namecaches (plus the attached array cache) by graph name. - Publish schema cache clear notifications on schema add/update/remove and on cache clears triggered by store events.
- Add a unit test covering graph-scoped V2 schema cache clearing.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/backend/cache/CachedSchemaTransactionV2.java | Adds meta-event listener + per-graph cache clearing, and expands notify paths to keep multi-node caches consistent. |
| hugegraph-server/hugegraph-test/src/main/java/org/apache/hugegraph/unit/cache/CachedSchemaTransactionTest.java | Adds unit coverage for graph-scoped V2 schema cache clearing. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Prevent remote schema cache clear notifications during local STORE_INIT events. Rename `notifySchemaCacheClear` to `maybeNotifySchemaCacheClear` to clarify conditional notification behavior. Enhance unit tests to ensure all CachedSchemaTransactionV2 caches, including array caches, are comprehensively cleared on a graph-scoped basis. Fixes apache#2617
|
Hi @imbajin - while working on this PR I spotted what might be a separate bug in the legacy Producers emit past-tense actions:
Listeners only match the present-tense ones:
So those events look like they get dropped silently. Wider blast radius than the HStore Worth a separate issue or maybe sub-issue - or is this intentional? If you're +1, I'll take care of it |
Good catch @dpol1 — I dug into the code and confirmed the mismatch is real. Opened #3012 to track this separately — feel free to take it or I can look at it after #3011 lands. |
I'll do in a separate PR, when this lands! |
imbajin
left a comment
There was a problem hiding this comment.
The fix direction is correct: a JVM-global MetaManager listener with graph-scoped clearSchemaCache is the right architectural shape for HStore cache synchronization. Two critical behavioral issues need to be resolved before merge — updateSchema triggering unnecessary broadcast storms, and the static listener having no lifecycle reset path — plus a few important design concerns below.
…anager - Add MetaManager.extractSchemaCacheClearGraphNamesFromResponse so callers no longer need to reach into MetaDriver directly via metaDriver(). The schema cache clear path in CachedSchemaTransactionV2 already references this helper;
Address review feedback that the cross-node cache sync introduced in
this branch had no end-to-end test coverage.
Refactor the meta-event consumer in CachedSchemaTransactionV2 into a
package-private static handleSchemaCacheClearEvent so tests can drive
the publish -> callback -> clearSchemaCache path without a live
etcd/PD watch.
Add five tests covering:
- target-only invalidation across multiple graphs in the same JVM
- noop on null graph-name response
- multi-graph payload clears each graph
- listenSchemaCacheClear is idempotent across repeated invocations
- full register -> capture consumer -> publish -> clear flow via
a Mockito MetaDriver stub
dpol1
left a comment
There was a problem hiding this comment.
@imbajin I noticed one more issue exposed by my cache-clear change
The failing casts are here:
After the V2 cache clear, the schema is loaded again from backend. In that path, userdata is restored here:
~create_time comes back as String, but those tests expect Date.
I’ll fix this in Userdata, not in TextSerializer. The constructor currently just does putAll(map) here:
I’ll normalize Userdata.CREATE_TIME in put() / putAll(), so serialized date strings/timestamps are converted back to Date in one place. Then I’ll add a small regression test and rerun the two HStore create-time tests. That's my way to handle it if you have another idea on your POV, give me a feedback :)
- Cross-node schema cache invalidation broke when the publisher cleared its own fresh cache. Tag events with a per-JVM UUID and skip self.
|
@imbajin Dropped the Userdata idea — too wide for this PR. The ~create_time String/Date round-trip: pretty sure that's a backend serialization thing. Leaving it out to keep this PR scoped. |
|
Thanks for the update. The current direction looks much better scoped than the earlier From my reading, the current PR mechanism is:
Could you please update the PR description/comment with a compact before -> after diagram? I think that will make the design much easier to review, especially since most of the added code is test coverage. Suggested structure: A few points I would like to confirm before merge:
Overall, the design now looks reasonable to me. The main remaining thing is to make the lifecycle and compatibility boundaries explicit so future readers do not assume the reconnect and |
…t compat
- CachedSchemaTransactionV2: state explicitly that
resetMetaListenerForReconnect() is a manual hook, not wired to a
MetaManager/MetaDriver reconnect callback in this PR.
- MetaManager.SchemaCacheClearEvent.fromValue: document the legacy
plain-string branch as a rolling-upgrade compatibility path.
|
Thanks @imbajin. Addressed all three points. 1. Reconnect lifecycle Confirmed: no 2. Out of scope, filed as follow-up: #3013 . 3. Compatibility note Added to PR description and as code comment at
|
Purpose of the PR
In HStore / multi-server mode, schema changes can leave other server nodes with stale local schema caches.
CachedSchemaTransactionV2already publishes schema-cache clear events throughMetaManager, but it did not listen for those meta events and clear the affected local V2 caches.Main Changes
CachedSchemaTransactionV2schema-id/schema-namecaches and the attached array cache by graph nameEventHubcache listener registered per transaction instanceSchema Cache Clear Flow
Compatibility / Scope Note
This PR does not change persisted schema data. It only changes the runtime schema-cache-clear meta-event value so new events can carry a per-JVM source id. New consumers still accept legacy plain-string graph names, so mixed-version rollouts during upgrade are safe.
The reconnect recovery hook remains manual until MetaManager/MetaDriver exposes a reconnect callback. The
~create_timeString/Date schema userdata round-trip is tracked separately in #3013 .Verifying these changes
mvn test -pl hugegraph-server/hugegraph-test -am -P unit-test -Dtest=CachedSchemaTransactionTest -DfailIfNoTests=false "-Djacoco.skip=true"Does this PR potentially affect the following parts?
Documentation Status
Doc - TODODoc - DoneDoc - No Need