Skip to content

fix(server): sync hstore schema cache clears#3011

Merged
imbajin merged 6 commits intoapache:masterfrom
dpol1:fix/2617-schema-cache-v2-clear
May 6, 2026
Merged

fix(server): sync hstore schema cache clears#3011
imbajin merged 6 commits intoapache:masterfrom
dpol1:fix/2617-schema-cache-v2-clear

Conversation

@dpol1
Copy link
Copy Markdown
Contributor

@dpol1 dpol1 commented Apr 28, 2026

Purpose of the PR

In HStore / multi-server mode, schema changes can leave other server nodes with stale local schema caches. CachedSchemaTransactionV2 already publishes schema-cache clear events through MetaManager, but it did not listen for those meta events and clear the affected local V2 caches.

Main Changes

  • Register a JVM-wide schema cache clear listener for CachedSchemaTransactionV2
  • Clear V2 schema-id / schema-name caches and the attached array cache by graph name
  • Publish schema cache clear events after schema add, update, and remove paths
  • Keep the local EventHub cache listener registered per transaction instance
  • Add unit coverage for graph-scoped V2 schema cache clearing

Schema Cache Clear Flow

Before

+----------+       schema add/remove       +---------------+
| Server A | ----------------------------> | HStore / Meta |
+----------+                               +---------------+
     |
     | local V2 cache updated
     v
+-------------------+
| Server A cache    |
| fresh             |
+-------------------+

+-------------------+
| Server B cache    |
| stale             |
+-------------------+

Problem:
Server B had no effective cross-JVM V2 schema-cache-clear listener, so it could
keep serving stale schema from local cache.
After

+----------+       schema add/remove       +---------------+
| Server A | ----------------------------> | HStore / Meta |
| source A |                               +---------------+
+----------+                                      |
     |                                            |
     | publish schema-cache-clear                 |
     | {"graph":"DEFAULT-g","source":"A"}         |
     v                                            v
+-------------------+                    +-------------------+
| Server A listener |                    | Server B listener |
| source == A       |                    | source != B       |
| skip self echo    |                    | clear V2 caches   |
+-------------------+                    +-------------------+
                                                |
                                                v
                                      +---------------------+
                                      | schema-id cache     |
                                      | schema-name cache   |
                                      | array attachment    |
                                      +---------------------+

Compatibility / Scope Note

This PR does not change persisted schema data. It only changes the runtime schema-cache-clear meta-event value so new events can carry a per-JVM source id. New consumers still accept legacy plain-string graph names, so mixed-version rollouts during upgrade are safe.

The reconnect recovery hook remains manual until MetaManager/MetaDriver exposes a reconnect callback. The ~create_time String/Date schema userdata round-trip is tracked separately in #3013 .

Verifying these changes

  • Trivial rework / code cleanup without any test coverage. (No Need)
  • Already covered by existing tests, such as (please modify tests here).
  • Need tests and can be verified as follows:
    • mvn test -pl hugegraph-server/hugegraph-test -am -P unit-test -Dtest=CachedSchemaTransactionTest -DfailIfNoTests=false "-Djacoco.skip=true"

Does this PR potentially affect the following parts?

Documentation Status

  • Doc - TODO
  • Doc - Done
  • Doc - No Need

@dosubot dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. bug Something isn't working tests Add or improve test cases labels Apr 28, 2026
Register a JVM-wide schema cache clear listener for CachedSchemaTransactionV2 so HStore server nodes can invalidate local schema caches when another node changes schema.

Clear V2 schema-id/name caches and the attached array cache by graph name, and publish schema cache clear events after schema add, update, and remove paths.

Add unit coverage for graph-scoped V2 schema cache clearing.

Fixes apache#2617
@dpol1 dpol1 force-pushed the fix/2617-schema-cache-v2-clear branch from c1611c2 to f6adb67 Compare April 28, 2026 08:16
@imbajin imbajin requested a review from Copilot April 28, 2026 09:00
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes stale local schema caches in HStore multi-server deployments by ensuring CachedSchemaTransactionV2 listens for schema-cache-clear meta events and clears the appropriate per-graph V2 caches.

Changes:

  • Register a JVM-wide MetaManager schema-cache-clear listener and clear V2 schema-id / schema-name caches (plus the attached array cache) by graph name.
  • Publish schema cache clear notifications on schema add/update/remove and on cache clears triggered by store events.
  • Add a unit test covering graph-scoped V2 schema cache clearing.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/backend/cache/CachedSchemaTransactionV2.java Adds meta-event listener + per-graph cache clearing, and expands notify paths to keep multi-node caches consistent.
hugegraph-server/hugegraph-test/src/main/java/org/apache/hugegraph/unit/cache/CachedSchemaTransactionTest.java Adds unit coverage for graph-scoped V2 schema cache clearing.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Prevent remote schema cache clear notifications during local STORE_INIT events.
Rename `notifySchemaCacheClear` to `maybeNotifySchemaCacheClear` to clarify conditional notification behavior.

Enhance unit tests to ensure all CachedSchemaTransactionV2 caches, including array caches, are comprehensively cleared on a graph-scoped basis.

Fixes apache#2617
@dpol1
Copy link
Copy Markdown
Contributor Author

dpol1 commented Apr 28, 2026

Hi @imbajin - while working on this PR I spotted what might be a separate bug in the legacy EventHub path. Flagging before I file anything.

Producers emit past-tense actions:

  • Cache.ACTION_INVALIDED ("invalided")
  • Cache.ACTION_CLEARED ("cleared")

Listeners only match the present-tense ones:

  • Cache.ACTION_INVALID ("invalid")
  • Cache.ACTION_CLEAR ("clear")

So those events look like they get dropped silently. Wider blast radius than the HStore CachedSchemaTransactionV2 fix here, so I kept it out of scope.

Worth a separate issue or maybe sub-issue - or is this intentional? If you're +1, I'll take care of it

@imbajin
Copy link
Copy Markdown
Member

imbajin commented Apr 28, 2026

Producers emit past-tense actions: Cache.ACTION_INVALIDED / Cache.ACTION_CLEARED
Listeners only match the present-tense ones: Cache.ACTION_INVALID / Cache.ACTION_CLEAR
So those events look like they get dropped silently.

Good catch @dpol1 — I dug into the code and confirmed the mismatch is real.

Opened #3012 to track this separately — feel free to take it or I can look at it after #3011 lands.

@dpol1
Copy link
Copy Markdown
Contributor Author

dpol1 commented Apr 28, 2026

Producers emit past-tense actions: Cache.ACTION_INVALIDED / Cache.ACTION_CLEARED
Listeners only match the present-tense ones: Cache.ACTION_INVALID / Cache.ACTION_CLEAR
So those events look like they get dropped silently.

Good catch @dpol1 — I dug into the code and confirmed the mismatch is real.

Opened #3012 to track this separately — feel free to take it or I can look at it after #3011 lands.

I'll do in a separate PR, when this lands!

Copy link
Copy Markdown
Member

@imbajin imbajin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix direction is correct: a JVM-global MetaManager listener with graph-scoped clearSchemaCache is the right architectural shape for HStore cache synchronization. Two critical behavioral issues need to be resolved before merge — updateSchema triggering unnecessary broadcast storms, and the static listener having no lifecycle reset path — plus a few important design concerns below.

dpol1 added 2 commits April 29, 2026 10:40
…anager

 - Add MetaManager.extractSchemaCacheClearGraphNamesFromResponse so callers
  no longer need to reach into MetaDriver directly via metaDriver(). The
  schema cache clear path in CachedSchemaTransactionV2 already references
  this helper;
Address review feedback that the cross-node cache sync introduced in
  this branch had no end-to-end test coverage.

  Refactor the meta-event consumer in CachedSchemaTransactionV2 into a
  package-private static handleSchemaCacheClearEvent so tests can drive
  the publish -> callback -> clearSchemaCache path without a live
  etcd/PD watch.

  Add five tests covering:
  - target-only invalidation across multiple graphs in the same JVM
  - noop on null graph-name response
  - multi-graph payload clears each graph
  - listenSchemaCacheClear is idempotent across repeated invocations
  - full register -> capture consumer -> publish -> clear flow via
    a Mockito MetaDriver stub
@dosubot dosubot Bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Apr 29, 2026
@dpol1 dpol1 requested a review from imbajin April 29, 2026 08:43
Copy link
Copy Markdown
Contributor Author

@dpol1 dpol1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@imbajin I noticed one more issue exposed by my cache-clear change

The failing casts are here:

After the V2 cache clear, the schema is loaded again from backend. In that path, userdata is restored here:

TextSerializer.java#L917

~create_time comes back as String, but those tests expect Date.

I’ll fix this in Userdata, not in TextSerializer. The constructor currently just does putAll(map) here:

Userdata.java#L38-L39

I’ll normalize Userdata.CREATE_TIME in put() / putAll(), so serialized date strings/timestamps are converted back to Date in one place. Then I’ll add a small regression test and rerun the two HStore create-time tests. That's my way to handle it if you have another idea on your POV, give me a feedback :)

- Cross-node schema cache invalidation broke when the publisher cleared
  its own fresh cache. Tag events with a per-JVM UUID and skip self.
@dpol1
Copy link
Copy Markdown
Contributor Author

dpol1 commented Apr 30, 2026

@imbajin Dropped the Userdata idea — too wide for this PR.
Schema-add still propagates a cache clear, but now V2 events carry a per-JVM source id so the publisher skips its own echo. Remote HStore servers still see the event and invalidate normally.
Plain-string (legacy) events clear the same way as before, so older publishers won't break.

The ~create_time String/Date round-trip: pretty sure that's a backend serialization thing. Leaving it out to keep this PR scoped.

@imbajin
Copy link
Copy Markdown
Member

imbajin commented May 5, 2026

Thanks for the update. The current direction looks much better scoped than the earlier Userdata normalization idea.

From my reading, the current PR mechanism is:

  • keep the local EventHub schema-cache listener for same-JVM cache events
  • add one JVM-wide MetaManager listener for cross-server schema-cache-clear events
  • publish graph-scoped schema-cache-clear events on schema add/remove
  • include a per-JVM source id so the publisher can skip its own watch echo
  • clear all V2 schema cache layers on remote JVMs: schema-id, schema-name, and the array attachment cache
  • continue accepting legacy plain-string event values from older publishers

Could you please update the PR description/comment with a compact before -> after diagram? I think that will make the design much easier to review, especially since most of the added code is test coverage.

Suggested structure:

Before

+----------+       schema add/remove       +---------------+
| Server A | ----------------------------> | HStore / Meta |
+----------+                               +---------------+
     |
     | local V2 cache updated
     v
+-------------------+
| Server A cache    |
| fresh             |
+-------------------+

+-------------------+
| Server B cache    |
| stale             |
+-------------------+

Problem:
Server B had no effective cross-JVM V2 schema-cache-clear listener, so it could
keep serving stale schema from local cache.
After

+----------+       schema add/remove       +---------------+
| Server A | ----------------------------> | HStore / Meta |
| source A |                               +---------------+
+----------+                                      |
     |                                            |
     | publish schema-cache-clear                 |
     | {"graph":"DEFAULT-g","source":"A"}         |
     v                                            v
+-------------------+                    +-------------------+
| Server A listener |                    | Server B listener |
| source == A       |                    | source != B       |
| skip self echo    |                    | clear V2 caches   |
+-------------------+                    +-------------------+
                                                |
                                                v
                                      +---------------------+
                                      | schema-id cache     |
                                      | schema-name cache   |
                                      | array attachment    |
                                      +---------------------+

A few points I would like to confirm before merge:

  1. resetMetaListenerForReconnect() does not seem to be wired to any actual MetaManager / MetaDriver reconnect callback.

    If there is an existing call path, could you point it out? If not, please adjust the comment so it is clear this is only a future/manual recovery hook, not a complete reconnect recovery mechanism in this PR.

  2. The ~create_time String/Date round-trip should stay out of this PR, but please mention it as a follow-up.

    The self-echo skip avoids forcing the publisher to immediately reload its freshly created schema from backend, but remote nodes can still reload schema after cache invalidation. So the backend userdata serialization issue still exists; it is just outside the current PR scope.

  3. Please add a short compatibility note.

    Something like:

    This PR does not change persisted schema data. It only changes the runtime schema-cache-clear meta-event value so new events can carry a per-JVM source id. New consumers still accept legacy plain-string graph names.

Overall, the design now looks reasonable to me. The main remaining thing is to make the lifecycle and compatibility boundaries explicit so future readers do not assume the reconnect and ~create_time issues are fully solved here.

…t compat

  - CachedSchemaTransactionV2: state explicitly that
    resetMetaListenerForReconnect() is a manual hook, not wired to a
    MetaManager/MetaDriver reconnect callback in this PR.
  - MetaManager.SchemaCacheClearEvent.fromValue: document the legacy
    plain-string branch as a rolling-upgrade compatibility path.
@dpol1
Copy link
Copy Markdown
Contributor Author

dpol1 commented May 5, 2026

Thanks @imbajin. Addressed all three points.

1. Reconnect lifecycle

Confirmed: no MetaManager/MetaDriver reconnect callback exists today. resetMetaListenerForReconnect() is a manual hook only — clarified in
CachedSchemaTransactionV2 comment + Javadoc, TODO kept for wiring once a reconnect callback is exposed. This PR is not an automatic reconnect-recovery mechanism.

2. ~create_time round-trip

Out of scope, filed as follow-up: #3013 .

3. Compatibility note

Added to PR description and as code comment at MetaManager.SchemaCacheClearEvent.fromValue() (legacy plain-string branch):

This PR does not change persisted schema data. It only changes the runtime schema-cache-clear meta-event value so new events can carry a per-JVM source id. New
consumers still accept legacy plain-string graph names, so mixed-version rollouts are safe.

MetaManagerSchemaCacheClearEventTest already covers both formats (legacy plain-string + JSON with graph/source). Before/after diagram also in PR description.

imbajin

This comment was marked as outdated.

Copy link
Copy Markdown
Member

@imbajin imbajin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ LGTM , feel free to follow up on the two in-code TODOs at your pace

@dosubot dosubot Bot added the lgtm This PR has been approved by a maintainer label May 6, 2026
@imbajin imbajin merged commit ee177a0 into apache:master May 6, 2026
16 of 18 checks passed
@dpol1 dpol1 deleted the fix/2617-schema-cache-v2-clear branch May 6, 2026 13:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working lgtm This PR has been approved by a maintainer size:XL This PR changes 500-999 lines, ignoring generated files. tests Add or improve test cases

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] cacheEventListener in CachedSchemaTransactionV2 Work does not meet expectations

3 participants