Skip to content

Add native rule implementation for the standard protovalidate rules#469

Open
jonbodner-buf wants to merge 25 commits intomainfrom
jbodner/native-rules-port
Open

Add native rule implementation for the standard protovalidate rules#469
jonbodner-buf wants to merge 25 commits intomainfrom
jbodner/native-rules-port

Conversation

@jonbodner-buf
Copy link
Copy Markdown
Contributor

This PR ports the native rule support from protovalidate-go to protovalidate-java. Validation rules defined by protovalidate are handled in Java code instead of CEL. CEL expressions are still processed in CEL, as are any new rules that don't have native implementations.

The standard rules can be evaluated either through CEL or through native Java code. Native evaluation is functionally identical (the conformance suite passes in both modes) but skips CEL compilation and runtime overhead for the rules it covers — a single validate() call on a complex message can run an order of magnitude faster and allocate ~10× less.

Native rules are opt-in while the implementation matures. Enable them by configuring the validator:

Config config = Config.newBuilder().setEnableNativeRules(true).build();
Validator validator = ValidatorFactory.newBuilder().withConfig(config).build();

Native evaluators currently cover:

  • bool
  • all 12 numeric kinds (signed and unsigned int32/int64, float, double, etc.)
  • enum (const/in/not_in; the existing defined_only path is unchanged)
  • bytes
  • string (including all well-known formats)
  • repeated/map list-level rules (min_items/max_items/unique, min_pairs/max_pairs)
  • the google.protobuf.{Bool,Int32,Int64,UInt32,UInt64,Float,Double,String,Bytes}Value wrapper types (rules on wrapper-typed fields run through the same native evaluators after unwrapping).

Forward compatibility is preserved by a clone-and-clear contract: when protovalidate adds a new rule that this codebase hasn't yet implemented natively, the rule remains on the residual FieldRules and CEL enforces it. Native evaluation is an optimization, never a replacement.

Performance improvement for native rule implementations:

benchmark                          metric  cel                native          delta
buildBenchComplexSchema            time    15002410.58 ns/op  36947.91 ns/op  -99.8%
buildBenchComplexSchema            alloc   22874194.63 B/op   74972.01 B/op   -99.7%
buildBenchInt32GT                  time    4978570.44 ns/op   15568.07 ns/op  -99.7%
buildBenchInt32GT                  alloc   8468429.34 B/op    43944 B/op      -99.5%
validateBenchBoolConst             time    506.87 ns/op       60.67 ns/op     -88%
validateBenchBoolConst             alloc   1552 B/op          8 B/op          -99.5%
validateBenchBytesConst            time    614.72 ns/op       96.61 ns/op     -84.3%
validateBenchBytesConst            alloc   1828 B/op          140 B/op        -92.3%
validateBenchBytesIn               time    1498.31 ns/op      126.79 ns/op    -91.5%
validateBenchBytesIn               alloc   4288 B/op          168 B/op        -96.1%
validateBenchComplexSchema         time    54851.83 ns/op     4835.59 ns/op   -91.2%
validateBenchComplexSchema         alloc   130384.01 B/op     9184 B/op       -93%
validateBenchDoubleIn              time    686.87 ns/op       84 ns/op        -87.8%
validateBenchDoubleIn              alloc   2324 B/op          140 B/op        -94%
validateBenchEnumConst             time    506.34 ns/op       64.54 ns/op     -87.3%
validateBenchEnumConst             alloc   1556 B/op          72 B/op         -95.4%
validateBenchEnumNotIn             time    505.09 ns/op       65.28 ns/op     -87.1%
validateBenchEnumNotIn             alloc   1620 B/op          80 B/op         -95.1%
validateBenchEnumRules             time    726.41 ns/op       67.74 ns/op     -90.7%
validateBenchEnumRules             alloc   1960 B/op          80 B/op         -95.9%
validateBenchInt32GT               time    36965.57 ns/op     1114.67 ns/op   -97%
validateBenchInt32GT               alloc   85736.01 B/op      1244 B/op       -98.5%
validateBenchInt64Const            time    556.7 ns/op        77.81 ns/op     -86%
validateBenchInt64Const            alloc   1740 B/op          72 B/op         -95.9%
validateBenchInt64In               time    708.46 ns/op       94.13 ns/op     -86.7%
validateBenchInt64In               alloc   2048 B/op          132 B/op        -93.6%
validateBenchMap                   time    1007.53 ns/op      226.63 ns/op    -77.5%
validateBenchMap                   alloc   2912 B/op          432 B/op        -85.2%
validateBenchRepeatedBytesUnique   time    1034.72 ns/op      326.8 ns/op     -68.4%
validateBenchRepeatedBytesUnique   alloc   3304 B/op          1088 B/op       -67.1%
validateBenchRepeatedInt32Unique   time    668.98 ns/op       205.97 ns/op    -69.2%
validateBenchRepeatedInt32Unique   alloc   2256 B/op          576 B/op        -74.5%
validateBenchRepeatedMessage       time    9555.54 ns/op      645.4 ns/op     -93.2%
validateBenchRepeatedMessage       alloc   31112 B/op         1012 B/op       -96.7%
validateBenchRepeatedScalar        time    688.25 ns/op       95.64 ns/op     -86.1%
validateBenchRepeatedScalar        alloc   2240 B/op          112 B/op        -95%
validateBenchRepeatedScalarUnique  time    685.01 ns/op       203.59 ns/op    -70.3%
validateBenchRepeatedScalarUnique  alloc   2592 B/op          856 B/op        -67%
validateBenchRepeatedStringUnique  time    637.11 ns/op       232.19 ns/op    -63.6%
validateBenchRepeatedStringUnique  alloc   2176 B/op          576 B/op        -73.5%
validateBenchScalar                time    1523.08 ns/op      85.91 ns/op     -94.4%
validateBenchScalar                alloc   4492 B/op          120 B/op        -97.3%
validateBenchStringConst           time    548.64 ns/op       78.3 ns/op      -85.7%
validateBenchStringConst           alloc   1728 B/op          72 B/op         -95.8%
validateBenchStringContains        time    498.97 ns/op       74.8 ns/op      -85%
validateBenchStringContains        alloc   1628 B/op          72 B/op         -95.6%
validateBenchStringIn              time    778.2 ns/op        80.58 ns/op     -89.6%
validateBenchStringIn              alloc   2176 B/op          100 B/op        -95.4%
validateBenchStringLen             time    619.07 ns/op       70.94 ns/op     -88.5%
validateBenchStringLen             alloc   2132 B/op          72 B/op         -96.6%
validateBenchStringMinLen          time    640.65 ns/op       69.71 ns/op     -89.1%
validateBenchStringMinLen          alloc   2236 B/op          72 B/op         -96.8%
validateBenchStringPrefix          time    517.29 ns/op       79.1 ns/op      -84.7%
validateBenchStringPrefix          alloc   1732 B/op          72 B/op         -95.8%
validateBenchUint32In              time    747.06 ns/op       86.95 ns/op     -88.4%
validateBenchUint32In              alloc   2120 B/op          124 B/op        -94.2%
validateManyUnruled                time    650.71 ns/op       68.37 ns/op     -89.5%
validateManyUnruled                alloc   2192 B/op          72 B/op         -96.7%
validateMultiRuleError             time    3292.56 ns/op      433.89 ns/op    -86.8%
validateMultiRuleError             alloc   10736 B/op         2168 B/op       -79.8%
validateMultiRuleNoError           time    1958.92 ns/op      82.84 ns/op     -95.8%
validateMultiRuleNoError           alloc   5872 B/op          72 B/op         -98.8%
validateRegexPattern               time    1987 ns/op         779.5 ns/op     -60.8%
validateRegexPattern               alloc   4296 B/op          180 B/op        -95.8%
validateRepeatedRule               time    14083.24 ns/op     1186.26 ns/op   -91.6%
validateRepeatedRule               alloc   42896 B/op         1468 B/op       -96.6%
validateSimple                     time    2130.52 ns/op      882.01 ns/op    -58.6%
validateSimple                     alloc   4056 B/op          152 B/op        -96.3%
validateStringMatching             time    8325.98 ns/op      2035.29 ns/op   -75.6%
validateStringMatching             alloc   16280 B/op         960 B/op        -94.1%
validateTestByteMatching           time    6546.92 ns/op      359.91 ns/op    -94.5%
validateTestByteMatching           alloc   22120 B/op         600 B/op        -97.3%
validateWrapperTesting             time    18270 ns/op        962.24 ns/op    -94.7%
validateWrapperTesting             alloc   40404 B/op         1376 B/op       -96.6%

Run the benchmarks yourself with:

./gradlew :benchmarks:jmh
./gradlew :benchmarks:jmhCompareParams

Establishes the JMH measurement substrate for the native-rules port. Each
subsequent phase will compare against these numbers to verify the port
delivers real CPU and allocation wins.

Ports the bench and test messages from protovalidate-go's bench.proto and
native_test.proto into a new native_bench.proto, with gofakeit annotations
stripped (no maintained Java equivalent of protogofakeit). Hand-built
deterministic fixtures replace the gofakeit-generated values, so runs are
reproducible.

Adds compile-time benchmarks alongside the steady-state ones to track
evaluator-construction cost — under the planned clone-and-clear dispatch,
CEL will skip compilation for rules covered natively.
Lays the groundwork for replacing CEL with native Java evaluation of
standard rules. Per-rule implementations land in subsequent phases; this
commit only adds the surface area and the dispatch path, so behavior is
unchanged when the flag is off (the default) or when no rule type has
been migrated yet.

Native evaluators live in build.buf.protovalidate.rules. To consume the
existing evaluator pipeline they need access to a few package-private
types in build.buf.protovalidate; those are widened to public and
marked with a new @internal annotation so external consumers know they
remain unsupported.

Forward compatibility is preserved by a clone-and-clear contract: when
a future rule is added to the validate proto and the native dispatcher
hasn't yet learned it, the rule stays on the residual FieldRules and
CEL enforces it. Native evaluation is an optimization, never a
replacement.

The conformance runner reads DISABLE_NATIVE_RULES so the same suite can
exercise both modes without code changes — used now to verify the
dispatcher is invisible (2872/2872 in both modes), and again later for
parity testing.
First rule type to land natively. Confirms the dispatcher contract end
to end — the violation shape, rule path, field path, and rule_value
match what the CEL path produces for the same input. Wrapper-typed
scalar fields bypass the dispatcher; CEL's transparent unwrapping
handles them correctly today and adding native wrapper support
uniformly across all scalar rule types is more naturally a follow-up
once more rule types have shipped.

The test asserts on Violation.getRuleValue() explicitly, since that
field isn't part of the Violation proto and is therefore invisible to
the conformance suite — a parity-loss vector that needs Java-side
coverage to catch.
Covers all 12 proto numeric kinds (int32, sint32, sfixed32, uint32,
fixed32, int64, sint64, sfixed64, uint64, fixed64, float, double) and
all scalar numeric rules: gt, gte, lt, lte, const, in, not_in, plus
finite for float/double. Range-combo rule ids
(<type>.gt_lt[_exclusive], <type>.gte_lte[_exclusive], ...) and
violation messages match protovalidate-go's reference.

Unsigned correctness for uint32/fixed32/uint64/fixed64 uses
Integer.compareUnsigned/Long.compareUnsigned for ordering and
Integer.toUnsignedString/Long.toUnsignedString for messages, since
Java stores unsigned protobuf scalars in signed primitives.

Adds Value.rawValue() so native evaluators can read protobuf-java's
underlying scalar (Long for uint64, Integer for int32, ...) without
the CEL adaptation that ObjectValue.value(Class) applies — uint64
arrives as Guava UnsignedLong via that path, which doesn't compose
with the rule values read off the typed *Rules message.

The biggest steady-state benchmarks improve dramatically when native
is enabled — validateBenchInt32GT drops 97% in time and 98% in
allocation, and buildBenchInt32GT drops 99.7% (5ms → 16μs) because
every rule is covered natively and CEL has nothing left to compile.
Covers the three enum rules that aren't already handled by the
existing EnumEvaluator (defined_only). Both evaluators compose on the
same field when both rules are set — protovalidate's clone-and-clear
contract leaves defined_only untouched on the residual FieldRules so
the existing path keeps working.

Also adds bench fixtures BenchBoolConst and BenchEnumRules so Phase 2
and this phase have direct measurement targets. Previously neither
phase had a bool-only or enum-only bench message and the JMH numbers
were dominated by dispatcher overhead. With these fixtures, native vs
CEL shows ~85% and ~90% time reduction on the steady-state path.

While here, simplifies getUnknownFields().asMap().isEmpty() to just
isEmpty() in the bool and numeric evaluators — UnknownFieldSet has the
method directly.
Covers const, len, min_len, max_len, pattern, prefix, suffix,
contains, in, not_in, plus the well-known size-only formats ip
(4 or 16 bytes), ipv4 (4 bytes), ipv6 (16 bytes), uuid (16 bytes) —
none of those require actual parsing per protovalidate's spec.

Pattern matching uses re2j to stay byte-compatible with the CEL path,
and throws ExecutionException on non-UTF-8 input to match Go's
RuntimeError contract — the conformance suite expects pattern checks
to fail loudly rather than silently substituting characters.

ByteString doesn't expose a contains() method, so there's a small
in-class implementation. Hex formatter mirrors Go's fmt.Sprintf("%x")
output for byte slices, used in const/prefix/suffix/contains
violation messages.

validateTestByteMatching drops 92% in time and 97% in allocation when
native is enabled. validateBenchComplexSchema is now down 73% in time
vs the Phase 0 baseline (numerics + bytes both native).
The biggest single rule type by surface area: 14 scalar rules
(const/len/min_len/max_len/len_bytes/min_bytes/max_bytes/pattern/
prefix/suffix/contains/not_contains/in/not_in), 18 well-known formats
(email, hostname, the ip/ipv4/ipv6 family with prefix and
prefixlen variants, uri, uri_ref, address, uuid, tuuid, ulid,
host_and_port), and the well_known_regex oneof case for HTTP header
name/value with a strict toggle.

The well-known formats reuse the existing format helpers in
CustomOverload (isEmail, isHostname, isIp, isIpPrefix, isUri,
isUriRef, isHostAndPort) — those are widened to public+@internal so
both the CEL custom-overload registration and the native rules path
share the same parsers. No reimplementation, no risk of drift.

Length rules count Unicode code points (String.codePointCount), not
Java chars — surrogate-pair-encoded characters like emoji count as
one. Byte-length rules count UTF-8 bytes.

validateStringMatching drops 71% in time and 93% in allocation when
native is enabled. validateBenchComplexSchema is now down 80% in time
and 82% in allocation vs the Phase 0 baseline (numerics + bytes +
string all native).
Covers repeated min_items/max_items/unique and map min_pairs/max_pairs.
Element-level / key-and-value rules continue to flow through the
existing ListEvaluator/MapEvaluator path; this phase only handles the
top-level list/map rules that don't recurse into elements.

The unique check uses a linear O(n²) scan for lists of 16 or fewer
items (no allocation) and falls back to a HashSet for larger lists,
mirroring Go's threshold. Element kinds with reliable Object.equals
(scalars, strings, bools, bytes, enums) are supported; message/group
element kinds bail to CEL since their equality is reference-only.

This is the last rule type in scope for the native-rules port. With
all six rule phases native, BenchComplexSchema drops 90.5% in time
(55,536 → 5,260 ns/op) and 92.4% in allocation (133,120 → 10,100
B/op) vs the pre-port baseline. The remaining ~10% is enum
defined_only, oneof required, field-level required, and wrapper-
typed scalars — all intentionally out of scope (the first three are
already non-CEL evaluators; wrapper support is deferred).
The parity test runs a representative slice of conformance fixtures —
one per rule type, plus the kitchen-sink composite — through both
evaluation modes and asserts byte-equal Violation.toProto() output.
This is stricter than the conformance suite, which by default omits
the message text from comparison; the parity test catches any
between-mode drift on rule_id, field path, rule path, AND message.

Documents the Config.setDisableNativeRules opt-in in README, listing
what's covered and noting wrapper-typed scalars as a follow-up.
WrappedValueEvaluator pulls the inner 'value' field off a wrapper
Message at evaluation time and delegates to the same scalar native
evaluator (BoolRulesEvaluator, NumericRulesEvaluator, etc.) that
would have been used for a direct field of the wrapped type. The
RuleBase still uses the OUTER wrapper field's descriptor so violation
field paths point at the user's wrapper-typed field, matching the
field path produced by the CEL path.

Removes the Phase 2 wrapper-bypass that returned null from the
dispatcher when the outer descriptor was a Message. The CEL path was
correct because CEL's runtime auto-unwraps protobuf wrappers; native
evaluators expect the underlying scalar (e.g. Long for
Int64Value.value), which is what the unwrap produces.

ObjectValue is widened to public+@internal so the rules package can
construct one when delegating into the inner scalar evaluator.

validateWrapperTesting drops 94.8% in time (16,866 → 880 ns/op) and
96.6% in allocation (39,956 → 1,376 B/op). Conformance passes
2872/2872 in both modes; the parity test is extended to include
WrapperDouble for explicit coverage.
Reads more naturally: setting enableNativeRules to true turns native
rule evaluation on, false turns it off, default is false. Same effective
behavior (CEL-only by default; opt in to native), but the API name is
direct rather than double-negative.

Renames Config.setDisableNativeRules -> Config.setEnableNativeRules,
Config.isNativeRulesDisabled -> Config.isNativeRulesEnabled, the
EvaluatorBuilder field, the conformance env var DISABLE_NATIVE_RULES ->
ENABLE_NATIVE_RULES, the JMH @param across both benchmark classes, and
all test/doc references.

Bench @param order flips to {"false","true"} so the CEL-only run still
comes first; the regression-gate run is unchanged.
Adds make conformance-native that runs the conformance CLI with
ENABLE_NATIVE_RULES=true, and updates the Conformance workflow to run
both CEL-only (default) and native-enabled passes. Catches any
between-mode drift on every PR rather than only when someone runs the
parity test locally.

JMH-based regression detection in CI was considered but rejected: at
sub-microsecond benchmark scales JMH variance on shared CI runners
exceeds the 5% gate that's meaningful, so a per-PR perf gate would be
either too loose to catch real regressions or too flaky to trust. The
conformance dual-mode pass is the strict correctness gate; JMH stays a
developer-local A/B via the existing benchmarks/RESULTS.md workflow.
NumericTypeConfig.floatCompare/doubleCompare now treat +0.0 and -0.0 as
equal so a rule like float.const = -0.0 against val = +0.0 no longer fires
a spurious violation, and the formatters preserve the sign in violation
messages by checking the bit pattern before the whole-number short-circuit.

RepeatedRulesEvaluator.isUnique drops the linear-scan fast path for short
lists; the HashSet path is uniform across sizes.

Adds FloatBugConfirmationTest plus the validationtest.proto fixtures it
exercises (ExampleFloat/DoubleConstNegZero, ExampleFloat/DoubleRepeatedUnique,
FloatDoubleNaNNegZero) to lock in native-vs-CEL parity on -0 const messages
and document that NaN/±0 unique semantics agree across both paths (both
deviate from CEL's IEEE-754 spec via CustomOverload.uniqueList).
StringRulesEvaluator:
- Pre-build well_known_regex sites (header_name / header_name_empty /
  header_value) as static finals; checkKnownRegex no longer allocates
  RuleSites per violation.
- Drop defensive ArrayList copies of in/not_in lists; the proto runtime
  already returns immutable views.
- Replace strVal.getBytes(UTF_8).length with a counting helper that doesn't
  allocate a byte array.
- Store WellKnownFormat empty messages explicitly per constant rather than
  deriving them by string substitution. Substring derivation produced
  "value is empty, which is not a valid host (hostname or IP address) and
  port pair" for host_and_port, where the spec says
  "value is empty, which is not a valid host and port pair".
- Drop unused ruleSuffix field on WellKnownFormat.

BytesRulesEvaluator / EnumRulesEvaluator: drop defensive in/not_in copies.

NumericRulesEvaluator:
- Switch evaluate() from eager `new ArrayList<>(0)` to the lazy-null
  pattern used by the other evaluators; rename finalize → done.
- Drop the rulesField parameter from tryBuild; the descriptor now lives on
  NumericTypeConfig.

NumericTypeConfig:
- Carry the FieldRules-level field descriptor on each kind. Lets the
  dispatcher read and clear the typed sub-message without the 12-arm
  rulesFieldNumberFor switch.
- Extract a private create(...) helper to make the static initializers
  uniform.
- Document the FieldRules / per-kind *Rules class-init invariant.

Rules:
- Drop rulesFieldNumberFor entirely; numericTryBuild reads the descriptor
  off the config.
- Inline tryBuildRepeatedRules / tryBuildMapRules at their single call
  sites.

MapRulesEvaluator: tautology() returns false unconditionally. The previous
condition was unreachable because tryBuild already returns null when both
fields are unset.

WrappedValueEvaluator: assert at construction that the inner field is
field number 1 named "value" (the wrapper-message contract).
The lazy-null `add(...)`, `done(...)`, and `formatList(...)` helpers were
duplicated across six evaluators. Move them onto RuleBase: a static
`add(...)`, an instance `done(...)` that finalizes against this base's
field-path/rule-prefix elements, and a generic `formatList(...)` plus
toString-default overload.

Per-evaluator wrappers that were pure delegations are deleted entirely.
NumericRulesEvaluator and BytesRulesEvaluator keep their `formatList`
methods because each captures a non-default formatter (config.formatter
and ByteString::toStringUtf8 respectively).

Call sites use `RuleBase.add(...)` / `base.done(...)` / `RuleBase.formatList(...)`
directly.
Adds five test classes covering contracts the conformance suite doesn't
exercise directly:

- FailFastTest: native evaluators must short-circuit after the first
  violation when failFast=true. One fixture per evaluator kind with two
  rules that both fail.
- NotInRulesTest: pins rule_id and message text for not_in on int32,
  uint32, string, bytes, and enum.
- ResidualClearingTest: asserts exactly one violation (not two) per rule
  type, proving the dispatcher's clone-and-clear contract for each kind.
- WellKnownRegexTest: HTTP header_name/header_value, strict + loose modes,
  and the empty-header-name special case.
- WrappedValueEvaluatorTest: absent wrapper field, present wrapper with
  default inner value, present wrapper with violating inner value.

Adds the validationtest.proto fixtures these tests need (multi-rule
fixtures, NotIn fixtures, HttpHeader fixtures, wrapper fixtures), pulling
in google/protobuf/wrappers.proto.

Parameterizes NativeRulesParityTest with @MethodSource so each fixture
produces a discrete pass/fail in the JUnit report.
Adds 16 single-rule fixtures and matching @benchmark methods so each rule
that the native path now handles has an isolated CEL-vs-native A/B (the
existing BenchComplexSchema mixed too many rules into one timing). Covers
the gaps identified in the second-pass review:

- string: const, len, min_len (isolated), prefix, contains, in
- bytes: const, in
- numeric (non-int32): int64 const+in, uint32 in, double in
- enum: const, not_in
- repeated.unique: string, int32

Each fixture's value satisfies its rule so the benchmark measures the
happy path. The `enableNativeRules` @param continues to A/B native vs CEL.
@jonbodner-buf jonbodner-buf requested a review from pkwarren May 4, 2026 15:09
…ctor to prevent instantiation. Update README for benchmarks. Add new gradle task for comparing native rule performance.
@jonbodner-buf jonbodner-buf force-pushed the jbodner/native-rules-port branch from 4419221 to ea107a4 Compare May 4, 2026 15:42
…etEnableNativeRules() and setDisableNativeRules(). If neither is used, protovalidate-java defaults to native rules enabled.
@jonbodner-buf
Copy link
Copy Markdown
Contributor Author

New commit reverses the behavior. Now native rules are enabled by default.

Copy link
Copy Markdown
Member

@pkwarren pkwarren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a first pass - still need to review the tests. The results look really promising - nice work!

There were a few places that may differ a bit between Java and Go that need a closer look. I also think we may be able to optimize further but that's better in follow ups.

Comment thread .github/workflows/conformance.yaml Outdated
- name: Test conformance
- name: Test conformance (CEL-only)
run: make conformance-cel
- name: Test conformance (native rules enabled — default mode)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd remove the (native rules enabled ... here - there will potentially be other variants down the road.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread Makefile
conformance: ## Execute conformance tests.
$(GRADLE) conformance:conformance
conformance: ## Execute conformance tests with native rule evaluators enabled.
ENABLE_NATIVE_RULES=true $(GRADLE) conformance:conformance
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could define ENABLE_NATIVE_RULES ?= true above and then just update the workflow to run make conformance ENABLE_NATIVE_RULES=false for the CEL flow.

Comment thread README.md
* A comprehensive RPC quickstart for [Java and gRPC][grpc-java]
* A [migration guide for protoc-gen-validate][migration-guide] users

## Native rule evaluators (opt-in)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now opt-out

Comment thread README.md
Validator validator = ValidatorFactory.newBuilder().withConfig(config).build();
```

Native evaluators currently cover bool, all 12 numeric kinds (signed and unsigned int32/int64, float, double, etc.), enum (`const`/`in`/`not_in`; the existing `defined_only` path is unchanged), bytes, string (including all well-known formats), repeated/map list-level rules (`min_items`/`max_items`/`unique`, `min_pairs`/`max_pairs`), and the `google.protobuf.{Bool,Int32,Int64,UInt32,UInt64,Float,Double,String,Bytes}Value` wrapper types — rules on wrapper-typed fields run through the same native evaluators after unwrapping.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this helpful information for users or a detail left to the implementation's javadoc or somewhere else?

My main concern is keeping this list in sync with any future optimizations.

Comment thread benchmarks/build.gradle.kts Outdated
//
// Override the input file:
// ./gradlew :benchmarks:jmhCompareParams -Presults=path/to/results.json
tasks.register<Exec>("jmhCompareParams") {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we call this jmhCompareNative or jmhCompareNativeRules?

If so we should update the naming of the jq script to match.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed to jmhCompareNativeRules, .jq file renamed to jmh-compare-native-rules.jq

// Java's compareTo semantics (NaN.compareTo(NaN) == 0) so behavior matches the existing
// protovalidate-java CEL path, which uses the same Object.equals semantics.

private static int floatCompare(Float f1, Float f2) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the comparator is used for more than range operations, we may need special handling for NaN in places where the comparator is used.

Double.NaN.compareTo(Double.NaN) == 0 in Java but if the Go code is using == then we'll get different behavior.

*
* @return this builder
*/
public Builder setEnableNativeRules() {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having two methods here is a different pattern than setAllowUnknownFields and setFailFast which take the boolean value. I'd have this match and remove setDisableNativeRules.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated. Now there is a single method, setEnableNativeRules that takes a boolean. The default value is true.

@@ -71,6 +73,7 @@ final class EvaluatorBuilder {
EvaluatorBuilder(Cel cel, Config config) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not from your change but we should change this to call this(cel, config, Collections.emptyList(), false) so we just do this initialization in one place.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a new private constructor that is called by the two existing constructors. One constructor throws a checked exception, so merging them causes a great deal of disruption as the exception is bubbled up.

*/
@Retention(RetentionPolicy.SOURCE)
@Target({ElementType.TYPE, ElementType.METHOD, ElementType.CONSTRUCTOR, ElementType.FIELD})
public @interface Internal {}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it makes the package bigger I wonder if we could look to move the rules.* package into build.buf.protovalidate. This way we can continue to leave some classes/methods package private.

There are also some other things we could look at as well - https://errorprone.info/api/latest/com/google/errorprone/annotations/RestrictedApi.html.

private static final Pattern ULID_REGEX =
Pattern.compile("^[0-7][0-9A-HJKMNP-TV-Za-hjkmnp-tv-z]{25}$");
private static final Pattern HEADER_NAME_REGEX =
Pattern.compile("^:?[0-9a-zA-Z!#$%&\\\\'*+\\-.\\^_|~`]+$");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/bufbuild/protovalidate-go/blob/c0c9f624fd0fc37177c680ec2e40717ecbc68ac4/native_string.go#L246 uses raw strings, so it appears the escaping of the hyphen char differs betweeen Go and Java.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants