chore: consolidated feature branch for ENVITED-X pipeline #14
Draft
jdsika wants to merge 15 commits into
b95deac to f9b74a0
Emit warnings for abstract class covering axiom edge cases:
- Zero children: warn that no covering axiom will be generated
- One child: warn that the covering axiom degenerates to an equivalence (Parent = Child), recommending --skip-abstract-class-as-unionof-subclasses
Both axioms are still emitted when applicable (semantically correct per OWL 2), but warnings alert users who extend the ontology downstream.
Tests verify warnings are logged, flag suppression works, the single-child covering axiom triple is correctly asserted, plus negative tests for multi-child and concrete class cases, and the mixin-only children edge case.
Refs: linkml#3309, linkml#3219
Signed-off-by: jdsika <carlo.van-driesten@bmw.de>
Signed-off-by: Carlo van Driesten <carlo.van-driesten@bmw.de>
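The two edge cases above can be sketched as a small decision function. This is only an illustration: `covering_axiom_warning` and its message wording are hypothetical and are not taken from the generator.

```python
from typing import List, Optional


def covering_axiom_warning(parent: str, children: List[str]) -> Optional[str]:
    """Return a warning for a degenerate covering axiom, or None.

    Hypothetical helper; the real generator logs warnings instead.
    """
    if len(children) == 0:
        # Nothing to union over, so no covering axiom can be produced.
        return f"Abstract class {parent} has no children; no covering axiom will be generated"
    if len(children) == 1:
        # A union of one class collapses to an equivalence.
        return (
            f"Abstract class {parent} has one child; the covering axiom "
            f"degenerates to {parent} = {children[0]}"
        )
    # Two or more children: the covering axiom is a proper union.
    return None
```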
… names
Add an opt-in --normalize-prefixes flag to OWL, SHACL, and JSON-LD Context generators that normalises non-standard prefix aliases to well-known names from a static prefix map (derived from rdflib 7.x defaults, cross-checked against prefix.cc consensus).
Key design decisions:
- Static frozen map (MappingProxyType) instead of runtime Graph().namespaces() lookup eliminates rdflib version dependency
- Both http://schema.org/ and https://schema.org/ map to 'schema'
- Shared normalize_graph_prefixes() helper used by OWL and SHACL
- Two-phase graph normalisation: Phase 1 normalises schema-declared prefixes, Phase 2 cleans up runtime-injected bindings
- Collision detection: skip with warning when standard prefix name is already user-declared for a different namespace
- Phase 2 guard prevents overwriting HTTPS bindings with HTTP variants
The flag defaults to off, preserving existing behaviour.
Tests cover OWL, SHACL, and context generators with sdo->schema, dce->dc, http/https edge case, custom prefix preservation, flag-off backward compatibility, cross-generator consistency, prefix collision detection, schema1 regression prevention, Phase 2 HTTPS guard, empty schema edge case, and static map integrity.
Signed-off-by: jdsika <carlo.van-driesten@bmw.de>
Signed-off-by: Carlo van Driesten <carlo.van-driesten@bmw.de>
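A minimal sketch of the static-map lookup with collision detection, assuming a prefix table of alias to namespace. The three-entry `WELL_KNOWN` map and the `normalize_prefixes` function are illustrative stand-ins, not the branch's actual map or its `normalize_graph_prefixes()` helper, and the two-phase graph handling is left out.

```python
from types import MappingProxyType
from typing import Dict, List, Tuple

# Illustrative subset only; read-only so callers cannot mutate it.
WELL_KNOWN = MappingProxyType({
    "http://schema.org/": "schema",
    "https://schema.org/": "schema",
    "http://purl.org/dc/elements/1.1/": "dc",
})


def normalize_prefixes(declared: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Rename aliases to well-known names; skip with a warning on collision."""
    result: Dict[str, str] = {}
    warnings: List[str] = []
    for alias, namespace in declared.items():
        standard = WELL_KNOWN.get(namespace, alias)
        owner = declared.get(standard)
        if standard != alias and owner is not None and owner != namespace:
            # The standard name is already user-declared for another namespace.
            warnings.append(f"keeping '{alias}': '{standard}' is bound to {owner}")
            standard = alias
        result[standard] = namespace
    return result, warnings
```

With `{"sdo": "https://schema.org/", "dce": "http://purl.org/dc/elements/1.1/"}` as input, the sketch returns the aliases `schema` and `dc` and no warnings.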
…erals
Add a `--default-language` CLI option to both gen-owl and gen-shacl that emits BCP 47 language-tagged string literals for human-readable annotations.
gen-owl changes:
- New `default_language` field on OwlSchemaGenerator
- `_LANGUAGE_TAGGABLE_RANGES` frozenset (string, ncname) guards tagging
- `_resolve_language()` checks element-level in_language first, then default
- `_literal()` helper creates properly tagged Literal objects
- `add_metadata()` tags string-range and fallback-range literals
- `add_enum()` PV labels respect language tags
- New `--default-language` Click option
gen-shacl changes:
- New `default_language` field on ShaclGenerator
- NodeShape rdfs:label / rdfs:comment get language tags
- PropertyShape sh:name / sh:description get language tags via prop_pv_text()
- Numeric literals (sh:order, sh:minCount, etc.) are never tagged
- New `--default-language` Click option
Tests:
- 3 new OWL tests: tagged labels, backward-compat plain literals, URI ranges
- 4 new SHACL tests: NodeShape, PropertyShape, plain literals, numeric guard
Signed-off-by: Carlo van Driesten <carlo.van-driesten@bmw.de>
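The resolution order described above (element-level in_language first, then the default, and only for taggable ranges) might look roughly like the following. The function names and the `(lexical form, language)` tuple are assumptions made for illustration; the branch itself builds rdflib Literal objects.

```python
from typing import Optional, Tuple

# Ranges whose literals may carry a language tag, as listed in the commit message.
LANGUAGE_TAGGABLE_RANGES = frozenset({"string", "ncname"})


def resolve_language(in_language: Optional[str], default_language: Optional[str]) -> Optional[str]:
    """Element-level language wins; otherwise fall back to the default."""
    return in_language or default_language


def tagged_literal(
    value: object,
    range_name: str,
    in_language: Optional[str] = None,
    default_language: Optional[str] = None,
) -> Tuple[str, Optional[str]]:
    """Return (lexical form, language tag); non-string ranges are never tagged."""
    if range_name not in LANGUAGE_TAGGABLE_RANGES:
        return str(value), None
    return str(value), resolve_language(in_language, default_language)
```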
…apes
Add a new --message-template option that attaches sh:message literals to
each property shape using a user-defined template string.
Supported placeholders:
{name} — slot name (underscore-separated)
{title} — slot title (human-readable), falls back to name
{description} — slot description, falls back to empty string
{comments} — slot comments joined with "; ", falls back to empty string
{class} — enclosing class name
{path} — property IRI (compact or full)
The resulting message is stripped of leading/trailing whitespace and
omitted entirely when empty (avoids blank sh:message literals).
When --default-language is also set, the literal is language-tagged.
Example:
gen-shacl --message-template "{name} ({class}): {description} [{comments}]"
Signed-off-by: Carlo van Driesten <carlo.van-driesten@bmw.de>
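The placeholder fallbacks and the empty-message rule can be sketched with plain `str.format_map`. Both function names below are hypothetical, and the sketch does not claim to mirror how the generator performs the substitution.

```python
from typing import Dict, Optional, Sequence


def placeholder_values(
    name: str,
    class_name: str,
    path: str,
    title: Optional[str] = None,
    description: Optional[str] = None,
    comments: Sequence[str] = (),
) -> Dict[str, str]:
    """Build the placeholder table with the documented fallbacks."""
    return {
        "name": name,
        "title": title or name,            # falls back to name
        "description": description or "",  # falls back to empty string
        "comments": "; ".join(comments),   # joined with "; "
        "class": class_name,
        "path": path,
    }


def render_message(template: str, values: Dict[str, str]) -> Optional[str]:
    """Fill the template, strip whitespace, and return None when nothing is left."""
    message = template.format_map(values).strip()
    return message or None
```

Returning `None` for an empty result gives the caller an easy way to skip emitting a blank sh:message literal.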
Implement SHACL-SPARQL constraint generation for the boolean-guard pattern commonly used in conditional validation rules.
When a LinkML class has rules: blocks with preconditions (value_presence: PRESENT) and postconditions (equals_string: true), the generator now emits sh:SPARQLConstraint nodes on the corresponding sh:NodeShape.
Features:
- New _add_rules() method translates recognised rule patterns to SPARQL
- Boolean-guard pattern: if value present then flag must be true
- Rule description mapped to sh:message on the constraint
- Deactivated rules are skipped
- Warnings emitted for bidirectional/open_world rule flags
- New --emit-rules/--no-emit-rules CLI flag (default: enabled)
- Full URI references in SPARQL (no PREFIX declarations needed)
The generated SPARQL follows W3C SHACL Section 5 and uses the pre-bound $this variable per Section 5.3.1. Constraints are validated by pyshacl with advanced=True.
Refs: linkml#2464
Signed-off-by: Carlo van Driesten <carlo.van-driesten@bmw.de>
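For orientation, a boolean-guard check could be expressed as a SELECT query along the following lines. This is a guess at the general shape, not the query text the generator emits: `boolean_guard_select` is a hypothetical helper, and comparing the flag against the boolean literal `true` is an assumption.

```python
def boolean_guard_select(guard_iri: str, flag_iri: str) -> str:
    """Build a SELECT that reports focus nodes violating the guard.

    A focus node violates the rule when the guarded property has a value
    but the flag property is not true. Full IRIs are used, so no PREFIX
    declarations are needed.
    """
    return (
        "SELECT $this WHERE { "
        f"$this <{guard_iri}> ?value . "
        f"FILTER NOT EXISTS {{ $this <{flag_iri}> true }} "
        "}"
    )


# Example with placeholder IRIs.
query = boolean_guard_select("http://example.org/licenseText", "http://example.org/hasLicense")
```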
Python truthiness check `if s.maximum_cardinality:` evaluates to False when the value is 0 (an integer), silently skipping sh:maxCount 0 emission. The same bug affected minimum_cardinality and exact_cardinality.
Replace all three truthiness checks with explicit `is not None` guards:
- `if s.minimum_cardinality is not None:`
- `if s.maximum_cardinality is not None:`
- `elif s.exact_cardinality is not None:` (two occurrences)
Add regression tests:
- test_zero_maximum_cardinality_emits_maxcount
- test_zero_exact_cardinality_emits_both_counts
This is the primary mechanism for suppressing inherited slots on subclasses via slot_usage (OWL maxCardinality 0 pattern).
Signed-off-by: Carlo van Driesten <carlo.van-driesten@bmw.de>
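The difference between the two guards is easy to reproduce in isolation. The functions below are stand-ins that return plain tuples rather than adding triples to a graph.

```python
from typing import List, Optional, Tuple


def max_count_truthy(maximum_cardinality: Optional[int]) -> List[Tuple[str, int]]:
    """Buggy variant: 0 is falsy, so sh:maxCount 0 is silently dropped."""
    triples = []
    if maximum_cardinality:
        triples.append(("sh:maxCount", maximum_cardinality))
    return triples


def max_count_explicit(maximum_cardinality: Optional[int]) -> List[Tuple[str, int]]:
    """Fixed variant: only an unset (None) cardinality is skipped."""
    triples = []
    if maximum_cardinality is not None:
        triples.append(("sh:maxCount", maximum_cardinality))
    return triples
```

For an input of `0`, the first function returns an empty list while the second returns `[("sh:maxCount", 0)]`.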
The SHACL generator translated any_of branches by dispatching
solely on `any.range` (class, type, enum, or simple datatype).
If a branch specified `pattern:` — either alone or combined
with a range — the constraint was silently dropped, producing
an empty blank node `[ ]` (trivially satisfied) instead of the
intended `[ sh:pattern "..." ]`.
This is a problem for schemas that use pattern alternatives in
`any_of`, such as the SPDX license field where valid values are
either members of a fixed enum (SPDX identifiers), IRIs, or
custom identifiers matching the LicenseRef- pattern defined in
SPDX Specification v2.3 Annex D (ABNF: license-ref =
["DocumentRef-"(idstring)":"]"LicenseRef-"(idstring)).
The fix adds a single check after the range dispatch:
    if any.pattern:
        g.add((range_list[-1], SH.pattern, Literal(any.pattern)))
This correctly handles:
- Pattern-only branches (no range): node gets only sh:pattern
- Range + pattern branches: node gets both sh:datatype and sh:pattern
- Range-only branches (no pattern): unchanged behaviour
The test suite now includes a dedicated schema exercising all
three cases, with assertions on both the generated RDF triples
and pyshacl validation of conforming/non-conforming data.
Signed-off-by: Carlo van Driesten <carlo.van-driesten@bmw.de>
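The three branch cases can be modelled without rdflib by collecting the constraints of one `any_of` branch into a dict. `branch_constraints` is a hypothetical stand-in for the blank node built per branch, and the LicenseRef regular expression is an illustrative guess rather than the pattern used in the test schema.

```python
from typing import Dict, Optional


def branch_constraints(datatype: Optional[str], pattern: Optional[str]) -> Dict[str, str]:
    """Collect the constraints for a single any_of branch."""
    node: Dict[str, str] = {}
    if datatype is not None:
        node["sh:datatype"] = datatype
    # The added check: a pattern is emitted regardless of the range dispatch.
    if pattern:
        node["sh:pattern"] = pattern
    return node


LICENSE_REF = r"^LicenseRef-[A-Za-z0-9.-]+$"  # illustrative pattern only

pattern_only = branch_constraints(None, LICENSE_REF)
range_and_pattern = branch_constraints("xsd:string", LICENSE_REF)
range_only = branch_constraints("xsd:anyURI", None)
```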
Add a --deterministic / --no-deterministic CLI flag (default off) to OWL, SHACL, JSON-LD Context, and JSON-LD generators that produces diff-stable output using Weisfeiler-Lehman structural hashing on top of the RDFC-1.0 canonicalization from upstream (linkml#3407).
Three-phase hybrid pipeline (when --deterministic is set):
1. RDFC-1.0 canonicalization (upstream) produces sequential _:c14nN IDs
2. Weisfeiler-Lehman structural hashing replaces sequential IDs with content-based _:b<sha256> hashes that remain stable when unrelated triples are added/removed
3. rdflib re-serialization recovers idiomatic Turtle (inline blank nodes, collection syntax, filtered prefixes, preserved xsd:string)
Without --deterministic, upstream's always-on RDFC-1.0 canonicalization is used directly (via canonicalize_rdf_graph).
Additional features gated behind --deterministic:
- Expression sorting (any_of/all_of/none_of/exactly_one_of) in owlgen
- Collection sorting (sh:in, sh:ignoredProperties) in shaclgen
- Permissible value sorting in owlgen and shaclgen
- JSON-LD deterministic key ordering (deterministic_json)
- JSON-LD context structured ordering (jsonldcontextgen)
Rebased on top of upstream linkml#3407 (pyoxigraph RDFC-1.0).
Refs: linkml#1847, linkml#3407
Signed-off-by: Carlo van Driesten <carlo.van-driesten@bmw.de>
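The idea behind phase 2 can be illustrated with a simplified Weisfeiler-Lehman refinement over triples held as string tuples. This is a sketch under stated assumptions, not the branch's implementation: `wl_bnode_labels`, the tuple representation, and the fixed round count are all hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def is_bnode(term: str) -> bool:
    return term.startswith("_:")


def wl_bnode_labels(triples: List[Triple], rounds: int = 3) -> Dict[str, str]:
    """Map each blank node to a label derived from its neighbourhood."""
    bnodes = {t for s, _, o in triples for t in (s, o) if is_bnode(t)}
    colour = {b: "" for b in bnodes}  # every blank node starts with the same colour
    for _ in range(rounds):
        new_colour = {}
        for b in bnodes:
            signature = []
            for s, p, o in triples:
                if s == b:
                    kind, value = ("bnode", colour[o]) if is_bnode(o) else ("term", o)
                    signature.append(("out", p, kind, value))
                if o == b:
                    kind, value = ("bnode", colour[s]) if is_bnode(s) else ("term", s)
                    signature.append(("in", p, kind, value))
            # Sorting makes the hash independent of triple order.
            data = repr(sorted(signature)).encode("utf-8")
            new_colour[b] = hashlib.sha256(data).hexdigest()
        colour = new_colour
    return {b: "_:b" + digest for b, digest in colour.items()}
```

Because a label depends only on the triples within `rounds` hops of the blank node, renumbering the sequential `_:c14nN` IDs or adding a triple elsewhere in the graph leaves it unchanged in this sketch.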
a64dd0c to 7303dc7
When --default-language is set, the sh:message literal on SPARQL constraints (sh:SPARQLConstraint) was emitted without a language tag. Add lang=self._resolve_language() to the Literal() constructor call for SPARQL rule descriptions.
Signed-off-by: Carlo van Driesten <carlo.van-driesten@bmw.de>
674d72f to a705c35
rdflib's Turtle serializer always emits a trailing double newline. Normalize to single newline in deterministic_turtle() and the rdflib fallback path in canonicalize_rdf_graph() for consistent file endings.
Note: CLI print() still adds a newline after serialize()'s trailing newline. Callers capturing stdout should strip trailing blank lines (e.g. via sed).
Signed-off-by: Carlo van Driesten <carlo.van-driesten@bmw.de>
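One way to normalize the file ending is to strip all trailing newlines and append exactly one. `normalize_trailing_newline` is a hypothetical name, and the branch may implement the step differently.

```python
def normalize_trailing_newline(text: str) -> str:
    """Ensure the text ends with exactly one newline."""
    return text.rstrip("\n") + "\n"
```

The same function also serves callers that capture stdout, where `print()` adds one more newline after the serialized output.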
a705c35 to 3d3a52a
Purpose
This PR exists solely to observe CI status for the combined feature branch used by the ENVITED-X asset pipeline. It is NOT intended to be merged.
Contents
14 commits stacked on upstream main (1c5f68e4, includes #3447):
Changes from previous version
- from linkml.utils.rdf_canonicalize sorted with linkml.* group (ruff isort)
- \n from generated tutorial .ttl files (end-of-file-fixer)
- self._present(...) to single line (ruff format)
- rdf_canonicalize.py into linkml.utils (avoids PyPI linkml-runtime missing the module when installed via git)
- --default-language to SPARQL constraint sh:message literals
Known CI issues
- prefixmaps git pin (no git in container): expected until prefixmaps#82 releases v0.2.8
TODO
- deterministic_turtle() / canonicalize_rdf_graph(): rdflib.Graph.serialize(format="turtle") produces \n\n; currently patched by stripping in committed files
DO NOT MERGE
This branch will be force-pushed when upstream PRs are updated. It will be removed once all upstream PRs are merged.