CI failure seems to be unrelated to my changes.
Pull request overview
This PR introduces YEAST, a Rust library for declarative AST cleanup/desugaring on top of tree-sitter parse trees, and integrates it into the shared tree-sitter extractor so extraction can optionally run on a rewritten AST and/or validate against an alternate output node-types schema.
Changes:
- Add new `shared/yeast` + `shared/yeast-macros` crates implementing the rule/query/template system, plus tests and documentation.
- Extend the shared tree-sitter extractor to optionally run YEAST rules and to support separate `output_node_types` for schema generation and TRAP validation.
- Update Bazel vendored Rust deps to include YEAST dependencies (and bump `tree-sitter`, `cc`, etc.).
Show a summary per file
| File | Description |
|---|---|
| shared/yeast/tests/test.rs | End-to-end tests for parsing, query matching, tree building, and desugaring rules. |
| shared/yeast/tests/node-types.yml | Test output schema in the new YAML node-types format. |
| shared/yeast/src/visitor.rs | Converts a tree-sitter Tree into a YEAST Ast. |
| shared/yeast/src/tree_builder.rs | Fresh identifier generation support for templates/rules. |
| shared/yeast/src/schema.rs | Schema representation for kinds/fields (language-derived or YAML-derived). |
| shared/yeast/src/range.rs | Serde helpers for (de)serializing tree_sitter::Range. |
| shared/yeast/src/query.rs | Query AST and matching engine (captures, repetition, named/unnamed semantics). |
| shared/yeast/src/print.rs | Debug printer for walking a YEAST AST. |
| shared/yeast/src/node_types_yaml.rs | YAML ↔ JSON node-types conversion + schema construction from YAML. |
| shared/yeast/src/lib.rs | Core YEAST types (Ast, Node, Rule, Runner) and rewrite application logic. |
| shared/yeast/src/dump.rs | Human-readable AST dump utility used by tests. |
| shared/yeast/src/cursor.rs | Cursor trait abstraction used by traversal/extractor integration. |
| shared/yeast/src/captures.rs | Capture storage and utilities (single/repeated/optional). |
| shared/yeast/src/build.rs | BuildCtx used by tree!/trees! macros to build synthetic nodes. |
| shared/yeast/src/bin/node_types_yaml.rs | CLI tool to convert YAML node-types ↔ JSON node-types. |
| shared/yeast/src/bin/main.rs | Minimal YEAST CLI for parsing and printing. |
| shared/yeast/doc/yeast.md | Main YEAST documentation (architecture, query/template language, integration). |
| shared/yeast/doc/node-types-yaml.md | Specification for the YAML node-types format and CLI usage. |
| shared/yeast/Cargo.toml | New yeast crate manifest and dependencies. |
| shared/yeast/Cargo.lock | Lockfile for the standalone shared/yeast crate. |
| shared/yeast/BUILD.bazel | Bazel target for the yeast Rust library. |
| shared/yeast/.gitkeep | Placeholder file for directory tracking. |
| shared/yeast/.gitignore | Ignores shared/yeast/target. |
| shared/yeast/.envrc | Direnv config for local development. |
| shared/yeast-macros/src/parse.rs | Proc-macro parsing and codegen for query!, tree!, trees!, rule!. |
| shared/yeast-macros/src/lib.rs | Proc-macro entry points and user-facing macro docs. |
| shared/yeast-macros/Cargo.toml | New yeast-macros proc-macro crate manifest. |
| shared/yeast-macros/BUILD.bazel | Bazel target for the yeast-macros proc-macro crate. |
| shared/tree-sitter-extractor/tests/multiple_languages.rs | Updates tests to include output_node_types in LanguageSpec. |
| shared/tree-sitter-extractor/tests/integration_test.rs | Updates tests to include output_node_types in LanguageSpec. |
| shared/tree-sitter-extractor/src/generator/mod.rs | Generator uses output_node_types when provided. |
| shared/tree-sitter-extractor/src/generator/language.rs | Adds output_node_types to generator Language. |
| shared/tree-sitter-extractor/src/extractor/simple.rs | Uses output_node_types for schema validation in the simple extractor. |
| shared/tree-sitter-extractor/src/extractor/mod.rs | Adds optional YEAST desugaring path and AstNode abstraction. |
| shared/tree-sitter-extractor/Cargo.toml | Adds a path dependency on shared/yeast. |
| shared/tree-sitter-extractor/BUILD.bazel | Adds Bazel dep on //shared/yeast. |
| ruby/extractor/src/generator.rs | Populates output_node_types: None for Ruby/Erb generator languages. |
| ruby/extractor/src/extractor.rs | Updates shared extractor invocation with new extract(...) params. |
| ql/extractor/src/generator.rs | Populates output_node_types: None for QL generator languages. |
| ql/extractor/src/extractor.rs | Populates output_node_types: None for QL simple extractor languages. |
| MODULE.bazel | Adds/upgrades vendored crates (notably tree-sitter and new deps). |
| misc/bazel/3rdparty/tree_sitter_extractors_deps/defs.bzl | Adds YEAST + YEAST-macros crates and bumps vendored dependencies. |
| misc/bazel/3rdparty/tree_sitter_extractors_deps/BUILD.zstd-sys-2.0.16+zstd.1.5.7.bazel | Updates cc dependency reference. |
| misc/bazel/3rdparty/tree_sitter_extractors_deps/BUILD.tree-sitter-ruby-0.23.1.bazel | Updates cc dependency reference. |
| misc/bazel/3rdparty/tree_sitter_extractors_deps/BUILD.tree-sitter-ql-0.23.1.bazel | Updates cc dependency reference. |
| misc/bazel/3rdparty/tree_sitter_extractors_deps/BUILD.tree-sitter-python-0.23.6.bazel | Adds vendoring/build definitions for tree-sitter-python. |
| misc/bazel/3rdparty/tree_sitter_extractors_deps/BUILD.tree-sitter-json-0.24.8.bazel | Updates cc dependency reference. |
| misc/bazel/3rdparty/tree_sitter_extractors_deps/BUILD.tree-sitter-embedded-template-0.25.0.bazel | Updates cc dependency reference. |
| misc/bazel/3rdparty/tree_sitter_extractors_deps/BUILD.tree-sitter-0.26.8.bazel | Bumps vendored tree-sitter to 0.26.8 and updates cc reference. |
| misc/bazel/3rdparty/tree_sitter_extractors_deps/BUILD.iana-time-zone-haiku-0.1.2.bazel | Updates cc dependency reference. |
| misc/bazel/3rdparty/tree_sitter_extractors_deps/BUILD.find-msvc-tools-0.1.9.bazel | Updates vendored find-msvc-tools version metadata. |
| misc/bazel/3rdparty/tree_sitter_extractors_deps/BUILD.cc-1.2.61.bazel | Updates vendored cc version metadata and dependencies. |
| misc/bazel/3rdparty/tree_sitter_extractors_deps/BUILD.bazel | Adds aliases for serde_yaml and tree-sitter-python; bumps tree-sitter alias. |
| Cargo.toml | Adds shared/yeast and shared/yeast-macros to the workspace. |
| Cargo.lock | Workspace lock updates (adds yeast crates; bumps tree-sitter, cc, etc.). |
Copilot's findings
Comments suppressed due to low confidence (1)
shared/yeast/src/node_types_yaml.rs:303
`schema_from_yaml_with_language` also registers YAML `unnamed:` tokens using `schema.register_kind(name)`, which only affects the named kind map. If the YAML adds any unnamed tokens not present in the tree-sitter language, `QueryNode::UnnamedNode` lookups will still fail because `unnamed_kind_ids` is never updated.
This should use an unnamed-kind registration path (updating unnamed_kind_ids) rather than register_kind.
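To make the finding concrete, here is a minimal sketch of a schema that keeps named and unnamed kinds in separate maps. Only `register_kind` and `unnamed_kind_ids` are names taken from the comment above; the struct layout, `register_unnamed_kind`, and the lookup helpers are assumptions made for illustration and may not match the real `Schema`.

```rust
use std::collections::HashMap;

/// Toy schema with separate id maps for named and unnamed kinds.
/// Hypothetical layout; not the actual `yeast::Schema`.
#[derive(Default)]
pub struct Schema {
    named_kind_ids: HashMap<String, u16>,
    unnamed_kind_ids: HashMap<String, u16>,
    next_id: u16,
}

impl Schema {
    fn fresh_id(&mut self) -> u16 {
        let id = self.next_id;
        self.next_id += 1;
        id
    }

    /// Registers a named kind, reusing the id if it is already known.
    /// Note that this never touches `unnamed_kind_ids`.
    pub fn register_kind(&mut self, name: &str) -> u16 {
        if let Some(&id) = self.named_kind_ids.get(name) {
            return id;
        }
        let id = self.fresh_id();
        self.named_kind_ids.insert(name.to_string(), id);
        id
    }

    /// Hypothetical counterpart for unnamed tokens such as `"=>"`.
    pub fn register_unnamed_kind(&mut self, name: &str) -> u16 {
        if let Some(&id) = self.unnamed_kind_ids.get(name) {
            return id;
        }
        let id = self.fresh_id();
        self.unnamed_kind_ids.insert(name.to_string(), id);
        id
    }

    /// Lookup used when resolving a named node in a query.
    pub fn named_kind_id(&self, name: &str) -> Option<u16> {
        self.named_kind_ids.get(name).copied()
    }

    /// Lookup used when resolving an unnamed token in a query.
    pub fn unnamed_kind_id(&self, name: &str) -> Option<u16> {
        self.unnamed_kind_ids.get(name).copied()
    }
}
```

With this shape, registering a token through `register_kind` leaves `unnamed_kind_id` returning `None`, which is the failure mode described above; routing `unnamed:` entries through a dedicated registration path avoids it.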
- Files reviewed: 52/55 changed files
- Comments generated: 6
YEAST (YEAST Elaborates Abstract Syntax Trees) is a framework for transforming tree-sitter parse trees before CodeQL extraction.

Core components:
- shared/yeast/: Ast, Node, Schema, query matching engine, captures, FreshScope, BuildCtx
- shared/yeast-macros/: proc macros query!, tree!, trees!, rule!

The query language is inspired by tree-sitter queries:

    (assignment left: (_) @lhs right: (_) @rhs)

Templates support embedded Rust ({expr}), splicing ({..expr}), computed literals (#{expr}), and fresh identifiers ($name).

The rule! macro combines query and transform:

    rule!((for pattern: (_) @pat ...) => (call receiver: {val} ...))

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Human-friendly YAML alternative to tree-sitter node-types.json with three sections: supertypes, named, unnamed. Supports bidirectional conversion and building Schema objects from YAML. Includes CLI binary (node_types_yaml) and documentation. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Produces indented text showing node kinds, named fields, and leaf content. Unnamed tokens are hidden unless inside a named field. Used by tests for readable assertions. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
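As a rough illustration of the dump behaviour described in this commit, the sketch below prints a toy tree with one node per line, indented by depth, and skips unnamed tokens unless they sit in a named field. The `Node` type, the `dump` signature, and the exact text layout are assumptions for this example, not the format produced by `shared/yeast/src/dump.rs`.

```rust
/// Toy AST node; a stand-in for illustration, not the `yeast::Node` type.
pub struct Node {
    pub kind: &'static str,
    pub named: bool,
    /// Source text, kept only for leaves.
    pub text: Option<&'static str>,
    /// Children attached to a named field.
    pub fields: Vec<(&'static str, Node)>,
    /// Children without a field name.
    pub children: Vec<Node>,
}

impl Node {
    /// Convenience constructor for a leaf node.
    pub fn leaf(kind: &'static str, named: bool, text: &'static str) -> Node {
        Node { kind, named, text: Some(text), fields: vec![], children: vec![] }
    }
}

/// Renders the tree as indented text, one node per line.
pub fn dump(node: &Node) -> String {
    let mut out = String::new();
    write_node(node, None, 0, &mut out);
    out
}

fn write_node(node: &Node, field: Option<&str>, depth: usize, out: &mut String) {
    out.push_str(&"  ".repeat(depth));
    if let Some(name) = field {
        out.push_str(name);
        out.push_str(": ");
    }
    out.push_str(node.kind);
    if let Some(text) = node.text {
        // Leaf content is shown quoted after the kind.
        out.push_str(&format!(" {:?}", text));
    }
    out.push('\n');
    // Children in a named field are always shown, even unnamed tokens.
    for (name, child) in &node.fields {
        write_node(child, Some(name), depth + 1, out);
    }
    // Field-less children are shown only when they are named nodes.
    for child in node.children.iter().filter(|c| c.named) {
        write_node(child, None, depth + 1, out);
    }
}
```

Keeping punctuation tokens out of the output is what makes test assertions on the dumped text short enough to read at a glance.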
12 tests covering parsing, queries, tree building, desugaring rules, cursor navigation, and the shorthand rule! syntax. Tests use a custom output node-types.yml with named fields for all children (parameter, stmt, index), loaded via schema_from_yaml_with_language. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Covers architecture, query language, template language (tree!/trees!/rule!), capture semantics, fresh identifiers, and extractor integration. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
extract() gains a rules parameter. When empty, uses tree-sitter native traversal (no behavior change). When non-empty, runs yeast desugaring and extracts via traverse_yeast. Adds AstNode trait abstracting over tree_sitter::Node and yeast::Node, with minimal changes to existing Visitor methods (Node -> &N in 6 signatures). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
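The shape of this change can be sketched with stand-in types. In the code below, `ParsedNode` and `RewrittenNode` play the roles of `tree_sitter::Node` and `yeast::Node`; the trait methods, `describe`, `desugar`, and the simplified `extract` signature are all hypothetical and only show how a generic `&N` parameter lets one visitor serve both paths.

```rust
/// Minimal interface a visitor might need; method names are hypothetical.
pub trait AstNode {
    fn kind(&self) -> &str;
    fn child_count(&self) -> usize;
}

/// Stand-in for a node of the original parse tree.
pub struct ParsedNode {
    pub kind: &'static str,
    pub child_count: usize,
}

/// Stand-in for a node of the rewritten (desugared) tree.
pub struct RewrittenNode {
    pub kind: String,
    pub children: Vec<RewrittenNode>,
}

impl AstNode for ParsedNode {
    fn kind(&self) -> &str {
        self.kind
    }
    fn child_count(&self) -> usize {
        self.child_count
    }
}

impl AstNode for RewrittenNode {
    fn kind(&self) -> &str {
        &self.kind
    }
    fn child_count(&self) -> usize {
        self.children.len()
    }
}

/// A visitor method made generic over the node type (the `Node -> &N` change).
pub fn describe<N: AstNode>(node: &N) -> String {
    format!("{} ({} children)", node.kind(), node.child_count())
}

/// Placeholder for a rewrite rule.
pub struct Rule;

/// Placeholder rewrite: marks the kind and drops the children.
fn desugar(root: &ParsedNode, _rules: &[Rule]) -> RewrittenNode {
    RewrittenNode { kind: format!("desugared_{}", root.kind), children: Vec::new() }
}

/// With no rules the original tree is visited directly; otherwise the
/// rewritten tree is visited through the same generic code.
pub fn extract(root: &ParsedNode, rules: &[Rule]) -> String {
    if rules.is_empty() {
        describe(root)
    } else {
        describe(&desugar(root, rules))
    }
}
```

Because the empty-rules branch never constructs a rewritten tree, callers that pass no rules keep the existing traversal, which matches the "no behavior change" claim in the commit message.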
Language and LanguageSpec gain optional output_node_types field. When set, the generator produces dbscheme/QL from the output types and the extractor validates TRAP against them. All existing extractors pass None (no behavior change). Ruby extract() calls gain vec![] for the new rules parameter. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
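A minimal sketch of the fallback this implies, assuming the node types are carried as strings: the struct below is a cut-down stand-in for `LanguageSpec`, and `schema_source` is a hypothetical helper, not a function from the PR.

```rust
/// Cut-down stand-in for the extractor's `LanguageSpec`.
pub struct LanguageSpec {
    pub prefix: &'static str,
    /// Node types of the input grammar.
    pub node_types: &'static str,
    /// Node types of the rewritten AST, if it differs from the input.
    pub output_node_types: Option<&'static str>,
}

impl LanguageSpec {
    /// Node types used for schema generation and TRAP validation:
    /// the output types when set, the input types otherwise.
    pub fn schema_source(&self) -> &'static str {
        self.output_node_types.unwrap_or(self.node_types)
    }
}
```

Passing `None`, as the existing Ruby and QL extractors do, selects the input node types, so their generated schema stays the same.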
Add BUILD.bazel files for the yeast and yeast-macros crates, register them as dependencies of the shared tree-sitter extractor, and refresh the vendored crate dependencies via update_tree_sitter_extractors_deps.sh.
This PR adds a cleaned-up prototype of the YEAST library that was developed in a hackathon a few years ago.
YEAST is intended to be a lightweight layer for performing various kinds of AST cleanup and desugaring directly on the parse tree produced by a `tree-sitter` parser. Rewrite rules are specified declaratively, with a query language that approximates that of `tree-sitter`, though notably with no alternation or anchors (and also with greedy semantics: no backtracking). I expect that this will be sufficient for most uses.

Output templates also look like `tree-sitter` trees, with embedded Rust blocks for specifying code that calculates an AST based on the given input.

Because the output AST may be an entirely different language from the input AST, this PR also adds a new `node-types.yml` format: a lightweight reformulation of `node-types.json` intended for human consumption (unlike the latter).

Of note: the output format disallows having field-less child nodes. The `node-types.yml` format supports them, but YEAST itself will silently throw them away.

There's a lot of code in this PR, but it's just a prototype, so don't feel compelled to review it in detail.

DO, however, look at the documentation, and also the changes to the existing `tree-sitter` extractor infrastructure (the final two commits).
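For reviewers wondering what greedy matching without backtracking means in practice, here is a small self-contained sketch on a flat list of sibling kinds. It is a generic illustration of the technique, not YEAST's matching engine; `Pat`, `kind_at`, and `matches_greedy` are names invented for this example.

```rust
/// One element of a toy sibling pattern.
pub enum Pat {
    /// Exactly one node of this kind.
    One(&'static str),
    /// Zero or more nodes of this kind, consumed greedily.
    Many(&'static str),
}

/// True if the sibling at `pos` exists and has the given kind.
fn kind_at(kinds: &[&str], pos: usize, kind: &str) -> bool {
    kinds.get(pos).map_or(false, |k| *k == kind)
}

/// Matches `pattern` against the whole of `kinds` in a single left-to-right
/// pass. A repetition takes as many siblings as it can and never gives any
/// back, so a later element cannot reclaim them.
pub fn matches_greedy(pattern: &[Pat], kinds: &[&str]) -> bool {
    let mut pos = 0;
    for pat in pattern {
        match pat {
            Pat::One(kind) => {
                if !kind_at(kinds, pos, kind) {
                    return false;
                }
                pos += 1;
            }
            Pat::Many(kind) => {
                while kind_at(kinds, pos, kind) {
                    pos += 1;
                }
            }
        }
    }
    pos == kinds.len()
}
```

In this sketch, a repetition of `a` followed by a single `a` fails on the input `a a`, because the repetition consumes both nodes; a backtracking matcher would accept it. Patterns where a repetition is followed by a different kind are unaffected.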