Shared: Add YEAST desugaring library #21797

Draft
tausbn wants to merge 8 commits into main from tausbn/yeast-desugaring-tool

Contributor

@tausbn tausbn commented May 5, 2026

This PR adds a cleaned-up prototype of the YEAST library that was developed in a hackathon a few years ago.

YEAST is intended to be a lightweight layer for performing various kinds of AST cleanup and desugaring directly on the parse tree produced by a tree-sitter parser. Rewrite rules are specified declaratively, with a query language that approximates that of tree-sitter, though notably with no alternation or anchors (and also with greedy semantics -- no backtracking). I expect that this will be sufficient for most uses.
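To make "greedy semantics, no backtracking" concrete, here is a minimal sketch of a greedy matcher over a sequence of child kinds. The `Pat` type and `matches_greedy` function are hypothetical illustrations, not part of YEAST's query engine; the point is only that a repetition consumes everything it can and never gives anything back.

```rust
/// One element of a toy child-sequence pattern (hypothetical; not YEAST's query AST).
#[derive(Debug, Clone, Copy)]
enum Pat<'a> {
    /// Match exactly one child of the given kind.
    One(&'a str),
    /// Match zero or more consecutive children of the given kind, greedily.
    Many(&'a str),
}

/// Match `pattern` against the whole `children` slice, left to right.
/// `Many` takes every child it can and never returns any, so there is no backtracking.
fn matches_greedy(pattern: &[Pat<'_>], children: &[&str]) -> bool {
    let mut pos = 0;
    for pat in pattern {
        match *pat {
            Pat::One(kind) => {
                if children.get(pos) != Some(&kind) {
                    return false;
                }
                pos += 1;
            }
            Pat::Many(kind) => {
                // Greedy: keep consuming while the kind matches.
                while children.get(pos) == Some(&kind) {
                    pos += 1;
                }
            }
        }
    }
    // The pattern must account for every child.
    pos == children.len()
}
```

Under this sketch, `[Many("stmt"), One("stmt")]` fails against two `stmt` children, because the repetition has already consumed both. A backtracking matcher would succeed there, which is the trade-off the description alludes to.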

Output templates also look like tree-sitter trees, with embedded rust blocks for specifying code that calculates an AST based on the given input.

Because the output AST may be an entirely different language from the input AST, this PR also adds a new node-types.yml format -- a lightweight reformulation of node-types.json intended for human consumption (unlike the latter).

Of note: the output format disallows having field-less child nodes. The node-types.yml format supports them, but YEAST itself will silently throw them away.
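As an illustration of what silently discarding field-less children means in practice, the following sketch uses a toy node type. `ToyNode`, `leaf`, and `drop_unfielded` are assumptions made up for this example and do not reflect YEAST's actual `Node` representation.

```rust
use std::collections::BTreeMap;

/// Toy node: children are either attached to a named field or left field-less.
#[derive(Debug, Clone, PartialEq)]
struct ToyNode {
    kind: String,
    fields: BTreeMap<String, Vec<ToyNode>>,
    unfielded: Vec<ToyNode>,
}

/// Build a node with no children.
fn leaf(kind: &str) -> ToyNode {
    ToyNode { kind: kind.to_string(), fields: BTreeMap::new(), unfielded: Vec::new() }
}

/// Copy a tree, keeping only children that sit in a named field.
/// Field-less children are dropped without any error or warning.
fn drop_unfielded(node: &ToyNode) -> ToyNode {
    let fields: BTreeMap<String, Vec<ToyNode>> = node
        .fields
        .iter()
        .map(|(name, kids)| {
            (name.clone(), kids.iter().map(drop_unfielded).collect::<Vec<_>>())
        })
        .collect();
    ToyNode { kind: node.kind.clone(), fields, unfielded: Vec::new() }
}
```

Anyone writing an output schema would therefore want every child that matters to be reachable through a named field.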


There's a lot of code in this PR, but it's just a prototype, so don't feel compelled to review it in detail.

DO, however, look at the documentation, and also the changes to the existing tree-sitter extractor infrastructure (the final two commits).

@tausbn tausbn force-pushed the tausbn/yeast-desugaring-tool branch 3 times, most recently from 0cb1793 to 33485ca on May 5, 2026 12:50
@tausbn tausbn added the no-change-note-required label (This PR does not need a change note) May 5, 2026
Contributor Author

tausbn commented May 5, 2026

CI failure seems to be unrelated to my changes.

@tausbn tausbn marked this pull request as ready for review May 5, 2026 14:38
Copilot AI review requested due to automatic review settings May 5, 2026 14:38
@tausbn tausbn requested review from a team as code owners May 5, 2026 14:38
Contributor

Copilot AI left a comment

Pull request overview

This PR introduces YEAST, a Rust library for declarative AST cleanup/desugaring on top of tree-sitter parse trees, and integrates it into the shared tree-sitter extractor so extraction can optionally run on a rewritten AST and/or validate against an alternate output node-types schema.

Changes:

  • Add new shared/yeast + shared/yeast-macros crates implementing the rule/query/template system, plus tests and documentation.
  • Extend the shared tree-sitter extractor to optionally run YEAST rules and to support separate output_node_types for schema generation and TRAP validation.
  • Update Bazel vendored Rust deps to include YEAST dependencies (and bump tree-sitter, cc, etc.).
| File | Description |
| --- | --- |
| shared/yeast/tests/test.rs | End-to-end tests for parsing, query matching, tree building, and desugaring rules. |
| shared/yeast/tests/node-types.yml | Test output schema in the new YAML node-types format. |
| shared/yeast/src/visitor.rs | Converts a tree-sitter Tree into a YEAST Ast. |
| shared/yeast/src/tree_builder.rs | Fresh identifier generation support for templates/rules. |
| shared/yeast/src/schema.rs | Schema representation for kinds/fields (language-derived or YAML-derived). |
| shared/yeast/src/range.rs | Serde helpers for (de)serializing tree_sitter::Range. |
| shared/yeast/src/query.rs | Query AST and matching engine (captures, repetition, named/unnamed semantics). |
| shared/yeast/src/print.rs | Debug printer for walking a YEAST AST. |
| shared/yeast/src/node_types_yaml.rs | YAML ↔ JSON node-types conversion + schema construction from YAML. |
| shared/yeast/src/lib.rs | Core YEAST types (Ast, Node, Rule, Runner) and rewrite application logic. |
| shared/yeast/src/dump.rs | Human-readable AST dump utility used by tests. |
| shared/yeast/src/cursor.rs | Cursor trait abstraction used by traversal/extractor integration. |
| shared/yeast/src/captures.rs | Capture storage and utilities (single/repeated/optional). |
| shared/yeast/src/build.rs | BuildCtx used by tree!/trees! macros to build synthetic nodes. |
| shared/yeast/src/bin/node_types_yaml.rs | CLI tool to convert YAML node-types ↔ JSON node-types. |
| shared/yeast/src/bin/main.rs | Minimal YEAST CLI for parsing and printing. |
| shared/yeast/doc/yeast.md | Main YEAST documentation (architecture, query/template language, integration). |
| shared/yeast/doc/node-types-yaml.md | Specification for the YAML node-types format and CLI usage. |
| shared/yeast/Cargo.toml | New yeast crate manifest and dependencies. |
| shared/yeast/Cargo.lock | Lockfile for the standalone shared/yeast crate. |
| shared/yeast/BUILD.bazel | Bazel target for the yeast Rust library. |
| shared/yeast/.gitkeep | Placeholder file for directory tracking. |
| shared/yeast/.gitignore | Ignores shared/yeast/target. |
| shared/yeast/.envrc | Direnv config for local development. |
| shared/yeast-macros/src/parse.rs | Proc-macro parsing and codegen for query!, tree!, trees!, rule!. |
| shared/yeast-macros/src/lib.rs | Proc-macro entry points and user-facing macro docs. |
| shared/yeast-macros/Cargo.toml | New yeast-macros proc-macro crate manifest. |
| shared/yeast-macros/BUILD.bazel | Bazel target for the yeast-macros proc-macro crate. |
| shared/tree-sitter-extractor/tests/multiple_languages.rs | Updates tests to include output_node_types in LanguageSpec. |
| shared/tree-sitter-extractor/tests/integration_test.rs | Updates tests to include output_node_types in LanguageSpec. |
| shared/tree-sitter-extractor/src/generator/mod.rs | Generator uses output_node_types when provided. |
| shared/tree-sitter-extractor/src/generator/language.rs | Adds output_node_types to generator Language. |
| shared/tree-sitter-extractor/src/extractor/simple.rs | Uses output_node_types for schema validation in the simple extractor. |
| shared/tree-sitter-extractor/src/extractor/mod.rs | Adds optional YEAST desugaring path and AstNode abstraction. |
| shared/tree-sitter-extractor/Cargo.toml | Adds a path dependency on shared/yeast. |
| shared/tree-sitter-extractor/BUILD.bazel | Adds Bazel dep on //shared/yeast. |
| ruby/extractor/src/generator.rs | Populates output_node_types: None for Ruby/Erb generator languages. |
| ruby/extractor/src/extractor.rs | Updates shared extractor invocation with new extract(...) params. |
| ql/extractor/src/generator.rs | Populates output_node_types: None for QL generator languages. |
| ql/extractor/src/extractor.rs | Populates output_node_types: None for QL simple extractor languages. |
| MODULE.bazel | Adds/upgrades vendored crates (notably tree-sitter and new deps). |
| misc/bazel/3rdparty/tree_sitter_extractors_deps/defs.bzl | Adds YEAST + YEAST-macros crates and bumps vendored dependencies. |
| misc/bazel/3rdparty/tree_sitter_extractors_deps/BUILD.zstd-sys-2.0.16+zstd.1.5.7.bazel | Updates cc dependency reference. |
| misc/bazel/3rdparty/tree_sitter_extractors_deps/BUILD.tree-sitter-ruby-0.23.1.bazel | Updates cc dependency reference. |
| misc/bazel/3rdparty/tree_sitter_extractors_deps/BUILD.tree-sitter-ql-0.23.1.bazel | Updates cc dependency reference. |
| misc/bazel/3rdparty/tree_sitter_extractors_deps/BUILD.tree-sitter-python-0.23.6.bazel | Adds vendoring/build definitions for tree-sitter-python. |
| misc/bazel/3rdparty/tree_sitter_extractors_deps/BUILD.tree-sitter-json-0.24.8.bazel | Updates cc dependency reference. |
| misc/bazel/3rdparty/tree_sitter_extractors_deps/BUILD.tree-sitter-embedded-template-0.25.0.bazel | Updates cc dependency reference. |
| misc/bazel/3rdparty/tree_sitter_extractors_deps/BUILD.tree-sitter-0.26.8.bazel | Bumps vendored tree-sitter to 0.26.8 and updates cc reference. |
| misc/bazel/3rdparty/tree_sitter_extractors_deps/BUILD.iana-time-zone-haiku-0.1.2.bazel | Updates cc dependency reference. |
| misc/bazel/3rdparty/tree_sitter_extractors_deps/BUILD.find-msvc-tools-0.1.9.bazel | Updates vendored find-msvc-tools version metadata. |
| misc/bazel/3rdparty/tree_sitter_extractors_deps/BUILD.cc-1.2.61.bazel | Updates vendored cc version metadata and dependencies. |
| misc/bazel/3rdparty/tree_sitter_extractors_deps/BUILD.bazel | Adds aliases for serde_yaml and tree-sitter-python; bumps tree-sitter alias. |
| Cargo.toml | Adds shared/yeast and shared/yeast-macros to the workspace. |
| Cargo.lock | Workspace lock updates (adds yeast crates; bumps tree-sitter, cc, etc.). |

Copilot's findings

Comments suppressed due to low confidence (1)

shared/yeast/src/node_types_yaml.rs:303

  • schema_from_yaml_with_language also registers YAML unnamed: tokens using schema.register_kind(name), which only affects the named kind map. If the YAML adds any unnamed tokens not present in the tree-sitter language, QueryNode::UnnamedNode lookups will still fail because unnamed_kind_ids is never updated.

This should use an unnamed-kind registration path (updating unnamed_kind_ids) rather than register_kind.
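The shape of the problem can be sketched with a toy schema that keeps separate maps for named and unnamed kinds. Everything below (`ToySchema`, `register_unnamed_kind`, `unnamed_id`) is hypothetical and only mirrors the names mentioned in the finding; the real `Schema` in shared/yeast/src/schema.rs may be organised differently.

```rust
use std::collections::HashMap;

/// Toy schema with one id map per kind namespace (hypothetical layout).
#[derive(Default)]
struct ToySchema {
    named_kind_ids: HashMap<String, u16>,
    unnamed_kind_ids: HashMap<String, u16>,
    next_id: u16,
}

/// Look up `name` in `map`, allocating a new id on first use.
fn intern(map: &mut HashMap<String, u16>, next_id: &mut u16, name: &str) -> u16 {
    *map.entry(name.to_string()).or_insert_with(|| {
        let id = *next_id;
        *next_id += 1;
        id
    })
}

impl ToySchema {
    /// Registers a named kind only; unnamed lookups are unaffected.
    fn register_kind(&mut self, name: &str) -> u16 {
        intern(&mut self.named_kind_ids, &mut self.next_id, name)
    }

    /// Registers an unnamed token so that unnamed lookups can find it.
    fn register_unnamed_kind(&mut self, name: &str) -> u16 {
        intern(&mut self.unnamed_kind_ids, &mut self.next_id, name)
    }

    /// What a query for an unnamed node would consult.
    fn unnamed_id(&self, name: &str) -> Option<u16> {
        self.unnamed_kind_ids.get(name).copied()
    }
}
```

In this sketch, calling `register_kind("=>")` leaves `unnamed_id("=>")` as `None`, which is the failure mode the finding describes; only `register_unnamed_kind` makes the token visible to unnamed lookups.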

  • Files reviewed: 52/55 changed files
  • Comments generated: 6

Comment thread shared/yeast/src/node_types_yaml.rs
Comment thread shared/yeast/src/lib.rs Outdated
Comment thread shared/yeast/doc/yeast.md Outdated
Comment thread shared/yeast/src/print.rs Outdated
Comment thread shared/tree-sitter-extractor/src/extractor/mod.rs
Comment thread shared/yeast/src/schema.rs
@tausbn tausbn force-pushed the tausbn/yeast-desugaring-tool branch from 33485ca to fb1d844 on May 5, 2026 15:03
@tausbn tausbn requested a review from Copilot May 5, 2026 15:03
Contributor

Copilot AI left a comment

Copilot's findings

  • Files reviewed: 51/54 changed files
  • Comments generated: 3

Comment thread shared/tree-sitter-extractor/src/extractor/mod.rs Outdated
Comment thread shared/tree-sitter-extractor/src/extractor/mod.rs Outdated
Comment thread shared/yeast/.envrc Outdated
@tausbn tausbn force-pushed the tausbn/yeast-desugaring-tool branch from fb1d844 to cba9c08 on May 5, 2026 15:19
@tausbn tausbn marked this pull request as draft May 5, 2026 18:40
tausbn and others added 5 commits May 5, 2026 18:56
YEAST (YEAST Elaborates Abstract Syntax Trees) is a framework for
transforming tree-sitter parse trees before CodeQL extraction.

Core components:
- shared/yeast/ — Ast, Node, Schema, query matching engine, captures,
  FreshScope, BuildCtx
- shared/yeast-macros/ — proc macros: query!, tree!, trees!, rule!

The query language is inspired by tree-sitter queries:
  (assignment left: (_) @lhs right: (_) @rhs)

Templates support embedded Rust ({expr}), splicing ({..expr}),
computed literals (#{expr}), and fresh identifiers ($name).

The rule! macro combines query and transform:
  rule!((for pattern: (_) @pat ...) => (call receiver: {val} ...))

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
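
The kind of rewrite that such a rule expresses, a `for` loop turned into a method call, can be sketched on a toy AST without the macros. The `Toy` enum, the `Fresh` counter, the `each` method name, and the `tmp$N` naming scheme are all assumptions made for illustration; they are not YEAST's types, and the real rule may produce a different output shape.

```rust
/// Toy AST (hypothetical; YEAST works on tree-sitter-shaped nodes instead).
#[derive(Debug, Clone, PartialEq)]
enum Toy {
    Ident(String),
    Assign { left: Box<Toy>, right: Box<Toy> },
    For { pattern: Box<Toy>, value: Box<Toy>, body: Vec<Toy> },
    Call { receiver: Box<Toy>, method: String, param: String, body: Vec<Toy> },
}

/// Source of fresh identifiers, playing the role of `$name` in templates.
#[derive(Default)]
struct Fresh {
    next: usize,
}

impl Fresh {
    /// Return a name that no earlier call on this counter has returned.
    fn name(&mut self, base: &str) -> String {
        let n = self.next;
        self.next += 1;
        format!("{base}${n}")
    }
}

/// Rewrite `for pattern in value { body }` into
/// `value.each { |tmp| pattern = tmp; body }`, recursing into children.
fn desugar(node: Toy, fresh: &mut Fresh) -> Toy {
    match node {
        Toy::For { pattern, value, body } => {
            let param = fresh.name("tmp");
            // Bind the loop pattern to the fresh block parameter first.
            let mut new_body = vec![Toy::Assign {
                left: pattern,
                right: Box::new(Toy::Ident(param.clone())),
            }];
            new_body.extend(body.into_iter().map(|stmt| desugar(stmt, fresh)));
            Toy::Call {
                receiver: Box::new(desugar(*value, fresh)),
                method: "each".to_string(),
                param,
                body: new_body,
            }
        }
        Toy::Assign { left, right } => Toy::Assign {
            left: Box::new(desugar(*left, fresh)),
            right: Box::new(desugar(*right, fresh)),
        },
        Toy::Call { receiver, method, param, body } => Toy::Call {
            receiver: Box::new(desugar(*receiver, fresh)),
            method,
            param,
            body: body.into_iter().map(|stmt| desugar(stmt, fresh)).collect(),
        },
        ident @ Toy::Ident(_) => ident,
    }
}
```

A declarative rule replaces the hand-written `match` arm with a query on the left and a template on the right, and the fresh-identifier support keeps generated names such as the block parameter from clashing with each other.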

Human-friendly YAML alternative to tree-sitter node-types.json with
three sections: supertypes, named, unnamed. Supports bidirectional
conversion and building Schema objects from YAML.

Includes CLI binary (node_types_yaml) and documentation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Produces indented text showing node kinds, named fields, and leaf
content. Unnamed tokens are hidden unless inside a named field.
Used by tests for readable assertions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
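
A dump with these properties could look roughly like the sketch below. `DumpNode` and `dump` are hypothetical stand-ins, and the exact text layout (two-space indent, `field: kind "text"`) is an assumption rather than the format produced by shared/yeast/src/dump.rs.

```rust
/// Toy node for the dump sketch (hypothetical; not YEAST's `Node`).
struct DumpNode {
    kind: &'static str,
    /// `false` for anonymous tokens such as `=` or `(`.
    named: bool,
    /// Source text, recorded for leaves only.
    text: Option<&'static str>,
    /// Children paired with the field they sit in, if any.
    children: Vec<(Option<&'static str>, DumpNode)>,
}

/// Append an indented dump of `node` to `out`.
/// Unnamed tokens are skipped unless they are attached to a named field.
fn dump(node: &DumpNode, field: Option<&str>, depth: usize, out: &mut String) {
    if !node.named && field.is_none() {
        return;
    }
    out.push_str(&"  ".repeat(depth));
    if let Some(name) = field {
        out.push_str(name);
        out.push_str(": ");
    }
    out.push_str(node.kind);
    if let Some(text) = node.text {
        // Debug formatting quotes and escapes the leaf content.
        out.push_str(&format!(" {text:?}"));
    }
    out.push('\n');
    for (child_field, child) in &node.children {
        dump(child, *child_field, depth + 1, out);
    }
}
```

Because the output is plain text with stable indentation, a test can compare it against a string literal directly, which is what makes assertions readable.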

12 tests covering parsing, queries, tree building, desugaring rules,
cursor navigation, and the shorthand rule! syntax.

Tests use a custom output node-types.yml with named fields for all
children (parameter, stmt, index), loaded via
schema_from_yaml_with_language.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Covers architecture, query language, template language (tree!/trees!/rule!),
capture semantics, fresh identifiers, and extractor integration.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@tausbn tausbn force-pushed the tausbn/yeast-desugaring-tool branch from cba9c08 to ed1ba0a on May 5, 2026 18:57
@tausbn tausbn force-pushed the tausbn/yeast-desugaring-tool branch from ed1ba0a to 1b8f451 on May 5, 2026 21:24
@tausbn tausbn requested a review from Copilot May 5, 2026 21:35
Contributor

Copilot AI left a comment

Copilot's findings

  • Files reviewed: 49/50 changed files
  • Comments generated: 2

Comment thread shared/tree-sitter-extractor/src/extractor/simple.rs Outdated
Comment thread shared/tree-sitter-extractor/src/extractor/mod.rs
tausbn and others added 3 commits May 5, 2026 21:48
extract() gains a rules parameter. When empty, uses tree-sitter native
traversal (no behavior change). When non-empty, runs yeast desugaring
and extracts via traverse_yeast.

Adds AstNode trait abstracting over tree_sitter::Node and yeast::Node,
with minimal changes to existing Visitor methods (Node -> &N in 6
signatures).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
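
The design described here, one visitor that is generic over two node representations plus a dispatch on whether any rules were supplied, can be sketched as follows. `NativeNode`, `RewrittenNode`, `ToyRule`, and this `extract` signature are hypothetical simplifications; the real trait abstracts over `tree_sitter::Node` and `yeast::Node` and has a richer interface.

```rust
/// Minimal node interface shared by both tree representations (toy version).
trait AstNode {
    fn kind(&self) -> &str;
    fn children(&self) -> Vec<&Self>;
}

/// Stand-in for a parser-produced node.
struct NativeNode {
    kind: &'static str,
    children: Vec<NativeNode>,
}

/// Stand-in for a node produced by the rewriting step.
struct RewrittenNode {
    kind: String,
    children: Vec<RewrittenNode>,
}

impl AstNode for NativeNode {
    fn kind(&self) -> &str {
        self.kind
    }
    fn children(&self) -> Vec<&Self> {
        self.children.iter().collect()
    }
}

impl AstNode for RewrittenNode {
    fn kind(&self) -> &str {
        &self.kind
    }
    fn children(&self) -> Vec<&Self> {
        self.children.iter().collect()
    }
}

/// The visitor is written once against the trait (pre-order traversal).
fn visit<N: AstNode>(node: &N, out: &mut Vec<String>) {
    out.push(node.kind().to_string());
    for child in node.children() {
        visit(child, out);
    }
}

/// Toy rule: optionally rename a node kind.
type ToyRule = fn(&str) -> Option<String>;

/// Example rule that renames `for` nodes to `call`.
fn for_to_call(kind: &str) -> Option<String> {
    (kind == "for").then(|| "call".to_string())
}

/// Apply the first matching rule to every node, building a new tree.
fn rewrite(node: &NativeNode, rules: &[ToyRule]) -> RewrittenNode {
    let kind = rules
        .iter()
        .find_map(|rule| rule(node.kind))
        .unwrap_or_else(|| node.kind.to_string());
    RewrittenNode {
        kind,
        children: node.children.iter().map(|c| rewrite(c, rules)).collect(),
    }
}

/// With no rules, traverse the native tree unchanged; otherwise rewrite first.
fn extract(root: &NativeNode, rules: &[ToyRule]) -> Vec<String> {
    let mut out = Vec::new();
    if rules.is_empty() {
        visit(root, &mut out);
    } else {
        visit(&rewrite(root, rules), &mut out);
    }
    out
}
```

Keeping the empty-rules path on the original traversal is what lets existing extractors pass an empty list and see no behavior change.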

Language and LanguageSpec gain optional output_node_types field.
When set, the generator produces dbscheme/QL from the output types
and the extractor validates TRAP against them.

All existing extractors pass None (no behavior change).
Ruby extract() calls gain vec![] for the new rules parameter.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
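
The fallback behaviour of an optional output schema can be summarised in a few lines. `ToyLanguageSpec` and `effective_node_types` are hypothetical; only the `node_types` and `output_node_types` field names come from the description above, and the real structs carry more fields.

```rust
/// Toy language description (hypothetical subset of the real fields).
struct ToyLanguageSpec<'a> {
    /// The parser's own node-types description.
    node_types: &'a str,
    /// Optional schema describing the tree after desugaring.
    output_node_types: Option<&'a str>,
}

/// Schema to generate from and validate against:
/// the output schema when present, otherwise the parser's own.
fn effective_node_types<'a>(spec: &ToyLanguageSpec<'a>) -> &'a str {
    spec.output_node_types.unwrap_or(spec.node_types)
}
```

Passing `None`, as all existing extractors do, selects the parser's node types and so reproduces the previous behavior.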

Add BUILD.bazel files for the yeast and yeast-macros crates, register
them as dependencies of the shared tree-sitter extractor, and refresh
the vendored crate dependencies via update_tree_sitter_extractors_deps.sh.
@tausbn tausbn force-pushed the tausbn/yeast-desugaring-tool branch from 1b8f451 to e612319 on May 5, 2026 21:48
2 participants