
feat: Run leading USE WAREHOUSE / SET / ALTER SESSION on the same connection as the main query#96

Draft
robertlacok wants to merge 5 commits into main from robert/sal-51-reuse-engine-per-cell

Conversation

@robertlacok
Contributor

@robertlacok robertlacok commented Apr 23, 2026

Summary

A SQL block like

USE WAREHOUSE abc;
SELECT 1

now runs the SELECT on warehouse abc. The toolkit splits the leading session-setup statements off the cell SQL and runs them on the same physical DBAPI connection as the main query, inside the same engine.begin() block.

No callers need to change — _dntk.execute_sql("USE WAREHOUSE abc; SELECT 1", env) just works.
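The same-connection mechanics can be sketched against a throwaway SQLite engine (`run_with_setup` is an illustrative helper, not toolkit API — the real path goes through `execute_sql_with_connection_json`):

```python
import pandas as pd
from sqlalchemy import create_engine


def run_with_setup(engine, setup_statements, query):
    # Setup statements and the main query run inside a single
    # engine.begin() block, i.e. on the same checked-out DBAPI connection.
    with engine.begin() as connection:
        for statement in setup_statements:
            # exec_driver_sql sends the statement verbatim to the driver.
            connection.exec_driver_sql(statement)
        return pd.read_sql_query(query, connection)


engine = create_engine("sqlite://")
df = run_with_setup(
    engine,
    ["CREATE TEMP TABLE t (x INTEGER)", "INSERT INTO t VALUES (1)"],
    "SELECT x FROM t",
)
print(df["x"].tolist())  # the SELECT sees the connection-scoped TEMP table
```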

What's in here

  1. A parser (deepnote_toolkit/sql/setup_statement_parser.py) that strips a strict leading run of USE WAREHOUSE, USE DATABASE, USE SCHEMA, USE ROLE, USE SECONDARY ROLES, SET, or ALTER SESSION statements off the cell SQL. Respects line/block comments, single-quoted strings, double-quoted identifiers, and Snowflake $$-quoted strings. Stops at the first non-setup statement. Passes through unchanged if the cell is setup-only or has no closing ; on the prefix.
  2. A setup_statements: list[str] kwarg on execute_sql and execute_sql_with_connection_json. When provided, those statements run via connection.exec_driver_sql on the same DBAPI connection as the main query. Acts as the explicit escape hatch for the templated-value case below.
  3. Wiring: execute_sql_with_connection_json calls the parser after compile_sql_query. Parsed setup statements + the explicit setup_statements kwarg are concatenated and run before the main query inside engine.begin().
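A stripped-down sketch of the splitting logic in point 1 — unlike the real parser, this splits naively on `;` and ignores comments and quoted strings, but it shows the "strict leading run, stop at first non-setup statement" shape:

```python
import re

# Allowlisted statement openers, mirroring the list above.
SETUP_PREFIX = re.compile(
    r"^\s*(USE\s+(WAREHOUSE|DATABASE|SCHEMA|ROLE|SECONDARY\s+ROLES)\b"
    r"|SET\b|ALTER\s+SESSION\b)",
    re.IGNORECASE,
)


def split_setup_statements(sql: str):
    """Split a strict leading run of setup statements off *sql*.

    Naive sketch: splits on ';' and does not respect comments or quoted
    strings the way the real parser does.
    """
    parts = sql.split(";")
    setup: list[str] = []
    # Stop as soon as the leading statement is not an allowlisted setup
    # statement, or when no closing ';' remains on the prefix.
    while len(parts) > 1 and SETUP_PREFIX.match(parts[0]):
        setup.append(parts.pop(0).strip())
    return setup, ";".join(parts).strip()


setup, main = split_setup_statements("USE WAREHOUSE abc; SET x = 1; SELECT 1")
# setup == ["USE WAREHOUSE abc", "SET x = 1"], main == "SELECT 1"
```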

Why parse, not just expose the kwarg

Without parsing, every caller (the SQL block UI, the Python SDK, agents that emit SQL) would have to know to split USE statements off and pass them as setup_statements. Putting the parser in the toolkit makes the fix work end-to-end at the API surface, in one place, with one test suite.

Why parse after Jinja, not before

The scanner sees rendered SQL, with no {{ }} / {% %} to reason about. This catches both the dynamic case ({{ render_setup() }} rendering to USE WAREHOUSE x) and Jinja conditionals that emit semicolons. The one trade-off is that templated values inside a setup statement (USE WAREHOUSE {{ env }}) render to a bind placeholder that exec_driver_sql doesn't bind and that Snowflake's USE WAREHOUSE doesn't accept anyway. The parser detects this and raises SetupStatementError, pointing the caller at the explicit setup_statements= kwarg as the escape hatch. Detection respects quoted regions, so a literal %(x)s inside a string doesn't false-positive.
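The quote-aware placeholder detection can be illustrated like this (an approximation of the real detector; only single-quoted strings are handled here, and the placeholder pattern is a simplified stand-in for the pyformat/format/named/qmark styles):

```python
import re

# Simplified union of pyformat, format, qmark, and named bind styles.
PLACEHOLDER = re.compile(r"%\([^)]*\)s|%s|\?|:\w+")


def has_unquoted_placeholder(statement: str) -> bool:
    # Walk the statement tracking whether we're inside a single-quoted
    # string; only text outside quotes is checked for placeholders.
    outside = []
    in_quote = False
    for ch in statement:
        if ch == "'":
            in_quote = not in_quote
        elif not in_quote:
            outside.append(ch)
    return bool(PLACEHOLDER.search("".join(outside)))


has_unquoted_placeholder("USE WAREHOUSE %(env)s")  # unbound placeholder -> True
has_unquoted_placeholder("SET x = '%(env)s'")      # quoted literal -> False
```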

Test plan

  • tests/unit/test_setup_statement_parser.py — 26 tests covering all parser cases plus placeholder detection across pyformat / format / named / numeric / qmark styles.
  • TestSetupStatementsAgainstRealSQLite — a real sqlite:///:memory: engine. The setup statement creates a TEMP TABLE, the main query selects from it. SQLite TEMP tables are connection-scoped — the SELECT only succeeds if setup and main share a physical DBAPI connection. This is the substantive end-to-end check; the previous over-mocked tests that stubbed pandas.read_sql_query are gone.
  • TestSetupStatementParserIntegration — checks that the parser's output (parsed setup + remaining query) flows through to _execute_sql_on_engine, that explicit setup_statements= is appended after parsed ones, and that USE WAREHOUSE {{ env }} raises with a message that names setup_statements=.
  • poetry run pytest tests/unit/ — 784 passed (excludes pre-existing pyspark collection errors in test_chart.py / test_ocelots.py).
  • poetry run black and poetry run flake8 clean.
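The connection-scoping property the SQLite test relies on can be demonstrated directly (illustrative, not the test suite's actual code):

```python
import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.exc import OperationalError

engine = create_engine("sqlite:///:memory:")

with engine.begin() as connection:
    connection.exec_driver_sql("CREATE TEMP TABLE setup_check (v INTEGER)")
    connection.exec_driver_sql("INSERT INTO setup_check VALUES (42)")
    # Same physical DBAPI connection: the TEMP table is visible.
    rows = pd.read_sql_query("SELECT v FROM setup_check", connection)

# A separate engine gets a different physical connection, so the
# connection-scoped TEMP table does not exist there.
try:
    with create_engine("sqlite:///:memory:").begin() as other:
        other.exec_driver_sql("SELECT v FROM setup_check")
    visible_elsewhere = True
except OperationalError:
    visible_elsewhere = False

print(rows["v"].tolist(), visible_elsewhere)
```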

Usage

# Just works.
df = _dntk.execute_sql("USE WAREHOUSE abc; SELECT 123", "SQL_INTEG")

# Explicit escape hatch for dynamic warehouses.
df = _dntk.execute_sql(
    "SELECT 123",
    "SQL_INTEG",
    setup_statements=[f"USE WAREHOUSE {env}"],
)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Support for setup SQL statements that run on the same connection before the main query.
    • Added a parser to extract and validate leading session-setup statements from queries.
  • Bug Fixes
    • If a setup statement fails, the main query is not executed.
  • Tests
    • Added unit and end-to-end tests for extraction, execution order, failure handling, and parameter behavior.

…-51)

Cache the SQLAlchemy engine per IPython cell execution so that two
back-to-back `_dntk.execute_sql*` calls in the same generated cell share
the same physical DBAPI connection. This lets Snowflake session state
set by `USE WAREHOUSE`, `USE ROLE`, `SET ...` etc. persist onto the next
query, which unblocks supporting `USE WAREHOUSE abc; SELECT 123` in
Snowflake SQL blocks.

Cached engines are created with `pool_size=1, max_overflow=0` so
`engine.begin()` always checks out the same connection. Disposed via a
`post_run_cell` IPython hook. Caching is skipped for SSH-tunneled
engines, user-supplied custom pool config, and when no IPython shell is
available (script/CLI use) — all of those fall back to the previous
create-then-dispose-per-call behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@robertlacok robertlacok requested a review from a team as a code owner April 23, 2026 11:45
@robertlacok robertlacok requested a review from m1so April 23, 2026 11:45
@linear

linear Bot commented Apr 23, 2026

@coderabbitai
Contributor

coderabbitai Bot commented Apr 23, 2026

📝 Walkthrough

Walkthrough

This PR adds extraction and execution of leading session-setup statements. A new parser extracts allowlisted setup statements from the start of a rendered query. Extracted statements are combined with any user-supplied setup_statements and threaded through public APIs (execute_sql_with_connection_json, execute_sql) into internal execution paths. Setup statements are executed on the same connection before the main query (DuckDB via execute_duckdb_sql per statement; SQLAlchemy engines via exec_driver_sql on the DBAPI connection). If any setup statement fails, the main query is not executed. Tests added cover parsing, propagation, ordering, and failure behavior.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Parser as SetupStatementParser
    participant Service as SQLExecutor
    participant Engine as DBEngine
    Client->>Service: submit SQL (+ optional setup_statements)
    Service->>Parser: extract_setup_statements(rendered_query)
    Parser-->>Service: (extracted_setup_statements, remaining_query)
    Service->>Service: combine extracted + provided -> final_setup_statements
    Service->>Engine: open connection
    alt SQLAlchemy-backed engine
        Service->>Engine: exec_driver_sql(setup stmt 1)
        Engine-->>Service: OK / Error
        Service->>Engine: exec_driver_sql(setup stmt N)
        Engine-->>Service: OK / Error
        Service->>Engine: pandas.read_sql_query(main query)
        Engine-->>Service: result rows
    else DuckDB path
        Service->>Engine: execute_duckdb_sql(setup stmt 1)
        Engine-->>Service: OK / Error
        Service->>Engine: execute_duckdb_sql(setup stmt N)
        Engine-->>Service: OK / Error
        Service->>Engine: execute_duckdb_sql(main query)
        Engine-->>Service: result rows
    end
    Service-->>Client: results or error

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 2

❌ Failed checks (2 warnings)

  • Docstring Coverage — ⚠️ Warning. Docstring coverage is 33.33%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Updates Docs — ⚠️ Warning. Feature has inline docstrings but no public documentation updates in docs/ or README.md. External repo docs cannot be verified from this sandbox. Resolution: update docs/ with a setup_statements guide, add usage examples to README.md, and verify updates in the deepnote/deepnote OSS repo and deepnote/deepnote-internal roadmap.

✅ Passed checks (4 passed)

  • Description Check — ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check — ✅ Passed. Title clearly identifies the main change: parsing and executing leading session-setup statements (USE WAREHOUSE/SET/ALTER SESSION) on the same connection as the main query.
  • Linked Issues check — ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check — ✅ Passed. Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented Apr 23, 2026

📦 Python package built successfully!

  • Version: 2.2.1.dev6+78d8bd3
  • Wheel: deepnote_toolkit-2.2.1.dev6+78d8bd3-py3-none-any.whl
  • Install:
    pip install "deepnote-toolkit @ https://deepnote-staging-runtime-artifactory.s3.amazonaws.com/deepnote-toolkit-packages/2.2.1.dev6%2B78d8bd3/deepnote_toolkit-2.2.1.dev6%2B78d8bd3-py3-none-any.whl"

@codecov

codecov Bot commented Apr 23, 2026

Codecov Report

❌ Patch coverage is 85.81081% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.75%. Comparing base (f2ac75a) to head (c28c7d0).
✅ All tests successful. No failed tests found.

Files with missing lines:
  • deepnote_toolkit/sql/setup_statement_parser.py — patch 86.13%, 11 missing and 8 partials ⚠️
  • deepnote_toolkit/sql/sql_execution.py — patch 81.81%, 1 missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #96      +/-   ##
==========================================
+ Coverage   74.32%   74.75%   +0.43%     
==========================================
  Files          94       95       +1     
  Lines        5535     5681     +146     
  Branches      824      866      +42     
==========================================
+ Hits         4114     4247     +133     
- Misses       1155     1158       +3     
- Partials      266      276      +10     
Flag: combined — 74.75% <85.81%> (+0.43%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@robertlacok robertlacok marked this pull request as draft April 23, 2026 11:47
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
deepnote_toolkit/ipython_utils.py (1)

6-34: 🛠️ Refactor suggestion | 🟠 Major

Type the hook registry and callback explicitly.

Lines 6-18 add a new hook API, but both _registered_post_run_cell_callbacks and callback are untyped. That weakens static checking around event registration.

Suggested fix
+from collections.abc import Callable
+from typing import Any
+
-_registered_post_run_cell_callbacks: set = set()
+_registered_post_run_cell_callbacks: set[Callable[..., Any]] = set()
 
-
-def register_post_run_cell_hook(callback) -> bool:
+def register_post_run_cell_hook(callback: Callable[..., Any]) -> bool:

As per coding guidelines, "Use explicit type hints for function parameters and return values" and "Use type hints consistently".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepnote_toolkit/ipython_utils.py` around lines 6 - 34, Add explicit type
hints for the hook registry and callback: import typing symbols (Set, Callable,
Any) and change the module-level variable to
_registered_post_run_cell_callbacks: Set[Callable[..., Any]] = set(); update the
register_post_run_cell_hook signature to def
register_post_run_cell_hook(callback: Callable[..., Any]) -> bool: so static
type checkers know the stored callable shape when registering with ip.events.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deepnote_toolkit/sql/sql_execution.py`:
- Around line 465-485: The cache key in _compute_engine_cache_key currently uses
only ssh_options, pool-related params, integration_id, or url which ignores
other engine-defining options; update it to incorporate a deterministic
serialization of the relevant connection params (e.g., the params dict and any
connect_args/auth headers present in sql_alchemy_dict) into the returned key
while preserving the existing early returns for ssh_options-enabled and explicit
pool config; specifically, when integration_id exists include it plus a stable
encoding (sorted/JSON) of params/connect_args in the key (and if no
integration_id, include the URL plus the same stable encoding) so that differing
params produce distinct cache keys.

---

Outside diff comments:
In `@deepnote_toolkit/ipython_utils.py`:
- Around line 6-34: Add explicit type hints for the hook registry and callback:
import typing symbols (Set, Callable, Any) and change the module-level variable
to _registered_post_run_cell_callbacks: Set[Callable[..., Any]] = set(); update
the register_post_run_cell_hook signature to def
register_post_run_cell_hook(callback: Callable[..., Any]) -> bool: so static
type checkers know the stored callable shape when registering with ip.events.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: b6e1c41b-ba25-4588-aa46-4a16f4340388

📥 Commits

Reviewing files that changed from the base of the PR and between f2ac75a and c716237.

📒 Files selected for processing (4)
  • deepnote_toolkit/ipython_utils.py
  • deepnote_toolkit/sql/sql_execution.py
  • tests/unit/test_ipython_utils.py
  • tests/unit/test_sql_execution_internal.py

Comment thread deepnote_toolkit/sql/sql_execution.py Outdated
Comment on lines +465 to +485
def _compute_engine_cache_key(sql_alchemy_dict) -> Optional[str]:
    """Stable key identifying the engine for *sql_alchemy_dict*, or None when
    this connection should not be cached.

    SSH-tunneled engines can't be cached because the tunnel only lives inside
    `_create_sql_ssh_uri`'s context manager. Connections that already specify
    a custom pool config in user `params` are also skipped to avoid clashing
    with the `pool_size=1` we set on cached engines.
    """
    if sql_alchemy_dict.get("ssh_options", {}).get("enabled"):
        return None

    params = sql_alchemy_dict.get("params") or {}
    if "pool_size" in params or "max_overflow" in params or "poolclass" in params:
        return None

    integration_id = sql_alchemy_dict.get("integration_id")
    if integration_id:
        return f"integration:{integration_id}"

    return f"url:{sql_alchemy_dict['url']}"
Contributor


⚠️ Potential issue | 🟠 Major

Include engine-defining params in the cache key.

Line 481 collapses the key to integration_id, and Line 485 falls back to just the raw URL. create_engine() also depends on params, so two calls in one cell can incorrectly share an engine even when connect_args, auth headers, or other connection options differ. That can silently run the second query with the wrong connection/session configuration.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepnote_toolkit/sql/sql_execution.py` around lines 465 - 485, The cache key
in _compute_engine_cache_key currently uses only ssh_options, pool-related
params, integration_id, or url which ignores other engine-defining options;
update it to incorporate a deterministic serialization of the relevant
connection params (e.g., the params dict and any connect_args/auth headers
present in sql_alchemy_dict) into the returned key while preserving the existing
early returns for ssh_options-enabled and explicit pool config; specifically,
when integration_id exists include it plus a stable encoding (sorted/JSON) of
params/connect_args in the key (and if no integration_id, include the URL plus
the same stable encoding) so that differing params produce distinct cache keys.

@deepnote-bot

deepnote-bot commented Apr 23, 2026

🚀 Review App Deployment Started

  • 🌍 Review application: ra-96
  • 🔑 Sign-in URL: Click to sign-in
  • 📊 Application logs: View logs
  • 🔄 Actions: Click to redeploy
  • 🚀 ArgoCD deployment: View deployment
  • Last deployed: 2026-05-06 14:11:56 (UTC)
  • 📜 Deployed commit: 2d78b182ec1c2ab38f49e26425b9fd2c7a5d27be
  • 🛠️ Toolkit version: 78d8bd3

Drops the implicit cache + IPython post_run_cell hook in favor of an
explicit `_dntk.sql_session()` context manager. Inside the with-block
each unique connection (keyed by integration_id, or URL when not
present) opens its SSH tunnel and SQLAlchemy engine once and reuses
both for every `execute_sql*` call inside the block. On exit the
session disposes the engine and closes the tunnel together.

This makes SSH tunnels work with SAL-51 reuse — the previous cache
had to skip SSH because the tunnel's lifetime was bound to a single
call by `_create_sql_ssh_uri`'s context manager. With the session
owning both, the tunnel's lifetime now matches the engine's.

Other gains:
- No IPython coupling. Lifetime is whatever the caller (deepnote-
  internal codegen) wraps in the `with` block.
- No "magic" cache key vs. live state divergence — outside a session
  behavior is exactly the previous per-call create+dispose.
- Nested `sql_session()` blocks no-op on the inner block so the outer
  retains ownership.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@robertlacok robertlacok changed the title feat: Reuse SQLAlchemy engine across execute_sql calls in a cell (SAL-51) feat: Add _dntk.sql_session() to share engine + SSH tunnel across calls (SAL-51) Apr 23, 2026
@robertlacok robertlacok changed the title feat: Add _dntk.sql_session() to share engine + SSH tunnel across calls (SAL-51) feat: Add sql_session() to share engine + SSH tunnel across calls Apr 23, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (1)
deepnote_toolkit/sql/sql_execution.py (1)

475-488: ⚠️ Potential issue | 🟠 Major

Expand the session cache key beyond integration_id / URL.

_session_resource_key() still aliases every call for the same integration or raw URL. Inside one sql_session(), that lets a later call reuse an engine and SSH tunnel created with different params, auth headers, or SSH settings.

Possible fix
 def _session_resource_key(sql_alchemy_dict) -> Optional[str]:
     params = sql_alchemy_dict.get("params") or {}
     if "pool_size" in params or "max_overflow" in params or "poolclass" in params:
         return None
-
-    integration_id = sql_alchemy_dict.get("integration_id")
-    if integration_id:
-        return f"integration:{integration_id}"
-
-    return f"url:{sql_alchemy_dict['url']}"
+    key_payload = {
+        "integration_id": sql_alchemy_dict.get("integration_id"),
+        "url": sql_alchemy_dict["url"],
+        "params": params,
+        "ssh_options": sql_alchemy_dict.get("ssh_options"),
+    }
+    return json.dumps(key_payload, sort_keys=True, default=str)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepnote_toolkit/sql/sql_execution.py` around lines 475 - 488,
_session_resource_key currently returns the same key for all connections sharing
an integration_id or URL, allowing reuse of engines/tunnels with different
connection-specific settings; change _session_resource_key(sql_alchemy_dict) to
include a stable fingerprint of the connection-specific attributes (e.g.,
serialize and include params, auth headers, and SSH settings such as
sql_alchemy_dict.get("params"), sql_alchemy_dict.get("headers") or auth token
field, and sql_alchemy_dict.get("ssh_config")/ssh-related keys) when computing
the key (but keep the existing early return None for explicit
pool_size/max_overflow/poolclass). Use a deterministic serialization or short
hash of those fields to build the returned string (e.g.,
f"integration:{integration_id}:{hash}" or f"url:{url}:{hash}") so
engines/tunnels are only reused when all relevant settings match.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deepnote_toolkit/sql/sql_execution.py`:
- Around line 365-399: Change _open_ssh_tunnel and _close_ssh_tunnel to have
explicit type hints and raise a specific exception type instead of a bare
Exception: add a small custom exception class (e.g., SSHTunnelError or
SSHTunnelSetupError) and use it when PRIVATE_SSH_KEY_BLOB is missing or
create_ssh_tunnel fails; annotate _open_ssh_tunnel signature to return
Tuple[Any, URL] (or the concrete tunnel type if available) and accept
sql_alchemy_dict: Mapping[str, Any], and annotate _close_ssh_tunnel(server:
Optional[Any]) -> None; update docstrings to state ownership/closing contract
and failure modes; keep references to the existing symbols (_open_ssh_tunnel,
_close_ssh_tunnel, create_ssh_tunnel, dnenv.get_env, make_url, URL) so reviewers
can find and validate the changes.
- Around line 561-572: The call to create_engine currently passes two kwarg
dicts separately which can cause a TypeError if sql_alchemy_dict["params"]
contains overlapping keys like pool_pre_ping; before calling create_engine (in
the block using suppress_third_party_deprecation_warnings and the in_session
logic), merge sql_alchemy_dict["params"] and extra_engine_params into a single
dict (e.g., copy sql_alchemy_dict["params"] then use setdefault or update
carefully) so extra_engine_params values only apply when keys are absent, then
call create_engine(url, **merged_params) instead of passing two dicts.

In `@tests/unit/test_sql_execution_internal.py`:
- Around line 380-386: Update the helper _make_sql_alchemy_dict to add explicit
type hints: annotate parameters url and params as Optional[str] and
Optional[Dict[str, Any]] (or appropriate param dict type) respectively, and
declare the function return type as Dict[str, Any] (or a more specific mapping
type used in tests). Adjust the function signature and any necessary imports
(from typing import Optional, Dict, Any) so the function reads e.g. def
_make_sql_alchemy_dict(integration_id: str = "integration_a", url: Optional[str]
= None, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any].

---

Duplicate comments:
In `@deepnote_toolkit/sql/sql_execution.py`:
- Around line 475-488: _session_resource_key currently returns the same key for
all connections sharing an integration_id or URL, allowing reuse of
engines/tunnels with different connection-specific settings; change
_session_resource_key(sql_alchemy_dict) to include a stable fingerprint of the
connection-specific attributes (e.g., serialize and include params, auth
headers, and SSH settings such as sql_alchemy_dict.get("params"),
sql_alchemy_dict.get("headers") or auth token field, and
sql_alchemy_dict.get("ssh_config")/ssh-related keys) when computing the key (but
keep the existing early return None for explicit
pool_size/max_overflow/poolclass). Use a deterministic serialization or short
hash of those fields to build the returned string (e.g.,
f"integration:{integration_id}:{hash}" or f"url:{url}:{hash}") so
engines/tunnels are only reused when all relevant settings match.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 22352dcb-ac84-4524-b98f-c797b6e3f455

📥 Commits

Reviewing files that changed from the base of the PR and between c716237 and 6762926.

📒 Files selected for processing (3)
  • deepnote_toolkit/__init__.py
  • deepnote_toolkit/sql/sql_execution.py
  • tests/unit/test_sql_execution_internal.py

Comment thread deepnote_toolkit/sql/sql_execution.py Outdated
Comment on lines +365 to +399
def _open_ssh_tunnel(sql_alchemy_dict) -> tuple[Any, Any]:
    """Open an SSH tunnel for *sql_alchemy_dict* and return ``(server, url)``.

    Caller is responsible for eventually closing the returned server with
    ``_close_ssh_tunnel``.
    """
    base64_encoded_key = dnenv.get_env("PRIVATE_SSH_KEY_BLOB")
    if not base64_encoded_key:
        raise Exception(
            "The private key needed to establish the SSH connection is missing. Please try again or contact support."
        )
    original_url = make_url(sql_alchemy_dict["url"])
    server = create_ssh_tunnel(
        ssh_host=sql_alchemy_dict["ssh_options"]["host"],
        ssh_port=int(sql_alchemy_dict["ssh_options"]["port"]),
        ssh_user=sql_alchemy_dict["ssh_options"]["user"],
        remote_host=original_url.host,
        remote_port=int(original_url.port),
        private_key=base64.b64decode(base64_encoded_key).decode("utf-8"),
    )
    url = URL.create(
        drivername=original_url.drivername,
        username=original_url.username,
        password=original_url.password,
        host=server.local_bind_host,
        port=server.local_bind_port,
        database=original_url.database,
        query=original_url.query,
    )
    return server, url


def _close_ssh_tunnel(server) -> None:
    if server is not None and server.is_active:
        server.close()
Contributor


🛠️ Refactor suggestion | 🟠 Major

Type the new SSH/session helpers and use a specific exception.

These helpers are new control-path code, but their signatures are still untyped and _open_ssh_tunnel() raises a bare Exception. Please make the contract explicit so callers can reason about ownership and failures.

As per coding guidelines, "Use explicit type hints for function parameters and return values", "Use appropriate exception types for error handling", and "Document functions and classes with docstrings".

Also applies to: 475-581

🧰 Tools
🪛 Ruff (0.15.10)

[warning] 373-375: Create your own exception

(TRY002)


[warning] 373-375: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepnote_toolkit/sql/sql_execution.py` around lines 365 - 399, Change
_open_ssh_tunnel and _close_ssh_tunnel to have explicit type hints and raise a
specific exception type instead of a bare Exception: add a small custom
exception class (e.g., SSHTunnelError or SSHTunnelSetupError) and use it when
PRIVATE_SSH_KEY_BLOB is missing or create_ssh_tunnel fails; annotate
_open_ssh_tunnel signature to return Tuple[Any, URL] (or the concrete tunnel
type if available) and accept sql_alchemy_dict: Mapping[str, Any], and annotate
_close_ssh_tunnel(server: Optional[Any]) -> None; update docstrings to state
ownership/closing contract and failure modes; keep references to the existing
symbols (_open_ssh_tunnel, _close_ssh_tunnel, create_ssh_tunnel, dnenv.get_env,
make_url, URL) so reviewers can find and validate the changes.

Comment thread deepnote_toolkit/sql/sql_execution.py Outdated
Comment on lines +561 to +572
extra_engine_params: dict[str, Any] = {"pool_pre_ping": True}
if in_session:
    # See sql_session() docstring for why pool_size=1 is required to make
    # session state (USE WAREHOUSE, ...) persist across calls.
    extra_engine_params["pool_size"] = 1
    extra_engine_params["max_overflow"] = 0

try:
    with suppress_third_party_deprecation_warnings():
        engine = create_engine(
            url, **sql_alchemy_dict["params"], **extra_engine_params
        )
Contributor


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

In Python, what happens when the same keyword argument is supplied by two **kwargs expansions in a single function call?

💡 Result:

In Python (3.5+), supplying the same keyword argument from two `**kwargs` expansions in a single function call, such as `f(**d1, **d2)` where d1 and d2 both contain the same key, raises `TypeError: f() got multiple values for keyword argument 'keyname'`. This behavior is specified in PEP 448, which introduced multiple unpacking: "Currently, if an argument is given multiple times — such as a positional argument given both positionally and by keyword — a TypeError is raised. This remains true for duplicate arguments provided through multiple unpackings, e.g. f(**{'x': 2}, **{'x': 3}), except that the error will be detected at runtime." Python merges the kwargs from left to right, but detects duplicates during argument processing in getargs.c and raises the error. This differs from dictionary merging `{**d1, **d2}`, which keeps the last value without error. Example:

def f(**kwargs):
    pass

d1 = {'a': 1}
d2 = {'a': 2}
f(**d1, **d2)  # TypeError: f() got multiple values for keyword argument 'a'
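Applied to the create_engine() call above, one safe pattern is to merge the two dicts before the call (`merge_engine_params` is a hypothetical helper mirroring the setdefault approach suggested earlier; the user's explicit value wins):

```python
from typing import Any


def merge_engine_params(
    user_params: dict[str, Any], extra_engine_params: dict[str, Any]
) -> dict[str, Any]:
    # Passing **user_params and **extra_engine_params separately to
    # create_engine() raises TypeError on overlapping keys such as
    # pool_pre_ping, so fold them into one dict first. Defaults from
    # extra_engine_params apply only when the user didn't set the key.
    merged = dict(extra_engine_params)
    merged.update(user_params)
    return merged


merged = merge_engine_params(
    {"pool_pre_ping": False, "echo": True}, {"pool_pre_ping": True}
)
# The user's pool_pre_ping=False overrides the default True.
```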


🏁 Scripts executed (verification chain):

cd deepnote_toolkit && find . -type f -name "sql_execution.py" | head -1

# Check the exact file and read lines around 561-572
wc -l deepnote_toolkit/sql/sql_execution.py
sed -n '555,580p' deepnote_toolkit/sql/sql_execution.py

# Search for sql_alchemy_dict definition and usage
rg "sql_alchemy_dict" deepnote_toolkit/sql/sql_execution.py -B 3 -A 3

# Check where pool_pre_ping is used in the codebase and tests
rg "pool_pre_ping" deepnote_toolkit/
rg "pool_pre_ping" tests/
rg "pool_pre_ping\|params.*=" deepnote_toolkit/sql/ -B 2 -A 2

# Look for tests that exercise engine creation with params
find . -name "*test*sql*" -type f | head -10
grep -l "pool_pre_ping\|create_engine\|_acquire_engine" tests/unit/test_sql_execution*.py
head -100 tests/unit/test_sql_execution.py
rg "params.*:" tests/unit/test_sql_execution.py | head -20
grep -B 10 -A 10 "pool_pre_ping" tests/unit/test_sql_execution_internal.py

# Inspect _session_resource_key and _acquire_engine
sed -n '730,750p' deepnote_toolkit/sql/sql_execution.py
rg "_session_resource_key" deepnote_toolkit/sql/sql_execution.py -A 15
sed -n '760,800p' deepnote_toolkit/sql/sql_execution.py

# Check how sql_alchemy_dict["params"] is built/populated and used in real calls
rg "\"params\".*pool_pre_ping|pool_pre_ping.*\"params\"" deepnote_toolkit/
sed -n '420,480p' deepnote_toolkit/sql/sql_execution.py
rg "_make_sql_alchemy_dict" tests/unit/test_sql_execution_internal.py -A 20
sed -n '200,250p' deepnote_toolkit/sql/sql_execution.py

# Check if there's any documentation or comments about where pool_pre_ping could come from
rg -B 3 -A 3 "pool_pre_ping|extra_engine_params" deepnote_toolkit/sql/sql_execution.py

Repository: deepnote/deepnote-toolkit

Length of output: 691


Merge engine kwargs before calling create_engine().

If sql_alchemy_dict["params"] contains pool_pre_ping, calling create_engine(url, **params, **extra_engine_params) raises a TypeError because the keyword is passed twice. Merge into a single dict first and use setdefault() so caller-supplied params take precedence.

Suggested fix
-    extra_engine_params: dict[str, Any] = {"pool_pre_ping": True}
+    engine_params: dict[str, Any] = dict(sql_alchemy_dict["params"])
+    engine_params.setdefault("pool_pre_ping", True)
     if in_session:
         # See sql_session() docstring for why pool_size=1 is required to make
         # session state (USE WAREHOUSE, ...) persist across calls.
-        extra_engine_params["pool_size"] = 1
-        extra_engine_params["max_overflow"] = 0
+        engine_params["pool_size"] = 1
+        engine_params["max_overflow"] = 0

     try:
         with suppress_third_party_deprecation_warnings():
-            engine = create_engine(
-                url, **sql_alchemy_dict["params"], **extra_engine_params
-            )
+            engine = create_engine(url, **engine_params)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepnote_toolkit/sql/sql_execution.py` around lines 561 - 572, The call to
create_engine currently passes two kwarg dicts separately which can cause a
TypeError if sql_alchemy_dict["params"] contains overlapping keys like
pool_pre_ping; before calling create_engine (in the block using
suppress_third_party_deprecation_warnings and the in_session logic), merge
sql_alchemy_dict["params"] and extra_engine_params into a single dict (e.g.,
copy sql_alchemy_dict["params"] then use setdefault or update carefully) so
extra_engine_params values only apply when keys are absent, then call
create_engine(url, **merged_params) instead of passing two dicts.
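The duplicate-kwarg failure and the setdefault() merge can be sketched end-to-end against SQLite (the shape of sql_alchemy_dict["params"] here is hypothetical; the real dict is built elsewhere in sql_execution.py):

```python
from sqlalchemy import create_engine

# Hypothetical params dict that already pins pool_pre_ping.
sql_alchemy_dict = {"params": {"pool_pre_ping": False}}
url = "sqlite:///:memory:"
extra_engine_params = {"pool_pre_ping": True}

# Splatting both dicts raises TypeError when a key appears in each.
try:
    create_engine(url, **sql_alchemy_dict["params"], **extra_engine_params)
    duplicate_kwarg_raised = False
except TypeError:
    duplicate_kwarg_raised = True

# Merge first; setdefault() only fills keys the caller did not supply.
engine_params = dict(sql_alchemy_dict["params"])
engine_params.setdefault("pool_pre_ping", True)
engine = create_engine(url, **engine_params)

print(duplicate_kwarg_raised)  # True
```

Copying into a fresh dict also avoids mutating the caller's params in place.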

Comment thread on tests/unit/test_sql_execution_internal.py (outdated)
Drops the sql_session() context manager and the engine lifecycle
changes it required (per-cell registry, pool_size=1, SSH tunnel
lifetime extension). Restores the original per-call create_engine +
dispose flow.

In its place, execute_sql / execute_sql_with_connection_json grow an
optional setup_statements: list[str] parameter. When provided, those
statements are run via connection.exec_driver_sql on the same DBAPI
connection inside the same engine.begin() block as the main query, so
session state set by `USE WAREHOUSE`, `USE ROLE`, `SET ...`, etc. is
in effect for the main query without changing the engine's lifetime.

Single call, single engine, single tunnel, single connection — no
shared mutable state across calls and no special-casing for SSH or
custom pool config.

DuckDB's process-wide singleton means setup statements naturally
persist there too; the duckdb branch executes them on the singleton
before the main query for API parity.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
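The same-connection guarantee this commit describes can be sketched with SQLite standing in for Snowflake — a TEMP table is connection-local, so the main SELECT can only see it if setup and query share one physical DBAPI connection:

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///:memory:")

setup_statements = [
    "CREATE TEMP TABLE session_scope (x INTEGER)",
    "INSERT INTO session_scope VALUES (1), (2)",
]

with engine.begin() as connection:
    # Setup runs first on the same physical connection, so any
    # connection-local state it sets is visible to the query below.
    for stmt in setup_statements:
        connection.exec_driver_sql(stmt)
    rows = connection.execute(
        text("SELECT x FROM session_scope ORDER BY x")
    ).fetchall()

result = [r[0] for r in rows]
print(result)  # [1, 2]
```

This is also the shape of the TestSetupStatementsAgainstRealSQLite test added later in this PR.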
@robertlacok robertlacok changed the title feat: Add sql_session() to share engine + SSH tunnel across calls feat: Add setup_statements parameter to execute_sql for session setup Apr 23, 2026
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
deepnote_toolkit/sql/sql_execution.py (1)

419-447: ⚠️ Potential issue | 🟠 Major

setup_statements are not part of cache semantics.

Line 435 enables cache even when setup_statements changes session context (e.g., warehouse/role/schema). That can return stale/wrong data from a different setup context.

Suggested fix (safe default: disable cache when setup statements are present)
 def _execute_sql_with_caching(
@@
-    sql_caching_enabled = (
-        sql_cache_mode != "cache_disabled" and return_variable_type == "dataframe"
-    )
+    has_setup_statements = bool(setup_statements)
+    sql_caching_enabled = (
+        sql_cache_mode != "cache_disabled"
+        and return_variable_type == "dataframe"
+        and not has_setup_statements
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepnote_toolkit/sql/sql_execution.py` around lines 419 - 447, The current
cache path (controlled by sql_caching_enabled / can_get_sql_cache and
get_sql_cache) ignores setup_statements, which can change session context and
produce stale/wrong results; update the logic that computes sql_caching_enabled
/ can_get_sql_cache (or add an explicit guard before calling get_sql_cache) to
disable caching when setup_statements is non-empty (i.e., treat any truthy
setup_statements as a reason to skip get_sql_cache), ensuring the existing
duckdb short-circuit (requires_duckdb) and return_variable_type checks remain
unchanged.
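The proposed guard is a one-line condition; a dependency-free sketch (function name and arguments are illustrative, not the toolkit's real signature):

```python
from typing import Optional, Sequence

def sql_caching_allowed(
    sql_cache_mode: str,
    return_variable_type: str,
    setup_statements: Optional[Sequence[str]] = None,
) -> bool:
    # Setup statements change session context (warehouse/role/schema),
    # so a cache entry keyed on the query text alone could be stale.
    return (
        sql_cache_mode != "cache_disabled"
        and return_variable_type == "dataframe"
        and not setup_statements
    )

print(sql_caching_allowed("cache_enabled", "dataframe"))                       # True
print(sql_caching_allowed("cache_enabled", "dataframe", ["USE WAREHOUSE x"]))  # False
```

A stricter alternative would fold the setup statements into the cache key instead of disabling caching outright.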
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deepnote_toolkit/sql/sql_execution.py`:
- Line 631: In _execute_sql_on_engine, validate setup_statements before
iterating: ensure it's either None or an iterable of statements (not a plain
string). Add a guard that detects isinstance(setup_statements, str) and either
wrap it as a single-item list or raise a TypeError, and also validate items are
strings (or convert them) before the loop that executes each statement to avoid
iterating over characters and producing misleading SQL errors.
- Around line 105-106: Several functions in this module are missing parameter
and return type annotations; replace any parameter defaults like
setup_statements=None with an explicit Optional type (e.g., setup_statements:
Optional[Sequence[str]] = None) and add explicit return type annotations for
each function that currently lacks them. Import typing primitives you need (from
typing import Optional, Sequence, Iterable, Any) and update the function
signatures (including the function that declares setup_statements) to use
Optional[T] rather than T = None, and add concrete return types (e.g., -> None,
-> int, -> List[Any], or -> Iterable[str] as appropriate) to the five
un-annotated functions in this module so all parameters and returns are fully
typed. Ensure the parameter name setup_statements is changed to
setup_statements: Optional[<appropriate type>] = None and that each function
signature and its return type reflect the actual values they produce.

---

Outside diff comments:
In `@deepnote_toolkit/sql/sql_execution.py`:
- Around line 419-447: The current cache path (controlled by sql_caching_enabled
/ can_get_sql_cache and get_sql_cache) ignores setup_statements, which can
change session context and produce stale/wrong results; update the logic that
computes sql_caching_enabled / can_get_sql_cache (or add an explicit guard
before calling get_sql_cache) to disable caching when setup_statements is
non-empty (i.e., treat any truthy setup_statements as a reason to skip
get_sql_cache), ensuring the existing duckdb short-circuit (requires_duckdb) and
return_variable_type checks remain unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1e59874a-01db-41f1-9708-1368007f2686

📥 Commits

Reviewing files that changed from the base of the PR and between 6762926 and 029e245.

📒 Files selected for processing (3)
  • deepnote_toolkit/sql/sql_execution.py
  • tests/unit/test_sql_execution.py
  • tests/unit/test_sql_execution_internal.py

Comment on lines +105 to 106
setup_statements=None,
):

⚠️ Potential issue | 🟠 Major

Add type hints to all function parameters and return annotations.

The five functions at lines 99, 234, 410, 490, and 631 lack any parameter or return type annotations. Replace setup_statements=None with setup_statements: Optional[...] and add explicit return type annotations to each function. Per coding guidelines, use Optional[T] instead of T = None, and include return types for all functions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepnote_toolkit/sql/sql_execution.py` around lines 105 - 106, Several
functions in this module are missing parameter and return type annotations;
replace any parameter defaults like setup_statements=None with an explicit
Optional type (e.g., setup_statements: Optional[Sequence[str]] = None) and add
explicit return type annotations for each function that currently lacks them.
Import typing primitives you need (from typing import Optional, Sequence,
Iterable, Any) and update the function signatures (including the function that
declares setup_statements) to use Optional[T] rather than T = None, and add
concrete return types (e.g., -> None, -> int, -> List[Any], or -> Iterable[str]
as appropriate) to the five un-annotated functions in this module so all
parameters and returns are fully typed. Ensure the parameter name
setup_statements is changed to setup_statements: Optional[<appropriate type>] =
None and that each function signature and its return type reflect the actual
values they produce.



def _execute_sql_on_engine(engine, query, bind_params):
def _execute_sql_on_engine(engine, query, bind_params, setup_statements=None):

⚠️ Potential issue | 🟠 Major

Validate setup_statements type before iterating.

Line 669 accepts any iterable. If a caller passes a string, it runs one-character “statements” and fails with misleading SQL errors.

Suggested fix
 def _execute_sql_on_engine(engine, query, bind_params, setup_statements=None):
@@
+    if setup_statements is None:
+        normalized_setup_statements: list[str] = []
+    elif isinstance(setup_statements, list) and all(
+        isinstance(stmt, str) for stmt in setup_statements
+    ):
+        normalized_setup_statements = setup_statements
+    else:
+        raise TypeError("setup_statements must be a list[str] or None")
+
     with engine.begin() as connection:
@@
-        for stmt in setup_statements or []:
+        for stmt in normalized_setup_statements:
             connection.exec_driver_sql(stmt)

Also applies to: 667-670

🧰 Tools
🪛 Ruff (0.15.10)

[warning] 631-631: Missing return type annotation for private function _execute_sql_on_engine

(ANN202)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepnote_toolkit/sql/sql_execution.py` at line 631, In
_execute_sql_on_engine, validate setup_statements before iterating: ensure it's
either None or an iterable of statements (not a plain string). Add a guard that
detects isinstance(setup_statements, str) and either wrap it as a single-item
list or raise a TypeError, and also validate items are strings (or convert them)
before the loop that executes each statement to avoid iterating over characters
and producing misleading SQL errors.
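The pitfall is that a bare string is itself an iterable of characters; a minimal normalization guard along the lines the comment suggests (helper name is illustrative):

```python
def normalize_setup_statements(setup_statements) -> list:
    # Without this guard, a bare string would be iterated character by
    # character, running one-character "statements" that fail with
    # misleading SQL errors.
    if setup_statements is None:
        return []
    if isinstance(setup_statements, str):
        raise TypeError("setup_statements must be a list[str] or None, not a bare str")
    statements = list(setup_statements)
    for stmt in statements:
        if not isinstance(stmt, str):
            raise TypeError(f"setup statement must be str, got {type(stmt).__name__}")
    return statements

print(normalize_setup_statements(None))           # []
print(normalize_setup_statements(["SET a = 1"]))  # ['SET a = 1']

try:
    normalize_setup_statements("SET a = 1")
    str_rejected = False
except TypeError:
    str_rejected = True
print(str_rejected)  # True
```

Raising rather than silently wrapping the string keeps caller bugs visible.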

Adds extract_setup_statements parser at
deepnote_toolkit/sql/setup_statement_parser.py and wires it into
execute_sql_with_connection_json. After the cell SQL is Jinja-rendered, the parser
strips a strict leading run of USE WAREHOUSE / USE DATABASE /
USE SCHEMA / USE ROLE / USE SECONDARY ROLES / SET / ALTER SESSION
statements off the front and feeds them down as setup_statements,
which run on the same connection as the main query inside one
engine.begin() block.

The parser:
- Skips line and block comments before/between statements.
- Respects single-quoted strings, double-quoted identifiers, and
  Snowflake $$-quoted strings when looking for unquoted ;.
- Stops on the first non-setup statement.
- Passes through unchanged if the cell is setup-only or has no
  closing ; on the prefix.
- Raises SetupStatementError with a clear message pointing to the
  setup_statements= kwarg if any extracted statement contains a
  bind placeholder (would otherwise blow up at exec_driver_sql,
  since it doesn't bind).

Test surface upgrade:
- New tests/unit/test_setup_statement_parser.py covers all parser
  cases, including placeholder detection across pyformat / format /
  named / numeric / qmark styles.
- New TestSetupStatementsAgainstRealSQLite uses a real
  sqlite:///:memory: engine and proves setup statements share the
  connection with the main query (TEMP TABLEs created in setup are
  visible to the main SELECT — only possible if they share a
  physical DBAPI connection).
- Drops the two over-mocked _execute_sql_on_engine ordering tests
  that mocked pandas.read_sql_query away; their substance is now
  covered by the SQLite test. Keeps the abort-on-setup-failure test.
- New TestSetupStatementParserIntegration verifies the parser's
  output flows through to _execute_sql_on_engine and that the
  templated-value error reaches the caller.

After this PR, _dntk.execute_sql("USE WAREHOUSE abc; SELECT 1", env)
just works end-to-end against Snowflake — no codegen change in any
caller is required.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
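A deliberately simplified sketch of the split this commit adds — unlike the real parser, it ignores comments, quoted strings, and Snowflake $$-quoting, and only splits plain semicolon-separated SQL:

```python
import re

# Allowlisted leading statements: USE WAREHOUSE/DATABASE/SCHEMA/ROLE/
# SECONDARY ROLES, SET, ALTER SESSION.
_SETUP_RE = re.compile(
    r"^\s*(USE\s+(WAREHOUSE|DATABASE|SCHEMA|ROLE|SECONDARY\s+ROLES)"
    r"|SET|ALTER\s+SESSION)\b",
    re.IGNORECASE,
)

def extract_setup_statements(sql: str):
    setup = []
    rest = sql
    while True:
        head, sep, tail = rest.partition(";")
        if not sep or not _SETUP_RE.match(head):
            break  # no closing ';' on the prefix, or first non-setup statement
        setup.append(head.strip())
        rest = tail
    # A setup-only cell passes through unchanged.
    if not rest.strip():
        return [], sql
    return setup, rest.lstrip()

print(extract_setup_statements("USE WAREHOUSE abc; SELECT 1"))
```

The pass-through cases (setup-only cell, missing trailing semicolon) match the behavior the commit message documents.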
@robertlacok robertlacok changed the title feat: Add setup_statements parameter to execute_sql for session setup feat: Run leading USE WAREHOUSE / SET / ALTER SESSION on the same connection as the main query May 6, 2026
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@deepnote_toolkit/sql/setup_statement_parser.py`:
- Around line 143-145: _match_setup_prefix only advances over whitespace between
allowlisted keywords (prefix) so inline comments like /* ... */ prevent
matching; update the loop that currently calls _skip_inline_whitespace(s, cur)
to instead advance past both whitespace and inline comments (either by calling
an existing helper that skips comments, e.g.,
_skip_inline_whitespace_and_comments(s, cur), or by extending
_skip_inline_whitespace to handle /* ... */ and -- style comments) so cur moves
past comments before matching the next keyword; ensure you update the same loop
that iterates "for i, word in enumerate(prefix)" and maintain the correct cur
value for subsequent _match_keyword checks.

In `@deepnote_toolkit/sql/sql_execution.py`:
- Around line 681-685: The setup statements are executed via
connection.exec_driver_sql before the cursor-tracking wrapper is installed, so
long-running setup queries won't be registered and can't be canceled by
cancel_all_cursors(); move the installation of the tracking wrapper to run
immediately after opening the physical connection (i.e., before the for stmt in
setup_statements loop) so that connection.exec_driver_sql calls register their
cursors with the same tracking mechanism (ensure the code that wraps or replaces
the engine.begin() connection with the tracked connection is invoked prior to
executing setup_statements).
- Around line 221-225: The current code overwrites query_preview_source with
compiled_query when parsed_setup is present, causing preview-only rewrites
(e.g., LIMIT 100) to be stored in DeepnoteQueryPreview.deepnote_query; instead
preserve the original main query string for previews. Change the logic in the
block that checks parsed_setup so query_preview_source is set to the original
main query (the un-rewritten main SQL variable used earlier, e.g.,
original_main_query or the parsed "main" portion) rather than compiled_query;
keep compiled_query for execution but ensure DeepnoteQueryPreview.deepnote_query
is populated from the preserved original main query when parsed_setup is
non-empty.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e04b08ff-cf77-404c-9550-93852e283fdd

📥 Commits

Reviewing files that changed from the base of the PR and between 029e245 and 13ac7e8.

📒 Files selected for processing (5)
  • deepnote_toolkit/sql/setup_statement_parser.py
  • deepnote_toolkit/sql/sql_execution.py
  • tests/unit/test_setup_statement_parser.py
  • tests/unit/test_sql_execution.py
  • tests/unit/test_sql_execution_internal.py

Comment on lines +143 to +145
for i, word in enumerate(prefix):
if i > 0:
cur = _skip_inline_whitespace(s, cur)

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Skip comments between allowlisted keywords.

_match_setup_prefix() only advances over whitespace here. Valid SQL like USE /* pick one */ ROLE analyst; SELECT 1 or ALTER /*x*/ SESSION ... will not be extracted, so the cell falls back to the old multi-statement path instead of the new same-connection behavior.

Suggested fix
         for i, word in enumerate(prefix):
             if i > 0:
-                cur = _skip_inline_whitespace(s, cur)
+                cur = _skip_whitespace_and_comments(s, cur)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepnote_toolkit/sql/setup_statement_parser.py` around lines 143 - 145,
_match_setup_prefix only advances over whitespace between allowlisted keywords
(prefix) so inline comments like /* ... */ prevent matching; update the loop
that currently calls _skip_inline_whitespace(s, cur) to instead advance past
both whitespace and inline comments (either by calling an existing helper that
skips comments, e.g., _skip_inline_whitespace_and_comments(s, cur), or by
extending _skip_inline_whitespace to handle /* ... */ and -- style comments) so
cur moves past comments before matching the next keyword; ensure you update the
same loop that iterates "for i, word in enumerate(prefix)" and maintain the
correct cur value for subsequent _match_keyword checks.
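A self-contained sketch of the kind of helper the fix calls for — advance past whitespace, `--` line comments, and `/* */` block comments between keywords (the real helper in setup_statement_parser.py may differ in name and edge-case handling):

```python
def skip_whitespace_and_comments(s: str, pos: int) -> int:
    """Advance pos past whitespace and SQL comments so keywords
    separated by comments (USE /* pick one */ ROLE) still match."""
    while pos < len(s):
        if s[pos].isspace():
            pos += 1
        elif s.startswith("--", pos):
            end = s.find("\n", pos)
            pos = len(s) if end == -1 else end + 1
        elif s.startswith("/*", pos):
            end = s.find("*/", pos + 2)
            if end == -1:
                return len(s)  # unterminated block comment: consume rest
            pos = end + 2
        else:
            break
    return pos

sql = "USE /* pick one */ ROLE analyst"
print(sql[skip_whitespace_and_comments(sql, 3):])  # ROLE analyst
```

Calling this between allowlisted keywords makes the comment-separated examples in the review extract correctly.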

Comment on lines +221 to +225
# query_preview_source mirrors the user-visible main query for cache
# key/preview purposes — keep it aligned with the post-extraction
# remainder.
if parsed_setup:
query_preview_source = compiled_query

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve the original main query in query_preview_source.

When parsed_setup is non-empty, this replaces query_preview_source with the rendered remainder, which already carries preview-only rewrites like LIMIT 100. DeepnoteQueryPreview.deepnote_query will then store the limited/compiled SQL instead of the original main query, so chaining a preview from USE ...; SELECT ... changes semantics.

Suggested fix
-        if parsed_setup:
-            query_preview_source = compiled_query
+        if parsed_setup:
+            _, query_preview_source = extract_setup_statements(query_preview_source)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepnote_toolkit/sql/sql_execution.py` around lines 221 - 225, The current
code overwrites query_preview_source with compiled_query when parsed_setup is
present, causing preview-only rewrites (e.g., LIMIT 100) to be stored in
DeepnoteQueryPreview.deepnote_query; instead preserve the original main query
string for previews. Change the logic in the block that checks parsed_setup so
query_preview_source is set to the original main query (the un-rewritten main
SQL variable used earlier, e.g., original_main_query or the parsed "main"
portion) rather than compiled_query; keep compiled_query for execution but
ensure DeepnoteQueryPreview.deepnote_query is populated from the preserved
original main query when parsed_setup is non-empty.

Comment on lines 681 to +685
with engine.begin() as connection:
# Run setup statements first on the same physical connection so any
# session state they set is visible to the main query below.
for stmt in setup_statements or []:
connection.exec_driver_sql(stmt)

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Install cursor tracking before running setup statements.

The new exec_driver_sql loop runs before the tracking wrapper is attached. If a setup statement blocks and the cell is interrupted, its cursor is never registered, so cancel_all_cursors() cannot stop it. That regresses this function's cancellation guarantee for the new path.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepnote_toolkit/sql/sql_execution.py` around lines 681 - 685, The setup
statements are executed via connection.exec_driver_sql before the
cursor-tracking wrapper is installed, so long-running setup queries won't be
registered and can't be canceled by cancel_all_cursors(); move the installation
of the tracking wrapper to run immediately after opening the physical connection
(i.e., before the for stmt in setup_statements loop) so that
connection.exec_driver_sql calls register their cursors with the same tracking
mechanism (ensure the code that wraps or replaces the engine.begin() connection
with the tracked connection is invoked prior to executing setup_statements).
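One way the ordering fix could look, sketched against SQLite: install a cursor-level event hook on the engine before any statement runs, so even exec_driver_sql setup calls register their cursors. The event name is standard SQLAlchemy; the tracking list is a hypothetical stand-in for the toolkit's real registry, which may work differently:

```python
from sqlalchemy import create_engine, event

engine = create_engine("sqlite:///:memory:")
tracked_cursors = []  # hypothetical stand-in for the cancellation registry

# before_cursor_execute also fires for exec_driver_sql, so hooking the
# engine up front means setup statements are registered before they run
# and stay cancelable.
@event.listens_for(engine, "before_cursor_execute")
def _track(conn, cursor, statement, parameters, context, executemany):
    tracked_cursors.append((cursor, statement))

with engine.begin() as connection:
    connection.exec_driver_sql("CREATE TEMP TABLE t (x INTEGER)")

statements_seen = [stmt for _, stmt in tracked_cursors]
print("CREATE TEMP TABLE t (x INTEGER)" in statements_seen)  # True
```

With an engine-level hook there is no per-connection wrapper to install late, which sidesteps the ordering bug entirely.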

connection.exec_driver_sql is the documented contract here — setup
statements are session-control SQL (USE WAREHOUSE, USE ROLE, SET,
ALTER SESSION) that cannot be parameter-bound, so the semgrep
sqlalchemy-execute-raw-query rule's recommendation doesn't apply.
Add a focused nosemgrep annotation with the rationale inline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (1)
deepnote_toolkit/sql/sql_execution.py (1)

646-646: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add return type annotation.

Static analysis correctly flags this. Function returns DataFrame | None.

-def _execute_sql_on_engine(engine, query, bind_params, setup_statements=None):
+def _execute_sql_on_engine(
+    engine, query, bind_params, setup_statements: Optional[list[str]] = None
+) -> Optional["pd.DataFrame"]:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepnote_toolkit/sql/sql_execution.py` at line 646, The function
_execute_sql_on_engine lacks a return type annotation; update its signature to
include the correct type (DataFrame | None) — e.g. change def
_execute_sql_on_engine(...) to def _execute_sql_on_engine(...) -> DataFrame |
None — and ensure DataFrame is imported (from pandas import DataFrame) or use
Optional[DataFrame] with typing imports so static analysis stops flagging the
missing return type.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@deepnote_toolkit/sql/sql_execution.py`:
- Line 646: The function _execute_sql_on_engine lacks a return type annotation;
update its signature to include the correct type (DataFrame | None) — e.g.
change def _execute_sql_on_engine(...) to def _execute_sql_on_engine(...) ->
DataFrame | None — and ensure DataFrame is imported (from pandas import
DataFrame) or use Optional[DataFrame] with typing imports so static analysis
stops flagging the missing return type.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e064366b-70e4-46b9-9940-d941ab90462c

📥 Commits

Reviewing files that changed from the base of the PR and between 13ac7e8 and c28c7d0.

📒 Files selected for processing (1)
  • deepnote_toolkit/sql/sql_execution.py
