feat(backend/kernel): add use_kernel=True flag — route through the Rust kernel via PyO3 #787
Open: vikrantpuppala wants to merge 9 commits into main from feat/kernel-backend
Commits (9):
- 9259644 feat(backend/kernel): route use_sea=True through the Rust kernel (vikrantpuppala)
- 2572362 refactor(backend/kernel): PAT-only auth, drop External trampoline (vikrantpuppala)
- 6b30815 test(e2e): live kernel-backend (use_sea=True) suite (vikrantpuppala)
- 31ca581 fix(backend/kernel): defer databricks-sql-kernel poetry dep declaration (vikrantpuppala)
- 823c416 fix(backend/kernel): unit tests skip without pyarrow, mypy + black (vikrantpuppala)
- c0219ee fix(backend/kernel): make package importable without the kernel wheel (vikrantpuppala)
- 8958e76 test(e2e): skip use_sea=True parametrized cases when kernel wheel mis… (vikrantpuppala)
- 37fa544 refactor(backend/kernel): address review feedback — mechanical fixes (vikrantpuppala)
- 24e9a5c feat(backend/kernel): introduce dedicated use_kernel flag + substanti… (vikrantpuppala)
The ``databricks.sql.backend.kernel`` package ``__init__`` (new file, +25 lines):

```python
"""Backend that delegates to the Databricks SQL Kernel (Rust) via PyO3.

Routed when ``use_kernel=True`` is passed to ``databricks.sql.connect``.
The module's identity is "delegates to the kernel" — not the wire
protocol the kernel happens to use today (SEA REST). The kernel may
switch its default transport (SEA REST → SEA gRPC → …) without
renaming this module.

This ``__init__`` deliberately does **not** re-export
``KernelDatabricksClient`` from ``.client``. Importing ``.client``
loads the ``databricks_sql_kernel`` PyO3 extension at module-import
time; doing that eagerly here would make ``import
databricks.sql.backend.kernel.type_mapping`` (used by tests / by
``KernelResultSet`` consumers) require the kernel wheel even when
the caller never plans to open a kernel-backed session. Callers
that need the client import it directly:

    from databricks.sql.backend.kernel.client import KernelDatabricksClient

``session.py::_create_backend`` already does this lazy import under
the ``use_kernel=True`` branch.

See ``docs/designs/pysql-kernel-integration.md`` in
``databricks-sql-kernel`` for the full integration design.
"""
```
The kernel auth bridge module (new file, +111 lines):

```python
"""Translate the connector's ``AuthProvider`` into ``databricks_sql_kernel``
``Session`` auth kwargs.

This phase ships PAT only. The kernel-side PyO3 binding accepts
``auth_type='pat'``; OAuth / federation / custom credentials
providers are reserved but not yet wired in either layer. Non-PAT
auth raises ``NotSupportedError`` from this bridge so the failure
surfaces at session-open time with a clear message rather than
deep inside the kernel.

Token extraction goes through ``AuthProvider.add_headers({})``
rather than touching auth-provider-specific attributes, so the
bridge works uniformly for every PAT shape — including
``AccessTokenAuthProvider`` wrapped in ``TokenFederationProvider``
(which ``get_python_sql_connector_auth_provider`` does for every
provider it builds).
"""

from __future__ import annotations

import logging
import re
from typing import Any, Dict, Optional

from databricks.sql.auth.authenticators import AccessTokenAuthProvider, AuthProvider
from databricks.sql.auth.token_federation import TokenFederationProvider
from databricks.sql.exc import NotSupportedError

logger = logging.getLogger(__name__)


_BEARER_PREFIX = "Bearer "

# Defense-in-depth: reject tokens containing ASCII control characters.
# A token with embedded CR/LF/NUL would let a misbehaving HTTP stack
# split or terminate the Authorization header line, opening a header-
# injection sink. Real PATs and federation-exchanged tokens never
# contain these.
_CONTROL_CHAR_RE = re.compile(r"[\x00-\x1f\x7f]")


def _is_pat(auth_provider: AuthProvider) -> bool:
    """Return True iff this provider ultimately wraps an
    ``AccessTokenAuthProvider``.

    ``get_python_sql_connector_auth_provider`` always wraps the
    base provider in a ``TokenFederationProvider``, so an
    ``isinstance`` check against ``AccessTokenAuthProvider`` alone
    never matches in practice. We peek through the federation
    wrapper to find the real type.
    """
    if isinstance(auth_provider, AccessTokenAuthProvider):
        return True
    if isinstance(auth_provider, TokenFederationProvider) and isinstance(
        auth_provider.external_provider, AccessTokenAuthProvider
    ):
        return True
    return False


def _extract_bearer_token(auth_provider: AuthProvider) -> Optional[str]:
    """Pull the current bearer token out of an ``AuthProvider``.

    The connector's ``AuthProvider.add_headers`` mutates a header
    dict and writes the ``Authorization: Bearer <token>`` value.
    Going through that public surface keeps us insulated from
    provider-specific internals.

    Returns ``None`` if the provider did not write an Authorization
    header or wrote a non-Bearer scheme — neither is representable
    in the kernel's PAT auth surface.
    """
    headers: Dict[str, str] = {}
    auth_provider.add_headers(headers)
    auth = headers.get("Authorization")
    if not auth:
        return None
    if not auth.startswith(_BEARER_PREFIX):
        return None
    token = auth[len(_BEARER_PREFIX) :]
    if _CONTROL_CHAR_RE.search(token):
        raise ValueError(
            "Bearer token contains ASCII control characters; refusing to "
            "forward it to the kernel auth bridge."
        )
    return token


def kernel_auth_kwargs(auth_provider: AuthProvider) -> Dict[str, Any]:
    """Build the kwargs passed to ``databricks_sql_kernel.Session(...)``.

    PAT (including ``TokenFederationProvider``-wrapped PAT) routes
    through the kernel's PAT path. Anything else raises
    ``NotSupportedError`` — the kernel binding doesn't accept OAuth
    today, and routing OAuth through PAT would silently break
    token refresh during long-running sessions.
    """
    if _is_pat(auth_provider):
        token = _extract_bearer_token(auth_provider)
        if not token:
            raise ValueError(
                "PAT auth provider did not produce a Bearer Authorization "
                "header; cannot route through the kernel's PAT path"
            )
        return {"auth_type": "pat", "access_token": token}

    raise NotSupportedError(
        f"The kernel backend (use_kernel=True) currently only supports PAT auth, "
        f"but got {type(auth_provider).__name__}. Use the Thrift backend "
        "(default) for OAuth / federation / custom credential providers."
    )
```
[minor] kernel_auth_kwargs raises a bare ValueError when a PAT provider produces no Bearer header (and _extract_bearer_token raises ValueError on tokens containing control characters), while the rest of the kernel-backend error surface routes misuse through PEP 249 exception types (ProgrammingError / NotSupportedError). A user catching DB-API exceptions will miss this case. Fix: raise ProgrammingError instead, or wrap in NotSupportedError, so the auth-bridge error type is consistent with the surrounding API contract.
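One possible shape for the suggested fix, sketched with a local ``ProgrammingError`` stand-in for ``databricks.sql.exc.ProgrammingError`` (the class and ``require_bearer_token`` helper here are illustrative, not the PR's actual code):

```python
class ProgrammingError(Exception):
    """Local stand-in for the PEP 249 databricks.sql.exc.ProgrammingError."""


def require_bearer_token(token):
    # Suggested shape: auth-bridge misuse surfaces as a DB-API
    # exception type, so callers catching PEP 249 errors see it
    # instead of an uncategorized ValueError.
    if not token:
        raise ProgrammingError(
            "PAT auth provider did not produce a Bearer Authorization header"
        )
    return token
```

A caller's generic ``except ProgrammingError:`` handler would then catch this failure along with the rest of the connector's misuse errors.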