feat: add variable substitution support for check definitions #1078

Merged
mwojtyczka merged 31 commits into databrickslabs:main from fedeflowers:feature/parametrization_rules
Apr 22, 2026

Conversation

Contributor

@fedeflowers fedeflowers commented Mar 15, 2026

Changes

Add variable substitution support for check definitions. This lets users define reusable check templates with {{ placeholder }} syntax and resolve them before execution by passing a variables dictionary at load time.
New functionality:

  • resolve_variables() in utils.py — recursively replaces {{ key }} placeholders in all string values of check definitions using a highly efficient single-pass substitution.
  • New optional variables parameter on load_checks() and load_checks_from_local_file(). This delegates the templating to the data-loading layer so the engine always executes clean and self-contained rules.
  • Default variables in ExtraParams: the DQEngine now accepts default variables via ExtraParams. These are applied to all checks loaded by that engine instance unless overridden in a specific load_checks call.
  • Supports scalar variable types (str, int, float, bool, Decimal, datetime.date, datetime.datetime, datetime.time).
  • Logs warnings for any unresolved placeholders after substitution is complete.
  • Rejects collection types (list, dict, set, tuple) and None with an explicit InvalidParameterError.

Example usage:

Using load_checks from a file or table:
# checks.yml
# - check:
#     function: is_not_null
#     arguments:
#       column: "{{ col }}"
# Resolving templates at load time:
checks = engine.load_checks_from_local_file('checks.yml', variables={"col": "email"})
# Engine executes the resolved rules
engine.apply_checks_by_metadata(df, checks)
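
To illustrate the default-variable behaviour listed under "New functionality", here is a minimal sketch of how engine-level defaults and per-call variables might be combined. The helper `merge_variables` is hypothetical and not part of DQX. It assumes that overrides apply key by key, which the description does not spell out.

```python
from typing import Any, Optional


def merge_variables(
    defaults: Optional[dict[str, Any]],
    overrides: Optional[dict[str, Any]],
) -> dict[str, Any]:
    """Combine engine-level defaults with per-call variables (hypothetical helper)."""
    merged = dict(defaults or {})   # copy so the caller's defaults are not mutated
    merged.update(overrides or {})  # assumption: per-call values win on key collisions
    return merged


engine_defaults = {"col": "email", "threshold": 10}  # e.g. supplied once via ExtraParams
per_call = {"col": "phone"}                          # e.g. passed to a single load call
effective = merge_variables(engine_defaults, per_call)
print(effective)  # {'col': 'phone', 'threshold': 10}
```

Under this assumption, a load call only needs to pass the variables that differ from the engine defaults.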

If building checks programmatically using raw dictionaries:

from databricks.labs.dqx.utils import resolve_variables
raw_checks = [{"check": {"function": "is_not_null", "arguments": {"column": "{{ col }}"}}}]
resolved_checks = resolve_variables(raw_checks, variables={"col": "email"})
engine.apply_checks_by_metadata(df, resolved_checks)
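
For readers who want a feel for the mechanics, the behaviour listed under "New functionality" (recursive replacement in string values, scalar-only variables, warnings for unresolved placeholders) can be approximated in plain Python. This is an illustrative sketch, not the code merged in this PR: the function name `substitute_placeholders` is made up, `ValueError` stands in for DQX's InvalidParameterError, and values are simply converted with `str()`.

```python
import datetime
import logging
import re
from decimal import Decimal
from typing import Any

logger = logging.getLogger(__name__)

# Matches "{{ key }}" with optional whitespace inside the braces.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

# Scalar types named in the PR description.
_SCALARS = (str, int, float, bool, Decimal, datetime.date, datetime.datetime, datetime.time)


def substitute_placeholders(checks: list[dict], variables: dict[str, Any]) -> list[dict]:
    """Return a copy of `checks` with {{ key }} placeholders replaced (sketch only)."""
    for name, value in variables.items():
        if not isinstance(value, _SCALARS):
            # Collections and None land here; ValueError is a stand-in exception.
            raise ValueError(f"Variable '{name}' has unsupported type {type(value).__name__}")

    unresolved: set[str] = set()

    def lookup(match: re.Match) -> str:
        key = match.group(1)
        if key in variables:
            return str(variables[key])
        unresolved.add(key)
        return match.group(0)  # leave unknown placeholders untouched

    def walk(node: Any) -> Any:
        # Only string values are rewritten; dict keys and non-strings pass through.
        if isinstance(node, str):
            return _PLACEHOLDER.sub(lookup, node)  # one regex pass per string
        if isinstance(node, dict):
            return {key: walk(value) for key, value in node.items()}
        if isinstance(node, list):
            return [walk(item) for item in node]
        return node

    resolved = [walk(check) for check in checks]
    if unresolved:
        logger.warning("Unresolved placeholders: %s", sorted(unresolved))
    return resolved


raw = [{"check": {"function": "is_not_null", "arguments": {"column": "{{ col }}"}}}]
resolved = substitute_placeholders(raw, {"col": "email"})
print(resolved[0]["check"]["arguments"]["column"])  # email
```

Because `walk` builds new containers instead of editing in place, the input list is left unchanged, which matches the immutability that the unit tests in this PR are described as covering.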

Linked issues

Resolves #967

Tests

  • manually tested
  • added unit tests
  • added integration tests
  • added end-to-end tests
  • added performance tests

@fedeflowers fedeflowers requested a review from a team as a code owner March 15, 2026 19:13
@fedeflowers fedeflowers requested review from pratikk-databricks and removed request for a team March 15, 2026 19:13
@ghanse ghanse self-requested a review March 16, 2026 17:05
@ghanse ghanse added the under-review label Mar 16, 2026
@ghanse ghanse requested a review from Copilot March 18, 2026 19:42
Contributor

Copilot AI left a comment

Pull request overview

Adds runtime variable substitution ({{ key }}) for metadata-defined checks so users can template rule definitions and resolve them by passing a variables dict at execution/validation time.

Changes:

  • Introduces apply_variables() in utils.py to recursively substitute placeholders in all string fields, validate variable value types, and warn on unresolved placeholders.
  • Extends DQEngine/DQEngineCore metadata entrypoints (apply_checks_by_metadata*, validate_checks) with an optional variables parameter and applies substitution before validation/deserialization.
  • Adds unit + integration coverage for substitution behavior and engine plumbing.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/databricks/labs/dqx/utils.py | Implements recursive placeholder substitution + scalar-only variable validation + unresolved-placeholder warnings. |
| src/databricks/labs/dqx/engine.py | Wires variables through metadata APIs and applies substitution before validation/deserialization. |
| tests/unit/test_utils.py | Adds focused unit tests for substitution correctness, immutability, warnings, and type validation. |
| tests/integration/test_apply_checks.py | Adds integration tests proving substitution works through metadata apply/split + validation paths. |
| tests/integration/test_apply_checks_and_save_in_table.py | Adds integration test proving substitution works when saving results to a table. |

Comment thread src/databricks/labs/dqx/engine.py
Comment thread src/databricks/labs/dqx/engine.py
Comment thread src/databricks/labs/dqx/engine.py Outdated
Comment thread src/databricks/labs/dqx/engine.py
Comment thread src/databricks/labs/dqx/engine.py Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
…ks_by_metadata_and_split, validate_checks to be consistent with the downstream implementation

github-actions Bot commented Mar 18, 2026

✅ 641/641 passed, 36 skipped, 6h43m11s total

Running from acceptance #4239

Collaborator

@ghanse ghanse left a comment

I think substitution should happen when the user loads checks. We should allow the user to pass variables to load_checks and delegate substitution to that method instead of adding to the apply_checks methods.


codecov Bot commented Mar 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.94%. Comparing base (54b9626) to head (877e748).
⚠️ Report is 6 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1078      +/-   ##
==========================================
- Coverage   92.02%   91.94%   -0.08%     
==========================================
  Files          98       98              
  Lines        9093     9140      +47     
==========================================
+ Hits         8368     8404      +36     
- Misses        725      736      +11     

Code review feedback implementation

Co-authored-by: Marcin Wojtyczka <marcin.wojtyczka@databricks.com>
Contributor

@mwojtyczka mwojtyczka left a comment

This must work for the save_checks method too. Currently, saving errors out. We need to resolve variables during saving. I'm updating accordingly.

Contributor

@mwojtyczka mwojtyczka left a comment

LGTM

@mwojtyczka mwojtyczka added the Approved to Merge label and removed the under-review label Apr 14, 2026
Collaborator

@ghanse ghanse left a comment

LGTM

@mwojtyczka mwojtyczka merged commit 80b029a into databrickslabs:main Apr 22, 2026
37 checks passed

Labels

Approved to Merge When PR is reviewed and approved. To be merged once all tests pass

Development

Successfully merging this pull request may close these issues.

[FEATURE]: Parametrization of rules

4 participants