feat: built-in databricks unity catalog sync #802

parisni wants to merge 11 commits into apache:main
Conversation
Not sure if Unity Catalog refreshes the table metadata by itself; will have to investigate. For now, skip the table refresh.

- `ALTER COLUMN` statements are meant for Delta, not UC, so they try to upgrade the Delta logs
- `REFRESH TABLE` does not refresh the columns
- `CREATE OR REPLACE` is also meant for Delta, not UC, so it tries to upgrade the Delta logs
- drop table + create new table does the work; UC also seems to keep track of the table history, so we don't lose the associated metadata (top users, related assets, creation date, ...)
```java
DatabricksUnityCatalogSyncClient(
    ExternalCatalogConfig catalogConfig,
    String tableFormat,
```
SQL injection risk: `fullName` is derived from the user-provided `tableIdentifier` and interpolated directly into the SQL string. While `escapeSqlString` is used for the location, the table name itself is not sanitized.
If `hierarchicalId` contains `; DROP TABLE ...`, it would be injected. Consider validating that catalog/schema/table names match `[a-zA-Z0-9_]+` before interpolating them into DDL statements.
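The suggested whitelist check could look roughly like the following sketch (class and method names are hypothetical, not the PR's actual code):

```java
import java.util.regex.Pattern;

/** Hypothetical helper: whitelist-validate UC identifiers before DDL interpolation. */
public class DdlIdentifiers {
  private static final Pattern SAFE = Pattern.compile("[a-zA-Z0-9_]+");

  public static boolean isValid(String part) {
    return part != null && SAFE.matcher(part).matches();
  }

  /** Validates each part of catalog.schema.table, then joins them with backticks. */
  public static String safeFullName(String catalog, String schema, String table) {
    for (String part : new String[] {catalog, schema, table}) {
      if (!isValid(part)) {
        throw new IllegalArgumentException("Illegal identifier: " + part);
      }
    }
    return String.format("`%s`.`%s`.`%s`", catalog, schema, table);
  }

  public static void main(String[] args) {
    // → `main`.`bronze`.`events`
    System.out.println(safeFullName("main", "bronze", "events"));
  }
}
```

Rejecting anything outside `[a-zA-Z0-9_]+` is stricter than what UC allows for identifiers, but for generated DDL that trade-off is usually acceptable.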
```java
if (table == null) {
  return null;
}
return table.getStorageLocation();
```
`MSCK REPAIR TABLE ... SYNC METADATA` is a Hive-specific command. Does Databricks UC's SQL Warehouse support this syntax? The standard Databricks approach for refreshing external table metadata is `ALTER TABLE <table> SET TBLPROPERTIES` or reading from the Delta log directly.
If this doesn't actually work on Databricks, schema drift would silently be ignored. Has this been tested against a real SQL Warehouse?
```java
String catalog = hierarchical.getCatalogName();
if (StringUtils.isBlank(catalog)) {
  throw new CatalogSyncException(
      "Databricks UC requires a catalog name (expected catalog.schema.table)");
```
`createOrReplaceTable` drops then recreates the table. This is not atomic: if `createTable` fails after `dropTable` succeeds, the table is gone. Consider using `CREATE OR REPLACE TABLE` DDL instead (if supported for external tables), or at minimum catch the create failure and log a clear error that the table was dropped but not recreated.
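The fallback suggested above could be sketched like this (the `TableOps` interface is a stand-in for the PR's real client, not its actual API):

```java
/**
 * Hedged sketch (hypothetical names): make the non-atomic drop-then-create
 * failure mode explicit instead of silently losing the table.
 */
public class ReplaceTableSketch {

  /** Minimal stand-in for the catalog operations used by the real client. */
  public interface TableOps {
    void dropTable(String fullName);
    void createTable(String fullName);
  }

  public static void createOrReplace(TableOps ops, String fullName) {
    ops.dropTable(fullName);
    try {
      ops.createTable(fullName);
    } catch (RuntimeException e) {
      // At this point the drop already succeeded: the table is gone.
      // Surface that clearly so operators know a manual re-create is needed.
      throw new IllegalStateException(
          "Table " + fullName + " was dropped but re-creation failed; it must be re-created manually",
          e);
    }
  }
}
```

This does not make the operation atomic, but it turns a silent data-loss scenario into an actionable error.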
```java
} catch (Exception e) {
  throw new CatalogSyncException("Failed to get schema: " + fullName, e);
  }
}
```
The `init` method can create up to 3 separate `WorkspaceClient` instances (lines 153, 158, 164) if `statementExecution`, `tablesApi`, and `schemasApi` are all null. Create the `WorkspaceClient` once at the top:

```java
if (this.workspaceClient == null) {
  this.workspaceClient = new WorkspaceClient(buildConfig(databricksConfig));
}
if (this.statementExecution == null) {
  this.statementExecution = workspaceClient.statementExecution();
}
// ...
```
```java
private ExternalCatalogConfig catalogConfig;
private DatabricksUnityCatalogConfig databricksConfig;
private Configuration hadoopConf;
```
All fields (`catalogConfig`, `databricksConfig`, `hadoopConf`, `tableFormat`, `workspaceClient`, etc.) are mutable and set via `init()`. This pattern is fragile: calling any method before `init()` would throw a `NullPointerException`.
Consider making these `final` and removing the no-arg constructor, or at least adding a null-check guard in methods that use these fields. The ServiceLoader no-arg constructor + `init()` pattern should be documented with a warning.
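The null-check guard mentioned above could be as simple as the following sketch (names are hypothetical stand-ins for the PR's classes):

```java
/**
 * Hedged sketch (hypothetical names): fail fast with a clear message when a
 * method is called before init(), instead of an opaque NullPointerException.
 */
public class InitGuardSketch {
  // Stand-in for the SDK client that init() is responsible for creating.
  private Object workspaceClient;

  public void init(Object client) {
    this.workspaceClient = client;
  }

  private void ensureInitialized() {
    if (workspaceClient == null) {
      throw new IllegalStateException(
          "Client used before init() was called; the ServiceLoader no-arg "
              + "constructor requires init() before any other method");
    }
  }

  public String describeClient() {
    ensureInitialized(); // guard every public method that relies on init() state
    return workspaceClient.toString();
  }
}
```

An `IllegalStateException` with an explicit message is much easier to diagnose than an NPE deep inside a sync call.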
```java
props.get(CLIENT_SECRET),
      props.get(TOKEN));
  }
}
```
The `token` field is defined and parsed, but the docs say PAT auth is "intentionally not wired" and "not supported yet". If it's not supported, don't add the field; it creates confusion. Remove the `TOKEN` constant and `token` field until PAT auth is actually implemented.
```xml
<dependencies>
  <dependency>
    <groupId>org.apache.xtable</groupId>
```
Pin the Databricks SDK version to a property in the parent pom.xml instead of hardcoding 0.85.0 here. Follow the pattern used for other dependency versions in the project.
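A minimal sketch of the suggested pattern, assuming the parent `pom.xml` manages versions via `<properties>` (the property name `databricks.sdk.version` and the SDK coordinates are assumptions, not taken from this PR):

```xml
<!-- parent pom.xml -->
<properties>
  <databricks.sdk.version>0.85.0</databricks.sdk.version>
</properties>
```

```xml
<!-- xtable-databricks/pom.xml -->
<dependency>
  <groupId>com.databricks</groupId>
  <artifactId>databricks-sdk-java</artifactId>
  <version>${databricks.sdk.version}</version>
</dependency>
```

Keeping the version in one place makes SDK upgrades a single-line change across modules.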
```java
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
```
The tests have a lot of boilerplate: every test method repeats the same 15-line client construction block. Extract a `createClient(String tableFormat)` helper method to reduce duplication and improve readability.
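The suggested helper could look roughly like this sketch (the config and client types below are simplified stand-ins, not the real XTable classes):

```java
/**
 * Hedged sketch (type names are stand-ins, not the real XTable classes):
 * a shared factory so each test builds its client in one line.
 */
public class ClientFactorySketch {
  /** Stand-in for the external catalog config the real tests construct repeatedly. */
  public static class CatalogConfig {
    public final String catalogId;
    public CatalogConfig(String catalogId) { this.catalogId = catalogId; }
  }

  /** Stand-in for the UC sync client under test. */
  public static class SyncClient {
    public final CatalogConfig config;
    public final String tableFormat;
    public SyncClient(CatalogConfig config, String tableFormat) {
      this.config = config;
      this.tableFormat = tableFormat;
    }
  }

  private static final CatalogConfig SHARED_CONFIG = new CatalogConfig("test-catalog");

  /** The only per-test variation is the table format, so take just that. */
  public static SyncClient createClient(String tableFormat) {
    return new SyncClient(SHARED_CONFIG, tableFormat);
  }
}
```

With this in place, each test body starts with `createClient("ICEBERG")` (or `"DELTA"`, `"HUDI"`) instead of fifteen lines of setup.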
```java
.thenReturn(
    new StatementResponse()
        .setStatus(new StatementStatus().setState(StatementState.SUCCEEDED)));
```
Missing test: `refreshTable` with a matching schema (the no-op path). You test the schema-mismatch path, but not the case where schemas match and no `MSCK REPAIR` is issued. Also missing: `refreshTable` with a null `catalogTable`, and `refreshTable` with a null/empty schema.
```js
'hms',
'glue-catalog',
'unity-catalog',
'databricks-unity-catalog',
```
The sidebar reference was changed from `unity-catalog` to `databricks-unity-catalog`, but the docs file is still named `unity-catalog.md` (not renamed). This will break the sidebar link. Either rename the md file or keep the sidebar reference as `unity-catalog`.
What is the purpose of the pull request
This pull request adds Databricks Unity Catalog (DATABRICKS_UC) catalog sync support to XTable, with a new xtable-databricks module, UC config wiring, a UC sync client implementation, and documentation updates.
Brief change log
Verify this pull request
This change added tests and can be verified as follows: