Spark: Add session-level read split configs #16185
Open
liucao-dd wants to merge 1 commit into apache:main from
Conversation
Allow SQL paths such as MERGE INTO to tune split planning without per-read options, especially split size and adaptive sizing. The config names mirror the read.split table properties; older Spark version backports will follow separately.
Spark Split Planning Session Configs
Summary
This change adds Spark session-level configuration for Iceberg split planning:
- `spark.sql.iceberg.split.target-size`
- `spark.sql.iceberg.split.planning-lookback`
- `spark.sql.iceberg.split.open-file-cost`
- `spark.sql.iceberg.split.adaptive-size.enabled`

The goal is to let Spark SQL reads and row-level operations tune split planning without requiring table property changes or DataFrame read options.
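For illustration, a session could set these before running SQL reads. The values below are example values, not Iceberg defaults:

```sql
-- Example values only, not Iceberg defaults.
SET spark.sql.iceberg.split.target-size=134217728;        -- 128 MiB target split size
SET spark.sql.iceberg.split.planning-lookback=10;         -- bin-packing lookback
SET spark.sql.iceberg.split.open-file-cost=4194304;       -- 4 MiB cost charged per open file
SET spark.sql.iceberg.split.adaptive-size.enabled=false;  -- turn off adaptive split sizing
```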
Naming
The config names intentionally mirror the existing Iceberg table properties:
- `read.split.target-size`
- `read.split.planning-lookback`
- `read.split.open-file-cost`
- `read.split.adaptive-size.enabled`

If reviewers prefer names closer to existing read options, such as `spark.sql.iceberg.split-size`, I am open to adjusting.

Motivation
Iceberg already supports per-read split planning through DataFrame read options and table
properties. Those are not always practical for SQL-first workloads:
- `MERGE INTO`, `UPDATE`, and `DELETE` do not expose DataFrame read options.
- Different jobs in the same deployment can need different split-planning behavior.
- Tuning should not require changes to application code or table metadata.

`spark.sql.iceberg.split.target-size` and `spark.sql.iceberg.split.adaptive-size.enabled` are the main knobs that applications would commonly tune at session level. The lookback and open-file-cost configs are there for consistency.

Backports
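To make the SQL-first case concrete, here is a sketch with hypothetical table names; because `MERGE INTO` cannot carry DataFrame read options, the session config is the only per-statement tuning point for the source table scan:

```sql
-- warehouse.db.target and warehouse.db.updates are hypothetical tables.
SET spark.sql.iceberg.split.target-size=268435456;  -- larger splits for this big merge

MERGE INTO warehouse.db.target t
USING warehouse.db.updates s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```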
This PR updates Spark 4.1 first. Backports for Spark 4.0 and 3.5 will follow in separate PRs to keep this change reviewable.
Test Layout
The new `TestSparkReadConf` file is intended as the read-side counterpart to `TestSparkWriteConf`. It covers precedence and accessor behavior that is easiest to validate directly against `SparkReadConf`, especially the `*Option()` accessors used by scan planning.
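One behavior `TestSparkReadConf` can pin down is precedence. Assuming the usual Iceberg conf resolution order (read option, then session config, then table property, then default), a SQL-level sketch with a hypothetical table would look like:

```sql
-- warehouse.db.events is a hypothetical table.
-- The table property asks for 512 MiB splits...
ALTER TABLE warehouse.db.events
SET TBLPROPERTIES ('read.split.target-size' = '536870912');

-- ...but the session config (128 MiB) should take effect for this read,
-- since session configs sit above table properties in the resolution order.
SET spark.sql.iceberg.split.target-size=134217728;
SELECT count(*) FROM warehouse.db.events;
```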