Spark: Add session-level read split configs#16185

Open
liucao-dd wants to merge 1 commit into apache:main from liucao-dd:spark/split-read-conf-session-config
Conversation

@liucao-dd liucao-dd commented May 1, 2026

Spark Split Planning Session Configs

Summary

This change adds Spark session-level configuration for Iceberg split planning:

  • spark.sql.iceberg.split.target-size
  • spark.sql.iceberg.split.planning-lookback
  • spark.sql.iceberg.split.open-file-cost
  • spark.sql.iceberg.split.adaptive-size.enabled

The goal is to let Spark SQL reads and row-level operations tune split planning without requiring
table property changes or DataFrame read options.
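As a sketch of the intended usage (the table name `db.events` and the values are illustrative, and the config names are as proposed in this PR), the settings would be applied per session:

```sql
-- Tune Iceberg split planning for this session only (byte values are illustrative)
SET spark.sql.iceberg.split.target-size = 268435456;       -- 256 MB target splits
SET spark.sql.iceberg.split.planning-lookback = 10;
SET spark.sql.iceberg.split.open-file-cost = 4194304;      -- 4 MB per-file cost
SET spark.sql.iceberg.split.adaptive-size.enabled = false;

SELECT count(*) FROM db.events;  -- planned with the session-level settings above
```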

Naming

The config names intentionally mirror the existing Iceberg table properties:

  • read.split.target-size
  • read.split.planning-lookback
  • read.split.open-file-cost
  • read.split.adaptive-size.enabled

If reviewers prefer names closer to existing read options, such as spark.sql.iceberg.split-size, I am open to adjusting.

Motivation

Iceberg already supports per-read split planning through DataFrame read options and table
properties. Those are not always practical for SQL-first workloads:

  • SQL statements such as MERGE INTO, UPDATE, and DELETE do not expose DataFrame read options.
  • Changing table properties affects all readers, which is too broad when different jobs need
    different split-planning behavior.
  • Session configs let one Spark application or notebook tune read parallelism without changing code
    or table metadata.

spark.sql.iceberg.split.target-size and spark.sql.iceberg.split.adaptive-size.enabled are the main knobs that applications would commonly tune at the session level; the lookback and open-file-cost configs are included for consistency with the corresponding table properties.
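As a concrete sketch of the SQL-first case this enables (table names are hypothetical), a row-level operation could be tuned without per-read options or table property changes:

```sql
-- Smaller splits to increase read parallelism for a heavy MERGE, scoped to this session
SET spark.sql.iceberg.split.target-size = 67108864;  -- 64 MB

MERGE INTO db.target t
USING db.updates u
ON t.id = u.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```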

Backports

This PR updates Spark 4.1 first. Backports for Spark 4.0 and 3.5 will follow in separate PRs to keep this change reviewable.

Test Layout

The new TestSparkReadConf file is intended as the read-side counterpart to TestSparkWriteConf.
It covers precedence and accessor behavior that is easiest to validate directly against
SparkReadConf, especially the *Option() accessors used by scan planning.
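Assuming the usual Iceberg config precedence (per-read option over session config over table property; this is an assumption about the intended behavior, not a statement of the final implementation), the interaction the tests cover could look like:

```sql
-- Table property sets a 128 MB default for all readers
ALTER TABLE db.events SET TBLPROPERTIES ('read.split.target-size' = '134217728');

-- Session config overrides the table property for this session only
SET spark.sql.iceberg.split.target-size = 268435456;  -- 256 MB

-- With no per-read option supplied, the session config is expected to win;
-- a DataFrame read option such as option("split-size", ...) would override both.
SELECT * FROM db.events;
```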

Allow SQL paths such as MERGE INTO to tune split planning without per-read options, especially split size and adaptive sizing. The config names mirror the read.split table properties; older Spark version backports will follow separately.
