Spark: Support aggregate pushdown for identity partition column GROUP BY by hemanthboyina · Pull Request #16176 · apache/iceberg

hemanthboyina · 2026-04-30T18:17:55Z

This PR enables aggregate pushdown for queries with GROUP BY on identity partition columns. Currently, Iceberg supports pushing down aggregates (COUNT, MIN, MAX) for queries without GROUP BY, computing results from file metadata instead of reading data files. However, when a query includes GROUP BY, the pushdown is disabled even when the GROUP BY columns are identity partition fields.
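To make the idea concrete, here is a minimal sketch of answering `COUNT(*)`, `MIN`, and `MAX` per identity-partition group from file metadata alone. Every name in it (`GroupByPushdownSketch`, `FileStats`, `Agg`, `aggregateByPartition`) is hypothetical and not taken from the PR or from Iceberg's API. It also assumes each file has complete, exact bounds and no row-level deletes, which real metadata does not always guarantee.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration; none of these are Iceberg classes.
final class GroupByPushdownSketch {

  /** Per-file metadata: partition tuple, row count, and bounds of one long column. */
  static final class FileStats {
    final List<Object> partition;
    final long recordCount;
    final long lowerBound;
    final long upperBound;

    FileStats(List<Object> partition, long recordCount, long lowerBound, long upperBound) {
      this.partition = partition;
      this.recordCount = recordCount;
      this.lowerBound = lowerBound;
      this.upperBound = upperBound;
    }
  }

  /** Running COUNT(*), MIN, and MAX for one group. */
  static final class Agg {
    long count = 0;
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;

    void update(FileStats file) {
      count += file.recordCount;
      min = Math.min(min, file.lowerBound);
      max = Math.max(max, file.upperBound);
    }
  }

  /** Groups files by the partition positions named in GROUP BY and folds their stats. */
  static Map<List<Object>, Agg> aggregateByPartition(
      List<FileStats> files, int[] groupByPositions) {
    Map<List<Object>, Agg> result = new LinkedHashMap<>();
    for (FileStats file : files) {
      // Project the file's partition tuple onto the GROUP BY columns.
      List<Object> key = new ArrayList<>(groupByPositions.length);
      for (int pos : groupByPositions) {
        key.add(file.partition.get(pos));
      }
      result.computeIfAbsent(key, k -> new Agg()).update(file);
    }
    return result;
  }
}
```

Because an identity partition value is the same for every row in a file, each file contributes to exactly one group, so no data file needs to be opened.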

singhpk234 · 2026-05-02T02:26:57Z

+    Map<List<Object>, AggregateEvaluator> evaluatorsByPartition =
+        groupFilesByPartition(spec, groupByPositions, boundAggregates);


I am not confident this is correct. Also, we are only checking the most recent partitioning; a table can contain files written under many different partition specs that evolved across snapshots.

Thanks for the review, @singhpk234. You raised a valid point: the current implementation only considers the current partition spec and bails out for files from different specs. I will look into handling spec evolution properly and update the PR.

Handled the partition spec evolution changes. Can you please review?
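The thread does not show how the PR handles spec evolution. As one possible approach, the sketch below checks that the GROUP BY source column is an identity partition field in every spec and records its position per spec, bailing out otherwise. The names `SpecEvolutionSketch`, `Field`, and `identityPositions` are hypothetical and are not Iceberg APIs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of partition specs; not Iceberg's PartitionSpec API.
final class SpecEvolutionSketch {

  /** One partition field: the source column id and the transform name. */
  static final class Field {
    final int sourceId;
    final String transform;

    Field(int sourceId, String transform) {
      this.sourceId = sourceId;
      this.transform = transform;
    }
  }

  /**
   * Returns specId -> position of the identity field for sourceId,
   * or empty if any spec lacks such a field (pushdown is then unsafe).
   */
  static Optional<Map<Integer, Integer>> identityPositions(
      Map<Integer, List<Field>> specsById, int sourceId) {
    Map<Integer, Integer> positions = new HashMap<>();
    for (Map.Entry<Integer, List<Field>> entry : specsById.entrySet()) {
      List<Field> fields = entry.getValue();
      int found = -1;
      for (int i = 0; i < fields.size(); i++) {
        Field field = fields.get(i);
        if (field.sourceId == sourceId && "identity".equals(field.transform)) {
          found = i;
          break;
        }
      }
      if (found < 0) {
        return Optional.empty(); // this spec cannot supply the group key
      }
      positions.put(entry.getKey(), found);
    }
    return Optional.of(positions);
  }
}
```

With per-spec positions, files written under older specs can still be mapped to the same group key, while a spec that uses a non-identity transform (for example a bucket) on the column forces a fallback to a regular scan.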

Spark: Support aggregate pushdown for identity partition column GROUP BY

fde0869

github-actions Bot added the spark label Apr 30, 2026

fix complexity and checkstyle

751c3b1

singhpk234 requested a review from huaxingao May 2, 2026 02:24

singhpk234 reviewed May 2, 2026


fix spec evolution changes

8e38932


hemanthboyina wants to merge 3 commits into apache:main from hemanthboyina:groupby_aggregate

2 participants
