Skip to content

Core: Add partition tuple to ManifestInfo#16213

Open
nastra wants to merge 1 commit intoapache:mainfrom
nastra:v4-add-partition-tuple
Open

Core: Add partition tuple to ManifestInfo#16213
nastra wants to merge 1 commit intoapache:mainfrom
nastra:v4-add-partition-tuple

Conversation

@nastra
Copy link
Copy Markdown
Contributor

@nastra nastra commented May 5, 2026

Adds a partitions field to ManifestInfo and its implementation ManifestInfoStruct. This mirrors the existing partitions field on ManifestFile

@nastra nastra requested a review from amogh-jahagirdar May 5, 2026 11:02
@github-actions github-actions Bot added the core label May 5, 2026
@nastra nastra force-pushed the v4-add-partition-tuple branch from d3241e3 to 5468398 Compare May 5, 2026 11:32
@nastra nastra closed this May 5, 2026
@nastra nastra reopened this May 5, 2026
Copy link
Copy Markdown
Contributor

@anoopj anoopj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Will there be a followup PR for adding this to TrackedFile?

}

@Override
@SuppressWarnings("unchecked")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: just add it to case 9?

@nastra nastra force-pushed the v4-add-partition-tuple branch from 5468398 to b73b211 Compare May 6, 2026 10:35
@nastra
Copy link
Copy Markdown
Contributor Author

nastra commented May 6, 2026

LGTM. Will there be a followup PR for adding this to TrackedFile?

@anoopj yes

Comment on lines +58 to +74
Types.StructType PARTITION_SUMMARY_TYPE =
Types.StructType.of(
Types.NestedField.required(
509,
"contains_null",
Types.BooleanType.get(),
"True if any file has a null partition value"),
Types.NestedField.optional(
518,
"contains_nan",
Types.BooleanType.get(),
"True if any file has a nan partition value"),
Types.NestedField.optional(
510, "lower_bound", Types.BinaryType.get(), "Partition lower bound for all files"),
Types.NestedField.optional(
511, "upper_bound", Types.BinaryType.get(), "Partition upper bound for all files"));
Types.NestedField PARTITION_SUMMARIES =
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if we keep partition tuples for data file entries, I don't think we neccessarily need to keep partition field summaries for manifests as well, we can still use stats at this level in the tree. In fact, I largely think we shouldn't since it adds more complexity to the structure.

Ultimately, at the manifest level we want to preserve pruning but I think we can do that with stats, so long as we ensure that stats on source columns are collected and we have the rules to aggregate them correct. That was independent of the decision to keep the tuple or not. Let me know if I'm missing something @nastra @anoopj @rdblue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants