Contributor

@tub tub commented Feb 2, 2026

Purpose

Hadoop 3.4.x switched hadoop-aws from AWS SDK v1 to v2 (HADOOP-18073).
This updates the S3 multipart upload implementation to use the new SDK types:

  • CompletedPart instead of PartETag
  • CompleteMultipartUploadResponse instead of CompleteMultipartUploadResult
  • UploadPartRequest/UploadPartResponse for part uploads

This is a step towards implementing conditional writes for AWS S3, Azure and others. I'm preparing a follow-up PR for that work, to be opened once this one is merged.

Also updates NOTICE files for the new Hadoop 3.4.2 dependencies.

Linked issue: #6563

Note

Although many of the changes were generated using Claude Code, they've been manually verified locally and against an actual S3 bucket.

Tests

Existing S3 and Azure tests cover the changes.

API and Format

No

Documentation

No

@tub tub changed the title from "Update S3 multipart upload for Hadoop 3.4+ (AWS SDK v2)" to "[filesystems] Update S3 multipart upload for Hadoop 3.4+ (AWS SDK v2)" on Feb 2, 2026
@tub tub force-pushed the hadoop-3.4-upgrade branch 2 times, most recently from ac02b55 to a6de06a, on February 3, 2026 11:52
Hadoop 3.4.x switched hadoop-aws from AWS SDK v1 to v2 (HADOOP-18073).
This updates the S3 multipart upload implementation to use the new SDK
types:
- CompletedPart instead of PartETag
- CompleteMultipartUploadResponse instead of CompleteMultipartUploadResult
- UploadPartRequest/UploadPartResponse for part uploads

Also updates NOTICE files for the new Hadoop 3.4.2 dependencies.

Co-Authored-By: Claude <noreply@anthropic.com>
@tub tub force-pushed the hadoop-3.4-upgrade branch from a6de06a to 641b724 on February 3, 2026 14:19
Hadoop 3.4+ uses the pjfanning fork of jersey-json
(com.github.pjfanning:jersey-json) instead of the original
(com.sun.jersey:jersey-json). This fork contains properties files
with GPL license references that fail Apache license checks.

Add exclusions for the pjfanning jersey-json fork to:
- paimon-hadoop-shaded (fixes OSS, OBS, COSN modules)
- paimon-gs-impl (uses direct hadoop-common dependency)

Co-Authored-By: Claude <noreply@anthropic.com>
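
For readers unfamiliar with the mechanism, the jersey-json exclusion described in the commit above might look roughly like the following in a Maven POM. This is a minimal sketch, assuming the exclusion is declared on a hadoop-common dependency (as the commit message suggests for paimon-gs-impl). Only the com.github.pjfanning:jersey-json coordinates come from the commit message; the surrounding dependency block and the `${hadoop.version}` property are illustrative assumptions and are not copied from this PR's diff.

```xml
<!-- Illustrative only: the enclosing dependency and the version property are assumptions. -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>${hadoop.version}</version>
    <exclusions>
        <!-- Keep the pjfanning fork of jersey-json, whose properties files
             reference the GPL, out of the dependency tree. -->
        <exclusion>
            <groupId>com.github.pjfanning</groupId>
            <artifactId>jersey-json</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```

One way to check the result, under the same assumptions, is to run `mvn dependency:tree` on the affected module and confirm that jersey-json no longer appears.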
@tub tub marked this pull request as ready for review February 3, 2026 16:28
Contributor Author

tub commented Feb 3, 2026

@JingsongLi I'm prepping a PR based on this one that will allow users to have multiple processes writing to the same table without having to use an external catalog/metastore for S3 and Azure. I'd really appreciate it if you could help me find a reviewer 🙇
