Improve S3 API compatibility #16
Merged
This branch attempts to bring x2s3 into closer alignment with AWS S3 API standards and conventions, to better support a diverse set of clients, including Java clients such as N5 Viewer.
The main change is that the proxy now checks the `Accept` header and returns HTML or XML according to the client's preference. I then asked Claude to write tests against an AWS S3 bucket and compare the responses with those for the same bucket proxied by x2s3. This identified many small differences, which have been either fixed or documented below.
Finally, I added Java-based integration tests that run inside a Pixi environment.
1. Use `Accept` header to determine HTML vs XML responses

When the UI is enabled, the proxy previously always returned HTML for the bucket index (`GET /`) and directory listings (`GET /bucket/prefix/`). This broke S3 API clients that expected XML.

Now the proxy checks the `Accept` header: HTML is returned only when `text/html` appears before `application/xml`. Otherwise, proper S3-compatible XML is returned (`ListAllMyBucketsResult` for the index, `ListObjectsV2` for directory listings).

2. Accept `max-keys=0` instead of returning 400

AWS S3 accepts `max-keys=0` and returns an empty listing with `IsTruncated=false`. The proxy was rejecting it with 400 due to Pydantic's `ge=1` constraint.

Fixed by changing the constraint to `ge=0`. Also added an early return in the file client's `walk_path` so it returns an empty listing rather than incorrectly reporting `IsTruncated=true`.

3. Accept `max-keys` values above 1000 instead of returning 400

AWS S3 accepts `max-keys=1001` (or higher) and simply returns up to 1000 results, echoing the requested value back in `MaxKeys`. The proxy was rejecting any value over 1000 with 400 due to Pydantic's `le=1000` constraint.

Fixed by removing the upper-bound constraint. The aioboto client passes the value through to upstream S3, which applies its own 1000-result limit. The file client returns as many items as requested (no artificial cap).
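The position-based `Accept` rule from item 1 can be sketched as below. This is a minimal illustration, not x2s3's code: `prefers_html` is a hypothetical helper, and treating a header that lists `text/html` but no `application/xml` as an HTML preference is an assumption. Like the rule described above, it looks only at the order of media types and ignores `q` values.

```python
from typing import Optional


def prefers_html(accept: Optional[str]) -> bool:
    """Hypothetical helper: True if text/html is listed before application/xml.

    Assumption: if text/html is present and application/xml is absent, HTML wins.
    q-values are deliberately ignored; only list order matters.
    """
    if not accept:
        # No Accept header: default to S3-compatible XML.
        return False
    # Split the comma-separated media ranges and drop parameters such as ";q=0.9".
    media_types = [part.split(";", 1)[0].strip().lower() for part in accept.split(",")]
    if "text/html" not in media_types:
        return False
    if "application/xml" not in media_types:
        return True
    return media_types.index("text/html") < media_types.index("application/xml")


# A browser-style header lists text/html first, so the HTML UI is served.
print(prefers_html("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))  # True
# A generic client header gets XML.
print(prefers_html("*/*"))  # False
```

Because browsers put `text/html` first while generic clients typically send `*/*` or nothing, an order check is enough to separate the two without implementing full content negotiation.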
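The `max-keys` behavior from items 2 and 3 can be illustrated with a small, file-client-style pager. `list_page` is a hypothetical stand-in, not the real `walk_path`; a plain `ValueError` check takes the place of the Pydantic `ge=0` constraint so the sketch runs with the standard library only.

```python
from typing import Any, Dict, List


def list_page(keys: List[str], max_keys: int, start: int = 0) -> Dict[str, Any]:
    """Return one ListObjectsV2-style page from a sorted list of keys.

    Hypothetical stand-in for a file-backed listing; not x2s3's walk_path.
    """
    if max_keys < 0:
        # Only the lower bound is validated (the PR uses Pydantic's ge=0).
        raise ValueError("max-keys must be >= 0")
    if max_keys == 0:
        # Early return: without it, the truncation check below would report
        # IsTruncated=True for any non-empty key list.
        return {"Contents": [], "KeyCount": 0, "MaxKeys": 0, "IsTruncated": False}
    page = keys[start:start + max_keys]  # no artificial cap at 1000
    return {
        "Contents": page,
        "KeyCount": len(page),
        "MaxKeys": max_keys,  # echo the requested value back
        "IsTruncated": start + max_keys < len(keys),
    }


keys = [f"data/{i:04d}.bin" for i in range(1500)]
print(list_page(keys, 0)["IsTruncated"])  # False
print(list_page(keys, 1200)["KeyCount"])  # 1200
```

The early return is what keeps `max-keys=0` consistent with AWS: with an empty page, the generic "more keys remain" test would otherwise flag the listing as truncated.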
4. Always emit `<Prefix>` and `<KeyCount>` in ListObjectsV2 XML

AWS S3 always includes `<Prefix></Prefix>` (even when empty) and `<KeyCount>0</KeyCount>` (even when zero) in ListObjectsV2 responses. The proxy was omitting these elements because `add_telem()` in `utils.py` skipped any falsy value (`""`, `0`, `None`).

Fixed by changing the guard from `if not value` to `if value is None`, so empty strings and zero are now emitted. Also fixed a related issue in the aioboto client where `NextContinuationToken` defaulted to `""` instead of `None`, which would have caused an empty `<NextContinuationToken>` element to appear on non-truncated responses. Both clients now default `Prefix` to `''` (instead of `None`) so the element is always emitted.

5. Include `Last-Modified` and `ETag` headers in GetObject responses

The proxy's GetObject streaming response was missing the `Last-Modified` and `ETag` headers that AWS S3 always returns. The upstream S3 response included them in `res_headers`, but `open_object()` in the aioboto client was not copying them through. (The `head_object()` method already returned these headers correctly.)

6. Add `xmlns` namespace to XML responses

AWS S3 includes `xmlns="http://s3.amazonaws.com/doc/2006-03-01/"` on the root element of all XML responses. The proxy was omitting it, and some S3 client libraries may depend on this namespace.

Added the namespace to the `ListBucketResult` and `ListAllMyBucketsResult` root elements. Updated `parse_xml()` to strip namespace prefixes so internal XML parsing (e.g. the browse UI) continues to work with plain tag names.

7. Fix `LastModified` timestamp format

The proxy was returning `2024-07-26T13:39:10+00:00` (Python's `isoformat()`). AWS returns `2024-07-26T13:39:10.000Z` (milliseconds with a `Z` suffix). Some S3 clients may parse timestamps strictly.

Fixed both the aioboto client (which called `.isoformat()` on boto datetimes) and `format_timestamp_s3()` in utils (used by the file client) to use `strftime("%Y-%m-%dT%H:%M:%S.000Z")`.

8. Fix XML element ordering in ListObjectsV2

Reordered the keys list in `get_list_xml()` to match AWS S3's element order: `Name`, `Prefix`, `StartAfter`, `ContinuationToken`, `NextContinuationToken`, `KeyCount`, `MaxKeys`, `Delimiter`, `EncodingType`, `IsTruncated`, followed by `CommonPrefixes`, then `Contents`.

9. Fix `HEAD /bucket` returning 500 instead of 200

`HEAD` on a bucket root (e.g. `HEAD /janelia-data-examples`) was passing an empty key to `client.head_object("")`, which failed with a 500 error. AWS returns 200 for this operation (HeadBucket).

Fixed by returning 200 with `Content-Type: application/xml` when the target path is empty, before attempting to HEAD an object.

Remaining differences from AWS S3
These were identified during testing but intentionally not addressed.
By design
- Bucket index (`GET /`): AWS returns 307 (requires auth); the proxy returns 200 with its configured target list.
- `Content-Disposition` on octet-stream files: the proxy adds `Content-Disposition: attachment` for `application/octet-stream` files. AWS does not.

Not supported
- ListObjects v1 (`list-type=1`): the proxy returns 400. AWS supports v1 and also uses it as the default when no `list-type` is given. The proxy always returns the v2 format.
- `fetch-owner=true`: AWS returns `<Owner>` elements in each `<Contents>` entry. The proxy ignores this parameter and never includes owner data.

Cosmetic
- `ContinuationToken` values differ: expected, since tokens are opaque and server-specific.
- `Server` header: AWS returns `AmazonS3`; the proxy returns `uvicorn`.
- XML declaration: AWS uses `<?xml version="1.0" encoding="UTF-8"?>` (double quotes, uppercase). The proxy uses single quotes and lowercase. Both are valid XML.
- Empty elements: AWS uses `<Prefix></Prefix>`; the proxy uses `<Prefix />` (Python ElementTree self-closing tag). Both are semantically identical XML.
- `<RequestId>` and `<HostId>`: AWS includes these in error responses. The proxy has no real values to provide.
- `binary/octet-stream` vs `application/octet-stream`: AWS uses the non-standard `binary/octet-stream` for extensionless files. The proxy uses the IANA-standard `application/octet-stream`.
- `x-amz-server-side-encryption`: AWS returns this header on HeadObject. The proxy does not (it's an AWS-specific detail).
- `x-amz-bucket-region`, `x-amz-request-id`: AWS returns these on HeadBucket. The proxy does not.

@StephanPreibisch @cmhulbert @bogovicj @neomorphic