Improve S3 API compatibility #16

Merged: krokicki merged 14 commits into main from accept-header on Apr 12, 2026

@krokicki krokicki commented Apr 12, 2026

This branch brings x2s3 closer to AWS S3 API conventions so that it better supports a diverse set of clients, including Java clients like N5 Viewer.

The main change is that we now check the Accept header and return HTML or XML based on the client's preference. I then asked Claude to write tests against an AWS S3 bucket and compare the results with the same bucket proxied by x2s3. It identified many small differences, which have been either fixed or documented below.

Finally, I added Java-based integration tests that run inside a Pixi environment.

1. Use Accept header to determine HTML vs XML responses

When the UI is enabled, the proxy previously always returned HTML for the bucket index (GET /) and directory listings (GET /bucket/prefix/). This broke S3 API clients that expected XML.

Now the proxy checks the Accept header: HTML is only returned when text/html appears before application/xml. Otherwise, proper S3-compatible XML is returned (ListAllMyBucketsResult for the index, ListObjectsV2 for directory listings).
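
The ordering rule above can be sketched roughly as follows. This is an illustration, not the code in this branch: the function name `prefers_html` is hypothetical, q-values are ignored, and treating a header that lists text/html but not application/xml as an HTML request is an assumption.

```python
def prefers_html(accept_header: str) -> bool:
    """Return True only when text/html is listed before application/xml."""
    # Drop parameters such as ";q=0.9" and normalise case.
    media_types = [
        part.split(";")[0].strip().lower()
        for part in accept_header.split(",")
        if part.strip()
    ]
    if "text/html" not in media_types:
        return False
    if "application/xml" not in media_types:
        # Assumption: HTML requested and XML never mentioned counts as HTML.
        return True
    return media_types.index("text/html") < media_types.index("application/xml")
```

A typical browser header such as `text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8` lists text/html first and so gets HTML, while an S3 client sending `application/xml`, `*/*`, or no Accept header at all gets XML.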

2. Accept max-keys=0 instead of returning 400

AWS S3 accepts max-keys=0 and returns an empty listing with IsTruncated=false. The proxy was rejecting it with 400 due to Pydantic's ge=1 constraint.

Fixed by changing the constraint to ge=0. Also added an early return in the file client's walk_path so it returns an empty listing rather than incorrectly reporting IsTruncated=true.
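
A minimal sketch of why the early return matters, using a hypothetical `list_page` helper rather than the real walk_path: without the `max_keys == 0` check, slicing yields an empty page while keys remain, so the truncation test would report true.

```python
from typing import List, Tuple


def list_page(keys: List[str], max_keys: int) -> Tuple[List[str], bool]:
    """Return (page, is_truncated) for a flat list of keys."""
    if max_keys == 0:
        # Match AWS: an empty listing that is not marked as truncated.
        return [], False
    page = keys[:max_keys]
    # Truncated only if some keys did not fit into this page.
    is_truncated = len(keys) > len(page)
    return page, is_truncated
```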

3. Accept max-keys values above 1000 instead of returning 400

AWS S3 accepts max-keys=1001 (or higher) and simply returns up to 1000 results, echoing the requested value back in MaxKeys. The proxy was rejecting any value over 1000 with 400 due to Pydantic's le=1000 constraint.

Fixed by removing the upper-bound constraint. The aioboto client passes the value through to upstream S3 which handles it naturally. The file client returns however many items are requested (no artificial cap).

4. Always emit <Prefix> and <KeyCount> in ListObjectsV2 XML

AWS S3 always includes <Prefix></Prefix> (even when empty) and <KeyCount>0</KeyCount> (even when zero) in ListObjectsV2 responses. The proxy was omitting these elements because add_telem() in utils.py skipped any falsy value ("", 0, None).

Fixed by changing the guard from if not value to if value is None, so empty strings and zero are now emitted. Also fixed a related issue in the aioboto client where NextContinuationToken defaulted to "" instead of None, which would have caused an empty <NextContinuationToken> element to appear on non-truncated responses. Both clients now default Prefix to '' (instead of None) so the element is always emitted.
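
The guard change can be illustrated with a small sketch. The signature of `add_telem` shown here is an assumption; only the `is None` check reflects the fix described above.

```python
import xml.etree.ElementTree as ET


def add_telem(parent, tag, value):
    """Append <tag>value</tag> to parent, skipping only None."""
    if value is None:  # previously "if not value", which also dropped "" and 0
        return None
    child = ET.SubElement(parent, tag)
    child.text = str(value)
    return child


root = ET.Element("ListBucketResult")
add_telem(root, "Prefix", "")                    # emitted, even though empty
add_telem(root, "KeyCount", 0)                   # emitted as <KeyCount>0</KeyCount>
add_telem(root, "NextContinuationToken", None)   # omitted
xml_text = ET.tostring(root, encoding="unicode")
```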

5. Include Last-Modified and ETag headers in GetObject responses

The proxy's GetObject streaming response was missing the Last-Modified and ETag headers that AWS S3 always returns. The upstream S3 response included them in res_headers, but open_object() in the aioboto client was not copying them through. (The head_object() method already returned these headers correctly.)
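
One way to picture the pass-through, as a sketch only: `copy_object_headers` and `PASSTHROUGH_HEADERS` are hypothetical names, and the real open_object() may structure this differently.

```python
PASSTHROUGH_HEADERS = ("Last-Modified", "ETag")


def copy_object_headers(upstream_headers: dict) -> dict:
    """Pick the headers to forward from the upstream GetObject response."""
    # Header names are case-insensitive, so compare in lower case.
    lowered = {name.lower(): value for name, value in upstream_headers.items()}
    return {
        name: lowered[name.lower()]
        for name in PASSTHROUGH_HEADERS
        if name.lower() in lowered
    }
```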

6. Add xmlns namespace to XML responses

AWS S3 includes xmlns="http://s3.amazonaws.com/doc/2006-03-01/" on root elements of all XML responses. The proxy was omitting it. Some S3 client libraries may depend on this namespace.

Added the namespace to ListBucketResult and ListAllMyBucketsResult root elements. Updated parse_xml() to strip namespace prefixes so internal XML parsing (e.g. the browse UI) continues to work with plain tag names.
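
A rough sketch of both halves, assuming ElementTree is used for building and parsing. The body of `parse_xml` here is illustrative rather than the actual implementation, and `example-bucket` is a placeholder name.

```python
import xml.etree.ElementTree as ET

S3_XMLNS = "http://s3.amazonaws.com/doc/2006-03-01/"


def parse_xml(xml_text: str) -> ET.Element:
    """Parse XML and strip '{namespace}' prefixes from every tag."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if isinstance(elem.tag, str) and elem.tag.startswith("{"):
            elem.tag = elem.tag.split("}", 1)[1]
    return root


# Build a namespaced root the way a listing response might.
root = ET.Element("ListBucketResult", {"xmlns": S3_XMLNS})
ET.SubElement(root, "Name").text = "example-bucket"
xml_text = ET.tostring(root, encoding="unicode")

# After stripping, plain tag names work again.
parsed = parse_xml(xml_text)
bucket_name = parsed.find("Name").text
```

Without the stripping step, ElementTree would report tags like `{http://s3.amazonaws.com/doc/2006-03-01/}Name`, and lookups by plain name would return nothing.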

7. Fix LastModified timestamp format

The proxy was returning 2024-07-26T13:39:10+00:00 (Python's isoformat()). AWS returns 2024-07-26T13:39:10.000Z (millisecond precision with a Z suffix). Some S3 clients may parse timestamps strictly.

Fixed both the aioboto client (which called .isoformat() on boto datetimes) and format_timestamp_s3() in utils (used by the file client) to use strftime("%Y-%m-%dT%H:%M:%S.000Z").
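
For comparison, a small sketch of the two formats. The signature of `format_timestamp_s3` is an assumption (the real helper may take a different input type), as is treating naive datetimes as UTC.

```python
from datetime import datetime, timezone


def format_timestamp_s3(dt: datetime) -> str:
    """Format a datetime the way S3 listings do: UTC, '.000Z' suffix."""
    if dt.tzinfo is None:
        # Assumption: naive datetimes are already in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")


modified = datetime(2024, 7, 26, 13, 39, 10, tzinfo=timezone.utc)
old_style = modified.isoformat()            # 2024-07-26T13:39:10+00:00
new_style = format_timestamp_s3(modified)   # 2024-07-26T13:39:10.000Z
```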

8. Fix XML element ordering in ListObjectsV2

Reordered the keys list in get_list_xml() to match AWS S3's element order: Name, Prefix, StartAfter, ContinuationToken, NextContinuationToken, KeyCount, MaxKeys, Delimiter, EncodingType, IsTruncated, followed by CommonPrefixes, then Contents.
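
The idea is that element order comes from a fixed list rather than from the order in which values happen to be collected. A sketch, with `LIST_V2_ORDER` and `build_list_root` as hypothetical names and CommonPrefixes and Contents left out for brevity:

```python
import xml.etree.ElementTree as ET

# Scalar elements in the order listed above.
LIST_V2_ORDER = [
    "Name", "Prefix", "StartAfter", "ContinuationToken",
    "NextContinuationToken", "KeyCount", "MaxKeys", "Delimiter",
    "EncodingType", "IsTruncated",
]


def build_list_root(fields: dict) -> ET.Element:
    """Emit the scalar ListObjectsV2 elements in a fixed order."""
    root = ET.Element("ListBucketResult")
    for key in LIST_V2_ORDER:
        value = fields.get(key)
        if value is not None:
            ET.SubElement(root, key).text = str(value)
    return root


fields = {"IsTruncated": "false", "MaxKeys": 1000, "Name": "example-bucket",
          "KeyCount": 0, "Prefix": ""}
ordered_tags = [child.tag for child in build_list_root(fields)]
```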

9. Fix HEAD /bucket returning 500 instead of 200

HEAD on a bucket root (e.g. HEAD /janelia-data-examples) was passing an empty key to client.head_object(""), which failed with a 500 error. AWS returns 200 for this operation (HeadBucket).

Fixed by returning 200 with Content-Type: application/xml when the target path is empty, before attempting to HEAD an object.
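
In outline, the check happens before any object lookup. The sketch below uses a hypothetical `head_target` function and a `head_object` callable standing in for the client method; the real route handler returns a framework response object rather than a tuple.

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str]]


def head_target(key: str, head_object: Callable[[str], Response]) -> Response:
    """Handle HEAD for a bucket root (empty key) or for an object."""
    if key == "":
        # HeadBucket: succeed without asking the client for an object.
        return 200, {"Content-Type": "application/xml"}
    return head_object(key)
```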

Remaining differences from AWS S3

These were identified during testing but intentionally not addressed.

By design

  • ListBuckets (GET /): AWS returns 307 (requires auth); proxy returns 200 with its configured target list.
  • GetBucketAcl: AWS returns 403 AccessDenied for anonymous requests; proxy returns a synthetic read-only ACL (200).
  • Content-Disposition on octet-stream files: Proxy adds Content-Disposition: attachment for application/octet-stream files. AWS does not.

Not supported

  • ListObjects v1 (list-type=1): Proxy returns 400 for this request. AWS supports v1 and also uses it as the default when no list-type is given, whereas the proxy always returns the v2 format.
  • fetch-owner=true: AWS returns <Owner> elements in each <Contents> entry. Proxy ignores this parameter and never includes Owner data.

Cosmetic

  • ContinuationToken values differ: Expected — tokens are opaque and server-specific.
  • Server header: AWS returns AmazonS3, proxy returns uvicorn.
  • XML declaration quoting: AWS uses <?xml version="1.0" encoding="UTF-8"?> (double quotes, uppercase). Proxy uses single quotes and lowercase. Both are valid XML.
  • Empty element serialization: AWS uses <Prefix></Prefix>, proxy uses <Prefix /> (Python ElementTree self-closing tag). Both are semantically identical XML.
  • Error XML missing <RequestId> and <HostId>: AWS includes these in error responses. Proxy has no real values to provide.
  • Error XML formatting: AWS uses compact single-line XML. Proxy uses indented multi-line.
  • binary/octet-stream vs application/octet-stream: AWS uses the non-standard binary/octet-stream for extensionless files. Proxy uses the IANA-standard application/octet-stream.
  • x-amz-server-side-encryption: AWS returns this header on HeadObject. Proxy does not (it's an AWS-specific detail).
  • x-amz-bucket-region, x-amz-request-id: AWS returns these on HeadBucket. Proxy does not.

@StephanPreibisch @cmhulbert @bogovicj @neomorphic

@krokicki krokicki merged commit 866a7d4 into main Apr 12, 2026
4 checks passed
@krokicki krokicki deleted the accept-header branch April 12, 2026 12:41