Improve S3 API compatibility by krokicki · Pull Request #16 · JaneliaSciComp/x2s3

krokicki · 2026-04-12T12:33:31Z

This branch brings x2s3 closer into alignment with AWS S3 API conventions, to better support a diverse set of clients, including Java clients such as N5 Viewer.

The main change is that we now check the Accept header and return HTML or XML based on the client's preference. I then asked Claude to write tests against an AWS S3 bucket's API and compare the results with the same bucket proxied by x2s3. This identified many small differences, which have either been fixed or documented below.

Finally, I added Java-based integration tests that run inside a Pixi environment.

1. Use Accept header to determine HTML vs XML responses

When the UI is enabled, the proxy previously always returned HTML for the bucket index (GET /) and directory listings (GET /bucket/prefix/). This broke S3 API clients that expected XML.

Now the proxy checks the Accept header: HTML is only returned when text/html appears before application/xml. Otherwise, proper S3-compatible XML is returned (ListAllMyBucketsResult for the index, ListObjectsV2 for directory listings).
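As a rough illustration of the rule above, here is a minimal sketch of a position-based Accept check. The function name `prefers_html` is hypothetical and not taken from the x2s3 code; the sketch ignores q-values and, as an assumption, treats a header that lists text/html without application/xml as an HTML preference.

```python
def prefers_html(accept_header: str) -> bool:
    """Return True when text/html is listed before application/xml."""
    # Drop parameters such as ";q=0.9" and normalize case.
    media_types = [
        part.split(";", 1)[0].strip().lower()
        for part in accept_header.split(",")
    ]
    if "text/html" not in media_types:
        return False
    if "application/xml" not in media_types:
        # Assumption: HTML listed alone counts as an HTML preference.
        return True
    return media_types.index("text/html") < media_types.index("application/xml")
```

With this rule, a typical browser header (text/html first) gets HTML, while S3 clients that send `*/*`, `application/xml`, or no Accept header get XML.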

2. Accept `max-keys=0` instead of returning 400

AWS S3 accepts max-keys=0 and returns an empty listing with IsTruncated=false. The proxy was rejecting it with 400 due to Pydantic's ge=1 constraint.

Fixed by changing the constraint to ge=0. Also added an early return in the file client's walk_path so it returns an empty listing rather than incorrectly reporting IsTruncated=true.

3. Accept `max-keys` values above 1000 instead of returning 400

AWS S3 accepts max-keys=1001 (or higher) and simply returns up to 1000 results, echoing the requested value back in MaxKeys. The proxy was rejecting any value over 1000 with 400 due to Pydantic's le=1000 constraint.

Fixed by removing the upper-bound constraint. The aioboto client passes the value through to upstream S3 which handles it naturally. The file client returns however many items are requested (no artificial cap).
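The max-keys behavior from items 2 and 3 can be sketched for a file-style listing as follows. This is an illustrative stand-in, not the actual walk_path code: `list_page` and its dictionary layout are hypothetical, and the input is assumed to be an already sorted list of keys.

```python
def list_page(sorted_keys, max_keys):
    """Return one listing page, echoing the requested max-keys value."""
    if max_keys < 0:
        raise ValueError("max-keys must be >= 0")
    if max_keys == 0:
        # Early return: an empty page is never reported as truncated.
        return {"Contents": [], "KeyCount": 0, "MaxKeys": 0, "IsTruncated": False}
    page = sorted_keys[:max_keys]  # no upper cap, as in the file client
    return {
        "Contents": page,
        "KeyCount": len(page),
        "MaxKeys": max_keys,  # echoed back even when above 1000
        "IsTruncated": len(sorted_keys) > len(page),
    }
```

Unlike AWS, which returns at most 1000 results per page, this sketch applies no cap, mirroring the file client behavior described above.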

4. Always emit `<Prefix>` and `<KeyCount>` in ListObjectsV2 XML

AWS S3 always includes <Prefix></Prefix> (even when empty) and <KeyCount>0</KeyCount> (even when zero) in ListObjectsV2 responses. The proxy was omitting these elements because add_telem() in utils.py skipped any falsy value ("", 0, None).

Fixed by changing the guard from if not value to if value is None, so empty strings and zero are now emitted. Also fixed a related issue in the aioboto client where NextContinuationToken defaulted to "" instead of None, which would have caused an empty <NextContinuationToken> element to appear on non-truncated responses. Both clients now default Prefix to '' (instead of None) so the element is always emitted.
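To make the guard change concrete, here is a minimal sketch of a helper in the spirit of add_telem(). The real signature in utils.py is not shown in this PR description, so the parameters below are assumptions.

```python
import xml.etree.ElementTree as ET


def add_telem(parent, tag, value):
    """Append <tag>value</tag> unless the value is None."""
    # "if not value" would also skip "" and 0; "is None" skips only None.
    if value is None:
        return None
    child = ET.SubElement(parent, tag)
    child.text = str(value)
    return child


root = ET.Element("ListBucketResult")
add_telem(root, "Prefix", "")                   # emitted, empty
add_telem(root, "KeyCount", 0)                  # emitted as "0"
add_telem(root, "NextContinuationToken", None)  # skipped
```

This also shows why the NextContinuationToken default matters: with the new guard, an empty string would produce an element, while None does not.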

5. Include `Last-Modified` and `ETag` headers in GetObject responses

The proxy's GetObject streaming response was missing the Last-Modified and ETag headers that AWS S3 always returns. The upstream S3 response included them in res_headers, but open_object() in the aioboto client was not copying them through. (The head_object() method already returned these headers correctly.)
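One way to copy these headers through, sketched under the assumption that the upstream headers are available as a plain mapping. The helper name `copy_object_headers` is hypothetical and is not the code in open_object().

```python
PASSTHROUGH_HEADERS = ("Last-Modified", "ETag")


def copy_object_headers(upstream_headers):
    """Pick the headers to forward, matching names case-insensitively."""
    lowered = {name.lower(): value for name, value in upstream_headers.items()}
    return {
        name: lowered[name.lower()]
        for name in PASSTHROUGH_HEADERS
        if name.lower() in lowered  # forward only when upstream provided it
    }
```

Headers that the upstream response lacks are simply left out rather than filled with placeholder values.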

6. Add `xmlns` namespace to XML responses

AWS S3 includes xmlns="http://s3.amazonaws.com/doc/2006-03-01/" on root elements of all XML responses. The proxy was omitting it. Some S3 client libraries may depend on this namespace.

Added the namespace to ListBucketResult and ListAllMyBucketsResult root elements. Updated parse_xml() to strip namespace prefixes so internal XML parsing (e.g. the browse UI) continues to work with plain tag names.
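The namespace stripping can be sketched with the standard library's ElementTree. This is an assumed approach for illustration, not necessarily how parse_xml() is implemented; `strip_namespaces` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

S3_NS = "http://s3.amazonaws.com/doc/2006-03-01/"


def strip_namespaces(root):
    """Rewrite '{namespace}Tag' names to plain 'Tag' in place."""
    for elem in root.iter():
        if isinstance(elem.tag, str) and elem.tag.startswith("{"):
            elem.tag = elem.tag.split("}", 1)[1]
    return root


xml_text = (
    f'<ListBucketResult xmlns="{S3_NS}">'
    "<Name>demo</Name><KeyCount>0</KeyCount>"
    "</ListBucketResult>"
)
parsed = strip_namespaces(ET.fromstring(xml_text))
```

After stripping, lookups such as `parsed.find("Name")` work with plain tag names, without namespace-qualified paths.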

7. Fix `LastModified` timestamp format

The proxy was returning 2024-07-26T13:39:10+00:00 (Python's isoformat()), whereas AWS returns 2024-07-26T13:39:10.000Z (milliseconds with a Z suffix). Some S3 clients may parse timestamps strictly.

Fixed both the aioboto client (which called .isoformat() on boto datetimes) and format_timestamp_s3() in utils (used by the file client) to use strftime("%Y-%m-%dT%H:%M:%S.000Z").
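The difference between the two formats is easy to see in a short sketch. The helper name `to_s3_timestamp` is hypothetical; converting to UTC first and treating naive datetimes as UTC are assumptions added here, since the Z suffix is only correct for UTC times.

```python
from datetime import datetime, timezone


def to_s3_timestamp(dt):
    """Format a datetime like AWS S3: fixed .000 milliseconds and a Z suffix."""
    if dt.tzinfo is None:
        # Assumption: naive datetimes are already in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")


modified = datetime(2024, 7, 26, 13, 39, 10, tzinfo=timezone.utc)
old_format = modified.isoformat()         # 2024-07-26T13:39:10+00:00
new_format = to_s3_timestamp(modified)    # 2024-07-26T13:39:10.000Z
```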

8. Fix XML element ordering in ListObjectsV2

Reordered the keys list in get_list_xml() to match AWS S3's element order: Name, Prefix, StartAfter, ContinuationToken, NextContinuationToken, KeyCount, MaxKeys, Delimiter, EncodingType, IsTruncated, followed by CommonPrefixes, then Contents.
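A minimal sketch of order-driven serialization, assuming the listing fields arrive as a dictionary. `build_list_result` is a hypothetical stand-in for get_list_xml(), and CommonPrefixes and Contents are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Scalar elements in the order AWS S3 emits them, per the list above.
LIST_FIELD_ORDER = [
    "Name", "Prefix", "StartAfter", "ContinuationToken",
    "NextContinuationToken", "KeyCount", "MaxKeys", "Delimiter",
    "EncodingType", "IsTruncated",
]


def build_list_result(fields):
    """Emit the present fields in LIST_FIELD_ORDER, whatever the dict order."""
    root = ET.Element("ListBucketResult")
    for tag in LIST_FIELD_ORDER:
        value = fields.get(tag)
        if value is None:
            continue
        if isinstance(value, bool):
            value = "true" if value else "false"  # XML-style booleans
        ET.SubElement(root, tag).text = str(value)
    return root


result = build_list_result(
    {"IsTruncated": False, "MaxKeys": 1000, "KeyCount": 0, "Prefix": "", "Name": "demo"}
)
```

Driving the output from a fixed list keeps the element order stable regardless of how the fields were collected.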

9. Fix `HEAD /bucket` returning 500 instead of 200

HEAD on a bucket root (e.g. HEAD /janelia-data-examples) was passing an empty key to client.head_object(""), which failed with a 500 error. AWS returns 200 for this operation (HeadBucket).

Fixed by returning 200 with Content-Type: application/xml when the target path is empty, before attempting to HEAD an object.
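The short-circuit can be sketched as a small dispatch function. The names `handle_head` and `head_object` are hypothetical, and the return shape (status code plus headers) is an assumption made for illustration rather than the proxy's actual handler signature.

```python
def handle_head(target_path, head_object):
    """Answer HEAD on a bucket root directly; delegate object keys."""
    if not target_path:
        # HeadBucket case: never call head_object with an empty key.
        return 200, {"Content-Type": "application/xml"}
    return head_object(target_path)
```

The empty-path check runs before any object lookup, so the client never sees the error that an empty key would otherwise trigger.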

Remaining differences from AWS S3

These were identified during testing but intentionally not addressed.

By design

ListBuckets (GET /): AWS returns 307 (requires auth); proxy returns 200 with its configured target list.
GetBucketAcl: AWS returns 403 AccessDenied for anonymous requests; proxy returns a synthetic read-only ACL (200).
Content-Disposition on octet-stream files: Proxy adds Content-Disposition: attachment for application/octet-stream files. AWS does not.

Not supported

ListObjects v1 (list-type=1): Proxy returns 400. AWS supports v1 and also uses it as the default when no list-type is given. Proxy always returns v2 format.
fetch-owner=true: AWS returns <Owner> elements in each <Contents> entry. Proxy ignores this parameter and never includes Owner data.

Cosmetic

ContinuationToken values differ: Expected — tokens are opaque and server-specific.
Server header: AWS returns AmazonS3, proxy returns uvicorn.
XML declaration quoting: AWS uses <?xml version="1.0" encoding="UTF-8"?> (double quotes, uppercase). Proxy uses single quotes and lowercase. Both are valid XML.
Empty element serialization: AWS uses <Prefix></Prefix>, proxy uses <Prefix /> (Python ElementTree self-closing tag). Both are semantically identical XML.
Error XML missing <RequestId> and <HostId>: AWS includes these in error responses. Proxy has no real values to provide.
Error XML formatting: AWS uses compact single-line XML. Proxy uses indented multi-line.
binary/octet-stream vs application/octet-stream: AWS uses the non-standard binary/octet-stream for extensionless files. Proxy uses the IANA-standard application/octet-stream.
x-amz-server-side-encryption: AWS returns this header on HeadObject. Proxy does not (it's an AWS-specific detail).
x-amz-bucket-region, x-amz-request-id: AWS returns these on HeadBucket. Proxy does not.

@StephanPreibisch @cmhulbert @bogovicj @neomorphic

krokicki added 14 commits April 11, 2026 16:11

use Accept header to determine whether to return html or xml

9bc479a

Merge branch 'main' into accept-header

2f5f5c2

match AWS behavior for max-keys=0 and max-keys>1000

3e928e9

implement HEAD /bucket

6576776

always emit Prefix and KeyCount

5cc0835

proxy etag and last-modified when available

e45a732

output namespaced xml

0097ec3

changed LastModified timestamp format to match AWS

dc89253

change Element ordering to match AWS

1d2eb99

ensure Prefix appears when empty

71fc3fd

added tests and scripts

de68b5e

added test-java env

165284a

default to localhost for java test

32303c3

run CI on PRs

3fee270

krokicki merged commit 866a7d4 into main Apr 12, 2026
4 checks passed

krokicki deleted the accept-header branch April 12, 2026 12:41
