UN-3343 [FEAT] Support AWS S3 connector authentication via IRSA#1866
UN-3343 [FEAT] Support AWS S3 connector authentication via IRSA#1866kirtimanmishrazipstack wants to merge 8 commits intomainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds explicit typing and expands the S3/MinIO exception substring→message mapping; no control-flow change in exception handling. MinioFS now sanitizes credential inputs, builds a creds dict only from non-empty values and passes it via **creds to S3FileSystem; logging was converted to parameterized form. JSON schema no longer requires key/secret and clarifies descriptions. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Greptile SummaryThis PR enables the MinIO/S3 connector to authenticate via IAM roles (IRSA, EC2 instance profiles) when static credentials are left blank, by conditionally passing
Confidence Score: 3/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[MinioFS.__init__ called] --> B{key AND secret\nboth non-empty?}
B -- Yes --> C[creds = key + secret]
B -- No --> D[creds = empty\nboto3 credential chain]
C --> E{endpoint_url\nnon-empty?}
D --> E
E -- Yes --> F[creds += endpoint_url]
E -- No --> G[S3FileSystem\nno endpoint_url]
F --> H[S3FileSystem\nwith explicit creds + endpoint]
G --> I[boto3 resolves\nendpoint automatically]
H --> J[test_credentials calls ls]
I --> J
J -- Success --> K[return True]
J -- Exception --> L[handle_s3fs_exception]
L --> M{substring match\nin S3FS_EXC_TO_UNSTRACT_EXC?}
M -- Match --> N[User-friendly ConnectorError]
M -- No match --> O[Raw exception string\nin ConnectorError]
Prompt To Fix All With AIThis is a comment left during a code review.
Path: unstract/connectors/src/unstract/connectors/filesystems/minio/exceptions.py
Line: 19-22
Comment:
**Kubernetes-specific guidance in a general OIDC error**
The `InvalidIdentityToken` message references "ServiceAccount configuration", but `InvalidIdentityToken` is raised by any OIDC-based web identity federation — including GitHub Actions OIDC, GitLab CI OIDC, and on-premise OIDC providers. "ServiceAccount" is a Kubernetes-only concept, so this guidance is actively misleading for non-EKS OIDC callers.
Consider making the message provider-agnostic:
```suggestion
"InvalidIdentityToken": (
"The identity token provided is invalid. Verify the OIDC provider "
"configuration and ensure the token audience/issuer matches the IAM "
"role's trust policy."
),
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: unstract/connectors/src/unstract/connectors/filesystems/minio/exceptions.py
Line: 28-33
Comment:
**`s3:DeleteObject` omitted from `AccessDenied` permission guidance**
`can_write()` returns `True` and the connector supports file deletion. The `AccessDenied` message lists `s3:ListAllMyBuckets, s3:ListBucket, s3:GetObject, s3:PutObject` but silently drops `s3:DeleteObject`. A user whose role is missing only `s3:DeleteObject` will follow this guidance exactly, grant every listed permission, and still receive `AccessDenied` when deleting files — with no clue as to why.
```suggestion
"AccessDenied": (
"Access denied. The IAM user or role does not have sufficient S3 "
"permissions. Ensure the policy grants the required S3 actions "
"(s3:ListAllMyBuckets, s3:ListBucket, s3:GetObject, s3:PutObject, "
"s3:DeleteObject) on the target bucket."
),
```
How can I resolve this? If you propose a fix, please make it concise.Last reviewed commit: "PR comments" |
unstract/connectors/src/unstract/connectors/filesystems/minio/minio.py
Outdated
Show resolved
Hide resolved
unstract/connectors/src/unstract/connectors/filesystems/minio/static/json_schema.json
Show resolved
Hide resolved
unstract/connectors/src/unstract/connectors/filesystems/minio/exceptions.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
unstract/connectors/src/unstract/connectors/filesystems/minio/static/json_schema.json (1)
5-9:⚠️ Potential issue | 🟡 MinorInconsistency:
endpoint_urlis required but description says "leave blank".The
endpoint_urlfield is listed in therequiredarray (line 7) but its description says "leave blank for default AWS S3" (line 32). This is contradictory. Additionally, it has a non-empty default value (https://s3.amazonaws.com), so users technically cannot leave it blank.Consider either:
- Removing
endpoint_urlfrom therequiredarray if blank values should be allowed, or- Updating the description to clarify that the default value is used for AWS S3.
📝 Suggested description fix (if keeping required)
"endpoint_url": { "type": "string", "title": "Endpoint URL", "default": "https://s3.amazonaws.com", - "description": "Endpoint URL (leave blank for default AWS S3)" + "description": "Endpoint URL (use default for AWS S3, or provide custom URL for MinIO/S3-compatible storage)" },Also applies to: 28-32
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@unstract/connectors/src/unstract/connectors/filesystems/minio/static/json_schema.json` around lines 5 - 9, The JSON schema currently lists "endpoint_url" in the "required" array while its description tells users to "leave blank for default AWS S3" and it has a non-empty default; fix by either (A) removing "endpoint_url" from the "required" array so blank/omitted values are allowed, or (B) keep it required but update the "endpoint_url" description and default to clearly state that the default "https://s3.amazonaws.com" will be used for AWS S3 (and remove the "leave blank" wording); locate the "required" array and the "endpoint_url" property in the schema to apply the chosen change (modify the "required" array or the "endpoint_url" property's "description" and/or "default").
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@unstract/connectors/src/unstract/connectors/filesystems/minio/minio.py`:
- Around line 19-33: The code in minio.py currently treats partial credentials
(only key or only secret) as absent and silently falls back to IAM; update the
validation around key/secret (the variables key, secret, creds and the
assignment to self._using_static_creds) to detect the case where exactly one of
key or secret is provided and either raise a ConnectorError with a clear message
or emit a warning that static credentials are incomplete and IAM will be used;
if you choose to raise, import ConnectorError from
unstract.connectors.exceptions and raise it before building creds, otherwise
call the module logger to warn and ensure self._using_static_creds remains
False.
In
`@unstract/connectors/src/unstract/connectors/filesystems/minio/static/json_schema.json`:
- Around line 34-39: The JSON schema's "region_name" property has an invalid
default "ap-south"; update the default to a valid AWS region identifier (e.g.,
"ap-south-1") or set the default to null/"" to force explicit configuration;
modify the "region_name" object's "default" value in json_schema.json
accordingly so it uses a correct region code or an empty value.
---
Outside diff comments:
In
`@unstract/connectors/src/unstract/connectors/filesystems/minio/static/json_schema.json`:
- Around line 5-9: The JSON schema currently lists "endpoint_url" in the
"required" array while its description tells users to "leave blank for default
AWS S3" and it has a non-empty default; fix by either (A) removing
"endpoint_url" from the "required" array so blank/omitted values are allowed, or
(B) keep it required but update the "endpoint_url" description and default to
clearly state that the default "https://s3.amazonaws.com" will be used for AWS
S3 (and remove the "leave blank" wording); locate the "required" array and the
"endpoint_url" property in the schema to apply the chosen change (modify the
"required" array or the "endpoint_url" property's "description" and/or
"default").
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ef8d51b7-40d9-48d6-b173-e8cd2c792231
📒 Files selected for processing (3)
unstract/connectors/src/unstract/connectors/filesystems/minio/exceptions.pyunstract/connectors/src/unstract/connectors/filesystems/minio/minio.pyunstract/connectors/src/unstract/connectors/filesystems/minio/static/json_schema.json
unstract/connectors/src/unstract/connectors/filesystems/minio/minio.py
Outdated
Show resolved
Hide resolved
unstract/connectors/src/unstract/connectors/filesystems/minio/static/json_schema.json
Show resolved
Hide resolved
unstract/connectors/src/unstract/connectors/filesystems/minio/exceptions.py
Outdated
Show resolved
Hide resolved
unstract/connectors/src/unstract/connectors/filesystems/minio/exceptions.py
Outdated
Show resolved
Hide resolved
unstract/connectors/src/unstract/connectors/filesystems/minio/exceptions.py
Outdated
Show resolved
Hide resolved
unstract/connectors/src/unstract/connectors/filesystems/minio/static/json_schema.json
Show resolved
Hide resolved
unstract/connectors/src/unstract/connectors/filesystems/minio/exceptions.py
Outdated
Show resolved
Hide resolved
unstract/connectors/src/unstract/connectors/filesystems/minio/exceptions.py
Outdated
Show resolved
Hide resolved
unstract/connectors/src/unstract/connectors/filesystems/minio/exceptions.py
Outdated
Show resolved
Hide resolved
…stack/unstract into UN-3343-Support-AWS-S3-connector-IAM
…ort-AWS-S3-connector-IAM
unstract/connectors/src/unstract/connectors/filesystems/minio/static/json_schema.json
Show resolved
Hide resolved
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@unstract/connectors/src/unstract/connectors/filesystems/minio/exceptions.py`:
- Around line 30-35: The shared AccessDenied message (_ACCESS_DENIED_MSG) is
AWS-specific (mentions IAM and s3:* actions) but is used in the common mapping
S3FS_EXC_TO_UNSTRACT_EXC_COMMON and thus can mislead non-AWS backends like
MinIO; change the string to be backend-agnostic and actionable (e.g., advise
checking bucket permissions, credentials, and access policies for the storage
backend or service) and remove the hardcoded AWS IAM action names so callers
using MinIO or other S3-compatible systems receive correct, generic permission
guidance.
- Around line 12-15: The ambient-auth fallback message in _AMBIENT_AUTH_MSG is
too IRSA-specific; update its text to reference the full boto3/default
credential chain (environment variables, shared credentials/profile, IAM role,
IRSA/service account, etc.) so guidance covers bad env/profile credentials as
well; make the same wording change for the other ambient-auth message variant
around lines 38-40 (the other AWS auth fallback constant) so both constants
consistently mention environment variables, shared credentials/profiles, IAM
roles and IRSA.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: cf6be913-3dea-4ea9-a6a0-a258ebfa37e3
📒 Files selected for processing (1)
unstract/connectors/src/unstract/connectors/filesystems/minio/exceptions.py
unstract/connectors/src/unstract/connectors/filesystems/minio/exceptions.py
Outdated
Show resolved
Hide resolved
unstract/connectors/src/unstract/connectors/filesystems/minio/exceptions.py
Outdated
Show resolved
Hide resolved
unstract/connectors/src/unstract/connectors/filesystems/minio/exceptions.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
♻️ Duplicate comments (2)
unstract/connectors/src/unstract/connectors/filesystems/minio/exceptions.py (2)
20-23:⚠️ Potential issue | 🟡 MinorBroaden ambient-auth guidance beyond IRSA-only hints.
Line 20 currently points users mainly to IAM role/IRSA, but ambient mode can also fail due to env/shared-profile credentials in the default AWS chain.
Suggested wording update
_AMBIENT_AUTH_MSG = ( - "AWS authentication failed — verify IAM role permissions " - "or IRSA service account annotation." + "AWS authentication failed — verify the active credential source in the " + "default AWS credential chain (environment variables, shared credentials/profile, " + "instance/task role, or IRSA service account annotation)." )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@unstract/connectors/src/unstract/connectors/filesystems/minio/exceptions.py` around lines 20 - 23, Update the _AMBIENT_AUTH_MSG constant to broaden ambient-auth troubleshooting beyond IRSA by mentioning other credential sources in the default AWS credential chain (environment variables, shared credentials/profile, and EC2/ECS instance roles) so users know to check env vars and shared AWS profiles as well as IAM/IRSA; modify the string value of _AMBIENT_AUTH_MSG in exceptions.py to include these additional hints while preserving the existing IAM/IRSA guidance.
38-43:⚠️ Potential issue | 🟠 MajorKeep shared
AccessDeniedguidance backend-agnostic.Line 38 is used for both S3 and MinIO (Line 74), but the current message is AWS IAM-action specific and can misdirect MinIO users.
Suggested backend-agnostic message
_ACCESS_DENIED_MSG = ( - "Access denied — the IAM user or role does not have sufficient S3 " - "permissions. Ensure the policy grants s3:ListAllMyBuckets (for connection " - "test), s3:GetObject, s3:PutObject, s3:ListBucket, and s3:DeleteObject on " - "the target bucket." + "Access denied — the configured credentials do not have sufficient " + "permissions for the target bucket. Verify the attached S3/MinIO policy " + "allows the operations required by this connector." )Also applies to: 74-77
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@unstract/connectors/src/unstract/connectors/filesystems/minio/exceptions.py` around lines 38 - 43, The current _ACCESS_DENIED_MSG is AWS IAM-specific and can confuse MinIO users; update the _ACCESS_DENIED_MSG (and any duplicate at the MinIO-specific message) to a backend-agnostic string that removes AWS IAM action names and instead states that the client lacks required S3-compatible permissions (e.g., needs list, get, put, and delete access on the target bucket and list permission for connection tests), and ensure both uses reference this single generic message variable (replace the MinIO-specific copy with the shared _ACCESS_DENIED_MSG).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@unstract/connectors/src/unstract/connectors/filesystems/minio/exceptions.py`:
- Around line 20-23: Update the _AMBIENT_AUTH_MSG constant to broaden
ambient-auth troubleshooting beyond IRSA by mentioning other credential sources
in the default AWS credential chain (environment variables, shared
credentials/profile, and EC2/ECS instance roles) so users know to check env vars
and shared AWS profiles as well as IAM/IRSA; modify the string value of
_AMBIENT_AUTH_MSG in exceptions.py to include these additional hints while
preserving the existing IAM/IRSA guidance.
- Around line 38-43: The current _ACCESS_DENIED_MSG is AWS IAM-specific and can
confuse MinIO users; update the _ACCESS_DENIED_MSG (and any duplicate at the
MinIO-specific message) to a backend-agnostic string that removes AWS IAM action
names and instead states that the client lacks required S3-compatible
permissions (e.g., needs list, get, put, and delete access on the target bucket
and list permission for connection tests), and ensure both uses reference this
single generic message variable (replace the MinIO-specific copy with the shared
_ACCESS_DENIED_MSG).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f7055e6c-8213-44ff-bbec-afe42832c261
📒 Files selected for processing (1)
unstract/connectors/src/unstract/connectors/filesystems/minio/exceptions.py
Test ResultsSummary
Runner Tests - Full Report
SDK1 Tests - Full Report
|
There was a problem hiding this comment.
♻️ Duplicate comments (1)
unstract/connectors/src/unstract/connectors/filesystems/minio/minio.py (1)
27-31:⚠️ Potential issue | 🟡 MinorHandle one-sided static credentials explicitly.
At Line 27-Line 31, providing only one of
key/secretstill silently falls back to ambient auth, which can be surprising for users who intended static auth.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@unstract/connectors/src/unstract/connectors/filesystems/minio/minio.py` around lines 27 - 31, The code silently ignores one-sided static credentials (only key or only secret) and falls back to ambient auth; update the logic in minio.py where creds is built so that if exactly one of key or secret is provided (key XOR secret) you raise a clear exception (e.g., ValueError) or return an explicit error explaining both must be supplied, rather than silently omitting the provided value; keep the existing handling for both provided (add both to creds) and for neither provided (keep ambient auth), and mention the variables key, secret and creds in your change so it’s easy to locate.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@unstract/connectors/src/unstract/connectors/filesystems/minio/minio.py`:
- Around line 27-31: The code silently ignores one-sided static credentials
(only key or only secret) and falls back to ambient auth; update the logic in
minio.py where creds is built so that if exactly one of key or secret is
provided (key XOR secret) you raise a clear exception (e.g., ValueError) or
return an explicit error explaining both must be supplied, rather than silently
omitting the provided value; keep the existing handling for both provided (add
both to creds) and for neither provided (keep ambient auth), and mention the
variables key, secret and creds in your change so it’s easy to locate.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4dec7188-f37f-4473-805f-129c53fa64ad
📒 Files selected for processing (2)
unstract/connectors/src/unstract/connectors/filesystems/minio/exceptions.pyunstract/connectors/src/unstract/connectors/filesystems/minio/minio.py
🚧 Files skipped from review as they are similar to previous changes (1)
- unstract/connectors/src/unstract/connectors/filesystems/minio/exceptions.py
|



What
Why
How
minio.py:MinioFS.__init__now strips and normaliseskey,secret, andendpoint_url. Only passes them toS3FileSystemwhen non-empty, letting boto3's default credential chain take over otherwise. Tracks a_using_static_credsflag for error messaging. Logging switched to lazy%formatting.exceptions.py: Splits the singleS3FS_EXC_TO_UNSTRACT_EXCdict into three maps —_STATIC(key/secret errors),_AMBIENT(IRSA/IAM errors),_COMMON(shared errors like port/bucket).handle_s3fs_exceptiongains ausing_static_credsparam to select the right error map.json_schema.json:keyandsecretremoved fromrequiredarray (now optional). Field descriptions updated to mention "leave blank to use IAM role / instance profile".endpoint_urldescription simplified.Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)
S3FileSystemexactly as before. The new IAM path only activates when credentials are explicitly left blank.Database Migrations
Env Config
Relevant Docs
Related Issues or PRs
Dependencies Versions
Notes on Testing
Screenshots
Checklist
I have read and understood the Contribution Guidelines.