[SECURITY] Add input validation and CSP headers#1834
[SECURITY] Add input validation and CSP headers#1834hari-kuriakose wants to merge 6 commits intomainfrom
Conversation
- Add server-side HTML/script injection validation for name and description fields across all user-facing serializers (CWE-20) - Add Content-Security-Policy header via Django middleware (API) and nginx config (frontend) to mitigate XSS and data injection attacks - Change SESSION_COOKIE_SECURE and CSRF_COOKIE_SECURE defaults to True so cookies are only sent over HTTPS Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
WalkthroughAdds regex-based input sanitizers and integrates them as field-level validators across multiple backend serializers; introduces a Content-Security-Policy middleware and adds a comprehensive CSP header to the frontend nginx configuration to emit CSP on all responses. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
backend/utils/tests/test_input_sanitizer.py (1)
7-69: Add a regression test for benignconnection=okinput.This will protect against false positives in event-handler detection.
Test addition
class TestValidateNoHtmlTags: @@ def test_rejects_event_handler_case_insensitive(self): with pytest.raises( ValidationError, match="must not contain event handler attributes" ): validate_no_html_tags("ONLOAD=alert(1)") + + def test_allows_benign_assignment_text(self): + assert validate_no_html_tags("connection=ok") == "connection=ok"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/utils/tests/test_input_sanitizer.py` around lines 7 - 69, The tests are missing a regression case for benign strings like "connection=ok" that could be mis-detected as an event-handler; add a new test method in TestValidateNoHtmlTags (e.g., test_allows_connection_equals_ok) that asserts validate_no_html_tags("connection=ok") returns "connection=ok" (no ValidationError), ensuring the event-handler detection in validate_no_html_tags doesn't yield false positives for this pattern.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/adapter_processor_v2/serializers.py`:
- Around line 32-43: The validate method in the serializer overrides parent
validation and omits calling super(), so parent logic in AuditSerializer is
bypassed; update the validate(self, data) implementation to first call validated
= super().validate(data) (or merge with returned value) and then apply the
adapter-specific checks (validate_name_field and validate_no_html_tags) on that
validated dict, finally returning the validated dict so the AuditSerializer
validations are preserved.
In `@backend/api_v2/serializers.py`:
- Around line 69-70: The validate_description method can raise TypeError if
value is None because validate_no_html_tags expects a str; update
validate_description (in the same serializer) to guard against None the same way
BaseAdapterSerializer.validate does — e.g., if value is None return value or
coerce to "" before calling validate_no_html_tags — then call
validate_no_html_tags(value, field_name="Description") so validate_no_html_tags
never receives None.
In `@backend/middleware/content_security_policy.py`:
- Around line 17-27: Change the unconditional header assignment in the
middleware so it doesn't clobber any existing route-specific CSP: instead of
directly setting response["Content-Security-Policy"], use
response.setdefault("Content-Security-Policy", <policy string>) so the default
policy (the same multi-line string currently assigned) is applied only when the
header is not already present; update the code in content_security_policy.py
where the header is set to use response.setdefault to preserve any prior CSP set
by views or earlier middleware.
In `@backend/utils/input_sanitizer.py`:
- Around line 10-20: EVENT_HANDLER_PATTERN is too broad and matches substrings
inside benign words (e.g., "connection=..."); update the regex used by
validate_no_html_tags to only match true HTML attribute contexts by changing
EVENT_HANDLER_PATTERN to require a word boundary or a preceding '<' or
whitespace before the attribute name (for example use a pattern like a
lookbehind for whitespace or '<' followed by "on" + letters and "=" with
re.IGNORECASE) so it no longer flags tokens like "connection=" while still
catching real event-handler attributes referenced in validate_no_html_tags.
---
Nitpick comments:
In `@backend/utils/tests/test_input_sanitizer.py`:
- Around line 7-69: The tests are missing a regression case for benign strings
like "connection=ok" that could be mis-detected as an event-handler; add a new
test method in TestValidateNoHtmlTags (e.g., test_allows_connection_equals_ok)
that asserts validate_no_html_tags("connection=ok") returns "connection=ok" (no
ValidationError), ensuring the event-handler detection in validate_no_html_tags
doesn't yield false positives for this pattern.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2d27dfb1-b259-4452-adef-342aba0d383a
📒 Files selected for processing (13)
backend/account_v2/serializer.pybackend/adapter_processor_v2/serializers.pybackend/api_v2/serializers.pybackend/backend/settings/base.pybackend/connector_v2/serializers.pybackend/middleware/content_security_policy.pybackend/notification_v2/serializers.pybackend/prompt_studio/prompt_studio_core_v2/serializers.pybackend/utils/input_sanitizer.pybackend/utils/tests/__init__.pybackend/utils/tests/test_input_sanitizer.pybackend/workflow_manager/workflow_v2/serializers.pyfrontend/nginx.conf
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Hari John Kuriakose <hari@zipstack.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Hari John Kuriakose <hari@zipstack.com>
for more information, see https://pre-commit.ci
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Hari John Kuriakose <hari@zipstack.com>
Frontend Lint Report (Biome)✅ All checks passed! No linting or formatting issues found. |
Test ResultsSummary
Runner Tests - Full Report
SDK1 Tests - Full Report
|
|
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
backend/utils/input_sanitizer.py (1)
10-20:⚠️ Potential issue | 🟠 MajorNarrow
EVENT_HANDLER_PATTERNfurther.This still rejects benign
on... =text such asoncall = primaryoronboarding = enabled, so valid names/descriptions can fail across every serializer using this helper. Please constrain the match to actual HTML attribute contexts or a vetted set of real DOM event names instead of anyon\w+=token.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/utils/input_sanitizer.py` around lines 10 - 20, EVENT_HANDLER_PATTERN is too broad and rejects benign tokens like "oncall" or "onboarding"; narrow it by matching only real DOM event names or actual HTML attribute contexts. Update EVENT_HANDLER_PATTERN (used by validate_no_html_tags) to either (a) match a vetted list of event names (e.g., click, change, load, submit, mouseover, keydown, input, focus, blur, etc.) as anchors like on(click|change|...) followed by optional whitespace and '=', or (b) require HTML attribute context by ensuring the token appears as an attribute (e.g., preceded by '<tag ' or whitespace within a tag) before the '='; replace the existing pattern with one of these stricter patterns and add a unit test demonstrating that "oncall = primary" and "onboarding = enabled" pass while real event attributes like "onclick=" are still rejected.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/middleware/content_security_policy.py`:
- Around line 6-22: The CSP middleware's process_response currently sets
"script-src 'self'" and "style-src 'self'", which blocks the inline <script>
used by the login page; update the header value set in process_response to
permit inline scripts/styles (e.g., add 'unsafe-inline' to both script-src and
style-src) or implement a nonce-based CSP and inject matching nonces into the
login template rendered by authentication_service.render; also update the
middleware docstring to remove the incorrect "JSON API backend" claim. Use the
process_response function and the "Content-Security-Policy" header string as the
change points, or alternatively refactor backend/account_v2/templates/login.html
to use external JS and then keep the stricter CSP.
---
Duplicate comments:
In `@backend/utils/input_sanitizer.py`:
- Around line 10-20: EVENT_HANDLER_PATTERN is too broad and rejects benign
tokens like "oncall" or "onboarding"; narrow it by matching only real DOM event
names or actual HTML attribute contexts. Update EVENT_HANDLER_PATTERN (used by
validate_no_html_tags) to either (a) match a vetted list of event names (e.g.,
click, change, load, submit, mouseover, keydown, input, focus, blur, etc.) as
anchors like on(click|change|...) followed by optional whitespace and '=', or
(b) require HTML attribute context by ensuring the token appears as an attribute
(e.g., preceded by '<tag ' or whitespace within a tag) before the '='; replace
the existing pattern with one of these stricter patterns and add a unit test
demonstrating that "oncall = primary" and "onboarding = enabled" pass while real
event attributes like "onclick=" are still rejected.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 20bde572-2bc1-45f4-a2ac-5ffc1c2e7e5f
📒 Files selected for processing (4)
backend/adapter_processor_v2/serializers.pybackend/api_v2/serializers.pybackend/middleware/content_security_policy.pybackend/utils/input_sanitizer.py
🚧 Files skipped from review as they are similar to previous changes (2)
- backend/api_v2/serializers.py
- backend/adapter_processor_v2/serializers.py
| """Middleware to add Content-Security-Policy header to all responses. | ||
|
|
||
| Since this is a JSON API backend, the policy is restrictive by default: | ||
| only 'self' is allowed for all directives, and no inline scripts or styles | ||
| are permitted. This prevents any injected content from being executed if a | ||
| response is ever rendered in a browser context. | ||
| """ | ||
|
|
||
| def process_response( | ||
| self, request: HttpRequest, response: HttpResponse | ||
| ) -> HttpResponse: | ||
| response.setdefault( | ||
| "Content-Security-Policy", | ||
| ( | ||
| "default-src 'self'; " | ||
| "script-src 'self'; " | ||
| "style-src 'self'; " |
There was a problem hiding this comment.
CSP blocks inline scripts required by the login page.
The docstring states this is a "JSON API backend" with no inline scripts permitted, but the backend serves HTML pages with inline JavaScript. Context snippet 3 shows backend/account_v2/templates/login.html contains an inline <script> block for the form submit spinner, and context snippet 4 confirms authentication_service.py renders these templates via Django's render().
The current script-src 'self' (line 21) without 'unsafe-inline' will block that inline script, breaking the login page functionality.
🐛 Proposed fix: Add 'unsafe-inline' for script-src and style-src
def process_response(
self, request: HttpRequest, response: HttpResponse
) -> HttpResponse:
response.setdefault(
"Content-Security-Policy",
(
"default-src 'self'; "
- "script-src 'self'; "
- "style-src 'self'; "
+ "script-src 'self' 'unsafe-inline'; "
+ "style-src 'self' 'unsafe-inline'; "
"img-src 'self'; "
"font-src 'self'; "
"connect-src 'self'; "
"frame-ancestors 'none'; "
"base-uri 'self'; "
"form-action 'self'"
),
)
return responseAlternatively, refactor the login template to use an external JS file or add a nonce-based CSP for stronger security. Also update the docstring to remove the misleading "JSON API backend" claim.
The AI summary states script-src 'self' 'unsafe-inline' but the actual code at line 21 shows script-src 'self' without 'unsafe-inline'.
🧰 Tools
🪛 Ruff (0.15.4)
[warning] 15-15: Unused method argument: request
(ARG002)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/middleware/content_security_policy.py` around lines 6 - 22, The CSP
middleware's process_response currently sets "script-src 'self'" and "style-src
'self'", which blocks the inline <script> used by the login page; update the
header value set in process_response to permit inline scripts/styles (e.g., add
'unsafe-inline' to both script-src and style-src) or implement a nonce-based CSP
and inject matching nonces into the login template rendered by
authentication_service.render; also update the middleware docstring to remove
the incorrect "JSON API backend" claim. Use the process_response function and
the "Content-Security-Policy" header string as the change points, or
alternatively refactor backend/account_v2/templates/login.html to use external
JS and then keep the stricter CSP.
| add_header Content-Security-Policy | ||
| "default-src 'self'; " | ||
| "script-src 'self' 'unsafe-inline' https://cdn.jsdelivr.net https://unpkg.com https://eu.i.posthog.com https://eu-assets.i.posthog.com https://www.googletagmanager.com https://www.google.com/recaptcha/ https://www.gstatic.com/recaptcha/ https://js.stripe.com https://app.productfruits.com; " | ||
| "style-src 'self' 'unsafe-inline'; " | ||
| "img-src 'self' data: blob: https:; " | ||
| "font-src 'self' data:; " | ||
| "connect-src 'self' ws: wss: https://eu.i.posthog.com https://eu-assets.i.posthog.com https://www.google-analytics.com https://api.stripe.com https://app.productfruits.com; " | ||
| "frame-src 'self' https://www.google.com/recaptcha/ https://recaptcha.google.com https://js.stripe.com https://hooks.stripe.com; " | ||
| "worker-src 'self' blob: https://unpkg.com https://cdn.jsdelivr.net; " | ||
| "object-src 'none'; " | ||
| "base-uri 'self'; " | ||
| "form-action 'self' https://checkout.stripe.com; " | ||
| "frame-ancestors 'self'" | ||
| always; |
There was a problem hiding this comment.
@hari-kuriakose I hope this covers everything we use. @vishnuszipstack please check. I believe you had some report last time
Greptile SummaryThis PR adds two complementary security improvements: server-side input sanitization (rejecting HTML/script injection in user-facing name and description fields) and Content-Security-Policy headers on both the Django API backend and the frontend nginx server. Key changes:
Issues found:
Confidence Score: 3/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[API Request] --> B[Django Middleware Stack]
B --> C[ContentSecurityPolicyMiddleware]
C --> D[Set CSP Header on Response]
A --> E[Serializer Field Validation]
E --> F{Field Type?}
F -->|name / display_name| G[validate_name_field]
F -->|description| H[validate_no_html_tags]
G --> I[strip whitespace]
I --> J{Empty after strip?}
J -->|yes| K[ValidationError: must not be empty]
J -->|no| H
H --> L{HTML_TAG_PATTERN match?}
L -->|yes| M[ValidationError: HTML/script tags]
L -->|no| N{JS_PROTOCOL_PATTERN match?}
N -->|yes| O[ValidationError: JS protocol]
N -->|no| P{EVENT_HANDLER_PATTERN match?}
P -->|yes| Q[ValidationError: event handler]
P -->|no| R[Return validated value]
subgraph Serializers
S1[WorkflowSerializer]
S2[APIDeploymentSerializer]
S3[ConnectorInstanceSerializer]
S4[BaseAdapterSerializer]
S5[CustomToolSerializer]
S6[NotificationSerializer]
S7[OrganizationSignupSerializer]
end
E --> Serializers
Prompt To Fix All With AIThis is a comment left during a code review.
Path: backend/workflow_manager/workflow_v2/serializers.py
Line: 53-54
Comment:
**Missing `None` guard — potential `TypeError` on nullable `description`**
In DRF, when a field has `allow_null=True`, `validate_<field>` is still called with `None` after the field returns its value. Calling `validate_no_html_tags(None, ...)` will propagate `None` into `HTML_TAG_PATTERN.search(None)`, which raises `TypeError: expected string or bytes-like object`.
The exact same `description` field is handled safely in `api_v2/serializers.py`:
```python
def validate_description(self, value: str) -> str:
if value is None:
return value
return validate_no_html_tags(value, field_name="Description")
```
This protection is missing in both `WorkflowSerializer.validate_description` (here) and `CustomToolSerializer.validate_description` in `prompt_studio/prompt_studio_core_v2/serializers.py:58`.
```suggestion
def validate_description(self, value: str) -> str:
if value is None:
return value
return validate_no_html_tags(value, field_name="Description")
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: backend/prompt_studio/prompt_studio_core_v2/serializers.py
Line: 58-59
Comment:
**Missing `None` guard — potential `TypeError` on nullable `description`**
Same issue as `workflow_v2/serializers.py:53` — `validate_description` does not guard against `None` before calling `validate_no_html_tags`. If the `description` field accepts `null` values, DRF will invoke this method with `None`, and `HTML_TAG_PATTERN.search(None)` will raise a `TypeError`.
```suggestion
def validate_description(self, value: str) -> str:
if value is None:
return value
return validate_no_html_tags(value, field_name="Description")
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: frontend/nginx.conf
Line: 62
Comment:
**`ws:` in `connect-src` allows unencrypted WebSocket to any origin**
`ws:` is a scheme-only wildcard that permits WebSocket connections to **any host** over plaintext. In production, you almost certainly only need secure WebSockets (`wss:`). Leaving `ws:` here means a script running in the browser could open `ws://any-attacker-host/exfil` and it would not be blocked by CSP.
Consider tightening to just `wss:` (or even scoping it to specific origins like `wss://your-api-host`) to remove the plaintext connection escape hatch:
```suggestion
"connect-src 'self' wss: https://eu.i.posthog.com https://eu-assets.i.posthog.com https://www.google-analytics.com https://api.stripe.com https://app.productfruits.com; "
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: backend/middleware/content_security_policy.py
Line: 19-29
Comment:
**`object-src` not explicitly restricted in backend CSP**
`object-src` controls whether `<object>`, `<embed>`, and `<applet>` elements (Flash/Java plugins) can load content. It falls back to `default-src 'self'` here rather than being explicitly denied. The frontend nginx.conf correctly sets `object-src 'none'`, but the backend middleware omits it.
Best practice (and the OWASP CSP cheat sheet recommendation) is to explicitly set `object-src 'none'` on all policies since plugin-based content is almost never intentional in a JSON API context:
```suggestion
(
"default-src 'self'; "
"script-src 'self'; "
"style-src 'self'; "
"img-src 'self'; "
"font-src 'self'; "
"connect-src 'self'; "
"object-src 'none'; "
"frame-ancestors 'none'; "
"base-uri 'self'; "
"form-action 'self'"
),
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: backend/utils/input_sanitizer.py
Line: 10
Comment:
**`EVENT_HANDLER_PATTERN` misses event handlers preceded by non-whitespace delimiters**
The pattern `(?:^|\s)on\w+\s*=` only matches `on<handler>=` when it appears at the start of the string or is preceded by whitespace (`\s`). Non-whitespace delimiters like `;`, `&`, or `"` won't match. For example:
```
"foo;onclick=bar" # NOT caught
"foo&onerror=alert(1)" # NOT caught
```
In practice, event handlers need to be inside an HTML tag to be executable (e.g., `<img onerror=...>`), and HTML tags are already blocked by `HTML_TAG_PATTERN`. However, since this pattern is also tested in isolation as a defense-in-depth check, the gap could create a false sense of complete coverage.
Consider anchoring the `on` prefix more broadly:
```python
EVENT_HANDLER_PATTERN = re.compile(r"\bon\w+\s*=", re.IGNORECASE)
```
The word boundary `\b` would still require the `on` to start at a word boundary (after non-word characters like `;`, `&`, space, or start of string), but would not require whitespace specifically.
How can I resolve this? If you propose a fix, please make it concise.Last reviewed commit: "Apply suggestions fr..." |
| def validate_description(self, value: str) -> str: | ||
| return validate_no_html_tags(value, field_name="Description") |
There was a problem hiding this comment.
Missing
None guard — potential TypeError on nullable description
In DRF, when a field has allow_null=True, validate_<field> is still called with None after the field returns its value. Calling validate_no_html_tags(None, ...) will propagate None into HTML_TAG_PATTERN.search(None), which raises TypeError: expected string or bytes-like object.
The exact same description field is handled safely in api_v2/serializers.py:
def validate_description(self, value: str) -> str:
if value is None:
return value
return validate_no_html_tags(value, field_name="Description")This protection is missing in both WorkflowSerializer.validate_description (here) and CustomToolSerializer.validate_description in prompt_studio/prompt_studio_core_v2/serializers.py:58.
| def validate_description(self, value: str) -> str: | |
| return validate_no_html_tags(value, field_name="Description") | |
| def validate_description(self, value: str) -> str: | |
| if value is None: | |
| return value | |
| return validate_no_html_tags(value, field_name="Description") |
Prompt To Fix With AI
This is a comment left during a code review.
Path: backend/workflow_manager/workflow_v2/serializers.py
Line: 53-54
Comment:
**Missing `None` guard — potential `TypeError` on nullable `description`**
In DRF, when a field has `allow_null=True`, `validate_<field>` is still called with `None` after the field returns its value. Calling `validate_no_html_tags(None, ...)` will propagate `None` into `HTML_TAG_PATTERN.search(None)`, which raises `TypeError: expected string or bytes-like object`.
The exact same `description` field is handled safely in `api_v2/serializers.py`:
```python
def validate_description(self, value: str) -> str:
if value is None:
return value
return validate_no_html_tags(value, field_name="Description")
```
This protection is missing in both `WorkflowSerializer.validate_description` (here) and `CustomToolSerializer.validate_description` in `prompt_studio/prompt_studio_core_v2/serializers.py:58`.
```suggestion
def validate_description(self, value: str) -> str:
if value is None:
return value
return validate_no_html_tags(value, field_name="Description")
```
How can I resolve this? If you propose a fix, please make it concise.| def validate_description(self, value: str) -> str: | ||
| return validate_no_html_tags(value, field_name="Description") |
There was a problem hiding this comment.
Missing
None guard — potential TypeError on nullable description
Same issue as workflow_v2/serializers.py:53 — validate_description does not guard against None before calling validate_no_html_tags. If the description field accepts null values, DRF will invoke this method with None, and HTML_TAG_PATTERN.search(None) will raise a TypeError.
| def validate_description(self, value: str) -> str: | |
| return validate_no_html_tags(value, field_name="Description") | |
| def validate_description(self, value: str) -> str: | |
| if value is None: | |
| return value | |
| return validate_no_html_tags(value, field_name="Description") |
Prompt To Fix With AI
This is a comment left during a code review.
Path: backend/prompt_studio/prompt_studio_core_v2/serializers.py
Line: 58-59
Comment:
**Missing `None` guard — potential `TypeError` on nullable `description`**
Same issue as `workflow_v2/serializers.py:53` — `validate_description` does not guard against `None` before calling `validate_no_html_tags`. If the `description` field accepts `null` values, DRF will invoke this method with `None`, and `HTML_TAG_PATTERN.search(None)` will raise a `TypeError`.
```suggestion
def validate_description(self, value: str) -> str:
if value is None:
return value
return validate_no_html_tags(value, field_name="Description")
```
How can I resolve this? If you propose a fix, please make it concise.| "style-src 'self' 'unsafe-inline'; " | ||
| "img-src 'self' data: blob: https:; " | ||
| "font-src 'self' data:; " | ||
| "connect-src 'self' ws: wss: https://eu.i.posthog.com https://eu-assets.i.posthog.com https://www.google-analytics.com https://api.stripe.com https://app.productfruits.com; " |
There was a problem hiding this comment.
ws: in connect-src allows unencrypted WebSocket to any origin
ws: is a scheme-only wildcard that permits WebSocket connections to any host over plaintext. In production, you almost certainly only need secure WebSockets (wss:). Leaving ws: here means a script running in the browser could open ws://any-attacker-host/exfil and it would not be blocked by CSP.
Consider tightening to just wss: (or even scoping it to specific origins like wss://your-api-host) to remove the plaintext connection escape hatch:
| "connect-src 'self' ws: wss: https://eu.i.posthog.com https://eu-assets.i.posthog.com https://www.google-analytics.com https://api.stripe.com https://app.productfruits.com; " | |
| "connect-src 'self' wss: https://eu.i.posthog.com https://eu-assets.i.posthog.com https://www.google-analytics.com https://api.stripe.com https://app.productfruits.com; " |
Prompt To Fix With AI
This is a comment left during a code review.
Path: frontend/nginx.conf
Line: 62
Comment:
**`ws:` in `connect-src` allows unencrypted WebSocket to any origin**
`ws:` is a scheme-only wildcard that permits WebSocket connections to **any host** over plaintext. In production, you almost certainly only need secure WebSockets (`wss:`). Leaving `ws:` here means a script running in the browser could open `ws://any-attacker-host/exfil` and it would not be blocked by CSP.
Consider tightening to just `wss:` (or even scoping it to specific origins like `wss://your-api-host`) to remove the plaintext connection escape hatch:
```suggestion
"connect-src 'self' wss: https://eu.i.posthog.com https://eu-assets.i.posthog.com https://www.google-analytics.com https://api.stripe.com https://app.productfruits.com; "
```
How can I resolve this? If you propose a fix, please make it concise.| ( | ||
| "default-src 'self'; " | ||
| "script-src 'self'; " | ||
| "style-src 'self'; " | ||
| "img-src 'self'; " | ||
| "font-src 'self'; " | ||
| "connect-src 'self'; " | ||
| "frame-ancestors 'none'; " | ||
| "base-uri 'self'; " | ||
| "form-action 'self'" | ||
| ), |
There was a problem hiding this comment.
object-src not explicitly restricted in backend CSP
object-src controls whether <object>, <embed>, and <applet> elements (Flash/Java plugins) can load content. It falls back to default-src 'self' here rather than being explicitly denied. The frontend nginx.conf correctly sets object-src 'none', but the backend middleware omits it.
Best practice (and the OWASP CSP cheat sheet recommendation) is to explicitly set object-src 'none' on all policies since plugin-based content is almost never intentional in a JSON API context:
| ( | |
| "default-src 'self'; " | |
| "script-src 'self'; " | |
| "style-src 'self'; " | |
| "img-src 'self'; " | |
| "font-src 'self'; " | |
| "connect-src 'self'; " | |
| "frame-ancestors 'none'; " | |
| "base-uri 'self'; " | |
| "form-action 'self'" | |
| ), | |
| ( | |
| "default-src 'self'; " | |
| "script-src 'self'; " | |
| "style-src 'self'; " | |
| "img-src 'self'; " | |
| "font-src 'self'; " | |
| "connect-src 'self'; " | |
| "object-src 'none'; " | |
| "frame-ancestors 'none'; " | |
| "base-uri 'self'; " | |
| "form-action 'self'" | |
| ), |
Prompt To Fix With AI
This is a comment left during a code review.
Path: backend/middleware/content_security_policy.py
Line: 19-29
Comment:
**`object-src` not explicitly restricted in backend CSP**
`object-src` controls whether `<object>`, `<embed>`, and `<applet>` elements (Flash/Java plugins) can load content. It falls back to `default-src 'self'` here rather than being explicitly denied. The frontend nginx.conf correctly sets `object-src 'none'`, but the backend middleware omits it.
Best practice (and the OWASP CSP cheat sheet recommendation) is to explicitly set `object-src 'none'` on all policies since plugin-based content is almost never intentional in a JSON API context:
```suggestion
(
"default-src 'self'; "
"script-src 'self'; "
"style-src 'self'; "
"img-src 'self'; "
"font-src 'self'; "
"connect-src 'self'; "
"object-src 'none'; "
"frame-ancestors 'none'; "
"base-uri 'self'; "
"form-action 'self'"
),
```
How can I resolve this? If you propose a fix, please make it concise.| # Pattern to detect javascript: protocol | ||
| JS_PROTOCOL_PATTERN = re.compile(r"javascript\s*:", re.IGNORECASE) | ||
| # Pattern to detect event handlers (onclick, onerror, etc.) | ||
| EVENT_HANDLER_PATTERN = re.compile(r"(?:^|\s)on\w+\s*=", re.IGNORECASE) |
There was a problem hiding this comment.
EVENT_HANDLER_PATTERN misses event handlers preceded by non-whitespace delimiters
The pattern (?:^|\s)on\w+\s*= only matches on<handler>= when it appears at the start of the string or is preceded by whitespace (\s). Non-whitespace delimiters like ;, &, or " won't match. For example:
"foo;onclick=bar" # NOT caught
"foo&onerror=alert(1)" # NOT caught
In practice, event handlers need to be inside an HTML tag to be executable (e.g., <img onerror=...>), and HTML tags are already blocked by HTML_TAG_PATTERN. However, since this pattern is also tested in isolation as a defense-in-depth check, the gap could create a false sense of complete coverage.
Consider anchoring the on prefix more broadly:
EVENT_HANDLER_PATTERN = re.compile(r"\bon\w+\s*=", re.IGNORECASE)The word boundary \b would still require the on to start at a word boundary (after non-word characters like ;, &, space, or start of string), but would not require whitespace specifically.
Prompt To Fix With AI
This is a comment left during a code review.
Path: backend/utils/input_sanitizer.py
Line: 10
Comment:
**`EVENT_HANDLER_PATTERN` misses event handlers preceded by non-whitespace delimiters**
The pattern `(?:^|\s)on\w+\s*=` only matches `on<handler>=` when it appears at the start of the string or is preceded by whitespace (`\s`). Non-whitespace delimiters like `;`, `&`, or `"` won't match. For example:
```
"foo;onclick=bar" # NOT caught
"foo&onerror=alert(1)" # NOT caught
```
In practice, event handlers need to be inside an HTML tag to be executable (e.g., `<img onerror=...>`), and HTML tags are already blocked by `HTML_TAG_PATTERN`. However, since this pattern is also tested in isolation as a defense-in-depth check, the gap could create a false sense of complete coverage.
Consider anchoring the `on` prefix more broadly:
```python
EVENT_HANDLER_PATTERN = re.compile(r"\bon\w+\s*=", re.IGNORECASE)
```
The word boundary `\b` would still require the `on` to start at a word boundary (after non-word characters like `;`, `&`, space, or start of string), but would not require whitespace specifically.
How can I resolve this? If you propose a fix, please make it concise.


What
ContentSecurityPolicyMiddlewareto the middleware stackWhy
How
Input validation (
backend/utils/input_sanitizer.py):validate_name_field()andvalidate_no_html_tags()validators that reject HTML tags,javascript:protocols, and event handler attributes (onclick=, etc.)validate_<field>methods to serializers for all user-facing name/identifier and description fields:APIDeploymentSerializer—display_name,descriptionWorkflowSerializer—workflow_name,descriptionBaseAdapterSerializer—adapter_name,description(viavalidate()override)ConnectorInstanceSerializer—connector_nameNotificationSerializer—name(enhanced existing validator)CustomToolSerializer—tool_name,descriptionOrganizationSignupSerializer—name,display_nameprompt,preamble,postamble,summarize_prompt) are intentionally not validated — they must accept arbitrary text including code snippets for LLM extraction workflowsCSP headers:
backend/middleware/content_security_policy.py): Strict policy for the JSON API backend — all directives set to'self',frame-ancestors 'none'. Addedmiddleware.content_security_policy.ContentSecurityPolicyMiddlewareto thePRODUCTION_MIDDLEWARElist.frontend/nginx.conf): Policy tailored to the React SPA's third-party dependencies — Monaco Editor (cdn.jsdelivr.net), PDF.js (unpkg.com), PostHog, Google Tag Manager, reCAPTCHA, Stripe, Product Fruits, and WebSocket connections for Socket.IOCan this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)
<and>characters (e.g.My Workflow <v2>). Normal names with spaces, hyphens, underscores, parentheses, periods, and other punctuation are unaffected. Prompt fields are not validated.Database Migrations
Relevant Docs
Related Issues or PRs
Dependencies Versions
Dependencies Versions
Notes on Testing
backend/utils/tests/test_input_sanitizer.pycovering clean input, HTML tags, script tags, JS protocols, event handlers, whitespace stripping, and empty string rejectioncd backend && python -m pytest utils/tests/test_input_sanitizer.py -v<script>alert(1)</script>via API — should return 400Content-Security-Policyheader present in both API and frontend responsesScreenshots
N/A — backend/infrastructure changes only
Checklist
I have read and understood the Contribution Guidelines.