feat(infra): add ECS container health checks #4502
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughThe PR implements ECS container health checks for frontend and backend services by configuring health check endpoints in Terraform modules, adding corresponding health check probes to Docker Compose services, creating a frontend health endpoint, updating backend baseline-reset logic, and adding an infrastructure labeler rule. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
de28c71
There was a problem hiding this comment.
🧹 Nitpick comments (1)
docker-compose/e2e/compose.yaml (1)
25-28: Inconsistent backend healthcheck path with Terraform configuration.The backend healthcheck in this compose file probes
/a/(admin endpoint), but the Terraform configuration ininfrastructure/live/main.tfsetshealth_check_path = "/status/"for the backend service. Consider updating this to use/status/for consistency:test: > sh -c ' - wget --spider http://backend:9000/a/ + wget --spider http://backend:9000/status/ 'This ensures the e2e environment validates the same health endpoint that will be used in production.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docker-compose/e2e/compose.yaml` around lines 25 - 28, The healthcheck in the compose file currently probes the admin endpoint using the test command (sh -c 'wget --spider http://backend:9000/a/'); change this to probe the same path used in Terraform by updating the test to hit http://backend:9000/status/ so the docker-compose service healthcheck matches the infrastructure/live/main.tf health_check_path; adjust the wget URL in the test command accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@docker-compose/e2e/compose.yaml`:
- Around line 25-28: The healthcheck in the compose file currently probes the
admin endpoint using the test command (sh -c 'wget --spider
http://backend:9000/a/'); change this to probe the same path used in Terraform
by updating the test to hit http://backend:9000/status/ so the docker-compose
service healthcheck matches the infrastructure/live/main.tf health_check_path;
adjust the wget URL in the test command accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 54fb6add-bcae-43ae-92fa-493d35eec975
📒 Files selected for processing (3)
docker-compose/e2e/compose.yamlfrontend/src/app/status/route.tsinfrastructure/live/main.tf
4b88124
| test: > | ||
| sh -c ' | ||
| wget --spider http://backend:9000/a/ | ||
| wget --spider http://backend:9000/status/ |
There was a problem hiding this comment.
you may also update the fuzz docker compose to use the same endpoint.
| @@ -0,0 +1,5 @@ | |||
| import { NextResponse } from 'next/server' | |||
There was a problem hiding this comment.
I am not completely sure, but I think its better to move this to frontend/src/app/api/
Also, If we ever expose this API publicly, we will need to rename it to prevent conflict with backend's status endpoint. Can you rename it to something like health?
There was a problem hiding this comment.
that would be good.
| domain_name = var.domain_name | ||
| environment = var.environment | ||
| frontend_health_check_path = "/" | ||
| frontend_health_check_path = "/status" |
There was a problem hiding this comment.
Let's just remove this from here and update the alb variables.tf default value similar to backend:
variable "frontend_health_check_path" {
description = "The health check path for the frontend target group."
type = string
default = "/" --> default = "/status"
}
7b45eac
There was a problem hiding this comment.
🧹 Nitpick comments (2)
backend/tests/unit/apps/slack/commands/slack_check_invite_link_test.py (1)
122-129: Strengthen this regression test with a “no alert posted” assertion.After re-baselining to 250 (threshold 600), this run should not send a Slack alert. Adding an explicit assertion will lock in that behavior.
✅ Suggested test addition
call_command("slack_check_invite_link") assert fake_workspace.invite_link_created_at == new_time assert fake_workspace.invite_link_commit_sha == new_sha assert fake_workspace.invite_link_member_count == 250 assert fake_workspace.invite_link_alert_threshold == 600 assert fake_workspace.invite_link_last_alert_sent_at is None + mock_web_client_class.return_value.chat_postMessage.assert_not_called()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/tests/unit/apps/slack/commands/slack_check_invite_link_test.py` around lines 122 - 129, Add an assertion after call_command("slack_check_invite_link") that verifies no Slack alert was posted for fake_workspace (e.g., assert the mocked alert poster was not called). Locate the mock used in this test for posting alerts (the Slack posting stub or patch) and call its assert_not_called() (or equivalent) to ensure that when invite_link_member_count is 250 and invite_link_alert_threshold is 600, no alert is sent.docker-compose/fuzz/compose.yaml (1)
27-27: Enforce "no redirects" in the health probe to prevent redirects from being silently followed.Line 27 uses
wget --spider, which by default follows up to 20 redirects. If/status/returns a 3xx redirect that chains to a 200 response, the health check will pass. To strictly reject redirects and align with expecting a direct 2xx response, use--max-redirect=0.Proposed diff
- wget --spider http://backend:9500/status/ + wget --spider --max-redirect=0 http://backend:9500/status/🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docker-compose/fuzz/compose.yaml` at line 27, The health check uses wget --spider http://backend:9500/status/ which follows redirects by default; update the probe invocation to add --max-redirect=0 so redirects are not followed and only a direct 2xx response succeeds. Modify the command that runs wget (the line invoking wget in the compose.yaml health probe) to include the --max-redirect=0 flag and keep existing options intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@backend/tests/unit/apps/slack/commands/slack_check_invite_link_test.py`:
- Around line 122-129: Add an assertion after
call_command("slack_check_invite_link") that verifies no Slack alert was posted
for fake_workspace (e.g., assert the mocked alert poster was not called). Locate
the mock used in this test for posting alerts (the Slack posting stub or patch)
and call its assert_not_called() (or equivalent) to ensure that when
invite_link_member_count is 250 and invite_link_alert_threshold is 600, no alert
is sent.
In `@docker-compose/fuzz/compose.yaml`:
- Line 27: The health check uses wget --spider http://backend:9500/status/ which
follows redirects by default; update the probe invocation to add
--max-redirect=0 so redirects are not followed and only a direct 2xx response
succeeds. Modify the command that runs wget (the line invoking wget in the
compose.yaml health probe) to include the --max-redirect=0 flag and keep
existing options intact.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b96ffa7a-2c0b-471e-8e55-7a6d6516eef5
⛔ Files ignored due to path filters (2)
backend/poetry.lockis excluded by!**/*.lockfrontend/pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (5)
.github/labeler.ymlbackend/apps/slack/management/commands/slack_check_invite_link.pybackend/tests/unit/apps/owasp/models/common_test.pybackend/tests/unit/apps/slack/commands/slack_check_invite_link_test.pydocker-compose/fuzz/compose.yaml
✅ Files skipped from review due to trivial changes (1)
- .github/labeler.yml
28a1809
Signed-off-by: hassaansaleem28 <iamhassaans@gmail.com>
Signed-off-by: hassaansaleem28 <iamhassaans@gmail.com>
Signed-off-by: hassaansaleem28 <iamhassaans@gmail.com>
Signed-off-by: hassaansaleem28 <iamhassaans@gmail.com>
Signed-off-by: hassaansaleem28 <iamhassaans@gmail.com>
c0ab71d to
5c65ea2
Compare
|
Thanks for your detail review. I've updated my PR with all changes addressed. |
Signed-off-by: hassaansaleem28 <iamhassaans@gmail.com>
Ah, no issues, happens sometimes. |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #4502 +/- ##
=======================================
Coverage 99.05% 99.05%
=======================================
Files 527 527
Lines 16882 16882
Branches 2372 2320 -52
=======================================
Hits 16723 16723
Misses 91 91
Partials 68 68
Flags with carried forward coverage won't be shown. Click here to find out more. Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
rudransh-shrivastava
left a comment
There was a problem hiding this comment.
LGTM!
Thank you for working on this!
|
arkid15r
left a comment
There was a problem hiding this comment.
Thanks for improving this @hassaansaleem28 👍





Proposed change
Resolves #4483
Summary
This PR adds ECS container health checks for both backend and frontend services to resolve task health being UNKNOWN
Changes
Validation
terraform -chdir=infrastructure/modules/servicetest passed.make test-infrastructurepassed.terraform -chdir=infrastructure/live validatepassed.Local production-like container checks:
Checklist
make check-testlocally: all warnings addressed, tests passed