Skip to content

feat(infra): add ECS container health checks #4502

Merged
arkid15r merged 9 commits intoOWASP:mainfrom
hassaansaleem28:fix/add-ecs-container-health
Apr 11, 2026
Merged

feat(infra): add ECS container health checks #4502
arkid15r merged 9 commits intoOWASP:mainfrom
hassaansaleem28:fix/add-ecs-container-health

Conversation

@hassaansaleem28
Copy link
Copy Markdown
Contributor

@hassaansaleem28 hassaansaleem28 commented Apr 5, 2026

Proposed change

Resolves #4483

Summary

This PR adds ECS container health checks for both backend and frontend services to resolve task health being UNKNOWN

Changes

  • Add health_check_path variable to service module.
  • Add container healthCheck in ECS task definition.
  • Set backend path to /status/ and frontend path to /api/health.
  • Add Terraform tests for health check config.

Validation

  • terraform -chdir=infrastructure/modules/service test passed.
  • make test-infrastructure passed.
  • terraform -chdir=infrastructure/live validate passed.

Local production-like container checks:

  • Backend /status/ -> 200
  • Frontend /api/health -> 200

Checklist

  • Required: I followed the contributing workflow
  • Required: I verified that my code works as intended and resolves the issue as described
  • Required: I ran make check-test locally: all warnings addressed, tests passed
  • I used AI for code, documentation, tests, or communication related to this PR

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 5, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

The PR implements ECS container health checks for frontend and backend services by configuring health check endpoints in Terraform modules, adding corresponding health check probes to Docker Compose services, creating a frontend health endpoint, updating backend baseline-reset logic, and adding an infrastructure labeler rule.

Changes

Cohort / File(s) Summary
Infrastructure - Terraform Service Module
infrastructure/modules/service/variables.tf, infrastructure/modules/service/main.tf, infrastructure/live/main.tf
Added health_check_path variable with "/" default and path validation. Configured ECS container health check using wget against http://$(hostname -i):${var.container_port}${var.health_check_path} with interval=30, retries=3, startPeriod=60, timeout=5. Updated live environment to pass health_check_path="/api/health" for frontend and "/status/" for backend.
Infrastructure - Module Tests
infrastructure/modules/service/tests/service.tftest.hcl
Added new tftest run validating health check block presence and field values. Extended backend test scenario with health_check_path="/status/" override and assertion verifying path propagation to container health check command.
Frontend Health Endpoint
frontend/src/app/api/health/route.ts
Added new Next.js API route exporting GET handler returning { status: 'ok' } JSON response.
Docker Compose Health Checks
docker-compose/e2e/compose.yaml, docker-compose/fuzz/compose.yaml
Updated backend healthcheck from /a/ to /status/ and frontend from / to /api/health.
Backend Logic & Tests
backend/apps/slack/management/commands/slack_check_invite_link.py, backend/tests/unit/apps/slack/commands/slack_check_invite_link_test.py, backend/tests/unit/apps/owasp/models/common_test.py
Modified baseline-setting logic to reset when new invite commit detected regardless of existing baseline. Added test verifying baseline recalculation on new invite commit. Updated URL parsing test to use netloc comparison instead of substring matching with stricter assertion.
GitHub Actions
.github/labeler.yml
Added infrastructure labeler rule matching changes under infrastructure/**.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

backend, backend-tests, infrastructure, frontend

Suggested reviewers

  • kasya
  • arkid15r
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Out of Scope Changes check ⚠️ Warning Some changes appear tangential: the .github/labeler.yml infrastructure label is unrelated to health checks; backend/apps/slack and backend/tests/unit changes address invite baseline logic unrelated to ECS health checks; the OWASP common_test.py modification is unrelated to health checks. Consider moving infrastructure labeler, Slack invite baseline, and OWASP common test changes to separate PRs focused on their respective objectives to keep this PR narrowly scoped on ECS health checks.
Docstring Coverage ⚠️ Warning Docstring coverage is 37.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat(infra): add ECS container health checks' directly describes the main change: adding ECS container health checks to both backend and frontend services.
Linked Issues check ✅ Passed The PR substantially fulfills issue #4483 requirements: defines HealthCheck for ECS tasks [infrastructure/modules/service/main.tf, infrastructure/live/main.tf], verifies endpoints return HTTP 200 [docker-compose configurations], and validates with Terraform tests [infrastructure/modules/service/tests/service.tftest.hcl].
Description check ✅ Passed The PR description is clearly related to the changeset, describing the addition of ECS container health checks with specific backend and frontend paths.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot]
coderabbitai bot previously approved these changes Apr 5, 2026
cubic-dev-ai[bot]
cubic-dev-ai bot previously approved these changes Apr 5, 2026
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 4 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.

@hassaansaleem28 hassaansaleem28 marked this pull request as ready for review April 5, 2026 07:47
@rudransh-shrivastava rudransh-shrivastava self-assigned this Apr 5, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
docker-compose/e2e/compose.yaml (1)

25-28: Inconsistent backend healthcheck path with Terraform configuration.

The backend healthcheck in this compose file probes /a/ (admin endpoint), but the Terraform configuration in infrastructure/live/main.tf sets health_check_path = "/status/" for the backend service. Consider updating this to use /status/ for consistency:

     test: >
       sh -c '
-        wget --spider http://backend:9000/a/
+        wget --spider http://backend:9000/status/
       '

This ensures the e2e environment validates the same health endpoint that will be used in production.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker-compose/e2e/compose.yaml` around lines 25 - 28, The healthcheck in the
compose file currently probes the admin endpoint using the test command (sh -c
'wget --spider http://backend:9000/a/'); change this to probe the same path used
in Terraform by updating the test to hit http://backend:9000/status/ so the
docker-compose service healthcheck matches the infrastructure/live/main.tf
health_check_path; adjust the wget URL in the test command accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@docker-compose/e2e/compose.yaml`:
- Around line 25-28: The healthcheck in the compose file currently probes the
admin endpoint using the test command (sh -c 'wget --spider
http://backend:9000/a/'); change this to probe the same path used in Terraform
by updating the test to hit http://backend:9000/status/ so the docker-compose
service healthcheck matches the infrastructure/live/main.tf health_check_path;
adjust the wget URL in the test command accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 54fb6add-bcae-43ae-92fa-493d35eec975

📥 Commits

Reviewing files that changed from the base of the PR and between 9691a0e and de28c71.

📒 Files selected for processing (3)
  • docker-compose/e2e/compose.yaml
  • frontend/src/app/status/route.ts
  • infrastructure/live/main.tf

coderabbitai[bot]
coderabbitai bot previously approved these changes Apr 6, 2026
cubic-dev-ai[bot]
cubic-dev-ai bot previously approved these changes Apr 6, 2026
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 issues found across 3 files (changes from recent commits).

coderabbitai[bot]
coderabbitai bot previously approved these changes Apr 6, 2026
cubic-dev-ai[bot]
cubic-dev-ai bot previously approved these changes Apr 6, 2026
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 issues found across 1 file (changes from recent commits).

Copy link
Copy Markdown
Collaborator

@rudransh-shrivastava rudransh-shrivastava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I tested the solution and it works!

Image

Just have a few changes:

test: >
sh -c '
wget --spider http://backend:9000/a/
wget --spider http://backend:9000/status/
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you may also update the fuzz docker compose to use the same endpoint.

@@ -0,0 +1,5 @@
import { NextResponse } from 'next/server'
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not completely sure, but I think its better to move this to frontend/src/app/api/

Also, If we ever expose this API publicly, we will need to rename it to prevent conflict with backend's status endpoint. Can you rename it to something like health?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that would be good.

Comment thread infrastructure/live/main.tf Outdated
domain_name = var.domain_name
environment = var.environment
frontend_health_check_path = "/"
frontend_health_check_path = "/status"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just remove this from here and update the alb variables.tf default value similar to backend:

variable "frontend_health_check_path" {
  description = "The health check path for the frontend target group."
  type        = string
  default     = "/"   --> default  = "/status" 
}

cubic-dev-ai[bot]
cubic-dev-ai bot previously approved these changes Apr 9, 2026
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 issues found across 7 files (changes from recent commits).

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
backend/tests/unit/apps/slack/commands/slack_check_invite_link_test.py (1)

122-129: Strengthen this regression test with a “no alert posted” assertion.

After re-baselining to 250 (threshold 600), this run should not send a Slack alert. Adding an explicit assertion will lock in that behavior.

✅ Suggested test addition
         call_command("slack_check_invite_link")
 
         assert fake_workspace.invite_link_created_at == new_time
         assert fake_workspace.invite_link_commit_sha == new_sha
         assert fake_workspace.invite_link_member_count == 250
         assert fake_workspace.invite_link_alert_threshold == 600
         assert fake_workspace.invite_link_last_alert_sent_at is None
+        mock_web_client_class.return_value.chat_postMessage.assert_not_called()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/tests/unit/apps/slack/commands/slack_check_invite_link_test.py`
around lines 122 - 129, Add an assertion after
call_command("slack_check_invite_link") that verifies no Slack alert was posted
for fake_workspace (e.g., assert the mocked alert poster was not called). Locate
the mock used in this test for posting alerts (the Slack posting stub or patch)
and call its assert_not_called() (or equivalent) to ensure that when
invite_link_member_count is 250 and invite_link_alert_threshold is 600, no alert
is sent.
docker-compose/fuzz/compose.yaml (1)

27-27: Enforce "no redirects" in the health probe to prevent redirects from being silently followed.

Line 27 uses wget --spider, which by default follows up to 20 redirects. If /status/ returns a 3xx redirect that chains to a 200 response, the health check will pass. To strictly reject redirects and align with expecting a direct 2xx response, use --max-redirect=0.

Proposed diff
-          wget --spider http://backend:9500/status/
+          wget --spider --max-redirect=0 http://backend:9500/status/
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker-compose/fuzz/compose.yaml` at line 27, The health check uses wget
--spider http://backend:9500/status/ which follows redirects by default; update
the probe invocation to add --max-redirect=0 so redirects are not followed and
only a direct 2xx response succeeds. Modify the command that runs wget (the line
invoking wget in the compose.yaml health probe) to include the --max-redirect=0
flag and keep existing options intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@backend/tests/unit/apps/slack/commands/slack_check_invite_link_test.py`:
- Around line 122-129: Add an assertion after
call_command("slack_check_invite_link") that verifies no Slack alert was posted
for fake_workspace (e.g., assert the mocked alert poster was not called). Locate
the mock used in this test for posting alerts (the Slack posting stub or patch)
and call its assert_not_called() (or equivalent) to ensure that when
invite_link_member_count is 250 and invite_link_alert_threshold is 600, no alert
is sent.

In `@docker-compose/fuzz/compose.yaml`:
- Line 27: The health check uses wget --spider http://backend:9500/status/ which
follows redirects by default; update the probe invocation to add
--max-redirect=0 so redirects are not followed and only a direct 2xx response
succeeds. Modify the command that runs wget (the line invoking wget in the
compose.yaml health probe) to include the --max-redirect=0 flag and keep
existing options intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b96ffa7a-2c0b-471e-8e55-7a6d6516eef5

📥 Commits

Reviewing files that changed from the base of the PR and between 4b88124 and 7b45eac.

⛔ Files ignored due to path filters (2)
  • backend/poetry.lock is excluded by !**/*.lock
  • frontend/pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (5)
  • .github/labeler.yml
  • backend/apps/slack/management/commands/slack_check_invite_link.py
  • backend/tests/unit/apps/owasp/models/common_test.py
  • backend/tests/unit/apps/slack/commands/slack_check_invite_link_test.py
  • docker-compose/fuzz/compose.yaml
✅ Files skipped from review due to trivial changes (1)
  • .github/labeler.yml

coderabbitai[bot]
coderabbitai bot previously approved these changes Apr 9, 2026
coderabbitai[bot]
coderabbitai bot previously approved these changes Apr 9, 2026
Signed-off-by: hassaansaleem28 <iamhassaans@gmail.com>
Signed-off-by: hassaansaleem28 <iamhassaans@gmail.com>
Signed-off-by: hassaansaleem28 <iamhassaans@gmail.com>
Signed-off-by: hassaansaleem28 <iamhassaans@gmail.com>
Signed-off-by: hassaansaleem28 <iamhassaans@gmail.com>
@hassaansaleem28
Copy link
Copy Markdown
Contributor Author

Hi @rudransh-shrivastava,

Thanks for your detail review. I've updated my PR with all changes addressed.
Lmk if any further changes needed.

Signed-off-by: hassaansaleem28 <iamhassaans@gmail.com>
@hassaansaleem28
Copy link
Copy Markdown
Contributor Author

I did run make check previously also but don't know why CI was failing previously.

image

cubic-dev-ai[bot]
cubic-dev-ai bot previously approved these changes Apr 9, 2026
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 issues found across 1 file (changes from recent commits).

@rudransh-shrivastava
Copy link
Copy Markdown
Collaborator

I did run make check previously also but don't know why CI was failing previously.

Ah, no issues, happens sometimes.

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.05%. Comparing base (158dbae) to head (57cbf6a).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #4502   +/-   ##
=======================================
  Coverage   99.05%   99.05%           
=======================================
  Files         527      527           
  Lines       16882    16882           
  Branches     2372     2320   -52     
=======================================
  Hits        16723    16723           
  Misses         91       91           
  Partials       68       68           
Flag Coverage Δ
backend 99.50% <ø> (ø)
frontend 97.78% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 158dbae...57cbf6a. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 issues found across 4 files (changes from recent commits).

Copy link
Copy Markdown
Collaborator

@rudransh-shrivastava rudransh-shrivastava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Thank you for working on this!

@arkid15r arkid15r enabled auto-merge April 11, 2026 01:48
@sonarqubecloud
Copy link
Copy Markdown

Copy link
Copy Markdown
Collaborator

@arkid15r arkid15r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for improving this @hassaansaleem28 👍

@arkid15r arkid15r added this pull request to the merge queue Apr 11, 2026
Merged via the queue into OWASP:main with commit c0efae7 Apr 11, 2026
38 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add Container HealthCheck to ECS Tasks

3 participants