Skip to content

added HealthCheck container to ECS Tasks#4512

Closed
anirudhprmar wants to merge 8 commits intoOWASP:mainfrom
anirudhprmar:feature/add-container-to-ecs-tasks
Closed

added HealthCheck container to ECS Tasks#4512
anirudhprmar wants to merge 8 commits intoOWASP:mainfrom
anirudhprmar:feature/add-container-to-ecs-tasks

Conversation

@anirudhprmar
Copy link
Copy Markdown
Contributor

@anirudhprmar anirudhprmar commented Apr 6, 2026

Proposed change

Resolves #4483

At Container-level added a ECS container health check
Defined container HealthCheck for both frontend and backend tasks.

Checklist

  • Required: I followed the contributing workflow
  • Required: I verified that my code works as intended and resolves the issue as described
  • Required: I ran make check-test locally: all warnings addressed, tests passed
  • I used AI for code, documentation, tests, or communication related to this PR

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 6, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds optional container health-check variables to the service module, conditionally injects an ECS healthCheck block into container definitions, wires those variables into live module calls for frontend/backend, updates tests, and installs curl/wget in the respective runtime images.

Changes

Cohort / File(s) Summary
Service module variables
infrastructure/modules/service/variables.tf
Added inputs: health_check_command (list(string), nullable), health_check_interval (number, default 30), health_check_retries (number, default 3), health_check_start_period (number, default 60), health_check_timeout (number, default 5).
Service module container definition
infrastructure/modules/service/main.tf
Conditionally merges an ECS healthCheck object into local.container_definition when var.health_check_command is non-null; maps command, interval, retries, startPeriod, and timeout from the new variables.
Live module invocations
infrastructure/live/main.tf
module "backend" and module "frontend" calls updated to pass health-check inputs. Backend uses curl -f http://localhost:8000/health/; frontend uses wget --spider http://localhost:3000/; timing fields set to 30/3/60/5.
Runtime images
docker/backend/Dockerfile, docker/frontend/Dockerfile
Installed CLI tools used by health checks: added curl to backend runtime image and wget to frontend runtime image.
Module tests
infrastructure/modules/service/tests/service.tftest.hcl
Added override_resource blocks for deterministic attributes and two tests: one asserting healthCheck is null by default, another asserting healthCheck fields exist and match provided variables when set.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • arkid15r
  • kasya
  • cubic-dev-ai
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: adding ECS container HealthCheck configuration for both frontend and backend tasks.
Description check ✅ Passed The description is relevant to the changeset, mentioning container-level HealthCheck addition for frontend and backend tasks and linking to issue #4483.
Linked Issues check ✅ Passed The PR implements all core objectives from #4483: defines HealthCheck for frontend and backend tasks, implements health-check command configuration (curl for backend, wget for frontend), adds health-check parameters (interval, retries, startPeriod, timeout), and includes comprehensive tests.
Out of Scope Changes check ✅ Passed All changes are directly scoped to implementing ECS container HealthCheck configuration: no unrelated modifications detected beyond health-check configuration and supporting test infrastructure.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@anirudhprmar anirudhprmar marked this pull request as draft April 6, 2026 19:46
@anirudhprmar
Copy link
Copy Markdown
Contributor Author

will be adding tests soon

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
infrastructure/modules/service/variables.tf (1)

30-52: Consider adding validation blocks for health check parameters.

The health check variables lack validation to ensure values fall within AWS ECS allowed ranges. Invalid values would cause Terraform apply failures with potentially unclear AWS API errors.

AWS ECS health check constraints:

  • interval: 5–300 seconds
  • timeout: 2–120 seconds
  • retries: 1–10
  • startPeriod: 0–300 seconds
🛡️ Proposed validation blocks
 variable "health_check_interval" {
   description = "Time period between each health check execution (seconds)"
   type        = number
   default     = 30
+
+  validation {
+    condition     = var.health_check_interval >= 5 && var.health_check_interval <= 300
+    error_message = "health_check_interval must be between 5 and 300 seconds."
+  }
 }

 variable "health_check_retries" {
   description = "The number of consecutive failed health checks before marking container unhealthy"
   type        = number
   default     = 3
+
+  validation {
+    condition     = var.health_check_retries >= 1 && var.health_check_retries <= 10
+    error_message = "health_check_retries must be between 1 and 10."
+  }
 }

 variable "health_check_start_period" {
   description = "Grace period (seconds) before health check starts"
   type        = number
   default     = 60
+
+  validation {
+    condition     = var.health_check_start_period >= 0 && var.health_check_start_period <= 300
+    error_message = "health_check_start_period must be between 0 and 300 seconds."
+  }
 }

 variable "health_check_timeout" {
   description = "Time period to wait for successful health check before marking container unhealthy"
   type        = number
   default     = 5
+
+  validation {
+    condition     = var.health_check_timeout >= 2 && var.health_check_timeout <= 120
+    error_message = "health_check_timeout must be between 2 and 120 seconds."
+  }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@infrastructure/modules/service/variables.tf` around lines 30 - 52, Add
validation blocks to each health check variable (health_check_interval,
health_check_timeout, health_check_retries, health_check_start_period) to
enforce AWS ECS allowed ranges: interval 5–300, timeout 2–120, retries 1–10,
start_period 0–300; for each variable add a validation with a condition
expression that checks the numeric range and a clear error message explaining
the valid range so Terraform will fail fast with a helpful message instead of
AWS API errors.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@infrastructure/live/main.tf`:
- Around line 129-133: The ECS health_check_command uses "wget --spider" but the
production frontend image (Alpine) doesn't include wget, causing "command not
found" failures; update health_check_command in main.tf to call a binary present
in the image and target a stable health endpoint (e.g., replace ["CMD-SHELL",
"wget --spider http://localhost:3000/ || exit 1"] with ["CMD-SHELL", "curl -f
http://localhost:3000/healthz || exit 1"] or similar), or alternatively install
wget in the frontend image Dockerfile by adding apk add --no-cache wget (or
curl) so health_check_command works; ensure the path uses a dedicated endpoint
like /healthz or /health to avoid redirects.
- Around line 62-66: The ECS health checks use health_check_command and require
curl/wget but the backend and frontend Alpine images do not install them; update
the Dockerfile RUN apk add line(s) that install packages (the RUN command that
currently installs make and postgresql16-client in the backend image) to include
curl (e.g., add curl to the apk add list) and likewise add wget to the frontend
image's apk add invocation so the health_check_command (and the frontend
equivalent) succeed; ensure you preserve the existing package versions/flags
(e.g., --no-cache) when adding curl/wget.

---

Nitpick comments:
In `@infrastructure/modules/service/variables.tf`:
- Around line 30-52: Add validation blocks to each health check variable
(health_check_interval, health_check_timeout, health_check_retries,
health_check_start_period) to enforce AWS ECS allowed ranges: interval 5–300,
timeout 2–120, retries 1–10, start_period 0–300; for each variable add a
validation with a condition expression that checks the numeric range and a clear
error message explaining the valid range so Terraform will fail fast with a
helpful message instead of AWS API errors.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 593e438c-82ca-4d15-87b1-a0b41c3ec1e5

📥 Commits

Reviewing files that changed from the base of the PR and between 2e41260 and b6dec73.

📒 Files selected for processing (3)
  • infrastructure/live/main.tf
  • infrastructure/modules/service/main.tf
  • infrastructure/modules/service/variables.tf

Comment thread infrastructure/live/main.tf Outdated
Comment thread infrastructure/live/main.tf Outdated
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 3 files

Confidence score: 2/5

  • There is a concrete regression risk in infrastructure/live/main.tf: the ECS health check invokes curl, but the backend runtime image reportedly does not include it, so health checks can fail even when the app is otherwise healthy.
  • This can cause ECS tasks to be marked unhealthy and repeatedly restarted, creating real availability risk rather than a minor housekeeping issue.
  • Given the severity (7/10) and high confidence (8/10), this is a high-risk merge unless the health check command or image dependencies are aligned before rollout.
  • Pay close attention to infrastructure/live/main.tf - health check command/runtime dependency mismatch may trigger unhealthy task restart loops.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="infrastructure/live/main.tf">

<violation number="1" location="infrastructure/live/main.tf:62">
P1: Backend ECS health check depends on `curl`, but backend runtime image does not install `curl`, risking unhealthy task restart loops.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread infrastructure/live/main.tf Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docker/frontend/Dockerfile`:
- Line 25: The Dockerfile currently installs wget in the builder stage (the line
with "apk add --no-cache libc6-compat wget"), but the final image is produced
from the runner stage so wget is missing at runtime; remove wget from the
builder stage and add it to the runner stage's apk add invocation (the runner
stage block around the later "apk add --no-cache ...") so that wget is present
in the final image used for ECS health checks.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 15b722ec-1161-4744-a219-48b04c71e136

📥 Commits

Reviewing files that changed from the base of the PR and between b6dec73 and 209c97b.

📒 Files selected for processing (2)
  • docker/backend/Dockerfile
  • docker/frontend/Dockerfile
✅ Files skipped from review due to trivial changes (1)
  • docker/backend/Dockerfile

Comment thread docker/frontend/Dockerfile Outdated
coderabbitai[bot]
coderabbitai bot previously approved these changes Apr 7, 2026
cubic-dev-ai[bot]
cubic-dev-ai bot previously approved these changes Apr 7, 2026
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 issues found across 2 files (changes from recent commits).

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
infrastructure/modules/service/tests/service.tftest.hcl (1)

78-84: Use non-default numeric values to strengthen health-check wiring validation.

Lines 80-83 currently set default values (30/3/60/5), so a hardcoded-default bug could still pass. Use non-default test inputs so this run actually verifies variable propagation.

Proposed test hardening
   variables {
     health_check_command      = ["CMD-SHELL", "curl -f http://localhost:3000/health || exit 1"]
-    health_check_interval     = 30
-    health_check_retries      = 3
-    health_check_start_period = 60
-    health_check_timeout      = 5
+    health_check_interval     = 17
+    health_check_retries      = 4
+    health_check_start_period = 11
+    health_check_timeout      = 9
   }

Also applies to: 96-113

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@infrastructure/modules/service/tests/service.tftest.hcl` around lines 78 -
84, The test uses default numeric values in the variables block
(health_check_interval, health_check_retries, health_check_start_period,
health_check_timeout) which can mask hardcoded-default bugs; update these
variables in the variables { ... } block (and the similar block at lines 96-113)
to non-default, clearly distinct integers (e.g.,
interval/retries/start_period/timeout values that differ from 30/3/60/5) so the
test actually verifies propagation of supplied values through the health-check
wiring.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@infrastructure/modules/service/tests/service.tftest.hcl`:
- Around line 78-84: The test uses default numeric values in the variables block
(health_check_interval, health_check_retries, health_check_start_period,
health_check_timeout) which can mask hardcoded-default bugs; update these
variables in the variables { ... } block (and the similar block at lines 96-113)
to non-default, clearly distinct integers (e.g.,
interval/retries/start_period/timeout values that differ from 30/3/60/5) so the
test actually verifies propagation of supplied values through the health-check
wiring.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 76655c2a-d9fd-477f-adc2-4d41d9ed6dbd

📥 Commits

Reviewing files that changed from the base of the PR and between 2bdbe73 and 0cc418b.

📒 Files selected for processing (1)
  • infrastructure/modules/service/tests/service.tftest.hcl

coderabbitai[bot]
coderabbitai bot previously approved these changes Apr 7, 2026
cubic-dev-ai[bot]
cubic-dev-ai bot previously approved these changes Apr 7, 2026
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 issues found across 1 file (changes from recent commits).

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
infrastructure/modules/service/tests/service.tftest.hcl (1)

78-110: Assert exact health-check timing values, not only non-null.

!= null won’t catch regressions where wrong constants/units are wired. Prefer asserting equality, ideally with non-default overrides in this run.

Suggested test tightening
 run "test_health_check_included" {
   command = apply

   variables {
     health_check_command      = ["CMD-SHELL", "curl -f http://localhost:3000/health || exit 1"]
+    health_check_interval     = 15
+    health_check_retries      = 5
+    health_check_start_period = 45
+    health_check_timeout      = 7
   }

@@
   assert {
-    condition     = try(jsondecode(aws_ecs_task_definition.main.container_definitions)[0].healthCheck.interval, null) != null
-    error_message = "Health check interval must not be null when health check is configured."
+    condition     = try(jsondecode(aws_ecs_task_definition.main.container_definitions)[0].healthCheck.interval, null) == var.health_check_interval
+    error_message = "Health check interval must match health_check_interval."
   }

   assert {
-    condition     = try(jsondecode(aws_ecs_task_definition.main.container_definitions)[0].healthCheck.retries, null) != null
-    error_message = "Health check retries must not be null when health check is configured."
+    condition     = try(jsondecode(aws_ecs_task_definition.main.container_definitions)[0].healthCheck.retries, null) == var.health_check_retries
+    error_message = "Health check retries must match health_check_retries."
   }

   assert {
-    condition     = try(jsondecode(aws_ecs_task_definition.main.container_definitions)[0].healthCheck.startPeriod, null) != null
-    error_message = "Health check start period must not be null when health check is configured."
+    condition     = try(jsondecode(aws_ecs_task_definition.main.container_definitions)[0].healthCheck.startPeriod, null) == var.health_check_start_period
+    error_message = "Health check start period must match health_check_start_period."
   }

   assert {
-    condition     = try(jsondecode(aws_ecs_task_definition.main.container_definitions)[0].healthCheck.timeout, null) != null
-    error_message = "Health check timeout must not be null when health check is configured."
+    condition     = try(jsondecode(aws_ecs_task_definition.main.container_definitions)[0].healthCheck.timeout, null) == var.health_check_timeout
+    error_message = "Health check timeout must match health_check_timeout."
   }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@infrastructure/modules/service/tests/service.tftest.hcl` around lines 78 -
110, The current asserts only check healthCheck fields are non-null; change them
to assert exact expected values instead: replace the != null checks for
healthCheck.interval, .retries, .startPeriod, and .timeout with equality
comparisons against the intended numeric constants (and for the command assert
equality to var.health_check_command is fine) by reading
aws_ecs_task_definition.main.container_definitions, jsondecoding the
[0].healthCheck fields and comparing them to the specific expected numbers you
intend to enforce (e.g., interval = X, retries = Y, startPeriod = Z, timeout =
W) so the test fails on incorrect constants/units rather than just nulls.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@infrastructure/modules/service/tests/service.tftest.hcl`:
- Around line 78-110: The current asserts only check healthCheck fields are
non-null; change them to assert exact expected values instead: replace the !=
null checks for healthCheck.interval, .retries, .startPeriod, and .timeout with
equality comparisons against the intended numeric constants (and for the command
assert equality to var.health_check_command is fine) by reading
aws_ecs_task_definition.main.container_definitions, jsondecoding the
[0].healthCheck fields and comparing them to the specific expected numbers you
intend to enforce (e.g., interval = X, retries = Y, startPeriod = Z, timeout =
W) so the test fails on incorrect constants/units rather than just nulls.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3578054e-0d9d-4e93-9f8d-a6347aae5898

📥 Commits

Reviewing files that changed from the base of the PR and between 0cc418b and 495af75.

📒 Files selected for processing (1)
  • infrastructure/modules/service/tests/service.tftest.hcl

coderabbitai[bot]
coderabbitai bot previously approved these changes Apr 7, 2026
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 1 file (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="infrastructure/modules/service/tests/service.tftest.hcl">

<violation number="1" location="infrastructure/modules/service/tests/service.tftest.hcl:93">
P2: Health-check test assertions were weakened from exact value validation to non-null checks, reducing ability to catch variable-to-task-definition mapping regressions.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread infrastructure/modules/service/tests/service.tftest.hcl Outdated
coderabbitai[bot]
coderabbitai bot previously approved these changes Apr 7, 2026
cubic-dev-ai[bot]
cubic-dev-ai bot previously approved these changes Apr 7, 2026
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 issues found across 1 file (changes from recent commits).

@anirudhprmar anirudhprmar marked this pull request as ready for review April 7, 2026 17:35
Copy link
Copy Markdown
Collaborator

@rudransh-shrivastava rudransh-shrivastava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I see some changes:

Please sort items alphabetically where order doesn't matter, and make sure to run make check-test:
image

Comment thread docker/backend/Dockerfile Outdated
Comment thread infrastructure/live/main.tf Outdated
target_group_arn = module.alb.backend_target_group_arn
task_role_policy_arns = [module.storage.static_read_write_policy_arn]
use_fargate_spot = var.backend_use_fargate_spot
health_check_command = ["CMD-SHELL", "curl -f http://localhost:8000/health/ || exit 1"]
Copy link
Copy Markdown
Collaborator

@rudransh-shrivastava rudransh-shrivastava Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think /health/ exists.

Image

Also, please move this command to the module itself and use variables like health_check_endpoint instead.

Another thing I notice, localhost is not in ALLOWED_HOSTS so this will fail. Use something like ${hostname -i} instead.

Copy link
Copy Markdown
Contributor Author

@anirudhprmar anirudhprmar Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can i go with 127.0.0.1 (as it is included in allowed host) instead of hostname ?

Comment thread infrastructure/live/main.tf Outdated
coderabbitai[bot]
coderabbitai bot previously approved these changes Apr 9, 2026
cubic-dev-ai[bot]
cubic-dev-ai bot previously approved these changes Apr 9, 2026
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 issues found across 6 files (changes from recent commits).

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 issues found across 1 file (changes from recent commits).

@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud bot commented Apr 9, 2026

Copy link
Copy Markdown
Collaborator

@arkid15r arkid15r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Closing in favor of #4502

@arkid15r arkid15r closed this Apr 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add Container HealthCheck to ECS Tasks

3 participants