added HealthCheck container to ECS Tasks#4512
added HealthCheck container to ECS Tasks#4512anirudhprmar wants to merge 8 commits intoOWASP:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds optional container health-check variables to the service module, conditionally injects an ECS Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
will be adding tests soon |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
infrastructure/modules/service/variables.tf (1)
30-52: Consider adding validation blocks for health check parameters.The health check variables lack validation to ensure values fall within AWS ECS allowed ranges. Invalid values would cause Terraform apply failures with potentially unclear AWS API errors.
AWS ECS health check constraints:
interval: 5–300 secondstimeout: 2–120 secondsretries: 1–10startPeriod: 0–300 seconds🛡️ Proposed validation blocks
variable "health_check_interval" { description = "Time period between each health check execution (seconds)" type = number default = 30 + + validation { + condition = var.health_check_interval >= 5 && var.health_check_interval <= 300 + error_message = "health_check_interval must be between 5 and 300 seconds." + } } variable "health_check_retries" { description = "The number of consecutive failed health checks before marking container unhealthy" type = number default = 3 + + validation { + condition = var.health_check_retries >= 1 && var.health_check_retries <= 10 + error_message = "health_check_retries must be between 1 and 10." + } } variable "health_check_start_period" { description = "Grace period (seconds) before health check starts" type = number default = 60 + + validation { + condition = var.health_check_start_period >= 0 && var.health_check_start_period <= 300 + error_message = "health_check_start_period must be between 0 and 300 seconds." + } } variable "health_check_timeout" { description = "Time period to wait for successful health check before marking container unhealthy" type = number default = 5 + + validation { + condition = var.health_check_timeout >= 2 && var.health_check_timeout <= 120 + error_message = "health_check_timeout must be between 2 and 120 seconds." + } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@infrastructure/modules/service/variables.tf` around lines 30 - 52, Add validation blocks to each health check variable (health_check_interval, health_check_timeout, health_check_retries, health_check_start_period) to enforce AWS ECS allowed ranges: interval 5–300, timeout 2–120, retries 1–10, start_period 0–300; for each variable add a validation with a condition expression that checks the numeric range and a clear error message explaining the valid range so Terraform will fail fast with a helpful message instead of AWS API errors.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@infrastructure/live/main.tf`:
- Around line 129-133: The ECS health_check_command uses "wget --spider" but the
production frontend image (Alpine) doesn't include wget, causing "command not
found" failures; update health_check_command in main.tf to call a binary present
in the image and target a stable health endpoint (e.g., replace ["CMD-SHELL",
"wget --spider http://localhost:3000/ || exit 1"] with ["CMD-SHELL", "curl -f
http://localhost:3000/healthz || exit 1"] or similar), or alternatively install
wget in the frontend image Dockerfile by adding apk add --no-cache wget (or
curl) so health_check_command works; ensure the path uses a dedicated endpoint
like /healthz or /health to avoid redirects.
- Around line 62-66: The ECS health checks use health_check_command and require
curl/wget but the backend and frontend Alpine images do not install them; update
the Dockerfile RUN apk add line(s) that install packages (the RUN command that
currently installs make and postgresql16-client in the backend image) to include
curl (e.g., add curl to the apk add list) and likewise add wget to the frontend
image's apk add invocation so the health_check_command (and the frontend
equivalent) succeed; ensure you preserve the existing package versions/flags
(e.g., --no-cache) when adding curl/wget.
---
Nitpick comments:
In `@infrastructure/modules/service/variables.tf`:
- Around line 30-52: Add validation blocks to each health check variable
(health_check_interval, health_check_timeout, health_check_retries,
health_check_start_period) to enforce AWS ECS allowed ranges: interval 5–300,
timeout 2–120, retries 1–10, start_period 0–300; for each variable add a
validation with a condition expression that checks the numeric range and a clear
error message explaining the valid range so Terraform will fail fast with a
helpful message instead of AWS API errors.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 593e438c-82ca-4d15-87b1-a0b41c3ec1e5
📒 Files selected for processing (3)
infrastructure/live/main.tfinfrastructure/modules/service/main.tfinfrastructure/modules/service/variables.tf
There was a problem hiding this comment.
1 issue found across 3 files
Confidence score: 2/5
- There is a concrete regression risk in
infrastructure/live/main.tf: the ECS health check invokescurl, but the backend runtime image reportedly does not include it, so health checks can fail even when the app is otherwise healthy. - This can cause ECS tasks to be marked unhealthy and repeatedly restarted, creating real availability risk rather than a minor housekeeping issue.
- Given the severity (7/10) and high confidence (8/10), this is a high-risk merge unless the health check command or image dependencies are aligned before rollout.
- Pay close attention to
infrastructure/live/main.tf- health check command/runtime dependency mismatch may trigger unhealthy task restart loops.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="infrastructure/live/main.tf">
<violation number="1" location="infrastructure/live/main.tf:62">
P1: Backend ECS health check depends on `curl`, but backend runtime image does not install `curl`, risking unhealthy task restart loops.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docker/frontend/Dockerfile`:
- Line 25: The Dockerfile currently installs wget in the builder stage (the line
with "apk add --no-cache libc6-compat wget"), but the final image is produced
from the runner stage so wget is missing at runtime; remove wget from the
builder stage and add it to the runner stage's apk add invocation (the runner
stage block around the later "apk add --no-cache ...") so that wget is present
in the final image used for ECS health checks.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 15b722ec-1161-4744-a219-48b04c71e136
📒 Files selected for processing (2)
docker/backend/Dockerfiledocker/frontend/Dockerfile
✅ Files skipped from review due to trivial changes (1)
- docker/backend/Dockerfile
0cc418b
There was a problem hiding this comment.
🧹 Nitpick comments (1)
infrastructure/modules/service/tests/service.tftest.hcl (1)
78-84: Use non-default numeric values to strengthen health-check wiring validation.Lines 80-83 currently set default values (
30/3/60/5), so a hardcoded-default bug could still pass. Use non-default test inputs so this run actually verifies variable propagation.Proposed test hardening
variables { health_check_command = ["CMD-SHELL", "curl -f http://localhost:3000/health || exit 1"] - health_check_interval = 30 - health_check_retries = 3 - health_check_start_period = 60 - health_check_timeout = 5 + health_check_interval = 17 + health_check_retries = 4 + health_check_start_period = 11 + health_check_timeout = 9 }Also applies to: 96-113
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@infrastructure/modules/service/tests/service.tftest.hcl` around lines 78 - 84, The test uses default numeric values in the variables block (health_check_interval, health_check_retries, health_check_start_period, health_check_timeout) which can mask hardcoded-default bugs; update these variables in the variables { ... } block (and the similar block at lines 96-113) to non-default, clearly distinct integers (e.g., interval/retries/start_period/timeout values that differ from 30/3/60/5) so the test actually verifies propagation of supplied values through the health-check wiring.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@infrastructure/modules/service/tests/service.tftest.hcl`:
- Around line 78-84: The test uses default numeric values in the variables block
(health_check_interval, health_check_retries, health_check_start_period,
health_check_timeout) which can mask hardcoded-default bugs; update these
variables in the variables { ... } block (and the similar block at lines 96-113)
to non-default, clearly distinct integers (e.g.,
interval/retries/start_period/timeout values that differ from 30/3/60/5) so the
test actually verifies propagation of supplied values through the health-check
wiring.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 76655c2a-d9fd-477f-adc2-4d41d9ed6dbd
📒 Files selected for processing (1)
infrastructure/modules/service/tests/service.tftest.hcl
495af75
There was a problem hiding this comment.
🧹 Nitpick comments (1)
infrastructure/modules/service/tests/service.tftest.hcl (1)
78-110: Assert exact health-check timing values, not only non-null.
!= nullwon’t catch regressions where wrong constants/units are wired. Prefer asserting equality, ideally with non-default overrides in this run.Suggested test tightening
run "test_health_check_included" { command = apply variables { health_check_command = ["CMD-SHELL", "curl -f http://localhost:3000/health || exit 1"] + health_check_interval = 15 + health_check_retries = 5 + health_check_start_period = 45 + health_check_timeout = 7 } @@ assert { - condition = try(jsondecode(aws_ecs_task_definition.main.container_definitions)[0].healthCheck.interval, null) != null - error_message = "Health check interval must not be null when health check is configured." + condition = try(jsondecode(aws_ecs_task_definition.main.container_definitions)[0].healthCheck.interval, null) == var.health_check_interval + error_message = "Health check interval must match health_check_interval." } assert { - condition = try(jsondecode(aws_ecs_task_definition.main.container_definitions)[0].healthCheck.retries, null) != null - error_message = "Health check retries must not be null when health check is configured." + condition = try(jsondecode(aws_ecs_task_definition.main.container_definitions)[0].healthCheck.retries, null) == var.health_check_retries + error_message = "Health check retries must match health_check_retries." } assert { - condition = try(jsondecode(aws_ecs_task_definition.main.container_definitions)[0].healthCheck.startPeriod, null) != null - error_message = "Health check start period must not be null when health check is configured." + condition = try(jsondecode(aws_ecs_task_definition.main.container_definitions)[0].healthCheck.startPeriod, null) == var.health_check_start_period + error_message = "Health check start period must match health_check_start_period." } assert { - condition = try(jsondecode(aws_ecs_task_definition.main.container_definitions)[0].healthCheck.timeout, null) != null - error_message = "Health check timeout must not be null when health check is configured." + condition = try(jsondecode(aws_ecs_task_definition.main.container_definitions)[0].healthCheck.timeout, null) == var.health_check_timeout + error_message = "Health check timeout must match health_check_timeout." } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@infrastructure/modules/service/tests/service.tftest.hcl` around lines 78 - 110, The current asserts only check healthCheck fields are non-null; change them to assert exact expected values instead: replace the != null checks for healthCheck.interval, .retries, .startPeriod, and .timeout with equality comparisons against the intended numeric constants (and for the command assert equality to var.health_check_command is fine) by reading aws_ecs_task_definition.main.container_definitions, jsondecoding the [0].healthCheck fields and comparing them to the specific expected numbers you intend to enforce (e.g., interval = X, retries = Y, startPeriod = Z, timeout = W) so the test fails on incorrect constants/units rather than just nulls.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@infrastructure/modules/service/tests/service.tftest.hcl`:
- Around line 78-110: The current asserts only check healthCheck fields are
non-null; change them to assert exact expected values instead: replace the !=
null checks for healthCheck.interval, .retries, .startPeriod, and .timeout with
equality comparisons against the intended numeric constants (and for the command
assert equality to var.health_check_command is fine) by reading
aws_ecs_task_definition.main.container_definitions, jsondecoding the
[0].healthCheck fields and comparing them to the specific expected numbers you
intend to enforce (e.g., interval = X, retries = Y, startPeriod = Z, timeout =
W) so the test fails on incorrect constants/units rather than just nulls.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 3578054e-0d9d-4e93-9f8d-a6347aae5898
📒 Files selected for processing (1)
infrastructure/modules/service/tests/service.tftest.hcl
There was a problem hiding this comment.
1 issue found across 1 file (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="infrastructure/modules/service/tests/service.tftest.hcl">
<violation number="1" location="infrastructure/modules/service/tests/service.tftest.hcl:93">
P2: Health-check test assertions were weakened from exact value validation to non-null checks, reducing ability to catch variable-to-task-definition mapping regressions.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| target_group_arn = module.alb.backend_target_group_arn | ||
| task_role_policy_arns = [module.storage.static_read_write_policy_arn] | ||
| use_fargate_spot = var.backend_use_fargate_spot | ||
| health_check_command = ["CMD-SHELL", "curl -f http://localhost:8000/health/ || exit 1"] |
There was a problem hiding this comment.
can i go with 127.0.0.1 (as it is included in allowed host) instead of hostname ?
6d284fe
57fb459
|





Proposed change
Resolves #4483
At Container-level added a ECS container health check
Defined container HealthCheck for both frontend and backend tasks.
Checklist
make check-testlocally: all warnings addressed, tests passed