test: add multi-language coverage for eval/hygiene.py (59% -> 84%)#166
test: add multi-language coverage for eval/hygiene.py (59% -> 84%)#166lukeinglis wants to merge 1 commit intoakashgit:mainfrom
Conversation
Add 18 tests covering previously untested code paths in the hygiene eval module: - _run_cmd() error handling: timeout, command not found, generic errors - eval_tests() for Node.js, Rust, and Go projects with mocked outputs - eval_tests() scoring: all-fail and partial-pass edge cases - eval_lint() for Node.js (eslint) and Rust (clippy) with clean/error - eval_lint() scoring: partial credit calculation and floor at zero - eval_type_check() for Node.js (tsc) with clean/error outputs - Mixed project aggregation across Python + Node.js All tests mock _run_cmd() to avoid requiring Node/Rust/Go toolchains. Signed-off-by: Luke Inglis <lukeinglis21@yahoo.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #166 +/- ##
==========================================
+ Coverage 86.23% 87.33% +1.09%
==========================================
Files 45 45
Lines 6307 6307
==========================================
+ Hits 5439 5508 +69
+ Misses 868 799 -69 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
akashgit
left a comment
There was a problem hiding this comment.
Code Review
Coverage claim verified accurate (116 missed lines → 47). All 1338 tests pass, lint is clean. Only tests/eval/test_hygiene.py is modified — the 14-file diff shown by gh pr diff is a branch divergence artifact (branch is behind PRs #136 and #163). Good contribution overall, but assertion quality needs tightening before merge.
Finding 1: Weak assertion in test_go_project_passing — ISSUE
assert result["score"] > 0.0With 2 "ok" lines, score should be 2/2 = 1.0. The Node and Rust passing tests both assert == 1.0 — this should too. > 0.0 would pass even if the scoring was completely wrong.
Fix: assert result["score"] == 1.0
Finding 2: Missing score assertion in test_go_project_failing — ISSUE
Checks passed is False and "go" in details but never checks the score. Should be 0.0 (0 passed, 1 failed).
Fix: Add assert result["score"] == 0.0
Finding 3: Missing score assertion in test_node_typecheck_errors — ISSUE
With 2 TS errors at 0.05 penalty each, score should be 0.9. Never checked.
Fix: Add assert result["score"] == 0.9
Finding 4: Missing score assertion in test_node_lint_errors — ISSUE
With 3 errors at 0.1 penalty each, score should be 0.7. Never checked.
Fix: Add assert result["score"] == 0.7
Finding 5: Missing score assertion in test_rust_lint_errors — ISSUE
With 2 errors at 0.1 penalty each, score should be 0.8. Never checked.
Fix: Add assert result["score"] == 0.8
Finding 6: Fragile detail string assertions — ISSUE
assert "2" in result["details"] # matches "12 errors", "rs2", paths containing "2"Used in multiple tests. These are too loose — "2" matches any string containing the digit 2.
Fix: Use "2 errors" in result["details"] or more specific patterns.
Finding 7: test_mixed_project_aggregates doesn't verify passed field — ISSUE
The mock has Node returning 1 failure, so passed should be False. Never checked — a bug that left passed: True with sub-project failures would be invisible.
Fix: Add assert result["passed"] is False
Branch hygiene
Branch needs rebase on current main (behind PRs #136 and #163). No conflicts on test merge, but rebase will clean up the diff.
Verdict: Do not merge yet. The tests are structurally sound but under-assert — they verify shape but not values. All 7 issues are straightforward fixes (add score assertions, tighten string matches). After fixes + rebase, this is a solid contribution.
Summary
factory/eval/hygiene.py_run_cmd()to avoid requiring Node.js/Rust/Go toolchainsWhat's covered
_run_cmd()error handlingeval_tests()Node.js/Rust/Goeval_tests()scoringeval_lint()Node.js/Rusteval_lint()scoringeval_type_check()Node.jsRemaining uncovered lines (47)
The remaining gaps are Python-specific branches that require real subprocess execution (pytest coverage parsing, mypy type-check parsing) rather than mocked outputs. These run against the actual project in CI and are covered by the existing
python_projectfixture tests.Test plan
uv run pytest tests/eval/test_hygiene.py -v)uv run pytest tests/ -x -q)