Skip to content

test: add multi-language coverage for eval/hygiene.py (59% -> 84%)#166

Open
lukeinglis wants to merge 1 commit intoakashgit:mainfrom
lukeinglis:tests/eval-hygiene-coverage
Open

test: add multi-language coverage for eval/hygiene.py (59% -> 84%)#166
lukeinglis wants to merge 1 commit intoakashgit:mainfrom
lukeinglis:tests/eval-hygiene-coverage

Conversation

@lukeinglis
Copy link
Copy Markdown
Collaborator

Summary

  • Adds 18 tests covering previously untested code paths in factory/eval/hygiene.py
  • Coverage: 59% -> 84% (47 lines remaining, down from 116)
  • All tests mock _run_cmd() to avoid requiring Node.js/Rust/Go toolchains

What's covered

Area Tests Added
_run_cmd() error handling 3 (timeout, command not found, generic exception)
eval_tests() Node.js/Rust/Go 5 (pass/fail for each language + mixed aggregation)
eval_tests() scoring 2 (all-fail, partial-pass)
eval_lint() Node.js/Rust 4 (clean + errors for each)
eval_lint() scoring 2 (partial credit, floor at zero)
eval_type_check() Node.js 2 (clean + errors)

Remaining uncovered lines (47)

The remaining gaps are Python-specific branches that require real subprocess execution (pytest coverage parsing, mypy type-check parsing) rather than mocked outputs. These run against the actual project in CI and are covered by the existing python_project fixture tests.

Test plan

  • All 36 hygiene tests pass (uv run pytest tests/eval/test_hygiene.py -v)
  • Full suite passes: 1338 passed (uv run pytest tests/ -x -q)
  • No new dependencies or fixtures required

Add 18 tests covering previously untested code paths in the hygiene
eval module:

- _run_cmd() error handling: timeout, command not found, generic errors
- eval_tests() for Node.js, Rust, and Go projects with mocked outputs
- eval_tests() scoring: all-fail and partial-pass edge cases
- eval_lint() for Node.js (eslint) and Rust (clippy) with clean/error
- eval_lint() scoring: partial credit calculation and floor at zero
- eval_type_check() for Node.js (tsc) with clean/error outputs
- Mixed project aggregation across Python + Node.js

All tests mock _run_cmd() to avoid requiring Node/Rust/Go toolchains.

Signed-off-by: Luke Inglis <lukeinglis21@yahoo.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented May 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.33%. Comparing base (d556bbb) to head (e97cd81).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #166      +/-   ##
==========================================
+ Coverage   86.23%   87.33%   +1.09%     
==========================================
  Files          45       45              
  Lines        6307     6307              
==========================================
+ Hits         5439     5508      +69     
+ Misses        868      799      -69     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Owner

@akashgit akashgit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Coverage claim verified accurate (116 missed lines → 47). All 1338 tests pass, lint is clean. Only tests/eval/test_hygiene.py is modified — the 14-file diff shown by gh pr diff is a branch divergence artifact (branch is behind PRs #136 and #163). Good contribution overall, but assertion quality needs tightening before merge.

Finding 1: Weak assertion in test_go_project_passing — ISSUE

assert result["score"] > 0.0

With 2 "ok" lines, score should be 2/2 = 1.0. The Node and Rust passing tests both assert == 1.0 — this should too. > 0.0 would pass even if the scoring was completely wrong.

Fix: assert result["score"] == 1.0

Finding 2: Missing score assertion in test_go_project_failing — ISSUE

Checks passed is False and "go" in details but never checks the score. Should be 0.0 (0 passed, 1 failed).

Fix: Add assert result["score"] == 0.0

Finding 3: Missing score assertion in test_node_typecheck_errors — ISSUE

With 2 TS errors at 0.05 penalty each, score should be 0.9. Never checked.

Fix: Add assert result["score"] == 0.9

Finding 4: Missing score assertion in test_node_lint_errors — ISSUE

With 3 errors at 0.1 penalty each, score should be 0.7. Never checked.

Fix: Add assert result["score"] == 0.7

Finding 5: Missing score assertion in test_rust_lint_errors — ISSUE

With 2 errors at 0.1 penalty each, score should be 0.8. Never checked.

Fix: Add assert result["score"] == 0.8

Finding 6: Fragile detail string assertions — ISSUE

assert "2" in result["details"]  # matches "12 errors", "rs2", paths containing "2"

Used in multiple tests. These are too loose — "2" matches any string containing the digit 2.

Fix: Use "2 errors" in result["details"] or more specific patterns.

Finding 7: test_mixed_project_aggregates doesn't verify passed field — ISSUE

The mock has Node returning 1 failure, so passed should be False. Never checked — a bug that left passed: True with sub-project failures would be invisible.

Fix: Add assert result["passed"] is False

Branch hygiene

Branch needs rebase on current main (behind PRs #136 and #163). No conflicts on test merge, but rebase will clean up the diff.


Verdict: Do not merge yet. The tests are structurally sound but under-assert — they verify shape but not values. All 7 issues are straightforward fixes (add score assertions, tighten string matches). After fixes + rebase, this is a solid contribution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants