Update runtests.jl #1096

Merged: depial merged 5 commits into main from depial-patch-1 on Mar 7, 2026

Conversation

@depial (Contributor) commented Mar 7, 2026

Addresses Issue #1095.

Beyond testing for correct "fleet" size:

  • added testing for random "fleet" selection in Task 5
  • changed the input to a copy of `starships` in case a student uses the mutating `shuffle!` in Task 5
  • corrected the test for the correct prefix in Task 2
  • corrected the test for the correct numerical range in Task 2

The testing for randomness of fleet selection is rather weak (it only requires one ship to be out of the given order), but I figure that's probably good enough. If there are other ideas, feel free to discuss.

Update: I've adjusted the testing for randomness in Task 5 to better reflect that the distribution of fixed points for random permutations is Poisson(1). I've set the limit of fixed points to three, since the probability of having 4 or more fixed points is around 2%, but we could be more generous and set the limit to four. This test is aimed at the larger fleets, since it passes trivially for fleets of fewer than four starships. Whether there is a better option is still open to discussion.
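For illustration, here is a minimal sketch of the kind of fixed-point check described above. The function name `assemble_fleet` and the dummy `starships` list are stand-ins for illustration, not the exercise's actual names or tests:

using Test, Random

# Stand-ins for illustration only: the real exercise defines `starships` and the Task 5 function.
starships = ["Ship $i" for i in 1:100]
assemble_fleet(ships) = shuffle(ships)

fleet = assemble_fleet(copy(starships))   # pass a copy in case the solution uses the mutating shuffle!
# A fixed point is a ship that stays in its original position.
fixed_points = count(i -> fleet[i] == starships[i], eachindex(fleet))
# For a random permutation the number of fixed points is approximately Poisson(1),
# so four or more fixed points occurs only about 2% of the time.
@test fixed_points <= 3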

@github-actions (bot) commented Mar 7, 2026

This PR touches files which potentially affect the outcome of the tests of an exercise. This will cause all students' solutions to affected exercises to be re-tested.

If this PR does not affect the result of the tests (or, for example, adds an edge case that is not worth rerunning all tests for), please add the following to the merge-commit message, which will stop students' tests from re-running. Please copy-paste to avoid typos.

[no important files changed]

For more information, refer to the documentation. If you are unsure whether to add the message or not, please ping @exercism/maintainers-admin in a comment. Thank you!

@depial (Contributor, Author) commented Mar 7, 2026

I think these changes are big enough that we should rerun the ~180 students' solutions.

depial requested a review from colinleach on March 7, 2026 at 12:39
Update to reflect distribution of fixed points in random permutations (`Poisson(1)`).
@depial (Contributor, Author) commented Mar 7, 2026

A further thought: it's possible to use the chi-squared test for Task 3 if desired, since the test statistic is all that's really needed once the number of bins (and hence the degrees of freedom) and the alpha level are hard-coded. There's no need to actually compute the full p-value.

Edit: I've looked more closely at the RosettaCode example link that you provided in runtests.jl, and I don't understand where its numbers are coming from, for a number of reasons, though I'm not terribly well versed in chi-squared tests. In any case, from what I can tell, if you run that code on exemplar.jl with our current tests, you'll end up with pval = 0.0 every time.

Instead of that implementation, I would have expected to bin the results (into three bins, as you did, for example) and then work with counts. This produces more reasonable numbers, and we could use the following test:

third = length(random_stardates) / 3   # expected count per bin under a uniform distribution
test_stat = sum((bin - third)^2 for bin in (low, med, high)) / third
test_stat < 6   # 5.99 is the chi-squared critical value for 2 degrees of freedom at alpha = 0.05

where `low`, `med`, and `high` are the counts from the three bins you used originally. This avoids the full p-value calculation by working directly with the test statistic.

Just for reference, the test statistic for exemplar.jl is ~1.5, while a skewed distribution with all the points in the high bin gives 6666 (the test statistic for the RosettaCode implementation is ~20000).
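For anyone following along, a self-contained sketch of this binned chi-squared check might look like the following. The stardate range and bin edges here are illustrative, not the exercise's actual values:

using Test

# Simulated stand-in for the exercise's random stardates (illustrative range only).
random_stardates = [41000 + 1000 * rand() for _ in 1:10_000]

# Count the stardates falling into three equal-width bins.
low  = count(d -> d < 41333.3, random_stardates)
med  = count(d -> 41333.3 <= d < 41666.7, random_stardates)
high = count(d -> d >= 41666.7, random_stardates)

# Chi-squared test statistic against a uniform expectation of one third per bin.
third = length(random_stardates) / 3
test_stat = sum((bin - third)^2 for bin in (low, med, high)) / third

# 5.99 is the critical value for 2 degrees of freedom at alpha = 0.05.
@test test_stat < 6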

@colinleach (Contributor) commented:

I used to know a reasonable amount about statistics, but I seem to have forgotten most of it! That said, your suggestions look reasonable and I'll go with what you recommend.

Porting this exercise to R is fairly high on the TODO list, so it would be good to have a working strategy.

@depial (Contributor, Author) commented Mar 7, 2026

I've moved to the chi-squared test, but I'm a bit confused as to why Julia nightly is failing when the others pass. I tried to be informative and leave a comment for students, because the test itself gives no indication of what to aim for.

Should I try removing the comments? I really don't understand what else could have broken with the nightly release.

Edit: I'm trying different possible modifications in the two tests which failed. Hopefully I'll be able to pin down the issue.

@colinleach (Contributor) commented:

It may just have been a random glitch? Or a bug in the nightly code? I find it hard to see what difference a comment makes!

What are the implications of relaxing the test a bit? Maybe setting the threshold to 7 instead of 6 is not statistically rigorous, but ultimately the future of civilization doesn't depend on how we test a learning exercise?

@depial (Contributor, Author) commented Mar 7, 2026

It looks like it was just a glitch. I had removed the comment from one of the tests, left it in the other, and the run passed. I've put everything back to how it was and added some inclusive limits on some of the conditions while I was there.

That said, we can put it at 7 if you'd like; for alpha = 0.025 the threshold is 7.377.
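If it helps to sanity-check that number, the critical value can be pulled from Distributions.jl (a sketch, assuming three bins and therefore 2 degrees of freedom):

using Distributions

# 97.5th percentile of a chi-squared distribution with 2 degrees of freedom,
# i.e. the critical value for alpha = 0.025.
quantile(Chisq(2), 0.975)   # ≈ 7.378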

@colinleach (Contributor) commented:

> It looks like it was just a glitch.

Good outcome!

> we can put it at 7 if you'd like

If it now passes everything, I've no strong feelings. Just a general sense that this is a context where wrongly passing a student solution is less serious than wrongly failing.

@depial (Contributor, Author) commented Mar 7, 2026

> I used to know a reasonable amount about statistics, but I seem to have forgotten most of it!

I hope I'm getting my stats right. I'm in the process of learning a reasonable amount and then immediately forgetting it 😅

Let me go ahead and double-check that I've been doing everything correctly before merging.

depial merged commit 37bcdc7 into main on Mar 7, 2026 (5 checks passed)
depial deleted the depial-patch-1 branch on March 7, 2026 at 18:48
@depial (Contributor, Author) commented Mar 7, 2026

I'm already seeing a lot of failed solutions... It looks like the most common errors are either not shuffling properly or not using the correct length.

However, I'm also seeing some that look to have failed but shouldn't have... I counted eight false negatives, but I might have overcounted, given how Exercism displays the results.

@colinleach (Contributor) commented:

> I'm also seeing some which look to have failed, but shouldn't have

That's worrying! What are we going to do about it?

@depial (Contributor, Author) commented Mar 7, 2026

Well, I've gone through and made another count: about 10% of the current solutions are failing but pass when I run them locally. I would suggest pushing the threshold for the chi-squared test back up. I ran some basic tests and found that bins skewed by (-75, -75, +150) from the expected count (3,333) would fail, so we can push the alpha down if you'd like.

I haven't done all the statistics on the Poisson case, but there should be only about a 2% chance of producing 4 or more fixed points, and at least four of those would then have to find their way into the 1-20 elements out of 100 that we are asking for, so I find it highly unlikely that's the issue. We could push that limit up by one as well.
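For reference, the ~2% figure is just the upper tail of Poisson(1), which can be checked with a one-liner (a sketch using Distributions.jl):

using Distributions

# Probability of four or more fixed points when the count is Poisson(1)-distributed.
ccdf(Poisson(1), 3)   # P(X ≥ 4) ≈ 0.019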

In the end, I don't think it would be a huge problem for students, since the odd individual would be unlikely to hit more than one false negative, but it could be a pain for us in the future if exemplar.jl fails fairly regularly during CI checks.

@colinleach (Contributor) commented:

I (quite strongly) support relaxing the criteria.

