fix: [DSM-103] Reset a canister's priority credit when aborted#8947
fix: [DSM-103] Reset a canister's priority credit when aborted#8947alin-at-dfinity wants to merge 3 commits intomasterfrom
Conversation
Without this, canisters with 100 compute allocation could accumulate priority (e.g. from refunds due to aborted executions) that it would never be able to spend. This is rather unfortunate (especially the arbitrary bit), but necessary.
| // Without this, a canister with 100 compute allocation will accumulate 100 | ||
| // priority it can then never spend for every round of an aborted DTS execution | ||
| // (priority credit is increased by 100 per round, but reset on abort). | ||
| const AP_ROUNDS_MAX: i64 = 5; |
There was a problem hiding this comment.
Doesn't the value of 5 mean that a canister with 100% compute allocation and a few aborted executions will have a priority of ~500 forever and thus always be prioritized over canisters with 100% compute allocation and no aborted executions whose priority will be ~100? (Vurrently, there cannot be two canisters with 100% compute allocation, but once we increase the number of cores, it'll be possible.)
There was a problem hiding this comment.
Well, not forever ("be prioritized over canisters with 100% compute allocation and no aborted executions"). If the other 100 compute allocation canisters don't get scheduled a few rounds, then they will also get bumped and become stuck at 500 AP.
I suppose a better solution is to leave 100 compute allocation canisters out of the regular schedule; and simply execute them every round (that they have something to execute). Thing is, a 99 compute allocation canister won't behave all that differently either: due to e.g. aborted executions it might still end up with ever increasing AP; while it may technically be able to consume more compute (100) than it pays for every round (99), we may still gift it more than 100 compute per round on average due to aborted executions and whatnot.
I suppose the way to look at these ever-increasing priorities is that they are simply a reflection of the fact that (in the presence of DTS) we cannot actually provide some canisters with the amount of compute they're paying for: they pay for 100 rounds out if 100, but due to aborting we can only provide e.g. an average of 98 rounds out of 100.
When a canister's DTS execution is aborted (at a checkpoint or because of too many paused long executions), its priority credit should be reset instead of applied. It is through no fault of the canister that it got aborted, so it should not be charged for compute it did not receive.
Also enforce an arbitrary maximum accumulated priority. Without this, canisters with 100 compute allocation would accumulate priority due to aborted executions that they would never be able to spend. This is rather unfortunate (especially the arbitrary bit), but necessary.