Priority Level
Medium
Task Summary
Our error-rate calculation for early shutdown in the ConcurrentThreadExecutor attempts to calculate error rates more or less in real time. This can lead to massive overestimates of the error rate, particularly for jobs with high concurrency: failures can complete faster than successes, so the earliest results to arrive are disproportionately failures.
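To make the overestimate concrete, here is a minimal sketch with made-up numbers. The function `observed_error_rate` and the task durations are hypothetical and are not taken from ConcurrentThreadExecutor; the sketch only assumes that all tasks start at the same time and that failures return sooner than successes.

```python
def observed_error_rate(tasks, now):
    """Error rate over the tasks that have finished by time `now`.

    `tasks` is a list of (duration_seconds, succeeded) pairs; every task
    is assumed to start at t=0 (i.e., full concurrency).
    """
    finished = [ok for duration, ok in tasks if duration <= now]
    if not finished:
        return 0.0
    return sum(1 for ok in finished if not ok) / len(finished)


# Hypothetical workload: 100 concurrent tasks with a true error rate of 20%.
# Failures return after 0.1s, successes after 1.0s.
tasks = [(0.1, False)] * 20 + [(1.0, True)] * 80

early = observed_error_rate(tasks, now=0.5)  # only the failures have finished
final = observed_error_rate(tasks, now=1.0)  # everything has finished
```

In this toy setup a check made partway through sees only failures, even though the final error rate is far lower.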
The proposal of this issue is to instead calculate error rates at the batch level (i.e., outside of ConcurrentThreadExecutor).
A couple of benefits of this approach:
- The batch-level error rate will be a much more stable measurement at a consistent scale across jobs.
- This will allow the early-shutdown mechanism to be applied to generators that do not support concurrency.
One downside is that you always have to wait for at least one batch to complete. This seems acceptable given that the batch size is adjustable and (for cases where this all matters) will generally be much smaller than the target number of records.
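A rough sketch of what a batch-level check could look like follows. All names here (`BatchResult`, `should_shut_down`, `run_batches`, `generate_batch`, `max_error_rate`) are hypothetical placeholders rather than existing APIs, and the threshold value is only an assumption for illustration.

```python
from dataclasses import dataclass


@dataclass
class BatchResult:
    """Hypothetical summary of one completed batch."""
    num_succeeded: int
    num_failed: int

    @property
    def error_rate(self) -> float:
        total = self.num_succeeded + self.num_failed
        return self.num_failed / total if total else 0.0


def should_shut_down(result: BatchResult, max_error_rate: float) -> bool:
    # Evaluated only once a batch has fully completed, so fast failures
    # cannot dominate the measurement.
    return result.error_rate > max_error_rate


def run_batches(batches, generate_batch, max_error_rate=0.5):
    """Run batches in order, stopping after the first one that exceeds the threshold.

    `generate_batch` is any callable returning a BatchResult; it may or may
    not use concurrency internally, since the check lives outside of it.
    """
    completed = []
    for batch in batches:
        result = generate_batch(batch)
        completed.append(result)
        if should_shut_down(result, max_error_rate):
            break
    return completed
```

Because the check only looks at the completed batch, the same logic would apply to generators that run sequentially.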