@@ -158,6 +158,12 @@
<td>Integer</td>
<td>The number of consecutive checkpoint failures to tolerate. If set to 0, no checkpoint failure is tolerated. This only applies to the following failure reasons: IOException on the Job Manager, failures in the async phase on the Task Managers, and checkpoint expiration due to a timeout. Failures originating from the sync phase on the Task Managers always force failover of the affected task. Other types of checkpoint failures (such as a checkpoint being subsumed) are ignored.</td>
</tr>
+ <tr>
+ <td><h5>execution.checkpointing.unaligned.during-recovery.enabled</h5></td>
+ <td style="word-wrap: break-word;">false</td>
+ <td>Boolean</td>
+ <td>Whether to enable checkpointing during recovery from an unaligned checkpoint. When enabled, the job can take checkpoints while still recovering channel state (inflight data) from a previous unaligned checkpoint. This avoids the need to wait for full recovery before the first checkpoint can be triggered, which reduces the window of vulnerability to failures during recovery.<br /><br />This option requires <code class="highlighter-rouge">execution.checkpointing.unaligned.recover-output-on-downstream.enabled</code> to be enabled. It does not require unaligned checkpoints to be currently enabled, because a job may restore from an unaligned checkpoint while having unaligned checkpoints disabled for the new execution.</td>
+ </tr>
<tr>
<td><h5>execution.checkpointing.unaligned.enabled</h5></td>
<td style="word-wrap: break-word;">false</td>
@@ -56,6 +56,12 @@
<td>Integer</td>
<td>The maximum number of completed checkpoints to retain.</td>
</tr>
+ <tr>
+ <td><h5>execution.checkpointing.unaligned.during-recovery.enabled</h5></td>
+ <td style="word-wrap: break-word;">false</td>
+ <td>Boolean</td>
+ <td>Whether to enable checkpointing during recovery from an unaligned checkpoint. When enabled, the job can take checkpoints while still recovering channel state (inflight data) from a previous unaligned checkpoint. This avoids the need to wait for full recovery before the first checkpoint can be triggered, which reduces the window of vulnerability to failures during recovery.<br /><br />This option requires <code class="highlighter-rouge">execution.checkpointing.unaligned.recover-output-on-downstream.enabled</code> to be enabled. It does not require unaligned checkpoints to be currently enabled, because a job may restore from an unaligned checkpoint while having unaligned checkpoints disabled for the new execution.</td>
+ </tr>
<tr>
<td><h5>execution.checkpointing.unaligned.recover-output-on-downstream.enabled</h5></td>
<td style="word-wrap: break-word;">false</td>
@@ -670,8 +670,7 @@ public class CheckpointingOptions {
+ "when job restores from the unaligned checkpoint.");

@Experimental
- @Documentation.ExcludeFromDocumentation(
- "This option is not yet ready for public use, will be documented in a follow-up commit")
+ @Documentation.Section(Documentation.Sections.COMMON_CHECKPOINTING)
public static final ConfigOption<Boolean> UNALIGNED_DURING_RECOVERY_ENABLED =
ConfigOptions.key("execution.checkpointing.unaligned.during-recovery.enabled")
.booleanType()
@@ -148,6 +148,7 @@ private static void randomizeConfiguration(MiniCluster miniCluster, Configuratio
randomize(conf, CheckpointingOptions.ENABLE_UNALIGNED, true, false);
randomize(
conf, CheckpointingOptions.UNALIGNED_RECOVER_OUTPUT_ON_DOWNSTREAM, true, false);
+ randomize(conf, CheckpointingOptions.UNALIGNED_DURING_RECOVERY_ENABLED, true, false);
randomize(
conf,
CheckpointingOptions.ALIGNED_CHECKPOINT_TIMEOUT,
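
Note on the option dependency: the new documentation row states that `execution.checkpointing.unaligned.during-recovery.enabled` requires `execution.checkpointing.unaligned.recover-output-on-downstream.enabled`, but not `execution.checkpointing.unaligned.enabled`. A minimal sketch of that rule follows, using a plain `Map` instead of Flink's `Configuration` so that it runs standalone. The class `UnalignedRecoveryConfigCheck` and its methods are hypothetical illustrations and are not part of this change or of Flink. How Flink itself reacts to an inconsistent combination is not shown in this diff.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not Flink code): checks the documented option dependency. */
public class UnalignedRecoveryConfigCheck {

    // Keys copied verbatim from the documentation rows in this change.
    static final String DURING_RECOVERY =
            "execution.checkpointing.unaligned.during-recovery.enabled";
    static final String RECOVER_OUTPUT_ON_DOWNSTREAM =
            "execution.checkpointing.unaligned.recover-output-on-downstream.enabled";

    /** Reads a boolean flag; both options default to false per the generated docs. */
    static boolean flag(Map<String, String> conf, String key) {
        return Boolean.parseBoolean(conf.getOrDefault(key, "false"));
    }

    /**
     * Returns true when the combination matches the documented requirement:
     * during-recovery checkpointing implies recover-output-on-downstream.
     * Whether unaligned checkpoints are currently enabled is deliberately not checked.
     */
    static boolean isConsistent(Map<String, String> conf) {
        return !flag(conf, DURING_RECOVERY) || flag(conf, RECOVER_OUTPUT_ON_DOWNSTREAM);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();

        conf.put(DURING_RECOVERY, "true");
        System.out.println(isConsistent(conf)); // prints false: prerequisite is missing

        conf.put(RECOVER_OUTPUT_ON_DOWNSTREAM, "true");
        System.out.println(isConsistent(conf)); // prints true
    }
}
```

Since the randomized test configuration toggles both options independently, the combination "during-recovery on, recover-output-on-downstream off" can occur in test runs. It may be worth confirming that this case is handled as intended.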