Skip to content

dispatchermanager: avoid redo init nil pointer panic#4268

Merged
ti-chi-bot[bot] merged 3 commits intopingcap:masterfrom
hongyunyan:0225
Feb 25, 2026
Merged

dispatchermanager: avoid redo init nil pointer panic#4268
ti-chi-bot[bot] merged 3 commits intopingcap:masterfrom
hongyunyan:0225

Conversation

@hongyunyan
Copy link
Collaborator

@hongyunyan hongyunyan commented Feb 25, 2026

What problem does this PR solve?

Issue Number: close #4271

What is changed and how it works?

This pull request enhances the robustness of the redo dispatcher manager by addressing potential nil pointer panics during its initialization and subsequent pre-check operations. The changes ensure that redo-related components are fully prepared and validated before being utilized, thereby improving the stability of the system when operating in redo mode.

Highlights

  • Redo Component Initialization: Reordered the initialization sequence for redo components to ensure RedoEnable is set to true only after all necessary redo structures (redoDispatcherMap, redoSink, redoSchemaIDToDispatchers) are fully initialized, preventing potential nil pointer panics.
  • Scheduler Pre-check Enhancement: Expanded the pre-check logic for scheduler requests in redo mode to include a nil check for dispatcherManager.redoSchemaIDToDispatchers, adding an extra layer of safety against nil pointer dereferences during scheduler operations.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Please refer to [Release Notes Language Style Guide](https://pingcap.github.io/tidb-dev-guide/contribute-to-tidb/release-notes-style-guide.html) to write a quality release note.

If you don't think this PR needs a release note then fill it with `None`.

Summary by CodeRabbit

  • Bug Fixes
    • Initialization timing for redo functionality adjusted so availability is announced only after full setup, preventing premature scheduling checks.
    • Readiness validation for redo mode strengthened to require all necessary components be initialized before allowing redo operations, improving stability and preventing partial-state behavior.

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Feb 25, 2026
@gemini-code-assist
Copy link

Summary of Changes

Hello @hongyunyan, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the robustness of the redo dispatcher manager by addressing potential nil pointer panics during its initialization and subsequent pre-check operations. The changes ensure that redo-related components are fully prepared and validated before being utilized, thereby improving the stability of the system when operating in redo mode.

Highlights

  • Redo Component Initialization: Reordered the initialization sequence for redo components to ensure RedoEnable is set to true only after all necessary redo structures (redoDispatcherMap, redoSink, redoSchemaIDToDispatchers) are fully initialized, preventing potential nil pointer panics.
  • Scheduler Pre-check Enhancement: Expanded the pre-check logic for scheduler requests in redo mode to include a nil check for dispatcherManager.redoSchemaIDToDispatchers, adding an extra layer of safety against nil pointer dereferences during scheduler operations.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • downstreamadapter/dispatchermanager/dispatcher_manager_redo.go
    • Moved the assignment of manager.RedoEnable to occur after all redo components (redoDispatcherMap, redoSink, redoSchemaIDToDispatchers) have been initialized.
  • downstreamadapter/dispatchermanager/helper.go
    • Added a nil check for dispatcherManager.redoSchemaIDToDispatchers within the preCheckForSchedulerHandler function when in redo mode.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@ti-chi-bot ti-chi-bot bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Feb 25, 2026
Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request aims to prevent a nil pointer panic during redo log initialization by reordering the statement that enables the redo feature. While this is a step in the right direction, it does not fully address the underlying data race. The RedoEnable flag is accessed by multiple goroutines without proper synchronization, which can still lead to race conditions and panics. My review includes critical feedback on how to correctly fix this data race using atomic operations to ensure memory visibility across goroutines.

manager.redoSchemaIDToDispatchers = dispatcher.NewSchemaIDToDispatchers()
// Publish redo availability only after all redo components are initialized,
// so scheduler precheck won't observe a partially initialized manager.
manager.RedoEnable = true

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

While moving this line is a good step, it doesn't fully resolve the potential data race. The RedoEnable field is a plain bool and is accessed from multiple goroutines without synchronization. This means a compiler or CPU could reorder memory operations, and another goroutine might see RedoEnable as true before the other redo* fields are initialized, leading to a nil pointer dereference.

To fix this data race correctly, RedoEnable should be an atomic boolean. This will ensure the necessary memory barriers are in place to guarantee that if RedoEnable is read as true, all preceding writes to other fields in this block are visible to the reading goroutine.

You would need to:

  1. Change the type of RedoEnable in DispatcherManager to atomic.Bool.
  2. Use manager.RedoEnable.Store(true) here.
  3. Use manager.RedoEnable.Load() in all places where it's read, such as in preCheckForSchedulerHandler.
Suggested change
manager.RedoEnable = true
manager.RedoEnable.Store(true)

Comment on lines 280 to 282
if isRedo && (!dispatcherManager.RedoEnable ||
dispatcherManager.redoDispatcherMap == nil ||
dispatcherManager.redoSchemaIDToDispatchers == nil) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

In addition to my comment on dispatcher_manager_redo.go, to fix the data race on RedoEnable, you should use dispatcherManager.RedoEnable.Load() here. The added check for redoSchemaIDToDispatchers == nil is a good defensive measure, but the atomic load is necessary for correctness.

Suggested change
if isRedo && (!dispatcherManager.RedoEnable ||
dispatcherManager.redoDispatcherMap == nil ||
dispatcherManager.redoSchemaIDToDispatchers == nil) {
if isRedo && (!dispatcherManager.RedoEnable.Load() ||
dispatcherManager.redoDispatcherMap == nil ||
dispatcherManager.redoSchemaIDToDispatchers == nil) {

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2b66b81 and 524b988.

📒 Files selected for processing (2)
  • downstreamadapter/dispatchermanager/dispatcher_manager_redo.go
  • downstreamadapter/dispatchermanager/helper.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • downstreamadapter/dispatchermanager/helper.go

📝 Walkthrough

Walkthrough

Defer publishing redo availability until redo components (redoSink, redoDispatcherMap, redoSchemaIDToDispatchers) are fully initialized; add nil-checks and consolidate redo readiness into a helper used by scheduler prechecks. No public APIs changed.

Changes

Cohort / File(s) Summary
Redo init & enable ordering
downstreamadapter/dispatchermanager/dispatcher_manager_redo.go
Move RedoEnable publication to after redo components are initialized; add nil-check for redoSink and return storage init error if nil; add comment explaining delayed publish to avoid exposing partially-initialized manager.
Redo readiness check
downstreamadapter/dispatchermanager/helper.go
Add unexported isRedoDispatcherManagerReady(dispatcherManager *DispatcherManager) bool checking RedoEnable, redoSink, redoDispatcherMap, and redoSchemaIDToDispatchers; replace direct readiness checks in scheduler precheck with this helper.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Suggested labels

lgtm, size/XS

Suggested reviewers

  • 3AceShowHand

Poem

🐇 I waited till every wire was snug and tight,

then raised the flag in the soft moonlight.
Nil-checks I plant, no panics today—
schedulers hop on a safer way. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: avoiding nil pointer panics during redo initialization, which is the core objective of the PR.
Description check ✅ Passed The PR description includes the required issue number, explains the problem and solution, provides highlights of changes, and addresses the template sections comprehensively.
Linked Issues check ✅ Passed The code changes directly address the panic in issue #4271 by reordering redo initialization, adding nil-checks for redoSink, and consolidating redo-readiness validation in the pre-check handler to prevent nil pointer dereferences.
Out of Scope Changes check ✅ Passed All changes are focused on fixing the redo dispatcher nil pointer panic and hardening readiness checks, with no unrelated modifications or scope creep detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
downstreamadapter/dispatchermanager/dispatcher_manager_redo.go (1)

51-53: Consider moving manager.RedoEnable = true to just before return nil for correctness-by-construction.

The comment states redo availability is published "only after all redo components are initialized," but RedoEnable is set at line 53 while substantial initialization still follows — quota computation (lines 55–61), NewTableTriggerRedoDispatcher (lines 64–69), metrics registration (lines 71–73), RegisterRedoMessageDs, and the goroutine start (lines 76–83).

The fix is functionally correct today because the precheck in helper.go only gates on redoDispatcherMap and redoSchemaIDToDispatchers (both set before line 53). However, if initialization steps are added after line 53 in the future that must complete before the scheduler precheck proceeds, the current position would silently expose an incompletely initialized manager.

Moving manager.RedoEnable = true to immediately before return nil would make the code match the comment's stated intent and be naturally correct in the face of future additions.

♻️ Proposed refactor
 	manager.redoDispatcherMap = newDispatcherMap[*dispatcher.RedoDispatcher]()
 	manager.redoSink = redo.New(ctx, changefeedID, manager.config.Consistent)
 	manager.redoSchemaIDToDispatchers = dispatcher.NewSchemaIDToDispatchers()
-	// Publish redo availability only after all redo components are initialized,
-	// so scheduler precheck won't observe a partially initialized manager.
-	manager.RedoEnable = true
 
 	totalQuota := manager.sinkQuota
 	// ... (quota, dispatcher creation, metrics, goroutine) ...
 
+	// Publish redo availability only after all redo components are fully initialized,
+	// so scheduler precheck won't observe a partially initialized manager.
+	manager.RedoEnable = true
 	return nil
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@downstreamadapter/dispatchermanager/dispatcher_manager_redo.go` around lines
51 - 53, Move the publish of redo availability so it truly occurs after all
initialization: remove the early assignment to manager.RedoEnable
(manager.RedoEnable = true) and place it immediately before the function's
successful return (just before return nil) in dispatcher_manager_redo.go. Ensure
this preserves existing population of redoDispatcherMap and
redoSchemaIDToDispatchers (the helper.go precheck relies on them) and keep all
subsequent initialization steps — quota computation,
NewTableTriggerRedoDispatcher, metrics registration, RegisterRedoMessageDs and
the goroutine start — executed before setting manager.RedoEnable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@downstreamadapter/dispatchermanager/dispatcher_manager_redo.go`:
- Around line 51-53: Move the publish of redo availability so it truly occurs
after all initialization: remove the early assignment to manager.RedoEnable
(manager.RedoEnable = true) and place it immediately before the function's
successful return (just before return nil) in dispatcher_manager_redo.go. Ensure
this preserves existing population of redoDispatcherMap and
redoSchemaIDToDispatchers (the helper.go precheck relies on them) and keep all
subsequent initialization steps — quota computation,
NewTableTriggerRedoDispatcher, metrics registration, RegisterRedoMessageDs and
the goroutine start — executed before setting manager.RedoEnable.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a13afb3 and 2701d14.

📒 Files selected for processing (2)
  • downstreamadapter/dispatchermanager/dispatcher_manager_redo.go
  • downstreamadapter/dispatchermanager/helper.go

isRedo := common.IsRedoMode(req.Config.Mode)
if isRedo && (!dispatcherManager.RedoEnable || dispatcherManager.redoDispatcherMap == nil) {
if isRedo && (!dispatcherManager.RedoEnable ||
dispatcherManager.redoDispatcherMap == nil ||
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could be extracted as a function

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. do-not-merge/needs-linked-issue labels Feb 25, 2026
@ti-chi-bot ti-chi-bot bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Feb 25, 2026
@ti-chi-bot ti-chi-bot bot added the lgtm label Feb 25, 2026
@ti-chi-bot
Copy link

ti-chi-bot bot commented Feb 25, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lidezhu, wk989898

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot removed the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Feb 25, 2026
@ti-chi-bot
Copy link

ti-chi-bot bot commented Feb 25, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-02-25 10:44:29.131168725 +0000 UTC m=+265341.645963334: ☑️ agreed by wk989898.
  • 2026-02-25 12:47:48.397131062 +0000 UTC m=+272740.911925671: ☑️ agreed by lidezhu.

@ti-chi-bot ti-chi-bot bot merged commit 4be7102 into pingcap:master Feb 25, 2026
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved lgtm release-note Denotes a PR that will be considered when it comes time to generate release notes. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TiCDC meet panic for redo dispatcher

3 participants