dispatchermanager: avoid redo init nil pointer panic#4268
dispatchermanager: avoid redo init nil pointer panic#4268ti-chi-bot[bot] merged 3 commits intopingcap:masterfrom
Conversation
Summary of ChangesHello @hongyunyan, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the robustness of the redo dispatcher manager by addressing potential nil pointer panics during its initialization and subsequent pre-check operations. The changes ensure that redo-related components are fully prepared and validated before being utilized, thereby improving the stability of the system when operating in redo mode. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request aims to prevent a nil pointer panic during redo log initialization by reordering the statement that enables the redo feature. While this is a step in the right direction, it does not fully address the underlying data race. The RedoEnable flag is accessed by multiple goroutines without proper synchronization, which can still lead to race conditions and panics. My review includes critical feedback on how to correctly fix this data race using atomic operations to ensure memory visibility across goroutines.
| manager.redoSchemaIDToDispatchers = dispatcher.NewSchemaIDToDispatchers() | ||
| // Publish redo availability only after all redo components are initialized, | ||
| // so scheduler precheck won't observe a partially initialized manager. | ||
| manager.RedoEnable = true |
There was a problem hiding this comment.
While moving this line is a good step, it doesn't fully resolve the potential data race. The RedoEnable field is a plain bool and is accessed from multiple goroutines without synchronization. This means a compiler or CPU could reorder memory operations, and another goroutine might see RedoEnable as true before the other redo* fields are initialized, leading to a nil pointer dereference.
To fix this data race correctly, RedoEnable should be an atomic boolean. This will ensure the necessary memory barriers are in place to guarantee that if RedoEnable is read as true, all preceding writes to other fields in this block are visible to the reading goroutine.
You would need to:
- Change the type of
RedoEnableinDispatcherManagertoatomic.Bool. - Use
manager.RedoEnable.Store(true)here. - Use
manager.RedoEnable.Load()in all places where it's read, such as inpreCheckForSchedulerHandler.
| manager.RedoEnable = true | |
| manager.RedoEnable.Store(true) |
| if isRedo && (!dispatcherManager.RedoEnable || | ||
| dispatcherManager.redoDispatcherMap == nil || | ||
| dispatcherManager.redoSchemaIDToDispatchers == nil) { |
There was a problem hiding this comment.
In addition to my comment on dispatcher_manager_redo.go, to fix the data race on RedoEnable, you should use dispatcherManager.RedoEnable.Load() here. The added check for redoSchemaIDToDispatchers == nil is a good defensive measure, but the atomic load is necessary for correctness.
| if isRedo && (!dispatcherManager.RedoEnable || | |
| dispatcherManager.redoDispatcherMap == nil || | |
| dispatcherManager.redoSchemaIDToDispatchers == nil) { | |
| if isRedo && (!dispatcherManager.RedoEnable.Load() || | |
| dispatcherManager.redoDispatcherMap == nil || | |
| dispatcherManager.redoSchemaIDToDispatchers == nil) { |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review infoConfiguration used: defaults Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughDefer publishing redo availability until redo components (redoSink, redoDispatcherMap, redoSchemaIDToDispatchers) are fully initialized; add nil-checks and consolidate redo readiness into a helper used by scheduler prechecks. No public APIs changed. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
downstreamadapter/dispatchermanager/dispatcher_manager_redo.go (1)
51-53: Consider movingmanager.RedoEnable = trueto just beforereturn nilfor correctness-by-construction.The comment states redo availability is published "only after all redo components are initialized," but
RedoEnableis set at line 53 while substantial initialization still follows — quota computation (lines 55–61),NewTableTriggerRedoDispatcher(lines 64–69), metrics registration (lines 71–73),RegisterRedoMessageDs, and the goroutine start (lines 76–83).The fix is functionally correct today because the precheck in
helper.goonly gates onredoDispatcherMapandredoSchemaIDToDispatchers(both set before line 53). However, if initialization steps are added after line 53 in the future that must complete before the scheduler precheck proceeds, the current position would silently expose an incompletely initialized manager.Moving
manager.RedoEnable = trueto immediately beforereturn nilwould make the code match the comment's stated intent and be naturally correct in the face of future additions.♻️ Proposed refactor
manager.redoDispatcherMap = newDispatcherMap[*dispatcher.RedoDispatcher]() manager.redoSink = redo.New(ctx, changefeedID, manager.config.Consistent) manager.redoSchemaIDToDispatchers = dispatcher.NewSchemaIDToDispatchers() - // Publish redo availability only after all redo components are initialized, - // so scheduler precheck won't observe a partially initialized manager. - manager.RedoEnable = true totalQuota := manager.sinkQuota // ... (quota, dispatcher creation, metrics, goroutine) ... + // Publish redo availability only after all redo components are fully initialized, + // so scheduler precheck won't observe a partially initialized manager. + manager.RedoEnable = true return nil🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@downstreamadapter/dispatchermanager/dispatcher_manager_redo.go` around lines 51 - 53, Move the publish of redo availability so it truly occurs after all initialization: remove the early assignment to manager.RedoEnable (manager.RedoEnable = true) and place it immediately before the function's successful return (just before return nil) in dispatcher_manager_redo.go. Ensure this preserves existing population of redoDispatcherMap and redoSchemaIDToDispatchers (the helper.go precheck relies on them) and keep all subsequent initialization steps — quota computation, NewTableTriggerRedoDispatcher, metrics registration, RegisterRedoMessageDs and the goroutine start — executed before setting manager.RedoEnable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@downstreamadapter/dispatchermanager/dispatcher_manager_redo.go`:
- Around line 51-53: Move the publish of redo availability so it truly occurs
after all initialization: remove the early assignment to manager.RedoEnable
(manager.RedoEnable = true) and place it immediately before the function's
successful return (just before return nil) in dispatcher_manager_redo.go. Ensure
this preserves existing population of redoDispatcherMap and
redoSchemaIDToDispatchers (the helper.go precheck relies on them) and keep all
subsequent initialization steps — quota computation,
NewTableTriggerRedoDispatcher, metrics registration, RegisterRedoMessageDs and
the goroutine start — executed before setting manager.RedoEnable.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
downstreamadapter/dispatchermanager/dispatcher_manager_redo.godownstreamadapter/dispatchermanager/helper.go
| isRedo := common.IsRedoMode(req.Config.Mode) | ||
| if isRedo && (!dispatcherManager.RedoEnable || dispatcherManager.redoDispatcherMap == nil) { | ||
| if isRedo && (!dispatcherManager.RedoEnable || | ||
| dispatcherManager.redoDispatcherMap == nil || |
There was a problem hiding this comment.
I think it could be extracted as a function
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: lidezhu, wk989898 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What problem does this PR solve?
Issue Number: close #4271
What is changed and how it works?
This pull request enhances the robustness of the redo dispatcher manager by addressing potential nil pointer panics during its initialization and subsequent pre-check operations. The changes ensure that redo-related components are fully prepared and validated before being utilized, thereby improving the stability of the system when operating in redo mode.
Highlights
RedoEnableis set to true only after all necessary redo structures (redoDispatcherMap,redoSink,redoSchemaIDToDispatchers) are fully initialized, preventing potential nil pointer panics.dispatcherManager.redoSchemaIDToDispatchers, adding an extra layer of safety against nil pointer dereferences during scheduler operations.Check List
Tests
Questions
Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?
Release note
Summary by CodeRabbit