dp: Fix DP scheduler locking #10327
Merged
abonislawski merged 1 commit into thesofproject:main on Oct 30, 2025
When at least two DP modules are running, each on a separate core, using irq_lock() may lead to interrupts being disabled for a very long time. When module_is_ready_to_process() always returns true, the DP task runs in a loop continuously, except when it is preempted by higher-priority threads. irq_lock() disables interrupts globally, so using it on multiple cores can lead to unbalanced double locks with no unlock in between.

Consider the case: core 1 calls irq_lock(1); this does not prevent core 2 from also calling flags = irq_lock(2); now flags contains the "interrupts disabled" state, because interrupts were already globally disabled by core 1. Then core 1 calls irq_unlock() and interrupts are re-enabled; then core 2 calls irq_unlock(flags) to restore its saved state, which actually leaves interrupts disabled. On the next loop iteration, core 1 calls flags = irq_lock(1), and from then on interrupts may stay disabled forever with only two DP threads constantly running.

This fixes a regression in multicore DP tests. The issue is triggered by commit 4225c27, which simply allows the DP task to run for all the available time without being triggered by LL on every cycle.

Signed-off-by: Serhiy Katsyuba <serhiy.katsyuba@intel.com>
lyakh
approved these changes
Oct 27, 2025
softwarecki
approved these changes
Oct 27, 2025
Collaborator
softwarecki
left a comment
I'm glad my solution helped resolve the issue :)
tmleman
approved these changes
Oct 27, 2025
lgirdwood
approved these changes
Oct 27, 2025
Member
@serhiy-katsyuba-intel can you check the internal CI? Thanks!
Contributor
Author
The CI fails on the test_00_11_enter_d3_with_topology_stress test on NVL FPGA. That test does not use DP, so it should not be directly affected by the changes in this PR. I checked a few neighboring PRs: they all fail the same test. cc: @lrudyX, @tmleman.
The failure is related to a DUT issue. Working on solving this problem.
abonislawski
approved these changes
Oct 28, 2025
Contributor
Author
Internal Intel CI is now working and has passed successfully.