
[feat] Support multi-cluster operation in Slurm backends#3639

Open
vkarak wants to merge 1 commit into reframe-hpc:develop from vkarak:feat/slurm-multi-cluster

Conversation

@vkarak
Contributor

@vkarak vkarak commented Mar 9, 2026

This PR introduces a new configuration option for Slurm backends, named slurm_multi_cluster_mode, that supports Slurm's multi-cluster operation. If it is not specified, nothing changes. If it is, the listed clusters are passed to Slurm's -M option. If set to ["all"], this is equivalent to -M all and all clusters are queried.

Closes #3559.

@JimPaine Would you mind trying this PR with your setup?
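A minimal sketch of how the new option could translate into Slurm's `-M`/`--clusters` flag, based only on the description above. The helper name `clusters_option` is hypothetical, not the actual PR code; the assumption that multiple clusters are joined with commas follows Slurm's documented `-M` syntax.

```python
def clusters_option(multi_cluster_mode):
    """Return the ``-M`` argument list for squeue/sacct, or [] if unset."""
    if not multi_cluster_mode:
        return []  # option unspecified: behaviour unchanged
    # Slurm's -M accepts a comma-separated cluster list; ['all'] yields '-M all'
    return ['-M', ','.join(multi_cluster_mode)]

# e.g. a poll command might become:
#   ['squeue'] + clusters_option(['cluster1', 'cluster2'])
```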

@codecov

codecov bot commented Mar 10, 2026

Codecov Report

❌ Patch coverage is 50.00000% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.72%. Comparing base (c9fa4c7) to head (a4d3090).

Files with missing lines | Patch % | Lines
reframe/core/schedulers/slurm.py | 50.00% | 4 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3639      +/-   ##
===========================================
+ Coverage    91.64%   91.72%   +0.08%     
===========================================
  Files           62       62              
  Lines        13530    13537       +7     
===========================================
+ Hits         12399    12417      +18     
+ Misses        1131     1120      -11     


@JimPaine
Contributor

JimPaine commented Mar 10, 2026

@vkarak I have pulled from your fork and can confirm it is polling the correct cluster.

Something that I think could improve the user experience would be to include it against the sbatch command as well. Currently I need to set the cluster twice, once for submission and once for job polling.

Here is a snippet of my partitions for the test I ran; you can see that I currently need to set the cluster both in access and in slurm_multi_cluster_mode to be able to run the test.

                {
                    'name': 'cluster1',
                    'scheduler': 'slurm',
                    'launcher': 'local',
                    'environs': ['slurm_multi_cluster_mode'],
                    'access': ['-M tst1'],
                    'sched_options': {
                        'slurm_multi_cluster_mode': ['cluster1']
                    }
                },
                {
                    'name': 'cluster2',
                    'scheduler': 'slurm',
                    'launcher': 'local',
                    'environs': ['slurm_multi_cluster_mode'],
                    'access': ['-M tst2'],
                    'sched_options': {
                        'slurm_multi_cluster_mode': ['cluster2']
                    }
                }

@vkarak
Contributor Author

vkarak commented Mar 10, 2026

Something that I think could improve the user experience would be to include it against the sbatch command as well. Currently I need to set the cluster twice, once for submission and once for job polling.

Yes, that makes sense! I'll update the PR so that the access options take multi-cluster mode into account.
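A sketch of what the planned update might look like: derive the sbatch `-M` option from `slurm_multi_cluster_mode` so that users need not repeat the cluster in `access`. The function name `multi_cluster_access` and its exact behaviour are assumptions for illustration, not the merged implementation.

```python
def multi_cluster_access(access, clusters):
    """Prepend a ``-M`` option built from *clusters* to the access options,
    unless the user has already set one explicitly."""
    if clusters and not any(
        opt.lstrip().startswith(('-M', '--clusters')) for opt in access
    ):
        return [f'-M {",".join(clusters)}'] + access

    # No clusters configured, or -M already present: leave access untouched
    return list(access)
```

With this, JimPaine's example above would only need `'sched_options': {'slurm_multi_cluster_mode': ['cluster1']}`, and the `-M tst1` entry in `access` could be dropped.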


Successfully merging this pull request may close these issues.

Slurm Scheduler doesn't support multi-cluster

2 participants