Skip to content

[TE] Enabling UB Transport on the Kunpeng SuperNode Phase 2#1855

Merged
stmatengss merged 13 commits intokvcache-ai:mainfrom
zchuango:ub_transport_dev_phase2
Apr 14, 2026
Merged

[TE] Enabling UB Transport on the Kunpeng SuperNode Phase 2#1855
stmatengss merged 13 commits intokvcache-ai:mainfrom
zchuango:ub_transport_dev_phase2

Conversation

@zchuango
Copy link
Copy Markdown
Contributor

@zchuango zchuango commented Apr 9, 2026

Description

Description

Overview

This PR introduces UbTransport, a new transport implementation for Mooncake's Transfer Engine that enables high-performance data transfer on Kunpeng 950 platforms using the UB (Unified Bus) protocol and URMA (Unified Remote Memory Access) APIs from the openEuler UMDK.

Related Issue: #1773

Phase Contribution Plan

Following suggestion, we propose four phased PRs for incremental review and integration:

  • Phase 1: Enable UbTransport + UrmaEndpoint Phase 1 PR
  • Phase 2: CI/Testing Infrastructure With Kunpeng UB Hardware Mock Env ⬅️ Current PR
  • Phase 3: Enable UbTransport + OBMM Feature
  • Phase 4: Next TE Tent Module Integration with UbTransport

New Feature

This Phase2 branch has introduced the following new features compared to the main branch:

  1. Mock URMA Implementation : Provided a mock URMA implementation ( mock_urma.cpp ) for testing purposes, allowing tests to run without actual URMA hardware.
  2. Unit Tests : Added comprehensive test cases in ub_transport_test.cpp to verify the functionality of the new transport module.
  3. Build System Integration : Updated CMake configurations to integrate the new Kunpeng transport module into the existing build system.

Module

  • Transfer Engine (mooncake-transfer-engine)
  • Mooncake Store (mooncake-store)
  • Mooncake EP (mooncake-ep)
  • Integration (mooncake-integration)
  • P2P Store (mooncake-p2p-store)
  • Python Wheel (mooncake-wheel)
  • PyTorch Backend (mooncake-pg)
  • Mooncake RL (mooncake-rl)
  • CI/CD
  • Docs
  • Other

Type of Change

  • Bug fix
  • New feature
  • Refactor
  • Breaking change
  • Documentation update
  • Other

How Has This Been Tested?

Checklist

  • I have performed a self-review of my own code.
  • I have formatted my own code using ./scripts/code_format.sh before submitting.
  • I have updated the documentation.
  • I have added tests to prove my changes are effective.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for Kunpeng UB (Unified Bus) transport using URMA (Unified Remote Memory Access), including a mock implementation for testing and updated topology discovery logic. Feedback identifies a compilation error in topology.cpp caused by a misplaced header, redundant initialization in rdma_transport_test2.cpp, and documentation inaccuracies. It is also suggested to clean up debug messages in the CMake configuration to improve build output.

Comment thread mooncake-transfer-engine/src/topology.cpp
Comment thread docs/source/design/transfer-engine/kunpeng_ub_transport.md
Comment thread mooncake-common/FindUrma.cmake
Comment thread mooncake-common/FindUrma.cmake
Comment thread mooncake-transfer-engine/tests/rdma_transport_test2.cpp
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 9, 2026

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 82.48175% with 72 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...src/transport/kunpeng_transport/urma/mock_urma.cpp 79.77% 54 Missing ⚠️
...oncake-transfer-engine/tests/ub_transport_test.cpp 92.07% 8 Missing ⚠️
...transport/kunpeng_transport/urma/urma_endpoint.cpp 82.14% 5 Missing ⚠️
...e/src/transport/kunpeng_transport/ub_transport.cpp 55.55% 4 Missing ⚠️
...ake-transfer-engine/tests/rdma_transport_test2.cpp 0.00% 1 Missing ⚠️

📢 Thoughts on this report? Let us know!

Comment thread .github/workflows/ci.yml
@zchuango
Copy link
Copy Markdown
Contributor Author

zchuango commented Apr 10, 2026

@alogfans @stmatengss ub trasport feature phase2 pr has been finshed, kindly help review and verify.🙏

@zchuango zchuango requested a review from alogfans April 11, 2026 07:50
}

int UrmaContext::openDevice(const std::string& device_name, uint8_t port,
int UrmaContext::openDevice(const std::string& device_name, uint8_t por,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

por is a typo?

Copy link
Copy Markdown
Contributor Author

@zchuango zchuango Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh,This might be a mistake made during CI testing coding,would fix it in the next phase?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix this typo. I think it will influence the correctness of this method.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix this typo. I think it will influence the correctness of this method.

okay, I have fixed this typo. @stmatengss

Comment on lines 473 to +489
int Topology::discover(const std::vector<std::string> &filter) {
matrix_.clear();
#ifdef USE_UB
auto all_hca = listUBDevices(filter);
for (auto &ent : discoverCpuTopology(all_hca)) {
matrix_[ent.name] = ent;
}
#else
auto all_hca = listInfiniBandDevices(filter);
for (auto &ent : discoverCpuTopology(all_hca)) {
matrix_[ent.name] = ent;
}
#endif
#if defined(USE_CUDA) || defined(USE_MUSA) || defined(USE_HIP) || \
defined(USE_MLU) || defined(USE_MACA)
for (auto &ent : discoverCudaTopology(all_hca)) {
matrix_[ent.name] = ent;
}
#endif
#ifdef USE_UB
auto ub_all_hca = listUBDevices(filter);
for (auto &ent : discoverCpuTopology(ub_all_hca)) {
matrix_[ent.name] = ent;
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider refactoring this code. What's the difference between CUDA/MUSA/HIP/MLU and UB?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

listUBDevices(filter) return UbDevice vector but listInfiniBandDevices(filter) is InfinibandDevice vector, CUDA/MUSA/HIP/MLU method discoverCudaTopology(hca) current using the InfinibandDevice vector, I wonder if we could have a unified device class for this? @stmatengss

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. We need a unified device class for different vendors. It can be the next PR

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'll design it here and submit the reference code in the next PR.

Copy link
Copy Markdown
Collaborator

@alogfans alogfans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zchuango zchuango requested a review from stmatengss April 14, 2026 08:04
@stmatengss stmatengss merged commit 3a69fa4 into kvcache-ai:main Apr 14, 2026
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants