[TE] Enabling UB Transport on the Kunpeng SuperNode Phase 2#1855
[TE] Enabling UB Transport on the Kunpeng SuperNode Phase 2#1855stmatengss merged 13 commits intokvcache-ai:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces support for Kunpeng UB (Unified Bus) transport using URMA (Unified Remote Memory Access), including a mock implementation for testing and updated topology discovery logic. Feedback identifies a compilation error in topology.cpp caused by a misplaced header, redundant initialization in rdma_transport_test2.cpp, and documentation inaccuracies. It is also suggested to clean up debug messages in the CMake configuration to improve build output.
|
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
|
@alogfans @stmatengss ub trasport feature phase2 pr has been finshed, kindly help review and verify.🙏 |
| } | ||
|
|
||
| int UrmaContext::openDevice(const std::string& device_name, uint8_t port, | ||
| int UrmaContext::openDevice(const std::string& device_name, uint8_t por, |
There was a problem hiding this comment.
oh,This might be a mistake made during CI testing coding,would fix it in the next phase?
There was a problem hiding this comment.
Please fix this typo. I think it will influence the correctness of this method.
There was a problem hiding this comment.
Please fix this typo. I think it will influence the correctness of this method.
okay, I have fixed this typo. @stmatengss
| int Topology::discover(const std::vector<std::string> &filter) { | ||
| matrix_.clear(); | ||
| #ifdef USE_UB | ||
| auto all_hca = listUBDevices(filter); | ||
| for (auto &ent : discoverCpuTopology(all_hca)) { | ||
| matrix_[ent.name] = ent; | ||
| } | ||
| #else | ||
| auto all_hca = listInfiniBandDevices(filter); | ||
| for (auto &ent : discoverCpuTopology(all_hca)) { | ||
| matrix_[ent.name] = ent; | ||
| } | ||
| #endif | ||
| #if defined(USE_CUDA) || defined(USE_MUSA) || defined(USE_HIP) || \ | ||
| defined(USE_MLU) || defined(USE_MACA) | ||
| for (auto &ent : discoverCudaTopology(all_hca)) { | ||
| matrix_[ent.name] = ent; | ||
| } | ||
| #endif | ||
| #ifdef USE_UB | ||
| auto ub_all_hca = listUBDevices(filter); | ||
| for (auto &ent : discoverCpuTopology(ub_all_hca)) { | ||
| matrix_[ent.name] = ent; | ||
| } |
There was a problem hiding this comment.
Consider refactoring this code. What's the difference between CUDA/MUSA/HIP/MLU and UB?
There was a problem hiding this comment.
listUBDevices(filter) return UbDevice vector but listInfiniBandDevices(filter) is InfinibandDevice vector, CUDA/MUSA/HIP/MLU method discoverCudaTopology(hca) current using the InfinibandDevice vector, I wonder if we could have a unified device class for this? @stmatengss
There was a problem hiding this comment.
Sure. We need a unified device class for different vendors. It can be the next PR
There was a problem hiding this comment.
Okay, I'll design it here and submit the reference code in the next PR.
Description
Description
Overview
This PR introduces UbTransport, a new transport implementation for Mooncake's Transfer Engine that enables high-performance data transfer on Kunpeng 950 platforms using the UB (Unified Bus) protocol and URMA (Unified Remote Memory Access) APIs from the openEuler UMDK.
Related Issue: #1773
Phase Contribution Plan
Following suggestion, we propose four phased PRs for incremental review and integration:
New Feature
This Phase2 branch has introduced the following new features compared to the main branch:
Module
mooncake-transfer-engine)mooncake-store)mooncake-ep)mooncake-integration)mooncake-p2p-store)mooncake-wheel)mooncake-pg)mooncake-rl)Type of Change
How Has This Been Tested?
Checklist
./scripts/code_format.shbefore submitting.