[Hackathon 10th · Wenxin Partner] Weekly Report #16 valorix25 2026.05.09~2026.05.14 #610

Open
valorix25 wants to merge 3 commits into PFCCLab:main from valorix25:main

Conversation

@valorix25

Summary

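  • Benchmark environment fixes and baseline setup: resolved the SIGBUS caused by /dev/shm overflow and an OpenMP library conflict, then established a Metax C500 baseline
  • P0: Migrated the PaddleX Layout model to the Metax GPU by explicitly passing device="metax_gpu:0"; throughput +30.4%
  • P3: Routing Prefix Sum optimization: binary search O(N×logM) → atomic counting + CUB exclusive sum O(N+M)
  • P4: SwiGLU In-place Fusion: custom CUDA kernel, vectorized with VecSize=8
  • P5: RoPE + KV Cache shared-memory optimization: code complete, pending build verification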
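
To make the P3 change concrete, here is a small CPU reference sketch in Python. It is not the CUDA kernel from this PR. It assumes that N is the number of routed tokens, M is the number of experts, and the quantity being computed is each expert's start offset in the token buffer; the function names are hypothetical. The counting loop stands in for the per-token atomic add, and the exclusive sum stands in for the CUB call. A binary-search variant is included only as a cross-check and may differ from the original implementation.

```python
from bisect import bisect_left
from itertools import accumulate


def offsets_by_counting(expert_ids, num_experts):
    """Counting pass O(N) followed by an exclusive prefix sum O(M)."""
    counts = [0] * num_experts
    for e in expert_ids:
        # On a GPU this increment would be an atomic add per token.
        counts[e] += 1
    # Exclusive sum: offsets[e] = number of tokens routed to experts < e.
    offsets = list(accumulate(counts[:-1], initial=0))
    return counts, offsets


def offsets_by_binary_search(expert_ids, num_experts):
    """Cross-check: find each expert's first position in the sorted ids."""
    sorted_ids = sorted(expert_ids)
    return [bisect_left(sorted_ids, e) for e in range(num_experts)]


if __name__ == "__main__":
    ids = [2, 0, 1, 2, 2, 0]  # hypothetical routing result for 6 tokens
    counts, offsets = offsets_by_counting(ids, num_experts=4)
    print(counts, offsets)
    print(offsets_by_binary_search(ids, num_experts=4))
```

Both functions should agree on every input; the counting version avoids the sort and the per-expert searches, which is where the O(N+M) bound comes from.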

Optimization Results Summary

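| Metric | Baseline | P0 | P0+P3+P4+P2 | Total change |
| --- | --- | --- | --- | --- |
| Throughput (files) | 0.217 files/sec | 0.283 files/sec | 0.285 files/sec | +31.3% |
| Average batch latency | 73.72 s | 56.6 s | 56.2 s | -23.8% |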
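
The percentage columns can be reproduced from the raw figures with a relative-change calculation. The short sketch below uses only the numbers reported in this summary; the helper name `pct_change` is hypothetical.

```python
def pct_change(baseline, new):
    """Relative change of `new` versus `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Throughput in files/sec (higher is better).
    print(f"P0 throughput:    {pct_change(0.217, 0.283):+.1f}%")
    print(f"Final throughput: {pct_change(0.217, 0.285):+.1f}%")
    # Average batch latency in seconds (lower is better).
    print(f"Final latency:    {pct_change(73.72, 56.2):+.1f}%")
```

Read this way, almost all of the reported gain comes from the P0 device migration, with the later stages adding a small further improvement.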

Deliverables

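| Deliverable | Status | Notes |
| --- | --- | --- |
| RFC document | ✅ Done | RFC document submitted to the vendor's mailbox |
| Code implementation | 🔄 In progress | P0/P2/P3/P4 verified; P5 pending build verification |
| README | ✅ Done | PROGRESS.md contains the complete record |
| Demo video/screenshots | ⬜ Not started | - |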

valorix25 and others added 3 commits May 13, 2026 11:13
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ions

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>