issue/949: add silu_and_mul for moore gpu with test pass #970

spike-zhu · 2026-01-22T11:01:04Z

Following the vLLM kernel `Silu_and_Mul` interface, this PR implements `Silu_and_Mul` on top of Moore muDNN Swiglu, adds the corresponding infiniop and infinicore tests, and wires the operator into InfiniLM v0.2.0.

Python tests:
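For reference, the semantics being ported can be sketched in plain Python. This is a minimal illustration, not the PR's implementation and not its test: it assumes the vLLM convention in which the last dimension of size `2*d` is split in half, the first half goes through SiLU, and the result is multiplied elementwise by the second half. The function names below are hypothetical and are not part of infiniop, infinicore, or muDNN.

```python
import math


def silu(x: float) -> float:
    """SiLU (swish) activation, x * sigmoid(x), written to avoid overflow."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)  # safe for large negative x: exp(x) underflows to 0
    return x * e / (1.0 + e)


def silu_and_mul_reference(row):
    """Hypothetical reference for one row of length 2*d.

    Assumed convention (as in vLLM): out[i] = silu(row[i]) * row[d + i].
    """
    if len(row) % 2 != 0:
        raise ValueError("last dimension must be even")
    d = len(row) // 2
    gate, up = row[:d], row[d:]
    return [silu(g) * u for g, u in zip(gate, up)]


if __name__ == "__main__":
    # One row with d = 2: gate = [0.0, 1.0], up = [2.0, 3.0]
    print(silu_and_mul_reference([0.0, 1.0, 2.0, 3.0]))
```

A reference like this is mainly useful for checking operand order: SwiGLU kernels in different libraries may take the gate and up halves in different argument positions, so a port onto an existing Swiglu primitive may need to swap them to match this convention.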

…graph recording
- Ensure embedding tensors are on the same device. Change format.
- Optimize embedding kernel with vectorized memory access and `__ldg`
- Add vectorized memory access using `float4`/`float2`, `half2`, and `bfloat162`
- Use the `__ldg` instruction for read-only weight and indices access
- Add memory alignment checks to enable vectorized paths
- Add `__restrict__` keywords for better compiler optimization
- Implement dynamic block size selection based on `embedding_dim`


spike-zhu · 2026-02-10T06:49:40Z

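This PR must first be merged into the demo131 branch and only then into the main branch. A merge PR against demo131 has already been opened: https://github.com/InfiniTensor/InfiniCore/pull/1009, so this PR is being closed.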

spike-zhu self-assigned this Jan 22, 2026

spike-zhu requested a review from a team January 22, 2026 11:01

spike-zhu force-pushed the issue/949 branch from dfebc9e to dcea681 Compare January 22, 2026 11:07

spike-zhu requested review from PanZezhong1725 and whjthu January 22, 2026 11:08

wooway777 and others added 25 commits January 27, 2026 10:36

issue/987 - add .cpp files to ninetoothed includes

1e63710

issue/978 - metax cuda graph impl and wrappings

822a534

issue/900 - support embedding on iluvatar, metax, and moore

835209e

issue/900 - adapt to graph and adjust test script

eb34d4d

issue/900 - maintains classic embedding for devices yet to be worked on

f9761a2

issue/791 fix add_rmsnorm api and rmsnorm module

0c204df

issue/884 - add_rms_norm on iluvatar, metax and moore

dfafc21

issue/632 - adapt to iluvatar core 20

4ddc664

issue/791 - fix add_rmsnorm api on mtx and mth

0611cb1

issue/810 support more ops as graph op

81e5fe9

issue/985 - adjust cxflags and cxxflags for lua scripts

7c5aa16

issue/402 - convenient ninetoothed util

55cd22e

- Wrap `NineToothedTensor` in a C++ layer
- Add a way to create a `ninetoothed::Tensor` from arrays for `shape` and `strides`
- Integrate the NineToothed (九齿) ReLU operator using `ninetoothed::Tensor`
- Add an include guard to `ninetoothed/utils.h`

issue/925 - Speed up scripts/build_ntops.py and `src/infiniop/ninetoothed/build.py` with `concurrent.futures`

32340fc

issue/940 - check build result and implicitly require build.py for build ntops

ca58118

issue/935 - add metax include dir for ninetoothed

47843aa

issue/919 - ninetoothed flash attention

6ac8f90

issue/931 - ninetoothed swiglu for nv, il, mtx

5614e1b

issue/923 - ninetoothed kv caching for nv, il, mtx

97eced0

issue/979 optimize paged attention

1c18c04

issue/979 - removed commented paged attn codes

4cd1f68

issue/983 - adapted the optimized paged attention to metax

7a18d24

demo131 - patch lua flags and includes

1fa5629

issue/811 use relax graph capture mode, add compile flag for graph instantiate

807e5e4

Merge pull request #989 from InfiniTensor/issue/811-fix

70862bc

issue/811 use relax graph capture mode

zhangyue207 and others added 6 commits January 29, 2026 17:06

issue/995 fix paged attn on iluvatar

bf0c825

issue/988 - adapt to ali ppu

7e2a4c0

issue/988 - unlock unused operators on ali ppu

5558e85

issue/988 - update readme

e0268b2

Merge pull request #999 from InfiniTensor/issue/988

abab565

issue/988 - adapt to ali ppu

issue/949 - feat: add silu_and_mul for moore gpu with test pass

c066d73

spike-zhu force-pushed the issue/949 branch from dcea681 to c066d73 Compare February 10, 2026 06:32

spike-zhu closed this Feb 10, 2026
