@spike-zhu spike-zhu commented Jan 22, 2026

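With reference to the vLLM kernel Silu_and_Mul interface, this PR implements Silu_and_Mul on top of the Moore Threads muDNN Swiglu, adds the corresponding infiniop and infinicore tests, and integrates the operator for use in InfiniLM v0.2.0.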

Python tests:
image

image
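
For readers unfamiliar with the operator, here is a minimal pure-Python sketch of the semantics that vLLM's SiluAndMul follows: the last dimension is split into two halves, SiLU is applied to the first half, and the result is multiplied elementwise by the second half. This is only a reference for the math, not the muDNN-backed implementation in this PR; the function names `silu` and `silu_and_mul` are illustrative, and it is an assumption that the InfiniCore operator uses the same gate/up ordering.

```python
import math


def silu(x: float) -> float:
    """SiLU (swish): x * sigmoid(x), written to avoid overflow in exp()."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)  # safe: x < 0, so exp(x) is in (0, 1)
    return x * e / (1.0 + e)


def silu_and_mul(row: list[float]) -> list[float]:
    """Reference for one row of shape [2 * d]; returns a row of shape [d].

    Assumed layout (as in vLLM): the first d entries are the gate,
    the last d entries are the values the gate is multiplied with.
    """
    if len(row) % 2 != 0:
        raise ValueError("last dimension must be even")
    d = len(row) // 2
    gate, up = row[:d], row[d:]
    return [silu(g) * u for g, u in zip(gate, up)]


out = silu_and_mul([0.0, 1.0, 2.0, 3.0])  # gate = [0, 1], up = [2, 3]
```

A reference like this could serve as the expected-value side of an operator test, compared against the device output within a dtype-appropriate tolerance.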

@spike-zhu spike-zhu self-assigned this Jan 22, 2026
@spike-zhu spike-zhu requested a review from a team January 22, 2026 11:01
wooway777 and others added 25 commits January 27, 2026 10:36
…graph recording

- Ensure embedding tensors are on the same device. Change format.
- Optimize embedding kernel with vectorized memory access and __ldg
- Add vectorized memory access using float4/float2, half2, and bfloat162
- Use __ldg instruction for read-only weight and indices access
- Add memory alignment checks to enable vectorized paths
- Add __restrict__ keywords for better compiler optimization
- Implement dynamic block size selection based on embedding_dim
Wrap `NineToothedTensor` at the C++ layer

Add a way to create a `ninetoothed::Tensor` using arrays as `shape` and `strides`

Integrate the NineToothed ReLU operator using `ninetoothed::Tensor`

Add an include guard to `ninetoothed/utils.h`
issue/811 use relax graph capture mode
spike-zhu commented Feb 10, 2026

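This PR needs to be merged into the demo131 branch first and then into the main branch. A merge PR against demo131 has already been submitted: https://github.com/InfiniTensor/InfiniCore/pull/1009, so this PR is being closed.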

@spike-zhu spike-zhu closed this Feb 10, 2026

6 participants