Demo-131 Cuda graph with optimized paged attention #205

PanZezhong1725 · 2026-01-27T03:31:18Z

No description provided.

Signed-off-by: Ceng23333 <441651826@qq.com>

* issue/204 - support graph in server scripts
* issue/208 - adapt to ali ppu
* issue/194 - add quantization, modify configs accordingly
  * Support NV W8 with batch size 1 and TP 1
  * Add JSON support
  * InfiniLM: add quantization layers and global config
  * Add quant config support in a cleaner way; restructure some code and remove unused code
  * Follow InfiniCore changes
  * Remove all model_config usages and use global_config throughout
  * Follow the latest InfiniLM code changes
  * Change the order of function parameters
  * Rename global config to model config
  * Refactor: add new API alongside legacy interfaces with deprecation warnings
  * Add W4-related InfiniCore content and move the Quantization config into InfiniCore
* issue/175 - qy device support
  * qy_page_131: add qy device (success)
  * qy inference_server.py
* Issue/170 - Add HYGON support and improve device type handling.
* Issue/193: feats for deployment
  Signed-off-by: Ceng23333 <441651826@qq.com>
* skip responding eos token
  Signed-off-by: Ceng23333 <441651826@qq.com>
* issue/143 use add_rmsnorm, nt flash attn, nt kv caching
* issue/204 - support graph in server scripts
* issue/208 - adapt to ali ppu
* rebase main
* issue/216 feat: support static kv cache in server
* fix llm server cache config
* demo131 - resolve mishandled conflicts
* demo131 - further adjust attn and caching logic
* demo131 - resolve merge requirements

---------

Signed-off-by: Ceng23333 <441651826@qq.com>
Co-authored-by: wooway777 <wooway777@gmail.com>
Co-authored-by: xgqdut2016 <kenan_gewei@163.com>
Co-authored-by: gongchensu <zhuyue_134@qq.com>
Co-authored-by: Ceng23333 <441651826@qq.com>
Co-authored-by: PanZezhong <panzezhong@qiyuanlab.com>
Co-authored-by: MaYuhang <2902139028@qq.com>

Issue/221 - resolve cambricon encode plus

issue/219: support vllm bench

…example/bench.py

Issue/226: add warmup before InfiniLM bench.py generation

PanZezhong1725 requested review from a team, Ceng23333, ma-hang, pengcheng888, voltjia, whjthu, wooway777 and zhangyue207 January 27, 2026 03:31

PanZezhong1725 force-pushed the demo131 branch from be5878b to 4340dff January 30, 2026 05:50

Ceng23333 and others added 11 commits February 10, 2026 14:17

* issue/219: support vllm bench (9e64b06)
  Signed-off-by: Ceng23333 <441651826@qq.com>
* issue/143 feat: static and paged graph compilers (21274f3)
* issue/143 init input to support warmup (429f54c)
* issue/143 add barrier for compilers (2c925eb)
* issue/143 fix add compile after model init (b5a809a)
* issue/143 use add_rmsnorm, nt flash attn, nt kv caching (693d74d)
* issue/204 - support graph in server scripts (69f1876)
* issue/143 fix bench script, worker cleanup, compiler initial input (144ba49)
* issue/991 optimize input preparation (6cc680b)
* issue/208 - adapt to ali ppu (67e8d6e)
* issue/214 - update attn and caching logics (ee59b3f)

wooway777 force-pushed the demo131 branch from 60d6545 to ee59b3f February 10, 2026 10:26

qinyiqun and others added 6 commits February 11, 2026 11:33

* issue/221 - support cambricon encode plus (60a00b7)
* Merge pull request #223 from InfiniTensor/issue/221 (1a408e1)
  Issue/221 - resolve cambricon encode plus
* Merge pull request #220 from InfiniTensor/issue/219 (a940a96)
  issue/219: support vllm bench
* issue/226 - feat: add --warmup flag and disable warmup by default in … (b59f768)
  …example/bench.py
* Merge pull request #227 from InfiniTensor/issue/226 (0879747)
  Issue/226: add warmup before InfiniLM bench.py generation

5 participants