[BugFix]fix RL bug about blockwisefp8 #6466
base: develop
Thanks for your contribution!
zoooo0820
left a comment
Please also update the groupgemm in fastdeploy/model_executor/layers/moe/fused_moe_deepgemm_backend.py.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #6466 +/- ##
==========================================
Coverage ? 69.36%
==========================================
Files ? 391
Lines ? 52797
Branches ? 8222
==========================================
Hits ? 36622
Misses ? 13511
Partials ? 2664
Motivation
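To stay compatible with environments that have different package structures, the dependency is moved from a "parent-package attribute" to the "full module path", so that the import logic works reliably in every environment.
In 'fp8_gemm_nt = fastdeploy.model_executor.layers.quantization.fp8_utils.deep_gemm.fp8_gemm_nt', deep_gemm is not imported; this PR adds 'from fastdeploy.model_executor.layers.quantization.fp8_utils import deep_gemm'.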
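The difference between reaching a submodule as a parent-package attribute and importing it by its full module path can be sketched in isolation. This is only an illustration: demo_pkg, kernels, and gemm_nt are hypothetical stand-ins created in a temporary directory, not the actual FastDeploy modules, and the sketch assumes the parent package's __init__.py does not import the submodule itself.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk (hypothetical names):
#   demo_pkg/__init__.py   (empty, does NOT import the submodule)
#   demo_pkg/kernels.py    (defines gemm_nt)
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "kernels.py").write_text("def gemm_nt():\n    return 'ok'\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_pkg

# Parent-package attribute access: the submodule was never imported,
# so it is not yet bound as an attribute on demo_pkg.
try:
    demo_pkg.kernels.gemm_nt
    attribute_lookup_failed = False
except AttributeError:
    attribute_lookup_failed = True

# Full module path: the import system loads the submodule explicitly,
# independent of what the parent's __init__.py happens to import.
from demo_pkg import kernels

result = kernels.gemm_nt()
print(attribute_lookup_failed, result)
```

Whether the attribute form works in a given environment depends on whether something else has already imported the submodule, which is why the explicit import is the more robust choice.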
Modifications
Fix an RL bug related to blockwise FP8.
Usage or Command
Accuracy Tests
Checklist
Tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
Run pre-commit before commit.
If the PR targets the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.