Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA using CUDA cores for the decoding stage of LLM inference.
Updated Jun 11, 2025 - C++
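For context, decode-stage attention processes a single new query token per sequence against the cached keys and values, and MQA/GQA simply reduce the number of KV heads that the query heads share. Below is a minimal NumPy sketch of that math, not the repository's CUDA implementation; shapes and names are illustrative.

```python
import numpy as np

def gqa_decode_attention(q, k_cache, v_cache):
    """Single-token (decode-stage) attention with grouped KV heads.

    q:       (num_q_heads, head_dim)           query for the newest token
    k_cache: (seq_len, num_kv_heads, head_dim) cached keys
    v_cache: (seq_len, num_kv_heads, head_dim) cached values
    MHA: num_kv_heads == num_q_heads; MQA: num_kv_heads == 1.
    """
    num_q_heads, head_dim = q.shape
    seq_len, num_kv_heads, _ = k_cache.shape
    group = num_q_heads // num_kv_heads          # query heads per KV head
    out = np.empty((num_q_heads, head_dim), dtype=q.dtype)
    for h in range(num_q_heads):
        kv = h // group                          # shared KV head for this query head
        scores = k_cache[:, kv, :] @ q[h] / np.sqrt(head_dim)   # (seq_len,)
        scores -= scores.max()                   # numerically stable softmax
        p = np.exp(scores)
        p /= p.sum()
        out[h] = p @ v_cache[:, kv, :]           # weighted sum over cached values
    return out

# Example: 8 query heads sharing 2 KV heads (GQA), 128-dim heads, 16 cached tokens.
q = np.random.randn(8, 128).astype(np.float32)
k = np.random.randn(16, 2, 128).astype(np.float32)
v = np.random.randn(16, 2, 128).astype(np.float32)
print(gqa_decode_attention(q, k, v).shape)       # (8, 128)
```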
NVFP4 inference on Blackwell GeForce (RTX 5090/5080/5070 Ti/RTX PRO 6000) — SM120 patches for vLLM + FlashInfer + CUTLASS. 175 tok/s on Qwen3.6-35B MoE.
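As background, NVFP4 stores values as 4-bit E2M1 numbers with a small per-block scale (16-element blocks with FP8 scales in NVIDIA's format). The following is a rough NumPy sketch of the quantize/dequantize idea only, with the scale kept in FP32 for simplicity; block size and scale handling are simplifications, not the vLLM/FlashInfer/CUTLASS code path.

```python
import numpy as np

E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # representable FP4 magnitudes

def quantize_fp4_block(x, block=16):
    """Blockwise FP4 (E2M1) fake-quantization: scale each block so its max maps to 6."""
    x = x.reshape(-1, block)
    scales = np.abs(x).max(axis=1, keepdims=True) / 6.0 + 1e-12   # per-block scale (FP32 here)
    scaled = x / scales
    # Round each scaled magnitude to the nearest representable E2M1 value, keep the sign.
    idx = np.abs(np.abs(scaled)[..., None] - E2M1_GRID).argmin(axis=-1)
    quant = np.sign(scaled) * E2M1_GRID[idx]
    return quant * scales                                          # dequantized values

w = np.random.randn(4, 64).astype(np.float32)
w_q = quantize_fp4_block(w.ravel()).reshape(w.shape)
print(np.abs(w - w_q).mean())   # average quantization error
```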
A powerful, large-scale multimodal model for text-to-image generation.
🚀 Accelerate attention mechanisms with FlashMLA, featuring optimized kernels for DeepSeek models, enhancing performance through sparse and dense attention.
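For readers unfamiliar with MLA (multi-head latent attention), the decode-time idea is that keys and values are reconstructed from a shared low-rank latent, so only that latent is cached, and the per-head up-projections can be absorbed into the query and output sides. Here is a simplified NumPy sketch under those assumptions; dimensions are illustrative, the decoupled RoPE path is omitted, and this is not FlashMLA's kernel.

```python
import numpy as np

def mla_decode(q, c_cache, W_UK, W_UV):
    """Decode-step multi-head latent attention with absorbed KV up-projections.

    q:       (num_heads, head_dim)             per-head queries for the newest token
    c_cache: (seq_len, d_latent)               compressed KV latents cached per token
    W_UK:    (num_heads, d_latent, head_dim)   latent -> key up-projection
    W_UV:    (num_heads, d_latent, head_dim)   latent -> value up-projection
    """
    num_heads, head_dim = q.shape
    out = np.empty((num_heads, head_dim), dtype=q.dtype)
    for h in range(num_heads):
        # Absorb W_UK into the query: q_h . (c @ W_UK_h) == (W_UK_h @ q_h) . c
        q_lat = W_UK[h] @ q[h]                          # (d_latent,)
        scores = c_cache @ q_lat / np.sqrt(head_dim)    # (seq_len,)
        scores -= scores.max()
        p = np.exp(scores)
        p /= p.sum()
        o_lat = p @ c_cache                             # attention over latents, (d_latent,)
        out[h] = o_lat @ W_UV[h]                        # expand back to head_dim
    return out

# Illustrative sizes: 16 heads, 64-dim heads, 512-dim latent, 32 cached tokens.
rng = np.random.default_rng(0)
q = rng.standard_normal((16, 64)).astype(np.float32)
c = rng.standard_normal((32, 512)).astype(np.float32)
W_UK = rng.standard_normal((16, 512, 64)).astype(np.float32) / 64
W_UV = rng.standard_normal((16, 512, 64)).astype(np.float32) / 64
print(mla_decode(q, c, W_UK, W_UV).shape)               # (16, 64)
```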