    Repositories list

    • litellm

      Public
      Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq] (a minimal usage sketch follows this list)
      Python
      Updated Jan 4, 2026
    • vllm

      Public
      vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs (a minimal offline-inference sketch follows this list)
      Python
      Updated Jan 2, 2026
    • A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Updated Jan 2, 2026
    • aiter

      Public
      AI Tensor Engine for ROCm
      Python
      Updated Jan 2, 2026
    • JamAIBase

      Public
      The collaborative spreadsheet for AI. Chain cells into powerful pipelines, experiment with prompts and models, and evaluate LLM responses in real-time. Work together seamlessly to build and iterate on AI applications.
      Python
      Updated Dec 31, 2025
    • vllm-omni

      Public
      A high-throughput and memory-efficient inference and serving engine for omni-modality models
      Python
      Updated Dec 31, 2025
    • Python
      Updated Dec 29, 2025
    • High-performance safetensors model loader
      Python
      Updated Dec 29, 2025
    • lmms-eval

      Public
      One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
      Python
      Updated Dec 21, 2025
    • vllmtests

      Public
      Tools for testing vLLM correctness and performance regressions
      Python
      Updated Dec 16, 2025
    • Python
      Updated Dec 8, 2025
    • Collects the scripts and results of all reasoning experiments.
      Python
      Updated Dec 7, 2025
    • recipes

      Public
      Common recipes to run vLLM
      Jupyter Notebook
      Updated Nov 25, 2025
    • vllm-rocm

      Public
      Python
      Updated Nov 21, 2025
    • HTML
      Updated Oct 3, 2025
    • Python
      Updated Sep 25, 2025
    • LMCache

      Public
      ROCm support for Ultra-Fast and Cheaper Long-Context LLM Inference
      Python
      Updated Jul 15, 2025
    • roxl

      Public
      NVIDIA Inference Xfer Library (NIXL)
      C++
      Updated Jun 6, 2025
    • A repository that monitors the fast-changing ROCm/aiter repository and alerts users when AITER functions of interest (e.g. in vLLM or SGLang) have been updated at a given commit.
      Python
      Updated Apr 27, 2025
    • vLLM Workshop Content
      Updated Apr 3, 2025
    • Jupyter Notebook
      Updated Mar 20, 2025
    • TypeScript documentation of JamAISDK
      HTML
      Updated Mar 14, 2025
    • Python
      Updated Feb 24, 2025
    • The driver for LMCache core to run in vLLM
      Python
      Updated Jan 24, 2025
    • Python
      Updated Jan 23, 2025
    • Python
      Updated Jan 22, 2025
    • kvpress

      Public
      LLM KV cache compression made easy
      Python
      Updated Jan 21, 2025
    • Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
      C++
      Updated Dec 20, 2024
    • Mooncake

      Public
      Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
      C++
      Updated Dec 16, 2024
    • ROCm Implementation of torchac_cuda from LMCache
      Cuda
      Updated Dec 16, 2024
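
For the litellm entry above, here is a minimal, hedged sketch of calling a model through litellm's OpenAI-format interface. It assumes litellm is installed, an OPENAI_API_KEY is set in the environment, and uses an illustrative model name; none of these details come from the listing itself.

    # Minimal litellm sketch (assumptions: pip install litellm, OPENAI_API_KEY set,
    # and "gpt-4o-mini" standing in for any of the 100+ supported provider/model names).
    from litellm import completion

    response = completion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    # litellm normalizes responses to the OpenAI format regardless of provider.
    print(response.choices[0].message.content)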
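
Similarly, for the vLLM entries, a minimal offline-inference sketch under stated assumptions (vLLM installed on a supported GPU/ROCm stack; the model name is illustrative only, not taken from the listing):

    # Minimal vLLM offline-inference sketch; "facebook/opt-125m" is an illustrative
    # assumption, and any small Hugging Face causal LM would work the same way.
    from vllm import LLM, SamplingParams

    llm = LLM(model="facebook/opt-125m")
    params = SamplingParams(temperature=0.8, max_tokens=64)
    outputs = llm.generate(["Why is paged attention memory-efficient?"], params)
    for out in outputs:
        print(out.outputs[0].text)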