    Repositories list

    • litellm

      Public
      Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq] (a minimal usage sketch follows this list)
      Python
      Updated Jan 4, 2026
    • vllm

      Public
      vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs (a minimal offline-inference sketch follows this list)
      Python
      Updated Jan 2, 2026
    • A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Updated Jan 2, 2026
    • aiter

      Public
      AI Tensor Engine for ROCm
      Python
      Updated Jan 2, 2026
    • JamAIBase

      Public
      The collaborative spreadsheet for AI. Chain cells into powerful pipelines, experiment with prompts and models, and evaluate LLM responses in real-time. Work together seamlessly to build and iterate on AI applications.
      Python
      Updated Dec 31, 2025
    • vllm-omni

      Public
      A high-throughput and memory-efficient inference and serving engine for omni-modality models
      Python
      Updated Dec 31, 2025
    • Python
      Updated Dec 29, 2025
    • High-performance safetensors model loader
      Python
      Updated Dec 29, 2025
    • lmms-eval

      Public
      One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
      Python
      Updated Dec 21, 2025
    • vllmtests

      Public
      Tools for testing vLLM correctness and performance regressions
      Python
      Updated Dec 16, 2025
    • Python
      Updated Dec 8, 2025
    • Collects the scripts and results of all reasoning experiments.
      Python
      Updated Dec 7, 2025
    • recipes

      Public
      Common recipes to run vLLM
      Jupyter Notebook
      Updated Nov 25, 2025
    • vllm-rocm

      Public
      Python
      Updated Nov 21, 2025
    • HTML
      Updated Oct 3, 2025
    • Python
      Updated Sep 25, 2025
    • LMCache

      Public
      ROCm support for Ultra-Fast and Cheaper Long-Context LLM Inference
      Python
      Updated Jul 15, 2025
    • roxl

      Public
      NVIDIA Inference Xfer Library (NIXL)
      C++
      Updated Jun 6, 2025
    • A repository that monitors the fast-changing ROCm/aiter repository and alerts users when AITER functions of interest (e.g. in vLLM or SGLang) have been updated at a given commit.
      Python
      Updated Apr 27, 2025
    • vLLM Workshop Content
      Updated Apr 3, 2025
    • Jupyter Notebook
      Updated Mar 20, 2025
    • TypeScript documentation of JamAISDK
      HTML
      Updated Mar 14, 2025
    • Python
      Updated Feb 24, 2025
    • The driver for LMCache core to run in vLLM
      Python
      Updated Jan 24, 2025
    • Python
      Updated Jan 23, 2025
    • Python
      Updated Jan 22, 2025
    • kvpress

      Public
      LLM KV cache compression made easy
      Python
      Updated Jan 21, 2025
    • Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
      C++
      Updated Dec 20, 2024
    • Mooncake

      Public
      Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
      C++
      Updated Dec 16, 2024
    • ROCm Implementation of torchac_cuda from LMCache
      Cuda
      Updated Dec 16, 2024
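
For the litellm entry above, here is a minimal, hedged sketch of calling a model through litellm's OpenAI-format interface. It assumes litellm is installed, an OPENAI_API_KEY is set in the environment, and uses an illustrative model name; none of these details come from the listing itself.

    # Minimal litellm sketch (assumptions: pip install litellm, OPENAI_API_KEY set,
    # and "gpt-4o-mini" standing in for any of the 100+ supported provider/model names).
    from litellm import completion

    response = completion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    # litellm normalizes responses to the OpenAI format regardless of provider.
    print(response.choices[0].message.content)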
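
Similarly, for the vLLM entries, a minimal offline-inference sketch under stated assumptions (vLLM installed on a supported GPU/ROCm stack; the model name is illustrative only, not taken from the listing):

    # Minimal vLLM offline-inference sketch; "facebook/opt-125m" is an illustrative
    # assumption, and any small Hugging Face causal LM would work the same way.
    from vllm import LLM, SamplingParams

    llm = LLM(model="facebook/opt-125m")
    params = SamplingParams(temperature=0.8, max_tokens=64)
    outputs = llm.generate(["Why is paged attention memory-efficient?"], params)
    for out in outputs:
        print(out.outputs[0].text)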