Running Llama 2 and other Open-Source LLMs on CPU Inference Locally for Document Q&A
Updated Nov 6, 2023 - Python
Krasis is a hybrid LLM runtime focused on efficiently running larger models on consumer-grade, VRAM-limited hardware.
Runs LLaMA at extremely high speed.
Face verification in the browser. 74 KB WebAssembly. No server, no cloud, no dependencies. Also runs natively in 3 ms on CPU.
LLM inference in Fortran
Speaker diarization for Python — "who spoke when?" CPU-only, no API keys, Apache 2.0. ~10.8% DER on VoxConverse, 8x faster than real-time.
Pure C inference engine for Qwen3-TTS text-to-speech. No Python, no PyTorch — just C and BLAS. Supports 0.6B and 1.7B models, 9 voices, 10 languages.
eLLM runs LLM inference on CPUs faster than on GPUs.
Running Mixture of Agents on CPU: LFM2.5 Brain (1.2B) + Falcon-R Reasoner (600M) + Tool Caller (90M). CPU-only, 16GB RAM. Lightweight AI Legion.
A GPU defined in software. Runs Llama 3.2 1B at 3.6 tok/sec. Zero dependencies.
The bare metal in my basement
Non-bijunctive attention collapse for LLM inference — POWER8 hardware AES (vcipher) + AltiVec vec_perm. Hebbian path selection, cross-head diffusion, O(1) KV prefiltering.
Minimal, zero-dependency LLM inference in pure C11. CPU-first with NEON/AVX2 SIMD. Flash MoE (pread + LRU expert cache). TurboQuant 3-bit KV compression (8.9x less memory per session). 20+ GGUF quant formats. Compiles to WASM.
Portable LLM - A rust library for LLM inference
A V-lang API wrapper for LLM inference with chatllm.cpp.
A wrapper for simplified use of Llama 2 GGUF quantized models.
A FastAPI server for querying Google's Gemma Translate AI models for translations
Lightning-fast RAG for AI agents. ONNX-powered, 4-layer fusion, MCP server. No PyTorch.
PlantAi is a ResNet-based CNN model trained on the PlantVillage dataset to classify plant leaf images as healthy or diseased. This repository includes PyTorch training code, tools to convert the model to TensorFlow Lite (TFLite) for deployment, and an Android app integrating the model for real-time leaf disease detection from camera images.
Wheels & Docker images for running vLLM on CPU-only systems, optimized for different CPU instruction sets
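Several of the projects above pair CPU-side retrieval with a local LLM for document Q&A: documents are chunked, the chunks most relevant to a question are retrieved, and only those are passed to the model as context. A minimal, dependency-free sketch of the retrieval step is below — the chunking size, scoring method, and all function names are illustrative assumptions, not taken from any listed repository:

```python
import math
from collections import Counter

def chunk_text(text, size=40):
    """Split a document into chunks of roughly `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score(query, chunk):
    """Cosine similarity over term-frequency vectors (stdlib only)."""
    q = Counter(query.lower().split())
    c = Counter(chunk.lower().split())
    dot = sum(q[t] * c[t] for t in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in c.values())))
    return dot / norm if norm else 0.0

def top_chunk(query, docs):
    """Return the best-matching chunk to feed as context to a local LLM."""
    chunks = [ch for d in docs for ch in chunk_text(d)]
    return max(chunks, key=lambda ch: score(query, ch))
```

In a real pipeline the returned chunk would be embedded in a prompt ("Answer using only this context: …") and sent to a CPU-hosted model such as a GGUF-quantized Llama 2; production systems typically replace the term-overlap scorer with dense embeddings.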