
[Bug] Loading an embeddings model alongside a non-embeddings model attempts to allocate extra VRAM even though the embeddings model runs entirely on the CPU #2069

@RobbieNeko

Describe the Issue
When I load an embeddings model (Qwen3-Embedding 0.6B at Q8_0, to be exact) alongside my normal text-generation model from the terminal, the ROCm build fails with a cudaMalloc error. It tries to allocate 4964.22 MiB on the GPU, even though the embeddings model is supposed to run entirely on the CPU. The Vulkan build attempts the same erroneous allocation, but because Vulkan tolerates overflowing VRAM, it only prints a warning instead of an error, so that setup remains usable. The warning had no visible effect: prompt processing and generation speeds were both normal, and the embeddings model appeared to behave correctly as well.

Additional Information:
I am on Fedora KDE Plasma 43 (Linux).
My system has a Ryzen 5 5500, an RX 9060 XT 16GB, and 32 GB of DDR4 (plenty of which was free at the time).
The error is reproducible on both the latest KoboldCPP nocuda release and the rolling ROCm binary I downloaded today.
My bash scripts are as follows:
Vulkan

#!/bin/bash
./KoboldCPP/koboldcpp-linux-x64-nocuda \
  --model ./KoboldCPP/Models/Maginum-Cydoms-24B.i1-IQ4_XS.gguf \
  --usevulkan \
  --contextsize 16384 \
  --gpulayers 41 \
  --flashattention \
  --quantkv 1 \
  --smartcache 2 \
  --embeddingsmodel ./KoboldCPP/Models/Qwen3-Embedding-0.6B-Q8_0.gguf \
  --embeddingsmaxctx 8192

ROCm

#!/bin/bash
./KoboldCPP/koboldcpp-linux-x64-rocm \
  --model ./KoboldCPP/Models/Maginum-Cydoms-24B.i1-IQ4_XS.gguf \
  --usehipblas mmq \
  --contextsize 16384 \
  --gpulayers 41 \
  --flashattention \
  --quantkv 1 \
  --smartcache 2 \
  --embeddingsmodel ./KoboldCPP/Models/Qwen3-Embedding-0.6B-Q8_0.gguf \
  --embeddingsmaxctx 8192

Here is the full terminal output from both of the scripts above:

EmbeddingsFailureRocm.txt
EmbeddingsWarningVulkan.txt
