Fix BnB quantization in vLLM by ItzikVa · Pull Request #53 · generative-computing/granite-switch

ItzikVa · 2026-05-21T06:22:16Z

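BitsAndBytes 4-bit quantization packs weights as uint8 (two 4-bit values per byte) with shape [total_elements//2, 1]. Because this packed shape no longer reflects the layer's logical input and output sizes, it breaks the existing weight.shape-based dimension detection in SwitchedLoRALinear.__init__().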
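To illustrate the failure, the sketch below compares naive shape-based detection on a regular weight and on a weight packed as described above. It uses plain tuples instead of real tensors. `packed_bnb4_shape` and `dims_from_weight_shape` are hypothetical helpers, and the layer sizes are made up for illustration.

```python
def packed_bnb4_shape(out_features: int, in_features: int) -> tuple[int, int]:
    """Packed 4-bit weight shape as described in this PR: [total_elements // 2, 1]."""
    total_elements = out_features * in_features
    # Two 4-bit values share one uint8 byte, so the element count halves and
    # the tensor is flattened into a single column.
    return (total_elements // 2, 1)


def dims_from_weight_shape(shape: tuple[int, int]) -> tuple[int, int]:
    """Naive detection: read (in_features, out_features) assuming the
    nn.Linear weight layout [out_features, in_features]."""
    out_features, in_features = shape
    return in_features, out_features


logical_shape = (4096, 2048)  # hypothetical [out_features, in_features]
packed_shape = packed_bnb4_shape(*logical_shape)

print(dims_from_weight_shape(logical_shape))  # (2048, 4096): correct
print(dims_from_weight_shape(packed_shape))   # (1, 4194304): corrupted
```

With the packed weight, the detected input size collapses to 1 and the output size becomes half the total element count, so any LoRA buffers sized from these values would have the wrong dimensions.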

Fix:

- Prefer the input_size_per_partition / output_size_per_partition attributes (always correct, regardless of weight packing format)
- Fall back to weight.shape only for non-parallel layers
- Add a dtype guard: if the weight dtype is not floating-point (uint8 for BnB), default to bfloat16 for LoRA buffer allocation
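The three rules can be sketched as a single dimension-resolution helper. This is a minimal, framework-free sketch, not the actual code in SwitchedLoRALinear.__init__(): `resolve_lora_dims` is a hypothetical name, layers are stand-in `SimpleNamespace` objects, and dtypes are plain strings where real code would use torch dtypes and a check such as `dtype.is_floating_point`.

```python
from types import SimpleNamespace

# Stand-in for a floating-point dtype check (torch code would use
# weight.dtype.is_floating_point); covers only the dtypes used in this sketch.
FLOATING_DTYPES = {"float16", "bfloat16", "float32"}
FALLBACK_LORA_DTYPE = "bfloat16"


def resolve_lora_dims(layer) -> tuple[int, int, str]:
    """Return (in_features, out_features, lora_dtype) for LoRA buffer allocation."""
    # 1. Prefer the per-partition sizes, which do not depend on weight packing.
    in_features = getattr(layer, "input_size_per_partition", None)
    out_features = getattr(layer, "output_size_per_partition", None)
    if in_features is None or out_features is None:
        # 2. Non-parallel layer: fall back to the nn.Linear layout [out, in].
        out_features, in_features = layer.weight.shape

    # 3. Dtype guard: a packed quantized dtype (e.g. uint8 for BnB) is not usable
    #    for LoRA buffers, so allocate them in bfloat16 instead.
    weight_dtype = layer.weight.dtype
    lora_dtype = weight_dtype if weight_dtype in FLOATING_DTYPES else FALLBACK_LORA_DTYPE
    return in_features, out_features, lora_dtype


# A BnB-quantized parallel layer: packed uint8 weight, partition sizes intact.
bnb_layer = SimpleNamespace(
    input_size_per_partition=2048,
    output_size_per_partition=4096,
    weight=SimpleNamespace(shape=(4194304, 1), dtype="uint8"),
)
# An unquantized non-parallel layer with a regular [out, in] weight.
plain_layer = SimpleNamespace(weight=SimpleNamespace(shape=(512, 256), dtype="float16"))

print(resolve_lora_dims(bnb_layer))    # (2048, 4096, 'bfloat16')
print(resolve_lora_dims(plain_layer))  # (256, 512, 'float16')
```

Because the partition attributes are read first, the packed shape of a quantized parallel layer is never consulted, and only the dtype guard needs to know that the stored weight is uint8.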

Also adds vLLM quantization tests (BnB INT4 + FP8) that verify:

- Base model weights are actually quantized
- LoRA/aLoRA weights remain in full precision
- Adapters activate correctly under quantization
- LoRA dimensions are not corrupted by packed weight shapes
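As a rough sketch of the kind of per-layer checks these tests could make, the helper below works on plain (shape, dtype) records rather than a loaded vLLM model. The parameter names (`base.weight`, `lora_A`, `lora_B`), the PEFT-style layouts `lora_A: [rank, in_features]` and `lora_B: [out_features, rank]`, and the helper `check_quantized_layer` are all assumptions for illustration; the PR's actual tests may inspect the model differently. Adapter activation is not covered here, since it requires running generation.

```python
FULL_PRECISION = {"float16", "bfloat16", "float32"}


def check_quantized_layer(params: dict, in_features: int, out_features: int,
                          rank: int, quantized_dtype: str) -> None:
    """Assert base/LoRA invariants for one layer.

    `params` maps a (hypothetical) parameter name to a (shape, dtype) pair.
    """
    # Base model weight must actually be stored in the quantized dtype.
    _, base_dtype = params["base.weight"]
    assert base_dtype == quantized_dtype, f"base weight not quantized: {base_dtype}"

    # LoRA weights must stay in full precision.
    for name in ("lora_A", "lora_B"):
        _, dtype = params[name]
        assert dtype in FULL_PRECISION, f"{name} is not full precision: {dtype}"

    # LoRA shapes must follow the logical layer sizes, not the packed base shape.
    assert params["lora_A"][0] == (rank, in_features), "lora_A shape corrupted"
    assert params["lora_B"][0] == (out_features, rank), "lora_B shape corrupted"


layer_params = {
    "base.weight": ((4194304, 1), "uint8"),  # packed BnB 4-bit weight
    "lora_A": ((16, 2048), "bfloat16"),
    "lora_B": ((4096, 16), "bfloat16"),
}
check_quantized_layer(layer_params, in_features=2048, out_features=4096,
                      rank=16, quantized_dtype="uint8")
print("ok")
```

A LoRA buffer sized from the packed weight (for example `lora_A` with shape `(16, 1)`) fails the shape assertion, which is the regression the last check is meant to catch.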

Closes #16

ItzikVa wants to merge 1 commit into dev from issue-16

ItzikVa commented May 21, 2026
