
Conversation

@jackyYang6
Contributor

Motivation

This PR fixes multiple issues in the PaddleFormers fallback path that caused startup/runtime failures or semantic degradation across different environments (local/QA) and weight formats (torch/paddle):

  1. Attention input layout detection was too strict in TP scenarios (local heads vs. global heads), causing spurious "Invalid attention layout" errors.
  2. Fused QKV loading needed strict, model_format-based behavior for transpose/writeback to avoid silent semantic corruption.
  3. Fused QKV bias handling needed to go through the same explicit fusion path as the weights.
  4. Prefix alias resolution needed to be robust to different checkpoint key prefixes (e.g. qwen2.* vs model.*).

Modifications

1) fastdeploy/model_executor/models/paddleformers/base.py

  • Updated fastdeploy_append_attention_forward:

    • Added TP-safe heads matching logic (local_heads compatible with global_heads via divisibility); a sketch of this check follows the list.
    • Kept strict layout validation and clear error messages.
    • Ensured Q/K/V flatten sequence-length consistency checks remain enforced.
  • Updated fallback load_weights:

    • Kept checkpoint prefix alias auto-collection and robust resolve_param_name (sketched below).
    • Kept the strict fused QKV policy based on model_format (torch / paddle) and explicit layout checks (sketched below).
    • Preserved fused QKV weight fusion behavior and fused bias path.
    • Preserved tie-word-embeddings post handling.
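
A minimal sketch of the divisibility-based heads check, using illustrative names (heads_compatible, tensor_parallel_size) rather than the exact helpers in base.py:

def heads_compatible(layout_heads: int, num_heads: int, tensor_parallel_size: int) -> bool:
    # Accept either the global head count or the per-rank (local) count under TP.
    if layout_heads == num_heads:
        return True
    # Under tensor parallelism each rank only holds num_heads // tp heads,
    # so the local count is valid as long as the global count divides evenly.
    return (
        tensor_parallel_size > 1
        and num_heads % tensor_parallel_size == 0
        and layout_heads == num_heads // tensor_parallel_size
    )

A sketch of the model_format-driven transpose policy for fused QKV weights; it assumes torch-format checkpoints store linear weights as [out_features, in_features] while Paddle expects [in_features, out_features], and the function name is illustrative:

import numpy as np

def maybe_transpose_fused_qkv(weight: np.ndarray, model_format: str) -> np.ndarray:
    # torch-format fused QKV weights are transposed once before fusion/writeback;
    # paddle-format weights are used as-is; anything else fails loudly instead of
    # being guessed silently.
    if model_format == "torch":
        return weight.T
    if model_format == "paddle":
        return weight
    raise ValueError(f"Unsupported model_format: {model_format!r}")

And a sketch of prefix-alias resolution; the function below is illustrative, not the actual resolve_param_name in base.py, which additionally auto-collects aliases from the checkpoint keys:

def resolve_with_aliases(ckpt_key: str, prefix_aliases: dict) -> str:
    # Map checkpoint-specific prefixes (e.g. "qwen2.") onto the model's
    # canonical prefix (e.g. "model.") so the same fallback loader works
    # across checkpoints with different key layouts.
    for alias, canonical in prefix_aliases.items():
        if ckpt_key.startswith(alias):
            return canonical + ckpt_key[len(alias):]
    return ckpt_key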

2) tests/model_executor/test_paddleformers_base.py

  • Added/updated tests for:
    • attention layout conversion correctness under multiple input layouts.
    • TP local-heads layout compatibility (a test sketch follows this list).
    • fused QKV strict-policy behaviors and related edge cases.
  • Fixed test mocks so num_heads / num_key_value_heads are explicit and stable.
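
A pytest-style sketch of the TP local-heads compatibility case, using the hypothetical heads_compatible check from the sketch above (the real tests exercise fastdeploy_append_attention_forward and the loader directly):

import pytest

def heads_compatible(layout_heads, num_heads, tp_size):
    # Same illustrative check as above, inlined so the example is self-contained.
    return layout_heads == num_heads or (
        tp_size > 1 and num_heads % tp_size == 0 and layout_heads == num_heads // tp_size
    )

@pytest.mark.parametrize(
    "layout_heads, num_heads, tp_size, expected",
    [
        (32, 32, 1, True),   # global heads, no TP
        (16, 32, 2, True),   # local heads under TP=2
        (8, 32, 4, True),    # local heads under TP=4
        (10, 32, 2, False),  # neither global nor local: invalid layout
    ],
)
def test_tp_local_heads_layout_compatibility(layout_heads, num_heads, tp_size, expected):
    assert heads_compatible(layout_heads, num_heads, tp_size) == expected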

Usage or Command

Unit tests

python -m pytest -q tests/model_executor/test_paddleformers_base.py

Service startup verification (example)

python -m fastdeploy.entrypoints.openai.api_server \
  --model /workspace/models/Qwen/Qwen2.5-7B-Instruct \
  --model-impl paddleformers \
  --tensor_parallel_size 1

TP case (example)

python -m fastdeploy.entrypoints.openai.api_server \
  --model /workspace/models/Qwen/Qwen2.5-7B-Instruct \
  --model-impl paddleformers \
  --tensor_parallel_size 2

Accuracy Tests

This PR touches the model forward and weight-loading paths. Accuracy checks were run in local and QA environments:

  • Qwen2/Qwen2.5 fallback startup: pass
  • Fused QKV loading:
    • torch-format weights: pass
    • paddle-format weights: pass
  • TP layout path (local heads vs. global heads): pass (no "Invalid attention layout" errors)
  • Generation sanity (same prompt):
    • Before: occasional garbled/irrelevant responses on affected paths
    • After: coherent responses restored in tested scenarios

Checklist

  • Add at least a tag in the PR title.
  • Format your code, run pre-commit before commit.
  • Add unit tests.
  • Provide accuracy results.
  • This PR targets develop branch (not a release cherry-pick PR).

@paddle-bot

paddle-bot bot commented Feb 11, 2026

Thanks for your contribution!

@codecov-commenter

Codecov Report

❌ Patch coverage is 70.13889% with 43 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@e40fb16).

Files with missing lines | Patch % | Lines
...deploy/model_executor/models/paddleformers/base.py | 71.32% | 22 Missing and 17 partials ⚠️
fastdeploy/input/ernie4_5_processor.py | 50.00% | 2 Missing ⚠️
fastdeploy/input/text_processor.py | 50.00% | 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #6465   +/-   ##
==========================================
  Coverage           ?   69.40%           
==========================================
  Files              ?      391           
  Lines              ?    52785           
  Branches           ?     8221           
==========================================
  Hits               ?    36633           
  Misses             ?    13482           
  Partials           ?     2670           
Flag | Coverage Δ
GPU | 69.40% <70.13%> (?)

