
Conversation

@jackyYang6
Contributor

Motivation

This PR fixes multiple issues in the PaddleFormers fallback path that caused startup/runtime failures or semantic degradation across different environments (local/QA) and weight formats (torch/paddle):

  1. Attention input layout detection was too strict in TP scenarios (local heads vs. global heads), causing spurious "Invalid attention layout" errors.
  2. Fused QKV loading needed strict, model_format-based behavior for transpose/writeback to avoid silent semantic corruption.
  3. Fused QKV bias handling needed to go through the same explicit fusion path as the weights.
  4. Prefix alias resolution needed to be robust to different checkpoint key prefixes (e.g. qwen2.* vs model.*).

Modifications

1) fastdeploy/model_executor/models/paddleformers/base.py

  • Updated fastdeploy_append_attention_forward:

    • Added TP-safe heads matching logic (local_heads compatible with global_heads via divisibility); a sketch of this check follows the list.
    • Kept strict layout validation and clear error messages.
    • Ensured Q/K/V flatten sequence-length consistency checks remain enforced.
  • Updated fallback load_weights:

    • Kept checkpoint prefix alias auto-collection and robust resolve_param_name (sketched below).
    • Kept the strict fused QKV policy based on model_format (torch / paddle) and explicit layout checks (sketched below).
    • Preserved fused QKV weight fusion behavior and fused bias path.
    • Preserved tie-word-embeddings post handling.
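
A minimal sketch of the divisibility-based heads check, using illustrative names (heads_compatible, tensor_parallel_size) rather than the exact helpers in base.py:

def heads_compatible(layout_heads: int, num_heads: int, tensor_parallel_size: int) -> bool:
    # Accept either the global head count or the per-rank (local) count under TP.
    if layout_heads == num_heads:
        return True
    # Under tensor parallelism each rank only holds num_heads // tp heads,
    # so the local count is valid as long as the global count divides evenly.
    return (
        tensor_parallel_size > 1
        and num_heads % tensor_parallel_size == 0
        and layout_heads == num_heads // tensor_parallel_size
    )

A sketch of the model_format-driven transpose policy for fused QKV weights; it assumes torch-format checkpoints store linear weights as [out_features, in_features] while Paddle expects [in_features, out_features], and the function name is illustrative:

import numpy as np

def maybe_transpose_fused_qkv(weight: np.ndarray, model_format: str) -> np.ndarray:
    # torch-format fused QKV weights are transposed once before fusion/writeback;
    # paddle-format weights are used as-is; anything else fails loudly instead of
    # being guessed silently.
    if model_format == "torch":
        return weight.T
    if model_format == "paddle":
        return weight
    raise ValueError(f"Unsupported model_format: {model_format!r}")

And a sketch of prefix-alias resolution; the function below is illustrative, not the actual resolve_param_name in base.py, which additionally auto-collects aliases from the checkpoint keys:

def resolve_with_aliases(ckpt_key: str, prefix_aliases: dict) -> str:
    # Map checkpoint-specific prefixes (e.g. "qwen2.") onto the model's
    # canonical prefix (e.g. "model.") so the same fallback loader works
    # across checkpoints with different key layouts.
    for alias, canonical in prefix_aliases.items():
        if ckpt_key.startswith(alias):
            return canonical + ckpt_key[len(alias):]
    return ckpt_key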

2) tests/model_executor/test_paddleformers_base.py

  • Added/updated tests for:
    • attention layout conversion correctness under multiple input layouts.
    • TP local-heads layout compatibility (a test sketch follows this list).
    • fused QKV strict-policy behaviors and related edge cases.
  • Fixed test mocks so num_heads / num_key_value_heads are explicit and stable.
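
A pytest-style sketch of the TP local-heads compatibility case, using the hypothetical heads_compatible check from the sketch above (the real tests exercise fastdeploy_append_attention_forward and the loader directly):

import pytest

def heads_compatible(layout_heads, num_heads, tp_size):
    # Same illustrative check as above, inlined so the example is self-contained.
    return layout_heads == num_heads or (
        tp_size > 1 and num_heads % tp_size == 0 and layout_heads == num_heads // tp_size
    )

@pytest.mark.parametrize(
    "layout_heads, num_heads, tp_size, expected",
    [
        (32, 32, 1, True),   # global heads, no TP
        (16, 32, 2, True),   # local heads under TP=2
        (8, 32, 4, True),    # local heads under TP=4
        (10, 32, 2, False),  # neither global nor local: invalid layout
    ],
)
def test_tp_local_heads_layout_compatibility(layout_heads, num_heads, tp_size, expected):
    assert heads_compatible(layout_heads, num_heads, tp_size) == expected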

Usage or Command

Unit tests

python -m pytest -q tests/model_executor/test_paddleformers_base.py

Service startup verification (example)

python -m fastdeploy.entrypoints.openai.api_server \
  --model /workspace/models/Qwen/Qwen2.5-7B-Instruct \
  --model-impl paddleformers \
  --tensor_parallel_size 1

TP case (example)

python -m fastdeploy.entrypoints.openai.api_server \
  --model /workspace/models/Qwen/Qwen2.5-7B-Instruct \
  --model-impl paddleformers \
  --tensor_parallel_size 2

Accuracy Tests

This PR touches the model forward and weight-loading paths. Accuracy checks were run in local and QA environments:

  • Qwen2/Qwen2.5 fallback startup: pass
  • Fused QKV loading:
    • torch-format weights: pass
    • paddle-format weights: pass
  • TP layout path (local heads vs. global heads): pass (no "Invalid attention layout" errors)
  • Generation sanity (same prompt):
    • Before: occasional garbled/irrelevant responses on affected paths
    • After: coherent responses restored in tested scenarios

Checklist

  • Add at least a tag in the PR title.
  • Format your code, run pre-commit before commit.
  • Add unit tests.
  • Provide accuracy results.
  • This PR targets develop branch (not a release cherry-pick PR).

@paddle-bot

paddle-bot bot commented Feb 11, 2026

Thanks for your contribution!

@codecov-commenter

Codecov Report

❌ Patch coverage is 70.13889% with 43 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@e40fb16).

Files with missing lines | Patch % | Lines
...deploy/model_executor/models/paddleformers/base.py | 71.32% | 22 Missing and 17 partials ⚠️
fastdeploy/input/ernie4_5_processor.py | 50.00% | 2 Missing ⚠️
fastdeploy/input/text_processor.py | 50.00% | 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #6465   +/-   ##
==========================================
  Coverage           ?   69.40%           
==========================================
  Files              ?      391           
  Lines              ?    52785           
  Branches           ?     8221           
==========================================
  Hits               ?    36633           
  Misses             ?    13482           
  Partials           ?     2670           
Flag | Coverage Δ
GPU | 69.40% <70.13%> (?)

