feat: Add knowledge distillation example with offline support#1654

Open
tourzhao wants to merge 3 commits into THUDM:main from tourzhao:feat/knowledge-distillation

Conversation

@tourzhao tourzhao commented Mar 2, 2026

Summary

  • Add a knowledge distillation (KD) example supporting online KD via an external teacher server and offline KD from pre-saved teacher data (JSONL)
  • Implement Top-K forward KL and sampled KL losses
  • Tested with Qwen3-4B (teacher) -> Qwen3-1.7B (student)
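For context, a Top-K forward KL loss like the one this PR adds can be sketched as follows. This is a minimal illustration, not the PR's actual `kd_loss.py`: the function name and tensor layout are assumptions, and the teacher's top-K logits are renormalized into a distribution over the K retained tokens.

```python
import torch
import torch.nn.functional as F


def topk_forward_kl(student_logits, teacher_topk_logits, teacher_topk_ids):
    """Forward KL(teacher || student) restricted to the teacher's top-K tokens.

    student_logits:       [batch, seq, vocab]  full student logits
    teacher_topk_logits:  [batch, seq, K]      teacher logits at its top-K ids
    teacher_topk_ids:     [batch, seq, K]      vocabulary indices of those logits
    """
    # Renormalize the teacher's top-K logits into a K-way distribution.
    teacher_probs = F.softmax(teacher_topk_logits, dim=-1)
    teacher_logprobs = F.log_softmax(teacher_topk_logits, dim=-1)

    # Student log-probs over the full vocab, gathered at the teacher's top-K ids.
    student_logprobs = F.log_softmax(student_logits, dim=-1)
    student_topk_logprobs = torch.gather(student_logprobs, -1, teacher_topk_ids)

    # Forward KL per position, summed over the K support tokens, averaged overall.
    kl = (teacher_probs * (teacher_logprobs - student_topk_logprobs)).sum(dim=-1)
    return kl.mean()
```

Keeping only the teacher's top-K entries is what makes the offline JSONL path practical: the teacher run needs to store K logits per position instead of the full vocabulary.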

Files

  • examples/knowledge_distillation/__init__.py
  • examples/knowledge_distillation/knowledge_distillation.py
  • examples/knowledge_distillation/offline_kd.py
  • examples/knowledge_distillation/kd_loss.py
  • examples/knowledge_distillation/run-qwen3-1.7B-kd.sh
  • examples/knowledge_distillation/run-qwen3-1.7B-offline-kd.sh
  • examples/knowledge_distillation/README.md

Test plan

  • Verify online KD with Qwen3-4B teacher and Qwen3-1.7B student
  • Verify offline KD from pre-saved JSONL data
  • Confirm KD loss computation (Top-K forward KL / sampled KL)
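To make the offline path concrete, here is a hypothetical round-trip for pre-saved teacher data in JSONL. The field names (`token_ids`, `topk_ids`, `topk_logprobs`) are illustrative assumptions, not the schema actually used by `offline_kd.py`.

```python
import io
import json

# One training example per JSONL line: the token sequence plus the
# teacher's top-K token ids and log-probs for each position.
records = [
    {
        "token_ids": [5, 7],                            # sequence token ids
        "topk_ids": [[1, 2], [3, 4]],                   # teacher top-K ids per position
        "topk_logprobs": [[-0.1, -2.3], [-0.5, -1.2]],  # matching teacher log-probs
    }
]

# Write one JSON object per line (an in-memory buffer stands in for a file).
buf = io.StringIO()
for rec in records:
    buf.write(json.dumps(rec) + "\n")

# Read back for offline KD: each line decodes to one training example.
loaded = [json.loads(line) for line in buf.getvalue().splitlines()]
```

With data in this shape, the student's training loop can gather its own log-probs at `topk_ids` and compute the KD loss without contacting a teacher server.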

🤖 Generated with Claude Code

tourzhao and others added 3 commits March 2, 2026 01:10
Add knowledge distillation (KD) example supporting online KD via an
external teacher server and offline KD from pre-saved teacher data
(JSONL). Implements Top-K forward KL and sampled KL losses. Tested
with Qwen3-4B (teacher) -> Qwen3-1.7B (student).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add strict=False to zip() calls (ruff B905)
- Apply black formatting fixes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>