1 change: 1 addition & 0 deletions README.md
@@ -32,6 +32,7 @@ Trinity-RFT provides functionalities for users with different backgrounds and ob

## 🚀 News

* [2026-01] [[Release Notes]](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.4.1) Trinity-RFT v0.4.1 released: upgraded verl to v0.7.0, added OpenAI API support to the Tinker backend, and fixed bugs.
* [2026-01] Introducing [R3L](https://github.com/shiweijiezero/R3L): a systematic reflect-then-retry RL mechanism with efficient language-guided exploration and stable off-policy learning ([paper](https://arxiv.org/abs/2601.03715)).
* [2025-12] [[Release Notes]](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.4.0) Trinity-RFT v0.4.0 released: added the [Tinker](https://thinkingmachines.ai/tinker/) backend for users **without GPUs**, added more benchmarks, enhanced online RL, and more.
* [2025-12] Trinity-RFT powers the medical and health business of "Taobao Shangou", enabling the AI agent to understand vague symptoms, proactively ask follow-up questions, and provide precise recommendations ([News](https://tech.china.com.cn/sx/20251201/411376.shtml)).
1 change: 1 addition & 0 deletions README_zh.md
@@ -41,6 +41,7 @@ Trinity-RFT provides corresponding functionalities for users with different backgrounds and objectives:

## 🚀 News

* [2026-01] [[Release Notes]](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.4.1) Trinity-RFT v0.4.1 released: upgraded verl to v0.7.0, added OpenAI API support to the Tinker backend, and fixed several bugs.
* [2026-01] Introducing [R3L](https://github.com/shiweijiezero/R3L): a reflect-then-retry reinforcement learning mechanism that uses natural-language feedback to guide efficient exploration and achieves stable off-policy learning ([paper](https://arxiv.org/abs/2601.03715)).
* [2025-12] [[Release Notes]](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.4.0) Trinity-RFT v0.4.0 released: added the [Tinker](https://thinkingmachines.ai/tinker/) backend for training on devices **without GPUs**, added more benchmarks, enhanced online RL, and more.
* [2025-12] Trinity-RFT now supports the [tinker](https://thinkingmachines.ai/tinker/) training backend, enabling model training on devices **without GPUs**.
5 changes: 3 additions & 2 deletions docs/sphinx_doc/source/tutorial/trinity_configs.md
@@ -97,7 +97,7 @@ algorithm:
repeat_times: 8
optimizer:
lr: 1e-6
warmup_style: "warmup"
lr_scheduler_type: "constant"
# The following parameters are optional
# If not specified, they will automatically be set based on the `algorithm_type`
sample_strategy: "default"
@@ -111,7 +111,8 @@
- `repeat_times`: Number of times each task is repeated. Default is `1`. In `dpo`, this is automatically set to `2`. Some algorithms such as GRPO and OPMD require `repeat_times` > 1.
- `optimizer`: Optimizer configuration for actor.
- `lr`: Learning rate for actor.
- `warmup_style`: Warmup style for actor's learning rate.
- `warmup_style`: Deprecated; use `lr_scheduler_type` instead. This field will be removed in a future version.
- `lr_scheduler_type`: Learning rate scheduler type for the actor model. Default is `constant`. Supported types: `constant`, `cosine` (see the sketch after this list).
- `sample_strategy`: The sampling strategy used for loading experiences from experience buffer. Supported types: `default`, `staleness_control`, `mix`.
- `advantage_fn`: The advantage function used for computing advantages.
- `kl_penalty_fn`: The KL penalty function used for computing KL penalty applied in reward.
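A minimal sketch of how the scheduler setting fits into the `algorithm` block, assuming GRPO as the algorithm; the values below are placeholders, not recommendations:

```yaml
algorithm:
  algorithm_type: grpo            # assumption: any algorithm with repeat_times > 1 works the same way
  repeat_times: 8
  optimizer:
    lr: 1e-6
    lr_scheduler_type: "cosine"   # replaces the deprecated `warmup_style`
  sample_strategy: "default"      # optional; derived from `algorithm_type` when omitted
```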
5 changes: 3 additions & 2 deletions docs/sphinx_doc/source_zh/tutorial/trinity_configs.md
@@ -97,7 +97,7 @@ algorithm:
repeat_times: 8
optimizer:
lr: 1e-6
warmup_style: constant
lr_scheduler_type: constant
# The following parameters are optional
# If not specified, they will be set automatically based on `algorithm_type`
sample_strategy: "default"
@@ -111,7 +111,8 @@
- `repeat_times`: Number of times each task is repeated. Default is `1`. Automatically set to `2` for `dpo`. Some algorithms such as GRPO and OPMD require `repeat_times` > 1.
- `optimizer`: Optimizer configuration for the actor.
- `lr`: Learning rate of the optimizer.
- `warmup_style`: Warmup style for the learning rate.
- `warmup_style`: Deprecated; use `lr_scheduler_type` instead. This field will be removed in a future version.
- `lr_scheduler_type`: Learning rate scheduler type for the actor model. Default is `constant`. Supported types: `constant`, `cosine` (a short migration sketch follows this list).
- `sample_strategy`: The sampling strategy used for loading experiences from the experience buffer. Supported types: `default`, `staleness_control`, `mix`.
- `advantage_fn`: The function used for computing advantages.
- `kl_penalty_fn`: The function used for computing the KL penalty applied to the reward.
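For configs written against earlier versions, a minimal before/after sketch of the rename; the values are illustrative only:

```yaml
# Before (deprecated)
optimizer:
  lr: 1e-6
  warmup_style: constant

# After
optimizer:
  lr: 1e-6
  lr_scheduler_type: constant   # or "cosine"
```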
4 changes: 2 additions & 2 deletions pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "trinity-rft"
version = "0.4.0"
version = "0.4.1"
authors = [
{name="Trinity-RFT Team", email="trinity-rft@outlook.com"},
]
@@ -88,7 +88,7 @@ tinker = [
]

doc = [
"sphinx",
"sphinx<9.0.0",
"sphinx-autobuild",
"sphinx-book-theme",
"myst-parser",
66 changes: 65 additions & 1 deletion tests/common/vllm_test.py
@@ -1270,7 +1270,6 @@ def setUp(self):
self.config.explorer.rollout_model.chat_template = CHAT_TEMPLATE
self.config.explorer.rollout_model.enable_openai_api = True
self.config.explorer.rollout_model.enable_lora = True
self.config.explorer.rollout_model.enable_runtime_lora_updating = True

self.config.check_and_update()
self.engines, self.auxiliary_engines = create_inference_models(self.config)
@@ -1345,3 +1344,68 @@ async def test_tinker_api(self):
self.assertEqual(response.sequences[0].stop_reason, "length")
self.assertEqual(len(prompt.to_ints()), len(response.prompt_logprobs))
self.assertIsNone(response.topk_prompt_logprobs)

# test adding and removing LoRA adapters
from vllm.lora.request import LoRARequest

# create two dummy LoRA adapters (freshly initialized, so their effective weight delta is zero)
lora_path_1 = os.path.join(self.config.checkpoint_job_dir, "adapter_1")
lora_path_2 = os.path.join(self.config.checkpoint_job_dir, "adapter_2")
_create_adapter(self.config.model.model_path, lora_path_1, "adapter_1")
_create_adapter(self.config.model.model_path, lora_path_2, "adapter_2")
lora_1 = LoRARequest(
lora_name="test_adapter_1",
lora_int_id=1,
lora_path=os.path.join(lora_path_1, "adapter_1"),
)
lora_2 = LoRARequest(
lora_name="test_adapter_2",
lora_int_id=2,
lora_path=os.path.join(lora_path_2, "adapter_2"),
)
response = await engine.sample.remote(
prompt=prompt,
num_samples=1,
sampling_params=types.SamplingParams(max_tokens=1),
include_prompt_logprobs=True,
lora_request=lora_1,
)
ids = await engine.list_lora_adapters.remote()
self.assertEqual(ids, [1])
self.assertEqual(len(response.sequences), 1)
self.assertEqual(response.sequences[0].stop_reason, "length")
self.assertEqual(len(prompt.to_ints()), len(response.prompt_logprobs))
self.assertIsNone(response.topk_prompt_logprobs)
response = await engine.sample.remote(
prompt=prompt,
num_samples=1,
sampling_params=types.SamplingParams(max_tokens=1),
include_prompt_logprobs=True,
lora_request=lora_2,
)
self.assertEqual(len(response.sequences), 1)
self.assertEqual(response.sequences[0].stop_reason, "length")
self.assertEqual(len(prompt.to_ints()), len(response.prompt_logprobs))
self.assertIsNone(response.topk_prompt_logprobs)
await engine.remove_lora_adapter.remote(lora_id=1)
await engine.remove_lora_adapter.remote(lora_id=2)
ids = await engine.list_lora_adapters.remote()
self.assertEqual(ids, [])


def _create_adapter(model_path: str, lora_path: str, name: str):
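"""Create a dummy LoRA adapter named ``name`` for the model at ``model_path`` and save it under ``lora_path`` (test-only helper)."""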
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
model_path,
device_map="cpu",
)
lora_config = LoraConfig(
r=8,
lora_alpha=8,
target_modules=["gate_proj", "up_proj", "down_proj"],
lora_dropout=0.1,
)
lora_model = get_peft_model(model, lora_config, adapter_name=name)
lora_model.save_pretrained(lora_path)
2 changes: 1 addition & 1 deletion trinity/__init__.py
@@ -1,4 +1,4 @@
# -*- coding: utf-8 -*-
"""Trinity-RFT (Reinforcement Fine-Tuning)"""

__version__ = "0.4.0"
__version__ = "0.4.1"
29 changes: 29 additions & 0 deletions trinity/common/models/vllm_model.py
@@ -403,6 +403,35 @@ async def logprobs( # type: ignore [override]
dtype=torch.float32,
)

async def add_lora_adapter(self, lora_request: Any) -> int:
"""Add a LoRA adapter to the vLLM engine.

Args:
lora_request (LoRARequest): The LoRA request.

Returns:
lora_id (int): The LoRA adapter ID.
"""
lora_id = await self.async_llm.add_lora(lora_request)
return lora_id

async def remove_lora_adapter(self, lora_id: int) -> None:
"""Remove a LoRA adapter from the vLLM engine.

Args:
lora_id (int): The LoRA adapter ID.
"""
await self.async_llm.remove_lora(lora_id)

async def list_lora_adapters(self) -> Sequence[int]:
"""List all LoRA adapter IDs in the vLLM engine.

Returns:
lora_ids (List[int]): The list of LoRA adapter IDs.
"""
lora_ids = await self.async_llm.list_loras()
return list(lora_ids)

async def sample(
self,
prompt: Any,