Currently in #1300, there is the following restriction when using the mp backend:
```python
inference_engine_size = dp_size * tp_size * pp_size
num_gpus_per_node = cfg.trainer.placement.policy_num_gpus_per_node
if inference_engine_size > num_gpus_per_node and ie_cfg.distributed_executor_backend == "mp":
    raise ValueError(
        "Each inference engine must fit within a single node with the vLLM mp backend. Use the ray backend for per engine multi-node serving instead."
    )
```
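For concreteness, a hedged example of a configuration that would trip this check; the sizes below are illustrative, not taken from the PR:

```python
# Illustrative sizes only: a 16-way tensor-parallel engine on 8-GPU nodes.
dp_size, tp_size, pp_size = 1, 16, 1
inference_engine_size = dp_size * tp_size * pp_size  # 16
num_gpus_per_node = 8
# 16 > 8 with distributed_executor_backend == "mp" -> the ValueError above fires;
# per-engine multi-node serving currently requires the ray backend.
```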
This can be lifted by adding logic to create multiple placement groups per engine (creating one Ray actor for every `min(inference_engine_size, num_gpus_per_node)` GPUs, and setting `master_address` and `headless` as needed), as sketched below.
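A minimal sketch of that logic, assuming Ray's `placement_group` API and an engine size that divides evenly into per-node groups; the helper name here is hypothetical, not the actual SkyRL implementation:

```python
import ray
from ray.util.placement_group import placement_group

# Hypothetical sketch: split one multi-node engine into per-node placement
# groups of min(inference_engine_size, num_gpus_per_node) GPUs each.
def create_engine_placement_groups(inference_engine_size: int, num_gpus_per_node: int):
    gpus_per_group = min(inference_engine_size, num_gpus_per_node)
    assert inference_engine_size % gpus_per_group == 0, "assumes even divisibility"
    num_groups = inference_engine_size // gpus_per_group
    # One bundle per GPU; STRICT_PACK forces each group onto a single node.
    pgs = [
        placement_group([{"GPU": 1, "CPU": 1}] * gpus_per_group, strategy="STRICT_PACK")
        for _ in range(num_groups)
    ]
    ray.get([pg.ready() for pg in pgs])  # wait until every group is scheduled
    return pgs
```

One actor per placement group would then be launched; presumably the group containing rank 0 exposes its node IP as `master_address`, and the remaining groups start headless and join it.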
We also have the following restriction:
```python
if cfg.generator.inference_engine.distributed_executor_backend == "mp":
    raise ValueError(
        "the mp backend for vLLM is not yet fully supported for the new inference backend. See https://github.com/NovaSky-AI/SkyRL/issues/1309. Use the ray backend instead."
    )
```
Colocated mode with the mp backend is blocked on #1291, which enables colocated + mp on the new inference stack; non-colocated mode has flaky issues with the new native weight-syncing APIs. See below for details. cc: @hao-aaron