[vllm] Fully enable mp distributed executor backend in vLLM #1309

@erictang000

Description

As of #1300, the mp backend carries the following restriction:

    inference_engine_size = dp_size * tp_size * pp_size
    num_gpus_per_node = cfg.trainer.placement.policy_num_gpus_per_node
    if inference_engine_size > num_gpus_per_node and ie_cfg.distributed_executor_backend == "mp":
        raise ValueError(
            "Each inference engine must fit within a single node with the vLLM mp backend. Use the ray backend for per engine multi-node serving instead."
        )

This restriction can be lifted by adding logic to create multiple placement groups per engine (creating one Ray actor for every min(inference_engine_size, num_gpus_per_node) GPUs), and setting master_address and headless on each actor as needed.
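A minimal sketch of that split, assuming a planner that decides how many single-node Ray actors one engine needs and which actor hosts the master address. The function and parameter names here are illustrative, not from the SkyRL codebase:

```python
def plan_engine_actors(inference_engine_size: int, num_gpus_per_node: int):
    """Return (gpus_per_actor, num_actors) for one inference engine.

    Each actor is capped at one node's worth of GPUs, so an engine
    larger than a node is split into several per-node actors.
    """
    gpus_per_actor = min(inference_engine_size, num_gpus_per_node)
    if inference_engine_size % gpus_per_actor != 0:
        raise ValueError("engine size must be divisible by GPUs per actor")
    return gpus_per_actor, inference_engine_size // gpus_per_actor


def actor_specs(inference_engine_size: int, num_gpus_per_node: int,
                master_address: str):
    """Describe each per-node actor: rank 0 hosts the master address,
    the remaining actors run headless and connect to it."""
    gpus_per_actor, num_actors = plan_engine_actors(
        inference_engine_size, num_gpus_per_node
    )
    return [
        {
            "gpus": gpus_per_actor,
            "headless": rank != 0,
            "master_address": master_address,
        }
        for rank in range(num_actors)
    ]
```

For example, a 16-GPU engine on 8-GPU nodes would be planned as two 8-GPU actors, one head and one headless.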

We also have the following restriction:

    if cfg.generator.inference_engine.distributed_executor_backend == "mp":
        raise ValueError(
            "the mp backend for vLLM is not yet fully supported for the new inference backend. See https://github.com/NovaSky-AI/SkyRL/issues/1309. Use the ray backend instead."
        )

Enabling colocated mode with the mp backend on the new inference stack is blocked on #1291, and non-colocated mode has flaky issues with the new native weight syncing APIs. See below for details. cc: @hao-aaron
