test: intermittent failures from vllm tests on LSF cluster

I'm seeing intermittent failures from the vllm tests on the LSF cluster when run with:
```
uv run --all-extras --all-groups pytest --isolate-heavy -v
```

For example:
```
==== 723 passed, 142 skipped, 2 xfailed, 90 warnings in 1572.83s (0:26:12) =====
```

when all worked well, and

```
FAILED test/backends/test_openai_vllm.py::test_instruct - openai.NotFoundErro...
FAILED test/backends/test_openai_vllm.py::test_multiturn - openai.NotFoundErr...
FAILED test/backends/test_openai_vllm.py::test_chat - openai.NotFoundError: E...
FAILED test/backends/test_openai_vllm.py::test_chat_stream - openai.NotFoundE...
FAILED test/backends/test_openai_vllm.py::test_format - openai.NotFoundError:...
FAILED test/backends/test_openai_vllm.py::test_generate_from_raw - openai.Not...
FAILED test/backends/test_openai_vllm.py::test_generate_from_raw_with_format
= 7 failed, 716 passed, 142 skipped, 2 xfailed, 90 warnings in 1409.38s (0:23:29) =
```

at other times.

Across multiple runs, the failure rate seems to be about 50-75%.

On further investigation the underlying error for all these cases is:
```
E               openai.NotFoundError: Error code: 404 - {'error': {'message': 'The model `ibm-granite/granite-4.0-micro` does not exist.', 'type': 'NotFoundError', 'param': None, 'code': 404}}
```

Question to pursue: how is the vllm server initialized when tests are run with uv on a GPU-enabled cluster? Clearly we sometimes get access to a vllm environment with the right model, and other times we don't.
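One way to narrow this down might be to ask the server which models it is actually serving just before the tests run. The sketch below assumes the tests talk to an OpenAI-compatible endpoint that exposes the standard `/models` listing; the base URL in the usage note and the helper names (`served_model_ids`, `fetch_model_ids`, `check_server`) are hypothetical and not taken from the test suite.

```python
import json
import urllib.request

# Model named in the 404 error above.
EXPECTED_MODEL = "ibm-granite/granite-4.0-micro"


def served_model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style model listing response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(base_url: str, timeout: float = 5.0) -> list[str]:
    """Query the model listing endpoint under base_url (assumed to end in /v1)."""
    url = f"{base_url.rstrip('/')}/models"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return served_model_ids(json.load(resp))


def check_server(base_url: str) -> bool:
    """Print what the server reports and say whether the expected model is there."""
    ids = fetch_model_ids(base_url)
    print(f"{base_url} serves: {ids}")
    return EXPECTED_MODEL in ids
```

Calling `check_server("http://localhost:8000/v1")` (assumed address; substitute whatever the fixture actually uses) on a failing node would show whether the test is reaching a server that was started with a different model, or some other process already listening on that port.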
