Conversation
**Important:** Review skipped — auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configuration.

Configuration used: Path: `.coderabbit.yaml` · Review profile: CHILL · Plan: Pro
📝 Walkthrough

This PR refactors the ESM-2 TE model hierarchy from

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Export as export.py
    participant Convert as convert.py
    participant HFModel as HF ESM2 Model
    participant TEModel as TE ESM2 Model
    participant StateMgmt as State Management
    User->>Export: export_hf_checkpoint(tag)
    Export->>HFModel: Load HF checkpoint
    HFModel-->>Export: model + config
    Export->>Convert: convert_esm_hf_to_te(model_hf)
    Convert->>TEModel: Initialize TE model (empty)
    Convert->>StateMgmt: extract state dicts
    StateMgmt->>StateMgmt: Apply QKV packing transforms
    StateMgmt->>StateMgmt: Apply embedding padding transforms
    StateMgmt->>StateMgmt: Apply FC/bias transforms
    StateMgmt->>TEModel: Load transformed state
    TEModel-->>Convert: Converted model
    Convert-->>Export: TE model ready
    Export->>Export: Save checkpoint + config
    Export->>Export: Smoke test load in bf16
    Export-->>User: Exported checkpoint path
```

```mermaid
sequenceDiagram
    participant Client as vLLM Client
    participant VLLM as vLLM Engine
    participant TEModel as TE ESM2 Model
    participant Embeddings as Embeddings Layer
    participant Encoder as TE Encoder
    Client->>VLLM: Create LLM instance (pooling mode)
    VLLM->>TEModel: Load checkpoint
    VLLM->>TEModel: Initialize model
    Client->>VLLM: embed(sequences)
    VLLM->>TEModel: forward(input_ids, attention_mask)
    TEModel->>Embeddings: Get token embeddings
    Embeddings-->>TEModel: Embedded tokens
    TEModel->>Encoder: Apply TE encoder layers
    Encoder-->>TEModel: Encoded hidden states
    TEModel->>TEModel: Extract last token + L2 normalize
    TEModel-->>VLLM: Pooled embeddings
    VLLM-->>Client: Normalized embeddings
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks: ✅ Passed checks (3 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches: 🧪 Generate unit tests (beta)
📝 Coding Plan
💡 Tip: CodeRabbit can scan for known vulnerabilities in your dependencies using OSV Scanner. OSV Scanner will automatically detect and report security vulnerabilities in your project's dependencies. No additional configuration is required.
@gagank1: Are you working on a readme?

---
/ok to test |
@gagank1, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

---
/ok to test |
@gagank1, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

---
/ok to test c34c09b

---
@coderabbitai review
✅ Actions performed: Review triggered.

---
Actionable comments posted: 5
♻️ Duplicate comments (3)
bionemo-recipes/recipes/esm2_peft_te/example_8m_checkpoint/esm_nv.py (1)
405-407: Same `_tied_weights_keys` dict type concern. See comment on `modeling_esm_te.py`.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@bionemo-recipes/recipes/esm2_peft_te/example_8m_checkpoint/esm_nv.py` around lines 405-407, the class variable `_tied_weights_keys` uses the modern built-in generic `dict[...]` typing, which may be incompatible with the rest of the codebase; change its annotation to use `typing.Dict[str, str]` (and ensure `Dict` is imported) or use `typing.Mapping` if immutability is desired, mirroring the fix applied in `modeling_esm_te.py` so the declaration becomes `ClassVar[Dict[str, str]]` with the same key/value entries retained.

bionemo-recipes/recipes/esm2_native_te/example_8m_checkpoint/esm_nv.py (1)
405-407: Same `_tied_weights_keys` dict type concern as in `modeling_esm_te.py`. See comment on the canonical file — HF expects `list[str]`, not `dict[str, str]`.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@bionemo-recipes/recipes/esm2_native_te/example_8m_checkpoint/esm_nv.py` around lines 405-407, the `_tied_weights_keys` typed as `ClassVar[dict[str, str]]` should be changed to `ClassVar[list[str]]` to match HF expectations (same fix as in `modeling_esm_te.py`); replace the dict literal with a list of the relevant parameter names (e.g. `["lm_head.decoder.weight", "model.embeddings.word_embeddings.weight"]`) and update any usages that assume dict semantics to use the list order or explicit pairing where needed.

bionemo-recipes/recipes/esm2_accelerate_te/example_8m_checkpoint/esm_nv.py (1)
405-407: Same `_tied_weights_keys` dict type concern. See comment on `modeling_esm_te.py`.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@bionemo-recipes/recipes/esm2_accelerate_te/example_8m_checkpoint/esm_nv.py` around lines 405-407, `_tied_weights_keys` is annotated as a mutable dict ClassVar, which repeats the same typing concern as in `modeling_esm_te.py`; change the annotation to an immutable mapping type (e.g., `ClassVar[Mapping[str, str]]` from `typing`) and, to avoid accidental mutation, assign a read-only view (e.g., `types.MappingProxyType({"lm_head.decoder.weight": "model.embeddings.word_embeddings.weight"})`); update the import list to include `typing.Mapping` and `types` if not present, and mirror the same pattern used/fixed in `modeling_esm_te.py`.
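The three duplicate comments above propose slightly different annotations for the same attribute. A minimal sketch of the two main options — the class names here are hypothetical, while the tied-weight key strings are the ones quoted in the comments:

```python
# Illustrative sketch only: class names are hypothetical; the key/value strings
# are the ones quoted in the review comments above.
from typing import ClassVar, Dict, List


class EsmTEWithDictKeys:
    # Option A (esm2_peft_te / esm2_accelerate_te suggestion): keep the mapping
    # semantics but use typing.Dict instead of the built-in generic dict[...].
    _tied_weights_keys: ClassVar[Dict[str, str]] = {
        "lm_head.decoder.weight": "model.embeddings.word_embeddings.weight",
    }


class EsmTEWithListKeys:
    # Option B (esm2_native_te suggestion): HF-style list[str] of parameter
    # names whose storage is tied together.
    _tied_weights_keys: ClassVar[List[str]] = [
        "lm_head.decoder.weight",
        "model.embeddings.word_embeddings.weight",
    ]
```

Which option is correct depends on the transformers version in use; the review itself is split between the two.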
🧹 Nitpick comments (4)
bionemo-recipes/vllm/launch.sh (1)
50-50: `exec $DOCKER_CMD` is unquoted — word splitting will break paths with spaces.

If `PROJECT_ROOT` contains spaces (e.g., `/home/user/my projects/...`), the `-v` argument will be incorrectly split into multiple tokens. Use a Bash array to avoid this:

🔧 Proposed fix (array-based approach)

Replace the string-based `DOCKER_CMD` with an array throughout the script:

```diff
-DOCKER_CMD="docker run -itd ..."
+DOCKER_CMD=("docker" "run" "-itd" "--gpus" "all" "--network" "host" "--ipc=host" "-e" "HF_TOKEN" "--rm" "--name" "${CONTAINER}_dev")

 if [ "$MOUNT_DIR" = true ]; then
     PROJECT_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/../.." && pwd)"
-    DOCKER_CMD="$DOCKER_CMD -v ${PROJECT_ROOT}:/workspace/bionemo-framework"
+    DOCKER_CMD+=("-v" "${PROJECT_ROOT}:/workspace/bionemo-framework")
 fi

-DOCKER_CMD="$DOCKER_CMD $CONTAINER /bin/bash"
+DOCKER_CMD+=("$CONTAINER" "/bin/bash")

-exec $DOCKER_CMD
+exec "${DOCKER_CMD[@]}"
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@bionemo-recipes/vllm/launch.sh` at line 50, the `exec $DOCKER_CMD` call uses an unquoted string, which allows word splitting and breaks mount paths with spaces (e.g., `PROJECT_ROOT` used in `-v`). Convert the string `DOCKER_CMD` into a Bash array (e.g., `DOCKER_CMD=(docker run ...)`) and update all places that build/append to `DOCKER_CMD` so they push elements into the array, then replace `exec $DOCKER_CMD` with `exec "${DOCKER_CMD[@]}"` so each argument (including the `-v` PROJECT_ROOT value) is preserved; update any helper code that concatenates `DOCKER_CMD` to use array operations instead.

bionemo-recipes/vllm/Dockerfile (2)
2-3: Base image is hosted on an internal NVIDIA GitLab registry — not pullable outside NVIDIA.

`gitlab-master.nvidia.com:5005/dl/dgx/vllm:main-py3.43005406-devel` requires internal network/credential access. The commented-out `nvcr.io/nvidia/vllm:26.01-py3` alternative on line 1 is the publicly accessible equivalent. Once an NGC release with vLLM ≥ 0.14 becomes available, switching to the public image will make this recipe usable by external contributors without additional setup.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@bionemo-recipes/vllm/Dockerfile` around lines 2-3, the Dockerfile's FROM line currently uses an internal image tag `gitlab-master.nvidia.com:5005/dl/dgx/vllm:main-py3.43005406-devel`, which is not pullable externally; replace that base image with the public equivalent `nvcr.io/nvidia/vllm:26.01-py3` (or parametrize the base via a build ARG) in the FROM instruction so external contributors can build without internal credentials, and retain a short comment noting the vLLM >= 0.14 requirement and to switch back when an official public image with the needed version is available.
30-30: Pin the `transformer_engine` version for reproducible builds.

`pip install --no-build-isolation transformer_engine[pytorch]` with no version specifier will install whichever version is latest at build time. TE releases frequently and has had breaking API changes between major versions (e.g., 1.x → 2.x). A silent version bump can break the integration without any change to this file.

🔧 Proposed fix

```diff
-RUN pip install --no-build-isolation transformer_engine[pytorch]
+RUN pip install --no-build-isolation "transformer_engine[pytorch]==<verified_version>"
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@bionemo-recipes/vllm/Dockerfile` at line 30, the Dockerfile currently installs `transformer_engine` without a version pin; change that RUN line to install a specific, tested TE release by including an exact version specifier (for example: `RUN pip install --no-build-isolation transformer_engine==<MAJOR.MINOR.PATCH>[pytorch]`) so builds are reproducible and won't break on upstream major/minor bumps; optionally add a short comment noting the chosen compatible version.

bionemo-recipes/vllm/test_esm2_golden_values.py (1)
46-63: `sys.path.insert` + `os.chdir` is fragile for test infrastructure.

`sys.path.insert(0, ...)` at module level (line 46) and `os.chdir` inside `fresh_export` make this script sensitive to working directory and import order. This is acceptable for a standalone validation script run manually inside a container, but consider adding a note that this is not designed to run as part of a pytest suite.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bionemo-recipes/vllm/test_esm2_golden_values.py` around lines 46 - 63, The test module mutates import paths and working directory (sys.path.insert(0, ...), os.chdir(...) inside fresh_export) which is fragile for pytest; update the file to document this by adding a clear module-level comment or docstring near sys.path.insert and a brief note on fresh_export explaining it intentionally changes cwd for export_hf_checkpoint and that the script is not intended to be run under pytest/parallel test runners (referencing sys.path.insert, ESM2_MODEL_DIR, and fresh_export by name), or alternatively guard execution with a main-check so pytest won't import/run it implicitly.
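One low-risk way to make the `os.chdir` pattern discussed above less fragile is a context manager that always restores the previous working directory. This is a sketch under assumptions: the real `fresh_export` calls `export_hf_checkpoint`, which is replaced here by a marker-file write, and the export directory is illustrative.

```python
# Illustrative sketch; fresh_export here is a stand-in for the real function,
# with a marker file standing in for the actual checkpoint export.
import contextlib
import os


@contextlib.contextmanager
def pushd(path):
    """Temporarily change the working directory, always restoring it."""
    prev = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(prev)


def fresh_export(export_dir="/tmp/esm2_export_demo"):
    os.makedirs(export_dir, exist_ok=True)
    with pushd(export_dir):
        # export_hf_checkpoint(...) would run here, writing into the cwd
        open(os.path.join(os.getcwd(), "exported.txt"), "w").close()
    return export_dir


if __name__ == "__main__":  # guard so pytest does not execute this on import
    before = os.getcwd()
    fresh_export()
    assert os.getcwd() == before
```

The `__main__` guard also addresses the reviewer's alternative suggestion of preventing pytest from running the script implicitly.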
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bionemo-recipes/vllm/Dockerfile`:
- Around line 1-36: Add a non-root user and switch to it in the Dockerfile so
the container does not run as root; create a user/group (e.g., bionemo), chown
the application directory (/workspace/bionemo) and any cache/venv paths to that
user, and add a final USER bionemo line. Locate the Dockerfile sections around
WORKDIR /workspace/bionemo and COPY . . and insert user creation
(useradd/addgroup or groupadd) and chown before switching context, then add USER
bionemo at the end so subsequent runtime processes run unprivileged.
In `@bionemo-recipes/vllm/launch.sh`:
- Around line 36-40: The script hardcodes "--name vllm_dev" in DOCKER_CMD which
prevents multiple instances and misrepresents the positional $CONTAINER
argument; update the usage comment to clarify that the positional argument is
the image name (or image_name) and change the DOCKER_CMD assignments in the
HEADLESS branch to use a container name derived from $CONTAINER (for example
--name "$CONTAINER" or --name "${CONTAINER}_dev") instead of "vllm_dev" so the
image argument is also used as the container name and avoids name collisions.
In `@bionemo-recipes/vllm/README.md`:
- Around line 21-23: Change the fenced code block that currently uses the wrong
language tag; locate the block containing the shell command "python
test_esm2_golden_values.py" in the README and replace the opening fence language
identifier from ```python to ```bash so the command is treated as a shell
snippet and highlighted correctly.
- Line 17: The README's "or use `launch.sh`" is misleading because `launch.sh
--mount_dir` mounts the repo to `/workspace/bionemo-framework` while the manual
`docker run` mounts to `/workspace/bionemo`, causing `python
test_esm2_golden_values.py` to fail when run from WORKDIR; fix by either (A)
update `launch.sh` to mount the project root into `/workspace/bionemo` instead
of `/workspace/bionemo-framework` (adjust mount target and any downstream path
references in launch.sh), or (B) update README.md to explicitly document that
`launch.sh --mount_dir` mounts at `/workspace/bionemo-framework` and instruct
users to cd into the correct subdirectory (where `test_esm2_golden_values.py`
lives) before running the test; mention the affected files `launch.sh`,
`README.md`, and `test_esm2_golden_values.py` so reviewers can locate the
changes.
In `@bionemo-recipes/vllm/test_esm2_golden_values.py`:
- Around line 147-210: The script currently only prints comparisons and never
fails; add assertions that enforce the golden-value tolerances using RTOL and
ATOL: for each pair in pairs (refer to the pairs list and variables a, b),
assert np.allclose(a, b, rtol=RTOL, atol=ATOL) (or equivalently assert
(np.abs(a-b) <= ATOL + RTOL * np.abs(b)).all()) and fail the test if not, and
also assert cosine_sim(a, b) exceeds a sensible threshold or that exact is True
when ATOL/RTOL are zero; add per-sequence assertions inside the per-sequence
loop to ensure each sequence max-diff <= ATOL + RTOL * max(|b_i|) so the test
fails on unacceptable drift.
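The assertions described above could look like the following sketch. `RTOL`, `ATOL`, and `cosine_sim` mirror names from the test script, but their values, the cosine threshold, and the demo arrays are illustrative, not taken from the repository.

```python
# Sketch of the requested golden-value assertions; tolerances and the demo
# arrays are illustrative.
import numpy as np

RTOL, ATOL = 0.0, 1e-2


def cosine_sim(a, b):
    a = a.ravel().astype(np.float64)
    b = b.ravel().astype(np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def check_pair(name, a, b, min_cos=0.999):
    # global tolerance check, matching np.allclose semantics
    assert np.allclose(a, b, rtol=RTOL, atol=ATOL), f"{name}: max |diff| too large"
    assert cosine_sim(a, b) > min_cos, f"{name}: cosine similarity too low"
    # per-sequence drift check, as suggested for the per-sequence loop
    for i, (ai, bi) in enumerate(zip(a, b)):
        max_diff = np.abs(ai - bi).max()
        assert max_diff <= ATOL + RTOL * np.abs(bi).max(), f"{name} seq {i}: drift {max_diff}"


emb_a = np.random.default_rng(0).normal(size=(4, 8))
emb_b = emb_a + 1e-4  # within tolerance
check_pair("demo", emb_a, emb_b)
```

With assertions like these, an out-of-tolerance pair raises `AssertionError` instead of merely printing a row in the summary table.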
---
Duplicate comments:
In `@bionemo-recipes/recipes/esm2_accelerate_te/example_8m_checkpoint/esm_nv.py`:
- Around line 405-407: _tied_weights_keys is annotated as a mutable dict
ClassVar which repeats the same typing concern as in modeling_esm_te.py; change
the annotation to an immutable mapping type (e.g., ClassVar[Mapping[str, str]]
from typing) and, to avoid accidental mutation, assign a read-only view (e.g.,
types.MappingProxyType({"lm_head.decoder.weight":
"model.embeddings.word_embeddings.weight"})); update the import list to include
typing.Mapping and types if not present and mirror the same pattern used/fixed
in modeling_esm_te.py.
In `@bionemo-recipes/recipes/esm2_native_te/example_8m_checkpoint/esm_nv.py`:
- Around line 405-407: The _tied_weights_keys typed as ClassVar[dict[str, str]]
should be changed to ClassVar[list[str]] to match HF expectations (same fix as
in modeling_esm_te.py); replace the dict literal with a list of the relevant
parameter names (e.g. ["lm_head.decoder.weight",
"model.embeddings.word_embeddings.weight"]) and update any usages that assume
dict semantics to use the list order or explicit pairing where needed.
In `@bionemo-recipes/recipes/esm2_peft_te/example_8m_checkpoint/esm_nv.py`:
- Around line 405-407: The class variable _tied_weights_keys uses the modern
built-in generic dict[...] typing which may be incompatible with the rest of the
codebase; change its annotation to use typing.Dict[str, str] (and ensure Dict is
imported) or use typing.Mapping if immutability is desired, mirroring the fix
applied in modeling_esm_te.py so the declaration becomes ClassVar[Dict[str,
str]] with the same key/value entries retained.
---
Nitpick comments:
In `@bionemo-recipes/vllm/Dockerfile`:
- Around line 2-3: The Dockerfile's FROM line currently uses an internal image
tag "gitlab-master.nvidia.com:5005/dl/dgx/vllm:main-py3.43005406-devel" which is
not pullable externally; replace that base image with the public equivalent
"nvcr.io/nvidia/vllm:26.01-py3" (or parametrize the base via a build ARG) in the
FROM instruction so external contributors can build without internal
credentials, and retain a short comment noting the vLLM>=0.14 requirement and to
switch back when an official public image with the needed version is available.
- Line 30: The Dockerfile currently installs transformer_engine without a
version pin (the RUN pip install --no-build-isolation
transformer_engine[pytorch] line); change that to install a specific, tested TE
release by updating that RUN to include an exact version specifier (for example:
RUN pip install --no-build-isolation
transformer_engine==<MAJOR.MINOR.PATCH>[pytorch]) so builds are reproducible and
won’t break on upstream major/minor bumps; optionally add a short comment noting
the chosen compatible version.
In `@bionemo-recipes/vllm/launch.sh`:
- Line 50: The exec $DOCKER_CMD call uses an unquoted string which allows
word-splitting and breaks mount paths with spaces (e.g., PROJECT_ROOT used in
-v). Convert the string DOCKER_CMD into a Bash array (e.g., DOCKER_CMD=(docker
run ...)) and update all places that build/append to DOCKER_CMD so they push
elements into the array, then replace exec $DOCKER_CMD with exec
"${DOCKER_CMD[@]}" so each argument (including the -v PROJECT_ROOT value) is
preserved; update any helper code that concatenates DOCKER_CMD to use array
operations instead.
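The word-splitting failure mode described above is easy to reproduce outside Docker. This standalone sketch (the path is illustrative, not taken from `launch.sh`) contrasts the string and array forms:

```shell
#!/usr/bin/env bash
# Standalone sketch: contrast string vs array command building when a mount
# path contains a space (path here is illustrative, not from launch.sh).
set -u

PROJECT_ROOT="/tmp/my project"

# String form: after expansion the whole string is word-split, so the single
# -v argument becomes three tokens (-v, /tmp/my, project:/workspace/...).
CMD_STRING="-v ${PROJECT_ROOT}:/workspace/bionemo-framework"
set -- $CMD_STRING
echo "string form: $# tokens"

# Array form: each element stays exactly one argument
# (-v and '/tmp/my project:/workspace/...').
CMD_ARRAY=(-v "${PROJECT_ROOT}:/workspace/bionemo-framework")
set -- "${CMD_ARRAY[@]}"
echo "array form: $# tokens"
```

Passed to `docker run`, the string form would hand Docker a truncated mount source and a stray positional argument, while the array form preserves the intended `-v` pair.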
In `@bionemo-recipes/vllm/test_esm2_golden_values.py`:
- Around line 46-63: The test module mutates import paths and working directory
(sys.path.insert(0, ...), os.chdir(...) inside fresh_export) which is fragile
for pytest; update the file to document this by adding a clear module-level
comment or docstring near sys.path.insert and a brief note on fresh_export
explaining it intentionally changes cwd for export_hf_checkpoint and that the
script is not intended to be run under pytest/parallel test runners (referencing
sys.path.insert, ESM2_MODEL_DIR, and fresh_export by name), or alternatively
guard execution with a main-check so pytest won't import/run it implicitly.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (20)
- bionemo-recipes/models/esm2/convert.py
- bionemo-recipes/models/esm2/export.py
- bionemo-recipes/models/esm2/modeling_esm_te.py
- bionemo-recipes/models/esm2/tests/test_cp_bshd.py
- bionemo-recipes/models/esm2/tests/test_cp_thd.py
- bionemo-recipes/models/esm2/tests/test_distributed_fp8.py
- bionemo-recipes/models/esm2/tests/test_distributed_strategies.py
- bionemo-recipes/models/esm2/tests/test_modeling_esm_te.py
- bionemo-recipes/recipes/esm2_accelerate_te/example_8m_checkpoint/esm_nv.py
- bionemo-recipes/recipes/esm2_native_te/example_8m_checkpoint/esm_nv.py
- bionemo-recipes/recipes/esm2_native_te/tests/test_stop_and_go.py
- bionemo-recipes/recipes/esm2_native_te/train_ddp.py
- bionemo-recipes/recipes/esm2_native_te/train_ddp_cp.py
- bionemo-recipes/recipes/esm2_native_te/train_fsdp2.py
- bionemo-recipes/recipes/esm2_native_te/train_fsdp2_cp.py
- bionemo-recipes/recipes/esm2_peft_te/example_8m_checkpoint/esm_nv.py
- bionemo-recipes/vllm/Dockerfile
- bionemo-recipes/vllm/README.md
- bionemo-recipes/vllm/launch.sh
- bionemo-recipes/vllm/test_esm2_golden_values.py
bionemo-recipes/vllm/Dockerfile
Outdated
```dockerfile
# FROM nvcr.io/nvidia/vllm:26.01-py3
FROM gitlab-master.nvidia.com:5005/dl/dgx/vllm:main-py3.43005406-devel
# using this because we need vllm >= 0.14 to work with Transformers v5. no released nvidia version with this yet.

# The vLLM image has CUDA 13.1 runtime and nvcc, but missing dev headers (cusparse.h, nvtx, etc.)
# Install cuda-keyring to add NVIDIA's apt repo, then install the dev headers for transformer_engine
RUN apt-get update && apt-get install -y --no-install-recommends wget && \
    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb && \
    dpkg -i cuda-keyring_1.1-1_all.deb && \
    rm cuda-keyring_1.1-1_all.deb && \
    apt-get update && apt-get install -y --no-install-recommends \
    cuda-nvtx-13-1 \
    cuda-cupti-dev-13-1 \
    cuda-nvml-dev-13-1 \
    libcusparse-dev-13-1 \
    libcusolver-dev-13-1 \
    libcufft-dev-13-1 \
    libnvjitlink-dev-13-1 \
    libnvjpeg-dev-13-1 \
    libcublasmp0-dev-cuda-13 \
    libcudnn9-cuda-13 \
    && rm -rf /var/lib/apt/lists/*

# Install remaining dependencies
RUN --mount=type=cache,target=/root/.cache/pip \
    --mount=type=bind,source=requirements.txt,target=/requirements.txt \
    pip install -r /requirements.txt

# Install transformer_engine from source (force build for CUDA 13.1, not pre-built cu12 wheel)
RUN pip install --no-build-isolation transformer_engine[pytorch]

RUN pip install transformers[torch]==5.0.0

WORKDIR /workspace/bionemo
COPY . .
```
Container runs as root — add a USER directive.
No USER command is present, so all processes run as root inside the container. This is flagged by Trivy (DS-0002) and violates the principle of least privilege.
🔒 Proposed fix

```diff
 WORKDIR /workspace/bionemo
 COPY . .
+
+RUN useradd -m -u 1000 appuser && chown -R appuser /workspace/bionemo
+USER appuser
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```dockerfile
# FROM nvcr.io/nvidia/vllm:26.01-py3
FROM gitlab-master.nvidia.com:5005/dl/dgx/vllm:main-py3.43005406-devel
# using this because we need vllm >= 0.14 to work with Transformers v5. no released nvidia version with this yet.

# The vLLM image has CUDA 13.1 runtime and nvcc, but missing dev headers (cusparse.h, nvtx, etc.)
# Install cuda-keyring to add NVIDIA's apt repo, then install the dev headers for transformer_engine
RUN apt-get update && apt-get install -y --no-install-recommends wget && \
    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb && \
    dpkg -i cuda-keyring_1.1-1_all.deb && \
    rm cuda-keyring_1.1-1_all.deb && \
    apt-get update && apt-get install -y --no-install-recommends \
    cuda-nvtx-13-1 \
    cuda-cupti-dev-13-1 \
    cuda-nvml-dev-13-1 \
    libcusparse-dev-13-1 \
    libcusolver-dev-13-1 \
    libcufft-dev-13-1 \
    libnvjitlink-dev-13-1 \
    libnvjpeg-dev-13-1 \
    libcublasmp0-dev-cuda-13 \
    libcudnn9-cuda-13 \
    && rm -rf /var/lib/apt/lists/*

# Install remaining dependencies
RUN --mount=type=cache,target=/root/.cache/pip \
    --mount=type=bind,source=requirements.txt,target=/requirements.txt \
    pip install -r /requirements.txt

# Install transformer_engine from source (force build for CUDA 13.1, not pre-built cu12 wheel)
RUN pip install --no-build-isolation transformer_engine[pytorch]

RUN pip install transformers[torch]==5.0.0

WORKDIR /workspace/bionemo
COPY . .

RUN useradd -m -u 1000 appuser && chown -R appuser /workspace/bionemo
USER appuser
```
🧰 Tools
🪛 Trivy (0.69.1)
[error] 1-1: Image user should not be 'root'
Specify at least 1 USER command in Dockerfile with non-root user as argument
Rule: DS-0002
(IaC/Dockerfile)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@bionemo-recipes/vllm/Dockerfile` around lines 1 - 36, Add a non-root user and
switch to it in the Dockerfile so the container does not run as root; create a
user/group (e.g., bionemo), chown the application directory (/workspace/bionemo)
and any cache/venv paths to that user, and add a final USER bionemo line. Locate
the Dockerfile sections around WORKDIR /workspace/bionemo and COPY . . and
insert user creation (useradd/addgroup or groupadd) and chown before switching
context, then add USER bionemo at the end so subsequent runtime processes run
unprivileged.
bionemo-recipes/vllm/launch.sh
Outdated
```bash
if [ "$HEADLESS" = true ]; then
    DOCKER_CMD="docker run -itd --gpus all --network host --ipc=host -e HF_TOKEN --rm --name vllm_dev"
else
    DOCKER_CMD="docker run -it --gpus all --network host --ipc=host -e HF_TOKEN --rm --name vllm_dev"
fi
```
Hardcoded --name vllm_dev prevents running multiple containers and shadows the $CONTAINER argument's meaning.
The script's positional argument (`$CONTAINER`) is the Docker image name, but the running container name is always `vllm_dev`. This has two problems:

- Starting a second instance (e.g., a second headless container) will fail with `docker: Error response from daemon: Conflict. The container name "/vllm_dev" is already in use.`
- The usage comment says `<container_name>`, but it's really `<image_name>`, which is confusing.
Consider using $CONTAINER as both the image name and the container name (or derive the container name from it):
🔧 Proposed fix

```diff
-DOCKER_CMD="docker run -itd --gpus all --network host --ipc=host -e HF_TOKEN --rm --name vllm_dev"
+DOCKER_CMD="docker run -itd --gpus all --network host --ipc=host -e HF_TOKEN --rm --name ${CONTAINER}_dev"
-DOCKER_CMD="docker run -it --gpus all --network host --ipc=host -e HF_TOKEN --rm --name vllm_dev"
+DOCKER_CMD="docker run -it --gpus all --network host --ipc=host -e HF_TOKEN --rm --name ${CONTAINER}_dev"
```

Also update the usage comment on line 4:

```diff
-# Usage: ./launch.sh <container_name> [--mount_dir] [--headless]
+# Usage: ./launch.sh <image_name> [--mount_dir] [--headless]
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```bash
if [ "$HEADLESS" = true ]; then
    DOCKER_CMD="docker run -itd --gpus all --network host --ipc=host -e HF_TOKEN --rm --name ${CONTAINER}_dev"
else
    DOCKER_CMD="docker run -it --gpus all --network host --ipc=host -e HF_TOKEN --rm --name ${CONTAINER}_dev"
fi
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@bionemo-recipes/vllm/launch.sh` around lines 36 - 40, The script hardcodes
"--name vllm_dev" in DOCKER_CMD which prevents multiple instances and
misrepresents the positional $CONTAINER argument; update the usage comment to
clarify that the positional argument is the image name (or image_name) and
change the DOCKER_CMD assignments in the HEADLESS branch to use a container name
derived from $CONTAINER (for example --name "$CONTAINER" or --name
"${CONTAINER}_dev") instead of "vllm_dev" so the image argument is also used as
the container name and avoids name collisions.
bionemo-recipes/vllm/README.md
Outdated
```bash
docker run -it --gpus all --network host --ipc=host -e HF_TOKEN --rm -v ${PWD}:/workspace/bionemo vllm /bin/bash
```

or use `launch.sh`.
launch.sh mounts to a different container path than the manual docker run command — the README's test instruction will break after using launch.sh --mount_dir.
The manual command (line 14) mounts ${PWD} → /workspace/bionemo, which is the container's WORKDIR, so python test_esm2_golden_values.py works directly. However, launch.sh --mount_dir mounts the project root (two levels up) to /workspace/bionemo-framework, leaving /workspace/bionemo populated only by the image's COPY . . layer. Running python test_esm2_golden_values.py from WORKDIR after using --mount_dir will fail unless the user navigates to the correct subdirectory. The "or use launch.sh" phrasing implies equivalence — either clarify the path difference or update launch.sh to mount to /workspace/bionemo instead of /workspace/bionemo-framework.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@bionemo-recipes/vllm/README.md` at line 17, The README's "or use `launch.sh`"
is misleading because `launch.sh --mount_dir` mounts the repo to
`/workspace/bionemo-framework` while the manual `docker run` mounts to
`/workspace/bionemo`, causing `python test_esm2_golden_values.py` to fail when
run from WORKDIR; fix by either (A) update `launch.sh` to mount the project root
into `/workspace/bionemo` instead of `/workspace/bionemo-framework` (adjust
mount target and any downstream path references in launch.sh), or (B) update
README.md to explicitly document that `launch.sh --mount_dir` mounts at
`/workspace/bionemo-framework` and instruct users to cd into the correct
subdirectory (where `test_esm2_golden_values.py` lives) before running the test;
mention the affected files `launch.sh`, `README.md`, and
`test_esm2_golden_values.py` so reviewers can locate the changes.
bionemo-recipes/vllm/README.md
Outdated
```python
python test_esm2_golden_values.py
```
Wrong code fence language identifier — should be bash, not python.
The python test_esm2_golden_values.py command is a shell invocation, not Python source code. Using ```python causes syntax highlighters to misparse it.
📝 Proposed fix

```diff
-```python
+```bash
 python test_esm2_golden_values.py
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@bionemo-recipes/vllm/README.md` around lines 21-23, change the fenced code block that currently uses the wrong language tag; locate the block containing the shell command "python test_esm2_golden_values.py" in the README and replace the opening fence language identifier from `python` to `bash` so the command is treated as a shell snippet and highlighted correctly.
```python
if __name__ == "__main__":
    print(f"GPUs: {torch.cuda.device_count()}")

    # Step 0: fresh export (facebook HF -> our TE format)
    print("\n[0/3] Exporting checkpoint ...")
    MODEL_ID = fresh_export()

    print(f"MODEL_ID: {MODEL_ID}")
    print(f"REFERENCE_MODEL_ID: {REFERENCE_MODEL_ID}")
    print(f"Sequences: {len(SEQUENCES)}")

    # 1) vLLM on exported model
    print("\n[1/3] vLLM inference (exported model) ...")
    emb_vllm = vllm_embed(MODEL_ID, SEQUENCES)

    # 2) HuggingFace on exported model
    print("\n[2/3] HuggingFace inference (exported model) ...")
    emb_hf_exported = hf_embed(MODEL_ID, SEQUENCES)

    # 3) HuggingFace on reference Hub model
    print("\n[3/3] HuggingFace inference (reference model) ...")
    emb_hf_reference = hf_embed(REFERENCE_MODEL_ID, SEQUENCES)

    # ---- Pairwise comparisons ----
    pairs = [
        ("vLLM (exported)", "HF (exported)", emb_vllm, emb_hf_exported),
        ("vLLM (exported)", "HF (reference)", emb_vllm, emb_hf_reference),
        ("HF (exported)", "HF (reference)", emb_hf_exported, emb_hf_reference),
    ]

    # ---- Summary table ----
    header = f"{'Pair':<35} {'max |diff|':>14} {'mean |diff|':>14} {'cos sim':>12} {'exact':>7}"
    sep = "-" * len(header)
    print(f"\n{sep}")
    print(header)
    print(sep)

    for name_a, name_b, a, b in pairs:
        diffs = np.abs(a.astype(np.float64) - b.astype(np.float64))
        label = f"{name_a} vs {name_b}"
        exact = np.array_equal(a, b)
        print(
            f"{label:<35} {diffs.max():>14.8e} {diffs.mean():>14.8e} "
            f"{cosine_sim(a, b):>12.10f} {'YES' if exact else 'NO':>7}"
        )

    print(sep)
    print(f"Tolerance: rtol={RTOL}, atol={ATOL} (0 = exact match required)")

    # Per-sequence breakdown
    short = {"vLLM (exported)": "vllm", "HF (exported)": "hf_exp", "HF (reference)": "hf_ref"}
    print("\nPer-sequence max |diff|:")
    for i in range(len(SEQUENCES)):
        row = f" seq {i}:"
        for name_a, name_b, a, b in pairs:
            d = float(np.abs(a[i].astype(np.float64) - b[i].astype(np.float64)).max())
            row += f" {short[name_a]}_vs_{short[name_b]}={d:.8e}"
        print(row)

    print(sep)

    # Cleanup
    if torch.distributed.is_initialized():
        torch.distributed.destroy_process_group()
```
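The summary table in the quoted script relies on a `cosine_sim` helper defined elsewhere in the file. A minimal sketch of what such a helper might look like, assuming 2-D float embedding arrays and a row-wise mean formulation (an assumption, not necessarily the recipe's actual implementation):

```python
import numpy as np


def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Mean cosine similarity over rows, computed in float64 for stability."""
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    num = (a * b).sum(axis=-1)
    denom = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    return float((num / denom).mean())


embeddings = np.array([[1.0, 0.0], [0.0, 2.0]])
print(cosine_sim(embeddings, embeddings))  # 1.0
```

Because the metric is a mean over sequences, a single badly drifting sequence can be diluted; the per-sequence max-diff breakdown in the script compensates for that.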
Test script has no assertions — it will never fail.
This "test" prints a comparison table but never asserts that results are within tolerance. RTOL and ATOL (line 74) are defined but unused. A silent pass regardless of output drift defeats the purpose of a golden-value test.
Add assertions after the comparison loop, for example:
Suggested assertion block
```diff
+    all_passed = True
     for name_a, name_b, a, b in pairs:
         diffs = np.abs(a.astype(np.float64) - b.astype(np.float64))
         label = f"{name_a} vs {name_b}"
         exact = np.array_equal(a, b)
         print(
             f"{label:<35} {diffs.max():>14.8e} {diffs.mean():>14.8e} "
             f"{cosine_sim(a, b):>12.10f} {'YES' if exact else 'NO':>7}"
         )
+        if not np.allclose(a, b, rtol=RTOL, atol=ATOL):
+            all_passed = False
     print(sep)
-    print(f"Tolerance: rtol={RTOL}, atol={ATOL} (0 = exact match required)")
+    print(f"Tolerance: rtol={RTOL}, atol={ATOL}")
+
+    assert all_passed, "Golden value comparison failed — see table above for details."
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@bionemo-recipes/vllm/test_esm2_golden_values.py` around lines 147 - 210, The
script currently only prints comparisons and never fails; add assertions that
enforce the golden-value tolerances using RTOL and ATOL: for each pair in pairs
(refer to the pairs list and variables a, b), assert np.allclose(a, b,
rtol=RTOL, atol=ATOL) (or equivalently assert (np.abs(a-b) <= ATOL + RTOL *
np.abs(b)).all()) and fail the test if not, and also assert cosine_sim(a, b)
exceeds a sensible threshold or that exact is True when ATOL/RTOL are zero; add
per-sequence assertions inside the per-sequence loop to ensure each sequence
max-diff <= ATOL + RTOL * max(|b_i|) so the test fails on unacceptable drift.
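For reference, the `np.allclose` call in the suggested assertions applies the elementwise criterion `|a - b| <= atol + rtol * |b|`, the same inequality the per-sequence check spells out. A small self-contained illustration (the tolerance values here are illustrative, not the recipe's actual `RTOL`/`ATOL`):

```python
import numpy as np

RTOL, ATOL = 1e-5, 1e-8  # illustrative tolerances, not the recipe's actual values


def within_tolerance(a: np.ndarray, b: np.ndarray, rtol: float = RTOL, atol: float = ATOL) -> bool:
    # Same elementwise criterion np.allclose applies: |a - b| <= atol + rtol * |b|
    return bool((np.abs(a - b) <= atol + rtol * np.abs(b)).all())


a = np.array([1.0, 2.0, 3.0])
b = a + np.array([1e-9, 5e-6, 0.0])  # tiny drift, within tolerance
print(within_tolerance(a, b), np.allclose(a, b, rtol=RTOL, atol=ATOL))  # True True
```

Note the criterion is asymmetric in `a` and `b` (the relative term scales with `|b|`), so the comparison order matters slightly for borderline values.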
Caution: Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted.

/ok to test c31c3e8

/ok to test ef612a3
wait, what? why doesn't this work by just adding vllm to requirements.txt?
To link it to an existing pytorch install (ie the one in our base image) you have to build it: https://docs.vllm.ai/en/latest/getting_started/installation/gpu/#use-an-existing-pytorch-installation
Btw that script is optional, the way i thought about it was someone might want to quickly build the image for training without waiting for vllm to install and then maybe decide to install it in which case they can use that script inside the container. or they can use the build args with the Dockerfile to install it on first build
/ok to test 60043cd

/ok to test c2d2351
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

```
@@           Coverage Diff           @@
##             main    #1473   +/-   ##
=======================================
  Coverage   76.45%   76.45%
=======================================
  Files         102      102
  Lines        7952     7952
=======================================
  Hits         6080     6080
  Misses       1872     1872
```
can we flatten this directory? e.g. no esm2/ folder. we can put modeling_esm_te.py and modeling_llama_te.py right next to each other.
we could but i have a number of files for llama3 as well so it might get pretty messy: https://github.com/NVIDIA/bionemo-framework/pull/1501/changes. do we still want to go that way?
you're going to get the one CI job for the vllm_inference folder. So cding into esm2 here isn't going to scale well when you have multiple folders. You can always set up deeper packges here if you need to, but pip install and pytest are getting run from the root folder
so what i did to solve this in the llama3 repo was have it dive into the subfolders of vllm_inference to discover other recipe jobs: https://github.com/NVIDIA/bionemo-framework/pull/1501/changes#diff-f91d3e08cdd503116607af5517ca8357db1810406cb3c74c824a8f732061d0cb. should i do this here as well?
/ok to test b6a3bcf
Add a vLLM-based inference recipe for ESM2 models using TransformerEngine, including model conversion utilities, export scripts, and tests. Also updates NVEsmModel and related classes to support configurable add_pooling_layer via config and use self.model as the base_model_prefix for HuggingFace compatibility. Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>
0a57a07 to f2bfbce
/ok to test f2bfbce
Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>
/ok to test a9a6f75
those are only the copies from check_copied_files (originals are all checked by ruff anyway). ruff was giving a different ordering of imports for copied files and the original for the vllm recipe, so the two precommit checks were clashing. we can remove the non-vllm copied files from the ignore but i just thought it would be a good practice to avoid future issues like that
fixed with a .ruff.toml file
@coderabbitai full review
✅ Actions performed: Full review triggered.
Actionable comments posted: 12
🧹 Nitpick comments (8)
.pre-commit-config.yaml (1)
32-58: Consider DRY: use YAML anchors to avoid duplicating the exclude pattern. The identical exclude pattern is duplicated between the `ruff` and `ruff-format` hooks. YAML anchors can reduce this duplication and ensure both hooks stay in sync.

♻️ Proposed refactor using YAML anchors
```diff
   - id: ruff  # 1. Attempt to automatically fix any lint issues.
     args: ["--fix"]
     # Exclude check_copied_files destinations; they are verbatim copies
     # of source files and must not be reformatted independently.
-    exclude: |
+    exclude: &ruff_exclude |
       (?x)^(
         bionemo-recipes/recipes/esm2_native_te/example_8m_checkpoint/esm_nv\.py|
         ...
       )$
   - id: ruff-format
-    exclude: |
-      (?x)^(
-        bionemo-recipes/recipes/esm2_native_te/example_8m_checkpoint/esm_nv\.py|
-        ...
-      )$
+    exclude: *ruff_exclude
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.pre-commit-config.yaml around lines 32 - 58, The exclude regex is duplicated for the ruff and ruff-format hooks; create a single YAML anchor (e.g., &ruff_exclude) for that multiline exclude pattern and replace both exclude blocks with an alias (e.g., *ruff_exclude) so the pattern is defined once and referenced by the ruff and ruff-format hook entries in .pre-commit-config.yaml.

bionemo-recipes/recipes/esm2_native_te/tests/test_train.py (1)
296-309: Consider deduplicating FP8 log assertions into a shared helper. This block is effectively duplicated from the DDP FP8 stats test; extracting a helper would reduce maintenance cost for future log-path changes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `bionemo-recipes/recipes/esm2_native_te/tests/test_train.py` around lines 296 - 309, Extract the repeated FP8 log assertions into a shared helper (e.g., assert_fp8_logs(fp8_log_dir: Path, rank: int = 0)) and replace the duplicated block in test_train.py with a call to that helper; the helper should verify fp8_log_dir exists, the "rank_<rank>" directory, the "nvdlfw_inspect_logs" and "nvdlfw_inspect_statistics_logs" subdirectories, and that the two files named like "nvdlfw_inspect_globalrank-<rank>.log" (metadata_log and stats_log) both exist and are non-empty. Ensure the helper uses the same Path names (fp8_log_dir, metadata_log, stats_log) so callers are straightforward to update and put the helper in a shared test utilities module (or conftest) so other tests (the DDP FP8 stats test) can import and call it.

bionemo-recipes/recipes/vllm_inference/.ci_build.sh (1)
2-2: Add error handling for the `cd` command. If the `esm2` directory doesn't exist, the script will continue in the wrong directory and fail unexpectedly. This was also flagged by static analysis (SC2164).

Proposed fix
```diff
 #!/bin/bash -x
-cd esm2
+cd esm2 || exit 1
 PIP_CONSTRAINT= pip install -r requirements.txt
 ./install_vllm.sh
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `bionemo-recipes/recipes/vllm_inference/.ci_build.sh` at line 2, The script .ci_build.sh currently runs the command `cd esm2` without checking its result; update the script to handle failure of the `cd esm2` step (e.g., test that the directory exists or check the exit status) and abort with a clear error message and non-zero exit code if the chdir fails so the build doesn't continue in the wrong directory.

bionemo-recipes/recipes/vllm_inference/esm2/Dockerfile (1)
1-24: Consider adding a non-root user for security. Trivy flagged that the image runs as root (DS-0002). While this may be acceptable for development containers, consider adding a non-root user for production deployments to follow security best practices.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `bionemo-recipes/recipes/vllm_inference/esm2/Dockerfile` around lines 1 - 24, Add a non-root user and switch to it before the final WORKDIR to avoid running the image as root: create a dedicated user/group in the Dockerfile (e.g., via RUN useradd/groupadd or adduser), chown the application directory (/workspace/bionemo) and any cache dirs used during build to that user, and set USER to that account before the final WORKDIR /workspace/bionemo. Ensure the conditional INSTALL_VLLM block still runs as root (keep it before creating/switching user) and only switch to the non-root user afterward so runtime processes no longer run as root.

bionemo-recipes/recipes/vllm_inference/esm2/install_vllm.sh (1)
10-14: Add error handling for the `cd` commands. If `/workspace` doesn't exist or `cd vllm` fails after cloning, the script will continue executing in the wrong directory, potentially causing unexpected behavior.

Proposed fix
```diff
-cd /workspace
+cd /workspace || { echo "ERROR: /workspace does not exist"; exit 1; }
 if [ ! -d vllm ]; then
     git clone --branch v0.15.1 --depth 1 https://github.com/vllm-project/vllm.git
 fi
-cd vllm
+cd vllm || { echo "ERROR: Failed to enter vllm directory"; exit 1; }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `bionemo-recipes/recipes/vllm_inference/esm2/install_vllm.sh` around lines 10 - 14, The script currently runs `cd /workspace` and `cd vllm` without checking for errors; update the install_vllm.sh flow to fail fast on directory errors by verifying `cd /workspace` succeeds (create /workspace with `mkdir -p` or exit with an error message if it cannot be accessed) and after cloning verify `cd vllm` returned successfully (if not, print a clear error and exit non-zero). Apply these checks around the existing `cd /workspace` and `cd vllm` commands (or enable `set -euo pipefail` at the top) so the script doesn't continue in the wrong directory.

bionemo-recipes/recipes/vllm_inference/esm2/modeling_esm_te.py (3)
174-184: Mutating `self.config` may cause unintended side effects. Directly assigning to `self.config.layer_precision` modifies the config object, which could affect other components sharing the same config instance. Consider storing this as an instance attribute instead.

Suggested fix
```diff
         self.config = config
         self._fp8_recipe: transformer_engine.common.recipe.Recipe | None = fp8_recipe
         self._fp4_recipe: transformer_engine.common.recipe.Recipe | None = fp4_recipe
+        self._layer_precision = config.layer_precision

-        if self.config.layer_precision is None:
+        if self._layer_precision is None:
             if fp8_recipe is not None and fp4_recipe is not None:
                 raise RuntimeError("Both FP8 and FP4 recipes provided, but no layer precision provided.")
             if fp8_recipe is not None:
                 warnings.warn("No layer precision provided, using FP8 recipe for all layers.", UserWarning)
-                self.config.layer_precision = ["fp8"] * self.config.num_hidden_layers
+                self._layer_precision = ["fp8"] * self.config.num_hidden_layers
             elif fp4_recipe is not None:
                 raise RuntimeError(
                     "FP4 recipe provided but no layer_precision configured. "
                     "Set layer_precision explicitly when using FP4."
                 )

-        if self.config.layer_precision is not None and "fp4" in self.config.layer_precision and fp4_recipe is None:
+        if self._layer_precision is not None and "fp4" in self._layer_precision and fp4_recipe is None:
             raise RuntimeError("layer_precision contains 'fp4' entries but no fp4_recipe was provided.")
```
get_autocast_contextto useself._layer_precisioninstead ofself.config.layer_precision.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bionemo-recipes/recipes/vllm_inference/esm2/modeling_esm_te.py` around lines 174 - 184, Don't mutate the shared config by assigning to self.config.layer_precision; instead set an instance attribute (e.g., self._layer_precision) based on the existing logic (use ["fp8"] * self.config.num_hidden_layers when fp8_recipe is provided, raise for both recipes or for fp4 without explicit layer_precision), and then update get_autocast_context to read from self._layer_precision instead of self.config.layer_precision so other components sharing the config are not affected.
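The hazard described above, a model writing a derived value back into a config object it shares with other components, can be sketched generically. The class names below are stand-ins for illustration, not the actual NVEsm classes:

```python
class Config:
    """Stand-in for a HF-style config object shared across components."""

    def __init__(self, num_hidden_layers: int = 2, layer_precision=None):
        self.num_hidden_layers = num_hidden_layers
        self.layer_precision = layer_precision


class Model:
    def __init__(self, config: Config, fp8_recipe=None):
        self.config = config
        # Store the resolved value privately instead of writing it back to
        # config, so other holders of the same Config instance are unaffected.
        if config.layer_precision is None and fp8_recipe is not None:
            self._layer_precision = ["fp8"] * config.num_hidden_layers
        else:
            self._layer_precision = config.layer_precision


cfg = Config()
model = Model(cfg, fp8_recipe=object())
print(cfg.layer_precision)     # None: the shared config stays untouched
print(model._layer_precision)  # ['fp8', 'fp8']
```

Had the constructor assigned to `cfg.layer_precision` instead, a second model built from the same `cfg` would silently inherit the first model's resolved default.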
696-696: Extract duplicated magic number to a class constant. `mask_ratio_train = 0.15 * 0.8` is defined identically in both `_apply_token_dropout_bshd` and `_apply_token_dropout_thd`. Consider defining this as a class constant to follow DRY principles.

Suggested fix
```diff
 class NVEsmEmbeddings(nn.Module):
     """Modified version of EsmEmbeddings to support THD inputs."""

+    # Mask ratio used during ESM model training (15% masking * 80% replacement)
+    _MASK_RATIO_TRAIN = 0.15 * 0.8
+
     def __init__(self, config):
```

Then replace both occurrences:

```diff
-        mask_ratio_train = 0.15 * 0.8  # Hardcoded as the ratio used in all ESM model training runs
+        mask_ratio_train = self._MASK_RATIO_TRAIN
```

Also applies to: 717-717
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bionemo-recipes/recipes/vllm_inference/esm2/modeling_esm_te.py` at line 696, The duplicated magic number (0.15 * 0.8) used for mask_ratio_train in _apply_token_dropout_bshd and _apply_token_dropout_thd should be extracted to a class-level constant (e.g., MASK_RATIO_TRAIN) on the model class (the class that defines _apply_token_dropout_bshd/_apply_token_dropout_thd) and both functions should reference that constant instead of recalculating the value inline; add the constant near other class constants, replace the inline expressions in _apply_token_dropout_bshd and _apply_token_dropout_thd with the new class constant, and ensure any imports/uses within the class access it as self.MASK_RATIO_TRAIN or ClassName.MASK_RATIO_TRAIN as appropriate.
80-80: Confusing default value for `padded_vocab_size`. The default of `64` contradicts the docstring, which says "If not provided, defaults to vocab_size." This will cause unexpected behavior for models with larger vocabularies. Consider using `None` as the default to match the documented behavior.

Suggested fix
```diff
-        padded_vocab_size: Optional[int] = 64,
+        padded_vocab_size: Optional[int] = None,
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bionemo-recipes/recipes/vllm_inference/esm2/modeling_esm_te.py` at line 80, The parameter padded_vocab_size currently defaults to 64; change its signature to padded_vocab_size: Optional[int] = None (and update any type hints) and ensure the constructor/initialization logic in the class/function (where padded_vocab_size is used) treats None by setting padded_vocab_size = vocab_size so behavior matches the docstring; also update any downstream uses that assume an int to handle the None-to-vocab_size assignment.
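The `None`-default pattern recommended above, combined with the explicit `ValueError` check suggested for the same file, might look like this in isolation. The function name and values are illustrative, not taken from the recipe:

```python
from typing import Optional


def resolve_padded_vocab_size(vocab_size: int, padded_vocab_size: Optional[int] = None) -> int:
    # None means "default to vocab_size", matching the documented behavior.
    if padded_vocab_size is None:
        padded_vocab_size = vocab_size
    # Explicit runtime check instead of `assert`, which `python -O` would strip.
    if padded_vocab_size < vocab_size:
        raise ValueError(
            f"padded_vocab_size ({padded_vocab_size}) must be greater than "
            f"or equal to vocab_size ({vocab_size})"
        )
    return padded_vocab_size


print(resolve_padded_vocab_size(33))      # 33
print(resolve_padded_vocab_size(33, 64))  # 64
```

Padding the vocabulary to a multiple of a fixed block size is a common throughput optimization; making the default `None` keeps that an opt-in rather than a surprise for models whose vocab exceeds 64.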
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.pre-commit-config.yaml:
- Around line 44-45: The regex used to exclude files under
bionemo-recipes/models/(llama3|mixtral)/tests/common/ only matches the directory
itself, not files inside it; update the exclusion patterns in
.pre-commit-config.yaml for the ruff and ruff-format hooks (the entries
containing the string "bionemo-recipes/models/(llama3|mixtral)/tests/common/")
to append ".*" so they match files within that directory (e.g., change the
pattern to end with "/tests/common/.*").
In `@bionemo-recipes/models/esm2/modeling_esm_te.py`:
- Around line 414-416: The current fallback for add_pooling_layer in the model
init uses getattr(config, "add_pooling_layer", True) which conflicts with
NVEsmConfig's declared default False and the docstring; change the getattr
default to False so the fallback is getattr(config, "add_pooling_layer", False)
(keep the outer check if add_pooling_layer is None), and update the
function/class docstring where it states the default (currently "defaults to
True") to reflect the False default; reference symbols: add_pooling_layer,
config, NVEsmConfig, and the initializer in modeling_esm_te.py.
In `@bionemo-recipes/recipes/vllm_inference/.ci_test_env.sh`:
- Around line 1-2: The script lacks a shebang and blindly runs the cd command;
add a proper shebang (e.g., #!/usr/bin/env bash) at the top and guard the cd by
checking that the target directory exists and is a directory before changing
into it (for example test -d "esm2" || exit 1, or use if [ -d "esm2" ]; then cd
"esm2"; else echo "esm2 not found" >&2; exit 1; fi) so the script fails fast
instead of continuing in the wrong directory; update the file around the
existing cd esm2 invocation accordingly.
In `@bionemo-recipes/recipes/vllm_inference/esm2/convert.py`:
- Around line 110-129: The call to output_model.post_init() is redundant because
state.apply_transforms(...) already performs weight tying (see
state.apply_transforms) and related init work; remove the
output_model.post_init() call after apply_transforms, or if you only want to
ensure tied weights explicitly, replace it with a direct call to
output_model.tie_weights() instead of post_init; update the code around the
EsmForMaskedLM construction and the output_model variable accordingly.
In `@bionemo-recipes/recipes/vllm_inference/esm2/Dockerfile`:
- Around line 16-21: The Dockerfile runs the vLLM build steps (git clone vllm,
python use_existing_torch.py, uv pip install -r requirements/build.txt --system,
uv pip install --no-build-isolation -e . --system, pip install --upgrade
"transformers[torch]") but does not pass TORCH_CUDA_ARCH_LIST and MAX_JOBS like
install_vllm.sh; fix by exporting or prefixing TORCH_CUDA_ARCH_LIST and MAX_JOBS
with appropriate values (matching install_vllm.sh) for the uv pip install -r
requirements/build.txt --system and uv pip install --no-build-isolation -e .
--system commands (and the final pip install if needed) so the build targets the
correct CUDA arch and uses the desired parallelism.
In `@bionemo-recipes/recipes/vllm_inference/esm2/export.py`:
- Around line 70-121: The export_hf_checkpoint function loads local assets using
bare relative paths which break when the caller's CWD is different; change all
asset references in export_hf_checkpoint
(AutoTokenizer.from_pretrained("esm_fast_tokenizer"),
open("model_readme.template"), shutil.copy("modeling_esm_te.py", ...),
shutil.copy("LICENSE", ...)) to resolve them relative to the recipe file by
computing root = Path(__file__).resolve().parent and then using root /
"esm_fast_tokenizer", root / "model_readme.template", root /
"modeling_esm_te.py", and root / "LICENSE" so the tokenizer, template, and
copied files are always loaded from the recipe package rather than the current
working directory or the Hub.
In `@bionemo-recipes/recipes/vllm_inference/esm2/model_readme.template`:
- Around line 91-94: The "Runtime Engine(s):" section currently lists only
"Hugging Face Transformers" but this is a vllm_inference template and should
explicitly list vLLM (and include Transformers only if both are supported);
update the "Runtime Engine(s):" block in the vllm_inference
model_readme.template to include "vLLM" and, if the recipe supports both, list
"vLLM, Hugging Face Transformers" (apply the same change at the second
occurrence referenced in the comment).
- Around line 18-20: The description for "ESM-2" in model_readme.template
overstates its capability as directly performing protein structure prediction;
update the wording to say ESM-2 is a protein language/embedding model trained
with a masked LM objective that produces sequence embeddings useful for
downstream tasks (including structure-related analyses) rather than claiming it
predicts 3D structures directly; change the paragraph referencing "ESM-2" and
the similar sentence at the other occurrence (line ~44) to this softened
phrasing so the README correctly presents the model as an embedding/language
model for downstream use.
In `@bionemo-recipes/recipes/vllm_inference/esm2/modeling_esm_te.py`:
- Around line 140-143: The assertion checking padded_vocab_size vs vocab_size
should be converted to an explicit runtime check: replace the assert in the
block that references self.padded_vocab_size and self.vocab_size with an
if-condition that raises a ValueError when padded_vocab_size is less than
vocab_size, using the same descriptive message (e.g., "padded_vocab_size (...)
must be greater than or equal to vocab_size (...)"); update the check in the
method or initializer where these attributes are set (the code block containing
self.padded_vocab_size and self.vocab_size) so the validation cannot be bypassed
by Python optimizations.
In `@bionemo-recipes/recipes/vllm_inference/esm2/tests/conftest.py`:
- Around line 1-20: Add a top-level Google-style module docstring to conftest.py
describing the module's purpose (e.g., pytest configuration and test path setup)
to satisfy pydocstyle; place a triple-quoted docstring at the very top of the
file that briefly summarizes what conftest.py does (adjusts sys.path and
provides test fixtures/config) and include any relevant notes or authorship per
project conventions.
- Line 20: Replace the sys.path.append call in conftest.py with
sys.path.insert(0, Path(__file__).parent.parent.as_posix()) so the local esm2
package is prioritized during tests; this ensures the import in test_vllm.py
(from export import export_hf_checkpoint) resolves to the local source rather
than an installed package and you can locate the call by searching for
sys.path.append and Path(__file__).parent.parent.as_posix() in conftest.py.
In `@bionemo-recipes/recipes/vllm_inference/esm2/tests/test_vllm.py`:
- Around line 62-131: Add a module-level skip marker so all tests are skipped
when CUDA isn't available: detect CUDA availability (torch.cuda.is_available())
and set pytestmark = pytest.mark.skipif(not torch.cuda.is_available(),
reason="requires CUDA") at the top of the module; this prevents session-scoped
fixtures hf_exported_embeddings and hf_reference_embeddings from calling
_hf_embed (which .to("cuda")s models/tensors) on CPU-only runners and avoids
failures during fixture setup.
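Several of the prompts above (notably the export.py one) recommend resolving bundled assets relative to the module file rather than the caller's working directory. A minimal sketch of that pattern; the file and asset names are taken from the review text, and `asset_path` is a hypothetical helper:

```python
from pathlib import Path


def asset_path(module_file: str, name: str) -> Path:
    # Anchor lookups at the module's own directory so they survive CWD changes;
    # inside export.py this would be called as asset_path(__file__, "esm_fast_tokenizer").
    return Path(module_file).resolve().parent / name


p = asset_path("export.py", "model_readme.template")
print(p.is_absolute())  # True regardless of os.getcwd()
```

Bare relative strings like `"model_readme.template"` are resolved against `os.getcwd()`, which is why the original code only works when run from the recipe directory.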
---
Nitpick comments:
In @.pre-commit-config.yaml:
- Around line 32-58: The exclude regex is duplicated for the ruff and
ruff-format hooks; create a single YAML anchor (e.g., &ruff_exclude) for that
multiline exclude pattern and replace both exclude blocks with an alias (e.g.,
*ruff_exclude) so the pattern is defined once and referenced by the ruff and
ruff-format hook entries in .pre-commit-config.yaml.
In `@bionemo-recipes/recipes/esm2_native_te/tests/test_train.py`:
- Around line 296-309: Extract the repeated FP8 log assertions into a shared
helper (e.g., assert_fp8_logs(fp8_log_dir: Path, rank: int = 0)) and replace the
duplicated block in test_train.py with a call to that helper; the helper should
verify fp8_log_dir exists, the "rank_<rank>" directory, the
"nvdlfw_inspect_logs" and "nvdlfw_inspect_statistics_logs" subdirectories, and
that the two files named like "nvdlfw_inspect_globalrank-<rank>.log"
(metadata_log and stats_log) both exist and are non-empty. Ensure the helper
uses the same Path names (fp8_log_dir, metadata_log, stats_log) so callers are
straightforward to update and put the helper in a shared test utilities module
(or conftest) so other tests (the DDP FP8 stats test) can import and call it.
In `@bionemo-recipes/recipes/vllm_inference/.ci_build.sh`:
- Line 2: The script .ci_build.sh currently runs the command `cd esm2` without
checking its result; update the script to handle failure of the `cd esm2` step
(e.g., test that the directory exists or check the exit status) and abort with a
clear error message and non-zero exit code if the chdir fails so the build
doesn't continue in the wrong directory.
In `@bionemo-recipes/recipes/vllm_inference/esm2/Dockerfile`:
- Around line 1-24: Add a non-root user and switch to it before the final
WORKDIR to avoid running the image as root: create a dedicated user/group in the
Dockerfile (e.g., via RUN useradd/groupadd or adduser), chown the application
directory (/workspace/bionemo) and any cache dirs used during build to that
user, and set USER to that account before the final WORKDIR /workspace/bionemo.
Ensure the conditional INSTALL_VLLM block still runs as root (keep it before
creating/switching user) and only switch to the non-root user afterward so
runtime processes no longer run as root.
In `@bionemo-recipes/recipes/vllm_inference/esm2/install_vllm.sh`:
- Around line 10-14: The script currently runs `cd /workspace` and `cd vllm`
without checking for errors; update the install_vllm.sh flow to fail fast on
directory errors by verifying `cd /workspace` succeeds (create /workspace with
`mkdir -p` or exit with an error message if it cannot be accessed) and after
cloning verify `cd vllm` returned successfully (if not, print a clear error and
exit non‑zero). Apply these checks around the existing `cd /workspace` and `cd
vllm` commands (or enable `set -euo pipefail` at the top) so the script doesn’t
continue in the wrong directory.
In `@bionemo-recipes/recipes/vllm_inference/esm2/modeling_esm_te.py`:
- Around line 174-184: Don't mutate the shared config by assigning to
self.config.layer_precision; instead set an instance attribute (e.g.,
self._layer_precision) based on the existing logic (use ["fp8"] *
self.config.num_hidden_layers when fp8_recipe is provided, raise for both
recipes or for fp4 without explicit layer_precision), and then update
get_autocast_context to read from self._layer_precision instead of
self.config.layer_precision so other components sharing the config are not
affected.
- Line 696: The duplicated magic number (0.15 * 0.8) used for mask_ratio_train
in _apply_token_dropout_bshd and _apply_token_dropout_thd should be extracted to
a class-level constant (e.g., MASK_RATIO_TRAIN) on the model class (the class
that defines _apply_token_dropout_bshd/_apply_token_dropout_thd) and both
functions should reference that constant instead of recalculating the value
inline; add the constant near other class constants, replace the inline
expressions in _apply_token_dropout_bshd and _apply_token_dropout_thd with the
new class constant, and ensure any imports/uses within the class access it as
self.MASK_RATIO_TRAIN or ClassName.MASK_RATIO_TRAIN as appropriate.
- Line 80: The parameter padded_vocab_size currently defaults to 64; change its
signature to padded_vocab_size: Optional[int] = None (and update any type hints)
and ensure the constructor/initialization logic in the class/function (where
padded_vocab_size is used) treats None by setting padded_vocab_size = vocab_size
so behavior matches the docstring; also update any downstream uses that assume
an int to handle the None-to-vocab_size assignment.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e2413d6e-00ae-495d-aa4b-c6737be37029
📒 Files selected for processing (41)
- .pre-commit-config.yaml
- .secrets.baseline
- bionemo-recipes/models/esm2/README.md
- bionemo-recipes/models/esm2/convert.py
- bionemo-recipes/models/esm2/export.py
- bionemo-recipes/models/esm2/modeling_esm_te.py
- bionemo-recipes/models/esm2/tests/test_cp_bshd.py
- bionemo-recipes/models/esm2/tests/test_cp_thd.py
- bionemo-recipes/models/esm2/tests/test_distributed_fp8.py
- bionemo-recipes/models/esm2/tests/test_distributed_strategies.py
- bionemo-recipes/models/esm2/tests/test_modeling_esm_te.py
- bionemo-recipes/recipes/esm2_accelerate_te/example_8m_checkpoint/esm_nv.py
- bionemo-recipes/recipes/esm2_native_te/fp4_debugging_stats.yaml
- bionemo-recipes/recipes/esm2_native_te/modeling_esm_te.py
- bionemo-recipes/recipes/esm2_native_te/quantization.py
- bionemo-recipes/recipes/esm2_native_te/tests/test_quantization.py
- bionemo-recipes/recipes/esm2_native_te/tests/test_stop_and_go.py
- bionemo-recipes/recipes/esm2_native_te/tests/test_train.py
- bionemo-recipes/recipes/esm2_native_te/train_ddp.py
- bionemo-recipes/recipes/esm2_native_te/train_ddp_cp.py
- bionemo-recipes/recipes/esm2_native_te/train_fsdp2.py
- bionemo-recipes/recipes/esm2_native_te/train_fsdp2_cp.py
- bionemo-recipes/recipes/esm2_peft_te/example_8m_checkpoint/esm_nv.py
- bionemo-recipes/recipes/vllm_inference/.ci_build.sh
- bionemo-recipes/recipes/vllm_inference/.ci_test_env.sh
- bionemo-recipes/recipes/vllm_inference/esm2/Dockerfile
- bionemo-recipes/recipes/vllm_inference/esm2/LICENSE
- bionemo-recipes/recipes/vllm_inference/esm2/README.md
- bionemo-recipes/recipes/vllm_inference/esm2/convert.py
- bionemo-recipes/recipes/vllm_inference/esm2/esm_fast_tokenizer/special_tokens_map.json
- bionemo-recipes/recipes/vllm_inference/esm2/esm_fast_tokenizer/tokenizer.json
- bionemo-recipes/recipes/vllm_inference/esm2/esm_fast_tokenizer/tokenizer_config.json
- bionemo-recipes/recipes/vllm_inference/esm2/export.py
- bionemo-recipes/recipes/vllm_inference/esm2/install_vllm.sh
- bionemo-recipes/recipes/vllm_inference/esm2/model_readme.template
- bionemo-recipes/recipes/vllm_inference/esm2/modeling_esm_te.py
- bionemo-recipes/recipes/vllm_inference/esm2/requirements.txt
- bionemo-recipes/recipes/vllm_inference/esm2/state.py
- bionemo-recipes/recipes/vllm_inference/esm2/tests/conftest.py
- bionemo-recipes/recipes/vllm_inference/esm2/tests/test_vllm.py
- ci/scripts/check_copied_files.py
```python
with init_empty_weights():
    model_hf = EsmForMaskedLM(hf_config)

# Remove contact_head since it's not present in TE models
if hasattr(model_hf.esm, "contact_head"):
    delattr(model_hf.esm, "contact_head")

output_model = state.apply_transforms(
    model_te,
    model_hf,
    reverse_mapping,
    [_unpack_qkv_weight, _unpack_qkv_bias, _unpad_embeddings, _unpad_decoder_weights, _unpad_bias],
    state_dict_ignored_entries=[
        "lm_head.decoder.weight",
        "esm.contact_head.regression.weight",
        "esm.contact_head.regression.bias",
    ],
)

output_model.post_init()
```
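The `_unpack_qkv_weight` transform named above reverses TE's fused QKV packing when exporting back to the HF layout. A minimal sketch of the idea, assuming a plain `[Q; K; V]` row concatenation (the real transform in convert.py may use a per-head interleaved layout instead):

```python
# Toy sketch of an "_unpack_qkv_weight"-style transform: split a fused
# (3*hidden, hidden) projection back into separate Q, K, V matrices.
# Assumes plain [Q; K; V] concatenation along the output rows; this is an
# illustration, not the repository's actual implementation.

def unpack_qkv_weight(qkv_rows, hidden):
    """Split fused QKV rows into Q, K, V row blocks."""
    assert len(qkv_rows) == 3 * hidden, "expected 3*hidden rows"
    q = qkv_rows[:hidden]
    k = qkv_rows[hidden:2 * hidden]
    v = qkv_rows[2 * hidden:]
    return q, k, v

hidden = 2
q = [[1, 1], [1, 1]]
k = [[2, 2], [2, 2]]
v = [[3, 3], [3, 3]]
fused = q + k + v  # what the fused checkpoint stores
q2, k2, v2 = unpack_qkv_weight(fused, hidden)
assert (q2, k2, v2) == (q, k, v)
```

The companion `_unpack_qkv_bias` would apply the same split to the 1-D bias vector.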
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In the current Hugging Face Transformers version, what does PreTrainedModel.post_init() do? If it is called after parameters have been manually loaded into a model, does it reinitialize weights or only perform setup/tie_weights logic?
💡 Result:
In current Transformers (v5.x), PreTrainedModel.post_init() is the “end of __init__” hook. Its main job is to call the model’s weight-init/tie/pruning routine and to finalize some “static” model properties for composite models. Concretely, it calls self.init_weights() (and uses that moment to ensure things like tied-weights metadata are correct). [1]
`init_weights()`:
- prunes heads if `config.pruned_heads` is set, and
- runs initialization by recursively applying `_initialize_weights` only to modules that are not already marked as initialized (via a per-module `_is_hf_initialized` flag), and then
- calls `self.tie_weights()` to enforce configured weight tying. [2][3]
So, if you call post_init() after you manually loaded parameters, it typically will not “blindly reinitialize everything”—because modules that have already been initialized (and thus have _is_hf_initialized=True) are skipped by _initialize_weights. [3] However:
- it will (re)run `tie_weights()`, which can overwrite some parameters by design (e.g., tying output embeddings to input embeddings when `tie_word_embeddings=True`). [2][3]
- it can initialize newly-added / newly-resized modules that don't have `_is_hf_initialized` set yet (this is one reason people see "unexpected reinit" after resizing embeddings or adding heads). [3]
Rule of thumb: post_init() is not a “safe no-op setup call”; it’s “init + tie (and prune)”. After manual loading, it won’t usually reinit already-initialized modules, but it can initialize uninitialized/new modules and it will enforce tying. [1][2][3]
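The skip behavior described above can be illustrated with a toy stand-in (not the actual Transformers code; `ToyModule` and the flag handling are simplified for illustration):

```python
class ToyModule:
    """Minimal stand-in for an nn.Module to show the _is_hf_initialized
    skip logic; not the real Transformers implementation."""
    def __init__(self, name):
        self.name = name
        self.weight = None
        self._is_hf_initialized = False

def initialize_weights(modules):
    # Mirrors the described behavior: skip modules already flagged as
    # initialized, init the rest, then mark them.
    for m in modules:
        if getattr(m, "_is_hf_initialized", False):
            continue
        m.weight = 0.0  # placeholder for "random init"
        m._is_hf_initialized = True

loaded = ToyModule("loaded")
loaded.weight = 1.23              # manually loaded parameter
loaded._is_hf_initialized = True  # marked as initialized during loading
fresh = ToyModule("fresh")        # newly added head, never initialized

initialize_weights([loaded, fresh])
assert loaded.weight == 1.23  # untouched: already initialized
assert fresh.weight == 0.0    # newly initialized
```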
Sources
- HF docs (`post_init` description, v5.3.0 docs): https://huggingface.co/docs/transformers/main_classes/model
- HF docs (`init_weights`/`tie_weights` behavior, v5.3.0 docs): https://huggingface.co/docs/transformers/main_classes/model
- `modeling_utils.py` (shows `post_init -> init_weights`, and `_initialize_weights` skipping modules with `_is_hf_initialized`): https://huggingface.co/Yuvarraj/MASR/blob/main/transformers/src/transformers/modeling_utils.py
🏁 Script executed:

```shell
# Find the file and understand the context
cd bionemo-recipes || true
find . -type f -name "convert.py" | grep -E "esm2|vllm" | head -5
```

Repository: NVIDIA/bionemo-framework
Length of output: 133

🏁 Script executed:

```shell
# Search for ESM2 config setup and weight tying settings
rg -t py "tie_word_embeddings|tie_weights|EsmConfig" --max-count=20
```

Length of output: 19722

🏁 Script executed:

```shell
# Check what apply_transforms does and whether it handles tying
rg -t py "apply_transforms" -B 5 -A 10 --max-count=30
```

Length of output: 37790

🏁 Script executed:

```shell
# Check the actual convert.py file at the target lines
cd bionemo-recipes && head -150 recipes/vllm_inference/esm2/convert.py | tail -50
```

Length of output: 2046

🏁 Script executed:

```shell
# Compare with models/esm2/convert.py to see if they differ
diff -u models/esm2/convert.py recipes/vllm_inference/esm2/convert.py | head -100
```

Length of output: 195

🏁 Script executed:

```shell
# Check how other models handle post_init() after apply_transforms
rg -t py "post_init\(\)" --max-count=20
```

Length of output: 3627

🏁 Script executed:

```shell
# Check if ESM2 has weight tying configured
rg -t py "tie_word_embeddings" --max-count=30 | grep -i esm
```

Length of output: 50

🏁 Script executed:

```shell
# Look at what EsmForMaskedLM's post_init() would actually do
rg -A 20 "class EsmForMaskedLM" recipes/vllm_inference/esm2/modeling_esm_te.py | head -40
```

Length of output: 154

🏁 Script executed:

```shell
# Check the state.py apply_transforms function to see if it handles tying
cd bionemo-recipes && sed -n '1,200p' recipes/vllm_inference/esm2/state.py | grep -A 50 "def apply_transforms"
```

Length of output: 2814

🏁 Script executed:

```shell
# Check if state.py calls tie_weights() after apply_transforms
cd bionemo-recipes && cat recipes/vllm_inference/esm2/state.py | grep -A 5 "def apply_transforms" | tail -100
```

Length of output: 322

🏁 Script executed:

```shell
# Check the end of apply_transforms to see if it calls tie_weights
rg -A 100 "def apply_transforms" recipes/vllm_inference/esm2/state.py | tail -80
```

Length of output: 144

🏁 Script executed:

```shell
# Check the actual convert.py more completely around the post_init call
cd bionemo-recipes && cat -n recipes/vllm_inference/esm2/convert.py | sed -n '95,135p'
```

Length of output: 1972

🏁 Script executed:

```shell
# Look for any comments explaining why post_init is needed
rg -B 5 -A 5 "output_model.post_init()" recipes/vllm_inference/esm2/convert.py
```

Length of output: 146

🏁 Script executed:

```shell
# Find the actual path to the convert.py file
find . -name "convert.py" -path "*vllm_inference/esm2*"
```

Length of output: 124

🏁 Script executed:

```shell
# Check the full path and list the directory
git ls-files | grep -E "vllm_inference.*esm2.*convert"
```

Length of output: 122

🏁 Script executed:

```shell
# Look at the state.py end of apply_transforms function to see if it calls tie_weights
git ls-files | grep -E "vllm_inference.*esm2.*state"
```

Length of output: 120

🏁 Script executed:

```shell
# Check models/esm2/convert.py for comparison
git ls-files | grep -E "models/esm2.*convert"
```

Length of output: 106

🏁 Script executed:

```shell
# Now check the actual file with correct path
cat bionemo-recipes/recipes/vllm_inference/esm2/convert.py | grep -A 150 "def convert_esm_te_to_hf" | head -70
```

Length of output: 3196

🏁 Script executed:

```shell
# Check the state.py apply_transforms ending to see if tie_weights is called
tail -100 bionemo-recipes/recipes/vllm_inference/esm2/state.py | grep -B 20 "return target"
```

Length of output: 50

🏁 Script executed:

```shell
# Check models/esm2/convert.py for the same function and see if it also calls post_init
grep -A 20 "output_model.post_init()" bionemo-recipes/models/esm2/convert.py
```

Length of output: 895

🏁 Script executed:

```shell
# Check if there's a difference in how state.py handles tie_weights
grep -n "tie_weights" bionemo-recipes/recipes/vllm_inference/esm2/state.py
```

Length of output: 143
Remove redundant `post_init()` call after `state.apply_transforms()`.
`state.apply_transforms()` already calls `tie_weights()` at line 220 of state.py, so calling `output_model.post_init()` immediately after (line 129) redundantly re-runs `tie_weights()` and other initialization logic. Since all weights have already been loaded and tied by `apply_transforms()`, either remove the `post_init()` call or replace it with an explicit `output_model.tie_weights()` if only tying is needed.
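Why a redundant re-tie matters can be shown with a toy model (illustrative only, not the HF implementation): re-running `tie_weights()` silently replaces any separately loaded decoder weights with the input embedding.

```python
class TinyLM:
    """Toy illustration of weight tying (not the HF implementation):
    tie_weights() points the output decoder at the input embedding."""
    def __init__(self):
        self.input_embedding = [1.0, 2.0, 3.0]
        self.decoder = [0.0, 0.0, 0.0]

    def tie_weights(self):
        self.decoder = self.input_embedding  # share one "tensor"

m = TinyLM()
m.decoder = [9.0, 9.0, 9.0]  # suppose a transform loaded decoder weights
m.tie_weights()              # re-tying silently replaces them
assert m.decoder is m.input_embedding
assert m.decoder == [1.0, 2.0, 3.0]
```

This is benign only when the loaded decoder weights were already identical to the embeddings, which is why re-running the tie after a careful state-dict load is at best redundant.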
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@bionemo-recipes/recipes/vllm_inference/esm2/convert.py` around lines 110 -
129, The call to output_model.post_init() is redundant because
state.apply_transforms(...) already performs weight tying (see
state.apply_transforms) and related init work; remove the
output_model.post_init() call after apply_transforms, or if you only want to
ensure tied weights explicitly, replace it with a direct call to
output_model.tie_weights() instead of post_init; update the code around the
EsmForMaskedLM construction and the output_model variable accordingly.
There was a problem hiding this comment.
out of scope of PR
There was a problem hiding this comment.
@gagank1, understood — I'll note this for a follow-up. Would you like me to open a GitHub issue to track the potential removal of the redundant output_model.post_init() call in convert_esm_te_to_hf?
```markdown
**Runtime Engine(s):**

- Hugging Face Transformers
```
There was a problem hiding this comment.
Runtime engine section is inconsistent with this recipe context.
In a vllm_inference template, listing only Hugging Face Transformers can mislead users. Consider listing vLLM explicitly (and Transformers if both are supported).
Also applies to: 214-217
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@bionemo-recipes/recipes/vllm_inference/esm2/model_readme.template` around
lines 91 - 94, The "Runtime Engine(s):" section currently lists only "Hugging
Face Transformers" but this is a vllm_inference template and should explicitly
list vLLM (and include Transformers only if both are supported); update the
"Runtime Engine(s):" block in the vllm_inference model_readme.template to
include "vLLM" and, if the recipe supports both, list "vLLM, Hugging Face
Transformers" (apply the same change at the second occurrence referenced in the
comment).
There was a problem hiding this comment.
@gagank1, understood! I'll set this aside for now. If you'd like to track it for a future update, I can open a follow-up issue — just let me know.
```python
if self.padded_vocab_size is not None and self.vocab_size is not None:
    assert self.padded_vocab_size >= self.vocab_size, (
        f"padded_vocab_size ({self.padded_vocab_size}) must be greater than or equal to vocab_size ({self.vocab_size})"
    )
```
There was a problem hiding this comment.
Replace `assert` with an explicit exception for production code.
Using `assert` for validation can be disabled with Python's `-O` flag, bypassing this critical check. Use `raise ValueError` for robust validation.
Suggested fix

```diff
 # Ensure padded_vocab_size is at least as large as vocab_size
 if self.padded_vocab_size is not None and self.vocab_size is not None:
-    assert self.padded_vocab_size >= self.vocab_size, (
-        f"padded_vocab_size ({self.padded_vocab_size}) must be greater than or equal to vocab_size ({self.vocab_size})"
-    )
+    if self.padded_vocab_size < self.vocab_size:
+        raise ValueError(
+            f"padded_vocab_size ({self.padded_vocab_size}) must be >= vocab_size ({self.vocab_size})"
+        )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@bionemo-recipes/recipes/vllm_inference/esm2/modeling_esm_te.py` around lines
140 - 143, The assertion checking padded_vocab_size vs vocab_size should be
converted to an explicit runtime check: replace the assert in the block that
references self.padded_vocab_size and self.vocab_size with an if-condition that
raises a ValueError when padded_vocab_size is less than vocab_size, using the
same descriptive message (e.g., "padded_vocab_size (...) must be greater than or
equal to vocab_size (...)"); update the check in the method or initializer where
these attributes are set (the code block containing self.padded_vocab_size and
self.vocab_size) so the validation cannot be bypassed by Python optimizations.
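A standalone sketch of the suggested check (the function name is made up for the example; the real validation lives in the model/config initializer):

```python
def validate_vocab_sizes(padded_vocab_size, vocab_size):
    """Explicit validation that survives `python -O`, unlike an assert.
    Hypothetical helper illustrating the suggested fix."""
    if padded_vocab_size is not None and vocab_size is not None:
        if padded_vocab_size < vocab_size:
            raise ValueError(
                f"padded_vocab_size ({padded_vocab_size}) must be >= vocab_size ({vocab_size})"
            )

validate_vocab_sizes(64, 33)   # OK: vocab padded up for TE-friendly sizes
validate_vocab_sizes(None, 33)  # OK: nothing to check when either is None
try:
    validate_vocab_sizes(32, 33)
except ValueError as e:
    print(e)  # padded_vocab_size (32) must be >= vocab_size (33)
```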
There was a problem hiding this comment.
@gagank1 Understood, I'll defer this to a future PR. No further action needed here.
```python
from pathlib import Path

sys.path.append(Path(__file__).parent.parent.as_posix())
```
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:

```shell
# First, let's check if the file exists and view its contents
fd "conftest.py" bionemo-recipes/recipes/vllm_inference/esm2/tests/
```

Repository: NVIDIA/bionemo-framework
Length of output: 129

🏁 Script executed:

```shell
# Now let's examine the conftest.py file
cat -n bionemo-recipes/recipes/vllm_inference/esm2/tests/conftest.py
```

Length of output: 994

🏁 Script executed:

```shell
# Check the directory structure to understand the path resolution
tree -L 4 bionemo-recipes/recipes/vllm_inference/esm2/
```

Length of output: 469

🏁 Script executed:

```shell
# Look for pyproject.toml locations in the repository
fd "pyproject.toml" | head -20
```

Length of output: 1061

🏁 Script executed:

```shell
# Let's check what imports are in test_vllm.py that might depend on sys.path
cat -n bionemo-recipes/recipes/vllm_inference/esm2/tests/test_vllm.py | head -50
```

Length of output: 2119

🏁 Script executed:

```shell
# Check what files/modules are in the esm2 directory that might be imported
ls -la bionemo-recipes/recipes/vllm_inference/esm2/*.py
```

Length of output: 492

🏁 Script executed:

```shell
# Look at the full directory structure to understand what could be imported
find bionemo-recipes/recipes/vllm_inference/esm2/ -maxdepth 2 -type f -name "*.py" -o -type d
```

Length of output: 575

🏁 Script executed:

```shell
# Search for other conftest.py files to understand the patterns used in this repo
find . -name "conftest.py" -type f | head -20
```

Length of output: 1228

🏁 Script executed:

```shell
# Let's also check if there's any __init__.py in the esm2 directory or its parents
find bionemo-recipes/recipes/vllm_inference/ -name "__init__.py" -type f
```

Length of output: 50

🏁 Script executed:

```shell
# Let's trace what the proposed fix would actually do
# Starting from conftest location, where would it find pyproject.toml?
find bionemo-recipes/recipes/vllm_inference -name "pyproject.toml" -type f
```

Length of output: 50

🏁 Script executed:

```shell
# Check if there are any imports from parent packages in test files
grep -r "import\|from" bionemo-recipes/recipes/vllm_inference/esm2/tests/ --include="*.py"
```

Length of output: 1570

🏁 Script executed:

```shell
# Let's check a few other conftest files to see how they handle sys.path
head -30 bionemo-recipes/models/esm2/tests/conftest.py
```

Length of output: 1090

🏁 Script executed:

```shell
# Check the parent directory structure more carefully
ls -la bionemo-recipes/recipes/vllm_inference/esm2/
```

Length of output: 940

🏁 Script executed:

```shell
# Let's verify what the proposed fix would actually resolve to
# Starting from conftest, walk up to find pyproject.toml
python3 << 'EOF'
from pathlib import Path
conftest_path = Path("bionemo-recipes/recipes/vllm_inference/esm2/tests/conftest.py")
start = conftest_path.parent.resolve()
print(f"Starting from: {start}")
print(f"Parent: {start.parent}")
print(f"Parent.parent: {start.parent.parent}")
# Walk up looking for pyproject.toml
for parent in [start, *start.parents]:
    if (parent / "pyproject.toml").exists():
        print(f"Found pyproject.toml at: {parent}")
        break
# What current code does
current_append = start.parent.parent
print(f"\nCurrent sys.path.append would add: {current_append}")
print(f"Files in that directory: {list(current_append.glob('*.py'))}")
EOF
```

Length of output: 461
Use `sys.path.insert(0, ...)` instead of `append()`; installed packages should not take priority over local sources.
Using `append()` places the path at the end of `sys.path`, allowing installed packages to be found first. Since `from export import export_hf_checkpoint` in test_vllm.py expects the esm2 directory on the path, change this to `insert(0)` to prioritize local sources during tests.
🔧 Proposed fix

```diff
-sys.path.append(Path(__file__).parent.parent.as_posix())
+sys.path.insert(0, Path(__file__).parent.parent.as_posix())
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
sys.path.insert(0, Path(__file__).parent.parent.as_posix())
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@bionemo-recipes/recipes/vllm_inference/esm2/tests/conftest.py` at line 20,
Replace the sys.path.append call in conftest.py with sys.path.insert(0,
Path(__file__).parent.parent.as_posix()) so the local esm2 package is
prioritized during tests; this ensures the import in test_vllm.py (from export
import export_hf_checkpoint) resolves to the local source rather than an
installed package and you can locate the call by searching for sys.path.append
and Path(__file__).parent.parent.as_posix() in conftest.py.
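The ordering effect can be demonstrated in isolation (the module name `shadow_demo_mod` is invented for the demo):

```python
import sys
import tempfile
from pathlib import Path

# Two directories each providing a module named `shadow_demo_mod`;
# whichever directory appears earlier in sys.path wins the import.
first = Path(tempfile.mkdtemp())
second = Path(tempfile.mkdtemp())
(first / "shadow_demo_mod.py").write_text("ORIGIN = 'first'\n")
(second / "shadow_demo_mod.py").write_text("ORIGIN = 'second'\n")

sys.path.append(str(second))    # appended path: searched last
sys.path.insert(0, str(first))  # inserted path: searched first

import shadow_demo_mod
assert shadow_demo_mod.ORIGIN == "first"  # local copy shadows the other
```

With `append()` instead of `insert(0)`, an identically named installed package anywhere earlier on `sys.path` would be imported instead of the local source.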
There was a problem hiding this comment.
out of scope, follows other conftests
There was a problem hiding this comment.
@gagank1, understood — I'll defer to the existing project conventions for conftest files. I've noted this for future reviews.
(ᵔᴥᵔ)
✏️ Learnings added
Learnt from: gagank1
Repo: NVIDIA/bionemo-framework PR: 1473
File: bionemo-recipes/recipes/vllm_inference/esm2/tests/conftest.py:20-20
Timestamp: 2026-03-16T15:51:29.934Z
Learning: In bionemo-recipes conftest.py files (e.g., bionemo-recipes/recipes/vllm_inference/esm2/tests/conftest.py), using `sys.path.append(Path(__file__).parent.parent.as_posix())` is the accepted project pattern for adding parent directories to the Python path. Do not flag or suggest replacing it with `sys.path.insert(0, ...)`, as this follows the existing style of other conftest files in the repository.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: gagank1
Repo: NVIDIA/bionemo-framework PR: 1473
File: bionemo-recipes/recipes/vllm_inference/esm2/tests/conftest.py:1-20
Timestamp: 2026-03-16T15:50:46.694Z
Learning: Do not require or flag a top-level module docstring in conftest.py files across the repository (e.g., bionemo-recipes/recipes/.../tests/conftest.py). This follows the project's existing style for conftest files. If CI enforces docstrings elsewhere, exceptions for conftest should remain isolated to this pattern.
Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>
/ok to test dcaa6c7
Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>
Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>
Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>
/ok to test 13046cb
```diff
-assert fp8_log_dir.exists()
-assert (fp8_log_dir / "rank_0" / "nvdlfw_inspect_logs" / "nvdlfw_inspect_globalrank-0.log").exists()
-assert (fp8_log_dir / "rank_0" / "nvdlfw_inspect_statistics_logs" / "nvdlfw_inspect_globalrank-0.log").exists()
+assert fp8_log_dir.exists(), "FP8 log directory was not created"
+assert (fp8_log_dir / "rank_0").exists(), "rank_0 directory was not created"
+assert (fp8_log_dir / "rank_0" / "nvdlfw_inspect_logs").exists(), "nvdlfw_inspect_logs directory was not created"
+assert (fp8_log_dir / "rank_0" / "nvdlfw_inspect_statistics_logs").exists(), (
+    "nvdlfw_inspect_statistics_logs directory was not created"
+)
+
+metadata_log = fp8_log_dir / "rank_0" / "nvdlfw_inspect_logs" / "nvdlfw_inspect_globalrank-0.log"
+stats_log = fp8_log_dir / "rank_0" / "nvdlfw_inspect_statistics_logs" / "nvdlfw_inspect_globalrank-0.log"
+
+assert metadata_log.exists(), "Metadata log file was not created"
+assert stats_log.exists(), "Statistics log file was not created"
+assert metadata_log.stat().st_size > 0, "Metadata log file is empty"
+assert stats_log.stat().st_size > 0, "Statistics log file is empty"
```
There was a problem hiding this comment.
Any reason you're changing these?
There was a problem hiding this comment.
strengthening the test - cursor suggested because it was only checking the existence of the files instead of checking that they're non-empty like above in test_sanity_ddp_fp8_stats_logging
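The existence-plus-size pattern discussed here can be captured in a small helper (a generic illustration, not the recipe's actual test code):

```python
import tempfile
from pathlib import Path

def check_log(path: Path):
    """Existence alone misses truncated or empty logs; also check size.
    Hypothetical helper for illustration."""
    assert path.exists(), f"{path.name} was not created"
    assert path.stat().st_size > 0, f"{path.name} is empty"

log_dir = Path(tempfile.mkdtemp())
good = log_dir / "stats.log"
good.write_text("step=1 amax=0.5\n")
check_log(good)  # passes: exists and non-empty

empty = log_dir / "empty.log"
empty.touch()
try:
    check_log(empty)
except AssertionError as e:
    print(e)  # empty.log is empty
```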
There was a problem hiding this comment.
Why do we need all these files? You should just need the modeling_esm file right?
There was a problem hiding this comment.
Is LICENSE not necessary? I saw it in some of the other recipes. The other files are needed since the tests for this recipe export/convert the original weights.
Description
This PR makes the ESM2 model compatible with vLLM. Primary issues were a naming incompatibility (vLLM expects a `model.` prefix and ESM2 uses `esm.`) and that `NVEsmModel` defaults to `add_pooling_layer=True` when loading the checkpoint even though it's exported without pooler weights.
Usage
Run `python test_esm2_golden_values.py` from inside the container; instructions to build and run it are provided.
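The embedding pooling this PR exposes (take the last non-padded token's hidden state and L2-normalize it) can be sketched in plain Python; this is a simplified stand-in for the model's tensor code, and the exact pooling details may differ:

```python
import math

def pool_last_token(hidden_states, attention_mask):
    """Take the hidden state of the last non-padded token and
    L2-normalize it. Plain-Python sketch of the pooling idea."""
    last = max(i for i, m in enumerate(attention_mask) if m == 1)
    vec = hidden_states[last]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

# One sequence of 4 positions with 2-dim hidden states; final position is padding.
hidden = [[1.0, 0.0], [0.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]
emb = pool_last_token(hidden, mask)
assert emb == [0.6, 0.8]                              # [3, 4] / ||[3, 4]||
assert abs(sum(x * x for x in emb) - 1.0) < 1e-9      # unit L2 norm
```

In the real recipe this step runs inside the vLLM pooling path, so clients simply receive already-normalized embeddings.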
Triggering Code Rabbit AI Review
To trigger a code review from code rabbit, comment on a pull request with one of these commands:
See https://docs.coderabbit.ai/reference/review-commands for a full list of commands.
Pre-submit Checklist
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Chores