Recommend Inferentia2/Trainium for inference workloads #13

@maksimov

When an NVIDIA-backed G-class instance is detected running inference at low
utilization, recommend AWS Inferentia2 (inf2) or Trainium (trn1) as
alternatives. For supported models, these can offer up to 3x better
price-performance.
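The trigger condition could be sketched roughly as follows. This is only an illustration: the `InstanceSample` type, the `is_migration_candidate` helper, and the 30% utilization threshold are assumptions made for this example, since the issue does not define "low utilization" or how inference is identified.

```python
import re
from dataclasses import dataclass

# Assumption: "G-class" means EC2 families named g<generation><suffix>,
# such as g4dn, g5, or g6e. This pattern is a simplification.
G_FAMILY = re.compile(r"^g\d+[a-z]*$")

# g4ad instances use AMD GPUs, so they are excluded from the NVIDIA check.
NON_NVIDIA_G_FAMILIES = {"g4ad"}

# Hypothetical threshold; tune it against real utilization data.
LOW_UTILIZATION_PCT = 30.0


@dataclass
class InstanceSample:
    instance_type: str              # e.g. "g5.xlarge"
    avg_gpu_utilization_pct: float  # averaged over the observation window
    runs_inference: bool            # result of a separate workload classifier


def is_nvidia_g_class(instance_type: str) -> bool:
    """Return True if the instance family looks like an NVIDIA G-class family."""
    family = instance_type.split(".", 1)[0].lower()
    return bool(G_FAMILY.match(family)) and family not in NON_NVIDIA_G_FAMILIES


def is_migration_candidate(sample: InstanceSample) -> bool:
    """Return True if the sample should be evaluated for an inf2/trn1 recommendation."""
    return (
        is_nvidia_g_class(sample.instance_type)
        and sample.runs_inference
        and sample.avg_gpu_utilization_pct < LOW_UTILIZATION_PCT
    )
```

Keeping the family check separate from the utilization check makes it easier to adjust either one independently once real thresholds are agreed.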

  • Detect inference patterns (steady invocation rate, low batch variability)
  • Map model architectures to their inf2/trn1 support status
  • Show a concrete price-performance comparison against the current NVIDIA instance
  • Caveat: not all models are compatible, so flag this clearly in recommendations
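The tasks above might fit together roughly like this. It is a minimal sketch under stated assumptions: the coefficient-of-variation thresholds, the `COMPATIBLE_ARCHITECTURES` allowlist, and the `Option` and `recommend` names are all hypothetical. Prices and throughput must come from real pricing data and benchmarks, and the allowlist entries are placeholders, not an authoritative statement of what inf2/trn1 support.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

# Hypothetical thresholds for "steady invocation rate" and "low batch variability".
MAX_RATE_CV = 0.25
MAX_BATCH_CV = 0.25

# Placeholder allowlist for illustration only; verify against current
# AWS Neuron documentation before relying on any entry.
COMPATIBLE_ARCHITECTURES = {"bert", "llama"}


def coefficient_of_variation(values):
    """Standard deviation divided by mean; infinite for empty or zero-mean input."""
    if not values:
        return float("inf")
    m = mean(values)
    return pstdev(values) / m if m > 0 else float("inf")


def looks_like_steady_inference(invocations_per_min, batch_sizes):
    """Heuristic: both the invocation rate and the batch size vary little."""
    return (
        coefficient_of_variation(invocations_per_min) <= MAX_RATE_CV
        and coefficient_of_variation(batch_sizes) <= MAX_BATCH_CV
    )


@dataclass
class Option:
    instance_type: str
    hourly_price_usd: float    # from pricing data, not hard-coded here
    throughput_per_sec: float  # measured inferences per second for the model

    def cost_per_million(self) -> float:
        """Cost in USD of one million inferences at the measured throughput."""
        return self.hourly_price_usd / (self.throughput_per_sec * 3600) * 1_000_000


def recommend(architecture: str, current: Option, candidate: Option) -> dict:
    """Compare cost per million inferences and flag unconfirmed compatibility."""
    if architecture.lower() not in COMPATIBLE_ARCHITECTURES:
        return {
            "recommend": False,
            "ratio": None,
            "reason": f"compatibility of '{architecture}' with "
                      f"{candidate.instance_type} is not confirmed",
        }
    ratio = current.cost_per_million() / candidate.cost_per_million()
    return {
        "recommend": ratio > 1.0,
        "ratio": ratio,
        "reason": f"{candidate.instance_type} price-performance is "
                  f"{ratio:.2f}x that of {current.instance_type}",
    }
```

Returning an explicit reason even when no recommendation is made keeps the compatibility caveat visible to the user instead of silently dropping the instance from the report.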

Motivation: migrating inference workloads from NVIDIA GPUs to AWS silicon is a
major trend in 2026.

Labels: enhancement, v0.2
