Metal backend: Add Metal int4 quantization support to Parakeet #17235

Merged
manuelcandales merged 57 commits into main from gh/manuelcandales/163/head
Feb 5, 2026

@manuelcandales manuelcandales commented Feb 5, 2026

This PR adds support for 4-bit weight quantization on the Metal backend for the Parakeet TDT model.

Parakeet Export Script (export_parakeet_tdt.py, quantize.py)

  • Added fpa4w (floating point activation, 4-bit weight) quantization option for encoder and decoder linear layers
  • Implemented Metal-specific quantization path using torchao's MPS API (UIntxWeightOnlyConfig)
  • Added validation to ensure fpa4w is only used with Metal backend
  • Filtered out incompatible layers (weight dimensions not divisible by 8) during quantization
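
The validation and filtering described in the bullets above could look roughly like the following. This is a minimal sketch, not the code from the PR: `LinearSpec`, `validate_quant_config`, `is_fpa4w_compatible`, and `select_layers` are hypothetical names, and the assumption that both weight dimensions must be divisible by 8 is a guess, since the description does not say which dimension is checked.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical names throughout; the real export script may be organized differently.
METAL_ONLY_CONFIGS = {"fpa4w"}


@dataclass
class LinearSpec:
    """Stand-in for a linear layer: a name plus its weight shape."""

    name: str
    out_features: int
    in_features: int


def validate_quant_config(qlinear: str | None, backend: str) -> None:
    """Reject Metal-only quantization configs on any other backend."""
    if qlinear in METAL_ONLY_CONFIGS and backend != "metal":
        raise ValueError(
            f"{qlinear} quantization requires the Metal backend, got {backend!r}"
        )


def is_fpa4w_compatible(layer: LinearSpec, multiple: int = 8) -> bool:
    """Assumption: both weight dimensions must be divisible by `multiple`."""
    return (
        layer.out_features % multiple == 0
        and layer.in_features % multiple == 0
    )


def select_layers(layers: list[LinearSpec]) -> list[str]:
    """Return the names of layers that would be quantized; others stay in float."""
    return [layer.name for layer in layers if is_fpa4w_compatible(layer)]
```

In this sketch, calling `validate_quant_config("fpa4w", "cuda")` raises `ValueError` before any export work starts, while a layer with an incompatible shape is skipped and left unquantized.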

Documentation (README.md)

  • Added fpa4w to quantization config table with Metal backend designation
  • Added example showing Metal 4-bit quantization usage
  • Reorganized examples to separate CUDA and Metal quantization workflows

CI Integration (export_model_artifact.sh, metal.yml)

  • Added quantized-int4-metal option to export script with proper backend validation
  • Updated Metal CI workflow to test int4 quantization specifically with parakeet-tdt model

Dependencies

  • Bumped torchao pin for latest Metal quantization support

Base automatically changed from gh/manuelcandales/159/head to main February 5, 2026 18:12
Copilot AI review requested due to automatic review settings February 5, 2026 18:35

github-actions bot commented Feb 5, 2026

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.


Copilot AI left a comment

Pull request overview

This PR implements 4-bit weight quantization support for the Parakeet TDT model on the Metal backend using torchao's MPS API. The changes enable Metal-specific quantization while maintaining existing CUDA quantization workflows.

Changes:

  • Added fpa4w (floating point activation, 4-bit weight) quantization option for Metal backend
  • Implemented validation to ensure Metal-specific quantization is only used with Metal backend
  • Updated CI workflows to test Metal int4 quantization with parakeet-tdt model
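
To make "4-bit weight" concrete, here is a small illustration of group-wise weight quantization in plain Python. It uses a generic min/max asymmetric scheme chosen for clarity; it is an assumption for illustration only and is not claimed to match how torchao picks quantization parameters. The function names are hypothetical.

```python
def quantize_group(weights):
    """Map one group of floats to 4-bit codes (0..15) sharing a scale and minimum."""
    lo, hi = min(weights), max(weights)
    # Fall back to 1.0 for a constant group so we never divide by zero.
    scale = (hi - lo) / 15 or 1.0
    codes = [min(15, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Recover approximate float weights from the 4-bit codes."""
    return [lo + c * scale for c in codes]


def quantize_row(row, group_size):
    """Quantize one weight row group by group; activations stay in floating point."""
    if len(row) % group_size != 0:
        raise ValueError("row length must be a multiple of group_size")
    return [
        quantize_group(row[i : i + group_size])
        for i in range(0, len(row), group_size)
    ]
```

Each group stores its 4-bit codes plus one scale and one minimum, so a smaller group size tracks the weights more closely at the cost of more per-group metadata.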

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| third-party/ao | Updated torchao submodule to version with Metal quantization support |
| examples/models/parakeet/quantize.py | Added Metal int4 quantization implementation using UIntxWeightOnlyConfig |
| examples/models/parakeet/export_parakeet_tdt.py | Added fpa4w option and validation for Metal backend requirement |
| examples/models/parakeet/README.md | Updated documentation with fpa4w config and Metal quantization example |
| .github/workflows/metal.yml | Added int4 quantization testing for parakeet-tdt model |
| .ci/scripts/export_model_artifact.sh | Added quantized-int4-metal option with backend validation |

@mergennachin

@manuelcandales In the README.md, do you want to add running "EXECUTORCH_BUILD_KERNELS_TORCHAO=1 TORCHAO_BUILD_EXPERIMENTAL_MPS=1 ./install_executorch.sh" as a prerequisite step for int4 Metal quantization?

@manuelcandales

> @manuelcandales In the README.md, do you want to add running "EXECUTORCH_BUILD_KERNELS_TORCHAO=1 TORCHAO_BUILD_EXPERIMENTAL_MPS=1 ./install_executorch.sh" as a prerequisite step for int4 Metal quantization?

yeah, that's true

Copilot AI review requested due to automatic review settings February 5, 2026 19:45
@manuelcandales manuelcandales changed the base branch from main to gh/manuelcandales/166/head February 5, 2026 19:45

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.


@manuelcandales manuelcandales had a problem deploying to upload-benchmark-results February 5, 2026 20:05 — with GitHub Actions Failure
Base automatically changed from gh/manuelcandales/166/head to main February 5, 2026 20:28

config = UIntxWeightOnlyConfig(
    group_size=qlinear_group_size,
    bitwidth=4,

@mergennachin mergennachin Feb 5, 2026

Update the pin past pytorch/ao#3829, and set

uintx_choose_qparams_algorithm="hqq"

Could be done in a follow-up PR too

@manuelcandales

yes, that's my plan, to do in follow-up PR


here #17258
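
The follow-up suggested in this thread amounts to passing one more keyword argument to the config. A minimal sketch of that, kept free of torchao imports so it runs anywhere: `build_fpa4w_config_kwargs` is a hypothetical helper, the keyword names are the ones quoted in the thread, and the group size of 32 in the usage comment is only an example value.

```python
def build_fpa4w_config_kwargs(group_size, use_hqq=False):
    """Collect keyword arguments for UIntxWeightOnlyConfig (names quoted in the thread)."""
    kwargs = {"group_size": group_size, "bitwidth": 4}
    if use_hqq:
        # Per the review comment, this needs a torchao pin past pytorch/ao#3829.
        kwargs["uintx_choose_qparams_algorithm"] = "hqq"
    return kwargs


# Intended use (the import path for UIntxWeightOnlyConfig is not shown in the thread):
#   config = UIntxWeightOnlyConfig(**build_fpa4w_config_kwargs(32, use_hqq=True))
```

Keeping the HQQ option behind a flag mirrors the decision above to land it separately, once the torchao pin has been updated.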

@manuelcandales manuelcandales temporarily deployed to upload-benchmark-results February 5, 2026 21:50 — with GitHub Actions Inactive
@manuelcandales manuelcandales merged commit a8a5f6d into main Feb 5, 2026
332 of 340 checks passed
@manuelcandales manuelcandales deleted the gh/manuelcandales/163/head branch February 5, 2026 22:31

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.


2 participants