
Fix GRU w8a32 operator #17226

Open

mgiordy wants to merge 2 commits into pytorch:main from mgiordy:export-D90437262

Conversation


mgiordy (Contributor) commented Feb 4, 2026

Summary:

Context

This diff fixes the reference implementation of the w8a32 GRU operator and enhances the operator's pattern matching.

Mitigation

The reference implementation now has the correct output dimensions, and pattern matching now uses a safer check on the operator parameters.

Reviewed By: hsharma35

Differential Revision: D90437262
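
The "safer check" here boils down to tolerating pattern arguments that carry no tensor metadata. A minimal sketch of the idea, with a hypothetical helper name, argument position, and shape check rather than the actual Cadence pattern code:

import torch

def matches_w8a32_gru(node: torch.fx.Node) -> bool:
    # Hypothetical weight position; the real pattern inspects its own args.
    weight_node = node.args[1]
    # .get() returns None instead of raising KeyError when the node carries no
    # FakeTensor metadata, so the matcher can simply reject the candidate.
    weight_val = weight_node.meta.get("val", None)
    if weight_val is None:
        return False
    # Illustrative check only: GRU packs its three gates into 3 * hidden_size rows.
    return weight_val.dim() == 2 and weight_val.shape[0] % 3 == 0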

Marco Giordano added 2 commits February 4, 2026 15:19
Summary:

This diff fixes the Conv1d w8a32 operator by propagating shape metadata to the transposed input: the `permute` operation is applied to the `original_val` tensor (taken from `other_inputs[0].meta["val"]`) under the `fake_mode` context, and the resulting `transposed_val` is assigned to `transposed_inputs.meta["val"]`.

Reviewed By: mcremon-meta

Differential Revision: D89863750
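
A rough sketch of the metadata propagation this commit describes, assuming `other_inputs[0].meta["val"]` holds a FakeTensor; `propagate_transposed_val` is a hypothetical helper, and the permutation dims are illustrative for an NCL Conv1d input:

import torch
from torch._subclasses.fake_tensor import FakeTensor

def propagate_transposed_val(source_node: torch.fx.Node, transposed_node: torch.fx.Node) -> None:
    original_val = source_node.meta.get("val", None)
    if not isinstance(original_val, FakeTensor):
        return
    # Run permute under the owning FakeTensorMode so the result is itself a
    # FakeTensor whose shape matches the transposed tensor the fused op sees.
    with original_val.fake_mode:
        transposed_node.meta["val"] = original_val.permute(0, 2, 1)
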
Summary:
# Context
This diff fixes the reference implementation of the w8a32 GRU operator and enhances the operator's pattern matching.

# Mitigation
The reference implementation now has the correct output dimensions, and pattern matching now uses a safer check on the operator parameters.

Reviewed By: hsharma35

Differential Revision: D90437262
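
For context, a float-only sketch of the shape contract the fixed reference follows for a single step with 2D inputs; quantization and the exact Cadence operator signature are omitted:

import torch

def gru_step_reference(inputs, hidden, w_ih, w_hh, b_ih, b_hh):
    # inputs: (batch, input_size), hidden: (batch, hidden_size)
    # w_ih: (3 * hidden_size, input_size), w_hh: (3 * hidden_size, hidden_size)
    gi = inputs @ w_ih.T + b_ih
    gh = hidden @ w_hh.T + b_hh
    i_r, i_z, i_n = gi.chunk(3, dim=-1)
    h_r, h_z, h_n = gh.chunk(3, dim=-1)
    r = torch.sigmoid(i_r + h_r)
    z = torch.sigmoid(i_z + h_z)
    n = torch.tanh(i_n + r * h_n)
    new_hidden = (1.0 - z) * n + z * hidden
    output = new_hidden  # for a single GRU step the output equals the new hidden state
    # Shape (2, batch, hidden_size): output and new hidden state stacked, not flattened.
    return torch.stack((output, new_hidden), dim=0)
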
Copilot AI review requested due to automatic review settings February 4, 2026 23:19

pytorch-bot bot commented Feb 4, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17226

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 Awaiting Approval, 4 New Failures

As of commit 6f6ff6a with merge base 267a59d:

AWAITING APPROVAL - The following workflows need approval before CI can run:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla bot added the CLA Signed label Feb 4, 2026

meta-codesync bot commented Feb 4, 2026

@mgiordy has exported this pull request. If you are a Meta employee, you can view the originating Diff in D90437262.


github-actions bot commented Feb 4, 2026

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Copilot AI left a comment

Pull request overview

This pull request fixes the GRU w8a32 operator by correcting the output shape in both the reference implementation and meta kernel, and enhancing pattern matching with safer parameter checks.

Changes:

  • Fixed GRU w8a32 operator output shape from (2, hidden_dim) to (2, batch, input_dim, hidden_dim) to properly reflect the expected dimensions
  • Enhanced pattern matching safety by using .get() method instead of direct dictionary access for tensor metadata
  • Added SharedQuantizationSpec for GRU biases to ensure consistent quantization scales (see the sketch after this list)
  • Added metadata propagation for transposed tensors in fusion pass
  • Added input shape validation for conv operator
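
A minimal sketch of the shared-bias annotation using the torch.ao quantizer types; the argument order, spec parameters, and anchor edge are assumptions, not the actual Cadence annotation code:

import torch
from torch.ao.quantization.observer import MinMaxObserver
from torch.ao.quantization.quantizer import (
    QuantizationAnnotation,
    QuantizationSpec,
    SharedQuantizationSpec,
)

def annotate_gru_biases(gru_node: torch.fx.Node) -> None:
    # Hypothetical argument order: (inputs, hidden, w_ih, w_hh, b_ih, b_hh).
    inputs, hidden, w_ih, w_hh, b_ih, b_hh = gru_node.args
    bias_spec = QuantizationSpec(
        dtype=torch.int8,
        quant_min=-128,
        quant_max=127,
        qscheme=torch.per_tensor_affine,
        observer_or_fake_quant_ctr=MinMaxObserver,
    )
    # The second bias shares the quantization parameters observed on the first
    # bias edge, so both biases end up with a consistent scale.
    shared_bias_spec = SharedQuantizationSpec((b_ih, gru_node))
    gru_node.meta["quantization_annotation"] = QuantizationAnnotation(
        input_qspec_map={b_ih: bias_spec, b_hh: shared_bias_spec},
        _annotated=True,
    )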

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Summary per file:

  • backends/cadence/aot/tests/test_ref_implementations.py: Updated test expectations to match corrected output shape
  • backends/cadence/aot/ref_implementations.py: Fixed output shape calculation by expanding hidden state instead of flattening
  • backends/cadence/aot/quantizer/patterns.py: Added safer metadata access, input validation, and shared bias quantization spec
  • backends/cadence/aot/quantizer/fusion_pass.py: Added val metadata propagation for transposed inputs and weights
  • backends/cadence/aot/ops_registrations.py: Updated meta kernel to return correct output shape with improved documentation


Comment on lines +2857 to +2865
seq_len = inputs.shape[1]
assert seq_len == 1
# inputs comes in shape [batch, seq_len, input_size]
# hidden comes in shape [batch, seq_len, hidden_size]
# weights_inputs comes in shape [3 * hidden_size, input_size]
# weights_hidden comes in shape [3 * hidden_size, hidden_size]
# output comes in empty with shape [2, batch, seq_len, hidden_size]
# The first dimension stacks the output and the new hidden state
return hidden.new_empty((2, inputs.shape[0], inputs.shape[1], hidden.shape[-1]), dtype=torch.float32)

Copilot AI Feb 4, 2026


The assertion assert seq_len == 1 will fail with the 2D inputs used in the test cases. Looking at test_ref_implementations.py, the tests use 2D inputs like torch.tensor([[1.0, 2.0]], dtype=torch.float32) with shape [1, 2], where inputs.shape[1] is 2, not 1. Either the assertion logic needs to check if inputs is 3D before asserting seq_len == 1, or the comments describing the expected shapes are incorrect and should indicate 2D inputs [batch, input_size] instead of 3D [batch, seq_len, input_size].

Suggested change
seq_len = inputs.shape[1]
assert seq_len == 1
# inputs comes in shape [batch, seq_len, input_size]
# hidden comes in shape [batch, seq_len, hidden_size]
# weights_inputs comes in shape [3 * hidden_size, input_size]
# weights_hidden comes in shape [3 * hidden_size, hidden_size]
# output comes in empty with shape [2, batch, seq_len, hidden_size]
# The first dimension stacks the output and the new hidden state
return hidden.new_empty((2, inputs.shape[0], inputs.shape[1], hidden.shape[-1]), dtype=torch.float32)
# inputs may come in shape [batch, seq_len, input_size] or [batch, input_size] (for seq_len == 1)
if inputs.dim() == 3:
    batch = inputs.shape[0]
    seq_len = inputs.shape[1]
    assert seq_len == 1
elif inputs.dim() == 2:
    batch = inputs.shape[0]
    seq_len = 1
else:
    raise AssertionError(f"Unsupported inputs.dim() for quantized_w8a32_gru_meta: {inputs.dim()}")
# hidden is expected to have hidden_size in its last dimension
hidden_size = hidden.shape[-1]
# weights_inputs comes in shape [3 * hidden_size, input_size]
# weights_hidden comes in shape [3 * hidden_size, hidden_size]
# output comes in empty with shape [2, batch, seq_len, hidden_size]
# The first dimension stacks the output and the new hidden state
return hidden.new_empty((2, batch, seq_len, hidden_size), dtype=torch.float32)

Comment on lines +2857 to +2865
seq_len = inputs.shape[1]
assert seq_len == 1
# inputs comes in shape [batch, seq_len, input_size]
# hidden comes in shape [batch, seq_len, hidden_size]
# weights_inputs comes in shape [3 * hidden_size, input_size]
# weights_hidden comes in shape [3 * hidden_size, hidden_size]
# output comes in empty with shape [2, batch, seq_len, hidden_size]
# The first dimension stacks the output and the new hidden state
return hidden.new_empty((2, inputs.shape[0], inputs.shape[1], hidden.shape[-1]), dtype=torch.float32)

Copilot AI Feb 4, 2026


The comments describe inputs as 3D [batch, seq_len, input_size] and hidden as 3D [batch, seq_len, hidden_size], but the test cases in test_ref_implementations.py use 2D tensors [batch, input_size] and [batch, hidden_size]. The comments should be updated to reflect the actual expected shapes, or the implementation should be modified to match the documented shapes.

Suggested change
seq_len = inputs.shape[1]
assert seq_len == 1
# inputs comes in shape [batch, seq_len, input_size]
# hidden comes in shape [batch, seq_len, hidden_size]
# weights_inputs comes in shape [3 * hidden_size, input_size]
# weights_hidden comes in shape [3 * hidden_size, hidden_size]
# output comes in empty with shape [2, batch, seq_len, hidden_size]
# The first dimension stacks the output and the new hidden state
return hidden.new_empty((2, inputs.shape[0], inputs.shape[1], hidden.shape[-1]), dtype=torch.float32)
# inputs comes in shape [batch, input_size]
# hidden comes in shape [batch, hidden_size]
# weights_inputs comes in shape [3 * hidden_size, input_size]
# weights_hidden comes in shape [3 * hidden_size, hidden_size]
# output comes in empty with shape [2, batch, hidden_size]
# The first dimension stacks the output and the new hidden state
assert len(inputs.shape) == 2
assert len(hidden.shape) == 2
assert inputs.shape[0] == hidden.shape[0]
return hidden.new_empty((2, inputs.shape[0], hidden.shape[-1]), dtype=torch.float32)


Labels

CLA Signed (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed), fb-exported, meta-exported
