Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
670 changes: 670 additions & 0 deletions backends/apple/coreml/test/test_image_processing.py

Large diffs are not rendered by default.

257 changes: 257 additions & 0 deletions extension/vision/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,257 @@
# ExecuTorch Vision Extension

Image preprocessing and postprocessing utilities for on-device inference with ExecuTorch.

## Overview

This module provides `ImagePreprocessor` and `ImagePostprocessor` classes that generate `ExportedProgram` objects ready to be lowered to any ExecuTorch backend (CoreML, MPS, XNNPACK, etc.).

### Key Features

- **SDR Support**: 8-bit sRGB content (photos, standard video)
- **HDR Support**: 10/12-bit PQ (HDR10, Dolby Vision) and HLG (broadcast HDR)
- **Color Gamut Conversion**: BT.2020 ↔ BT.709 wide color gamut support
- **Standard Normalizations**: ImageNet, [0,1], [-1,1] scaling
- **Color Layout**: RGB ↔ BGR conversion, grayscale
- **Precision Control**: fp16/fp32 output dtype selection

## Quick Start

```python
from executorch.extension.vision import ImagePreprocessor, ImagePostprocessor

# Simple [0, 255] → [0, 1] preprocessing
preprocessor = ImagePreprocessor.from_scale_0_1(
shape=(1, 3, 480, 640),
input_dtype=torch.float16,
)

# ImageNet normalization
preprocessor = ImagePreprocessor.from_imagenet(
shape=(1, 3, 224, 224),
input_dtype=torch.float16,
)

# HDR10 → linear BT.709 (for HDR video processing)
preprocessor = ImagePreprocessor.from_hdr10(
shape=(1, 3, 1080, 1920),
input_dtype=torch.float32, # fp32 recommended for PQ precision
output_dtype=torch.float16,
)

# Lower to your backend
from executorch.exir import to_edge_transform_and_lower
program = to_edge_transform_and_lower(preprocessor, partitioner=[YourPartitioner()])
```

## ImagePreprocessor

Converts input images to normalized tensors for model inference.

### Factory Methods

| Method | Input | Output | Use Case |
|--------|-------|--------|----------|
| `from_scale_0_1()` | [0, 255] | [0, 1] | Simple normalization |
| `from_scale_neg1_1()` | [0, 255] | [-1, 1] | Zero-centered models |
| `from_imagenet()` | [0, 255] | ImageNet normalized | Classification models |
| `from_sdr()` | 8-bit sRGB | Linear or normalized | SDR with gamma correction |
| `from_hdr10()` | 10-bit PQ BT.2020 | Linear BT.709 | HDR10/Dolby Vision |
| `from_hlg()` | 10-bit HLG BT.2020 | Linear BT.709 | Broadcast HDR |

### Processing Pipeline

```
Input Image
┌─────────────────────────┐
│ 1. Color Layout │ BGR → RGB (if needed)
├─────────────────────────┤
│ 2. Normalize to [0,1] │ Divide by max_value (255, 1023, 4095)
├─────────────────────────┤
│ 3. Inverse Transfer │ PQ/HLG/sRGB → Linear
├─────────────────────────┤
│ 4. Gamut Conversion │ BT.2020 → BT.709 (if needed)
├─────────────────────────┤
│ 5. Output Transfer │ Linear → sRGB (if needed)
├─────────────────────────┤
│ 6. Bias & Scale │ (x + bias) * scale
├─────────────────────────┤
│ 7. Output Layout │ RGB → Grayscale/BGR (if needed)
└─────────────────────────┘
Normalized Tensor
```

## ImagePostprocessor

Converts model output tensors back to displayable images. Performs the inverse operations of `ImagePreprocessor`.

### Factory Methods

| Method | Input | Output | Use Case |
|--------|-------|--------|----------|
| `from_scale_0_1()` | [0, 1] | [0, 255] | Simple denormalization |
| `from_scale_neg1_1()` | [-1, 1] | [0, 255] | Zero-centered models |
| `from_imagenet()` | ImageNet normalized | [0, 255] | Classification models |
| `from_linear_to_srgb()` | Linear | 8-bit sRGB | SDR output |
| `from_linear_to_hdr10()` | Linear BT.709 | 10-bit PQ BT.2020 | HDR10 output |
| `from_linear_to_hlg()` | Linear BT.709 | 10-bit HLG BT.2020 | Broadcast HDR output |

## HDR Processing

### Precision Considerations

⚠️ **Important**: PQ (HDR10/Dolby Vision) transfer functions require **fp32 compute precision** for accurate results. The PQ EOTF contains an exponent of m2=78.84, which causes significant precision loss when computed in fp16.

| Transfer Function | Recommended Precision | Notes |
|-------------------|----------------------|-------|
| **PQ (HDR10)** | fp32 input | m2=78.84 exponent loses precision in fp16 |
| **HLG** | fp16 OK | Piecewise sqrt/log is fp16-friendly |
| **sRGB** | fp16 OK | Simple x^2.2 power function |

### HDR10 (PQ / Dolby Vision)

```python
# Preprocessing: HDR10 input → linear for model
# ⚠️ Use fp32 input_dtype for accurate PQ decoding
preprocessor = ImagePreprocessor.from_hdr10(
shape=(1, 3, 1080, 1920),
input_dtype=torch.float32, # IMPORTANT: fp32 required for PQ precision
output_dtype=torch.float16, # Output can be fp16 for model inference
)

# Postprocessing: model output → HDR10 display
# ⚠️ Use fp32 output_dtype for accurate PQ encoding
postprocessor = ImagePostprocessor.from_linear_to_hdr10(
shape=(1, 3, 1080, 1920),
input_dtype=torch.float16,
output_dtype=torch.float32, # IMPORTANT: fp32 required for PQ precision
)
```

### HLG (Broadcast HDR)

HLG uses a piecewise sqrt/log function that is more numerically stable in fp16.

```python
# Preprocessing: HLG input → linear for model
preprocessor = ImagePreprocessor.from_hlg(
shape=(1, 3, 1080, 1920),
input_dtype=torch.float16, # fp16 OK for HLG
output_dtype=torch.float16,
)

# Postprocessing: model output → HLG display
postprocessor = ImagePostprocessor.from_linear_to_hlg(
shape=(1, 3, 1080, 1920),
input_dtype=torch.float16,
output_dtype=torch.float16, # fp16 OK for HLG
)
```

## Platform Integration Notes

The following operations should be done **outside** the model using platform-native APIs for best performance:

| Operation | iOS | Android |
|-----------|-----|---------|
| uint8 ↔ float | vDSP (~1ms for 512×512) | RenderScript |
| YUV → RGB | vImage | YuvImage |
| Resize/Crop | vImage, Metal | Bitmap APIs |

The preprocessor/postprocessor handles the **numerically intensive** operations (transfer functions, gamut conversion, normalization) that benefit from acceleration.

## Transfer Functions

### Supported Functions

| Function | Standard | Use Case |
|----------|----------|----------|
| **sRGB** | IEC 61966-2-1 | Standard displays, web |
| **PQ** | SMPTE ST.2084 | HDR10, Dolby Vision |
| **HLG** | BT.2100 | Broadcast HDR (BBC/NHK) |
| **Linear** | - | ML models, compositing |

### Implementation Details

- **PQ (Perceptual Quantizer)**: Uses the ST.2084 EOTF with constants m1=0.1593, m2=78.84, c1=0.8359, c2=18.85, c3=18.69. The large m2 exponent causes precision loss in fp16 - use fp32 for accurate results.

- **HLG (Hybrid Log-Gamma)**: Piecewise function with sqrt for low values and log for high values. More fp16-friendly than PQ.

- **sRGB**: Uses the x^2.2 / x^(1/2.2) approximation rather than the piecewise sRGB transfer function.

## Color Gamut

### Supported Gamuts

| Gamut | Standard | Coverage |
|-------|----------|----------|
| **BT.709** | Rec. 709 / sRGB | Standard SDR |
| **BT.2020** | Rec. 2020 | Wide color gamut (HDR) |

### Conversion Matrices

The module includes accurate 3×3 matrices for BT.709 ↔ BT.2020 conversion, validated against the `colour-science` reference library.

## Testing

Tests compare implementations against the `colour-science` library which provides reference implementations of ITU/SMPTE standards.

```bash
# Run tests
python -m unittest extension.vision.test.test_image_processing -v

# Or directly
python extension/vision/test/test_image_processing.py -v
```

### Test Coverage

- Transfer function accuracy vs colour-science reference
- Gamut conversion matrix validation
- Full E2E pipeline tests for all factory methods
- fp16 and fp32 precision validation
- Roundtrip tests (forward → inverse)

## API Reference

### ImagePreprocessor

```python
class ImagePreprocessor(torch.nn.Module):
"""
Args:
bit_depth: Input bit depth (8, 10, or 12). Default 8.
input_transfer: Transfer function of input (SRGB, PQ, HLG, LINEAR).
output_transfer: Desired transfer function of output.
input_gamut: Color gamut of input (BT709, BT2020).
output_gamut: Desired color gamut of output.
input_color: Color layout of input (RGB, BGR).
output_color: Desired color layout (RGB, BGR, GRAYSCALE).
channel_bias: Per-channel bias [R, G, B].
channel_scale: Per-channel scale [R, G, B].
preset: Preset name ("scale_0_1", "scale_neg1_1", "imagenet").
output_dtype: Output dtype (torch.float16 or torch.float32).
"""
```

### ImagePostprocessor

```python
class ImagePostprocessor(torch.nn.Module):
"""
Args:
bit_depth: Output bit depth (8, 10, or 12). Default 8.
input_transfer: Transfer function of input (LINEAR, SRGB).
output_transfer: Desired transfer function of output (SRGB, PQ, HLG).
input_gamut: Color gamut of input (BT709, BT2020).
output_gamut: Desired color gamut of output.
input_color: Color layout of input (RGB, BGR, GRAYSCALE).
output_color: Desired color layout (RGB, BGR).
preset: Preset name for inverse normalization.
output_dtype: Output dtype (torch.float16 or torch.float32).
"""
```
32 changes: 32 additions & 0 deletions extension/vision/TARGETS
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
# Any targets that should be shared between fbcode and xplat must be defined in
# targets.bzl. This file can contain fbcode-only targets.

load("@fbsource//xplat/executorch/build:runtime_wrapper.bzl", "runtime")

oncall("executorch")

runtime.python_library(
name = "image_processing",
srcs = [
"__init__.py",
"image_processing.py",
],
visibility = ["PUBLIC"],
deps = [
"//caffe2:torch",
],
)

runtime.python_test(
name = "test_image_processing",
srcs = [
"test/__init__.py",
"test/test_image_processing.py",
],
deps = [
"fbsource//third-party/pypi/colour-science:colour-science",
"fbsource//third-party/pypi/parameterized:parameterized",
":image_processing",
"//caffe2:torch",
],
)
21 changes: 21 additions & 0 deletions extension/vision/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

"""Vision extension module for image processing utilities."""

from executorch.extension.vision.image_processing import (
ColorGamut,
ImagePostprocessor,
ImagePreprocessor,
TransferFunction,
)

__all__ = [
"ImagePreprocessor",
"ImagePostprocessor",
"TransferFunction",
"ColorGamut",
]
Loading
Loading