diff --git a/.claude/backends.md b/.claude/backends.md
new file mode 100644
index 00000000000..5112b2e29e0
--- /dev/null
+++ b/.claude/backends.md
@@ -0,0 +1,34 @@
+# Backends
+
+| Backend | Platform | Hardware | Location |
+|---------|----------|----------|----------|
+| XNNPACK | All | CPU | `backends/xnnpack/` |
+| CUDA | Linux/Windows | GPU | `backends/cuda/` |
+| CoreML | iOS, macOS | NPU/GPU/CPU | `backends/apple/coreml/` |
+| MPS | iOS, macOS | GPU | `backends/apple/mps/` |
+| Vulkan | Android | GPU | `backends/vulkan/` |
+| QNN | Android | NPU | `backends/qualcomm/` |
+| MediaTek | Android | NPU | `backends/mediatek/` |
+| Arm Ethos-U | Embedded | NPU | `backends/arm/` |
+| OpenVINO | Embedded | CPU/GPU/NPU | `backends/openvino/` |
+| Cadence | Embedded | DSP | See `backends-cadence.md` |
+| Samsung | Android | NPU | `backends/samsung/` |
+
+## Partitioner imports
+```python
+from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
+from executorch.backends.apple.coreml.partition.coreml_partitioner import CoreMLPartitioner
+from executorch.backends.qualcomm.partition.qnn_partitioner import QnnPartitioner
+from executorch.backends.vulkan.partition.vulkan_partitioner import VulkanPartitioner
+```
+
+## Usage pattern
+```python
+from executorch.exir import to_edge
+
+edge = to_edge(exported_program)
+edge = edge.to_backend(XnnpackPartitioner())  # or another backend's partitioner
+exec_prog = edge.to_executorch()
+```
+
+Ops the partitioner does not claim fall back to the portable CPU kernels. Apply multiple partitioners to set a fallback priority order.
diff --git a/.claude/faq.md b/.claude/faq.md
new file mode 100644
index 00000000000..1aa415e4ce9
--- /dev/null
+++ b/.claude/faq.md
@@ -0,0 +1,35 @@
+# Common Errors
+
+## Error Codes
+Error codes are defined in `runtime/core/error.h`.
+
+| Code | Name | Common Cause |
+|------|------|--------------|
+| 0x10 | InvalidArgument | Input shape mismatch - inputs don't match the shapes used at export. Use dynamic shapes if needed. |
+| 0x14 | OperatorMissing | Selective build is missing an operator. Regenerate `et_operator_library` from the current model. |
+| 0x20 | NotFound | Missing backend. Link with `--whole-archive`: `-Wl,--whole-archive libxnnpack_backend.a -Wl,--no-whole-archive` |
+
+## Export Issues
+
+**Missing out variants**: Custom ops need an ExecuTorch implementation. See `kernel-library-custom-aten-kernel.md`.
+
+**RuntimeError: convert function not implemented**: Unsupported operator. File a GitHub issue.
+
+## Runtime Issues
+
+**Slow inference**:
+1. Build with `-DCMAKE_BUILD_TYPE=Release`
+2. Ensure the model is delegated (use `XnnpackPartitioner`)
+3. Set thread count: `threadpool::get_threadpool()->_unsafe_reset_threadpool(num_threads)`
+
+**Numerical accuracy**: Use devtools to debug. See the `/profile` skill.
+
+**Error setting input 0x10**: Input shape mismatch. Specify dynamic shapes at export (see the sketch at the end of this file).
+
+**Duplicate kernel registration abort**: Multiple `gen_operators_lib` outputs linked. Use only one per target.
+
+## Installation
+
+**Missing python-dev**: `sudo apt install python-dev`
+
+**Missing pytorch_tokenizers**: `pip install -e ./extension/llm/tokenizers/`
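+
+## Dynamic Shapes Sketch
+
+A minimal sketch of exporting with a dynamic batch dimension, assuming the first input dimension is the one that varies; `model`, `example_input`, and the argument name `"x"` are placeholders for your own module:
+
+```python
+import torch
+from torch.export import Dim, export
+
+batch = Dim("batch", min=1, max=8)  # allowed range for dim 0
+exported = export(
+    model.eval(),
+    (example_input,),
+    dynamic_shapes={"x": {0: batch}},  # "x" must match the forward() argument name
+)
+```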
diff --git a/.claude/llm-export.md b/.claude/llm-export.md
new file mode 100644
index 00000000000..65e3414b421
--- /dev/null
+++ b/.claude/llm-export.md
@@ -0,0 +1,65 @@
+# LLM Export
+
+High-level API for exporting LLMs to .pte format.
+
+## Supported Models
+Llama 2/3/3.1/3.2, Qwen 2.5/3, Phi 3.5/4-mini, SmolLM2
+
+Full list: `extension/llm/export/config/llm_config.py`
+
+For other models (Gemma, Mistral, BERT, Whisper): use optimum-executorch (see the `/setup` skill).
+
+## Basic Usage
+
+```bash
+python -m executorch.extension.llm.export.export_llm \
+    --config path/to/config.yaml
+```
+
+## Config Structure
+
+```yaml
+base:
+  model_class: llama3_2
+  checkpoint: path/to/consolidated.00.pth
+  params: path/to/params.json
+  metadata: '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}'
+
+model:
+  use_kv_cache: True  # recommended
+  use_sdpa_with_kv_cache: True  # recommended
+  use_attention_sink: False  # extend generation beyond the context window
+  quantize_kv_cache: False  # int8 KV cache
+
+quantization:
+  qmode: 8da4w  # int8 dynamic activation + int4 weight
+  group_size: 32
+  embedding_quantize: 4,32
+
+backend:
+  xnnpack:
+    enabled: True
+    extended_ops: True
+
+debug:
+  verbose: True  # show delegation table
+  generate_etrecord: True  # for devtools profiling
+```
+
+## Quantization Modes
+
+**TorchAO (XNNPACK)**:
+- `8da4w`: int8 dynamic activation + int4 weight
+- `int8`: int8 weight-only
+- `torchao:8da4w`: low-bit kernels for Arm
+
+**pt2e (QNN, CoreML, Vulkan)**: Use for non-CPU backends.
+
+## Config Classes
+All options in `extension/llm/export/config/llm_config.py`:
+- `LlmConfig` - top level
+- `ExportConfig` - max_seq_length, max_context_length
+- `ModelConfig` - model optimizations
+- `QuantizationConfig` - quantization options
+- `BackendConfig` - backend settings
+- `DebugConfig` - verbose, etrecord, profiling
diff --git a/.claude/quantization.md b/.claude/quantization.md
new file mode 100644
index 00000000000..94abee0431e
--- /dev/null
+++ b/.claude/quantization.md
@@ -0,0 +1,13 @@
+# Quantization
+
+Docs: https://docs.pytorch.org/ao/main/pt2e_quantization/index.html
+
+## Backend quantizers
+| Backend | Quantizer |
+|---------|-----------|
+| XNNPACK | `XNNPACKQuantizer` |
+| Qualcomm | `QnnQuantizer` |
+| CoreML | `CoreMLQuantizer` |
+
+## LLM modes
+See `examples/models/llama/source_transformation/quantize.py`: `int8`, `8da4w`, `4w`
diff --git a/.claude/runtime-api.md b/.claude/runtime-api.md
new file mode 100644
index 00000000000..4078ec68daa
--- /dev/null
+++ b/.claude/runtime-api.md
@@ -0,0 +1,28 @@
+# Runtime API
+
+## executorch.runtime (preferred)
+```python
+from pathlib import Path
+from executorch.runtime import Runtime, Program, Method
+runtime = Runtime.get()
+program = runtime.load_program(Path("model.pte"))
+outputs = program.load_method("forward").execute(inputs)
+```
+
+## portable_lib (low-level)
+```python
+from executorch.extension.pybindings.portable_lib import _load_for_executorch
+module = _load_for_executorch("model.pte")
+outputs = module.forward(inputs)
+```
+
+## Missing kernel fixes
+
+If the runtime reports missing kernel errors, import the kernel module before loading:
+
+```python
+# Missing quantized kernels (e.g., quantized_decomposed::embedding_byte.out)
+from executorch.kernels import quantized
+
+# Missing LLM custom ops (e.g., llama::custom_sdpa.out, llama::update_cache.out)
+from executorch.extension.llm.custom_ops import custom_ops
+```
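+
+## Sanity-checking outputs
+
+A rough sketch for comparing ExecuTorch output against the original eager module; `eager_model`, the input shape, and the tolerances are illustrative placeholders:
+
+```python
+import torch
+from executorch.runtime import Runtime
+
+method = Runtime.get().load_program("model.pte").load_method("forward")
+x = torch.randn(1, 3, 224, 224)  # replace with a real example input for your model
+et_out = method.execute([x])[0]  # execute takes a sequence of inputs
+ref_out = eager_model(x)  # the original nn.Module, if still available
+print(torch.allclose(et_out, ref_out, atol=1e-4, rtol=1e-4))
+```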
diff --git a/.claude/skills/building/SKILL.md b/.claude/skills/building/SKILL.md
new file mode 100644
index 00000000000..7ff7be38df1
--- /dev/null
+++ b/.claude/skills/building/SKILL.md
@@ -0,0 +1,23 @@
+---
+name: building
+description: Build ExecuTorch runners or C++ libraries. Use when compiling runners for Llama, Whisper, or other models, or building the C++ runtime.
+---
+
+# Building
+
+## Runners (Makefile)
+```bash
+make help  # list all targets
+make llama-cpu  # Llama on CPU
+make whisper-metal  # Whisper on Metal
+make gemma3-cuda  # Gemma3 on CUDA
+```
+
+Output: `cmake-out/examples/models/<model>/`
+
+## C++ Libraries (CMake)
+```bash
+cmake --list-presets  # list presets
+cmake --workflow --preset llm-release  # LLM CPU
+cmake --workflow --preset llm-release-metal  # LLM Metal
+```
diff --git a/.claude/skills/export/SKILL.md b/.claude/skills/export/SKILL.md
new file mode 100644
index 00000000000..c075e9403c6
--- /dev/null
+++ b/.claude/skills/export/SKILL.md
@@ -0,0 +1,28 @@
+---
+name: export
+description: Export a PyTorch model to .pte format for ExecuTorch. Use when converting models, lowering to edge, or generating .pte files.
+---
+
+# Export
+
+## Basic pattern
+```python
+from executorch.exir import to_edge_transform_and_lower
+from torch.export import export
+
+exported = export(model.eval(), example_inputs)
+edge = to_edge_transform_and_lower(exported)
+with open("model.pte", "wb") as f:
+    f.write(edge.to_executorch().buffer)
+```
+
+## Model-specific scripts
+| Model | Script |
+|-------|--------|
+| Llama | `examples/models/llama/export_llama.py` |
+| Whisper | `examples/models/whisper/` |
+| Parakeet | `examples/models/parakeet/export_parakeet_tdt.py` |
+
+## Debugging
+- Draft export: `export(model, inputs, strict=False)`
+- tlparse: `TORCH_LOGS="+dynamo,+export" python script.py 2>&1 | tlparse`
diff --git a/.claude/skills/profile/SKILL.md b/.claude/skills/profile/SKILL.md
new file mode 100644
index 00000000000..b118a8a61e4
--- /dev/null
+++ b/.claude/skills/profile/SKILL.md
@@ -0,0 +1,24 @@
+---
+name: profile
+description: Profile ExecuTorch model execution. Use when measuring performance, analyzing operator timing, or debugging slow models.
+---
+
+# Profile
+
+## 1. Enable ETDump when loading
+```python
+program = runtime.load_program("model.pte", enable_etdump=True, debug_buffer_size=int(1e7))
+```
+
+## 2. Execute and save
+```python
+outputs = program.load_method("forward").execute(inputs)
+program.write_etdump_result_to_file("etdump.etdp", "debug.bin")
+```
+
+## 3. Analyze with Inspector
+```python
+from executorch.devtools import Inspector
+inspector = Inspector(etrecord="model.etrecord", etdump_path="etdump.etdp")
+inspector.print_data_tabular()
+```
diff --git a/.claude/skills/setup/SKILL.md b/.claude/skills/setup/SKILL.md
new file mode 100644
index 00000000000..3b5e2955357
--- /dev/null
+++ b/.claude/skills/setup/SKILL.md
@@ -0,0 +1,15 @@
+---
+name: setup
+description: Set up ExecuTorch development environment. Use when installing dependencies, setting up conda environments, or preparing to develop with ExecuTorch.
+---
+
+# Setup
+
+1. Activate conda: `conda activate executorch`
+   - If not found: `conda env list | grep -E "(executorch|et)"`
+
+2. Install executorch: `./install_executorch.sh`
+
+3. (Optional) For Huggingface integration:
+   - Read the pinned commit from `.ci/docker/ci_commit_pins/optimum-executorch.txt`
+   - Install: `pip install git+https://github.com/huggingface/optimum-executorch.git@<commit>`
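+
+A quick smoke test (an illustrative sketch, not a required step) to confirm the install can export a model end to end:
+
+```python
+import torch
+from torch.export import export
+from executorch.exir import to_edge
+
+class Tiny(torch.nn.Module):
+    def forward(self, x):
+        return x + 1
+
+prog = to_edge(export(Tiny(), (torch.ones(2),))).to_executorch()
+print(f"exported {len(prog.buffer)} bytes")  # a non-zero size means the toolchain works
+```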
diff --git a/.claude/tokenizers.md b/.claude/tokenizers.md
new file mode 100644
index 00000000000..bcdc540136b
--- /dev/null
+++ b/.claude/tokenizers.md
@@ -0,0 +1,54 @@
+# Tokenizers
+
+C++ tokenizer implementations with Python bindings. Located in `extension/llm/tokenizers/`.
+
+## Installation
+```bash
+pip install -e ./extension/llm/tokenizers/
+```
+
+## Python API
+
+```python
+from pytorch_tokenizers import get_tokenizer
+
+# Auto-detect tokenizer type from file
+tokenizer = get_tokenizer("path/to/tokenizer.model")  # or .json
+
+# Encode/decode
+tokens = tokenizer.encode("Hello world")
+text = tokenizer.decode(tokens)
+```
+
+## Available Tokenizers
+
+| Class | Format | Use Case |
+|-------|--------|----------|
+| `HuggingFaceTokenizer` | `.json` | HuggingFace models |
+| `TiktokenTokenizer` | `.model` | OpenAI/Llama 3 |
+| `Llama2cTokenizer` | `.model` | Llama 2, SentencePiece |
+| `CppSPTokenizer` | `.model` | SentencePiece (C++) |
+
+## Direct Usage
+
+```python
+from pytorch_tokenizers import HuggingFaceTokenizer, TiktokenTokenizer, Llama2cTokenizer
+
+# HuggingFace (tokenizer.json)
+tokenizer = HuggingFaceTokenizer("tokenizer.json", "tokenizer_config.json")
+
+# Tiktoken (Llama 3, etc.)
+tokenizer = TiktokenTokenizer(model_path="tokenizer.model")
+
+# Llama2c/SentencePiece
+tokenizer = Llama2cTokenizer(model_path="tokenizer.model")
+```
+
+## C++ Tokenizers
+
+For C++ runners, include headers from `extension/llm/tokenizers/include/`:
+- `hf_tokenizer.h` - HuggingFace
+- `tiktoken.h` - Tiktoken
+- `sentencepiece.h` - SentencePiece
+- `llama2c_tokenizer.h` - Llama2c
+- `tekken.h` - Mistral Tekken v7
diff --git a/CLAUDE.md b/CLAUDE.md
index 67b29b9c652..8d0b6675186 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -1,67 +1,48 @@
-# Repo and framework name
+# ExecuTorch
-Refer to the repo/framework/runtime "executorch" (in lower cases) or "ExecuTorch" (in
-camel cases), not "ExecutorTorch". With limited code or comment length, maybe refer
-to the framework "ET" but consider it as very unofficial and not recommended.
+## Skills
+- `/setup` - Set up environment
+- `/export` - Export model to .pte
+- `/building` - Build runners or C++ libs
+- `/profile` - Profile execution
-# Install
+Reference docs in `.claude/`: backends, runtime-api, quantization, llm-export, faq, tokenizers
-## Python
+## Quick Reference
-If the user is mostly importing `executorch` module and experimenting with Ahead-Of-Time
-export flow, installation means installing `executorch` python package.
+**Install Python package:**
+```bash
+./install_executorch.sh  # first time (or .bat on Windows)
+pip install -e . --no-build-isolation  # subsequent installs
+```
-Python virtual environment or conda environment is highly recommended for installing
-executorch from source. Double check if the user wants to enable virtual enablement before
-building from source.
+**Build C++ libraries:** see `CMakeLists.txt`; for LLM/ASR runners use `Makefile` and `CMakePresets.json`
-First time install: run `install_executorch.sh` (or `install_executorch.bat` for Windows).
+**Run tests:** `pytest -n auto` (Python), `ctest --output-on-failure` (C++)
-This script handles dependencies properly (since `executorch` depends on nightly versions
-of `torch`, those packages won't be available in pip so need special index url).
+**Lint:** `lintrunner init && lintrunner -a`
-Subsequent install: run `pip install . -v --no-build-isolation` inside `executorch`
-directory.
+Details: [docs/source/using-executorch-building-from-source.md](docs/source/using-executorch-building-from-source.md)
-Editable mode is avilable (either through `install_executorch.sh` script or `pip install . -e`.
+## Naming
-Refer to more details in this [doc](docs/source/using-executorch-building-from-source.md).
+- Use "executorch" (lowercase) or "ExecuTorch" (camel case) +- Never "ExecutorTorch" +- "ET" only when space-constrained (unofficial) -## C++ -If the user is building basic executorch C++ libraries, refer to root level [CMakeLists.txt](CMakeLists.txt). +## Commits -If working with LLM/ASR runners, prefer to use [Makefile](Makefile) and cmake [presets](CMakePresets.json). +- Only commit when explicitly asked +- No bullet lists of changes; explain review order for large PRs, or omit for small ones +- Disclose PR was authored with Claude -Again refer to this [doc](docs/source/using-executorch-building-from-source.md#building-the-c-runtime) -for more details. +## Code Style -# Commit messages +- Minimal comments; code should be self-documenting +- Comments only for non-obvious global context +- No trivial (1-2 LOC) single-use helpers unless significantly improving readability +- Explicit state management; no dynamic `setattr`/`getattr` patterns +- Match existing style and architecture +- Assume reader knows ExecuTorch/PyTorch basics -Don't commit unless the user explicitly asks you to. - -When writing a commit message, don't make a bullet list of the individual -changes. Instead, if the PR is large, explain the order to review changes -(e.g., the logical progression), or if it's short just omit the bullet list -entirely. - -Disclose that the PR was authored with Claude. - -# Coding Style Guidelines - -Follow these rules for all code changes in this repository: - -- Minimize comments; be concise; code should be self-explanatory and self-documenting. -- Comments should be useful, for example, comments that remind the reader about - some global context that is non-obvious and can't be inferred locally. -- Don't make trivial (1-2 LOC) helper functions that are only used once unless - it significantly improves code readability. -- Prefer clear abstractions. State management should be explicit. - For example, if managing state in a Python class: there should be a clear - class definition that has all of the members: don't dynamically `setattr` - a field on an object and then dynamically `getattr` the field on the object. -- Match existing code style and architectural patterns. -- Assume the reader has familiarity with ExecuTorch and PyTorch. They may not be the expert - on the code that is being read, but they should have some experience in the - area. - -If uncertain, choose the simpler, more concise implementation. +**When uncertain: choose simpler, more concise.**