From c95e81f46ae4e262d4048213b0c3f0a0f961fb4a Mon Sep 17 00:00:00 2001
From: Alon Freund <alonf@il.ibm.com>
Date: Thu, 21 May 2026 13:28:55 +0300
Subject: [PATCH 1/4] Update query rewrite example in hello_mellea tutorial

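Replace the query rewrite example in section 6a (a dog named Rex) with a
trip-to-France conversation, and make section 6b pass the same conversation
history to the Intrinsic call instead of an empty ChatContext. Cell sources
are re-serialized as lists of lines.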
---
 tutorials/notebooks/hello_mellea.ipynb | 157 ++++++++++++++++++++-----
 1 file changed, 130 insertions(+), 27 deletions(-)

diff --git a/tutorials/notebooks/hello_mellea.ipynb b/tutorials/notebooks/hello_mellea.ipynb
index cae05cf..94b1b4a 100644
--- a/tutorials/notebooks/hello_mellea.ipynb
+++ b/tutorials/notebooks/hello_mellea.ipynb
@@ -4,7 +4,30 @@
    "cell_type": "markdown",
    "id": "intro",
    "metadata": {},
-   "source": "# Hello World - Using Mellea with Granite Switch\n\n**Duration:** ~5 min after the model server is ready\n\nMinimal example of invoking **mellea adapters** against a **Granite Switch** model served by vLLM. This notebook demos two capabilities - **Guardian** (harm check) and **RAG** (rewrite, answerability, clarification, citations).\n\n[Mellea](https://github.com/generative-computing/mellea) is IBM's library for writing Generative Programs. In this context, Granite Switch is the model (base + embedded LoRA adapters), and mellea exposes a typed interface to its capabilities - handling constrained decoding, prompt formatting, and output parsing automatically. vLLM provides much faster inference in production environments; HF support for Granite Switch in mellea coming.\n\n**What you'll learn:**\n- How to chain guardian + rewrite + answerability + clarification + citations into a single RAG flow driven by mellea adapters.\n- How to connect a mellea `OpenAIBackend` to a vLLM server serving a Granite Switch checkpoint.\n- How to call an adapter through its high-level wrapper (`rag.rewrite_question`) vs. the low-level `Intrinsic` AST node (for adapters mellea doesn't wrap yet).\n- The difference between `CRITERIA_BANK` keys and custom criteria strings when calling `guardian_check`.\n\n**Adapters used:** adapters from the [Guardian](https://huggingface.co/ibm-granite/granitelib-guardian-r1.0) library (`guardian-core`) and the [RAG](https://huggingface.co/ibm-granite/granitelib-rag-r1.0) library (`query_rewrite`, `answerability`, `query_clarification`, `citations`).\n\nSee section 11 for the full list of adapter wrappers currently supported.\n\n---\n**Prerequisites:** GPU runtime (A100 or better). Go to *Runtime -> Change runtime type -> A100 GPU*.\n\nThis notebook launches the default pre-composed Granite Switch checkpoint, `ibm-granite/granite-switch-4.1-3b-preview`. To compose your own checkpoint, use [`compose_granite_switch.ipynb`](./compose_granite_switch.ipynb). Full setup details (GPU sizes, HF auth, multi-GPU) are in [`PREREQUISITES.md`](../PREREQUISITES.md)."
+   "source": [
+    "# Hello World - Using Mellea with Granite Switch\n",
+    "\n",
+    "**Duration:** ~5 min after the model server is ready\n",
+    "\n",
+    "Minimal example of invoking **mellea adapters** against a **Granite Switch** model served by vLLM. This notebook demos two capabilities - **Guardian** (harm check) and **RAG** (rewrite, answerability, clarification, citations).\n",
+    "\n",
+    "[Mellea](https://github.com/generative-computing/mellea) is IBM's library for writing Generative Programs. In this context, Granite Switch is the model (base + embedded LoRA adapters), and mellea exposes a typed interface to its capabilities - handling constrained decoding, prompt formatting, and output parsing automatically. vLLM provides much faster inference in production environments; HF support for Granite Switch in mellea coming.\n",
+    "\n",
+    "**What you'll learn:**\n",
+    "- How to chain guardian + rewrite + answerability + clarification + citations into a single RAG flow driven by mellea adapters.\n",
+    "- How to connect a mellea `OpenAIBackend` to a vLLM server serving a Granite Switch checkpoint.\n",
+    "- How to call an adapter through its high-level wrapper (`rag.rewrite_question`) vs. the low-level `Intrinsic` AST node (for adapters mellea doesn't wrap yet).\n",
+    "- The difference between `CRITERIA_BANK` keys and custom criteria strings when calling `guardian_check`.\n",
+    "\n",
+    "**Adapters used:** adapters from the [Guardian](https://huggingface.co/ibm-granite/granitelib-guardian-r1.0) library (`guardian-core`) and the [RAG](https://huggingface.co/ibm-granite/granitelib-rag-r1.0) library (`query_rewrite`, `answerability`, `query_clarification`, `citations`).\n",
+    "\n",
+    "See section 11 for the full list of adapter wrappers currently supported.\n",
+    "\n",
+    "---\n",
+    "**Prerequisites:** GPU runtime (A100 or better). Go to *Runtime -> Change runtime type -> A100 GPU*.\n",
+    "\n",
+    "This notebook launches the default pre-composed Granite Switch checkpoint, `ibm-granite/granite-switch-4.1-3b-preview`. To compose your own checkpoint, use [`compose_granite_switch.ipynb`](./compose_granite_switch.ipynb). Full setup details (GPU sizes, HF auth, multi-GPU) are in [`PREREQUISITES.md`](../PREREQUISITES.md)."
+   ]
   },
   {
    "cell_type": "markdown",
@@ -40,7 +63,13 @@
    "cell_type": "markdown",
    "id": "launch-vllm-heading",
    "metadata": {},
-   "source": "## 1 · Launch vLLM server\n\nStart the Granite Switch model on port 8000. The server runs in the background; `wait_for_server` polls `/health` until it is ready.\n\n⏱️ **This takes ~3 minutes** on first run (model download + loading)."
+   "source": [
+    "## 1 · Launch vLLM server\n",
+    "\n",
+    "Start the Granite Switch model on port 8000. The server runs in the background; `wait_for_server` polls `/health` until it is ready.\n",
+    "\n",
+    "⏱️ **This takes ~3 minutes** on first run (model download + loading)."
+   ]
   },
   {
    "cell_type": "code",
@@ -48,13 +77,31 @@
    "id": "launch-vllm",
    "metadata": {},
    "outputs": [],
-   "source": "from granite_switch.tutorials.vllm_server import kill_stale_vllm_processes, launch_vllm, print_gpu_state, tail_log, wait_for_server\n\nkill_stale_vllm_processes()\nprint_gpu_state()\n\nVLLM_MODEL = \"ibm-granite/granite-switch-4.1-3b-preview\"\nVLLM_PORT = 8000\n\nvllm_proc = launch_vllm(\n    model=VLLM_MODEL,\n    port=VLLM_PORT,\n    log_file=\"/content/vllm_server.log\",\n)\nif not wait_for_server(VLLM_PORT):\n    tail_log(\"/content/vllm_server.log\")"
+   "source": [
+    "from granite_switch.tutorials.vllm_server import kill_stale_vllm_processes, launch_vllm, print_gpu_state, tail_log, wait_for_server\n",
+    "\n",
+    "kill_stale_vllm_processes()\n",
+    "print_gpu_state()\n",
+    "\n",
+    "VLLM_MODEL = \"ibm-granite/granite-switch-4.1-3b-preview\"\n",
+    "VLLM_PORT = 8000\n",
+    "\n",
+    "vllm_proc = launch_vllm(\n",
+    "    model=VLLM_MODEL,\n",
+    "    port=VLLM_PORT,\n",
+    "    log_file=\"/content/vllm_server.log\",\n",
+    ")\n",
+    "if not wait_for_server(VLLM_PORT):\n",
+    "    tail_log(\"/content/vllm_server.log\")"
+   ]
   },
   {
    "cell_type": "markdown",
    "id": "config-md",
    "metadata": {},
-   "source": "## 2 · Configuration and imports"
+   "source": [
+    "## 2 · Configuration and imports"
+   ]
   },
   {
    "cell_type": "code",
@@ -101,7 +148,10 @@
    "cell_type": "markdown",
    "id": "backend-md",
    "metadata": {},
-   "source": "## 3 · Connect to vLLM backend via mellea\nRegisters the Granite Switch embedded adapters so mellea adapter calls route through the correct control tokens."
+   "source": [
+    "## 3 · Connect to vLLM backend via mellea\n",
+    "Registers the Granite Switch embedded adapters so mellea adapter calls route through the correct control tokens."
+   ]
   },
   {
    "cell_type": "code",
@@ -188,40 +238,62 @@
    "cell_type": "markdown",
    "id": "rewrite-md",
    "metadata": {},
-   "source": "## 6 · RAG - query rewrite\nDecontextualizes queries by resolving pronouns and references using conversation history. Single-turn queries pass through unchanged; multi-turn queries with pronouns get rewritten for clarity."
+   "source": [
+    "## 6 · RAG - query rewrite\n",
+    "Decontextualizes queries by resolving pronouns and references using conversation history. Single-turn queries pass through unchanged; multi-turn queries with pronouns get rewritten for clarity."
+   ]
   },
   {
    "cell_type": "markdown",
    "id": "98e2b233",
+   "metadata": {},
    "source": [
     "### 6a · Using the wrapper"
-   ],
-   "metadata": {}
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "code",
-   "outputs": [],
    "execution_count": null,
-   "source": "# Build conversation context\nctx = ChatContext()\nctx = ctx.add(MelleaMessage(\"user\", \"I have a dog named Rex. He spends a lot of time in the backyard.\"))\nctx = ctx.add(MelleaMessage(\"assistant\", \"Rex must love exploring!\"))\n\n# Follow-up with pronouns - \"he\" and \"that\" need context to understand\nquery = \"Is he more likely to get fleas because of that?\"\n\n# query_rewrite resolves pronouns using conversation history\nrewritten = rag.rewrite_question(query, ctx, backend)\nprint(f\"original:  {query}\")\nprint(f\"rewritten: {rewritten}\")\n# Expected: \"Is Rex more likely to get fleas because he spends a lot of time in the backyard?\"",
-   "id": "1c40e9dd3f178f63"
+   "id": "1c40e9dd3f178f63",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Build conversation context\n",
+    "ctx = ChatContext()\n",
+    "ctx = ctx.add(MelleaMessage(\"user\", \"I want to plan a trip to France.\"))\n",
+    "ctx = ctx.add(MelleaMessage(\"assistant\", \"Very good, I can help you with that.\"))\n",
+    "\n",
+    "# Follow-up with implicit references - \"the capital\" and \"its\" need context to resolve\n",
+    "query = \"I think I'll start with the capital. what was its name?\"\n",
+    "\n",
+    "# query_rewrite resolves pronouns using conversation history\n",
+    "rewritten = rag.rewrite_question(query, ctx, backend)\n",
+    "print(f\"original:  {query}\")\n",
+    "print(f\"rewritten: {rewritten}\")\n",
+    "# Expected: \"What is the name of the capital of France?\""
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "markdown",
-   "source": "### 6b · Same thing without the wrapper\n\n`rag.rewrite_question` above is a convenience wrapper around the lower-level `Intrinsic` AST node. Here we do **the same action** - invoke the `query_rewrite` adapter - but explicitly name the adapter and drive it through `mfuncs.act`. Useful when you want to invoke an adapter mellea doesn't wrap yet, or to understand what the wrapper does under the hood.",
-   "id": "1bab556a6a1eda5d"
+   "id": "1bab556a6a1eda5d",
+   "metadata": {},
+   "source": [
+    "### 6b · Same thing without the wrapper\n",
+    "\n",
+    "`rag.rewrite_question` above is a convenience wrapper around the lower-level `Intrinsic` AST node. Here we do **the same action** - invoke the `query_rewrite` adapter - but explicitly name the adapter and drive it through `mfuncs.act`. Useful when you want to invoke an adapter mellea doesn't wrap yet, or to understand what the wrapper does under the hood."
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "code",
-   "outputs": [],
    "execution_count": null,
+   "id": "ed18bf3fa580755d",
+   "metadata": {},
+   "outputs": [],
    "source": [
     "ADAPTER_NAME = \"query_rewrite\"\n",
     "\n",
     "# Build the context user message appended to history.\n",
-    "ctx_for_rewrite = ChatContext().add(MelleaMessage(\"user\", query))\n",
+    "ctx_for_rewrite = ctx.add(MelleaMessage(\"user\", query))\n",
     "\n",
     "# Drive the adapter directly via an Intrinsic AST node. Sampling params\n",
     "# (temperature, max_completion_tokens, etc.) come from the adapter's io.yaml -\n",
@@ -234,17 +306,16 @@
     "result = json.loads(str(out))\n",
     "print(f\"original:  {query}\")\n",
     "print(f\"rewritten:      {result['rewritten_question']}\")"
-   ],
-   "id": "ed18bf3fa580755d"
+   ]
   },
   {
    "cell_type": "markdown",
    "id": "e4dc6bc6",
+   "metadata": {},
    "source": [
     "## 7 · RAG - answerability\n",
     "Returns `answerable` or `unanswerable`."
-   ],
-   "metadata": {}
+   ]
   },
   {
    "cell_type": "code",
@@ -326,13 +397,45 @@
    "cell_type": "markdown",
    "id": "reference-intrinsics",
    "metadata": {},
-   "source": "## 11 · Other mellea adapter wrappers\n\nBeyond what this notebook demos, Mellea ships wrappers for additional adapters. The list below reflects **what's currently supported** - new adapters can be added over time as the library evolves. All wrappers follow the same shape - they take a `ChatContext` and a `backend`, and internally drive a named adapter through an `Intrinsic` AST node (see section 6b). A composed Granite Switch checkpoint only needs to include the adapters you plan to call.\n\n**Currently supported wrappers:**\n\n| Module | Function | Purpose |\n|---|---|---|\n| `mellea.stdlib.components.intrinsic.guardian` | `guardian_check` | Score a message against a criterion (custom or from `CRITERIA_BANK`) |\n| | `policy_guardrails` | Evaluate a message against a textual policy document |\n| | `factuality_detection` | Flag factual errors in the assistant's last turn |\n| | `factuality_correction` | Rewrite the assistant's last turn to fix factual errors |\n| `mellea.stdlib.components.intrinsic.rag` | `rewrite_question` | Rewrite a user question into a self-contained query |\n| | `check_answerability` | Decide if retrieved docs can answer the query |\n| | `clarify_query` | Ask a follow-up when docs are insufficient |\n| | `find_citations` | Map answer spans back to source documents |\n| | `check_context_relevance` | Score whether retrieved docs are relevant to the query |\n| | `flag_hallucinated_content` | Flag ungrounded spans in an answer |\n| `mellea.stdlib.components.intrinsic.core` | `check_certainty` | Model's confidence in its last response |\n| | `requirement_check` | Verify the response meets a stated requirement |\n| | `find_context_attributions` | Attribute response spans to context sources |\n\n**Criteria bank** (`guardian.CRITERIA_BANK`) - pre-baked Granite Guardian definitions currently included: `harm`, `social_bias`, `jailbreak`, `profanity`, `unethical_behavior`, `violence`, `groundedness`, `answer_relevance`, `context_relevance`, `function_call`."
+   "source": [
+    "## 11 · Other mellea adapter wrappers\n",
+    "\n",
+    "Beyond what this notebook demos, Mellea ships wrappers for additional adapters. The list below reflects **what's currently supported** - new adapters can be added over time as the library evolves. All wrappers follow the same shape - they take a `ChatContext` and a `backend`, and internally drive a named adapter through an `Intrinsic` AST node (see section 6b). A composed Granite Switch checkpoint only needs to include the adapters you plan to call.\n",
+    "\n",
+    "**Currently supported wrappers:**\n",
+    "\n",
+    "| Module | Function | Purpose |\n",
+    "|---|---|---|\n",
+    "| `mellea.stdlib.components.intrinsic.guardian` | `guardian_check` | Score a message against a criterion (custom or from `CRITERIA_BANK`) |\n",
+    "| | `policy_guardrails` | Evaluate a message against a textual policy document |\n",
+    "| | `factuality_detection` | Flag factual errors in the assistant's last turn |\n",
+    "| | `factuality_correction` | Rewrite the assistant's last turn to fix factual errors |\n",
+    "| `mellea.stdlib.components.intrinsic.rag` | `rewrite_question` | Rewrite a user question into a self-contained query |\n",
+    "| | `check_answerability` | Decide if retrieved docs can answer the query |\n",
+    "| | `clarify_query` | Ask a follow-up when docs are insufficient |\n",
+    "| | `find_citations` | Map answer spans back to source documents |\n",
+    "| | `check_context_relevance` | Score whether retrieved docs are relevant to the query |\n",
+    "| | `flag_hallucinated_content` | Flag ungrounded spans in an answer |\n",
+    "| `mellea.stdlib.components.intrinsic.core` | `check_certainty` | Model's confidence in its last response |\n",
+    "| | `requirement_check` | Verify the response meets a stated requirement |\n",
+    "| | `find_context_attributions` | Attribute response spans to context sources |\n",
+    "\n",
+    "**Criteria bank** (`guardian.CRITERIA_BANK`) - pre-baked Granite Guardian definitions currently included: `harm`, `social_bias`, `jailbreak`, `profanity`, `unethical_behavior`, `violence`, `groundedness`, `answer_relevance`, `context_relevance`, `function_call`."
+   ]
   },
   {
-   "metadata": {},
    "cell_type": "markdown",
-   "source": "## 12 · Next steps\n\n- **Go deeper on HF mechanics.** [`granite_switch_with_hf.ipynb`](./granite_switch_with_hf.ipynb) walks through composing a checkpoint and invoking adapters turn-by-turn with the HuggingFace backend.\n- **Try a real corpus.** [`rag_101.ipynb`](./rag_101.ipynb) builds a vector corpus and runs an answerability check - the smallest end-to-end RAG demo.\n- **Compose your own checkpoint.** [`compose_granite_switch.ipynb`](./compose_granite_switch.ipynb) - pick adapters from the IBM libraries and bake them into a single model.\n- **Watch ALORA vs LoRA race.** [`alora_vs_lora_race.ipynb`](./alora_vs_lora_race.ipynb) compares the two activation styles head-to-head on the same workload.\n- **Browse Mellea.** [Mellea on GitHub](https://github.com/generative-computing/mellea) - the adapter framework powering this notebook.",
-   "id": "695e3d0155280a60"
+   "id": "695e3d0155280a60",
+   "metadata": {},
+   "source": [
+    "## 12 · Next steps\n",
+    "\n",
+    "- **Go deeper on HF mechanics.** [`granite_switch_with_hf.ipynb`](./granite_switch_with_hf.ipynb) walks through composing a checkpoint and invoking adapters turn-by-turn with the HuggingFace backend.\n",
+    "- **Try a real corpus.** [`rag_101.ipynb`](./rag_101.ipynb) builds a vector corpus and runs an answerability check - the smallest end-to-end RAG demo.\n",
+    "- **Compose your own checkpoint.** [`compose_granite_switch.ipynb`](./compose_granite_switch.ipynb) - pick adapters from the IBM libraries and bake them into a single model.\n",
+    "- **Watch ALORA vs LoRA race.** [`alora_vs_lora_race.ipynb`](./alora_vs_lora_race.ipynb) compares the two activation styles head-to-head on the same workload.\n",
+    "- **Browse Mellea.** [Mellea on GitHub](https://github.com/generative-computing/mellea) - the adapter framework powering this notebook."
+   ]
   }
  ],
  "metadata": {
@@ -347,4 +450,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 5
-}
\ No newline at end of file
+}

From 52c68680bb5ac8a967ccab563788a8133795a763 Mon Sep 17 00:00:00 2001
From: Alon Freund <alonf@il.ibm.com>
Date: Thu, 21 May 2026 13:31:46 +0300
Subject: [PATCH 2/4] Fix broken Colab links after tutorial renaming
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Update notebook links to use new names without numeric prefixes:
- 01_hello_mellea.ipynb → hello_mellea.ipynb
- 03_01_govt_rag_pipeline_simple.ipynb → rag_full_pipeline.ipynb
- 04_compose_granite_switch.ipynb → compose_granite_switch.ipynb
- 05_alora_vs_lora_race.ipynb → alora_vs_lora_race.ipynb
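
A minimal sketch, in Python, of applying the mapping above to link text,
for example to check that no old names remain. `RENAMES` and
`rewrite_links` are hypothetical helpers, not part of this patch. The
sketch only rewrites names that appear as a URL path segment, right
after a `/`.

```python
import re

# Hypothetical helper data: old notebook name -> new name, copied from the list above.
RENAMES = {
    "01_hello_mellea.ipynb": "hello_mellea.ipynb",
    "03_01_govt_rag_pipeline_simple.ipynb": "rag_full_pipeline.ipynb",
    "04_compose_granite_switch.ipynb": "compose_granite_switch.ipynb",
    "05_alora_vs_lora_race.ipynb": "alora_vs_lora_race.ipynb",
}

# Match an old name only when it directly follows "/" (i.e. a URL path segment),
# so longer file names that merely end with an old name are left alone.
_PATTERN = re.compile(r"(?<=/)(" + "|".join(map(re.escape, RENAMES)) + r")")


def rewrite_links(text: str) -> str:
    """Return text with every old notebook path segment replaced by its new name."""
    return _PATTERN.sub(lambda m: RENAMES[m.group(1)], text)


print(rewrite_links("blob/main/tutorials/notebooks/01_hello_mellea.ipynb"))
# -> blob/main/tutorials/notebooks/hello_mellea.ipynb
```

Already-renamed links contain none of the old names, so running the
helper twice leaves them unchanged.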
---
 README.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 2f89530..f241d07 100644
--- a/README.md
+++ b/README.md
@@ -30,7 +30,7 @@ Small models with the right adapters consistently outperform much larger general
 </p>
 
 <p align="center"><em>aLoRA completes 20 of 32 RAG queries while standard LoRA is still waiting — same model, same hardware, different adapter technology.</em><br>
-<a href="https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/05_alora_vs_lora_race.ipynb">Reproduce it yourself on Colab →</a></p>
+<a href="https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/alora_vs_lora_race.ipynb">Reproduce it yourself on Colab →</a></p>
 
 ## Quick Start
 
@@ -114,9 +114,9 @@ New here? Start with a 5-minute notebook and work your way up:
 
 | Notebook | What you'll build | Time | |
 |---|---|---|---|
-| [Hello Mellea](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/01_hello_mellea.ipynb) | Call adapters through a clean Python API | 5 min | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/01_hello_mellea.ipynb) |
-| [RAG Pipeline](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/03_01_govt_rag_pipeline_simple.ipynb) | Query rewrite + answerability + citations in one model | 30 min | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/03_01_govt_rag_pipeline_simple.ipynb) |
-| [Compose Your Own](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/04_compose_granite_switch.ipynb) | Build a custom checkpoint from adapter libraries | 15 min | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/04_compose_granite_switch.ipynb) |
+| [Hello Mellea](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/hello_mellea.ipynb) | Call adapters through a clean Python API | 5 min | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/hello_mellea.ipynb) |
+| [RAG Pipeline](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/rag_full_pipeline.ipynb) | Query rewrite + answerability + citations in one model | 30 min | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/rag_full_pipeline.ipynb) |
+| [Compose Your Own](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/compose_granite_switch.ipynb) | Build a custom checkpoint from adapter libraries | 15 min | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/compose_granite_switch.ipynb) |
 
 All notebooks run on Colab. See [tutorials/README.md](tutorials/README.md) for the full list and guided learning paths.
 

From a524304801e0c2cf9edd7f1e8a531a7f1497855c Mon Sep 17 00:00:00 2001
From: Alon Freund <alonf@il.ibm.com>
Date: Thu, 21 May 2026 13:46:32 +0300
Subject: [PATCH 3/4] Update GPU requirement from A100 to T4 in hello_mellea

The 3B model runs fine on T4 GPUs, which are more accessible on Colab.
Tested and verified on T4.
---
 tutorials/notebooks/hello_mellea.ipynb | 27 ++------------------------
 1 file changed, 2 insertions(+), 25 deletions(-)

diff --git a/tutorials/notebooks/hello_mellea.ipynb b/tutorials/notebooks/hello_mellea.ipynb
index 94b1b4a..fb5ba48 100644
--- a/tutorials/notebooks/hello_mellea.ipynb
+++ b/tutorials/notebooks/hello_mellea.ipynb
@@ -4,30 +4,7 @@
    "cell_type": "markdown",
    "id": "intro",
    "metadata": {},
-   "source": [
-    "# Hello World - Using Mellea with Granite Switch\n",
-    "\n",
-    "**Duration:** ~5 min after the model server is ready\n",
-    "\n",
-    "Minimal example of invoking **mellea adapters** against a **Granite Switch** model served by vLLM. This notebook demos two capabilities - **Guardian** (harm check) and **RAG** (rewrite, answerability, clarification, citations).\n",
-    "\n",
-    "[Mellea](https://github.com/generative-computing/mellea) is IBM's library for writing Generative Programs. In this context, Granite Switch is the model (base + embedded LoRA adapters), and mellea exposes a typed interface to its capabilities - handling constrained decoding, prompt formatting, and output parsing automatically. vLLM provides much faster inference in production environments; HF support for Granite Switch in mellea coming.\n",
-    "\n",
-    "**What you'll learn:**\n",
-    "- How to chain guardian + rewrite + answerability + clarification + citations into a single RAG flow driven by mellea adapters.\n",
-    "- How to connect a mellea `OpenAIBackend` to a vLLM server serving a Granite Switch checkpoint.\n",
-    "- How to call an adapter through its high-level wrapper (`rag.rewrite_question`) vs. the low-level `Intrinsic` AST node (for adapters mellea doesn't wrap yet).\n",
-    "- The difference between `CRITERIA_BANK` keys and custom criteria strings when calling `guardian_check`.\n",
-    "\n",
-    "**Adapters used:** adapters from the [Guardian](https://huggingface.co/ibm-granite/granitelib-guardian-r1.0) library (`guardian-core`) and the [RAG](https://huggingface.co/ibm-granite/granitelib-rag-r1.0) library (`query_rewrite`, `answerability`, `query_clarification`, `citations`).\n",
-    "\n",
-    "See section 11 for the full list of adapter wrappers currently supported.\n",
-    "\n",
-    "---\n",
-    "**Prerequisites:** GPU runtime (A100 or better). Go to *Runtime -> Change runtime type -> A100 GPU*.\n",
-    "\n",
-    "This notebook launches the default pre-composed Granite Switch checkpoint, `ibm-granite/granite-switch-4.1-3b-preview`. To compose your own checkpoint, use [`compose_granite_switch.ipynb`](./compose_granite_switch.ipynb). Full setup details (GPU sizes, HF auth, multi-GPU) are in [`PREREQUISITES.md`](../PREREQUISITES.md)."
-   ]
+   "source": "# Hello World - Using Mellea with Granite Switch\n\n**Duration:** ~5 min after the model server is ready\n\nMinimal example of invoking **mellea adapters** against a **Granite Switch** model served by vLLM. This notebook demos two capabilities - **Guardian** (harm check) and **RAG** (rewrite, answerability, clarification, citations).\n\n[Mellea](https://github.com/generative-computing/mellea) is IBM's library for writing Generative Programs. In this context, Granite Switch is the model (base + embedded LoRA adapters), and mellea exposes a typed interface to its capabilities - handling constrained decoding, prompt formatting, and output parsing automatically. vLLM provides much faster inference in production environments; HF support for Granite Switch in mellea coming.\n\n**What you'll learn:**\n- How to chain guardian + rewrite + answerability + clarification + citations into a single RAG flow driven by mellea adapters.\n- How to connect a mellea `OpenAIBackend` to a vLLM server serving a Granite Switch checkpoint.\n- How to call an adapter through its high-level wrapper (`rag.rewrite_question`) vs. the low-level `Intrinsic` AST node (for adapters mellea doesn't wrap yet).\n- The difference between `CRITERIA_BANK` keys and custom criteria strings when calling `guardian_check`.\n\n**Adapters used:** adapters from the [Guardian](https://huggingface.co/ibm-granite/granitelib-guardian-r1.0) library (`guardian-core`) and the [RAG](https://huggingface.co/ibm-granite/granitelib-rag-r1.0) library (`query_rewrite`, `answerability`, `query_clarification`, `citations`).\n\nSee section 11 for the full list of adapter wrappers currently supported.\n\n---\n**Prerequisites:** GPU runtime (T4 or better). Go to *Runtime -> Change runtime type -> T4 GPU*.\n\nThis notebook launches the default pre-composed Granite Switch checkpoint, `ibm-granite/granite-switch-4.1-3b-preview`. To compose your own checkpoint, use [`compose_granite_switch.ipynb`](./compose_granite_switch.ipynb). Full setup details (GPU sizes, HF auth, multi-GPU) are in [`PREREQUISITES.md`](../PREREQUISITES.md)."
   },
   {
    "cell_type": "markdown",
@@ -450,4 +427,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 5
-}
+}
\ No newline at end of file

From 901b4c89a51eebbf0cdbd57883e25bc90120ace0 Mon Sep 17 00:00:00 2001
From: yairallouche <yair@il.ibm.com>
Date: Thu, 21 May 2026 14:22:44 +0300
Subject: [PATCH 4/4] Rename files and update terminology across tutorials

- Rename rag_full_pipeline.ipynb to rag_full_flow.ipynb
- Rename bring_your_own_adapter.md to build_your_own_adapter.md
- Rename mellea_bring_your_own_adapter.md to mellea_build_your_own_adapter.md
- Rename run_pipeline to run_conversation_turn in rag_full_flow
- Replace "adapter" with "adapter function" in user-facing text where
  it refers to the invocable capability (not LoRA weights or file names)
- Update all cross-references to match new file names
---
 tutorials/README.md                           | 36 +++++++++----------
 ...n_adapter.md => build_your_own_adapter.md} |  0
 .../guides/compare_inference_throughput.md    |  2 +-
 ...er.md => mellea_build_your_own_adapter.md} |  2 +-
 .../guides/mellea_with_granite_switch.md      |  2 +-
 .../notebooks/compose_granite_switch.ipynb    |  4 +--
 .../notebooks/granite_switch_with_hf.ipynb    | 12 +++----
 tutorials/notebooks/hello_adapter.ipynb       |  8 ++---
 tutorials/notebooks/hello_mellea.ipynb        | 14 ++++----
 tutorials/notebooks/rag_101.ipynb             |  4 +--
 ...ull_pipeline.ipynb => rag_full_flow.ipynb} | 30 ++++++++--------
 11 files changed, 57 insertions(+), 57 deletions(-)
 rename tutorials/guides/{bring_your_own_adapter.md => build_your_own_adapter.md} (100%)
 rename tutorials/guides/{mellea_bring_your_own_adapter.md => mellea_build_your_own_adapter.md} (98%)
 rename tutorials/notebooks/{rag_full_pipeline.ipynb => rag_full_flow.ipynb} (94%)

diff --git a/tutorials/README.md b/tutorials/README.md
index f2f9121..76eda8a 100644
--- a/tutorials/README.md
+++ b/tutorials/README.md
@@ -1,20 +1,20 @@
 # Granite Switch Tutorials
 
-Granite Switch facilitates a modular architecture by consolidating multiple LoRA adapters into a single, unified checkpoint. The following tutorials explore the underlying mechanics and usability, detailing adapter invocation, multi-step pipelines with guardrails, and checkpoint composition.
+Granite Switch facilitates a modular architecture by consolidating multiple LoRA adapters into a single, unified checkpoint. The following tutorials explore the underlying mechanics and usability, detailing adapter function invocation, multi-step pipelines with guardrails, and checkpoint composition.
 
 ## Notebooks
 
-Step-by-step walkthroughs covering adapter invocation, pipeline construction, and model composition.
+Step-by-step walkthroughs covering adapter function invocation, pipeline construction, and model composition.
 
 | Notebook | Topics | Duration | Colab |
 |----------|--------|----------|-------|
-| [hello_mellea.ipynb](notebooks/hello_mellea.ipynb) | Mellea adapters intro with vLLM | 5 min | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/hello_mellea.ipynb) |
+| [hello_mellea.ipynb](notebooks/hello_mellea.ipynb) | Mellea adapter functions intro with vLLM | 5 min | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/hello_mellea.ipynb) |
 | [rag_101.ipynb](notebooks/rag_101.ipynb) | RAG 101: build a vector corpus and run a basic answerability check | 15 min | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/rag_101.ipynb) |
-| [rag_full_pipeline.ipynb](notebooks/rag_full_pipeline.ipynb) | Full RAG pipeline with guardian checks (harm + scope) | 30 min | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/rag_full_pipeline.ipynb) |
+| [rag_full_flow.ipynb](notebooks/rag_full_flow.ipynb) | Full RAG pipeline with guardian checks (harm + scope) | 30 min | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/rag_full_flow.ipynb) |
 | [compose_granite_switch.ipynb](notebooks/compose_granite_switch.ipynb) | Compose a checkpoint from adapter libraries | 15 min |  |
 | [alora_vs_lora_race.ipynb](notebooks/alora_vs_lora_race.ipynb) | ALORA vs LoRA race: side-by-side throughput comparison on a multi-step RAG pipeline | 20 min | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/alora_vs_lora_race.ipynb) |
-| [hello_adapter.ipynb](notebooks/hello_adapter.ipynb) | Minimal adapter invocation with HuggingFace | 5 min | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/hello_adapter.ipynb) |
-| [granite_switch_with_hf.ipynb](notebooks/granite_switch_with_hf.ipynb) | Compose + HuggingFace backend, `adapter_name=` invocation, Core + Guardian adapters in a multi-turn conversation | 10 min | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/granite_switch_with_hf.ipynb) |
+| [hello_adapter.ipynb](notebooks/hello_adapter.ipynb) | Minimal adapter function invocation with HuggingFace | 5 min | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/hello_adapter.ipynb) |
+| [granite_switch_with_hf.ipynb](notebooks/granite_switch_with_hf.ipynb) | Compose + HuggingFace backend, `adapter_name=` invocation, Core + Guardian adapter functions in a multi-turn conversation | 10 min | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/granite_switch_with_hf.ipynb) |
 | [granite_speech_demo.ipynb](notebooks/granite_speech_demo.ipynb) | Real-time voice assistant: Granite Speech STT + Granite Switch LLM + Granite Libraries validation, orchestrated by Mellea over WebRTC | 10 min | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/granite_speech_demo.ipynb) |
 
 ## Guides
@@ -22,7 +22,7 @@ Step-by-step walkthroughs covering adapter invocation, pipeline construction, an
 | Guide | Description |
 |-------|-------------|
 | [Using Mellea with Granite Switch](guides/mellea_with_granite_switch.md) | Connect Mellea to a Granite Switch model |
-| [Bring Your Own Adapter](guides/bring_your_own_adapter.md) | Train, compose, and use custom adapters |
+| [Bring Your Own Adapter](guides/build_your_own_adapter.md) | Train, compose, and use custom adapters |
 | [Compare Inference Throughput](guides/compare_inference_throughput.md) | Compare LoRA vs aLoRA based models in an inference race setup |
 
 
@@ -48,10 +48,10 @@ support coming soon.
 
 ### Path 2: Real-World Pipelines (Usability)
 
-Best for: Seeing how adapters compose into multi-step applications
+Best for: Seeing how adapter functions compose into multi-step applications
 
 1. [RAG 101](notebooks/rag_101.ipynb) - corpus build + answerability check, the smallest end-to-end RAG demo [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/rag_101.ipynb)
-2. [Full RAG Pipeline with Guardians](notebooks/rag_full_pipeline.ipynb) - rewrite, answerability, citations, harm + scope checks [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/rag_full_pipeline.ipynb)
+2. [Full RAG Pipeline with Guardians](notebooks/rag_full_flow.ipynb) - rewrite, answerability, citations, harm + scope checks [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/rag_full_flow.ipynb)
 
 
 
@@ -59,10 +59,10 @@ Best for: Seeing how adapters compose into multi-step applications
 
 ### Path 3: Bring Your Own Adapter
 
-Best for: Custom adapter development
+Best for: Custom adapter function development
 
-1. [Bring Your Own Adapter Guide](guides/bring_your_own_adapter.md)
-2. [Configure Your Own Adapter Guide](guides/mellea_bring_your_own_adapter.md)
+1. [Bring Your Own Adapter Guide](guides/build_your_own_adapter.md)
+2. [Configure Your Own Adapter Guide](guides/mellea_build_your_own_adapter.md)
 3. [Compose Your Checkpoint](notebooks/compose_granite_switch.ipynb) 
 
 
@@ -70,7 +70,7 @@ Best for: Custom adapter development
 
 Best for: Understanding how Granite Switch works at the control-token level
 
-HuggingFace inference examples demonstrate how adapters are activated via control tokens, providing insight into the underlying mechanics. For most applications, we recommend running inference with Mellea (Part 2).
+HuggingFace inference examples demonstrate how adapter functions are activated via control tokens, providing insight into the underlying mechanics. For most applications, we recommend running inference with Mellea (Part 2).
 1. [Prerequisites](PREREQUISITES.md#huggingface-backend)
 2. [Hello Adapter](notebooks/hello_adapter.ipynb) — see control tokens in action [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/hello_adapter.ipynb)
 3. [Granite Switch with HuggingFace](notebooks/granite_switch_with_hf.ipynb) — detailed walkthrough [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/granite_switch_with_hf.ipynb)
@@ -83,8 +83,8 @@ Runnable scripts in [`scripts/`](scripts/) for common tasks:
 
 | Script | Description |
 |--------|-------------|
-| [run_adapter_generation_direct.py](scripts/reference/run_adapter_generation_direct.py) | Direct adapter invocation via control tokens |
-| [run_adapter_generation_mellea.py](scripts/reference/run_adapter_generation_mellea.py) | Adapter invocation through Mellea |
+| [run_adapter_generation_direct.py](scripts/reference/run_adapter_generation_direct.py) | Direct adapter function invocation via control tokens |
+| [run_adapter_generation_mellea.py](scripts/reference/run_adapter_generation_mellea.py) | Adapter function invocation through Mellea |
 
 
 ## Adapter Libraries
@@ -93,9 +93,9 @@ Granite Switch checkpoints embed adapters drawn from IBM's granitelib libraries.
 
 | Adapter | Purpose | Where used in tutorials | HF repo |
 |---------|---------|-------------------------|---------|
-| Core | Foundational post-generation adapters: certainty scoring, requirement checking, and response attribution. | [granite_switch_with_hf](notebooks/granite_switch_with_hf.ipynb), [compose_granite_switch](notebooks/compose_granite_switch.ipynb) | [ibm-granite/granitelib-core-r1.0](https://huggingface.co/ibm-granite/granitelib-core-r1.0) |
-| RAG | Retrieval-augmented generation adapters: query rewrite, answerability, hallucination detection, and citation generation. | [hello_mellea](notebooks/hello_mellea.ipynb), [rag_101](notebooks/rag_101.ipynb), [rag_full_pipeline](notebooks/rag_full_pipeline.ipynb), [compose_granite_switch](notebooks/compose_granite_switch.ipynb) | [ibm-granite/granitelib-rag-r1.0](https://huggingface.co/ibm-granite/granitelib-rag-r1.0) |
-| Guardian | Safety and risk detection: harm, social bias, jailbreaking, factuality, and policy compliance checks. | [hello_adapter](notebooks/hello_adapter.ipynb), [hello_mellea](notebooks/hello_mellea.ipynb), [granite_switch_with_hf](notebooks/granite_switch_with_hf.ipynb), [rag_full_pipeline](notebooks/rag_full_pipeline.ipynb), [compose_granite_switch](notebooks/compose_granite_switch.ipynb) | [ibm-granite/granitelib-guardian-r1.0](https://huggingface.co/ibm-granite/granitelib-guardian-r1.0) |
+| Core | Foundational post-generation adapter functions: certainty scoring, requirement checking, and response attribution. | [granite_switch_with_hf](notebooks/granite_switch_with_hf.ipynb), [compose_granite_switch](notebooks/compose_granite_switch.ipynb) | [ibm-granite/granitelib-core-r1.0](https://huggingface.co/ibm-granite/granitelib-core-r1.0) |
+| RAG | Retrieval-augmented generation adapter functions: query rewrite, answerability, hallucination detection, and citation generation. | [hello_mellea](notebooks/hello_mellea.ipynb), [rag_101](notebooks/rag_101.ipynb), [rag_full_flow](notebooks/rag_full_flow.ipynb), [compose_granite_switch](notebooks/compose_granite_switch.ipynb) | [ibm-granite/granitelib-rag-r1.0](https://huggingface.co/ibm-granite/granitelib-rag-r1.0) |
+| Guardian | Safety and risk detection: harm, social bias, jailbreaking, factuality, and policy compliance checks. | [hello_adapter](notebooks/hello_adapter.ipynb), [hello_mellea](notebooks/hello_mellea.ipynb), [granite_switch_with_hf](notebooks/granite_switch_with_hf.ipynb), [rag_full_flow](notebooks/rag_full_flow.ipynb), [compose_granite_switch](notebooks/compose_granite_switch.ipynb) | [ibm-granite/granitelib-guardian-r1.0](https://huggingface.co/ibm-granite/granitelib-guardian-r1.0) |
 
 ## External Resources
 
diff --git a/tutorials/guides/bring_your_own_adapter.md b/tutorials/guides/build_your_own_adapter.md
similarity index 100%
rename from tutorials/guides/bring_your_own_adapter.md
rename to tutorials/guides/build_your_own_adapter.md
diff --git a/tutorials/guides/compare_inference_throughput.md b/tutorials/guides/compare_inference_throughput.md
index b4a14c1..961172f 100644
--- a/tutorials/guides/compare_inference_throughput.md
+++ b/tutorials/guides/compare_inference_throughput.md
@@ -87,4 +87,4 @@ raced simultaneously.
 
 - **[Hello Adapter](../notebooks/hello_adapter.ipynb)** - minimal embedded-adapter invocation via the HuggingFace backend
 - **[Using Mellea with Granite Switch](mellea_with_granite_switch.md)** - deeper Mellea integration details
-- **[Bring Your Own Adapter](bring_your_own_adapter.md)** - train a custom adapter and compose it in
+- **[Bring Your Own Adapter](build_your_own_adapter.md)** - train a custom adapter and compose it in
diff --git a/tutorials/guides/mellea_bring_your_own_adapter.md b/tutorials/guides/mellea_build_your_own_adapter.md
similarity index 98%
rename from tutorials/guides/mellea_bring_your_own_adapter.md
rename to tutorials/guides/mellea_build_your_own_adapter.md
index 9fb83ee..46779ef 100644
--- a/tutorials/guides/mellea_bring_your_own_adapter.md
+++ b/tutorials/guides/mellea_build_your_own_adapter.md
@@ -6,7 +6,7 @@ This guide explains how to configure your own adapter with Mellea to be used by
 
 Together, Mellea + Granite Switch + vLLM provide a production-ready inference stack for adapter-based AI applications that can utilize custom adapters.
 - See [Mellea With Granite Switch](mellea_with_granite_switch.md) for a detailed explanation of how granite-switch and Mellea work together.
-- See [Bring Your Own Adapter](bring_your_own_adapter.md) for info on how to train your own adapter.
+- See [Bring Your Own Adapter](build_your_own_adapter.md) for info on how to train your own adapter.
 - See Mellea's [Lora and aLoRA adapters](https://docs.mellea.ai/advanced/lora-and-alora-adapters) for info on how to train your own custom adapters using Mellea.
 
 ## Prerequisites
diff --git a/tutorials/guides/mellea_with_granite_switch.md b/tutorials/guides/mellea_with_granite_switch.md
index cba5888..e9946fd 100644
--- a/tutorials/guides/mellea_with_granite_switch.md
+++ b/tutorials/guides/mellea_with_granite_switch.md
@@ -247,7 +247,7 @@ print(f"Citations: {citations}")
 ## Next Steps
 
 - **[Hello Adapter](../notebooks/hello_adapter.ipynb)** - Minimal embedded-adapter invocation via the HuggingFace backend
-- **[Bring Your Own Adapter](bring_your_own_adapter.md)** - Train a custom adapter and compose it in
+- **[Bring Your Own Adapter](build_your_own_adapter.md)** - Train a custom adapter and compose it in
 - **[Compare Inference Throughput](compare_inference_throughput.md)** - Benchmark ALORA vs LoRA on a 6-step RAG pipeline
 - **[Mellea Repository](https://github.com/generative-computing/mellea)** - Full documentation
 - **[Granite Models](https://huggingface.co/ibm-granite)**
diff --git a/tutorials/notebooks/compose_granite_switch.ipynb b/tutorials/notebooks/compose_granite_switch.ipynb
index 2866c87..bcb7697 100644
--- a/tutorials/notebooks/compose_granite_switch.ipynb
+++ b/tutorials/notebooks/compose_granite_switch.ipynb
@@ -257,7 +257,7 @@
    "cell_type": "markdown",
    "id": "generate-md",
    "metadata": {},
-   "source": "## 6 * Generate against the composed model\n\nConnect Mellea to the running vLLM server, register the embedded adapters, and call the `rewrite_question` adapter. If it prints a cleaned-up version of the messy query, your composed checkpoint is wired up correctly."
+   "source": "## 6 * Generate against the composed model\n\nConnect Mellea to the running vLLM server, register the embedded adapters, and call the `rewrite_question` adapter function. If it prints a cleaned-up version of the messy query, your composed checkpoint is wired up correctly."
   },
   {
    "cell_type": "code",
@@ -304,4 +304,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 5
-}
\ No newline at end of file
+}
diff --git a/tutorials/notebooks/granite_switch_with_hf.ipynb b/tutorials/notebooks/granite_switch_with_hf.ipynb
index 41e0ae9..c8b68d5 100644
--- a/tutorials/notebooks/granite_switch_with_hf.ipynb
+++ b/tutorials/notebooks/granite_switch_with_hf.ipynb
@@ -3,7 +3,7 @@
   {
    "metadata": {},
    "cell_type": "markdown",
-   "source": "# Granite Switch with HuggingFace\n\n**Duration:** ~10 min (after model download)\n\nA Granite Switch checkpoint bundles a base model with many LoRA experts. You pick one per forward pass by passing its name to the chat template.\n\n*Why HuggingFace:* this notebook uses the `transformers` backend for familiarity - every call is a standard `model.generate()`. Production workloads should switch to vLLM for 10-20x speedup; see [`rag_101.ipynb`](./rag_101.ipynb).\n\n**What you'll build:** one growing conversation about *Horizon 2055 Target Date Fund* (a fictional fund whose prospectus is the retrieved context), where each natural turn demonstrates a different embedded adapter.\n\n**What you'll learn:**\n- How to load a composed Granite Switch checkpoint via `AutoModelForCausalLM` - no `trust_remote_code=True`.\n- How to invoke any embedded adapter with `tokenizer.apply_chat_template(..., adapter_name=...)`.\n- The two parts of every adapter call: the LoRA switch, and the adapter-specific content protocol (criteria strings, control tokens, tagged sentences).\n- How guardian-family adapters act as *judges* over a side conversation without polluting the main chat history.\n\n**Adapters used:** adapters from the [Core](https://huggingface.co/ibm-granite/granitelib-core-r1.0) library (`context-attribution`, `uncertainty`, `requirement-check`) and the [Guardian](https://huggingface.co/ibm-granite/granitelib-guardian-r1.0) library (`guardian-core`, `policy-guardrails`, `factuality-detection`, `factuality-correction`).\n\n## Prerequisites\n\n1. **Install dependencies** (GPU recommended; CPU works but slow):",
+   "source": "# Granite Switch with HuggingFace\n\n**Duration:** ~10 min (after model download)\n\nA Granite Switch checkpoint bundles a base model with many LoRA experts. You pick one per forward pass by passing its name to the chat template.\n\n*Why HuggingFace:* this notebook uses the `transformers` backend for familiarity - every call is a standard `model.generate()`. Production workloads should switch to vLLM for a 10-20x speedup; see [`rag_101.ipynb`](./rag_101.ipynb).\n\n**What you'll build:** one growing conversation about *Horizon 2055 Target Date Fund* (a fictional fund whose prospectus is the retrieved context), where each natural turn demonstrates a different embedded adapter function.\n\n**What you'll learn:**\n- How to load a composed Granite Switch checkpoint via `AutoModelForCausalLM` - no `trust_remote_code=True`.\n- How to invoke any embedded adapter function with `tokenizer.apply_chat_template(..., adapter_name=...)`.\n- The two parts of every adapter call: the LoRA switch, and the adapter-specific content protocol (criteria strings, control tokens, tagged sentences).\n- How guardian-family adapter functions act as *judges* over a side conversation without polluting the main chat history.\n\n**Adapters used:** adapters from the [Core](https://huggingface.co/ibm-granite/granitelib-core-r1.0) library (`context-attribution`, `uncertainty`, `requirement-check`) and the [Guardian](https://huggingface.co/ibm-granite/granitelib-guardian-r1.0) library (`guardian-core`, `policy-guardrails`, `factuality-detection`, `factuality-correction`).\n\n## Prerequisites\n\n1. **Install dependencies** (GPU recommended; CPU works but is slow):",
    "id": "d5ed1e5ac8582c60"
   },
   {
@@ -17,7 +17,7 @@
   {
    "metadata": {},
    "cell_type": "markdown",
-   "source": "2. **Get a composed Granite Switch model.** Easiest: the pre-composed `ibm-granite/granite-switch-4.1-3b-preview` on HuggingFace (used by default below). To compose your own, see [`compose_granite_switch.ipynb`](./compose_granite_switch.ipynb).\n3. **HuggingFace auth** (if artifacts are gated): `huggingface-cli login` or export `HF_TOKEN=...`.\n\nFull setup details (GPU sizes, disk requirements, multi-GPU) are in [`PREREQUISITES.md`](../PREREQUISITES.md).\n\n---\n\n## Why This Tutorial Uses HuggingFace\n\n**Goal:** Understand how Granite Switch adapters work at the control-token level.\n\nThis notebook demonstrates:\n- Direct `model.generate()` calls with `adapter_name=` parameter\n- Manual prompt construction with `tokenizer.apply_chat_template()`\n- Raw JSON parsing of adapter outputs\n- Low-level adapter invocation mechanics\n\n**For production use:** See [hello_mellea.ipynb](./hello_mellea.ipynb) for:\n- 3-5 lines of code per adapter (vs 10-30 here)\n- Type-safe outputs (Pydantic models vs raw JSON)\n- 10-20x faster vLLM inference\n- High-level abstractions for easier development\n\n**Learning path:** Start with [hello_mellea](./hello_mellea.ipynb) for concepts → return here for low-level mechanics.",
+   "source": "2. **Get a composed Granite Switch model.** Easiest: the pre-composed `ibm-granite/granite-switch-4.1-3b-preview` on HuggingFace (used by default below). To compose your own, see [`compose_granite_switch.ipynb`](./compose_granite_switch.ipynb).\n3. **HuggingFace auth** (if artifacts are gated): `huggingface-cli login` or export `HF_TOKEN=...`.\n\nFull setup details (GPU sizes, disk requirements, multi-GPU) are in [`PREREQUISITES.md`](../PREREQUISITES.md).\n\n---\n\n## Why This Tutorial Uses HuggingFace\n\n**Goal:** Understand how Granite Switch adapters work at the control-token level.\n\nThis notebook demonstrates:\n- Direct `model.generate()` calls with `adapter_name=` parameter\n- Manual prompt construction with `tokenizer.apply_chat_template()`\n- Raw JSON parsing of adapter outputs\n- Low-level adapter function invocation mechanics\n\n**For production use:** See [hello_mellea.ipynb](./hello_mellea.ipynb) for:\n- 3-5 lines of code per adapter (vs 10-30 here)\n- Type-safe outputs (Pydantic models vs raw JSON)\n- 10-20x faster vLLM inference\n- High-level abstractions for easier development\n\n**Learning path:** Start with [hello_mellea](./hello_mellea.ipynb) for concepts → return here for low-level mechanics.",
    "id": "a96b6c9946ef1d89"
   },
   {
@@ -106,7 +106,7 @@
   {
    "metadata": {},
    "cell_type": "markdown",
-   "source": "## * 3 How to invoke an adapter\n\nEach invocation has two parts: the LoRA switch (`adapter_name=` in `tokenizer.apply_chat_template`, which inserts a special token into the prompt telling granite-switch which adapter to use), and an adapter-specific prompt that you build into the message content per the adapter's README.\n\nIn the cell below, you can see an example of the rendered prompt produced after applying the chat template, showing exactly what is sent to the model when the `guardian-core` adapter is selected.",
+   "source": "## * 3 How to invoke an adapter function\n\nEach invocation has two parts: the LoRA switch (`adapter_name=` in `tokenizer.apply_chat_template`, which inserts a special token into the prompt telling granite-switch which adapter to use), and an adapter-specific prompt that you build into the message content per the adapter's README.\n\nIn the cell below, you can see an example of the rendered prompt produced after applying the chat template, showing exactly what is sent to the model when the `guardian-core` adapter function is selected.",
    "id": "d51ccd9c29a39452"
   },
   {
@@ -125,7 +125,7 @@
   {
    "metadata": {},
    "cell_type": "markdown",
-   "source": "## * 4 Helpers and adapter schemas\n\nWe import helper functions from `granite_switch.tutorials.utils.hf_helpers` to keep the notebook focused on adapter concepts rather than implementation details. The helpers handle:\n- `generate_turn()` - Render chat prompt + generate response\n- `screen_user_message()` - Guardian-core jailbreak screening\n- `run_context_attribution()` - Sentence tagging for context-attribution\n- `say_user()` / `say_assistant()` - Conversation management\n- `show_conversation_as_markdown()` - Display helper\n\n**Implementation note:** For the full implementation of these helpers, see [`hf_helpers.py`](../../src/granite_switch/tutorials/utils/hf_helpers.py).\n\nWe also define adapter-specific constants (criteria strings, schemas, instructions) upfront so adapter invocations below are more readable.",
+   "source": "## * 4 Helpers and adapter schemas\n\nWe import helper functions from `granite_switch.tutorials.utils.hf_helpers` to keep the notebook focused on adapter function concepts rather than implementation details. The helpers handle:\n- `generate_turn()` - Render chat prompt + generate response\n- `screen_user_message()` - Guardian-core jailbreak screening\n- `run_context_attribution()` - Sentence tagging for context-attribution\n- `say_user()` / `say_assistant()` - Conversation management\n- `show_conversation_as_markdown()` - Display helper\n\n**Implementation note:** For the full implementation of these helpers, see [`hf_helpers.py`](../../src/granite_switch/tutorials/utils/hf_helpers.py).\n\nWe also define adapter-specific constants (criteria strings, schemas, instructions) upfront so adapter function invocations below are more readable.",
    "id": "d9cf94d3c7b40d62"
   },
   {
@@ -196,7 +196,7 @@
    "source": [
     "## Turn 2 - \"What's a glide path?\" -> `uncertainty`\n",
     "\n",
-    "Invoke `uncertainty` by appending one user turn whose entire content is `<certainty>`. The adapter returns a digit 0-9 that maps to calibrated probability via `0.1*d + 0.05`."
+    "Invoke `uncertainty` by appending one user turn whose entire content is `<certainty>`. The adapter function returns a digit 0-9 that maps to a calibrated probability via `0.1*d + 0.05`."
    ],
    "id": "5e773d9d08b0f86e"
   },
@@ -328,4 +328,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 5
-}
\ No newline at end of file
+}
diff --git a/tutorials/notebooks/hello_adapter.ipynb b/tutorials/notebooks/hello_adapter.ipynb
index ade687d..d733067 100644
--- a/tutorials/notebooks/hello_adapter.ipynb
+++ b/tutorials/notebooks/hello_adapter.ipynb
@@ -3,7 +3,7 @@
   {
    "metadata": {},
    "cell_type": "markdown",
-   "source": "# Hello Adapter - Granite Switch with HuggingFace\n\n**Duration:** ~5 min\n\nMinimal example of invoking an **embedded LoRA adapter** inside a **Granite Switch** model, using the HuggingFace backend. This notebook uses the **guardian-core** adapter, which evaluates a message against a safety criterion and returns a structured `yes`/`no` score.\n\n**What you'll learn:**\n- How to build a single guardian-core call that scores a user message against a safety criterion and prints a parsed `harmful`/`safe` verdict.\n- How to load a composed Granite Switch checkpoint with `transformers`.\n- How to activate an adapter by passing `adapter_name=...` to `apply_chat_template`.\n- The Guardian prompt protocol - how to frame a criterion so the adapter returns a parseable score.\n\n**Adapters used:** `guardian-core` from the [Guardian](https://huggingface.co/ibm-granite/granitelib-guardian-r1.0) library - a general-purpose safety/risk judge that scores any user-supplied criterion (harm, social bias, jailbreaking, groundedness, ...) as `yes`/`no`.\n\nFor the recommended inference path (mellea + vLLM), see [`hello_mellea.ipynb`](./hello_mellea.ipynb). This notebook intentionally uses HuggingFace to show the underlying control-token mechanics.\n\n## Prerequisites\n\n**1 * A composed Granite Switch checkpoint** with the `guardian-core` adapter. The default `MODEL_PATH` below points at the pre-composed `ibm-granite/granite-switch-4.1-3b-preview` on HuggingFace (drawn from the [IBM Granite 4.1 collection](https://huggingface.co/collections/ibm-granite/granite-41-language-models)). To compose your own checkpoint instead, see [`./compose_granite_switch.ipynb`](./compose_granite_switch.ipynb) and point `MODEL_PATH` at its output directory.\n\n**2 * Dependencies** (CUDA GPU required):",
+   "source": "# Hello Adapter - Granite Switch with HuggingFace\n\n**Duration:** ~5 min\n\nMinimal example of invoking an **embedded LoRA adapter** inside a **Granite Switch** model, using the HuggingFace backend. This notebook uses the **guardian-core** adapter, which evaluates a message against a safety criterion and returns a structured `yes`/`no` score.\n\n**What you'll learn:**\n- How to build a single guardian-core call that scores a user message against a safety criterion and prints a parsed `harmful`/`safe` verdict.\n- How to load a composed Granite Switch checkpoint with `transformers`.\n- How to activate an adapter function by passing `adapter_name=...` to `apply_chat_template`.\n- The Guardian prompt protocol - how to frame a criterion so the adapter returns a parseable score.\n\n**Adapters used:** `guardian-core` from the [Guardian](https://huggingface.co/ibm-granite/granitelib-guardian-r1.0) library - a general-purpose safety/risk judge that scores any user-supplied criterion (harm, social bias, jailbreaking, groundedness, ...) as `yes`/`no`.\n\nFor the recommended inference path (mellea + vLLM), see [`hello_mellea.ipynb`](./hello_mellea.ipynb). This notebook intentionally uses HuggingFace to show the underlying control-token mechanics.\n\n## Prerequisites\n\n**1 * A composed Granite Switch checkpoint** with the `guardian-core` adapter function. The default `MODEL_PATH` below points at the pre-composed `ibm-granite/granite-switch-4.1-3b-preview` on HuggingFace (drawn from the [IBM Granite 4.1 collection](https://huggingface.co/collections/ibm-granite/granite-41-language-models)). To compose your own checkpoint instead, see [`./compose_granite_switch.ipynb`](./compose_granite_switch.ipynb) and point `MODEL_PATH` at its output directory.\n\n**2 * Dependencies** (CUDA GPU required):",
    "id": "97c76dcca207b140"
   },
   {
@@ -116,7 +116,7 @@
    "metadata": {},
    "cell_type": "markdown",
    "source": [
-    "## 5 * Invoke the adapter\n",
+    "## 5 * Invoke the adapter function\n",
     "This is the key moment: `adapter_name=ADAPTER_NAME` tells `apply_chat_template` to insert the adapter's control token into the prompt. At inference time the Granite Switch model reads that control token and routes the relevant LoRA weights into attention."
    ],
    "id": "84f66102f3a36d4c"
@@ -146,11 +146,11 @@
   {
    "metadata": {},
    "cell_type": "markdown",
-   "source": "## 7 * Next steps\n\n- **Try the Mellea path.** [`hello_mellea.ipynb`](./hello_mellea.ipynb) runs the same adapter through Mellea's wrappers on vLLM - constrained decoding and output parsing come for free.\n- **Go deeper on HF mechanics.** [`granite_switch_with_hf.ipynb`](./granite_switch_with_hf.ipynb) walks through composing a checkpoint and invoking adapters turn-by-turn with the HuggingFace backend.\n- **Try a real corpus.** [`rag_101.ipynb`](./rag_101.ipynb) builds a vector corpus and runs an answerability check - the smallest end-to-end RAG demo.\n- **Compose your own checkpoint.** [`compose_granite_switch.ipynb`](./compose_granite_switch.ipynb) - pick adapters from the IBM libraries and bake them into a single model.\n- **Watch ALORA vs LoRA race.** [`alora_vs_lora_race.ipynb`](./alora_vs_lora_race.ipynb) compares the two activation styles head-to-head on the same workload.",
+   "source": "## 7 * Next steps\n\n- **Try the Mellea path.** [`hello_mellea.ipynb`](./hello_mellea.ipynb) runs the same adapter function through Mellea's wrappers on vLLM - constrained decoding and output parsing come for free.\n- **Go deeper on HF mechanics.** [`granite_switch_with_hf.ipynb`](./granite_switch_with_hf.ipynb) walks through composing a checkpoint and invoking adapter functions turn-by-turn with the HuggingFace backend.\n- **Try a real corpus.** [`rag_101.ipynb`](./rag_101.ipynb) builds a vector corpus and runs an answerability check - the smallest end-to-end RAG demo.\n- **Compose your own checkpoint.** [`compose_granite_switch.ipynb`](./compose_granite_switch.ipynb) - pick adapters from the IBM libraries and bake them into a single model.\n- **Watch ALORA vs LoRA race.** [`alora_vs_lora_race.ipynb`](./alora_vs_lora_race.ipynb) compares the two activation styles head-to-head on the same workload.",
    "id": "6dbd5a8bf3aaaf37"
   }
  ],
  "metadata": {},
  "nbformat": 4,
  "nbformat_minor": 5
-}
\ No newline at end of file
+}
diff --git a/tutorials/notebooks/hello_mellea.ipynb b/tutorials/notebooks/hello_mellea.ipynb
index fb5ba48..d01b436 100644
--- a/tutorials/notebooks/hello_mellea.ipynb
+++ b/tutorials/notebooks/hello_mellea.ipynb
@@ -4,7 +4,7 @@
    "cell_type": "markdown",
    "id": "intro",
    "metadata": {},
-   "source": "# Hello World - Using Mellea with Granite Switch\n\n**Duration:** ~5 min after the model server is ready\n\nMinimal example of invoking **mellea adapters** against a **Granite Switch** model served by vLLM. This notebook demos two capabilities - **Guardian** (harm check) and **RAG** (rewrite, answerability, clarification, citations).\n\n[Mellea](https://github.com/generative-computing/mellea) is IBM's library for writing Generative Programs. In this context, Granite Switch is the model (base + embedded LoRA adapters), and mellea exposes a typed interface to its capabilities - handling constrained decoding, prompt formatting, and output parsing automatically. vLLM provides much faster inference in production environments; HF support for Granite Switch in mellea coming.\n\n**What you'll learn:**\n- How to chain guardian + rewrite + answerability + clarification + citations into a single RAG flow driven by mellea adapters.\n- How to connect a mellea `OpenAIBackend` to a vLLM server serving a Granite Switch checkpoint.\n- How to call an adapter through its high-level wrapper (`rag.rewrite_question`) vs. the low-level `Intrinsic` AST node (for adapters mellea doesn't wrap yet).\n- The difference between `CRITERIA_BANK` keys and custom criteria strings when calling `guardian_check`.\n\n**Adapters used:** adapters from the [Guardian](https://huggingface.co/ibm-granite/granitelib-guardian-r1.0) library (`guardian-core`) and the [RAG](https://huggingface.co/ibm-granite/granitelib-rag-r1.0) library (`query_rewrite`, `answerability`, `query_clarification`, `citations`).\n\nSee section 11 for the full list of adapter wrappers currently supported.\n\n---\n**Prerequisites:** GPU runtime (T4 or better). Go to *Runtime -> Change runtime type -> T4 GPU*.\n\nThis notebook launches the default pre-composed Granite Switch checkpoint, `ibm-granite/granite-switch-4.1-3b-preview`. To compose your own checkpoint, use [`compose_granite_switch.ipynb`](./compose_granite_switch.ipynb). Full setup details (GPU sizes, HF auth, multi-GPU) are in [`PREREQUISITES.md`](../PREREQUISITES.md)."
+   "source": "# Hello World - Using Mellea with Granite Switch\n\n**Duration:** ~5 min after the model server is ready\n\nMinimal example of invoking **mellea adapter functions** against a **Granite Switch** model served by vLLM. This notebook demos two capabilities - **Guardian** (harm check) and **RAG** (rewrite, answerability, clarification, citations).\n\n[Mellea](https://github.com/generative-computing/mellea) is IBM's library for writing Generative Programs. In this context, Granite Switch is the model (base + embedded LoRA adapters), and mellea exposes a typed interface to its capabilities - handling constrained decoding, prompt formatting, and output parsing automatically. vLLM provides much faster inference in production environments; HF support for Granite Switch in mellea coming.\n\n**What you'll learn:**\n- How to chain guardian + rewrite + answerability + clarification + citations into a single RAG flow driven by mellea adapter functions.\n- How to connect a mellea `OpenAIBackend` to a vLLM server serving a Granite Switch checkpoint.\n- How to call an adapter function through its high-level wrapper (`rag.rewrite_question`) vs. the low-level `Intrinsic` AST node (for adapters mellea doesn't wrap yet).\n- The difference between `CRITERIA_BANK` keys and custom criteria strings when calling `guardian_check`.\n\n**Adapters used:** adapters from the [Guardian](https://huggingface.co/ibm-granite/granitelib-guardian-r1.0) library (`guardian-core`) and the [RAG](https://huggingface.co/ibm-granite/granitelib-rag-r1.0) library (`query_rewrite`, `answerability`, `query_clarification`, `citations`).\n\nSee section 11 for the full list of adapter function wrappers currently supported.\n\n---\n**Prerequisites:** GPU runtime (T4 or better). Go to *Runtime -> Change runtime type -> T4 GPU*.\n\nThis notebook launches the default pre-composed Granite Switch checkpoint, `ibm-granite/granite-switch-4.1-3b-preview`. To compose your own checkpoint, use [`compose_granite_switch.ipynb`](./compose_granite_switch.ipynb). Full setup details (GPU sizes, HF auth, multi-GPU) are in [`PREREQUISITES.md`](../PREREQUISITES.md)."
   },
   {
    "cell_type": "markdown",
@@ -127,7 +127,7 @@
    "metadata": {},
    "source": [
     "## 3 · Connect to vLLM backend via mellea\n",
-    "Registers the Granite Switch embedded adapters so mellea adapter calls route through the correct control tokens."
+    "Registers the Granite Switch embedded adapter functions so mellea adapter function calls route through the correct control tokens."
    ]
   },
   {
@@ -257,7 +257,7 @@
    "source": [
     "### 6b · Same thing without the wrapper\n",
     "\n",
-    "`rag.rewrite_question` above is a convenience wrapper around the lower-level `Intrinsic` AST node. Here we do **the same action** - invoke the `query_rewrite` adapter - but explicitly name the adapter and drive it through `mfuncs.act`. Useful when you want to invoke an adapter mellea doesn't wrap yet, or to understand what the wrapper does under the hood."
+    "`rag.rewrite_question` above is a convenience wrapper around the lower-level `Intrinsic` AST node. Here we do **the same action** - invoke the `query_rewrite` adapter function - but explicitly name the adapter and drive it through `mfuncs.act`. Useful when you want to invoke an adapter function mellea doesn't wrap yet, or to understand what the wrapper does under the hood."
    ]
   },
   {
@@ -375,9 +375,9 @@
    "id": "reference-intrinsics",
    "metadata": {},
    "source": [
-    "## 11 · Other mellea adapter wrappers\n",
+    "## 11 · Other mellea adapter function wrappers\n",
     "\n",
-    "Beyond what this notebook demos, Mellea ships wrappers for additional adapters. The list below reflects **what's currently supported** - new adapters can be added over time as the library evolves. All wrappers follow the same shape - they take a `ChatContext` and a `backend`, and internally drive a named adapter through an `Intrinsic` AST node (see section 6b). A composed Granite Switch checkpoint only needs to include the adapters you plan to call.\n",
+    "Beyond what this notebook demos, Mellea ships wrappers for additional adapter functions. The list below reflects **what's currently supported** - new adapter functions can be added over time as the library evolves. All wrappers follow the same shape - they take a `ChatContext` and a `backend`, and internally drive a named adapter through an `Intrinsic` AST node (see section 6b). A composed Granite Switch checkpoint only needs to include the adapters you plan to call.\n",
     "\n",
     "**Currently supported wrappers:**\n",
     "\n",
@@ -407,7 +407,7 @@
    "source": [
     "## 12 · Next steps\n",
     "\n",
-    "- **Go deeper on HF mechanics.** [`granite_switch_with_hf.ipynb`](./granite_switch_with_hf.ipynb) walks through composing a checkpoint and invoking adapters turn-by-turn with the HuggingFace backend.\n",
+    "- **Go deeper on HF mechanics.** [`granite_switch_with_hf.ipynb`](./granite_switch_with_hf.ipynb) walks through composing a checkpoint and invoking adapter functions turn-by-turn with the HuggingFace backend.\n",
     "- **Try a real corpus.** [`rag_101.ipynb`](./rag_101.ipynb) builds a vector corpus and runs an answerability check - the smallest end-to-end RAG demo.\n",
     "- **Compose your own checkpoint.** [`compose_granite_switch.ipynb`](./compose_granite_switch.ipynb) - pick adapters from the IBM libraries and bake them into a single model.\n",
     "- **Watch ALORA vs LoRA race.** [`alora_vs_lora_race.ipynb`](./alora_vs_lora_race.ipynb) compares the two activation styles head-to-head on the same workload.\n",
@@ -427,4 +427,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 5
-}
\ No newline at end of file
+}
diff --git a/tutorials/notebooks/rag_101.ipynb b/tutorials/notebooks/rag_101.ipynb
index f4a40b6..01d48f8 100644
--- a/tutorials/notebooks/rag_101.ipynb
+++ b/tutorials/notebooks/rag_101.ipynb
@@ -214,7 +214,7 @@
    "metadata": {},
    "source": [
     "## 4 · Connect to vLLM via mellea\n",
-    "Registers the Granite Switch embedded adapters so `rag.check_answerability` routes to the correct control token."
+    "Registers the Granite Switch embedded adapter functions so `rag.check_answerability` routes to the correct control token."
    ]
   },
   {
@@ -299,7 +299,7 @@
    "source": [
     "## 6 · Next steps\n",
     "\n",
-    "- **Add the rest of the pipeline.** [`rag_full_pipeline.ipynb`](./rag_full_pipeline.ipynb) layers query rewrite, clarification, grounded generation, citations, and guardian harm + scope checks on top of the same corpus and answerability check.\n",
+    "- **Add the rest of the pipeline.** [`rag_full_flow.ipynb`](./rag_full_flow.ipynb) layers query rewrite, clarification, grounded generation, citations, and guardian harm + scope checks on top of the same corpus and answerability check.\n",
     "- **Compose your own checkpoint.** [`compose_granite_switch.ipynb`](./compose_granite_switch.ipynb) walks through building a Granite Switch model from the IBM adapter libraries.\n",
     "- **Watch ALORA vs LoRA race.** [`alora_vs_lora_race.ipynb`](./alora_vs_lora_race.ipynb) compares the two activation styles head-to-head on the same workload."
    ]
diff --git a/tutorials/notebooks/rag_full_pipeline.ipynb b/tutorials/notebooks/rag_full_flow.ipynb
similarity index 94%
rename from tutorials/notebooks/rag_full_pipeline.ipynb
rename to tutorials/notebooks/rag_full_flow.ipynb
index 1cef1f9..0de91db 100644
--- a/tutorials/notebooks/rag_full_pipeline.ipynb
+++ b/tutorials/notebooks/rag_full_flow.ipynb
@@ -11,7 +11,7 @@
     "\n",
     "**Duration:** ~30 min (first run, includes corpus embedding)\n",
     "\n",
-    "This notebook demonstrates a **conversational RAG pipeline** where every AI capability — guardian checks, query rewriting, retrieval-grounded answering, citations — runs through a single vLLM endpoint. The intrinsics are embedded adapters inside the Granite Switch model, activated by control tokens at inference time.\n",
+    "This notebook demonstrates a **conversational RAG pipeline** where every AI capability — guardian checks, query rewriting, retrieval-grounded answering, citations — runs through a single vLLM endpoint. The intrinsics are embedded adapter functions inside the Granite Switch model, activated by control tokens at inference time.\n",
     "\n",
     "*Why vLLM:* much faster inference in production environments; HF support for Granite Switch in mellea coming.\n",
     "\n",
@@ -20,7 +20,7 @@
     "- How to chain multiple intrinsics (guardian, query rewrite, answerability, clarification, grounded generation, citations) into one RAG pipeline.\n",
     "- How control tokens route each intrinsic call to the right embedded adapter without loading separate models.\n",
     "- How to handle the four terminal states — `blocked`, `unanswerable`, `needs_clarification`, and `done` — in a stateful conversation.\n",
-    "- How to lift `run_pipeline` out of this notebook and drop it into your own app.\n",
+    "- How to lift `run_conversation_turn` out of this notebook and drop it into your own app.\n",
     "\n",
     "---\n",
     "**Prerequisites:** GPU runtime (T4 or better). Go to *Runtime → Change runtime type → T4 GPU*.\n",
@@ -107,7 +107,7 @@
    "id": "1a005f00099526ee",
    "metadata": {},
    "source": [
-    "**Intrinsics used in this pipeline:** each row is one embedded adapter, invoked via mellea.\n",
+    "**Intrinsics used in this pipeline:** each row is one embedded adapter function, invoked via mellea.\n",
     "\n",
     "| Intrinsic | Role in the pipeline |\n",
     "|-----------|----------------------|\n",
@@ -283,7 +283,7 @@
    "metadata": {},
    "source": [
     "## 4 · Connect to vLLM backend\n",
-    "Registers the Granite Switch embedded adapters from `GRANITE_SWITCH_SOURCE`\n",
+    "Registers the Granite Switch embedded adapter functions from `GRANITE_SWITCH_SOURCE`\n",
     "so all intrinsics (guardian, RAG, citations) route through the correct control tokens."
    ]
   },
@@ -309,7 +309,7 @@
    "metadata": {},
    "source": [
     "## 5 · The pipeline function\n",
-    "`run_pipeline(query, ctx)` is the whole pipeline - guardian, rewrite, retrieve, answerability, clarify, answer, citations - with one exit per terminal state. Sub-cell 6a quiets mellea's INFO/WARNING logs so the pipeline output is readable; the display helpers themselves were imported in section 3."
+    "`run_conversation_turn(query, ctx)` is the whole pipeline - guardian, rewrite, retrieve, answerability, clarify, answer, citations - with one exit per terminal state. Sub-cell 6a quiets mellea's INFO/WARNING logs so the pipeline output is readable; the display helpers themselves were imported in section 3."
    ]
   },
   {
@@ -342,7 +342,7 @@
     "\n",
     "\n",
     "# ── Full pipeline ───────────────────────────────────────────────────────────────────────────────\n",
-    "def run_pipeline(query, ctx):\n",
+    "def run_conversation_turn(query, ctx):\n",
     "    \"\"\"Run one turn of the RAG pipeline.\n",
     "\n",
     "    Prints the answer, appends the turn to `ctx` (unless blocked), and\n",
@@ -425,7 +425,7 @@
     "    return ctx, r\n",
     "\n",
     "\n",
-    "print(\"run_pipeline ready.\")"
+    "print(\"run_conversation_turn ready.\")"
    ]
   },
   {
@@ -461,7 +461,7 @@
    "source": [
     "## 6 · Queries\n",
     "Each cell is one turn. History accumulates automatically.\n",
-    "- `run_pipeline(query, ctx)` - run pipeline, show the final answer, update history, return `(ctx, r)`.\n",
+    "- `run_conversation_turn(query, ctx)` - run pipeline, show the final answer, update history, return `(ctx, r)`.\n",
     "- `show_intermediates(r)` - step-by-step breakdown for any result.\n",
     "- `show_history(conv)` - print the full conversation so far.\n",
     "\n",
@@ -496,7 +496,7 @@
     "# which one. The rewriter is correctly a no-op (rewriting away the ambiguity would\n",
     "# defeat the clarification step).\n",
     "ctx = ChatContext()\n",
-    "ctx, r1 = run_pipeline(\"How long does it take for the government service to refund?\", ctx)\n",
+    "ctx, r1 = run_conversation_turn(\"How long does it take for the government service to refund?\", ctx)\n",
     "show_intermediates(r1)"
    ]
   },
@@ -510,7 +510,7 @@
     "# Q2 - resolves the clarification: a 2-token reply (\"The IRS\") is enough for the\n",
     "# rewriter to expand into a full standalone query using Q1 history, which then\n",
     "# retrieves IRS-specific docs and produces a grounded answer.\n",
-    "ctx, r2 = run_pipeline(\"The IRS\", ctx)\n",
+    "ctx, r2 = run_conversation_turn(\"The IRS\", ctx)\n",
     "show_intermediates(r2)"
    ]
   },
@@ -524,7 +524,7 @@
     "# Q3 - history-aware rewrite: \"paper return instead\" only makes sense relative to\n",
     "# the IRS-refund thread established in Q1-Q2. The rewrite adapter uses ctx to\n",
     "# produce a standalone query about paper returns, which is what gets sent to retrieval.\n",
-    "ctx, r3 = run_pipeline(\"What if I'm filing a paper return instead?\", ctx)\n",
+    "ctx, r3 = run_conversation_turn(\"What if I'm filing a paper return instead?\", ctx)\n",
     "show_intermediates(r3)"
    ]
   },
@@ -538,7 +538,7 @@
     "# Q4 - cross-turn pronoun resolution: \"amending it\" - the rewriter resolves \"it\"\n",
     "# to the paper return from Q3's rewritten query, demonstrating that ctx flows\n",
     "# through multiple turns of history.\n",
-    "ctx, r4 = run_pipeline(\"And what's the deadline for amending it?\", ctx)\n",
+    "ctx, r4 = run_conversation_turn(\"And what's the deadline for amending it?\", ctx)\n",
     "show_intermediates(r4)"
    ]
   },
@@ -553,7 +553,7 @@
     "# Rewriter resolves the pronoun (\"How much does it cost to amend a paper tax return?\")\n",
     "# but the corpus has no fee schedule, so answerability routes to unanswerable instead\n",
     "# of hallucinating a number.\n",
-    "ctx, r5 = run_pipeline(\"How much does it cost?\", ctx)\n",
+    "ctx, r5 = run_conversation_turn(\"How much does it cost?\", ctx)\n",
     "show_intermediates(r5)"
    ]
   },
@@ -565,7 +565,7 @@
    "outputs": [],
    "source": [
     "# Q6 - out of scope: weather is not a government services topic; guardian blocks it.\n",
-    "ctx, r6 = run_pipeline(\"What's the weather in New York tomorrow?\", ctx)\n",
+    "ctx, r6 = run_conversation_turn(\"What's the weather in New York tomorrow?\", ctx)\n",
     "show_intermediates(r6)"
    ]
   },
@@ -577,7 +577,7 @@
    "outputs": [],
    "source": [
     "# Q7 - harm: guardian blocks request to forge government documents.\n",
-    "ctx, r7 = run_pipeline(\"How do I forge a government ID?\", ctx)\n",
+    "ctx, r7 = run_conversation_turn(\"How do I forge a government ID?\", ctx)\n",
     "show_intermediates(r7)"
    ]
   },