8 changes: 4 additions & 4 deletions README.md
@@ -30,7 +30,7 @@ Small models with the right adapters consistently outperform much larger general
</p>

<p align="center"><em>aLoRA completes 20 of 32 RAG queries while standard LoRA is still waiting — same model, same hardware, different adapter technology.</em><br>
-<a href="https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/05_alora_vs_lora_race.ipynb">Reproduce it yourself on Colab →</a></p>
+<a href="https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/alora_vs_lora_race.ipynb">Reproduce it yourself on Colab →</a></p>

## Quick Start

@@ -114,9 +114,9 @@ New here? Start with a 5-minute notebook and work your way up:

| Notebook | What you'll build | Time | |
|---|---|---|---|
-| [Hello Mellea](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/01_hello_mellea.ipynb) | Call adapters through a clean Python API | 5 min | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/01_hello_mellea.ipynb) |
-| [RAG Pipeline](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/03_01_govt_rag_pipeline_simple.ipynb) | Query rewrite + answerability + citations in one model | 30 min | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/03_01_govt_rag_pipeline_simple.ipynb) |
-| [Compose Your Own](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/04_compose_granite_switch.ipynb) | Build a custom checkpoint from adapter libraries | 15 min | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/04_compose_granite_switch.ipynb) |
+| [Hello Mellea](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/hello_mellea.ipynb) | Call adapters through a clean Python API | 5 min | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/hello_mellea.ipynb) |
+| [RAG Pipeline](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/rag_full_pipeline.ipynb) | Query rewrite + answerability + citations in one model | 30 min | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/rag_full_pipeline.ipynb) |
+| [Compose Your Own](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/compose_granite_switch.ipynb) | Build a custom checkpoint from adapter libraries | 15 min | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/compose_granite_switch.ipynb) |

All notebooks run on Colab. See [tutorials/README.md](tutorials/README.md) for the full list and guided learning paths.

36 changes: 18 additions & 18 deletions tutorials/README.md
@@ -1,28 +1,28 @@
# Granite Switch Tutorials

-Granite Switch facilitates a modular architecture by consolidating multiple LoRA adapters into a single, unified checkpoint. The following tutorials explore the underlying mechanics and usability, detailing adapter invocation, multi-step pipelines with guardrails, and checkpoint composition.
+Granite Switch facilitates a modular architecture by consolidating multiple LoRA adapters into a single, unified checkpoint. The following tutorials explore the underlying mechanics and usability, detailing adapter function invocation, multi-step pipelines with guardrails, and checkpoint composition.

## Notebooks

-Step-by-step walkthroughs covering adapter invocation, pipeline construction, and model composition.
+Step-by-step walkthroughs covering adapter function invocation, pipeline construction, and model composition.

| Notebook | Topics | Duration | Colab |
|----------|--------|----------|-------|
-| [hello_mellea.ipynb](notebooks/hello_mellea.ipynb) | Mellea adapters intro with vLLM | 5 min | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/hello_mellea.ipynb) |
+| [hello_mellea.ipynb](notebooks/hello_mellea.ipynb) | Mellea adapter functions intro with vLLM | 5 min | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/hello_mellea.ipynb) |
| [rag_101.ipynb](notebooks/rag_101.ipynb) | RAG 101: build a vector corpus and run a basic answerability check | 15 min | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/rag_101.ipynb) |
-| [rag_full_pipeline.ipynb](notebooks/rag_full_pipeline.ipynb) | Full RAG pipeline with guardian checks (harm + scope) | 30 min | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/rag_full_pipeline.ipynb) |
+| [rag_full_flow.ipynb](notebooks/rag_full_flow.ipynb) | Full RAG pipeline with guardian checks (harm + scope) | 30 min | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/rag_full_flow.ipynb) |
| [compose_granite_switch.ipynb](notebooks/compose_granite_switch.ipynb) | Compose a checkpoint from adapter libraries | 15 min | |
| [alora_vs_lora_race.ipynb](notebooks/alora_vs_lora_race.ipynb) | ALORA vs LoRA race: side-by-side throughput comparison on a multi-step RAG pipeline | 20 min | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/alora_vs_lora_race.ipynb) |
-| [hello_adapter.ipynb](notebooks/hello_adapter.ipynb) | Minimal adapter invocation with HuggingFace | 5 min | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/hello_adapter.ipynb) |
-| [granite_switch_with_hf.ipynb](notebooks/granite_switch_with_hf.ipynb) | Compose + HuggingFace backend, `adapter_name=` invocation, Core + Guardian adapters in a multi-turn conversation | 10 min | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/granite_switch_with_hf.ipynb) |
+| [hello_adapter.ipynb](notebooks/hello_adapter.ipynb) | Minimal adapter function invocation with HuggingFace | 5 min | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/hello_adapter.ipynb) |
+| [granite_switch_with_hf.ipynb](notebooks/granite_switch_with_hf.ipynb) | Compose + HuggingFace backend, `adapter_name=` invocation, Core + Guardian adapter functions in a multi-turn conversation | 10 min | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/granite_switch_with_hf.ipynb) |
| [granite_speech_demo.ipynb](notebooks/granite_speech_demo.ipynb) | Real-time voice assistant: Granite Speech STT + Granite Switch LLM + Granite Libraries validation, orchestrated by Mellea over WebRTC | 10 min | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/granite_speech_demo.ipynb) |

## Guides

| Guide | Description |
|-------|-------------|
| [Using Mellea with Granite Switch](guides/mellea_with_granite_switch.md) | Connect Mellea to a Granite Switch model |
-| [Bring Your Own Adapter](guides/bring_your_own_adapter.md) | Train, compose, and use custom adapters |
+| [Bring Your Own Adapter](guides/build_your_own_adapter.md) | Train, compose, and use custom adapters |
| [Compare Inference Throughput](guides/compare_inference_throughput.md) | Compare LoRA vs aLoRA based models in an inference race setup |


@@ -48,29 +48,29 @@ support coming soon.

### Path 2: Real-World Pipelines (Usability)

-Best for: Seeing how adapters compose into multi-step applications
+Best for: Seeing how adapter functions compose into multi-step applications

1. [RAG 101](notebooks/rag_101.ipynb) - corpus build + answerability check, the smallest end-to-end RAG demo [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/rag_101.ipynb)
-2. [Full RAG Pipeline with Guardians](notebooks/rag_full_pipeline.ipynb) - rewrite, answerability, citations, harm + scope checks [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/rag_full_pipeline.ipynb)
+2. [Full RAG Pipeline with Guardians](notebooks/rag_full_flow.ipynb) - rewrite, answerability, citations, harm + scope checks [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/rag_full_flow.ipynb)





### Path 3: Bring Your Own Adapter

-Best for: Custom adapter development
+Best for: Custom adapter function development

-1. [Bring Your Own Adapter Guide](guides/bring_your_own_adapter.md)
-2. [Configure Your Own Adapter Guide](guides/mellea_bring_your_own_adapter.md)
+1. [Bring Your Own Adapter Guide](guides/build_your_own_adapter.md)
+2. [Configure Your Own Adapter Guide](guides/mellea_build_your_own_adapter.md)
3. [Compose Your Checkpoint](notebooks/compose_granite_switch.ipynb)


### Path 4: Low-Level Understanding (HuggingFace)

Best for: Understanding how Granite Switch works at the control-token level

-HuggingFace inference examples demonstrate how adapters are activated via control tokens, providing insight into the underlying mechanics. For most applications, we recommend running inference with Mellea (Part 2).
+HuggingFace inference examples demonstrate how adapter functions are activated via control tokens, providing insight into the underlying mechanics. For most applications, we recommend running inference with Mellea (Part 2).
1. [Prerequisites](PREREQUISITES.md#huggingface-backend)
2. [Hello Adapter](notebooks/hello_adapter.ipynb) — see control tokens in action [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/hello_adapter.ipynb)
3. [Granite Switch with HuggingFace](notebooks/granite_switch_with_hf.ipynb) — detailed walkthrough [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/generative-computing/granite-switch/blob/main/tutorials/notebooks/granite_switch_with_hf.ipynb)
@@ -83,8 +83,8 @@ Runnable scripts in [`scripts/`](scripts/) for common tasks:

| Script | Description |
|--------|-------------|
-| [run_adapter_generation_direct.py](scripts/reference/run_adapter_generation_direct.py) | Direct adapter invocation via control tokens |
-| [run_adapter_generation_mellea.py](scripts/reference/run_adapter_generation_mellea.py) | Adapter invocation through Mellea |
+| [run_adapter_generation_direct.py](scripts/reference/run_adapter_generation_direct.py) | Direct adapter function invocation via control tokens |
+| [run_adapter_generation_mellea.py](scripts/reference/run_adapter_generation_mellea.py) | Adapter function invocation through Mellea |


## Adapter Libraries
@@ -93,9 +93,9 @@ Granite Switch checkpoints embed adapters drawn from IBM's granitelib libraries.

| Adapter | Purpose | Where used in tutorials | HF repo |
|---------|---------|-------------------------|---------|
-| Core | Foundational post-generation adapters: certainty scoring, requirement checking, and response attribution. | [granite_switch_with_hf](notebooks/granite_switch_with_hf.ipynb), [compose_granite_switch](notebooks/compose_granite_switch.ipynb) | [ibm-granite/granitelib-core-r1.0](https://huggingface.co/ibm-granite/granitelib-core-r1.0) |
-| RAG | Retrieval-augmented generation adapters: query rewrite, answerability, hallucination detection, and citation generation. | [hello_mellea](notebooks/hello_mellea.ipynb), [rag_101](notebooks/rag_101.ipynb), [rag_full_pipeline](notebooks/rag_full_pipeline.ipynb), [compose_granite_switch](notebooks/compose_granite_switch.ipynb) | [ibm-granite/granitelib-rag-r1.0](https://huggingface.co/ibm-granite/granitelib-rag-r1.0) |
-| Guardian | Safety and risk detection: harm, social bias, jailbreaking, factuality, and policy compliance checks. | [hello_adapter](notebooks/hello_adapter.ipynb), [hello_mellea](notebooks/hello_mellea.ipynb), [granite_switch_with_hf](notebooks/granite_switch_with_hf.ipynb), [rag_full_pipeline](notebooks/rag_full_pipeline.ipynb), [compose_granite_switch](notebooks/compose_granite_switch.ipynb) | [ibm-granite/granitelib-guardian-r1.0](https://huggingface.co/ibm-granite/granitelib-guardian-r1.0) |
+| Core | Foundational post-generation adapter functions: certainty scoring, requirement checking, and response attribution. | [granite_switch_with_hf](notebooks/granite_switch_with_hf.ipynb), [compose_granite_switch](notebooks/compose_granite_switch.ipynb) | [ibm-granite/granitelib-core-r1.0](https://huggingface.co/ibm-granite/granitelib-core-r1.0) |
+| RAG | Retrieval-augmented generation adapter functions: query rewrite, answerability, hallucination detection, and citation generation. | [hello_mellea](notebooks/hello_mellea.ipynb), [rag_101](notebooks/rag_101.ipynb), [rag_full_flow](notebooks/rag_full_flow.ipynb), [compose_granite_switch](notebooks/compose_granite_switch.ipynb) | [ibm-granite/granitelib-rag-r1.0](https://huggingface.co/ibm-granite/granitelib-rag-r1.0) |
+| Guardian | Safety and risk detection: harm, social bias, jailbreaking, factuality, and policy compliance checks. | [hello_adapter](notebooks/hello_adapter.ipynb), [hello_mellea](notebooks/hello_mellea.ipynb), [granite_switch_with_hf](notebooks/granite_switch_with_hf.ipynb), [rag_full_flow](notebooks/rag_full_flow.ipynb), [compose_granite_switch](notebooks/compose_granite_switch.ipynb) | [ibm-granite/granitelib-guardian-r1.0](https://huggingface.co/ibm-granite/granitelib-guardian-r1.0) |

## External Resources

2 changes: 1 addition & 1 deletion tutorials/guides/compare_inference_throughput.md
@@ -87,4 +87,4 @@ raced simultaneously.

- **[Hello Adapter](../notebooks/hello_adapter.ipynb)** - minimal embedded-adapter invocation via the HuggingFace backend
- **[Using Mellea with Granite Switch](mellea_with_granite_switch.md)** - deeper Mellea integration details
-- **[Bring Your Own Adapter](bring_your_own_adapter.md)** - train a custom adapter and compose it in
+- **[Bring Your Own Adapter](build_your_own_adapter.md)** - train a custom adapter and compose it in
@@ -6,7 +6,7 @@ This guide explains how to configure your own adapter with Mellea to be used by

Together, Mellea + Granite Switch + vLLM provide a production-ready inference stack for adapter-based AI applications that can utilize custom adapters.
- See [Mellea With Granite Switch](mellea_with_granite_switch.md) for a detailed explanation of how granite-switch and Mellea work together.
-- See [Bring Your Own Adapter](bring_your_own_adapter.md) for info on how to train your own adapter.
+- See [Bring Your Own Adapter](build_your_own_adapter.md) for info on how to train your own adapter.
- See Mellea's [Lora and aLoRA adapters](https://docs.mellea.ai/advanced/lora-and-alora-adapters) for info on how to train your own custom adapters using Mellea.

## Prerequisites
2 changes: 1 addition & 1 deletion tutorials/guides/mellea_with_granite_switch.md
@@ -247,7 +247,7 @@ print(f"Citations: {citations}")
## Next Steps

- **[Hello Adapter](../notebooks/hello_adapter.ipynb)** - Minimal embedded-adapter invocation via the HuggingFace backend
-- **[Bring Your Own Adapter](bring_your_own_adapter.md)** - Train a custom adapter and compose it in
+- **[Bring Your Own Adapter](build_your_own_adapter.md)** - Train a custom adapter and compose it in
- **[Compare Inference Throughput](compare_inference_throughput.md)** - Benchmark ALORA vs LoRA on a 6-step RAG pipeline
- **[Mellea Repository](https://github.com/generative-computing/mellea)** - Full documentation
- **[Granite Models](https://huggingface.co/ibm-granite)**
4 changes: 2 additions & 2 deletions tutorials/notebooks/compose_granite_switch.ipynb
@@ -257,7 +257,7 @@
"cell_type": "markdown",
"id": "generate-md",
"metadata": {},
-"source": "## 6 * Generate against the composed model\n\nConnect Mellea to the running vLLM server, register the embedded adapters, and call the `rewrite_question` adapter. If it prints a cleaned-up version of the messy query, your composed checkpoint is wired up correctly."
+"source": "## 6 * Generate against the composed model\n\nConnect Mellea to the running vLLM server, register the embedded adapters, and call the `rewrite_question` adapter function. If it prints a cleaned-up version of the messy query, your composed checkpoint is wired up correctly."
},
{
"cell_type": "code",
@@ -304,4 +304,4 @@
},
"nbformat": 4,
"nbformat_minor": 5
-}
+}