diff --git a/index.toml b/index.toml
index b8a0abb..2b92622 100644
--- a/index.toml
+++ b/index.toml
@@ -124,20 +124,6 @@ created_at = 2024-02-16
 dependencies = []
 featured = true
 
-[[tutorial]]
-title = "Simplifying Pipeline Inputs with Multiplexer"
-description = "Learn how to declutter the inputs of complex pipelines"
-level = "intermediate"
-weight = 84
-notebook = "37_Simplifying_Pipeline_Inputs_with_Multiplexer.ipynb"
-aliases = []
-completion_time = "10 min"
-created_at = 2024-02-19
-haystack_version = "2.3.1"
-dependencies = ["transformers", "huggingface_hub>=0.23.0, <0.28.0"]
-hidden = true
-sitemap_exclude = true
-
 [[tutorial]]
 title = "Embedding Metadata for Improved Retrieval"
 description = "Learn how to embed metadata while indexing, to improve the quality of retrieval results"
diff --git a/tutorials/37_Simplifying_Pipeline_Inputs_with_Multiplexer.ipynb b/tutorials/37_Simplifying_Pipeline_Inputs_with_Multiplexer.ipynb
deleted file mode 100644
index 3cfa137..0000000
--- a/tutorials/37_Simplifying_Pipeline_Inputs_with_Multiplexer.ipynb
+++ /dev/null
@@ -1,528 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "id": "JFAFUa7BECmK"
-   },
-   "source": [
-    "# Tutorial: Simplifying Pipeline Inputs with Multiplexer\n",
-    "\n",
-    "\n",
-    "- **Level**: Intermediate\n",
-    "- **Time to complete**: 10 minutes\n",
-    "- **Components Used**: [Multiplexer](https://docs.haystack.deepset.ai/docs/multiplexer), [InMemoryDocumentStore](https://docs.haystack.deepset.ai/docs/inmemorydocumentstore), [HuggingFaceAPIDocumentEmbedder](https://docs.haystack.deepset.ai/docs/huggingfaceapidocumentembedder), [HuggingFaceAPITextEmbedder](https://docs.haystack.deepset.ai/docs/huggingfaceapitextembedder), [InMemoryEmbeddingRetriever](https://docs.haystack.deepset.ai/docs/inmemoryembeddingretriever), [PromptBuilder](https://docs.haystack.deepset.ai/docs/promptbuilder), [HuggingFaceAPIGenerator](https://docs.haystack.deepset.ai/docs/huggingfaceapigenerator) and [AnswerBuilder](https://docs.haystack.deepset.ai/docs/answerbuilder)\n",
-    "- **Prerequisites**: You must have a [Hugging Face API Key](https://huggingface.co/settings/tokens) and be familiar with [creating pipelines](https://docs.haystack.deepset.ai/docs/creating-pipelines)\n",
-    "- **Goal**: After completing this tutorial, you'll have learned how to use a Multiplexer to simplify the inputs that `Pipeline.run()` receives\n",
-    "\n",
-    "> As of version 2.2.0, `Multiplexer` has been deprecated in Haystack and will be completely removed in v2.4.0. We recommend using [BranchJoiner](https://docs.haystack.deepset.ai/docs/branchjoiner) instead. For more details about this deprecation, check out the [Haystack 2.2.0 release notes](https://github.com/deepset-ai/haystack/releases/tag/v2.2.0) on GitHub."
-   ]
-  },
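-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "> The cell below is a minimal migration sketch, assuming Haystack >= 2.2: `BranchJoiner` is initialized with a type and connected through its `value` socket just like `Multiplexer`, so the pipeline built later in this tutorial should work with the two components swapped. It is shown for reference only and is not executed as part of this tutorial."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Migration sketch (not run in this tutorial): BranchJoiner is the\n",
-    "# recommended replacement for the deprecated Multiplexer.\n",
-    "from haystack.components.joiners import BranchJoiner\n",
-    "\n",
-    "# Like Multiplexer, BranchJoiner is initialized with the expected input type\n",
-    "branch_joiner = BranchJoiner(str)\n",
-    "\n",
-    "# In the RAG pipeline built later in this tutorial, the connections and the\n",
-    "# run() call would mirror the Multiplexer ones:\n",
-    "# pipe.add_component(\"branch_joiner\", branch_joiner)\n",
-    "# pipe.connect(\"branch_joiner.value\", \"embedder.text\")\n",
-    "# pipe.connect(\"branch_joiner.value\", \"prompt_builder.question\")\n",
-    "# pipe.connect(\"branch_joiner.value\", \"answer_builder.query\")\n",
-    "# pipe.run({\"branch_joiner\": {\"value\": \"Where does Mark live?\"}})"
-   ]
-  },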
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "id": "jy3ZkDzu9-CW"
-   },
-   "source": [
-    "## Overview\n",
-    "\n",
-    "If you've ever built a Haystack pipeline with more than 3-4 components, you've probably noticed that the number of inputs to pass to the `run()` method of the pipeline grows quickly. New components take some of their input from the other components of the pipeline, but many of them also require additional input from the user. As a result, the `data` input of `Pipeline.run()` grows and becomes very repetitive.\n",
-    "\n",
-    "There is one component that can help manage this repetition more effectively: the [`Multiplexer`](https://docs.haystack.deepset.ai/docs/multiplexer).\n",
-    "\n",
-    "In this tutorial, you will learn how to drastically simplify the `Pipeline.run()` call of a RAG pipeline using a Multiplexer."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "id": "RJPsjBXZKWnb"
-   },
-   "source": [
-    "## Setup\n",
-    "### Prepare the Colab Environment\n",
-    "\n",
-    "- [Enable GPU Runtime in Colab](https://docs.haystack.deepset.ai/docs/enabling-gpu-acceleration)\n",
-    "- [Set logging level to INFO](https://docs.haystack.deepset.ai/docs/setting-the-log-level)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "id": "CcK-dK--G5ng"
-   },
-   "source": [
-    "### Install Haystack\n",
-    "\n",
-    "Install Haystack with `pip`:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "id": "0hwJTyV5HARC"
-   },
-   "outputs": [],
-   "source": [
-    "%%bash\n",
-    "\n",
-    "pip install haystack-ai \"huggingface_hub>=0.23.0, <0.28.0\""
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "id": "uTNEeEcBJc_4"
-   },
-   "source": [
-    "### Enter a Hugging Face API key\n",
-    "\n",
-    "Set a Hugging Face API key:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "metadata": {
-    "colab": {
-     "base_uri": "https://localhost:8080/"
-    },
-    "id": "aiHltCF7JgaV",
-    "outputId": "b973435d-94c1-458a-8212-c543fd45ffab"
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Enter a Hugging Face API Token:··········\n"
-     ]
-    }
-   ],
-   "source": [
-    "import os\n",
-    "from getpass import getpass\n",
-    "\n",
-    "if \"HF_API_TOKEN\" not in os.environ:\n",
-    "    os.environ[\"HF_API_TOKEN\"] = getpass(\"Enter a Hugging Face API Token:\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "id": "e57ugQB7dYsQ"
-   },
-   "source": [
-    "## Indexing Documents with a Pipeline\n",
-    "\n",
-    "Create a pipeline to store a small example dataset in the [InMemoryDocumentStore](https://docs.haystack.deepset.ai/docs/inmemorydocumentstore) with document embeddings. You will use [HuggingFaceAPIDocumentEmbedder](https://docs.haystack.deepset.ai/docs/huggingfaceapidocumentembedder) to generate embeddings for your Documents and write them to the document store with the [DocumentWriter](https://docs.haystack.deepset.ai/docs/documentwriter).\n",
-    "\n",
-    "After adding these components to your pipeline, connect them and run the pipeline.\n",
-    "\n",
-    "> If you'd like to learn about preprocessing files before you index them to your document store, follow the [Preprocessing Different File Types](https://haystack.deepset.ai/tutorials/30_file_type_preprocessing_index_pipeline) tutorial."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "metadata": {
-    "colab": {
-     "base_uri": "https://localhost:8080/"
-    },
-    "id": "My_fx0lNJUVb",
-    "outputId": "b731efb8-14bb-4f13-ca49-d8706a777dd5"
-   },
-   "outputs": [
-    {
-     "name": "stderr",
-     "output_type": "stream",
-     "text": [
-      "/home/anakin87/.virtualenvs/tutorials/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
-      "  from .autonotebook import tqdm as notebook_tqdm\n",
-      "Calculating embeddings: 100%|██████████| 1/1 [00:00<00:00, 1.13it/s]\n"
-     ]
-    },
-    {
-     "data": {
-      "text/plain": [
-       "{'doc_writer': {'documents_written': 5}}"
-      ]
-     },
-     "execution_count": 2,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "from haystack import Pipeline, Document\n",
-    "from haystack.document_stores.in_memory import InMemoryDocumentStore\n",
-    "from haystack.components.writers import DocumentWriter\n",
-    "from haystack.components.embedders import HuggingFaceAPIDocumentEmbedder\n",
-    "\n",
-    "documents = [\n",
-    "    Document(content=\"My name is Jean and I live in Paris.\"),\n",
-    "    Document(content=\"My name is Mark and I live in Berlin.\"),\n",
-    "    Document(content=\"My name is Giorgio and I live in Rome.\"),\n",
-    "    Document(content=\"My name is Giorgio and I live in Milan.\"),\n",
-    "    Document(content=\"My name is Giorgio and I lived in many cities, but I settled in Naples eventually.\"),\n",
-    "]\n",
-    "\n",
-    "document_store = InMemoryDocumentStore()\n",
-    "\n",
-    "indexing_pipeline = Pipeline()\n",
-    "indexing_pipeline.add_component(\n",
-    "    instance=HuggingFaceAPIDocumentEmbedder(\n",
-    "        api_type=\"serverless_inference_api\", api_params={\"model\": \"sentence-transformers/all-MiniLM-L6-v2\"}\n",
-    "    ),\n",
-    "    name=\"doc_embedder\",\n",
-    ")\n",
-    "indexing_pipeline.add_component(instance=DocumentWriter(document_store=document_store), name=\"doc_writer\")\n",
-    "\n",
-    "indexing_pipeline.connect(\"doc_embedder.documents\", \"doc_writer.documents\")\n",
-    "\n",
-    "indexing_pipeline.run({\"doc_embedder\": {\"documents\": documents}})"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "id": "e9hOmQx4L2Lw"
-   },
-   "source": [
-    "## Building a RAG Pipeline\n",
-    "\n",
-    "Build a basic retrieval-augmented generation (RAG) pipeline with [HuggingFaceAPITextEmbedder](https://docs.haystack.deepset.ai/docs/huggingfaceapitextembedder), [InMemoryEmbeddingRetriever](https://docs.haystack.deepset.ai/docs/inmemoryembeddingretriever), [PromptBuilder](https://docs.haystack.deepset.ai/docs/promptbuilder) and [HuggingFaceAPIGenerator](https://docs.haystack.deepset.ai/docs/huggingfaceapigenerator). Additionally, add [AnswerBuilder](https://docs.haystack.deepset.ai/docs/answerbuilder) to help you enrich the generated answer with `meta` info and the `query` input.\n",
-    "\n",
-    "> For a step-by-step guide to creating a RAG pipeline with Haystack, follow the [Creating Your First QA Pipeline with Retrieval-Augmentation](https://haystack.deepset.ai/tutorials/27_first_rag_pipeline) tutorial."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 3,
-   "metadata": {
-    "colab": {
-     "base_uri": "https://localhost:8080/",
-     "height": 1000
-    },
-    "id": "ueu5W07IWyXa",
-    "outputId": "51419b90-14d8-4e4a-cd24-8884053b9688"
-   },
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "\n",
-       "🚅 Components\n",
-       "  - embedder: HuggingFaceAPITextEmbedder\n",
-       "  - retriever: InMemoryEmbeddingRetriever\n",
-       "  - prompt_builder: PromptBuilder\n",
-       "  - llm: HuggingFaceAPIGenerator\n",
-       "  - answer_builder: AnswerBuilder\n",
-       "🛤️ Connections\n",
-       "  - embedder.embedding -> retriever.query_embedding (List[float])\n",
-       "  - retriever.documents -> prompt_builder.documents (List[Document])\n",
-       "  - prompt_builder.prompt -> llm.prompt (str)\n",
-       "  - llm.replies -> answer_builder.replies (List[str])\n",
-       "  - llm.meta -> answer_builder.meta (List[Dict[str, Any]])"
-      ]
-     },
-     "execution_count": 3,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "from haystack.components.embedders import HuggingFaceAPITextEmbedder\n",
-    "from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever\n",
-    "from haystack.components.builders import PromptBuilder, AnswerBuilder\n",
-    "from haystack.components.generators import HuggingFaceAPIGenerator\n",
-    "\n",
-    "template = \"\"\"\n",
-    "<|user|>\n",
-    "Answer the question based on the given context.\n",
-    "\n",
-    "Context:\n",
-    "{% for document in documents %}\n",
-    "    {{ document.content }}\n",
-    "{% endfor %}\n",
-    "Question: {{ question }}\n",
-    "<|assistant|>\n",
-    "Answer:\n",
-    "\"\"\"\n",
-    "pipe = Pipeline()\n",
-    "pipe.add_component(\n",
-    "    \"embedder\",\n",
-    "    HuggingFaceAPITextEmbedder(\n",
-    "        api_type=\"serverless_inference_api\", api_params={\"model\": \"sentence-transformers/all-MiniLM-L6-v2\"}\n",
-    "    ),\n",
-    ")\n",
-    "pipe.add_component(\"retriever\", InMemoryEmbeddingRetriever(document_store=document_store))\n",
-    "pipe.add_component(\"prompt_builder\", PromptBuilder(template=template))\n",
-    "pipe.add_component(\n",
-    "    \"llm\",\n",
-    "    HuggingFaceAPIGenerator(api_type=\"serverless_inference_api\", api_params={\"model\": \"HuggingFaceH4/zephyr-7b-beta\"}),\n",
-    ")\n",
-    "pipe.add_component(\"answer_builder\", AnswerBuilder())\n",
-    "\n",
-    "pipe.connect(\"embedder.embedding\", \"retriever.query_embedding\")\n",
-    "pipe.connect(\"retriever\", \"prompt_builder.documents\")\n",
-    "pipe.connect(\"prompt_builder\", \"llm\")\n",
-    "pipe.connect(\"llm.replies\", \"answer_builder.replies\")\n",
-    "pipe.connect(\"llm.meta\", \"answer_builder.meta\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "id": "5xxvPqyurZTi"
-   },
-   "source": [
-    "## Running the Pipeline\n",
-    "\n",
-    "Pass the `query` to `embedder`, `prompt_builder` and `answer_builder`, then run the pipeline:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "metadata": {
-    "colab": {
-     "base_uri": "https://localhost:8080/"
-    },
-    "id": "AIsphy4hJDpE",
-    "outputId": "4498e7c9-0ff2-424c-9ddd-535f8630572e"
-   },
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "{'answer_builder': {'answers': [GeneratedAnswer(data=' Mark lives in Berlin, as stated in the first sentence of the context provided.', query='Where does Mark live?', documents=[], meta={'model': 'HuggingFaceH4/zephyr-7b-beta', 'finish_reason': None, 'usage': {'completion_tokens': 0}})]}}"
-      ]
-     },
-     "execution_count": 4,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "query = \"Where does Mark live?\"\n",
-    "pipe.run({\"embedder\": {\"text\": query}, \"prompt_builder\": {\"question\": query}, \"answer_builder\": {\"query\": query}})"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "id": "wrH2MGSBLvVC"
-   },
-   "source": [
-    "In this basic RAG pipeline, the components that require a `query` to operate are `embedder`, `prompt_builder`, and `answer_builder`. However, as you extend the pipeline with additional components like Retrievers and Rankers, the number of components needing a `query` can increase indefinitely. This leads to repetitive and increasingly complex `Pipeline.run()` calls. In such cases, using a Multiplexer can help simplify and declutter `Pipeline.run()`."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "id": "ewDXDrw9N0CG"
-   },
-   "source": [
-    "## Introducing a Multiplexer\n",
-    "\n",
-    "The [Multiplexer](https://docs.haystack.deepset.ai/docs/multiplexer) is a component that accepts multiple input connections and distributes the first value it receives to all components connected to its output. In this setting, you can use this component by connecting it to the other pipeline components that expect a `query` at runtime.\n",
-    "\n",
-    "Now, initialize the Multiplexer with the expected input type (in this case, `str`, since the `query` is a string):"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "metadata": {
-    "id": "kArO87EKN3N-"
-   },
-   "outputs": [],
-   "source": [
-    "from haystack.components.others import Multiplexer\n",
-    "\n",
-    "multiplexer = Multiplexer(str)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "id": "vBGC2wO5LWIL"
-   },
-   "source": [
-    "## Adding the `Multiplexer` to the Pipeline\n",
-    "\n",
-    "Create the same RAG pipeline, but this time with the Multiplexer. Add the Multiplexer to the pipeline and connect it to all the components that need the `query` as an input:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 7,
-   "metadata": {
-    "colab": {
-     "base_uri": "https://localhost:8080/",
-     "height": 1000
-    },
-    "id": "CTmnCZvgEAut",
-    "outputId": "a0ab0df0-32f7-4778-954a-e7b9cc8b612d"
-   },
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "\n",
-       "🚅 Components\n",
-       "  - multiplexer: Multiplexer\n",
-       "  - embedder: HuggingFaceAPITextEmbedder\n",
-       "  - retriever: InMemoryEmbeddingRetriever\n",
-       "  - prompt_builder: PromptBuilder\n",
-       "  - llm: HuggingFaceAPIGenerator\n",
-       "  - answer_builder: AnswerBuilder\n",
-       "🛤️ Connections\n",
-       "  - multiplexer.value -> embedder.text (str)\n",
-       "  - multiplexer.value -> prompt_builder.question (str)\n",
-       "  - multiplexer.value -> answer_builder.query (str)\n",
-       "  - embedder.embedding -> retriever.query_embedding (List[float])\n",
-       "  - retriever.documents -> prompt_builder.documents (List[Document])\n",
-       "  - prompt_builder.prompt -> llm.prompt (str)\n",
-       "  - llm.replies -> answer_builder.replies (List[str])\n",
-       "  - llm.meta -> answer_builder.meta (List[Dict[str, Any]])"
-      ]
-     },
-     "execution_count": 7,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "from haystack.components.embedders import HuggingFaceAPITextEmbedder\n",
-    "from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever\n",
-    "from haystack.components.builders import PromptBuilder, AnswerBuilder\n",
-    "from haystack.components.generators import HuggingFaceAPIGenerator\n",
-    "\n",
-    "template = \"\"\"\n",
-    "<|user|>\n",
-    "Answer the question based on the given context.\n",
-    "\n",
-    "Context:\n",
-    "{% for document in documents %}\n",
-    "    {{ document.content }}\n",
-    "{% endfor %}\n",
-    "Question: {{ question }}\n",
-    "<|assistant|>\n",
-    "Answer:\n",
-    "\"\"\"\n",
-    "pipe = Pipeline()\n",
-    "\n",
-    "pipe.add_component(\"multiplexer\", multiplexer)\n",
-    "\n",
-    "pipe.add_component(\n",
-    "    \"embedder\",\n",
-    "    HuggingFaceAPITextEmbedder(\n",
-    "        api_type=\"serverless_inference_api\", api_params={\"model\": \"sentence-transformers/all-MiniLM-L6-v2\"}\n",
-    "    ),\n",
-    ")\n",
-    "pipe.add_component(\"retriever\", InMemoryEmbeddingRetriever(document_store=document_store))\n",
-    "pipe.add_component(\"prompt_builder\", PromptBuilder(template=template))\n",
-    "pipe.add_component(\n",
-    "    \"llm\",\n",
-    "    HuggingFaceAPIGenerator(api_type=\"serverless_inference_api\", api_params={\"model\": \"HuggingFaceH4/zephyr-7b-beta\"}),\n",
-    ")\n",
-    "pipe.add_component(\"answer_builder\", AnswerBuilder())\n",
-    "\n",
-    "# Connect the Multiplexer to all the components that need the query\n",
-    "pipe.connect(\"multiplexer.value\", \"embedder.text\")\n",
-    "pipe.connect(\"multiplexer.value\", \"prompt_builder.question\")\n",
-    "pipe.connect(\"multiplexer.value\", \"answer_builder.query\")\n",
-    "\n",
-    "pipe.connect(\"embedder.embedding\", \"retriever.query_embedding\")\n",
-    "pipe.connect(\"retriever\", \"prompt_builder.documents\")\n",
-    "pipe.connect(\"prompt_builder\", \"llm\")\n",
-    "pipe.connect(\"llm.replies\", \"answer_builder.replies\")\n",
-    "pipe.connect(\"llm.meta\", \"answer_builder.meta\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "id": "i2wW4nbEQKhJ"
-   },
-   "source": [
-    "## Running the Pipeline with a Multiplexer\n",
-    "\n",
-    "Run the pipeline that you updated with a Multiplexer. This time, instead of passing the query to `embedder`, `prompt_builder` and `answer_builder` separately, you only need to pass it to the `multiplexer`. As a result, you will get the same answer."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 8,
-   "metadata": {
-    "colab": {
-     "base_uri": "https://localhost:8080/"
-    },
-    "id": "YbIHBCKPQF4f",
-    "outputId": "32fb9d11-eec2-49d7-9ab1-97a90d9bbc28"
-   },
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "{'answer_builder': {'answers': [GeneratedAnswer(data=' Mark lives in Berlin, as stated in the first sentence of the context provided.', query='Where does Mark live?', documents=[], meta={'model': 'HuggingFaceH4/zephyr-7b-beta', 'finish_reason': None, 'usage': {'completion_tokens': 0}})]}}"
-      ]
-     },
-     "execution_count": 8,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "pipe.run({\"multiplexer\": {\"value\": \"Where does Mark live?\"}})"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "id": "kPiSU2xoKmio"
-   },
-   "source": [
-    "## What's next\n",
-    "\n",
-    "🎉 Congratulations! You've simplified your pipeline run with a Multiplexer!\n",
-    "\n",
-    "If you liked this tutorial, there's more to learn about Haystack:\n",
-    "- [Creating a Hybrid Retrieval Pipeline](https://haystack.deepset.ai/tutorials/33_hybrid_retrieval)\n",
-    "- [Building Fallbacks to Websearch with Conditional Routing](https://haystack.deepset.ai/tutorials/36_building_fallbacks_with_conditional_routing)\n",
-    "- [Evaluating RAG Pipelines](https://haystack.deepset.ai/tutorials/35_evaluating_rag_pipelines)\n",
-    "\n",
-    "To stay up to date on the latest Haystack developments, you can [sign up for our newsletter](https://landing.deepset.ai/haystack-community-updates) or [join the Haystack Discord community](https://discord.gg/haystack).\n",
-    "\n",
-    "Thanks for reading!"
-   ]
-  }
- ],
- "metadata": {
-  "accelerator": "GPU",
-  "colab": {
-   "gpuType": "T4",
-   "provenance": []
-  },
-  "kernelspec": {
-   "display_name": "Python 3",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.10.12"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 0
-}
\ No newline at end of file