20 commits
- 7652b78: update semantic search page (charlotte-hoblik, Mar 29, 2026)
- c8d910d: applying tutorial content type guidelines (charlotte-hoblik, Mar 31, 2026)
- d8974b9: restructuring Optimize vector storage for semantic search page (charlotte-hoblik, Mar 31, 2026)
- 0e6a629: Merge branch 'main' into charlotte-semantic-search-semantic-text (charlotte-hoblik, Mar 31, 2026)
- ed81262: update links (charlotte-hoblik, Mar 31, 2026)
- d76ffd0: update links (charlotte-hoblik, Mar 31, 2026)
- 7a3e06c: Apply changes based on feedback (charlotte-hoblik, Apr 7, 2026)
- 998eed4: Merge branch 'main' into charlotte-semantic-search-semantic-text (charlotte-hoblik, Apr 7, 2026)
- 2c3be28: Update solutions/search/vector/knn.md (charlotte-hoblik, Apr 9, 2026)
- 9ca83cf: Update solutions/search/semantic-search/semantic-search-semantic-text.md (charlotte-hoblik, Apr 9, 2026)
- 6066398: Update solutions/search/semantic-search/semantic-search-semantic-text.md (charlotte-hoblik, Apr 9, 2026)
- 1a2ab32: Update solutions/search/semantic-search/semantic-search-semantic-text.md (charlotte-hoblik, Apr 9, 2026)
- 95addb7: Update solutions/search/semantic-search/semantic-search-semantic-text.md (charlotte-hoblik, Apr 9, 2026)
- c8ca455: Update solutions/search/vector/vector-storage-for-semantic-search.md (charlotte-hoblik, Apr 9, 2026)
- 81cc283: Update solutions/search/vector/vector-storage-for-semantic-search.md (charlotte-hoblik, Apr 9, 2026)
- 2e0e20c: Apply feedback (charlotte-hoblik, Apr 9, 2026)
- e0e149d: Add curl and response examples (charlotte-hoblik, Apr 9, 2026)
- 16e1085: Add links to new pages (charlotte-hoblik, Apr 9, 2026)
- aaa19bf: Merge branch 'main' into charlotte-semantic-search-semantic-text (charlotte-hoblik, Apr 9, 2026)
- de7d7e7: Merge branch 'main' into charlotte-semantic-search-semantic-text (charlotte-hoblik, Apr 16, 2026)
2 changes: 2 additions & 0 deletions explore-analyze/elastic-inference/inference-api.md
@@ -39,6 +39,8 @@ The following section lists the default {{infer}} endpoints, identified by their

Use the `inference_id` of the endpoint in a [`semantic_text`](elasticsearch://reference/elasticsearch/mapping-reference/semantic-text.md) field definition or when creating an [{{infer}} processor](elasticsearch://reference/enrich-processor/inference-processor.md). The API call automatically downloads and deploys the model, which might take a couple of minutes. Default {{infer}} endpoints have adaptive allocations enabled. For these models, the minimum number of allocations is `0`. If there is no {{infer}} activity that uses the endpoint, the number of allocations automatically scales down to `0` after 15 minutes.

For an end-to-end tutorial on using {{infer}} endpoints with `semantic_text` fields, refer to [Semantic search with `semantic_text`](/solutions/search/semantic-search/semantic-search-semantic-text.md).

## {{infer-cap}} endpoints UI [inference-endpoints]

The **{{infer-cap}} endpoints** page provides an interface for managing {{infer}} endpoints.
3 changes: 3 additions & 0 deletions solutions/search/hybrid-semantic-text.md
@@ -238,5 +238,8 @@ POST /_query?format=txt
:::
::::

## Related pages

- To set up semantic search before combining it with hybrid search, follow the [Semantic search with `semantic_text`](semantic-search/semantic-search-semantic-text.md) tutorial.
- To reduce memory usage for dense vector embeddings at scale, refer to [Optimizing vector storage](vector/vector-storage-for-semantic-search.md).

2 changes: 1 addition & 1 deletion solutions/search/semantic-search.md
@@ -57,7 +57,7 @@ You can also deploy NLP in {{es}} manually, without using an {{infer}} endpoint.
For an end-to-end tutorial, refer to [Semantic search with a model deployed in {{es}}](vector/dense-versus-sparse-ingest-pipelines.md).

::::{tip}
Refer to [vector queries and field types](vector.md#vector-queries-and-field-types) for a quick reference overview.
Refer to [vector queries and field types](vector.md#vector-queries-and-field-types) for a quick reference overview. To reduce the memory footprint of dense vector embeddings, refer to [Optimizing vector storage](vector/vector-storage-for-semantic-search.md).
::::

## Learn more [semantic-search-read-more]
425 changes: 243 additions & 182 deletions solutions/search/semantic-search/semantic-search-semantic-text.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion solutions/search/vector/dense-vector.md
@@ -57,5 +57,5 @@ New indices with 384 or more dimensions will default to BBQ HNSW automatically f
Learn more about how BBQ works, supported algorithms, and configuration examples in the [Better Binary Quantization (BBQ) documentation](https://www.elastic.co/docs/reference/elasticsearch/index-settings/bbq).

::::{tip}
When using the [`semantic_text` field type](../semantic-search/semantic-search-semantic-text.md), you can configure BBQ and other quantization options through the `index_options` parameter. Refer to [Optimizing vector storage with `index_options`](../semantic-search/semantic-search-semantic-text.md#semantic-text-index-options) for examples of using `bbq_hnsw`, `int8_hnsw`, and other quantization strategies with semantic text fields.
When using the [`semantic_text` field type](../semantic-search/semantic-search-semantic-text.md), you can configure BBQ and other quantization options through the `index_options` parameter. Refer to [Optimizing vector storage with `index_options`](vector-storage-for-semantic-search.md) for examples of using `bbq_hnsw`, `int8_hnsw`, and other quantization strategies with semantic text fields.
::::
2 changes: 1 addition & 1 deletion solutions/search/vector/knn.md
@@ -138,7 +138,7 @@ For approximate kNN, {{es}} stores dense vector values per segment as an [HNSW g
{applies_to}`stack: ga 9.2` In addition to search-time parameters, HNSW and DiskBBQ expose index-time settings that balance graph build cost, search speed, and accuracy. When defining your `dense_vector` mapping, use [`index_options`](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options) to set these parameters:

::::{tip}
When using the [`semantic_text` field type](../semantic-search/semantic-search-semantic-text.md) with dense vector embeddings, you can also configure `index_options` directly on the field. See [Optimizing vector storage with `index_options`](../semantic-search/semantic-search-semantic-text.md#semantic-text-index-options) for examples.
When using the [`semantic_text` field type](../semantic-search/semantic-search-semantic-text.md) with dense vector embeddings, you can also configure `index_options` directly on the field. Refer to [Optimizing vector storage with `index_options`](vector-storage-for-semantic-search.md) for examples.
::::

```console
341 changes: 341 additions & 0 deletions solutions/search/vector/vector-storage-for-semantic-search.md
@@ -0,0 +1,341 @@
---
navigation_title: Optimize vector storage for semantic search
applies_to:
stack:
serverless:
products:
- id: elasticsearch
type: how-to
description: Reduce the memory footprint of dense vector embeddings in semantic search by configuring quantization strategies on semantic_text fields.
---

# Optimize dense vector storage for semantic search [semantic-text-index-options]

When scaling semantic search, the memory footprint of dense vector embeddings is a primary concern. You can reduce storage requirements by configuring a [quantization strategy](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-quantization) on your `semantic_text` fields using the `index_options` parameter.

This guide walks you through choosing a strategy and applying it to a `semantic_text` field mapping. For full details on all available quantization options and their parameters, refer to the [`dense_vector` field type reference](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options).

## Requirements

- You need a `semantic_text` field that uses an {{infer}} endpoint producing **dense vector embeddings** (such as E5, OpenAI embeddings, or Cohere).
- If you use a custom model, create the {{infer}} endpoint first using the [Create {{infer}} API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put).

::::{note}
These `index_options` do not apply to sparse vector models like ELSER, which use a different internal representation.
::::
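If you are bringing a custom model, a minimal sketch of creating a dense text-embedding endpoint might look like the following. The endpoint name `my-e5-endpoint` and the service settings are illustrative assumptions, not values from this page:

```console
PUT _inference/text_embedding/my-e5-endpoint
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".multilingual-e5-small",
    "num_allocations": 1,
    "num_threads": 1
  }
}
```

You can then reference `my-e5-endpoint` as the `inference_id` of a `semantic_text` field instead of the built-in endpoint used in the examples below.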

:::{tip}
To run the `curl` examples on this page, set the following environment variables:
```bash
export ELASTICSEARCH_URL="your-elasticsearch-url"
export API_KEY="your-api-key"
```
To generate API keys, search for `API keys` in the [global search bar](/explore-analyze/find-and-organize/find-apps-and-objects.md). [Learn more about finding your endpoint and credentials](/solutions/elasticsearch-solution-project/search-connection-details.md).
:::

## Choose a quantization strategy

Select a quantization strategy based on your dataset size and performance requirements:

| Strategy | Memory reduction | Best for | Trade-offs |
|----------|-----------------|----------|------------|
| `bbq_hnsw` | Up to 32x | Most production use cases (default for 384+ dimensions) | Minimal accuracy loss |
| `bbq_flat` | Up to 32x | Smaller datasets needing maximum accuracy | Slower queries (brute-force search) |
| `bbq_disk` {applies_to}`stack: ga 9.2` | Up to 32x | Large datasets with constrained RAM | Slower queries (disk-based) |
| `int8_hnsw` | 4x | High accuracy retention | Lower compression than BBQ |
| `int4_hnsw` | 8x | Balance between compression and accuracy | Some accuracy loss |

For most use cases with dense vector embeddings from text models, we recommend [Better Binary Quantization (BBQ)](elasticsearch://reference/elasticsearch/mapping-reference/bbq.md). BBQ requires a minimum of 64 dimensions and works best with text embeddings.
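The compression ratios in the table follow from the per-dimension storage cost: float32 uses 4 bytes per dimension, `int8` uses 1 byte, and BBQ roughly 1 bit. As a rough back-of-the-envelope check (the vector count and dimension are illustrative, and HNSW graph and metadata overhead are ignored):

```shell
# Approximate raw vector storage for 1M vectors at 384 dimensions
VECTORS=1000000
DIMS=384
RAW_MB=$(( VECTORS * DIMS * 4 / 1024 / 1024 ))   # float32: 4 bytes per dimension
INT8_MB=$(( VECTORS * DIMS / 1024 / 1024 ))      # int8: 1 byte per dimension (4x smaller)
BBQ_MB=$(( VECTORS * DIMS / 8 / 1024 / 1024 ))   # BBQ: ~1 bit per dimension (32x smaller)
echo "float32=${RAW_MB}MB int8=${INT8_MB}MB bbq=${BBQ_MB}MB"
```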

## Configure your index mapping
> **Reviewer comment (Member):** Having a separate tab called "Using curl" is weird here. Can we nest tabs so that each example has curl and console? If not, you just have a console followed by a curl in each existing tab.

Create an index with a `semantic_text` field and set the `index_options` to your chosen quantization strategy.

:::::::::{tab-set}

::::::::{tab-item} BBQ with HNSW

```console
PUT semantic-embeddings-optimized
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": ".multilingual-e5-small-elasticsearch", <1>
        "index_options": {
          "dense_vector": {
            "type": "bbq_hnsw" <2>
          }
        }
      }
    }
  }
}
```

1. Reference to a text embedding {{infer}} endpoint. This example uses the built-in E5 endpoint, which is automatically available. For custom models, you must create the endpoint first using the [Create {{infer}} API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put).
2. Uses [BBQ](elasticsearch://reference/elasticsearch/mapping-reference/bbq.md) with HNSW indexing for up to 32x memory reduction.

::::::::

::::::::{tab-item} BBQ flat

Use `bbq_flat` for smaller datasets where you need maximum accuracy at the expense of speed:

```console
PUT semantic-embeddings-flat
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": ".multilingual-e5-small-elasticsearch", <1>
        "index_options": {
          "dense_vector": {
            "type": "bbq_flat" <2>
          }
        }
      }
    }
  }
}
```
1. Reference to a text embedding {{infer}} endpoint. This example uses the built-in E5 endpoint, which is automatically available. For custom models, you must create the endpoint first using the [Create {{infer}} API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put).
2. BBQ without HNSW for smaller datasets. Uses brute-force search, so queries are slower but indexing is lighter.

::::::::

::::::::{tab-item} DiskBBQ

```{applies_to}
stack: ga 9.2
serverless: unavailable
```

For large datasets where RAM is constrained, use `bbq_disk` (DiskBBQ) to minimize memory usage:

```console
PUT semantic-embeddings-disk
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": ".multilingual-e5-small-elasticsearch", <1>
        "index_options": {
          "dense_vector": {
            "type": "bbq_disk" <2>
          }
        }
      }
    }
  }
}
```
1. Reference to a text embedding {{infer}} endpoint. This example uses the built-in E5 endpoint, which is automatically available. For custom models, you must create the endpoint first using the [Create {{infer}} API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put).
2. DiskBBQ keeps vectors compressed on disk, dramatically reducing RAM requirements at the cost of slower queries.

::::::::

::::::::{tab-item} Integer quantization

```console
PUT semantic-embeddings-int8
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": ".multilingual-e5-small-elasticsearch", <1>
        "index_options": {
          "dense_vector": {
            "type": "int8_hnsw" <2>
          }
        }
      }
    }
  }
}
```
1. Reference to a text embedding {{infer}} endpoint. This example uses the built-in E5 endpoint, which is automatically available. For custom models, you must create the endpoint first using the [Create {{infer}} API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put).
2. 8-bit integer quantization for ~4x memory reduction. For higher compression, use `"type": "int4_hnsw"` (~8x reduction).

::::::::

::::::::{tab-item} Using curl

The following example creates an index with BBQ and HNSW quantization:

```bash
curl -X PUT "${ELASTICSEARCH_URL}/semantic-embeddings-optimized" \
  -H "Content-Type: application/json" \
  -H "Authorization: ApiKey ${API_KEY}" \
  -d '{
    "mappings": {
      "properties": {
        "content": {
          "type": "semantic_text",
          "inference_id": ".multilingual-e5-small-elasticsearch",
          "index_options": {
            "dense_vector": {
              "type": "bbq_hnsw"
            }
          }
        }
      }
    }
  }'
```

This example references the built-in E5 text embedding endpoint, which is automatically available, and uses [BBQ](elasticsearch://reference/elasticsearch/mapping-reference/bbq.md) with HNSW indexing for up to 32x memory reduction. For custom models, you must create the endpoint first using the [Create {{infer}} API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put).

To use a different quantization strategy, replace `"type": "bbq_hnsw"` with your chosen strategy (`bbq_flat`, `bbq_disk`, `int8_hnsw`, or `int4_hnsw`) and update the index name accordingly.

::::::::

:::::::::


:::{dropdown} Example response

```js
{
  "acknowledged": true, <1>
  "shards_acknowledged": true,
  "index": "semantic-embeddings-optimized"
}
```

1. `true` confirms the index was created successfully with your mapping configuration.

:::

## Verify your configuration

Confirm that the `index_options` are applied to your index:

::::{tab-set}

:::{tab-item} Console

```console
GET semantic-embeddings-optimized/_mapping
```

:::

:::{tab-item} curl

```bash
curl -X GET "${ELASTICSEARCH_URL}/semantic-embeddings-optimized/_mapping" \
  -H "Authorization: ApiKey ${API_KEY}"
```

:::

::::

The response includes the `index_options` you configured under the `content` field's mapping. If the `index_options` block is missing, check that you specified it correctly in the `PUT` request.

:::{dropdown} Example response

```js
{
  "semantic-embeddings-optimized": {
    "mappings": {
      "properties": {
        "content": {
          "type": "semantic_text",
          "inference_id": ".multilingual-e5-small-elasticsearch",
          "index_options": { <1>
            "dense_vector": {
              "type": "bbq_hnsw"
            }
          }
        }
      }
    }
  }
}
```

1. The `index_options` block confirms your quantization strategy is applied. After indexing data, the mapping might also include auto-detected `model_settings` such as dimensions and similarity metric.


:::

## (Optional) Tune HNSW parameters

For HNSW-based strategies, you can tune graph parameters like `m` and `ef_construction` in the `index_options`. Refer to the [`dense_vector` field type reference](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options) for the full list of tunable parameters.

::::{tab-set}

:::{tab-item} Console

```console
PUT semantic-embeddings-custom
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": ".multilingual-e5-small-elasticsearch",
        "index_options": {
          "dense_vector": {
            "type": "bbq_hnsw",
            "m": 32, <1>
            "ef_construction": 200 <2>
          }
        }
      }
    }
  }
}
```

1. Controls graph connectivity. Higher values improve recall at the cost of memory. Default: `16`.
2. Controls index build quality. Higher values improve quality but slow indexing. Default: `100`.

:::

:::{tab-item} curl

```bash
curl -X PUT "${ELASTICSEARCH_URL}/semantic-embeddings-custom" \
  -H "Content-Type: application/json" \
  -H "Authorization: ApiKey ${API_KEY}" \
  -d '{
    "mappings": {
      "properties": {
        "content": {
          "type": "semantic_text",
          "inference_id": ".multilingual-e5-small-elasticsearch",
          "index_options": {
            "dense_vector": {
              "type": "bbq_hnsw",
              "m": 32,
              "ef_construction": 200
            }
          }
        }
      }
    }
  }'
```

:::

::::

## Next steps

- Follow the [Semantic search with `semantic_text`](../semantic-search/semantic-search-semantic-text.md) tutorial to set up an end-to-end semantic search workflow.
- Combine semantic search with keyword search using [hybrid search](../hybrid-semantic-text.md).

## Related pages

- [`dense_vector` `index_options` reference](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options)
- [Better Binary Quantization (BBQ)](elasticsearch://reference/elasticsearch/mapping-reference/bbq.md)
- [Dense vector search](dense-vector.md)
- [Trained model autoscaling](../../../deploy-manage/autoscaling/trained-model-autoscaling.md)
1 change: 1 addition & 0 deletions solutions/toc.yml
@@ -1,3 +1,3 @@
project: "Solutions and use cases"
toc:
- file: index.md
@@ -35,6 +35,7 @@
children:
- file: search/vector/knn.md
- file: search/vector/bring-own-vectors.md
- file: search/vector/vector-storage-for-semantic-search.md
- file: search/vector/sparse-vector.md
- file: search/vector/dense-versus-sparse-ingest-pipelines.md
- file: search/semantic-search.md