diff --git a/code_samples/api/public_php_api/src/Command/FindByTaxonomyEmbeddingCommand.php b/code_samples/api/public_php_api/src/Command/FindByTaxonomyEmbeddingCommand.php new file mode 100644 index 0000000000..4d0af39535 --- /dev/null +++ b/code_samples/api/public_php_api/src/Command/FindByTaxonomyEmbeddingCommand.php @@ -0,0 +1,69 @@ +embeddingProviderResolver->resolve(); + $embedding = $embeddingProvider->getEmbedding('example_content'); + + $query = EmbeddingQueryBuilder::create() + ->withEmbedding(new TaxonomyEmbedding($embedding)) + ->setFilter(new ContentTypeIdentifier('article')) + ->setLimit(10) + ->setOffset(0) + ->setPerformCount(true) + ->build(); + + $result = $this->searchService->findContent($query); + + $io->success(sprintf('Found %d items.', $result->totalCount)); + + foreach ($result->searchHits as $searchHit) { + assert($searchHit instanceof SearchHit); + + /** @var \Ibexa\Contracts\Core\Repository\Values\Content\Content $content */ + $content = $searchHit->valueObject; + $contentInfo = $content->versionInfo->contentInfo; + + $io->writeln(sprintf( + '%d: %s', + $contentInfo->id, + $contentInfo->name + )); + } + + return self::SUCCESS; + } +} diff --git a/code_samples/api/public_php_api/src/embedding_fields.php b/code_samples/api/public_php_api/src/embedding_fields.php new file mode 100644 index 0000000000..6276bab30e --- /dev/null +++ b/code_samples/api/public_php_api/src/embedding_fields.php @@ -0,0 +1,11 @@ +create(); +echo $embeddingField->getType(); // for example, "ibexa_dense_vector_model_123" + +// Create a custom embedding field with a specific type +$customField = $factory->create('custom_embedding_type'); +echo $customField->getType(); // "custom_embedding_type" diff --git a/docs/content_management/content_api/managing_content.md b/docs/content_management/content_api/managing_content.md index a8d6b05007..976b53354c 100644 --- a/docs/content_management/content_api/managing_content.md +++ 
b/docs/content_management/content_api/managing_content.md @@ -122,7 +122,7 @@ $this->trashService->recover($trashItem, $newParent); ``` You can also search through Trash items and sort the results using several public PHP API Search Criteria and Sort Clauses that have been exposed for `TrashService` queries. -For more information, see [Searching in trash](search_api.md#searching-in-trash). +For more information, see [Search in trash](search_api.md#search-in-trash). ## Content types diff --git a/docs/release_notes/ez_platform_v3.1.md b/docs/release_notes/ez_platform_v3.1.md index 07cc9e3404..a17b4471f4 100644 --- a/docs/release_notes/ez_platform_v3.1.md +++ b/docs/release_notes/ez_platform_v3.1.md @@ -122,7 +122,7 @@ A customizable search controller has been extracted and placed in `ezplatform-se You can now search through the contents of Trash and sort the search results based on a number of Search Criteria and Sort Clauses that can be used by the `\eZ\Publish\API\Repository\TrashService::findTrashItems` method only. -For more information, see [Searching in trash](https://doc.ibexa.co/en/latest/api/public_php_api_search/#searching-in-trash). +For more information, see [Search in trash](https://doc.ibexa.co/en/latest/api/public_php_api_search/#search-in-trash). ### Repository filtering diff --git a/docs/search/embeddings_reference/embeddings_reference.md b/docs/search/embeddings_reference/embeddings_reference.md new file mode 100644 index 0000000000..527735f566 --- /dev/null +++ b/docs/search/embeddings_reference/embeddings_reference.md @@ -0,0 +1,101 @@ +--- +month_change: true +description: Embedding queries, embedding configuration, providers, and embedding search fields +--- + +# Embeddings search reference + +Embeddings provide vector representations of content or text, enabling [semantic similarity search](search_api.md#search-with-embeddings). 
+Foundational abstractions are provided for embedding-based search, while embedding providers generate vector representations. + +Searching with embeddings is designed for use with the [Taxonomy suggestions](taxonomy.md#taxonomy-suggestions) feature. +The [`Ibexa\Contracts\Taxonomy\Search\Query\Value\TaxonomyEmbedding`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Taxonomy-Search-Query-Value-TaxonomyEmbedding.html) class allows embedding queries to target taxonomy data. + +!!! tip + + Searching with embeddings isn't possible with the Legacy Search engine. + +## Core query objects + +### EmbeddingQuery + +- [`Ibexa\Contracts\Core\Repository\Values\Content\EmbeddingQuery`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Repository-Values-Content-EmbeddingQuery.html) represents a semantic similarity search request. + It encapsulates an [Embedding](#embedding) instance and supports pagination, aggregations, and result counting through the same API as standard content queries. + + !!! note "Embedding query properties" + + Embedding queries use criteria only for additional filtering applied through the query filter, not for similarity matching. + Also, embedding queries do not allow standard Query properties supported by [search engines](search_engines.md) other than the Legacy Search, such as `query`, `sortClauses`, or `spellcheck`. + +- [EmbeddingQueryBuilder](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Repository-Values-Content-EmbeddingQueryBuilder.html) is a builder for constructing `EmbeddingQuery` instances. + It helps construct queries consistently and integrates embedding queries with the search query pipeline. 
+ You must provide the required embedding value by using the `withEmbedding` method. + +### Embedding + +- [`Ibexa\Contracts\Core\Repository\Values\Content\Query\Embedding`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Repository-Values-Content-Query-Embedding.html) represents the vector input used +for similarity search. + It stores embedding values as float arrays, while providers generate those vectors from text input. + +## Query execution + +Embedding queries are executed by the search engine by using the configured embedding model and provider. + +At runtime, the system resolves the appropriate embedding provider and ensures that the embedding vector is compatible with the configured model. +Runtime validation includes validating vector dimensionality and selecting the correct indexed field for similarity search. +Field selection is determined by the configured embedding model and backend-specific query mapping, while vector dimensionality is validated when the query reaches the search engine. + +## Embedding providers + +Embedding providers implement the contract for generating vector representations of input data. +Out of the box, embedding search integration is provided for `TaxonomyEmbedding`. +If you use a custom embedding value type, implement matching embedding visitors for your [search engine](search_engines.md). +Otherwise, query execution may fail because no matching visitor is available. 
+ +- [`Ibexa\Contracts\Core\Search\Embedding\EmbeddingProviderInterface`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-Embedding-EmbeddingProviderInterface.html) generates embeddings for the provided text or other input + +- [`Ibexa\Contracts\Core\Search\Embedding\EmbeddingProviderRegistryInterface`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-Embedding-EmbeddingProviderRegistryInterface.html) lists available embedding providers or gets one by its identifier + +- [`Ibexa\Contracts\Core\Search\Embedding\EmbeddingProviderResolverInterface`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-Embedding-EmbeddingProviderResolverInterface.html) determines the embedding provider to be used for generating embeddings based on the system configuration, or a demand passed through the `resolveByModelIdentifier` method + +## Configuration + +Models used to resolve embedding queries must be configured per SiteAccess in [system configuration](configuration.md). +Each entry defines the model's name, vector dimensionality, the field suffix, and the embedding provider that generates vectors. +Field suffixes assigned to the models must be unique, as they become part of the indexed field name. +You select the default model by setting a value in the `default_embedding_model` key. + +``` yaml +ibexa: + system: + default: + embedding_models: + text-embedding-3-small: + name: 'text-embedding-3-small' + dimensions: 1536 + field_suffix: '3small' + embedding_provider: 'ibexa_openai' + default_embedding_model: text-embedding-ada-002 +``` + +For a real-life example of embedding models configuration, see [Taxonomy suggestions](taxonomy.md#change-the-embedding-generation-model). 
+ +- [EmbeddingConfigurationInterface](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-Embedding-EmbeddingConfigurationInterface.html) allows access to the embedding model configuration in the system (for example, list of available models, default model name, default provider, field suffix, and so on) + +## Embedding fields + +Embedding vectors are stored in dedicated search fields. +These fields can be used by the search engine to perform vector similarity comparisons when embedding queries are executed. + +``` php +[[= include_file('code_samples/api/public_php_api/src/embedding_fields.php') =]] +``` + +Once you create a field, subscribe to the `ContentIndexCreateEvent` indexing event that [adds the field to the index](index_custom_elasticsearch_data.md). + + +- [`Ibexa\Contracts\Core\Search\FieldType\EmbeddingFieldFactory`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-FieldType-EmbeddingFieldFactory.html) creates dedicated search fields that store embedding vectors + +## Validation + +- [`Ibexa\Contracts\Core\Repository\Values\Content\QueryValidatorInterface`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Repository-Values-Content-QueryValidatorInterface.html) validates embedding query structure before execution diff --git a/docs/search/search_api.md b/docs/search/search_api.md index 83de734dff..6287959309 100644 --- a/docs/search/search_api.md +++ b/docs/search/search_api.md @@ -1,4 +1,5 @@ --- +month_change: true description: You can search for content, locations and products by using the PHP API. Fine-tune the search with Search Criteria, Sort Clauses and Aggregations. --- @@ -18,7 +19,7 @@ The service should be [injected into the constructor of your command or controll `SearchService` is also used in the back office of [[= product_name =]], in components such as Universal Discovery Widget or Sub-items List. 
-### Performing a search +### Perform search To search through content you need to create a [`LocationQuery`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Repository-Values-Content-LocationQuery.html) and provide your Search Criteria as a series of Criterion objects. @@ -70,7 +71,7 @@ As such, `query` is recommended when the search is based on user input. The difference between `query` and `filter` is only relevant when using Solr or Elasticsearch search engine. With the Legacy search engine both properties give identical results. -#### Processing large result sets +#### Process large result sets To process a large result set, use [`Ibexa\Contracts\Core\Repository\Iterator\BatchIterator`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Repository-Iterator-BatchIterator.html). `BatchIterator` divides the results of search or filtering into smaller batches. @@ -175,7 +176,7 @@ $filter It's recommended to use an IDE that can recognize type hints when working with Repository Filtering. If you try to use an unsupported Criterion or Sort Clause, the IDE indicates an issue. -## Searching in a controller +## Search in controller You can use the `SearchService` or repository filtering in a controller, as long as you provide the required parameters. For example, in the code below, `locationId` is provided to list all children of a location by using the `SearchService`. 
@@ -196,7 +197,7 @@ When using Repository filtering, provide the results of `ContentService::find()` [[= include_file('code_samples/api/public_php_api/src/Controller/CustomFilterController.php', 16, 31) =]] ``` -### Paginating search results +### Paginate search results To paginate search or filtering results, it's recommended to use the [Pagerfanta library](https://github.com/BabDev/Pagerfanta) and [[[= product_name =]]'s adapters for it.](https://github.com/ibexa/core/blob/main/src/lib/Pagination/Pagerfanta/Pagerfanta.php) @@ -258,7 +259,7 @@ that doesn't belong to the provided Section: [[= include_file('code_samples/api/public_php_api/src/Command/FindComplexCommand.php', 46, 54) =]] ``` -### Combining independent Criteria +### Combine independent Criteria Criteria are independent of one another. This can lead to unexpected behavior, for instance because content can have multiple locations. @@ -281,7 +282,7 @@ Even though the location B is hidden, the query finds the content because both c - the content item is visible (it has the visible location A) -## Sorting results +## Sort results To sort the results of a query, use one of more [Sort Clauses](sort_clause_reference.md). @@ -295,27 +296,6 @@ For example, to order search results by their publication date, from oldest to n For the full list and details of available Sort Clauses, see [Sort Clause reference](sort_clause_reference.md). -## Searching in trash - -In the user interface, on the **Trash** screen, you can search for content items, and then sort the results based on different criteria. -To search the trash with the API, use the `TrashService::findInTrash` method to submit a query for content items that are held in trash. -Searching in trash supports a limited set of Criteria and Sort Clauses. -For a list of supported Criteria and Sort Clauses, see [Search in trash reference](search_in_trash_reference.md). - -!!! 
note - - Searching through the trashed content items operates directly on the database, therefore you cannot use external search engines, such as Solr or Elasticsearch, and it's impossible to reindex the data. - -``` php -[[= include_file('code_samples/api/public_php_api/src/Command/FindInTrashCommand.php', 4, 6) =]]//... -[[= include_file('code_samples/api/public_php_api/src/Command/FindInTrashCommand.php', 35, 42) =]] -``` - -!!! caution - - Make sure that you set the Criterion on the `filter` property. - It's impossible to use the `query` property, because the search in trash operation filters the database instead of querying. - ## Aggregation !!! caution "Feature support" @@ -378,4 +358,64 @@ $query->aggregations[] = new IntegerRangeAggregation('range', 'person', 'age', `null` means that a range doesn't have an end. In the example all values above (and including) 60 are included in the last range. -See [Agrregation reference](aggregation_reference.md) for details of all available aggregations. +See [Aggregation reference](aggregation_reference.md) for details of all available aggregations. + +## Search with embeddings + + +!!! caution "Feature support" + + Search with embeddings is only available with the Solr and Elasticsearch search engines. + +Embeddings are numerical representations that capture the meaning of text, images, or other content. +AI providers generate embeddings by converting words or documents into lists of numbers, instead of treating them as plain text. +Such lists, also known as vectors, can then be compared to find content with similar meaning. + +Searching with embeddings enables matching content based on meaning rather than exact text matches. +Instead of comparing keywords, the system compares vectors that represent the semantic meaning of content and the query input. + +!!! 
note "Taxonomy suggestions" + + Embedding queries have been introduced primarily to support the [Taxonomy suggestions](taxonomy.md#taxonomy-suggestions) feature, therefore embedding search integration is provided for `TaxonomyEmbedding`. + +You can narrow down the search results, for example, by content type or location. +To do this, combine searching with embeddings with filters. +Repository search also respects the permissions of the current user. + +An embedding query is represented by the [`Ibexa\Contracts\Core\Repository\Values\Content\EmbeddingQuery`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Repository-Values-Content-EmbeddingQuery.html) value object. +The object encapsulates the embedding used for similarity search and optional search parameters such as filtering, pagination, aggregations, and result counting. + +### Use embedding queries in search + +Embedding queries are executed through the search API in the same way as other search requests. +You build an `EmbeddingQuery` instance by using a builder and pass it to the search service. + +This example shows a minimal embedding query executed directly through the search service: + +``` php +// ... +[[= include_file('code_samples/api/public_php_api/src/Command/FindByTaxonomyEmbeddingCommand.php') =]] +``` + +For more information, see [Embeddings reference](embeddings_reference.md). + +## Search in trash + +In the user interface, on the **Trash** screen, you can search for content items, and then sort the results based on different criteria. +To search the trash with the API, use the `TrashService::findInTrash` method to submit a query for content items that are held in trash. +Searching in trash supports a limited set of Criteria and Sort Clauses. +For a list of supported Criteria and Sort Clauses, see [Search in trash reference](search_in_trash_reference.md). + +!!! 
note + + Searching through the trashed content items operates directly on the database, therefore you cannot use external search engines, such as Solr or Elasticsearch, and it's impossible to reindex the data. + +``` php +[[= include_file('code_samples/api/public_php_api/src/Command/FindInTrashCommand.php', 4, 6) =]]//... +[[= include_file('code_samples/api/public_php_api/src/Command/FindInTrashCommand.php', 35, 42) =]] +``` + +!!! caution + + Make sure that you set the Criterion on the `filter` property. + It's impossible to use the `query` property, because the search in trash operation filters the database instead of querying. diff --git a/docs/search/search_criteria_and_sort_clauses.md b/docs/search/search_criteria_and_sort_clauses.md index a6a624322b..b772074ab1 100644 --- a/docs/search/search_criteria_and_sort_clauses.md +++ b/docs/search/search_criteria_and_sort_clauses.md @@ -79,7 +79,7 @@ Available tags for Sort Clause handlers in Legacy Storage Engine are: - for Criterion handlers: `ibexa.core.trash.search.legacy.gateway.criterion_handler` - for Sort Clause handlers: `ibexa.core.trash.search.legacy.gateway.sort_clause_handler` - For more information about searching for content items in Trash, see [Searching in trash](search_api.md#searching-in-trash). + For more information about searching for content items in Trash, see [Search in trash](search_api.md#search-in-trash). For more information about the Criteria and Sort Clauses that are supported when searching for trashed content items, see [Searching in trash reference](search_in_trash_reference.md). 
diff --git a/docs/search/search_in_trash_reference.md b/docs/search/search_in_trash_reference.md index 681f53eaa5..59f7b80aca 100644 --- a/docs/search/search_in_trash_reference.md +++ b/docs/search/search_in_trash_reference.md @@ -6,7 +6,7 @@ month_change: false # Search in trash reference -When you [search for content items that are held in trash](search_api.md#searching-in-trash), you can apply only a limited subset of Search Criteria and Sort Clauses +When you [search for content items that are held in trash](search_api.md#search-in-trash), you can apply only a limited subset of Search Criteria and Sort Clauses which can be used by [`Ibexa\Contracts\Core\Repository\TrashService::findTrashItems`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Repository-TrashService.html#method_findTrashItems). Some sort clauses are exclusive to trash search. diff --git a/mkdocs.yml b/mkdocs.yml index 77a37fda43..0a5abe7762 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -798,6 +798,7 @@ nav: - ProductPriceRangeAggregation: search/aggregation_reference/productpricerange_aggregation.md - ProductTypeTermAggregation: search/aggregation_reference/producttypeterm_aggregation.md - TaxonomyEntryIdAggregation: search/aggregation_reference/taxonomyentryid_aggregation.md + - Embeddings search reference: search/embeddings_reference/embeddings_reference.md - Search in trash reference: search/search_in_trash_reference.md - Extend search: - Create custom Search Criterion: search/extensibility/create_custom_search_criterion.md