IBX-9846: Describe Embeddings search API #3029

base: 5.0

New file (`@@ -0,0 +1,69 @@`):

``` php
<?php

declare(strict_types=1);

namespace App\Command;

use Ibexa\Contracts\Core\Repository\SearchService;
use Ibexa\Contracts\Core\Repository\Values\Content\EmbeddingQueryBuilder;
use Ibexa\Contracts\Core\Repository\Values\Content\Query\Criterion\ContentTypeIdentifier;
use Ibexa\Contracts\Core\Repository\Values\Content\Search\SearchHit;
use Ibexa\Contracts\Core\Search\Embedding\EmbeddingProviderResolverInterface;
use Ibexa\Contracts\Taxonomy\Search\Query\Value\TaxonomyEmbedding;
use Symfony\Component\Console\Attribute\AsCommand;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Console\Style\SymfonyStyle;

#[AsCommand(
    name: 'ibexa:taxonomy:find-by-embedding',
    description: 'Finds content using a taxonomy embedding query.'
)]
final class FindByTaxonomyEmbeddingCommand extends Command
{
    public function __construct(
        private readonly SearchService $searchService,
        private readonly EmbeddingProviderResolverInterface $embeddingProviderResolver,
    ) {
        parent::__construct();
    }

    protected function execute(
        InputInterface $input,
        OutputInterface $output
    ): int {
        $io = new SymfonyStyle($input, $output);

        $embeddingProvider = $this->embeddingProviderResolver->resolve();
        $embedding = $embeddingProvider->getEmbedding('example_content');

        $query = EmbeddingQueryBuilder::create()
            ->withEmbedding(new TaxonomyEmbedding($embedding))
            ->setFilter(new ContentTypeIdentifier('article'))
            ->setLimit(10)
            ->setOffset(0)
            ->setPerformCount(true)
            ->build();

        $result = $this->searchService->findContent($query);

        $io->success(sprintf('Found %d items.', $result->totalCount));

        foreach ($result->searchHits as $searchHit) {
            assert($searchHit instanceof SearchHit);

            /** @var \Ibexa\Contracts\Core\Repository\Values\Content\Content $content */
            $content = $searchHit->valueObject;
            $contentInfo = $content->versionInfo->contentInfo;

            $io->writeln(sprintf(
                '%d: %s',
                $contentInfo->id,
                $contentInfo->name
            ));
        }

        return self::SUCCESS;
    }
}
```

New file (`@@ -0,0 +1,11 @@`):

``` php
<?php declare(strict_types=1);

// Create an embedding field using the default embedding provider (type derived from configuration's field suffix)

/** @var Ibexa\Contracts\Core\Search\FieldType\EmbeddingFieldFactory $factory */
$embeddingField = $factory->create();
echo $embeddingField->getType(); // for example, "ibexa_dense_vector_model_123"

// Create a custom embedding field with a specific type
$customField = $factory->create('custom_embedding_type');
echo $customField->getType(); // "custom_embedding_type"
```

`docs/search/embeddings_reference/embeddings_reference.md` (new file, `@@ -0,0 +1,101 @@`):

``` yaml
---
month_change: true
description: Embedding queries, embedding configuration, providers, and embedding search fields
---
```

# Embeddings search reference

Embeddings provide vector representations of content or text, enabling [semantic similarity search](search_api.md#search-with-embeddings).
The core search contracts provide the foundational abstractions for embedding-based search, while embedding providers generate the vector representations.

Searching with embeddings is designed for use with the [Taxonomy suggestions](taxonomy.md#taxonomy-suggestions) feature.
The [`Ibexa\Contracts\Taxonomy\Search\Query\Value\TaxonomyEmbedding`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Taxonomy-Search-Query-Value-TaxonomyEmbedding.html) class allows embedding queries to target taxonomy data.

!!! tip

    Searching with embeddings isn't possible with the Legacy Search engine.

## Core query objects

### EmbeddingQuery

- [`Ibexa\Contracts\Core\Repository\Values\Content\EmbeddingQuery`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Repository-Values-Content-EmbeddingQuery.html) represents a semantic similarity search request.
It encapsulates an [Embedding](#embedding) instance and supports pagination, aggregations, and result counting through the same API as standard content queries.

!!! note "Embedding query properties"

    Embedding queries don't use criteria to compute similarity.
    Criteria only provide additional filtering, applied through the query filter.
    Also, embedding queries don't accept standard `Query` properties, such as `query`, `sortClauses`, or `spellcheck`, that [search engines](search_engines.md) other than Legacy Search support in regular queries.

- [`Ibexa\Contracts\Core\Repository\Values\Content\EmbeddingQueryBuilder`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Repository-Values-Content-EmbeddingQueryBuilder.html) is a builder for constructing `EmbeddingQuery` instances.
It helps construct queries consistently and integrates embedding queries with the search query pipeline.
You must provide the required embedding value by using the `withEmbedding` method.

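To illustrate the note on embedding query properties, the following plain-PHP sketch reproduces the semantics in memory: a filter (here, a content type check standing in for a `ContentTypeIdentifier` criterion) only narrows the candidate set, vector similarity alone decides the order of results, and offset and limit apply to the ranked list.
It's a conceptual model, not how a search engine executes the query, as engines typically rely on dedicated vector indexes.
The `findBySimilarity()` and `cosineSimilarity()` functions and the document array shape are hypothetical, and cosine similarity is an assumed metric: the actual metric depends on the search engine and its configuration.

``` php
<?php

declare(strict_types=1);

/**
 * Cosine similarity of two equal-length vectors, in the range [-1, 1].
 *
 * @param list<float> $a
 * @param list<float> $b
 */
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;

    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value * $value;
        $normB += $b[$i] * $b[$i];
    }

    if ($normA === 0.0 || $normB === 0.0) {
        return 0.0; // Undefined for zero vectors; treat them as unrelated.
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}

/**
 * Hypothetical in-memory stand-in for an embedding query with a filter.
 *
 * @param list<array{id: int, type: string, vector: list<float>}> $documents
 * @param list<float> $queryVector
 *
 * @return list<int> IDs of the matching documents, most similar first
 */
function findBySimilarity(
    array $documents,
    array $queryVector,
    string $contentType,
    int $limit,
    int $offset = 0
): array {
    // Filter: narrows the candidate set, but plays no part in the score.
    $candidates = array_filter(
        $documents,
        static fn (array $doc): bool => $doc['type'] === $contentType
    );

    // Similarity: the only thing that decides the order of results.
    $scored = array_map(
        static fn (array $doc): array => [
            'id' => $doc['id'],
            'score' => cosineSimilarity($queryVector, $doc['vector']),
        ],
        $candidates
    );
    usort($scored, static fn (array $x, array $y): int => $y['score'] <=> $x['score']);

    // Pagination: offset and limit apply to the ranked list.
    return array_column(array_slice($scored, $offset, $limit), 'id');
}
```

Because filtering happens before ranking, a document of another content type never appears in the results, however close its vector is to the query vector.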
### Embedding

- [`Ibexa\Contracts\Core\Repository\Values\Content\Query\Embedding`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Repository-Values-Content-Query-Embedding.html) represents the vector input used for similarity search.
It stores embedding values as float arrays, while providers generate those vectors from text input.

## Query execution

The search engine executes embedding queries by using the configured embedding model and provider.

At runtime, the system resolves the appropriate embedding provider and ensures that the embedding vector is compatible with the configured model.
This runtime validation covers vector dimensionality and the selection of the correct indexed field for similarity search.
The configured embedding model and the backend-specific query mapping determine which field is selected, while the search engine validates vector dimensionality when the query reaches it.

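A dimensionality check amounts to comparing the vector's length with the `dimensions` value configured for the model.
The following plain-PHP sketch shows such a check.
The `assertDimensions()` function and its messages are hypothetical: in the product, this validation happens when the query reaches the search engine, not in application code.

``` php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: rejects a vector whose length doesn't match
 * the `dimensions` value configured for the embedding model.
 *
 * @param array<mixed> $vector
 */
function assertDimensions(array $vector, int $expectedDimensions, string $modelName): void
{
    // Embeddings are stored as float arrays, so expect a plain list of numbers.
    if (!array_is_list($vector)) {
        throw new InvalidArgumentException('An embedding must be a list, not an associative array.');
    }

    foreach ($vector as $value) {
        if (!is_float($value) && !is_int($value)) {
            throw new InvalidArgumentException('An embedding may contain numbers only.');
        }
    }

    $actualDimensions = count($vector);
    if ($actualDimensions !== $expectedDimensions) {
        throw new InvalidArgumentException(sprintf(
            'Model "%s" expects %d dimensions, got %d.',
            $modelName,
            $expectedDimensions,
            $actualDimensions
        ));
    }
}
```

With a model configured as `dimensions: 1536`, a vector of any other length fails this check before any similarity is computed.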
## Embedding providers

Embedding providers implement the contract for generating vector representations of input data.
Out of the box, the embedding search integration supports `TaxonomyEmbedding`.
If you use a custom embedding value type, implement matching embedding visitors for your [search engine](search_engines.md).
Otherwise, query execution may fail because no visitor is available for that value type.

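Conceptually, the failure mode described here resembles a visitor dispatch: each registered visitor is asked whether it supports the embedding value, and the query can't be translated if none does.
The following plain-PHP sketch illustrates that general pattern only.
Every class in the `Sketch` namespace (`Embedding`, `KnownEmbedding`, `CustomEmbedding`, `EmbeddingVisitor`, and the dispatcher) is a hypothetical stand-in, not an Ibexa contract, and the returned query fragment is invented.
In this sketch, registering another visitor whose `supports()` method accepts `CustomEmbedding` is what makes a query with that value type executable.

``` php
<?php

declare(strict_types=1);

// Everything in this namespace is hypothetical and only illustrates the pattern.
namespace Sketch;

use LogicException;

abstract class Embedding
{
    /** @param list<float> $vector */
    public function __construct(public readonly array $vector)
    {
    }
}

// Stands in for a value type that ships with a visitor, such as TaxonomyEmbedding.
final class KnownEmbedding extends Embedding
{
}

// Stands in for your own value type that has no visitor yet.
final class CustomEmbedding extends Embedding
{
}

interface EmbeddingVisitor
{
    public function supports(Embedding $embedding): bool;

    /** @return array<string, mixed> a backend-specific query fragment */
    public function visit(Embedding $embedding): array;
}

final class KnownEmbeddingVisitor implements EmbeddingVisitor
{
    public function supports(Embedding $embedding): bool
    {
        return $embedding instanceof KnownEmbedding;
    }

    public function visit(Embedding $embedding): array
    {
        // The field name is invented for this example.
        return ['field' => 'known_embedding_field', 'vector' => $embedding->vector];
    }
}

final class EmbeddingVisitorDispatcher
{
    /** @param list<EmbeddingVisitor> $visitors */
    public function __construct(private readonly array $visitors)
    {
    }

    /** @return array<string, mixed> */
    public function visit(Embedding $embedding): array
    {
        foreach ($this->visitors as $visitor) {
            if ($visitor->supports($embedding)) {
                return $visitor->visit($embedding);
            }
        }

        // No visitor understands this value type, so the query can't be translated.
        throw new LogicException(sprintf('No visitor available for %s.', $embedding::class));
    }
}
```
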
- [`Ibexa\Contracts\Core\Search\Embedding\EmbeddingProviderInterface`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-Embedding-EmbeddingProviderInterface.html) generates embeddings for the provided text or other input.

- [`Ibexa\Contracts\Core\Search\Embedding\EmbeddingProviderRegistryInterface`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-Embedding-EmbeddingProviderRegistryInterface.html) lists the available embedding providers or gets one by its identifier.

- [`Ibexa\Contracts\Core\Search\Embedding\EmbeddingProviderResolverInterface`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-Embedding-EmbeddingProviderResolverInterface.html) determines which embedding provider generates embeddings, based either on the system configuration or on a model identifier passed to the `resolveByModelIdentifier` method.

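The interplay of these three contracts can be pictured as two lookups: from a model identifier to that model's configuration, then from the configured `embedding_provider` identifier to a provider in the registry.
The following plain-PHP sketch models that flow.
All classes are hypothetical stand-ins in a `Sketch` namespace, and `FakeEmbeddingProvider` returns a dummy vector instead of calling a real model.
Only the method names `resolve`, `resolveByModelIdentifier`, and `getEmbedding` echo names that appear in this document; their signatures here are assumptions, not the Ibexa API.

``` php
<?php

declare(strict_types=1);

// Everything in this namespace is hypothetical and only illustrates the resolution flow.
namespace Sketch;

use InvalidArgumentException;

interface EmbeddingProvider
{
    /** @return list<float> */
    public function getEmbedding(string $text): array;
}

// Fake provider: returns a dummy vector of a fixed size instead of calling a real model.
final class FakeEmbeddingProvider implements EmbeddingProvider
{
    public function __construct(private readonly int $dimensions)
    {
    }

    public function getEmbedding(string $text): array
    {
        return array_fill(0, $this->dimensions, strlen($text) / 100.0);
    }
}

// Registry: looks providers up by identifier, for example 'ibexa_openai'.
final class EmbeddingProviderRegistry
{
    /** @param array<string, EmbeddingProvider> $providers */
    public function __construct(private readonly array $providers)
    {
    }

    public function getProvider(string $identifier): EmbeddingProvider
    {
        return $this->providers[$identifier]
            ?? throw new InvalidArgumentException(sprintf('Unknown provider "%s".', $identifier));
    }
}

// Resolver: maps a model to the provider set in its configuration.
final class EmbeddingProviderResolver
{
    /**
     * @param array<string, array{embedding_provider: string}> $models mirrors `embedding_models`
     */
    public function __construct(
        private readonly EmbeddingProviderRegistry $registry,
        private readonly array $models,
        private readonly string $defaultModel,
    ) {
    }

    // No explicit demand: fall back to the `default_embedding_model` setting.
    public function resolve(): EmbeddingProvider
    {
        return $this->resolveByModelIdentifier($this->defaultModel);
    }

    public function resolveByModelIdentifier(string $modelIdentifier): EmbeddingProvider
    {
        $model = $this->models[$modelIdentifier]
            ?? throw new InvalidArgumentException(sprintf('Unknown model "%s".', $modelIdentifier));

        return $this->registry->getProvider($model['embedding_provider']);
    }
}
```

Keeping the model-to-provider mapping in configuration rather than in the resolver is what lets you change the default model without touching code that only calls `resolve()`.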
## Configuration

You must configure the models used to resolve embedding queries per SiteAccess, in [system configuration](configuration.md).
Each entry defines the model's name, vector dimensionality, field suffix, and the embedding provider that generates the vectors.
Field suffixes must be unique across models, because each suffix becomes part of the indexed field name.
To select the default model, set the `default_embedding_model` key.

``` yaml
ibexa:
    system:
        default:
            embedding_models:
                text-embedding-3-small:
                    name: 'text-embedding-3-small'
                    dimensions: 1536
                    field_suffix: '3small'
                    embedding_provider: 'ibexa_openai'
            default_embedding_model: text-embedding-3-small
```

For a real-life example of embedding models configuration, see [Taxonomy suggestions](taxonomy.md#change-the-embedding-generation-model).

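The rules in this section (every model entry defines a name, dimensions, a field suffix, and a provider; field suffixes are unique; the default model is one of the configured models) can be checked before deployment with a small script.
The following sketch works on a plain PHP array that mirrors the `embedding_models` YAML structure.
The `validateEmbeddingModels()` function is hypothetical and not part of Ibexa DXP, which processes its configuration on its own.
For example, calling it with `text-embedding-ada-002` as the default model while only `text-embedding-3-small` is configured returns `['Default model "text-embedding-ada-002" is not configured.']`.

``` php
<?php

declare(strict_types=1);

/**
 * Hypothetical pre-deployment check for an `embedding_models` configuration.
 *
 * @param array<string, array<string, mixed>> $models
 *
 * @return list<string> problems found; an empty list means the configuration passed
 */
function validateEmbeddingModels(array $models, string $defaultModel): array
{
    $problems = [];
    $suffixOwners = [];

    foreach ($models as $identifier => $model) {
        foreach (['name', 'dimensions', 'field_suffix', 'embedding_provider'] as $key) {
            if (!isset($model[$key])) {
                $problems[] = sprintf('Model "%s" is missing "%s".', $identifier, $key);
            }
        }

        // Field suffixes become part of the indexed field name, so they must be unique.
        $suffix = $model['field_suffix'] ?? null;
        if ($suffix !== null) {
            if (isset($suffixOwners[$suffix])) {
                $problems[] = sprintf(
                    'Field suffix "%s" is used by both "%s" and "%s".',
                    $suffix,
                    $suffixOwners[$suffix],
                    $identifier
                );
            } else {
                $suffixOwners[$suffix] = $identifier;
            }
        }
    }

    // The default model must refer to one of the configured entries.
    if (!array_key_exists($defaultModel, $models)) {
        $problems[] = sprintf('Default model "%s" is not configured.', $defaultModel);
    }

    return $problems;
}
```
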
- [`Ibexa\Contracts\Core\Search\Embedding\EmbeddingConfigurationInterface`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-Embedding-EmbeddingConfigurationInterface.html) provides access to the embedding model configuration in the system, for example, the list of available models, the default model name, the default provider, and the field suffix.

## Embedding fields

Dedicated search fields store embedding vectors.
The search engine can use these fields to perform vector similarity comparisons when it executes embedding queries.

``` php
[[= include_file('code_samples/api/public_php_api/src/embedding_fields.php') =]]
```

After you create a field, subscribe to the `ContentIndexCreateEvent` indexing event to [add the field to the index](index_custom_elasticsearch_data.md).

- [`Ibexa\Contracts\Core\Search\FieldType\EmbeddingFieldFactory`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-FieldType-EmbeddingFieldFactory.html) creates dedicated search fields that store embedding vectors.

## Validation

- [`Ibexa\Contracts\Core\Repository\Values\Content\QueryValidatorInterface`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Repository-Values-Content-QueryValidatorInterface.html) validates the structure of an embedding query before it's executed.