feat: DuckDB extension for zvec collections (MVP) by akuligowski9 · Pull Request #136 · alibaba/zvec

akuligowski9 · 2026-02-16T20:00:21Z

Summary

Implements a DuckDB extension (Phase 1 MVP) per [Feature]: Add duckdb extension #134, using a collection-bridge approach: zvec owns storage, and the DuckDB functions act as a SQL bridge to the Collection API
Adds 4 SQL functions: zvec_create, zvec_insert, zvec_search (table function), zvec_fetch (table function)
Includes a thread-safe CollectionRegistry, a full zvec↔DuckDB type mapping, a JSON schema parser, and SQL tests

Design Decisions (seeking feedback)

Collection-bridge vs native DuckDB index: Chose the bridge approach for simplicity; zvec manages storage and the DuckDB functions query it. This avoids deep integration with DuckDB's storage layer, but it means collections live outside DuckDB's catalog.
Vector representation: Vectors are passed as DuckDB FLOAT[] values and converted internally to zvec's raw binary format.
JSON-based schema & insert: Schema creation and document insertion use JSON strings for flexibility. Alternative: structured DuckDB parameters.
Global singleton registry: Open collections are cached in a mutex-protected singleton keyed by path. This enables concurrent read access but serializes writes.
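The registry idea can be pictured with a minimal C++ sketch. It assumes nothing about zvec's real API: `Collection` is a placeholder struct, and `GetOrOpen`, `Close`, and `Size` are hypothetical method names, not the PR's actual `CollectionRegistry` interface. It shows a lazily constructed singleton that caches one shared handle per path behind a `std::mutex`.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// HYPOTHETICAL stand-in for a zvec collection handle; the real type comes
// from zvec's Collection API and is not modeled here.
struct Collection {
    explicit Collection(std::string p) : path(std::move(p)) {}
    std::string path;
};

// Process-wide cache of open collections, keyed by filesystem path.
class CollectionRegistry {
public:
    static CollectionRegistry& Instance() {
        static CollectionRegistry instance;  // initialization is thread-safe since C++11
        return instance;
    }

    // Returns the cached handle for `path`, creating it on first use.
    std::shared_ptr<Collection> GetOrOpen(const std::string& path) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = collections_.find(path);
        if (it != collections_.end()) {
            return it->second;
        }
        // Real code would open (or create) the collection through zvec here.
        auto handle = std::make_shared<Collection>(path);
        collections_.emplace(path, handle);
        return handle;
    }

    // Drops the cached handle; callers still holding a shared_ptr keep it alive.
    void Close(const std::string& path) {
        std::lock_guard<std::mutex> lock(mutex_);
        collections_.erase(path);
    }

    std::size_t Size() {
        std::lock_guard<std::mutex> lock(mutex_);
        return collections_.size();
    }

private:
    CollectionRegistry() = default;
    std::mutex mutex_;  // guards collections_ only, not the collections themselves
    std::map<std::string, std::shared_ptr<Collection>> collections_;
};
```

In this sketch the lock is held only while the map is read or modified, so it serializes registry lookups rather than searches; how concurrent reads and writes on one collection behave would then depend on zvec's own locking inside the collection.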

Functions

| Function | Type | SQL Signature |
|---|---|---|
| `zvec_create` | Scalar | `(path VARCHAR, schema_json VARCHAR) → VARCHAR` |
| `zvec_insert` | Scalar | `(path VARCHAR, pk VARCHAR, doc_json VARCHAR) → VARCHAR` |
| `zvec_search` | Table | `(path VARCHAR, field VARCHAR, vector FLOAT[], topk INT) → (pk, score, ...fields)` |
| `zvec_fetch` | Table | `(path VARCHAR, pk VARCHAR) → (pk, ...fields)` |
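`zvec_search` takes its query vector as `FLOAT[]`, and the design notes say such values are converted to zvec's raw binary format. A minimal C++ sketch of that conversion, assuming (unconfirmed) that zvec's raw VECTOR_FP32 format is a contiguous buffer of native-endian 32-bit floats and that the DuckDB list has already been copied into a `std::vector<float>`. `PackFp32Vector` and `UnpackFp32Vector` are hypothetical helper names; the dimension check mirrors the `"dimension"` key in the schema JSON.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// HYPOTHETICAL helper: packs FLOAT[] values into a raw byte buffer.
// ASSUMPTION: zvec's raw VECTOR_FP32 layout is plain native-endian fp32.
std::string PackFp32Vector(const std::vector<float>& values, std::size_t expected_dim) {
    // Reject vectors whose length does not match the field's declared dimension.
    if (values.size() != expected_dim) {
        throw std::invalid_argument("vector has " + std::to_string(values.size()) +
                                    " elements, expected " + std::to_string(expected_dim));
    }
    std::string raw(values.size() * sizeof(float), '\0');
    if (!values.empty()) {
        std::memcpy(&raw[0], values.data(), raw.size());
    }
    return raw;
}

// HYPOTHETICAL inverse, e.g. for returning vector fields from zvec_fetch.
std::vector<float> UnpackFp32Vector(const std::string& raw) {
    if (raw.size() % sizeof(float) != 0) {
        throw std::invalid_argument("raw buffer length is not a multiple of sizeof(float)");
    }
    std::vector<float> values(raw.size() / sizeof(float));
    if (!values.empty()) {
        std::memcpy(values.data(), raw.data(), raw.size());
    }
    return values;
}
```

Copying with `std::memcpy` instead of casting pointers avoids alignment and strict-aliasing problems; if zvec instead expects a fixed byte order, the pack step would need an explicit conversion.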

Example Usage

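```sql
-- Create a collection with a string field and a 128-dimensional FP32 vector field
SELECT zvec_create('/tmp/my_col', '{"name": "articles", "fields": [
  {"name": "title", "type": "STRING"},
  {"name": "embedding", "type": "VECTOR_FP32", "dimension": 128,
   "index": {"type": "HNSW", "metric": "COSINE"}}
]}');

-- Insert one document; "..." stands for the remaining components of the 128-element vector
SELECT zvec_insert('/tmp/my_col', 'doc1',
  '{"title": "hello world", "embedding": [0.1, 0.2, ...]}');

-- Return the top 10 results for the query vector on the "embedding" field
SELECT * FROM zvec_search('/tmp/my_col', 'embedding',
  [0.1, 0.2, ...]::FLOAT[], 10);
```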

Test plan

Verify that `make` builds the extension without errors against a DuckDB submodule
Verify the extension loads in DuckDB CLI
SQL test: create collection → insert docs → search → verify score ordering
SQL test: create → insert → fetch by PK → verify field values
Test with larger collections (100+ docs)
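For the score-ordering tests, a brute-force reference is a simple oracle: score every inserted vector against the query and keep the best `topk`. This C++ sketch assumes the COSINE metric is reported as a similarity where higher is better; if the extension reports a distance instead, flip the comparator. Because HNSW is approximate, results from `zvec_search` are better compared to this oracle by ordering or recall than by exact equality. All names here are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct ScoredDoc {
    std::string pk;
    float score;  // cosine similarity: higher means more similar (ASSUMPTION)
};

// Cosine similarity of two equal-length vectors; 0 if either has zero norm.
float CosineSimilarity(const std::vector<float>& a, const std::vector<float>& b) {
    if (a.size() != b.size()) throw std::invalid_argument("dimension mismatch");
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += double(a[i]) * b[i];
        na += double(a[i]) * a[i];
        nb += double(b[i]) * b[i];
    }
    if (na == 0.0 || nb == 0.0) return 0.0f;
    return static_cast<float>(dot / (std::sqrt(na) * std::sqrt(nb)));
}

// Exhaustive top-k: scores every document and keeps the k best, best first.
std::vector<ScoredDoc> BruteForceTopK(
        const std::vector<std::pair<std::string, std::vector<float>>>& docs,
        const std::vector<float>& query, std::size_t topk) {
    std::vector<ScoredDoc> scored;
    scored.reserve(docs.size());
    for (const auto& doc : docs) {
        scored.push_back({doc.first, CosineSimilarity(query, doc.second)});
    }
    std::size_t k = std::min(topk, scored.size());
    std::partial_sort(scored.begin(), scored.begin() + static_cast<std::ptrdiff_t>(k),
                      scored.end(),
                      [](const ScoredDoc& x, const ScoredDoc& y) { return x.score > y.score; });
    scored.resize(k);
    return scored;
}

// True if scores never increase, i.e. results are ordered best first.
bool IsSortedByScoreDesc(const std::vector<ScoredDoc>& results) {
    for (std::size_t i = 1; i < results.size(); ++i) {
        if (results[i].score > results[i - 1].score) return false;
    }
    return true;
}
```

With 100+ documents, the same oracle can compute recall@k as the fraction of its top-k primary keys that also appear in the extension's output.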

Closes #134

🤖 Generated with Claude Code

CLAassistant · 2026-02-16T20:00:37Z

All committers have signed the CLA.

Implements a collection-bridge DuckDB extension with 4 SQL functions:
- zvec_create(path, schema_json): create collections from JSON schema
- zvec_insert(path, pk, doc_json): insert documents with JSON fields
- zvec_search(path, field, vector, topk): vector similarity search
- zvec_fetch(path, pk): fetch documents by primary key

Includes thread-safe CollectionRegistry, full type mapping between zvec and DuckDB types, JSON schema parser, and SQL tests.

Closes alibaba#134

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

akuligowski9 mentioned this pull request on Feb 16, 2026 in [Feature]: Add duckdb extension #134 (Open)

akuligowski9 force-pushed the feat/duckdb-extension branch from 73a1766 to d3724aa on February 16, 2026 20:16

akuligowski9 wants to merge 1 commit into alibaba:main from akuligowski9:feat/duckdb-extension