Embedding layer for OMOP CDM.
omop-emb now separates model metadata from embedding storage:
- model metadata is stored locally in SQLite (metadata.db)
- embedding vectors are stored by the selected backend (pgvector or faiss)
- OMOP concept metadata remains in the OMOP CDM database
omop-emb now exposes backend-specific optional dependencies so installation
can match the embedding backend you actually intend to use.
pip install "omop-emb[pgvector]"
pip install "omop-emb[faiss]"
pip install "omop-emb[all]"

Notes:
- The pgvector extra installs the PostgreSQL/pgvector dependencies.
- The faiss extra installs the FAISS-based backend dependencies. It currently includes CPU support only.
- The all extra installs both backend stacks for development or mixed environments.
- A plain pip install omop-emb installs the shared core package only.
- PostgreSQL-specific embedding dependencies are optional, but omop-emb still requires OMOP CDM database access.
- Non-PostgreSQL database backends have not yet been tested.
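To confirm which backend stack an environment actually contains, a small check like the following can help. This is a minimal sketch, not part of omop-emb: the helper name available_backends is hypothetical, and the mapping assumes that the pgvector and faiss extras provide importable modules named pgvector and faiss.

```python
from importlib.util import find_spec

# Assumed mapping from extra name to a top-level module that the extra provides.
BACKEND_MODULES = {"pgvector": "pgvector", "faiss": "faiss"}


def available_backends(modules=BACKEND_MODULES):
    """Return the names whose module can be found in this environment (hypothetical helper)."""
    # find_spec returns None for a top-level module that is not installed.
    return sorted(name for name, module in modules.items() if find_spec(module) is not None)


if __name__ == "__main__":
    print("Installed embedding backends:", available_backends() or "none")
```

An empty result would suggest that only the shared core package is installed.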
Common environment variables:
- OMOP_EMB_BACKEND: backend name (pgvector or faiss) used by the backend factory.
- OMOP_EMB_BASE_STORAGE_DIR: local base directory for omop-emb artifacts, including local metadata (metadata.db) and FAISS files. If unset, omop-emb defaults to ./.omop_emb in the current working directory.
- OMOP_DATABASE_URL: SQLAlchemy URL for the OMOP CDM database.
- OMOP_EMB_DOCUMENT_EMBEDDING_PREFIX: task prefix prepended to concept texts at index time. Required for asymmetric models (e.g. search_document: for nomic-embed-text, passage: for E5).
- OMOP_EMB_QUERY_EMBEDDING_PREFIX: task prefix prepended to search queries at query time. Required for asymmetric models (e.g. search_query: for nomic-embed-text, query: for E5).
The prefix variables default to "" and are safe to omit for symmetric models. See Asymmetric Embeddings for details.
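The variables and defaults described above can be illustrated with a short sketch. It is not the omop-emb implementation: EmbSettings, load_settings and apply_prefix are hypothetical names, rejecting an unknown backend name is a choice made for this example, and the prefix is joined by plain concatenation, so any separator whitespace is assumed to be part of the configured value.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional

SUPPORTED_BACKENDS = ("pgvector", "faiss")


@dataclass(frozen=True)
class EmbSettings:
    """Hypothetical settings container; not part of the omop-emb API."""
    backend: str
    base_storage_dir: Path
    database_url: Optional[str]
    document_prefix: str
    query_prefix: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> EmbSettings:
    """Read the documented variables from env (defaults to os.environ)."""
    env = os.environ if env is None else env
    backend = env.get("OMOP_EMB_BACKEND", "")
    if backend not in SUPPORTED_BACKENDS:
        # Assumption: this sketch rejects unknown names; omop-emb may behave differently.
        raise ValueError(f"OMOP_EMB_BACKEND must be one of {SUPPORTED_BACKENDS}, got {backend!r}")
    base_dir = env.get("OMOP_EMB_BASE_STORAGE_DIR")
    return EmbSettings(
        backend=backend,
        # Documented default: ./.omop_emb in the current working directory.
        base_storage_dir=Path(base_dir) if base_dir else Path.cwd() / ".omop_emb",
        database_url=env.get("OMOP_DATABASE_URL"),
        # Both prefixes default to "", which suits symmetric models.
        document_prefix=env.get("OMOP_EMB_DOCUMENT_EMBEDDING_PREFIX", ""),
        query_prefix=env.get("OMOP_EMB_QUERY_EMBEDDING_PREFIX", ""),
    )


def apply_prefix(prefix: str, text: str) -> str:
    """Prepend the task prefix; an empty prefix leaves the text unchanged."""
    return f"{prefix}{text}"


# Example with an explicit mapping instead of the real environment.
settings = load_settings({
    "OMOP_EMB_BACKEND": "faiss",
    "OMOP_EMB_QUERY_EMBEDDING_PREFIX": "search_query: ",
})
query_text = apply_prefix(settings.query_prefix, "type 2 diabetes")
```

Passing a mapping rather than reading os.environ directly keeps the sketch easy to test without changing the process environment.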
Extended documentation can be found here.
- Interface for PostgreSQL storage of vectors
- Interface for FAISS storage of embeddings
- Extensive unit testing
- Backend testing
- Corruption and restoration of DB testing
- Support importing and exporting of calculated embeddings
- Support non-Flat indices for each backend
- faiss GPU support
- pgvectorscale support
- Vector-quantisation for more efficient storage