Allow custom model names for Dense, Sparse embeddings, fix Sparse Embeddings #146

Open
tomaarsen wants to merge 1 commit into alibaba:main from tomaarsen:feat/expand_st_model_support

@tomaarsen
Hello!

Pull Request overview

  • Allow a custom model_name in DefaultLocalSparseEmbedding and DefaultLocalDenseEmbedding.
  • Simplify model loading: DenseEmbedding, SparseEmbedding, and ReRanker now share a single _get_model, which picks the class to load via a _get_model_class property.
  • Fix SparseEncoder model loading: it previously used SentenceTransformer. Note that encode_query and encode_document exist on both SparseEncoder and SentenceTransformer.
  • Remove _manual_sparse_encode, which handled older Sentence Transformers versions. I would recommend requiring at least Sentence Transformers v5.0 so that SparseEncoder can be used.
  • Replace SentenceTransformerReRanker with DefaultLocalReRanker in the docs, as the former no longer exists.
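
To illustrate the shared-loader idea from the overview, here is a minimal sketch rather than the code in this PR. Only the names _get_model and _get_model_class come from the description above; the stand-in model classes, the LocalModelBase base class, the *Sketch subclasses, and the default model names are all hypothetical. In the real code the property would presumably return SentenceTransformer or SparseEncoder.

```python
from typing import Any, Optional


class FakeDenseModel:
    """Hypothetical stand-in for a dense model class."""

    def __init__(self, model_name: str) -> None:
        self.model_name = model_name


class FakeSparseModel:
    """Hypothetical stand-in for a sparse model class."""

    def __init__(self, model_name: str) -> None:
        self.model_name = model_name


class LocalModelBase:
    """Hypothetical base class holding the single shared loader."""

    default_model_name = ""

    def __init__(self, model_name: Optional[str] = None) -> None:
        # A custom model_name overrides the class default.
        self.model_name = model_name or self.default_model_name
        self._model: Any = None

    @property
    def _get_model_class(self) -> type:
        # Subclasses declare which class the loader should instantiate.
        raise NotImplementedError

    def _get_model(self) -> Any:
        # Load lazily on first use and cache the instance afterwards.
        if self._model is None:
            self._model = self._get_model_class(self.model_name)
        return self._model


class DenseEmbeddingSketch(LocalModelBase):
    default_model_name = "example/dense-default"  # hypothetical name

    @property
    def _get_model_class(self) -> type:
        return FakeDenseModel


class SparseEmbeddingSketch(LocalModelBase):
    default_model_name = "example/sparse-default"  # hypothetical name

    @property
    def _get_model_class(self) -> type:
        return FakeSparseModel


dense = DenseEmbeddingSketch()
sparse = SparseEmbeddingSketch("my-org/custom-sparse")
```

With this layout, each subclass differs only in the class it returns from the property, so the loading and caching logic lives in one place.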

Details:

This PR considerably extends the support for local dense, sparse, and reranker models. I've had some installation issues, so I can't properly run the tests at the moment, but the SparseEmbedding class should now give much clearer results. I also simplified the sparse embedding post-processing by using model.decode, which also avoids some issues with SparseCUDA and SparseCPU not implementing all operations.
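
As a rough sketch of the decode-based post-processing mentioned above (not the code in this PR), the helper below turns decoded output into one token-to-weight dictionary per input. It assumes that model.decode, given a 2D batch of sparse embeddings, returns one list of (token, weight) pairs per row, which is my understanding of SparseEncoder.decode in Sentence Transformers v5. The helper name decode_to_token_weights and the stand-in encoder are hypothetical and exist only so the example runs without downloading a model.

```python
from typing import Any, Dict, List, Tuple


def decode_to_token_weights(model: Any, embeddings: Any) -> List[Dict[str, float]]:
    """Convert a batch of sparse embeddings into token -> weight dicts.

    Assumes model.decode(embeddings) returns one list of (token, weight)
    pairs per row of the batch.
    """
    decoded = model.decode(embeddings)
    return [
        {token: float(weight) for token, weight in pairs}
        for pairs in decoded
    ]


class FakeSparseEncoder:
    """Hypothetical stand-in that mimics the assumed decode output shape."""

    def decode(self, embeddings: Any) -> List[List[Tuple[str, float]]]:
        # Ignore the input and return fixed pairs for two "documents";
        # the second one has no active tokens.
        return [[("hello", 1.5), ("world", 0.5)], []]


result = decode_to_token_weights(FakeSparseEncoder(), embeddings=None)
```

Because decoding happens inside the model, the caller never has to index or reduce sparse tensors directly, which is presumably how this sidesteps the unimplemented sparse operations.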

Please do let me know if you have any questions.

- Tom Aarsen

@CLAassistant
CLAassistant commented Feb 17, 2026

CLA assistant check
All committers have signed the CLA.

3 participants