
feat(indexers): add framework for default hooks #140

Open
lukeroantreeONS wants to merge 1 commit into main from 139-default-hooks

@lukeroantreeONS
Collaborator

📌 Add a framework for flexible, common hooks provided as part of the package

✨ Summary

Introduces a hooks submodule within indexers that provides ready-made, flexible hooks for common pre- and post-processing tasks, along with a base class that advanced users can extend to implement their own.

Note: this does not remove the ability of users to write a hook function themselves without using the base class; that remains the currently recommended approach.
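For comparison, a hook written as a plain function (the approach the note refers to) might look like the sketch below. It assumes a hook receives the search input, transforms it, and returns it; a plain dict of column lists stands in for VectorStoreSearchInput so the sketch runs without classifai installed, and the function name uppercase_query_hook is hypothetical.

```python
# Hypothetical plain-function hook. Assumption: a hook takes the search input
# and returns a transformed copy. A dict of column lists stands in for the
# real VectorStoreSearchInput so this runs with the standard library only.


def uppercase_query_hook(search_input: dict[str, list]) -> dict[str, list]:
    """Return a copy of the input with every query string upper-cased."""
    output = dict(search_input)  # shallow copy so the caller's dict is not mutated
    output["query"] = [str(q).upper() for q in search_input["query"]]
    return output


if __name__ == "__main__":
    example = {"id": [1, 2], "query": ["apple merchant", "pub landlord"]}
    print(uppercase_query_hook(example))
    # prints {'id': [1, 2], 'query': ['APPLE MERCHANT', 'PUB LANDLORD']}
```

Assuming the package accepts plain callables as the note states, such a function would be registered the same way as the default hooks, e.g. hooks={"search_preprocess": uppercase_query_hook}.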

📜 Changes Introduced

  • feat(indexers): hook_factory defines base classes and shared utilities
  • feat(indexers): default_hooks contains the specific flexible pre- and post-processing hooks offered as part of the package
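The split between hook_factory (base classes) and default_hooks (ready-made hooks) can be illustrated with the common "callable base class" pattern sketched below. This is not the package's implementation: BaseHook, transform and CapitalisationHookSketch are hypothetical names, a dict of column lists again stands in for VectorStoreSearchInput, and only the "upper" method is confirmed by this PR ("lower" and "title" are assumptions).

```python
from abc import ABC, abstractmethod


class BaseHook(ABC):
    """Hypothetical base class: subclasses implement transform(), and instances
    are callable so they can be used wherever a plain hook function is expected."""

    def __call__(self, search_input: dict[str, list]) -> dict[str, list]:
        return self.transform(search_input)

    @abstractmethod
    def transform(self, search_input: dict[str, list]) -> dict[str, list]:
        """Return a transformed copy of the search input."""


class CapitalisationHookSketch(BaseHook):
    """Illustrative stand-in for a capitalisation hook.

    Only "upper" appears in the PR; "lower" and "title" are assumed options.
    """

    _METHODS = {"upper": str.upper, "lower": str.lower, "title": str.title}

    def __init__(self, method: str = "upper", column: str = "query") -> None:
        # Validate eagerly so a typo fails at construction, not mid-search
        if method not in self._METHODS:
            raise ValueError(f"method must be one of {sorted(self._METHODS)}, got {method!r}")
        self.method = method
        self.column = column

    def transform(self, search_input: dict[str, list]) -> dict[str, list]:
        convert = self._METHODS[self.method]
        output = dict(search_input)  # leave the caller's data untouched
        output[self.column] = [convert(str(v)) for v in search_input[self.column]]
        return output


if __name__ == "__main__":
    hook = CapitalisationHookSketch(method="title")
    print(hook({"id": [1], "query": ["pub landlord"]}))
    # prints {'id': [1], 'query': ['Pub Landlord']}
```

Making instances callable keeps class-based hooks interchangeable with plain functions, which is consistent with the note that user-defined hook functions remain supported.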

✅ Checklist

Please confirm you've completed these checks before requesting a review.

  • Code passes linting with Ruff
  • Security checks pass using Bandit
  • API and unit tests are written and pass using pytest
  • Terraform files (if applicable) follow best practices and have been validated (terraform fmt & terraform validate)
  • Docstrings follow the Google style and are added as per Pylint recommendations
  • Documentation has been updated if needed

🔍 How to Test

  1. Rebuild the package with uv build.
  2. Reinstall classifai from dist/classifai<version>.whl, including any optional dependencies you want to test with (the script below uses HuggingFaceVectoriser).
  3. Create a test.py file with the following contents:
from classifai.indexers import VectorStore
from classifai.indexers.dataclasses import VectorStoreSearchInput
from classifai.indexers.hooks import CapitalisationStandardisingHook
from classifai.vectorisers import HuggingFaceVectoriser

# Embed text with a small sentence-transformers model
demo_vectoriser = HuggingFaceVectoriser(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Default hook under test: standardises query capitalisation (here, to upper case)
default_hook_capitalise = CapitalisationStandardisingHook(method="upper")

# Build the vector store from the demo CSV and register the hook as a search preprocessor
demo_vectorstore = VectorStore(
    file_name="./data/fake_soc_dataset.csv",
    data_type="csv",
    vectoriser=demo_vectoriser,
    output_dir="./demo_vdb",
    overwrite=True,
    hooks={"search_preprocess": default_hook_capitalise},
)

# Two example queries to search against the store
query_df = VectorStoreSearchInput({"id": [1, 2], "query": ["apple merchant", "pub landlord"]})

results = demo_vectorstore.search(query_df, n_results=5)

print(results)
  4. Run uv run test.py and confirm that it completes successfully and that the query_text column in the output is fully capitalised.
  5. Repeat, changing the method parameter passed when initialising CapitalisationStandardisingHook.
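The capitalisation check in the final steps can be automated with a small helper such as the sketch below. The helper name column_matches_method is hypothetical, only "upper" is confirmed as a method by this PR, and it assumes the query_text column yields plain strings when iterated.

```python
from collections.abc import Iterable

# Assumed method names; only "upper" is confirmed by the PR text.
_CONVERTERS = {"upper": str.upper, "lower": str.lower, "title": str.title}


def column_matches_method(values: Iterable[str], method: str = "upper") -> bool:
    """Return True if every value is unchanged by the given capitalisation method.

    Comparing against the converted string (rather than using str.isupper())
    also accepts values with no cased characters, such as "123".
    """
    convert = _CONVERTERS[method]
    return all(v == convert(v) for v in values)
```

In test.py this could be called as column_matches_method(results["query_text"], "upper") after the search, assuming results can be indexed by column name.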

@lukeroantreeONS requested a review from a team as a code owner on March 5, 2026 at 15:54
@lukeroantreeONS linked an issue on Mar 5, 2026 that may be closed by this pull request
@github-actions (bot) added the enhancement (New feature or request) label on Mar 5, 2026

Development: successfully merging this pull request may close the issue Default Hooks.
