Abstracting Model Usage #406

@josephjclark

Description

General concerns

Anthropic specific features we use

  • structured outputs
  • answer prefilling

Not all models are equal, so we should expect and support different levels of feature parity. E.g. model X may not support streaming. How do we declare and work around this?
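One way to declare uneven feature support is a per-model capability record that callers check before relying on a feature. A minimal sketch, assuming a flat capability-flags shape (the class, dict, and model names here are hypothetical, not existing Apollo code):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelCapabilities:
    # Hypothetical capability flags; extend as gaps are discovered
    streaming: bool = False
    structured_output: bool = False
    prompt_caching: bool = False

# Hypothetical registry; unknown models default to "no capabilities"
CAPABILITIES = {
    "claude-3-5-sonnet": ModelCapabilities(
        streaming=True, structured_output=True, prompt_caching=True
    ),
    "some-local-model": ModelCapabilities(streaming=False),
}

def supports(model: str, feature: str) -> bool:
    caps = CAPABILITIES.get(model)
    return bool(caps and getattr(caps, feature, False))
```

Callers can then branch (or fail loudly) on `supports(model, "streaming")` instead of assuming Anthropic behaviour.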

Making the embeddings swappable is itself going to be complex and hard: 1) we need to generate embeddings (for the corpus and the user question), and 2) we need to store embeddings to search against. So you need an embedding API and an embedding database (with search). Even with langchain this stuff is kind of hard. We should standardise the DB on Postgres, which means we'd only need to abstract out the embedding service. TODO: raise this as a standalone issue because we should do this anyway.
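If the vector store is standardised on Postgres, the only swappable piece is the embedding call itself. A sketch of what that interface might look like (the names and the fake implementation are illustrative assumptions):

```python
from abc import ABC, abstractmethod

class EmbeddingService(ABC):
    """Hypothetical interface: the one piece that varies per provider
    once storage/search is standardised on Postgres."""

    @abstractmethod
    def embed(self, texts: list[str]) -> list[list[float]]:
        """Return one vector per input text."""

class FakeEmbeddings(EmbeddingService):
    # Deterministic stand-in for tests; a real implementation would
    # call a provider API (OpenAI, Voyage, a local model, etc.)
    def embed(self, texts: list[str]) -> list[list[float]]:
        return [[float(len(t))] for t in texts]
```

Services would embed the corpus and the user question through this interface and hand the vectors to Postgres for similarity search.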

Models are not all equal. E.g. in RAG we use a cheaper model for some queries. So we might need a means of supporting multiple models per service, e.g. big model, quick model, media model.
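This could reduce to a tier-to-model mapping, so callers ask for a role ("big", "quick", "media") rather than a concrete model id. A sketch under that assumption (the model names are placeholders, not a recommendation):

```python
# Hypothetical tier mapping: callers request a role, config decides the model
MODEL_TIERS = {
    "big": "claude-3-5-sonnet",       # high-quality reasoning
    "quick": "claude-3-5-haiku",      # cheap/fast queries (e.g. RAG filtering)
    "media": "some-multimodal-model", # placeholder for image/audio work
}

def resolve_model(tier: str) -> str:
    # KeyError on an unknown tier is deliberate: fail loudly on bad config
    return MODEL_TIERS[tier]
```

Swapping providers then means editing one mapping per deployment rather than touching every service.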

Different token usage per backing model doesn't really matter (thunderbolt tracks token use and the user provides an API key).

We need some kind of rating system for different models: compliance/compatibility metrics covering quality, cost, speed, and reliability. We declare known successful integrations, but must also declare risk areas and unknowns. We may need a minimal test suite, running in CI, to ensure minimal compliance. Tiers: Unsupported, Supported, Approved.
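The Unsupported/Supported/Approved tiers could be a simple enum plus a ratings table that the CI suite keeps honest. A minimal sketch (the enum and table are hypothetical; the tier names come from the note above):

```python
from enum import Enum

class SupportLevel(Enum):
    UNSUPPORTED = "unsupported"  # never tested, or known to fail
    SUPPORTED = "supported"      # passes the minimal CI compliance suite
    APPROVED = "approved"        # tested + vetted for quality/cost/speed

# Hypothetical compliance matrix, ideally generated by the CI test suite
RATINGS = {
    "claude-3-5-sonnet": SupportLevel.APPROVED,
    "gpt-4o": SupportLevel.SUPPORTED,
}

def support_level(model: str) -> SupportLevel:
    # Anything not explicitly rated is Unsupported by default
    return RATINGS.get(model, SupportLevel.UNSUPPORTED)
```

Defaulting unknown models to Unsupported makes the risk declaration explicit rather than implicit.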

Places we need to cut

prompts

All prompts are engineered towards anthropic structures. This includes:

  • using xmlish tags to structure content
  • using markdown
  • breakpoints for caching
  • emphasis and style
  • specific bug fixes
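One way to decouple prompt content from Anthropic's conventions is to keep prompt sections as structured data and render them per provider. A sketch, assuming a simple sections-to-text renderer (function and style names are hypothetical):

```python
def render_sections(sections: dict[str, str], style: str = "xml") -> str:
    """Hypothetical renderer: one prompt source, provider-specific surface.

    'xml' matches the XML-ish tag convention used for Anthropic;
    'markdown' renders the same sections as headed blocks for providers
    that respond better to markdown structure.
    """
    if style == "xml":
        return "\n".join(f"<{k}>\n{v}\n</{k}>" for k, v in sections.items())
    return "\n".join(f"## {k}\n{v}" for k, v in sections.items())
```

Caching breakpoints, emphasis tweaks, and provider-specific bug fixes would still need per-provider handling on top of this.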

job_chat

  • explicit use of anthropic agent
  • error handling/mapping
  • prompt
  • structured outputs
  • streaming processing

basically the same as workflow chat

RAG is proprietary so no action is needed there (of course the search services used by RAG need work).

workflow_chat

  • error handling - anthropic errors get mapped to ApolloError
  • abstracting the anthropic client
  • response needs to be normalised - history, usage. Likely we'd break the existing API, so we need to convert incoming legacy structures to the new format (not a big deal)
  • the streaming API (not currently in use) is Anthropic-specific. Code here is complex; abstraction is hard
  • the prompt itself (uses markdown structure, breakpoints)
  • Selection/declaration/config of the model we use
  • structured output handling is coupled to anthropic (answer prefilling)
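For the error-handling point above: the mapping layer could translate provider error codes into one normalised shape. `ApolloError` is named in this issue, but its fields here are an assumption, as are the code mappings:

```python
class ApolloError(Exception):
    # Hypothetical shape; the real ApolloError in Apollo may differ
    def __init__(self, code: str, message: str):
        super().__init__(message)
        self.code = code

# Assumed mapping from provider-specific error types to stable Apollo codes
PROVIDER_ERROR_CODES = {
    "rate_limit_error": "RATE_LIMIT",
    "authentication_error": "BAD_API_KEY",
    "overloaded_error": "PROVIDER_UNAVAILABLE",
}

def normalise_error(provider_code: str, message: str) -> ApolloError:
    # Unknown provider codes collapse to UNKNOWN rather than leaking through
    return ApolloError(PROVIDER_ERROR_CODES.get(provider_code, "UNKNOWN"), message)
```

Each provider adapter would own its half of this table, so callers only ever see `ApolloError`.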

UUID tracking and name sanitisation are our own code and are model-agnostic. No embeddings calls.

search_adaptor_docs

  • embeddings

search_docsite

  • embeddings

supervisor

docs agent

  • embeddings

vocab mapper

(doesn't matter really)
