Abstracting Model Usage #406

@josephjclark

Description

General concerns

Anthropic specific features we use

  • structured outputs
  • answer prefilling

Not all models are equal, so we should expect and support different levels of feature parity. E.g. model X may not support streaming. How do we declare and work around this?
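One way to declare uneven feature support is a per-model capability record that callers check before relying on a feature. A minimal sketch, assuming a flat capability-flags shape (the class, dict, and model names here are hypothetical, not existing Apollo code):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelCapabilities:
    # Hypothetical capability flags; extend as gaps are discovered
    streaming: bool = False
    structured_output: bool = False
    prompt_caching: bool = False

# Hypothetical registry; unknown models default to "no capabilities"
CAPABILITIES = {
    "claude-3-5-sonnet": ModelCapabilities(
        streaming=True, structured_output=True, prompt_caching=True
    ),
    "some-local-model": ModelCapabilities(streaming=False),
}

def supports(model: str, feature: str) -> bool:
    caps = CAPABILITIES.get(model)
    return bool(caps and getattr(caps, feature, False))
```

Callers can then branch (or fail loudly) on `supports(model, "streaming")` instead of assuming Anthropic behaviour.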

Making the embeddings swappable is itself going to be complex and hard: 1) we need to generate embeddings (for the corpus and the user question), and 2) we need to store embeddings to search against. So you need an embedding API and an embedding database (with search). Even with langchain this stuff is kind of hard. We should standardise the DB on Postgres, which means we'd only need to abstract out the embedding service. TODO: raise this as a standalone issue because we should do this anyway.
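If the vector store is standardised on Postgres, the only swappable piece is the embedding call itself. A sketch of what that interface might look like (the names and the fake implementation are illustrative assumptions):

```python
from abc import ABC, abstractmethod

class EmbeddingService(ABC):
    """Hypothetical interface: the one piece that varies per provider
    once storage/search is standardised on Postgres."""

    @abstractmethod
    def embed(self, texts: list[str]) -> list[list[float]]:
        """Return one vector per input text."""

class FakeEmbeddings(EmbeddingService):
    # Deterministic stand-in for tests; a real implementation would
    # call a provider API (OpenAI, Voyage, a local model, etc.)
    def embed(self, texts: list[str]) -> list[list[float]]:
        return [[float(len(t))] for t in texts]
```

Services would embed the corpus and the user question through this interface and hand the vectors to Postgres for similarity search.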

Models are not all equal. E.g. in RAG we use a cheaper model for some queries. So we might need a means of supporting multiple models per service, e.g. big model, quick model, media model.
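This could reduce to a tier-to-model mapping, so callers ask for a role ("big", "quick", "media") rather than a concrete model id. A sketch under that assumption (the model names are placeholders, not a recommendation):

```python
# Hypothetical tier mapping: callers request a role, config decides the model
MODEL_TIERS = {
    "big": "claude-3-5-sonnet",       # high-quality reasoning
    "quick": "claude-3-5-haiku",      # cheap/fast queries (e.g. RAG filtering)
    "media": "some-multimodal-model", # placeholder for image/audio work
}

def resolve_model(tier: str) -> str:
    # KeyError on an unknown tier is deliberate: fail loudly on bad config
    return MODEL_TIERS[tier]
```

Swapping providers then means editing one mapping per deployment rather than touching every service.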

Different token usage per backing model doesn't really matter (thunderbolt tracks token use and the user provides an API key).

We need some kind of rating system for different models: compliance/compatibility metrics covering quality, cost, speed, and reliability. We declare known successful integrations, but must also declare risk areas and unknowns. We may need a minimal test suite, running in CI, to ensure minimal compliance. Tiers: Unsupported, Supported, Approved.
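The Unsupported/Supported/Approved tiers could be a simple enum plus a ratings table that the CI suite keeps honest. A minimal sketch (the enum and table are hypothetical; the tier names come from the note above):

```python
from enum import Enum

class SupportLevel(Enum):
    UNSUPPORTED = "unsupported"  # never tested, or known to fail
    SUPPORTED = "supported"      # passes the minimal CI compliance suite
    APPROVED = "approved"        # tested + vetted for quality/cost/speed

# Hypothetical compliance matrix, ideally generated by the CI test suite
RATINGS = {
    "claude-3-5-sonnet": SupportLevel.APPROVED,
    "gpt-4o": SupportLevel.SUPPORTED,
}

def support_level(model: str) -> SupportLevel:
    # Anything not explicitly rated is Unsupported by default
    return RATINGS.get(model, SupportLevel.UNSUPPORTED)
```

Defaulting unknown models to Unsupported makes the risk declaration explicit rather than implicit.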

Places we need to cut

prompts

All prompts are engineered towards anthropic structures. This includes:

  • using xmlish tags to structure content
  • using markdown
  • breakpoints for caching
  • emphasis and style
  • specific bug fixes
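One way to decouple prompt content from Anthropic's conventions is to keep prompt sections as structured data and render them per provider. A sketch, assuming a simple sections-to-text renderer (function and style names are hypothetical):

```python
def render_sections(sections: dict[str, str], style: str = "xml") -> str:
    """Hypothetical renderer: one prompt source, provider-specific surface.

    'xml' matches the XML-ish tag convention used for Anthropic;
    'markdown' renders the same sections as headed blocks for providers
    that respond better to markdown structure.
    """
    if style == "xml":
        return "\n".join(f"<{k}>\n{v}\n</{k}>" for k, v in sections.items())
    return "\n".join(f"## {k}\n{v}" for k, v in sections.items())
```

Caching breakpoints, emphasis tweaks, and provider-specific bug fixes would still need per-provider handling on top of this.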

job_chat

  • explicit use of anthropic agent
  • error handling/mapping
  • prompt
  • structured outputs
  • streaming processing

basically the same as workflow chat

RAG is proprietary so no action is needed there (of course the search services used by RAG need work).

workflow_chat

  • error handling - anthropic errors get mapped to ApolloError
  • abstracting the anthropic client
  • response needs to be normalised - history, usage. Likely we'd break the existing API, so we need to convert incoming legacy structures to the new format (not a big deal)
  • the streaming API (not currently in use) is Anthropic-specific. Code here is complex; abstraction is hard
  • the prompt itself (uses markdown structure, breakpoints)
  • Selection/declaration/config of the model we use
  • structured output handling is coupled to anthropic (answer prefilling)
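For the error-handling point above: the mapping layer could translate provider error codes into one normalised shape. `ApolloError` is named in this issue, but its fields here are an assumption, as are the code mappings:

```python
class ApolloError(Exception):
    # Hypothetical shape; the real ApolloError in Apollo may differ
    def __init__(self, code: str, message: str):
        super().__init__(message)
        self.code = code

# Assumed mapping from provider-specific error types to stable Apollo codes
PROVIDER_ERROR_CODES = {
    "rate_limit_error": "RATE_LIMIT",
    "authentication_error": "BAD_API_KEY",
    "overloaded_error": "PROVIDER_UNAVAILABLE",
}

def normalise_error(provider_code: str, message: str) -> ApolloError:
    # Unknown provider codes collapse to UNKNOWN rather than leaking through
    return ApolloError(PROVIDER_ERROR_CODES.get(provider_code, "UNKNOWN"), message)
```

Each provider adapter would own its half of this table, so callers only ever see `ApolloError`.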

UUID tracking and name sanitisation are our own code and are model-agnostic. No embeddings calls.

search_adaptor_docs

  • embeddings

search_docsite

  • embeddings

supervisor

docs agent

  • embeddings

vocab mapper

(doesn't matter really)
