Add ARCHITECTURE.md by coderdan · Pull Request #374 · cipherstash/proxy

coderdan · 2026-02-12T09:25:20Z

Summary

Adds an ARCHITECTURE.md document describing the internal design of CipherStash Proxy for developers and anyone browsing the repo.
Covers the SQL type inference engine (unification, trait bounds, associated types), the transformation pipeline, the PostgreSQL protocol bridge, encryption/key management, and schema management.

Summary by CodeRabbit

Documentation
- Added a comprehensive Architecture guide describing the proxy's design, components, query lifecycle, encryption/key management, protocol handling, and operational behaviors.
- Added links to the new Architecture guide in the README's "More info" sections for easier discovery.

Explains how the system works for developers and casual readers: the SQL type inference engine, unification algorithm, transformation pipeline, PostgreSQL protocol bridge, and schema management.

coderabbitai · 2026-02-12T09:28:26Z

📝 Walkthrough

Walkthrough

Adds a new ARCHITECTURE.md describing the CipherStash Proxy internals and inserts links to it from README.md. The architecture doc details proxy positioning, eql-mapper and protocol-bridge subsystems, query lifecycle, type inference/unification, transformation rules, encryption/key management, authentication bridging, schema handling, and package layout.

Changes

Cohort / File(s)	Summary
Architecture doc `ARCHITECTURE.md`	New comprehensive architecture documentation covering proxy positioning, core subsystems (eql-mapper, protocol bridge), query lifecycle, EQL type system and unification, AST transformation rules, batch decryption, authentication bridging, ZeroKMS/key caching, schema discovery/reload strategies, and package structure.
Readme updates `README.md`	Two insertions adding links to the new Architecture guide in the "More info" sections. No code changes.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Poem

🐰 I hopped through pages, found the plan so neat,

Blueprints of the proxy — every path complete,
Types, keys, and bridges in tidy rows,
Rules that tell the AST where it goes,
I nibble docs with joy — architecture sweet. 🥕

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Add ARCHITECTURE.md' directly and clearly describes the main change: the addition of a new architecture documentation file.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch docs/architecture-md

No actionable comments were generated in the recent review. 🎉

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In `@ARCHITECTURE.md`:
- Around line 205-229: The fenced package tree in ARCHITECTURE.md is missing a
language identifier causing markdownlint warnings; update the opening code fence
for the package tree (the triple-backtick block that contains the directory
listing for packages/ and entries like cipherstash-proxy/, eql-mapper/,
eql-mapper-macros/, showcase/) to include a language (use "text") so the block
starts with ```text instead of ```; no other content changes are needed.
- Around line 16-33: The fenced ASCII diagram block in ARCHITECTURE.md is
missing a language identifier (causing markdownlint MD040); update the opening
fence for the diagram (the triple-backtick block containing the "Application
CipherStash Proxy PostgreSQL" ASCII flow) to include a language token such as
text (i.e., change ``` to ```text) so the diagram is treated as plain text and
the linter warning is resolved.

coderabbitai · 2026-02-12T09:30:33Z

ARCHITECTURE.md

+```
+Application                    CipherStash Proxy                         PostgreSQL
+    |                                |                                        |
+    |--- SQL statement ------------->|                                        |
+    |                          Parse SQL into AST                             |
+    |                          Import schema (tables, columns, EQL types)     |
+    |                          Run type inference (unification)               |
+    |                          Identify encrypted literals & parameters       |
+    |                          Encrypt values via ZeroKMS                     |
+    |                          Apply transformation rules to AST             |
+    |                          Emit rewritten SQL                             |
+    |                                |--- transformed SQL ------------------>|
+    |                                |<-- result rows ----------------------|
+    |                          Identify encrypted columns in results          |
+    |                          Batch-decrypt values via ZeroKMS              |
+    |                          Re-encode to PostgreSQL wire format            |
+    |<-- plaintext results ----------|                                        |
+```


⚠️ Potential issue | 🟡 Minor

Add a language identifier to the code fence.

Markdownlint flags the diagram block as missing a language specifier. Consider text (or plain) for the ASCII diagram to satisfy MD040.

Suggested fix

-``` +```text Application CipherStash Proxy PostgreSQL | | | |--- SQL statement ------------->| | | Parse SQL into AST | | Import schema (tables, columns, EQL types) | | Run type inference (unification) | | Identify encrypted literals & parameters | | Encrypt values via ZeroKMS | | Apply transformation rules to AST | | Emit rewritten SQL | | |--- transformed SQL ------------------>| | |<-- result rows ----------------------| | Identify encrypted columns in results | | Batch-decrypt values via ZeroKMS | | Re-encode to PostgreSQL wire format | |<-- plaintext results ----------| | -``` +```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```

Application CipherStash Proxy PostgreSQL

| | |

|--- SQL statement ------------->| |

| Parse SQL into AST |

| Import schema (tables, columns, EQL types) |

| Run type inference (unification) |

| Identify encrypted literals & parameters |

| Encrypt values via ZeroKMS |

| Apply transformation rules to AST |

| Emit rewritten SQL |

| |--- transformed SQL ------------------>|

| |<-- result rows ----------------------|

| Identify encrypted columns in results |

| Batch-decrypt values via ZeroKMS |

| Re-encode to PostgreSQL wire format |

|<-- plaintext results ----------| |

```

🧰 Tools

🪛 markdownlint-cli2 (0.20.0)

[warning] 16-16: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents

In `@ARCHITECTURE.md` around lines 16 - 33, The fenced ASCII diagram block in ARCHITECTURE.md is missing a language identifier (causing markdownlint MD040); update the opening fence for the diagram (the triple-backtick block containing the "Application CipherStash Proxy PostgreSQL" ASCII flow) to include a language token such as text (i.e., change ``` to ```text) so the diagram is treated as plain text and the linter warning is resolved.

coderabbitai · 2026-02-12T09:30:33Z

ARCHITECTURE.md

+```
+packages/
+├── cipherstash-proxy/           # Main proxy binary
+│   └── src/
+│       ├── postgresql/          # Wire protocol implementation
+│       │   ├── frontend.rs      # Client → Server message handling
+│       │   ├── backend.rs       # Server → Client message handling
+│       │   ├── handler.rs       # Connection startup and auth
+│       │   ├── protocol.rs      # Low-level message reading
+│       │   ├── parser.rs        # SQL parsing entry point
+│       │   └── context/         # Session state (statements, portals, metadata)
+│       ├── proxy/               # Encryption service, schema management, config
+│       └── config/              # Configuration parsing
+├── eql-mapper/                  # SQL type inference and transformation
+│   └── src/
+│       ├── inference/           # Type inference engine
+│       │   ├── unifier/         # Unification algorithm, type definitions, trait bounds
+│       │   ├── sql_types/       # Operator and function type signatures
+│       │   └── infer_type_impls/# Per-AST-node type inference implementations
+│       ├── transformation_rules/# AST rewriting rules
+│       ├── model/               # Schema, tables, columns, DDL tracking
+│       └── scope_tracker.rs     # Lexical scope management
+├── eql-mapper-macros/           # Proc macros for operator/function declarations
+└── showcase/                    # Example healthcare data model
+```


⚠️ Potential issue | 🟡 Minor

Add a language identifier to the code fence.

Markdownlint flags the package tree block as missing a language specifier. Consider text for the tree.

Suggested fix

-``` +```text packages/ ├── cipherstash-proxy/ # Main proxy binary │ └── src/ │ ├── postgresql/ # Wire protocol implementation │ │ ├── frontend.rs # Client → Server message handling │ │ ├── backend.rs # Server → Client message handling │ │ ├── handler.rs # Connection startup and auth │ │ ├── protocol.rs # Low-level message reading │ │ ├── parser.rs # SQL parsing entry point │ │ └── context/ # Session state (statements, portals, metadata) │ ├── proxy/ # Encryption service, schema management, config │ └── config/ # Configuration parsing ├── eql-mapper/ # SQL type inference and transformation │ └── src/ │ ├── inference/ # Type inference engine │ │ ├── unifier/ # Unification algorithm, type definitions, trait bounds │ │ ├── sql_types/ # Operator and function type signatures │ │ └── infer_type_impls/# Per-AST-node type inference implementations │ ├── transformation_rules/# AST rewriting rules │ ├── model/ # Schema, tables, columns, DDL tracking │ └── scope_tracker.rs # Lexical scope management ├── eql-mapper-macros/ # Proc macros for operator/function declarations └── showcase/ # Example healthcare data model -``` +```

🧰 Tools

🪛 markdownlint-cli2 (0.20.0)

[warning] 205-205: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents

In `@ARCHITECTURE.md` around lines 205 - 229, The fenced package tree in ARCHITECTURE.md is missing a language identifier causing markdownlint warnings; update the opening code fence for the package tree (the triple-backtick block that contains the directory listing for packages/ and entries like cipherstash-proxy/, eql-mapper/, eql-mapper-macros/, showcase/) to include a language (use "text") so the block starts with ```text instead of ```; no other content changes are needed.

Copilot

Pull request overview

Adds a new ARCHITECTURE.md document to describe CipherStash Proxy’s internal design for developers and repo readers, covering the query flow, eql-mapper inference/transformation, protocol bridging, encryption/key management, and schema/config management.

Changes:

Introduces a comprehensive architecture write-up for CipherStash Proxy.
Documents the SQL type inference + transformation pipeline concepts used by eql-mapper.
Describes the PostgreSQL wire-protocol bridge and encryption/config reload behaviors at a high level.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-12T09:30:41Z

ARCHITECTURE.md

+
+## Schema Management
+
+The proxy discovers the database schema at startup and reloads it periodically. Schema loading queries PostgreSQL's `information_schema` to discover tables and columns, then checks `eql_v2_configuration` to determine which columns are encrypted and what index types they support.


The schema-loading description is inaccurate: the proxy’s schema loader marks encrypted columns by checking information_schema.columns.udt_name == 'eql_v2_encrypted' (see select_table_schemas.sql / SchemaManager::load_schema). It does not consult eql_v2_configuration for this, and index support comes from the separate Encrypt configuration loaded from public.eql_v2_configuration (via select_config.sql). Please update this paragraph to reflect the actual split between schema discovery vs encrypt-config loading.

Suggested change

The proxy discovers the database schema at startup and reloads it periodically. Schema loading queries PostgreSQL's `information_schema` to discover tables and columns, then checks `eql_v2_configuration` to determine which columns are encrypted and what index types they support.

The proxy discovers the database schema at startup and reloads it periodically. Schema loading queries PostgreSQL's `information_schema` to discover tables and columns and marks encrypted columns by checking `information_schema.columns.udt_name = 'eql_v2_encrypted'` (via `select_table_schemas.sql` / `SchemaManager::load_schema`). Separately, it loads the Encrypt configuration from `public.eql_v2_configuration` (via `select_config.sql`) to determine index types and other search capabilities for those encrypted columns.

Copilot · 2026-02-12T09:30:41Z

ARCHITECTURE.md

+1. **Startup** — load schema with exponential backoff retry (up to 10 attempts, max 2-second backoff) to handle cases where PostgreSQL isn't ready yet
+2. **Periodic** — a background task reloads schema on a configurable interval
+3. **On-demand** — DDL detection during a transaction triggers a reload when the transaction completes


The reload-cycle bullet for Startup says schema is loaded with exponential backoff retry. In the current implementation, startup loads schema/config without retries (SchemaManager::init calls load_schema, and EncryptConfigManager::init_reloader calls load_encrypt_config); retries with backoff only happen during reloads (load_schema_with_retry / load_encrypt_config_with_retry). Please adjust the bullet list so it matches runtime behavior.

Suggested change

1. **Startup** — load schema with exponential backoff retry (up to 10 attempts, max 2-second backoff) to handle cases where PostgreSQL isn't ready yet

2. **Periodic** — a background task reloads schema on a configurable interval

3. **On-demand** — DDL detection during a transaction triggers a reload when the transaction completes

1. **Startup** — load schema and encryption configuration once; if this fails, proxy startup fails (no retry/backoff is performed during startup)

2. **Periodic** — a background task reloads schema and encryption configuration on a configurable interval, using exponential backoff retry on failure (up to 10 attempts, max 2-second backoff)

3. **On-demand** — DDL detection during a transaction triggers a reload when the transaction completes, also using the same exponential backoff retry behavior on failure

Copilot · 2026-02-12T09:30:41Z

ARCHITECTURE.md

+
+This process propagates type information across the entire statement — through subqueries, CTEs, JOINs, `UNION` branches, function calls, and `RETURNING` clauses.
+
+A particularly interesting aspect is how EQL types unify with each other. When two `Partial` EQL types for the same column meet, their bounds are merged (union). When a `Partial` meets a `Full`, the result promotes to `Full`. This means the system automatically infers the minimum encryption payload needed for each value.


This section claims the system infers the minimum encryption payload needed via Partial bounds. While eql-mapper does infer EqlTerm::Partial(_, bounds), the proxy currently discards those bounds when mapping/encrypting (it uses EqlTermVariant and treats Full/Partial/Tokenized the same as EqlOperation::Store). Please either clarify that the payload-minimization is an eql-mapper internal concept not currently used by the proxy encryption path, or update the description to match the current behavior.

Suggested change

A particularly interesting aspect is how EQL types unify with each other. When two `Partial` EQL types for the same column meet, their bounds are merged (union). When a `Partial` meets a `Full`, the result promotes to `Full`. This means the system automatically infers the minimum encryption payload needed for each value.

A particularly interesting aspect is how EQL types unify with each other. When two `Partial` EQL types for the same column meet, their bounds are merged (union). When a `Partial` meets a `Full`, the result promotes to `Full`. Within `eql-mapper`, this allows the type system to infer the minimum bounds required for each value; however, the current proxy encryption path does not yet use these `Partial` bounds and treats `Full`/`Partial`/`Tokenized` the same when deciding what to encrypt.

Copilot · 2026-02-12T09:30:42Z

ARCHITECTURE.md

+| `Full` | `EqlOperation::Store` | Inserting a new encrypted value with all search terms |
+| `Partial(Eq)` | `EqlOperation::Store` | Equality query — only equality search terms needed |
+| `Partial(Ord)` | `EqlOperation::Store` | Comparison query — only ORE search terms needed |
+| `Tokenized` | `EqlOperation::Store` | LIKE query — tokenized search terms |
+| `JsonPath` | `EqlOperation::Query` with `SteVecSelector` | JSON path query argument |
+| `JsonAccessor` | `EqlOperation::Query` with field selector | JSON field access argument |
+


The EQL operation routing table suggests Partial(Eq) vs Partial(Ord) map to different storage behavior and that JsonAccessor uses a distinct “field selector” query op. In the current proxy encryption service, Full/Partial/Tokenized all use EqlOperation::Store, and both JsonPath and JsonAccessor currently use EqlOperation::Query(..., QueryOp::SteVecSelector) when a SteVec index exists. Please align this table with the implemented routing logic (or note where behavior differs by design).

Suggested change

| `Full` | `EqlOperation::Store` | Inserting a new encrypted value with all search terms |

| `Partial(Eq)` | `EqlOperation::Store` | Equality query — only equality search terms needed |

| `Partial(Ord)` | `EqlOperation::Store` | Comparison query — only ORE search terms needed |

| `Tokenized` | `EqlOperation::Store` | LIKE query — tokenized search terms |

| `JsonPath` | `EqlOperation::Query` with `SteVecSelector` | JSON path query argument |

| `JsonAccessor` | `EqlOperation::Query` with field selector | JSON field access argument |

| `Full` | `EqlOperation::Store` | Insert/update with all configured search terms materialised |

| `Partial(Eq)` | `EqlOperation::Store` | Equality-oriented operations — only equality search terms are constructed |

| `Partial(Ord)` | `EqlOperation::Store` | Ordering/comparison operations — only ORE search terms are constructed |

| `Tokenized` | `EqlOperation::Store` | Pattern/LIKE-style operations — tokenized search terms are constructed |

| `JsonPath` | `EqlOperation::Query` with `SteVecSelector` | JSON path query argument (uses SteVec index when available) |

| `JsonAccessor` | `EqlOperation::Query` with `SteVecSelector` | JSON field access argument (same SteVec selector as `JsonPath` in current implementation) |

**Note:** In the current proxy implementation, all of `Full`, `Partial(Eq)`, `Partial(Ord)`, and `Tokenized` are routed to `EqlOperation::Store`; the `Partial`/`Tokenized` variants only affect which search terms are built inside the payload. Likewise, both `JsonPath` and `JsonAccessor` use `EqlOperation::Query(..., QueryOp::SteVecSelector)` when a SteVec index exists, even though the mapper distinguishes them conceptually.

docs: add ARCHITECTURE.md describing proxy internals

1977a7f

Explains how the system works for developers and casual readers: the SQL type inference engine, unification algorithm, transformation pipeline, PostgreSQL protocol bridge, and schema management.

Copilot AI review requested due to automatic review settings February 12, 2026 09:25

Copilot started reviewing on behalf of coderdan February 12, 2026 09:25 View session

coderabbitai bot reviewed Feb 12, 2026

View reviewed changes

Copilot AI reviewed Feb 12, 2026

View reviewed changes

docs: link to ARCHITECTURE.md from README

1199080

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add ARCHITECTURE.md#374

Add ARCHITECTURE.md#374
coderdan wants to merge 2 commits intomainfrom
docs/architecture-md

coderdan commented Feb 12, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Feb 12, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Feb 12, 2026

Uh oh!

coderabbitai bot Feb 12, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Feb 12, 2026

Uh oh!

Copilot AI Feb 12, 2026

Uh oh!

Copilot AI Feb 12, 2026

Uh oh!

Copilot AI Feb 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant


		## Schema Management

		The proxy discovers the database schema at startup and reloads it periodically. Schema loading queries PostgreSQL's `information_schema` to discover tables and columns, then checks `eql_v2_configuration` to determine which columns are encrypted and what index types they support.


		This process propagates type information across the entire statement — through subqueries, CTEs, JOINs, `UNION` branches, function calls, and `RETURNING` clauses.

		A particularly interesting aspect is how EQL types unify with each other. When two `Partial` EQL types for the same column meet, their bounds are merged (union). When a `Partial` meets a `Full`, the result promotes to `Full`. This means the system automatically infers the minimum encryption payload needed for each value.

-| `Full` | `EqlOperation::Store` | Inserting a new encrypted value with all search terms |
-| `Partial(Eq)` | `EqlOperation::Store` | Equality query — only equality search terms needed |
-| `Partial(Ord)` | `EqlOperation::Store` | Comparison query — only ORE search terms needed |
-| `Tokenized` | `EqlOperation::Store` | LIKE query — tokenized search terms |
-| `JsonPath` | `EqlOperation::Query` with `SteVecSelector` | JSON path query argument |
-| `JsonAccessor` | `EqlOperation::Query` with field selector | JSON field access argument |
+| `Full` | `EqlOperation::Store` | Insert/update with all configured search terms materialised |
+| `Partial(Eq)` | `EqlOperation::Store` | Equality-oriented operations — only equality search terms are constructed |
+| `Partial(Ord)` | `EqlOperation::Store` | Ordering/comparison operations — only ORE search terms are constructed |
+| `Tokenized` | `EqlOperation::Store` | Pattern/LIKE-style operations — tokenized search terms are constructed |
+| `JsonPath` | `EqlOperation::Query` with `SteVecSelector` | JSON path query argument (uses SteVec index when available) |
+| `JsonAccessor` | `EqlOperation::Query` with `SteVecSelector` | JSON field access argument (same SteVec selector as `JsonPath` in current implementation) |
+**Note:** In the current proxy implementation, all of `Full`, `Partial(Eq)`, `Partial(Ord)`, and `Tokenized` are routed to `EqlOperation::Store`; the `Partial`/`Tokenized` variants only affect which search terms are built inside the payload. Likewise, both `JsonPath` and `JsonAccessor` use `EqlOperation::Query(..., QueryOp::SteVecSelector)` when a SteVec index exists, even though the mapper distinguishes them conceptually.

Conversation

coderdan commented Feb 12, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Feb 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderdan commented Feb 12, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Feb 12, 2026 •

edited

Loading