Conversation
Explains how the system works for developers and casual readers: the SQL type inference engine, unification algorithm, transformation pipeline, PostgreSQL protocol bridge, and schema management.
📝 WalkthroughWalkthroughAdds a new ARCHITECTURE.md describing the CipherStash Proxy internals and inserts links to it from README.md. The architecture doc details proxy positioning, eql-mapper and protocol-bridge subsystems, query lifecycle, type inference/unification, transformation rules, encryption/key management, authentication bridging, schema handling, and package layout. Changes
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~5 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
No actionable comments were generated in the recent review. 🎉 Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@ARCHITECTURE.md`:
- Around line 205-229: The fenced package tree in ARCHITECTURE.md is missing a
language identifier causing markdownlint warnings; update the opening code fence
for the package tree (the triple-backtick block that contains the directory
listing for packages/ and entries like cipherstash-proxy/, eql-mapper/,
eql-mapper-macros/, showcase/) to include a language (use "text") so the block
starts with ```text instead of ```; no other content changes are needed.
- Around line 16-33: The fenced ASCII diagram block in ARCHITECTURE.md is
missing a language identifier (causing markdownlint MD040); update the opening
fence for the diagram (the triple-backtick block containing the "Application
CipherStash Proxy PostgreSQL" ASCII flow) to include a language token such as
text (i.e., change ``` to ```text) so the diagram is treated as plain text and
the linter warning is resolved.
| ``` | ||
| Application CipherStash Proxy PostgreSQL | ||
| | | | | ||
| |--- SQL statement ------------->| | | ||
| | Parse SQL into AST | | ||
| | Import schema (tables, columns, EQL types) | | ||
| | Run type inference (unification) | | ||
| | Identify encrypted literals & parameters | | ||
| | Encrypt values via ZeroKMS | | ||
| | Apply transformation rules to AST | | ||
| | Emit rewritten SQL | | ||
| | |--- transformed SQL ------------------>| | ||
| | |<-- result rows ----------------------| | ||
| | Identify encrypted columns in results | | ||
| | Batch-decrypt values via ZeroKMS | | ||
| | Re-encode to PostgreSQL wire format | | ||
| |<-- plaintext results ----------| | | ||
| ``` |
There was a problem hiding this comment.
Add a language identifier to the code fence.
Markdownlint flags the diagram block as missing a language specifier. Consider text (or plain) for the ASCII diagram to satisfy MD040.
Suggested fix
-```
+```text
Application CipherStash Proxy PostgreSQL
| | |
|--- SQL statement ------------->| |
| Parse SQL into AST |
| Import schema (tables, columns, EQL types) |
| Run type inference (unification) |
| Identify encrypted literals & parameters |
| Encrypt values via ZeroKMS |
| Apply transformation rules to AST |
| Emit rewritten SQL |
| |--- transformed SQL ------------------>|
| |<-- result rows ----------------------|
| Identify encrypted columns in results |
| Batch-decrypt values via ZeroKMS |
| Re-encode to PostgreSQL wire format |
|<-- plaintext results ----------| |
-```
+```📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ``` | |
| Application CipherStash Proxy PostgreSQL | |
| | | | | |
| |--- SQL statement ------------->| | | |
| | Parse SQL into AST | | |
| | Import schema (tables, columns, EQL types) | | |
| | Run type inference (unification) | | |
| | Identify encrypted literals & parameters | | |
| | Encrypt values via ZeroKMS | | |
| | Apply transformation rules to AST | | |
| | Emit rewritten SQL | | |
| | |--- transformed SQL ------------------>| | |
| | |<-- result rows ----------------------| | |
| | Identify encrypted columns in results | | |
| | Batch-decrypt values via ZeroKMS | | |
| | Re-encode to PostgreSQL wire format | | |
| |<-- plaintext results ----------| | | |
| ``` |
🧰 Tools
🪛 markdownlint-cli2 (0.20.0)
[warning] 16-16: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
In `@ARCHITECTURE.md` around lines 16 - 33, The fenced ASCII diagram block in
ARCHITECTURE.md is missing a language identifier (causing markdownlint MD040);
update the opening fence for the diagram (the triple-backtick block containing
the "Application CipherStash Proxy PostgreSQL" ASCII flow) to include a language
token such as text (i.e., change ``` to ```text) so the diagram is treated as
plain text and the linter warning is resolved.
| ``` | ||
| packages/ | ||
| ├── cipherstash-proxy/ # Main proxy binary | ||
| │ └── src/ | ||
| │ ├── postgresql/ # Wire protocol implementation | ||
| │ │ ├── frontend.rs # Client → Server message handling | ||
| │ │ ├── backend.rs # Server → Client message handling | ||
| │ │ ├── handler.rs # Connection startup and auth | ||
| │ │ ├── protocol.rs # Low-level message reading | ||
| │ │ ├── parser.rs # SQL parsing entry point | ||
| │ │ └── context/ # Session state (statements, portals, metadata) | ||
| │ ├── proxy/ # Encryption service, schema management, config | ||
| │ └── config/ # Configuration parsing | ||
| ├── eql-mapper/ # SQL type inference and transformation | ||
| │ └── src/ | ||
| │ ├── inference/ # Type inference engine | ||
| │ │ ├── unifier/ # Unification algorithm, type definitions, trait bounds | ||
| │ │ ├── sql_types/ # Operator and function type signatures | ||
| │ │ └── infer_type_impls/# Per-AST-node type inference implementations | ||
| │ ├── transformation_rules/# AST rewriting rules | ||
| │ ├── model/ # Schema, tables, columns, DDL tracking | ||
| │ └── scope_tracker.rs # Lexical scope management | ||
| ├── eql-mapper-macros/ # Proc macros for operator/function declarations | ||
| └── showcase/ # Example healthcare data model | ||
| ``` |
There was a problem hiding this comment.
Add a language identifier to the code fence.
Markdownlint flags the package tree block as missing a language specifier. Consider text for the tree.
Suggested fix
-```
+```text
packages/
├── cipherstash-proxy/ # Main proxy binary
│ └── src/
│ ├── postgresql/ # Wire protocol implementation
│ │ ├── frontend.rs # Client → Server message handling
│ │ ├── backend.rs # Server → Client message handling
│ │ ├── handler.rs # Connection startup and auth
│ │ ├── protocol.rs # Low-level message reading
│ │ ├── parser.rs # SQL parsing entry point
│ │ └── context/ # Session state (statements, portals, metadata)
│ ├── proxy/ # Encryption service, schema management, config
│ └── config/ # Configuration parsing
├── eql-mapper/ # SQL type inference and transformation
│ └── src/
│ ├── inference/ # Type inference engine
│ │ ├── unifier/ # Unification algorithm, type definitions, trait bounds
│ │ ├── sql_types/ # Operator and function type signatures
│ │ └── infer_type_impls/# Per-AST-node type inference implementations
│ ├── transformation_rules/# AST rewriting rules
│ ├── model/ # Schema, tables, columns, DDL tracking
│ └── scope_tracker.rs # Lexical scope management
├── eql-mapper-macros/ # Proc macros for operator/function declarations
└── showcase/ # Example healthcare data model
-```
+```🧰 Tools
🪛 markdownlint-cli2 (0.20.0)
[warning] 205-205: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
In `@ARCHITECTURE.md` around lines 205 - 229, The fenced package tree in
ARCHITECTURE.md is missing a language identifier causing markdownlint warnings;
update the opening code fence for the package tree (the triple-backtick block
that contains the directory listing for packages/ and entries like
cipherstash-proxy/, eql-mapper/, eql-mapper-macros/, showcase/) to include a
language (use "text") so the block starts with ```text instead of ```; no other
content changes are needed.
There was a problem hiding this comment.
Pull request overview
Adds a new ARCHITECTURE.md document to describe CipherStash Proxy’s internal design for developers and repo readers, covering the query flow, eql-mapper inference/transformation, protocol bridging, encryption/key management, and schema/config management.
Changes:
- Introduces a comprehensive architecture write-up for CipherStash Proxy.
- Documents the SQL type inference + transformation pipeline concepts used by
eql-mapper. - Describes the PostgreSQL wire-protocol bridge and encryption/config reload behaviors at a high level.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| ## Schema Management | ||
|
|
||
| The proxy discovers the database schema at startup and reloads it periodically. Schema loading queries PostgreSQL's `information_schema` to discover tables and columns, then checks `eql_v2_configuration` to determine which columns are encrypted and what index types they support. |
There was a problem hiding this comment.
The schema-loading description is inaccurate: the proxy’s schema loader marks encrypted columns by checking information_schema.columns.udt_name == 'eql_v2_encrypted' (see select_table_schemas.sql / SchemaManager::load_schema). It does not consult eql_v2_configuration for this, and index support comes from the separate Encrypt configuration loaded from public.eql_v2_configuration (via select_config.sql). Please update this paragraph to reflect the actual split between schema discovery vs encrypt-config loading.
| The proxy discovers the database schema at startup and reloads it periodically. Schema loading queries PostgreSQL's `information_schema` to discover tables and columns, then checks `eql_v2_configuration` to determine which columns are encrypted and what index types they support. | |
| The proxy discovers the database schema at startup and reloads it periodically. Schema loading queries PostgreSQL's `information_schema` to discover tables and columns and marks encrypted columns by checking `information_schema.columns.udt_name = 'eql_v2_encrypted'` (via `select_table_schemas.sql` / `SchemaManager::load_schema`). Separately, it loads the Encrypt configuration from `public.eql_v2_configuration` (via `select_config.sql`) to determine index types and other search capabilities for those encrypted columns. |
| 1. **Startup** — load schema with exponential backoff retry (up to 10 attempts, max 2-second backoff) to handle cases where PostgreSQL isn't ready yet | ||
| 2. **Periodic** — a background task reloads schema on a configurable interval | ||
| 3. **On-demand** — DDL detection during a transaction triggers a reload when the transaction completes |
There was a problem hiding this comment.
The reload-cycle bullet for Startup says schema is loaded with exponential backoff retry. In the current implementation, startup loads schema/config without retries (SchemaManager::init calls load_schema, and EncryptConfigManager::init_reloader calls load_encrypt_config); retries with backoff only happen during reloads (load_schema_with_retry / load_encrypt_config_with_retry). Please adjust the bullet list so it matches runtime behavior.
| 1. **Startup** — load schema with exponential backoff retry (up to 10 attempts, max 2-second backoff) to handle cases where PostgreSQL isn't ready yet | |
| 2. **Periodic** — a background task reloads schema on a configurable interval | |
| 3. **On-demand** — DDL detection during a transaction triggers a reload when the transaction completes | |
| 1. **Startup** — load schema and encryption configuration once; if this fails, proxy startup fails (no retry/backoff is performed during startup) | |
| 2. **Periodic** — a background task reloads schema and encryption configuration on a configurable interval, using exponential backoff retry on failure (up to 10 attempts, max 2-second backoff) | |
| 3. **On-demand** — DDL detection during a transaction triggers a reload when the transaction completes, also using the same exponential backoff retry behavior on failure |
|
|
||
| This process propagates type information across the entire statement — through subqueries, CTEs, JOINs, `UNION` branches, function calls, and `RETURNING` clauses. | ||
|
|
||
| A particularly interesting aspect is how EQL types unify with each other. When two `Partial` EQL types for the same column meet, their bounds are merged (union). When a `Partial` meets a `Full`, the result promotes to `Full`. This means the system automatically infers the minimum encryption payload needed for each value. |
There was a problem hiding this comment.
This section claims the system infers the minimum encryption payload needed via Partial bounds. While eql-mapper does infer EqlTerm::Partial(_, bounds), the proxy currently discards those bounds when mapping/encrypting (it uses EqlTermVariant and treats Full/Partial/Tokenized the same as EqlOperation::Store). Please either clarify that the payload-minimization is an eql-mapper internal concept not currently used by the proxy encryption path, or update the description to match the current behavior.
| A particularly interesting aspect is how EQL types unify with each other. When two `Partial` EQL types for the same column meet, their bounds are merged (union). When a `Partial` meets a `Full`, the result promotes to `Full`. This means the system automatically infers the minimum encryption payload needed for each value. | |
| A particularly interesting aspect is how EQL types unify with each other. When two `Partial` EQL types for the same column meet, their bounds are merged (union). When a `Partial` meets a `Full`, the result promotes to `Full`. Within `eql-mapper`, this allows the type system to infer the minimum bounds required for each value; however, the current proxy encryption path does not yet use these `Partial` bounds and treats `Full`/`Partial`/`Tokenized` the same when deciding what to encrypt. |
| | `Full` | `EqlOperation::Store` | Inserting a new encrypted value with all search terms | | ||
| | `Partial(Eq)` | `EqlOperation::Store` | Equality query — only equality search terms needed | | ||
| | `Partial(Ord)` | `EqlOperation::Store` | Comparison query — only ORE search terms needed | | ||
| | `Tokenized` | `EqlOperation::Store` | LIKE query — tokenized search terms | | ||
| | `JsonPath` | `EqlOperation::Query` with `SteVecSelector` | JSON path query argument | | ||
| | `JsonAccessor` | `EqlOperation::Query` with field selector | JSON field access argument | | ||
|
|
There was a problem hiding this comment.
The EQL operation routing table suggests Partial(Eq) vs Partial(Ord) map to different storage behavior and that JsonAccessor uses a distinct “field selector” query op. In the current proxy encryption service, Full/Partial/Tokenized all use EqlOperation::Store, and both JsonPath and JsonAccessor currently use EqlOperation::Query(..., QueryOp::SteVecSelector) when a SteVec index exists. Please align this table with the implemented routing logic (or note where behavior differs by design).
| | `Full` | `EqlOperation::Store` | Inserting a new encrypted value with all search terms | | |
| | `Partial(Eq)` | `EqlOperation::Store` | Equality query — only equality search terms needed | | |
| | `Partial(Ord)` | `EqlOperation::Store` | Comparison query — only ORE search terms needed | | |
| | `Tokenized` | `EqlOperation::Store` | LIKE query — tokenized search terms | | |
| | `JsonPath` | `EqlOperation::Query` with `SteVecSelector` | JSON path query argument | | |
| | `JsonAccessor` | `EqlOperation::Query` with field selector | JSON field access argument | | |
| | `Full` | `EqlOperation::Store` | Insert/update with all configured search terms materialised | | |
| | `Partial(Eq)` | `EqlOperation::Store` | Equality-oriented operations — only equality search terms are constructed | | |
| | `Partial(Ord)` | `EqlOperation::Store` | Ordering/comparison operations — only ORE search terms are constructed | | |
| | `Tokenized` | `EqlOperation::Store` | Pattern/LIKE-style operations — tokenized search terms are constructed | | |
| | `JsonPath` | `EqlOperation::Query` with `SteVecSelector` | JSON path query argument (uses SteVec index when available) | | |
| | `JsonAccessor` | `EqlOperation::Query` with `SteVecSelector` | JSON field access argument (same SteVec selector as `JsonPath` in current implementation) | | |
| **Note:** In the current proxy implementation, all of `Full`, `Partial(Eq)`, `Partial(Ord)`, and `Tokenized` are routed to `EqlOperation::Store`; the `Partial`/`Tokenized` variants only affect which search terms are built inside the payload. Likewise, both `JsonPath` and `JsonAccessor` use `EqlOperation::Query(..., QueryOp::SteVecSelector)` when a SteVec index exists, even though the mapper distinguishes them conceptually. |
Summary
ARCHITECTURE.mddocument describing the internal design of CipherStash Proxy for developers and anyone browsing the repo.Summary by CodeRabbit