diff --git a/docs/developers/operations-api/databases-and-tables.md b/docs/developers/operations-api/databases-and-tables.md index 936425c3..858ecb54 100644 --- a/docs/developers/operations-api/databases-and-tables.md +++ b/docs/developers/operations-api/databases-and-tables.md @@ -26,7 +26,7 @@ Returns the definitions of all databases and tables within the database. Record "dog": { "schema": "dev", "name": "dog", - "hash_attribute": "id", + "primary_key": "id", "audit": true, "schema_defined": false, "attributes": [ @@ -82,7 +82,7 @@ Returns the definitions of all tables within the specified database. "dog": { "schema": "dev", "name": "dog", - "hash_attribute": "id", + "primary_key": "id", "audit": true, "schema_defined": false, "attributes": [ @@ -137,7 +137,7 @@ Returns the definition of the specified table. { "schema": "dev", "name": "dog", - "hash_attribute": "id", + "primary_key": "id", "audit": true, "schema_defined": false, "attributes": [ diff --git a/docs/developers/operations-api/index.md b/docs/developers/operations-api/index.md index ad44d9de..7bf107f6 100644 --- a/docs/developers/operations-api/index.md +++ b/docs/developers/operations-api/index.md @@ -34,6 +34,7 @@ The operations API reference is available below and categorized by topic: - [Configuration](operations-api/configuration) - [Certificate Management](operations-api/certificate-management) - [Token Authentication](operations-api/token-authentication) +- [Impersonation](../security/impersonation) - [SQL Operations](operations-api/sql-operations) - [Advanced JSON SQL Examples](operations-api/advanced-json-sql-examples) - [Analytics](operations-api/analytics) diff --git a/docs/developers/replication/index.md b/docs/developers/replication/index.md index 703f00f3..d6e2c75e 100644 --- a/docs/developers/replication/index.md +++ b/docs/developers/replication/index.md @@ -225,6 +225,23 @@ replication: When using controlled flow replication, you will typically have different route configurations for 
each node to every other node. In that case, you will typically want to ensure that you are _not_ replicating the `system` database, since the `system` database contains the node configurations, and replicating the `system` database will cause all nodes to be replicated and have identical route configurations. +The `replicates` property also allows you to define routes with more granularity through `sendsTo` and/or `receivesFrom` properties, each of which takes an array of node and database names. + +```yaml +replication: + databases: + - data + routes: + - host: node-two + replicates: + sendsTo: + - target: node-three + database: data + receivesFrom: + - source: node-four + database: system +``` + +#### Explicit Subscriptions + By default, Harper automatically handles connections and subscriptions between nodes, ensuring data consistency across your cluster. It even uses data routing to manage node failures. However, you can manage these connections manually by explicitly subscribing to nodes. This should _not_ be used for production replication; it exists only for testing, debugging, and legacy migration. This will likely be removed in V5. If you choose to manage subscriptions manually, Harper will no longer handle data consistency for you. This means there’s no guarantee that all nodes will have consistent data if subscriptions don’t fully replicate in all directions. If a node goes down, it’s possible that some data wasn’t replicated before the failure. If you want single direction replication, you can use the controlled replication flow described above. 
diff --git a/docs/developers/security/impersonation.md b/docs/developers/security/impersonation.md new file mode 100644 index 00000000..a41e484d --- /dev/null +++ b/docs/developers/security/impersonation.md @@ -0,0 +1,154 @@ +--- +title: Impersonation +--- + +# Impersonation + +Impersonation allows a `super_user` to execute operations API requests as if they were a different, less-privileged user. This is useful for testing permissions, debugging access issues, and building admin tools that preview what a given user or role can see and do — all without needing that user's credentials. + +## How It Works + +Add an `impersonate` property to any operations API request body. Harper will authenticate the request normally (the caller must be a `super_user`), then **downgrade** the effective permissions for that request to match the impersonated identity. + +```http +POST https://my-harperdb-server:9925/ +Authorization: Basic +Content-Type: application/json + +{ + "operation": "search_by_hash", + "database": "dev", + "table": "dog", + "hash_values": ["1"], + "impersonate": { + "username": "test_user" + } +} +``` + +The request above runs the `search_by_hash` as if `test_user` had made it — subject to that user's role permissions. + +## Security Constraints + +- **Super user only** — only users with `super_user` permissions can use `impersonate`. All other users receive a `403` error. +- **Downgrade only** — impersonation can never escalate privileges. The `super_user` and `cluster_user` flags are always forced to `false` on the impersonated identity. +- **Audit trail** — every impersonated request is logged, recording who initiated the impersonation and which identity was assumed. + +## Impersonation Modes + +There are three ways to specify the impersonated identity, depending on what you want to test. + +### Impersonate an Existing User + +Provide a `username` to run the request with that user's current role and permissions. 
+ +```json +{ + "operation": "search_by_hash", + "database": "dev", + "table": "dog", + "hash_values": ["1"], + "impersonate": { + "username": "test_user" + } +} +``` + +The target user must exist and be active. If the user is not found, a `404` error is returned. If the user is inactive, a `403` error is returned. + +### Impersonate an Existing Role + +Provide a `role_name` to run the request with that role's permissions. You can optionally include a `username` to set the effective username (defaults to the caller's username). + +```json +{ + "operation": "search_by_value", + "database": "dev", + "table": "dog", + "search_attribute": "name", + "search_value": "Penny", + "impersonate": { + "role_name": "developer" + } +} +``` + +The role must exist. If the role is not found, a `404` error is returned. + +### Impersonate with Inline Permissions + +Provide a `role` object with a `permission` property to test with an ad-hoc set of permissions. This is useful for previewing the effect of a role you haven't created yet. + +```json +{ + "operation": "sql", + "sql": "SELECT * FROM dev.dog", + "impersonate": { + "username": "preview_user", + "role": { + "permission": { + "dev": { + "tables": { + "dog": { + "read": true, + "insert": false, + "update": false, + "delete": false, + "attribute_permissions": [] + } + } + } + } + } + } +} +``` + +The `username` field is optional and defaults to the caller's username. The `permission` object follows the same structure as [role permissions](users-and-roles#role-permissions). 
+ +You can also restrict the impersonated identity to a specific set of operations API calls using the `operations` field inside `permission`: + +```json +{ + "operation": "search_by_hash", + "database": "dev", + "table": "dog", + "hash_values": ["1"], + "impersonate": { + "role": { + "permission": { + "operations": ["read_only"], + "dev": { + "tables": { + "dog": { + "read": true, + "insert": false, + "update": false, + "delete": false, + "attribute_permissions": [] + } + } + } + } + } + } +} +``` + +## Impersonate Payload Reference + +| Field | Type | Description | +|---|---|---| +| `username` | string | Target username. Required for existing-user mode. Optional for role-based modes (defaults to the caller's username). | +| `role_name` | string | Name of an existing role to assume. | +| `role` | object | Inline role definition. Must include a `permission` object. | +| `role.permission` | object | Permission object following the standard [role permissions](users-and-roles#role-permissions) structure. | + +Exactly one of `username` (alone), `role_name`, or `role` must be provided. If `role` is present, it takes precedence. + +## Use Cases + +- **Admin dashboards** — preview what a user sees without switching accounts. +- **Permission testing** — verify that a role grants (or denies) the expected access before assigning it to users. +- **Debugging** — reproduce access issues reported by a user by impersonating them directly. +- **CI/CD** — automated tests can verify permission configurations by impersonating different roles against a single `super_user` credential. 
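The request shapes above can be assembled from a client as well. The following is a minimal Node.js sketch, not part of Harper's API: the helper names (`buildImpersonatedOp`, `runAs`), the server URL, and the credentials are all illustrative placeholders.

```javascript
// Sketch only: wraps an operations API body with an `impersonate` property
// and posts it to the operations API endpoint.

function buildImpersonatedOp(operation, impersonate) {
	// Exactly one of username (alone), role_name, or role must be provided
	if (!impersonate || !(impersonate.username || impersonate.role_name || impersonate.role)) {
		throw new Error('impersonate requires a username, role_name, or role');
	}
	return { ...operation, impersonate };
}

async function runAs(operation, impersonate, { url, auth }) {
	const res = await fetch(url, {
		method: 'POST',
		headers: {
			'Content-Type': 'application/json',
			// the caller must authenticate as a super_user
			Authorization: 'Basic ' + Buffer.from(auth).toString('base64'),
		},
		body: JSON.stringify(buildImpersonatedOp(operation, impersonate)),
	});
	if (!res.ok) throw new Error('operation failed: ' + res.status); // e.g. 403/404 per the constraints above
	return res.json();
}
```

For example, `runAs({ operation: 'describe_all' }, { role_name: 'developer' }, { url: 'https://my-harperdb-server:9925/', auth: 'admin:password' })` would run `describe_all` with the `developer` role's permissions.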
diff --git a/docs/developers/security/index.md b/docs/developers/security/index.md index a090aa88..ffd0b94a 100644 --- a/docs/developers/security/index.md +++ b/docs/developers/security/index.md @@ -21,3 +21,4 @@ Harper uses role-based, attribute-level security to ensure that users can only g - [Configuration](security/configuration) - Security configuration and settings - [Users and Roles](security/users-and-roles) - Role-based access control and permissions +- [Impersonation](security/impersonation) - Execute operations as a different user or role diff --git a/docs/developers/security/users-and-roles.md b/docs/developers/security/users-and-roles.md index cff17e5a..8d0770e5 100644 --- a/docs/developers/security/users-and-roles.md +++ b/docs/developers/security/users-and-roles.md @@ -4,7 +4,7 @@ title: Users & Roles # Users & Roles -Harper utilizes a Role-Based Access Control (RBAC) framework to manage access to Harper instances. A user is assigned a role that determines the user’s permissions to access database resources and run core operations. +Harper utilizes a Role-Based Access Control (RBAC) framework to manage access to Harper instances. A user is assigned a role that determines the user's permissions to access database resources and run core operations. ## Roles in Harper @@ -13,9 +13,9 @@ Role permissions in Harper are broken into two categories – permissions around **Database Manipulation**: A role defines CRUD (create, read, update, delete) permissions against database resources (i.e. data) in a Harper instance. 1. At the table-level access, permissions must be explicitly defined when adding or altering a role – _i.e. Harper will assume CRUD access to be FALSE if not explicitly provided in the permissions JSON passed to the `add_role` and/or `alter_role` API operations._ -1. 
At the attribute-level, permissions for attributes in all tables included in the permissions set will be assigned based on either the specific attribute-level permissions defined in the table’s permission set or, if there are no attribute-level permissions defined, permissions will be based on the table’s CRUD set. +1. At the attribute-level, permissions for attributes in all tables included in the permissions set will be assigned based on either the specific attribute-level permissions defined in the table's permission set or, if there are no attribute-level permissions defined, permissions will be based on the table's CRUD set. -**Database Definition**: Permissions related to managing databases, tables, roles, users, and other system settings and operations are restricted to the built-in `super_user` role. +**Database Definition**: Permissions related to managing databases, tables, roles, users, and other system settings and operations are restricted to the built-in `super_user` role by default. Specific operations can be selectively granted to non-super_user roles using [`operations`](#roles-in-harper). **Built-In Roles** @@ -45,7 +45,7 @@ When creating a new, user-defined role in a Harper instance, you must provide a - `permissions` used to explicitly define CRUD access to existing table data. -Example JSON for `add_role` request +### Example JSON for `add_role` request {#add-role-example} ```json { @@ -84,15 +84,86 @@ Example JSON for `add_role` request **Setting Role Permissions** -There are two parts to a permissions set: +There are three parts to a permissions set: - `super_user` – boolean value indicating if role should be provided super_user access. _If `super_user` is set to true, there should be no additional database-specific permissions values included since the role will have access to the entire database schema. 
If permissions are included in the body of the operation, they will be stored within Harper, but ignored, as super_users have full access to the database._ -- `permissions`: Database tables that a role should have specific CRUD access to should be included in the final, database-specific `permissions` JSON. +- `operations` – array of operation names and/or permission group names that this role is allowed to call. When set, it acts as a two-gate check: (1) only listed operations are reachable — any unlisted operation is denied regardless of table CRUD permissions; (2) for data operations that pass gate one, table-level CRUD permissions still apply as normal. Operations that are normally restricted to `super_user` can be selectively granted by including them in the list. - _For user-defined roles (i.e. non-super_user roles, blank permissions will result in the user being restricted from accessing any of the database schema._ + _If `operations` is not set, existing behavior is unchanged — the role can call any non-super_user operation, subject to table CRUD permissions._ + + **Permission Groups** + + Groups expand to a predefined set of operations and can be mixed with individual operation names: + + - `read_only` – search, SQL SELECT, describe, and monitoring operations. No data modification. Operations: `search`, `search_by_conditions`, `search_by_hash`, `search_by_id`, `search_by_value`, `sql`, `describe_all`, `describe_schema`, `describe_database`, `describe_table`, `user_info`, `get_job`, `get_analytics`, `list_metrics`, `describe_metric` + + - `standard_user` – everything in `read_only` plus full data manipulation and bulk load. Does not include any `super_user`-restricted operations, schema DDL (`create_attribute`), or token management. 
Additional operations beyond `read_only`: `insert`, `update`, `upsert`, `delete`, `csv_data_load`, `csv_file_load`, `csv_url_load`, `import_from_s3` + + **Example: read-only role** + + A role that can only search and describe — cannot insert, update, delete, or call any admin operations: + + ```json + { + "operation": "add_role", + "role": "read_only_analyst", + "permission": { + "operations": ["read_only"], + "orders_db": { + "tables": { + "orders": { + "read": true, + "insert": false, + "update": false, + "delete": false, + "attribute_permissions": [] + } + } + } + } + } + ``` + + **Example: full data access + targeted admin operations** + + A role with all normally available data operations, plus the ability to call `get_configuration` and `system_information` without being a super_user. The `standard_user` group opens the operation gate for all non-SU data ops; table CRUD permissions then govern what the role can actually do in each table: + + ```json + { + "operation": "add_role", + "role": "ops_engineer", + "permission": { + "operations": ["standard_user", "get_configuration", "system_information"], + "orders_db": { + "tables": { + "orders": { + "read": true, + "insert": true, + "update": true, + "delete": true, + "attribute_permissions": [] + }, + "audit_log": { + "read": true, + "insert": false, + "update": false, + "delete": false, + "attribute_permissions": [] + } + } + } + } + } + ``` + + _This role can insert/update/delete `orders` (both gates pass), only read `audit_log` (operation gate passes, CRUD gate blocks writes), call `get_configuration` and `system_information` (super_user bypass), and cannot call `restart`, `drop_database`, or any other unlisted operation._ + +- `permission`: Database tables that a role should have specific CRUD access to should be included in the final, database-specific `permission` JSON. + + _For user-defined roles (i.e. 
non-super_user roles), blank permissions will result in the user being restricted from accessing any of the database schema._ + +**Table Permissions JSON** + @@ -120,25 +191,25 @@ Each table that a role should be given some level of CRUD permissions to must be **Important Notes About Table Permissions** 1. If a database and/or any of its tables are not included in the permissions JSON, the role will not have any CRUD access to the database and/or tables. -1. If a table-level CRUD permission is set to false, any attribute-level with that same CRUD permission set to true will return an error. +2. If a table-level CRUD permission is set to false, any attribute-level permission with that same CRUD permission set to true will return an error. **Important Notes About Attribute Permissions** 1. If there are attribute-specific CRUD permissions that need to be enforced on a table, those need to be explicitly described in the `attribute_permissions` array. -1. If a non-hash attribute is given some level of CRUD access, that same access will be assigned to the table’s `hash_attribute` (also referred to as the `primary_key`), even if it is not explicitly defined in the permissions JSON. +2. If a non-hash attribute is given some level of CRUD access, that same access will be assigned to the table's `hash_attribute` (also referred to as the `primary_key`), even if it is not explicitly defined in the permissions JSON. - _See table_name1’s permission set for an example of this – even though the table’s hash attribute is not specifically defined in the attribute_permissions array, because the role has CRUD access to ‘attribute1’, the role will have the same access to the table’s hash attribute._ + _See [`table_name1`'s permission set](#add-role-example) for an example of this – even though the table's hash attribute is not specifically defined in the attribute_permissions array, because the role has CRUD access to 'attribute1', the role will have the same access to the table's hash attribute._ -1. 
If attribute-level permissions are set – _i.e. attribute_permissions.length > 0_ – any table attribute not explicitly included will be assumed to have not CRUD access (with the exception of the `hash_attribute` described in #2). +3. If attribute-level permissions are set – _i.e. attribute_permissions.length > 0_ – any table attribute not explicitly included will be assumed to have no CRUD access (with the exception of the `hash_attribute` described in #2). - _See table_name1’s permission set for an example of this – in this scenario, the role will have the ability to create, insert and update ‘attribute1’ and the table’s hash attribute but no other attributes on that table._ + _See [`table_name1`'s permission set](#add-role-example) for an example of this – in this scenario, the role will have the ability to create, insert and update 'attribute1' and the table's hash attribute but no other attributes on that table._ -1. If an `attribute_permissions` array is empty, the role’s access to a table’s attributes will be based on the table-level CRUD permissions. +4. If an `attribute_permissions` array is empty, the role's access to a table's attributes will be based on the table-level CRUD permissions. - _See table_name2’s permission set for an example of this._ + _See [`table_name2`'s permission set](#add-role-example) for an example of this._ -1. The `__createdtime__` and `__updatedtime__` attributes that Harper manages internally can have read perms set but, if set, all other attribute-level permissions will be ignored. -1. Please note that DELETE permissions are not included as a part of an individual attribute-level permission set. That is because it is not possible to delete individual attributes from a row, rows must be deleted in full. +5. The `__createdtime__` and `__updatedtime__` attributes that Harper manages internally can have read permissions set but, if set, all other attribute-level permissions will be ignored. +6. 
Please note that DELETE permissions are not included as a part of an individual attribute-level permission set. That is because it is not possible to delete individual attributes from a row; rows must be deleted in full. - If a role needs the ability to delete rows from a table, that permission should be set on the table-level. - The practical approach to deleting an individual attribute of a row would be to set that attribute to null via an update statement. @@ -146,7 +217,7 @@ Each table that a role should be given some level of CRUD permissions to must be The table below includes all API operations available in Harper and indicates whether or not the operation is restricted to super_user roles. -_Keep in mind that non-super_user roles will also be restricted within the operations they do have access to by the database-level CRUD permissions set for the roles._ +_Keep in mind that non-super_user roles will also be restricted within the operations they do have access to by the database-level CRUD permissions set for the roles. 
Operations marked with X can be selectively granted to non-super_user roles using `operations`._ | Databases and Tables | Restricted to Super_Users | | -------------------- | :-----------------------: | diff --git a/release-notes/v4-tucker/4.7.20.md b/release-notes/v4-tucker/4.7.20.md new file mode 100644 index 00000000..9aee7a16 --- /dev/null +++ b/release-notes/v4-tucker/4.7.20.md @@ -0,0 +1,12 @@ +--- +title: 4.7.20 +--- + +# 4.7.20 + +2/27/2026 + +- Don't register a WS message listener multiple times when replication is waiting for authorization +- Update the get_analytics coalesce window to match the aggregation window +- Broad improvements to TypeScript definitions +- Separate app and plugin name in plugin scope diff --git a/release-notes/v5-lincoln/5.0.0.md b/release-notes/v5-lincoln/5.0.0.md new file mode 100644 index 00000000..d5b62b1f --- /dev/null +++ b/release-notes/v5-lincoln/5.0.0.md @@ -0,0 +1,96 @@ +--- +title: 5.0.0 +--- + +# 5.0.0 + +## Open Source and Pro Editions + +Harper v5.0 is available in two editions: Open Source and Pro. The Open Source edition is free and open source under the Apache 2.0 license, while the Pro edition includes replication, certificate management, and licensing functionality (the source code is available under the Elastic 2.0 License). +The open source edition can be installed with: +`npm i -g harper` +And the Pro edition can be installed with: +`npm i -g @harperfast/harper-pro` + +### Naming Updates + +Along with new names for Harper packages, Harper now uses the name "harper" more consistently: + +- For a fresh installation, the data, configuration, logs, and applications will be installed in the `~/harper` directory by default (instead of `~/hdb`). +- The configuration file will be named `harper-config.yaml` by default (`harperdb-config.yaml` will still be supported for backwards compatibility). +- Applications should import from the `harper` module instead of `harperdb`, to access the Harper APIs. 
(`harperdb` will still be supported for backwards compatibility). + +## RocksDB + +Harper 5.0 now uses RocksDB as its default underlying storage engine. RocksDB provides a significantly more robust and reliable +storage engine with consistent performance characteristics. RocksDB is well-maintained, has powerful background compaction +capabilities, and a wide array of tuning options and features. + +Harper also introduces its own native transaction log as a write-ahead log (WAL) for RocksDB, which drives ACID compliance +in RocksDB and powers real-time delivery of data. This is a highly optimized transaction log designed for high throughput +of messaging and data. This is all powered by our new [open source rocksdb-js library](https://github.com/harperfast/rocksdb-js). The transaction log also utilizes separate log files for each node origin, for improved performance and reliability with replication. + +RocksDB enables robust transactions in Harper, with the complete ability to read and query after writes (and get data from those writes) within a transaction. + +Harper will continue to support the existing LMDB storage engine for v5.0, and will continue to load databases created with LMDB. + +Switching a database from LMDB to RocksDB requires a database migration. This can be done using replication by creating +new nodes and replicating the data from the old nodes to the new nodes. + +### Current Limitations of RocksDB + +Currently, there are a number of optimizations with querying, caching, contention monitoring, and write batching that +have not yet been implemented in v5.0, but are planned for a future release. LMDB exhibits better performance for data +that is cached in-memory. +Retrieval of past events is not guaranteed to return every event when concurrent events take place on different nodes. This capability is often used by non-clean MQTT sessions. 
However, the latest message is always guaranteed to be delivered, and sequences of messages from the same node are guaranteed to be delivered in order, ensuring the correctness of most applications and the consistency of retained messages. +Retrieval of past events for subscriptions will not support a `count` option. +Published messages do not support streamed blobs. + +## Resource API Updates + +Harper v5.0 has upgraded the resource API with several important changes: + +- Harper v5.0 is specifically encouraging the use of `static` REST methods, and providing functionality to easily use these methods. +- The `target` (`RequestTarget` type) will now be parsed prior to calling static REST methods, for access to any query information in the URL. +- The current request (`Request` type) will be available in any function through asynchronous context tracking, using the `getContext()` function available from the `harper` module. +- The `get` method will always return a frozen record. The return value does not include all the methods from the Resource API (like `wasLoadedFromSource`, `getContext`, etc.). It only includes the `getUpdatedTime` and `getExpiresAt` methods. +- A source resource can return a standard `Response` object (the resolved return value from a `fetch` call) in a cache-resolving `get` method, and Harper will automatically handle the response, streaming the body and saving the headers into a cached record. +- A `getResponse` function is available as part of the standard `harper` module exports, allowing for easy access to the response object from within a resource method. +- When using the LMDB storage engine, Harper will no longer attempt to cache resource instances that were used to make a record stored in a write visible in a subsequent read. +- All the default singular REST methods on tables will consistently return a `Promise`. This includes `Table.get(id)`, `Table.put(...)`, `Table.delete(id)`, `Table.patch(...)`, and `Table.invalidate(id)`. 
+- The Table resource API now includes a `save()` method to explicitly save a record to the database within the current transaction, making it visible to subsequent reads/queries. + +## Application Context Separation + +Harper now runs each application in its own separate JavaScript "context", which has its own global object, top-level variables, and module imports. This provides isolation of applications and access to application-specific configuration data and functionality. These contexts will limit access to certain functionality, including spawning new processes. This functionality can be controlled with configuration options. Specifically, any new processes that will be spawned need to be listed in `applications.allowedShellCommands`. +Harper will also "freeze" many of the intrinsic objects in the global object, to protect against prototype pollution attacks and vulnerabilities. +This application context separation will also allow the logger to apply application-specific tagging to log messages, and leverage the application-specific configuration for logging. + +## Transitive Replication + +Harper now uses an exclusion-based subscription model for replication. This means that replication will request data from nodes, excluding any other known nodes that will be sending data to the current node. With this approach, complex topologies can be created where additional nodes can be added and transitively replicate through other nodes, without explicit knowledge of all the nodes. Previously, replication required direct connections between all nodes, but transitive replication enables topologies with limited connections to proxy data throughout the cluster. + +By default, with the replication of the system database, all nodes are reachable and will fully connect. To leverage transitive replication, you can disable the replication of the system database and individually configure the routing of each node (with `replication.routes` in the configuration). 
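The transitive topology described above can be sketched in configuration. This is a minimal sketch under stated assumptions: the node names (`edge-one`, `hub`) are hypothetical, and the keys follow the `replication` section documented in the replication guide.

```yaml
# edge-one's configuration (sketch): replicate only the `data` database,
# omitting the `system` database so routes stay node-specific, and connect
# only to `hub`, which proxies data to and from the rest of the cluster.
replication:
  databases:
    - data
  routes:
    - host: hub
```

With this exclusion-based model, `edge-one` receives the whole cluster's writes through `hub` without maintaining a direct connection to every node.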
+ +## Granular Operations Access for Roles + +Roles can now be configured with granular access to operations. A role can be designed to have access to a subset of operations, or reference a named group of operations. + +### Impersonation + +Super users can now impersonate other users, with the ability to specify the user's identity and roles for the impersonated session. + +## REST API Updates + +For errors that occur during the execution of a REST method, Harper now follows [RFC 9457](https://www.rfc-editor.org/rfc/rfc9457.html) for error responses. This means that the error response will include `type`, `title`, and `code` properties to describe the error. +Harper will no longer add a `Server` header in the response. + +## Operation API Updates + +The `update` operation, and the `upsert` operation when applied to an existing record, will now follow the semantics of a `patch` method, which means they will fully utilize CRDT semantics for resolution of conflicting updates across a cluster (separate properties can be independently updated and merged). +The operations API has fully switched from using `hash_attribute` to `primary_key` for all operations. The response from `describe_all`, `describe_database`, and `describe_table` operations will now include the `primary_key` attribute (instead of the legacy `hash_attribute` name). +The `get_components` operation will now return all files, including files and directories that begin with a period (although `node_modules` will still be excluded). + +### Configuration + +The `harper-config.yaml` file will now use relative paths to the root directory of the Harper data installation. diff --git a/release-notes/v5-lincoln/lincoln.md b/release-notes/v5-lincoln/lincoln.md new file mode 100644 index 00000000..4c9477f4 --- /dev/null +++ b/release-notes/v5-lincoln/lincoln.md @@ -0,0 +1,7 @@ +--- +title: Harper Lincoln (Version 5) +--- + +# Harper Lincoln (Version 5) + +In honor of [Lincoln](/img/dogs/lincoln.png). 
diff --git a/release-notes/v5-lincoln/v5-migration.md b/release-notes/v5-lincoln/v5-migration.md new file mode 100644 index 00000000..89f49ad1 --- /dev/null +++ b/release-notes/v5-lincoln/v5-migration.md @@ -0,0 +1,95 @@ +Harper version 5.0 includes many updates to provide a cleaner, more consistent and secure environment. However, there are some breaking changes, and users should review the migration guide for details on how to update their applications. Note that applications that have timing-sensitive race conditions or rely on undocumented features or bugs are always prone to breakage at any point, including major version upgrades. This document describes the important changes to make for applications correctly built on documented APIs. + +## `Table.get` return value + +The return value of `Table.get` has been changed to return a record object instead of an instance of the table class (previously this behavior only occurred in classes that had set `static loadAsInstance=false`). This means that the returned object will not have all the table instance methods available. Most functionality is still available through the `Table` class. One notable method that had been commonly used is `wasLoadedFromSource`. Information about whether the request was fulfilled from cache or origin is now available on the request `target` object. 
For example, if you have existing code like: + +```javascript +const record = await Table.get(id); +// old method: +if (record.wasLoadedFromSource()) { + // record was loaded from origin (not cache) +} +``` + +You should update this code to: + +```javascript +const target = new RequestTarget(); // note that this is passed in if you are overriding the `get` method +target.id = id; +const record = await Table.get(target); +// new way of checking if it was loaded from source: +if (target.loadedFromSource) { + // record was loaded from origin (not cache) +} +``` + +The record objects do still have the `getUpdatedTime` and `getExpiresAt` methods available. + +### Frozen Records + +The record object is also frozen. This means that you cannot add or remove properties on the record object; if you want a modified version of the record, you must create a copy. For example, if you had code like: + +```javascript +const record = await Table.get(id); +record.property = 'changed'; +``` + +You would need to change this to: + +```javascript +let record = await Table.get(id); +record = { ...record, property: 'changed' }; +``` + +## Transactions and Context + +With RocksDB, transactions are now fully supported through the storage engine, providing a consistent ability to read and query data that has been written within the transaction. This is a behavioral change for code that did not expect written data to be visible in queries until after a commit. + +Harper v5 now uses asynchronous context tracking to automatically preserve the context, and with it the current transaction, across calls and asynchronous operations. Previously, transactions were only applied to calls to other tables if the context was explicitly included in the arguments. Now the context is implicitly and automatically carried to other calls (this was also the behavior in v4.x with `static loadAsInstance=false`).
Previous code may have omitted the context from a call to another table in order to exclude that call from the transaction. Such code should be updated to explicitly commit/finish the transaction to see newly visible data, or to start a new transaction. For example, if you had a function that polled to determine when a record was updated: + +```javascript +import { setTimeout as delay } from 'node:timers/promises'; +class MyResource { + static async get(target) { + // this function is within a transaction, with a consistent snapshot of data that won't change; previously, + // calling Table.get without a context would not use the current transaction and would instead read the latest data + while ((await Table.get(target)).status !== 'ready') { + await delay(100); + } + return Table.get(target); + } +} +``` + +Now the internal `Table.get` will automatically use the current transaction, which provides a consistent snapshot and will never see updated data. So we should explicitly commit the transaction to see the updated data and/or start a new transaction for each get request to read the latest data: + +```javascript +import { setTimeout as delay } from 'node:timers/promises'; +import { getContext, transaction } from 'harper'; +class MyResource { + static async get(target) { + // this function is still within a transaction, with a consistent snapshot of data that won't change, so we + // explicitly commit the transaction to see the updated data + await getContext().transaction.commit(); + // now we can call Table.get and it will read the latest data. + // we could also explicitly start a new transaction here for each get: + while ((await transaction(() => Table.get(target))).status !== 'ready') { + await delay(100); + } + return Table.get(target); + } +} +``` + +## Spawning new processes (via `node:child_process`) + +The ability to spawn new processes is a dangerous pathway for exploitation and security vulnerabilities. Additionally, spawning processes from multiple threads presents unique challenges and hazards.
In Harper version 5, spawning new processes (through Node's `child_process` module) is more tightly controlled and managed. +First, any `spawn`, `exec`, or `execFile` call may only spawn executables or commands that have been registered in the `applications.allowedSpawnCommands` configuration. This provides a much more secure environment, preventing malicious intrusions. +Second, it is common to spawn child processes with the expectation that the code is running in a single thread for an indefinite period of time. However, Harper runs multiple threads that may frequently be restarted, and spawning every time a module loads leads to a multiplication of processes and to orphaned processes. Harper now manages process spawning to ensure that only a single process is spawned. To do this, the `spawn`, `exec`, etc. functions require a `name` property in the `options` argument, which creates a named process that other threads can check so they do not start a new process if one is already running. If you really do want to start a separate process in addition to a previously started one, a new `name` must be provided. + +# Recommended Changes + +The migration information above highlights the changes necessary for existing applications that have used any of these patterns or features. However, we also have new recommended best practices for applications. These are not required, but can help ensure that your application is using the best patterns. + +- We recommend using the `static` methods on Resources/Tables to implement endpoints. See the Resources API for more information. diff --git a/static/img/dogs/lincoln.png b/static/img/dogs/lincoln.png new file mode 100644 index 00000000..1db3a970 Binary files /dev/null and b/static/img/dogs/lincoln.png differ
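The named-process behavior described in the spawning section above can be approximated with a small guard in plain Node. This is a conceptual sketch only: `spawnOnce` and the registry are hypothetical, and in Harper the `name` option is simply passed in the `options` argument of `spawn`/`exec`:

```javascript
// Sketch of the named-process idea: a shared registry ensures that repeated
// attempts to start a process under the same name reuse the existing one.
// `spawnOnce` is a hypothetical helper, not part of Harper's API.
const started = new Map();

function spawnOnce(name, startFn) {
	if (started.has(name)) return started.get(name); // already running: reuse it
	const handle = startFn(); // e.g. child_process.spawn(...)
	started.set(name, handle);
	return handle;
}

// Both callers get the same handle; the process is only started once.
let spawnCount = 0;
const a = spawnOnce('metrics-agent', () => ({ pid: ++spawnCount }));
const b = spawnOnce('metrics-agent', () => ({ pid: ++spawnCount }));
// a === b and spawnCount === 1
```

Passing a different `name` (e.g. `'metrics-agent-2'`) would invoke `startFn` again, which mirrors the documented requirement that a separate process needs a new `name`.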