feat(services): implements dynamic service loading#36

Draft
brian316 wants to merge 3 commits into main from feat_dynamic_services
brian316 commented Mar 5, 2026

WIP

brian316 commented Mar 5, 2026

Dynamic Runtime Service Catalog

Overview

Bridge previously relied on a static global service catalog loaded from config/services.toml (or config/services_sample.toml in debug mode). While this file was read at runtime, the catalog contents were effectively fixed once loaded and did not support safe, explicit in-process updates.

This feature introduces a runtime-refreshable service catalog that supports updating backend service/resource routing targets while Bridge is running (for example on OpenShift).


What was implemented

1) Runtime-refreshable catalog state

src/web/route/proxy/services.rs was refactored from static LazyLock<Catalog> data to a shared, mutable runtime state:

  • LazyLock<RwLock<CatalogState>>
  • atomic reload() operation
  • startup init_once() operation

The state now stores:

  • raw parsed TOML table
  • all catalog entries (service/resource metadata)
  • resource names
  • service health URLs

All data is owned (String) instead of 'static references, enabling safe refreshes.
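
As a rough illustration of the pattern described above (not the PR's actual code), the sketch below keeps an owned catalog behind LazyLock<RwLock<...>> and replaces it in one assignment. The names CatalogState, parse_catalog, reload_from_str and health_url, and the simplified `name = url` line format used instead of real TOML, are assumptions made for this example.

```rust
use std::collections::HashMap;
use std::sync::{LazyLock, RwLock};

/// Hypothetical, simplified stand-in for the PR's catalog state.
/// Every field is owned, so nothing borrows from a previous catalog.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct CatalogState {
    pub service_names: Vec<String>,
    /// Service name -> health URL.
    pub service_health_urls: HashMap<String, String>,
}

/// Process-wide catalog, created lazily and guarded by a read/write lock.
static CATALOG: LazyLock<RwLock<CatalogState>> =
    LazyLock::new(|| RwLock::new(CatalogState::default()));

/// Parses `name = url` lines. The real code parses TOML; this format is
/// only a placeholder to keep the sketch dependency-free.
fn parse_catalog(text: &str) -> Result<CatalogState, String> {
    let mut state = CatalogState::default();
    for (idx, line) in text.lines().enumerate() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let (name, url) = line
            .split_once('=')
            .ok_or_else(|| format!("line {}: expected `name = url`", idx + 1))?;
        let name = name.trim().to_string();
        state.service_names.push(name.clone());
        state.service_health_urls.insert(name, url.trim().to_string());
    }
    Ok(state)
}

/// Builds the new state completely before taking the write lock, then swaps
/// it in with a single assignment. Readers see either the old or the new
/// catalog; a parse error leaves the old catalog in place.
pub fn reload_from_str(text: &str) -> Result<(), String> {
    let new_state = parse_catalog(text)?;
    let mut guard = CATALOG.write().map_err(|e| e.to_string())?;
    *guard = new_state;
    Ok(())
}

/// Returns an owned copy of a service's health URL, if the service exists.
pub fn health_url(service: &str) -> Option<String> {
    let guard = CATALOG.read().ok()?;
    guard.service_health_urls.get(service).cloned()
}
```

Parsing outside the lock keeps the write-lock hold time down to a single move, which is one plausible reading of the "atomic reload()" described above.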

2) Runtime accessors (replacing static globals)

Callers now use functions instead of static references:

  • get_service(service_name)
  • get_resource(resource_name)
  • is_service_mcp(service_name)
  • get_detail(type_, name, field)
  • get_all()
  • get_all_resources_by_name()
  • get_service_health_urls()
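
To make the accessor style concrete, here is a minimal sketch of function-based lookups that return owned values. The function names get_service and is_service_mcp come from the list above, but their signatures, the ServiceEntry struct and the set_services helper are assumptions; the PR's real types may differ.

```rust
use std::collections::HashMap;
use std::sync::{LazyLock, RwLock};

/// Hypothetical catalog entry; field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct ServiceEntry {
    pub name: String,
    pub url: String,
    pub mcp: bool,
}

static SERVICES: LazyLock<RwLock<HashMap<String, ServiceEntry>>> =
    LazyLock::new(|| RwLock::new(HashMap::new()));

/// Replaces the whole service map (stand-in for a catalog reload).
pub fn set_services(entries: Vec<ServiceEntry>) {
    let map: HashMap<String, ServiceEntry> = entries
        .into_iter()
        .map(|entry| (entry.name.clone(), entry))
        .collect();
    // If another thread panicked while holding the lock, this sketch simply
    // skips the update instead of propagating the poison error.
    if let Ok(mut guard) = SERVICES.write() {
        *guard = map;
    }
}

/// Returns a clone of the entry, so the caller holds no lock and no borrow
/// into the catalog once the function returns.
pub fn get_service(service_name: &str) -> Option<ServiceEntry> {
    let guard = SERVICES.read().ok()?;
    guard.get(service_name).cloned()
}

/// Unknown services are reported as non-MCP (an assumption of this sketch).
pub fn is_service_mcp(service_name: &str) -> bool {
    get_service(service_name).map(|s| s.mcp).unwrap_or(false)
}
```

Because get_service hands back a clone, a value obtained before a reload stays valid afterwards; it simply reflects the catalog as it was at lookup time.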

3) Startup initialization

src/web/mod.rs now initializes catalog state during server startup via:

services::init_once()

If startup catalog load fails, server startup fails fast (same operational expectation as other required startup dependencies).

4) Live reload endpoint for system admins

Added endpoint in src/web/route/portal/system_admin/mod.rs:

  • POST /portal/system_admin/hx/reload-services

Behavior:

  • validates system-admin access
  • calls services::reload()
  • returns HTMX-friendly success response
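
The three behaviors above can be sketched without committing to a web framework. The function below is a hypothetical, framework-agnostic outline: reload_services_response, the status codes, and the HTML fragments are assumptions for illustration, and the source does not say what the endpoint returns when a reload fails.

```rust
/// Minimal escaping so an error message can be embedded in an HTML fragment.
fn escape_html(s: &str) -> String {
    s.replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
}

/// Returns (HTTP status, HTML fragment) for a reload request.
///
/// `is_system_admin` stands in for the real access check, and `reload`
/// stands in for `services::reload()`.
pub fn reload_services_response<F>(is_system_admin: bool, reload: F) -> (u16, String)
where
    F: FnOnce() -> Result<(), String>,
{
    // 1) Validate access before touching the catalog.
    if !is_system_admin {
        return (403, "<span>Forbidden</span>".to_string());
    }
    // 2) Trigger the reload and 3) render a small fragment that HTMX can
    //    place inline next to the button.
    match reload() {
        Ok(()) => (200, "<span>Services reloaded</span>".to_string()),
        Err(err) => (
            500,
            format!("<span>Reload failed: {}</span>", escape_html(&err)),
        ),
    }
}
```

In a real handler the tuple would be converted into the framework's response type; the point of the sketch is only the order of the checks.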

5) UI button to trigger reload

templates/components/systems_group.html now contains a Reload Services button:

  • sends HTMX POST to system_admin/hx/reload-services
  • renders status response inline

This provides an operator workflow for live updates after config changes.

6) OpenShift/Deployment support via env override

Catalog file path can now be overridden by environment variable:

  • BRIDGE_SERVICES_CONFIG_PATH

Default behavior remains:

  • debug: config/services_sample.toml
  • release: config/services.toml

This makes ConfigMap-mounted runtime paths easy to adopt in OpenShift deployments.
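
One way the override and defaults could be resolved is sketched below. The environment variable name and the two default paths come from this description; the function names, and the choice to treat an empty value as unset, are assumptions.

```rust
use std::path::PathBuf;

/// Environment variable named in this PR description.
pub const SERVICES_CONFIG_ENV: &str = "BRIDGE_SERVICES_CONFIG_PATH";

/// Pure helper: picks the override if present, else the build-mode default.
/// Treating a blank override as "unset" is an assumption of this sketch.
pub fn resolve_services_path(override_path: Option<String>, debug: bool) -> PathBuf {
    match override_path {
        Some(path) if !path.trim().is_empty() => PathBuf::from(path),
        _ if debug => PathBuf::from("config/services_sample.toml"),
        _ => PathBuf::from("config/services.toml"),
    }
}

/// Reads the real environment and build mode.
pub fn services_config_path() -> PathBuf {
    resolve_services_path(
        std::env::var(SERVICES_CONFIG_ENV).ok(),
        cfg!(debug_assertions),
    )
}
```

Keeping the decision in a pure function makes it testable without mutating process environment variables; in an OpenShift deployment the variable would point at the ConfigMap mount path.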


Files changed (high level)

  • src/web/route/proxy/services.rs (core catalog redesign + reload support)
  • src/web/mod.rs (startup initialization)
  • src/web/route/proxy/mod.rs (dynamic service lookup)
  • src/web/route/resource/mod.rs (dynamic resource lookup)
  • src/web/route/mcp/mod.rs (dynamic MCP checks/lookups)
  • src/web/route/health/mod.rs (dynamic health URL retrieval)
  • src/web/route/portal/mod.rs (dynamic resource-name filtering)
  • src/web/route/portal/user.rs (dynamic subscription metadata)
  • src/web/route/portal/user_htmx.rs (owned subscription/detail handling)
  • src/web/route/portal/group_admin/mod.rs (dynamic metadata/detail handling)
  • src/web/route/portal/system_admin/mod.rs (dynamic metadata + reload endpoint)
  • src/web/route/portal/system_admin/htmx.rs (moved group item storage to Vec<String>)
  • src/web/route/portal/helper.rs (notebook bookkeeping signature adjustment)
  • templates/components/systems_group.html (reload button)

Operational flow (recommended)

  1. Update the mounted services config file (e.g., ConfigMap-backed file).
  2. In Bridge system-admin UI, click Reload Services.
  3. Bridge atomically swaps to the new catalog.
  4. Subsequent proxy/resource/health/portal requests use updated values.

Validation performed

  • cargo check
  • cargo check --features mcp
  • cargo check --features full
  • targeted catalog unit test run (test_catalog_all_names)

All checks passed.

UI screenshot

image

brian316 commented Apr 1, 2026

The "Why": Hot-Reloading Support

In the main branch, the service CATALOG was a LazyLock<Catalog> that loaded the configuration exactly once when the server started. Because the catalog lived in a static and was never replaced, references into it stayed valid for the rest of the program and could be typed as &'static str. The original author took advantage of this by using &'static str everywhere, for example in Subscription<'s> and GroupContent.

However, this PR introduces the ability to hot-reload the services at runtime (via the new system_reload_services endpoint which uses an RwLock to swap out the config state).

Because the configuration can now be replaced at any time, the old strings might be dropped from memory. Therefore, the application can no longer hand out &'static str references safely.

The Trade-offs

To fix the compiler errors caused by losing the 'static lifetime, there are two main options in Rust:

  1. Use Owned Types (String) (current PR approach)
    You change the structs to own their data (String instead of &'static str), which forces you to use .clone() or .to_string() when reading out of the catalog.

    • Pros: Clean, safe, easy to write, and won't cause async/await compiler headaches.
    • Cons: Verbosity (.to_string() everywhere) and slight performance overhead from heap allocations.
  2. Use Explicit Lifetimes (&'a str)
    You could borrow the strings from the local Arc we added earlier by explicitly marking the lifetimes of the scopes (e.g., Subscription<'a>).

    • Pros: Zero allocations.
    • Cons: You would have to propagate this 'a lifetime parameter through every UI struct, function signature, and template renderer in the codebase. In async web frameworks like Actix, borrowing across await points or threading references into serialization engines (like tera) can quickly turn into a "lifetime nightmare".
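
The difference between the two options can be shown in a few lines. This is an illustrative sketch only: Subscription and SubscriptionRef here are simplified stand-ins with a single field, and the catalog is reduced to an RwLock<Vec<String>>.

```rust
use std::sync::RwLock;

/// Option 1: owns its data, so it can outlive any lock guard or reload.
pub struct Subscription {
    pub name: String,
}

/// Option 2: borrows from the catalog, so it is only usable while the
/// read guard it was created from is still alive.
pub struct SubscriptionRef<'a> {
    pub name: &'a str,
}

/// Clones the name out of the catalog; the guard is released on return.
pub fn owned_subscription(catalog: &RwLock<Vec<String>>, idx: usize) -> Option<Subscription> {
    let guard = catalog.read().ok()?;
    let name = guard.get(idx)?.clone();
    Some(Subscription { name })
}

/// Zero-allocation variant: the borrowed value cannot be returned past the
/// guard, so the caller passes a closure that runs while the lock is held.
pub fn with_borrowed<R>(
    catalog: &RwLock<Vec<String>>,
    idx: usize,
    f: impl FnOnce(SubscriptionRef<'_>) -> R,
) -> Option<R> {
    let guard = catalog.read().ok()?;
    let name = guard.get(idx)?;
    Some(f(SubscriptionRef { name: name.as_str() }))
}
```

The closure-based shape of with_borrowed hints at why the borrowed option spreads through a codebase: every consumer has to do its work inside the scope of the guard, and the read lock stays held for that whole time.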

Conclusion

The verbosity you are seeing is the standard "Rust tax" for moving from a static, read-only configuration to a dynamic, mutable configuration. Given that these .to_string() allocations only happen on page-load for the web dashboard (and not in a tight, high-throughput loop), the owned String approach is perfectly fine and is the standard practice here.
