
feat(support): add support service with WebSockets and Yamux #47

Open
edospadoni wants to merge 21 commits into main from feature/support-service

Conversation


@edospadoni edospadoni commented Mar 10, 2026

📋 Description

🏗 Support Service — Architecture

How it works

A tunnel client on the customer's system opens a persistent WebSocket to our support service. The connection is multiplexed with yamux — one WebSocket carries many parallel streams. When an operator clicks "Open" in the UI, traffic flows through the tunnel to reach the remote service (web UI, terminal, API) as if it were local.

```mermaid
graph LR
    subgraph Customer System
        TC[tunnel-client<br/>yamux mux] --> WU[Web UI]
        TC --> SA[SSH/API]
        TC --> ETC[...]
    end

    TC ---|WebSocket<br/>single connection| SS

    BR[Browser<br/>operator] --> NG[nginx<br/>proxy]
    NG --> BE[Backend :8080<br/>sessions, auth]
    BE --> SS[Support :8082<br/>tunnels, yamux]
```

Session Lifecycle

```mermaid
stateDiagram-v2
    [*] --> pending
    pending --> active : WebSocket established
    active --> closed : operator closes
    active --> grace_period : disconnect
    grace_period --> active : reconnect (same session)
    grace_period --> expired : timeout (30-60s)
```

WebSocket + yamux Multiplexing

The tunnel client opens one WebSocket to the support service. On top of it, yamux creates a multiplexed session — like having many TCP connections inside a single one.

WebSocket connection (single, persistent)
|
+-- yamux session
    |
    +-- stream #0  [manifest]    client → server: service list (JSON)
    +-- stream #1  [diagnostics] client → server: health report (JSON)
    +-- stream #2  [COMMAND]     server → client: add_services / future commands
    +-- stream #3  [HTTP proxy]  operator browses NethVoice UI
    +-- stream #4  [HTTP proxy]  operator browses another service
    +-- stream #5  [terminal]    operator opens xterm.js shell
    +-- ...        (up to 64 concurrent streams per tunnel)

How it connects:

  1. Tunnel client sends GET /support/api/tunnel with HTTP Basic Auth
  2. Support service upgrades to WebSocket, wraps it as net.Conn
  3. yamux.Server is created over the wrapped connection (keepalive 15s)
  4. Client opens a manifest stream with a JSON service list — the reachable services on the remote system
  5. Client opens a diagnostics stream with a health snapshot (CPU, RAM, disk, uptime + plugin results)
  6. Each proxied request from an operator opens a new yamux stream, forwarded to the target service on the customer system

Server-initiated streams: the support service can also open streams toward the tunnel-client. These start with a COMMAND <version>\n header and carry a JSON payload. The tunnel-client processes the command and responds OK\n or ERROR <msg>\n.

On disconnect: the tunnel enters a grace period (30-60s). If the client reconnects, the same session is reused. If the grace expires, the session is closed.


Diagnostics

At connect time, the tunnel-client collects a health report and pushes it to the support service via a dedicated yamux stream. The report is stored on the session (diagnostics JSONB column) and shown to operators in the session popover.

tunnel-client connects
  → runs built-in system plugin (CPU load, RAM, disk, uptime, OS version)
  → runs external plugins from /usr/share/my/diagnostics.d/ (configurable)
  → opens yamux stream, writes "DIAGNOSTICS 1\n" + JSON report
  → support service stores report on the session
  → operator sees health status in the session popover (green/yellow/red dot)

Plugin format (exit code: 0 = ok, 1 = warning, 2 = critical):

{
  "id": "myapp",
  "name": "My Application",
  "status": "warning",
  "summary": "DB at 87% capacity",
  "checks": [
    { "name": "service", "status": "ok", "value": "running" },
    { "name": "database", "status": "warning", "value": "87% full" }
  ]
}

The overall session status is the worst status across all plugins.


Static Service Injection

Operators can add arbitrary host:port services to a running tunnel without reconnection. This is useful for services not auto-discovered via Traefik — for example the web management interface of a device on the customer's LAN (IP phone, managed switch, NAS, etc.).

Operator clicks "Add service" → fills in name, target (host:port), label, TLS
  → POST /api/support-sessions/:id/services
  → Backend validates and publishes to Redis pub/sub: {action: "add_services", services: {...}}
  → Support service opens an outbound yamux COMMAND stream to the tunnel-client
  → Tunnel-client merges the new service, re-sends updated manifest
  → Support service updates its registry for that session
  → Operator can immediately open the new service via the subdomain proxy

Example: to access a Yealink phone's web UI at 192.168.1.100:443 on a customer system, add a service with target: 192.168.1.100:443, tls: true. The phone's interface becomes available at:

https://phone-yealink--<session-uuid>.support.my.nethesis.it/

…as if the operator were on the same LAN as the phone.


How the UI Proxy works (subdomain)

When an operator clicks a service link (e.g. NethVoice UI), the browser opens a new tab on a dedicated subdomain. Each service gets its own origin, so all the app's absolute paths (/_next/, /api/, /static/) work natively.

1. Frontend: POST /api/support-sessions/:id/proxy-token  {service: "nethvoice-ui"}
   Backend:  generates scoped JWT (session_id + service_name + org_role, 8h TTL)
   Response: {url: "https://nethvoice-ui--c37f4ce78b024a1fb123456789abcdef.support.example.com/", token: "ey..."}

2. Browser navigates to: https://nethvoice-ui--c37f4ce78b024a1fb123456789abcdef.support.example.com/?token=ey...
   nginx:   matches *.support.* --> rewrites to /support-proxy/* --> backend
   Backend: validates JWT, sets HttpOnly SameSite=Strict cookie, redirects to same URL without ?token=

3. All subsequent requests carry the cookie automatically:
   Browser --> nginx --> Backend (SubdomainProxy) --> Support service --> yamux stream --> Customer system

The ?token= is removed from the URL after the first request (redirect), so it never leaks in logs, referrer headers, or browser history.


How the Web Terminal works (xterm.js)

The terminal needs a WebSocket from the browser, but browsers can't send Authorization headers on WebSocket connections. Solution: one-time ticket exchanged beforehand.

1. Frontend: POST /api/support-sessions/:id/terminal-ticket  (JWT in Authorization header)
   Backend:  generates random ticket, stores in Redis with 30s TTL
   Response: {ticket: "a1b2c3..."}

2. Frontend opens WebSocket: GET /api/support-sessions/:id/terminal?ticket=a1b2c3...
   Backend:  Redis GETDEL (atomic read + delete, single-use)
             validates ticket matches session
             opens raw TCP to support service, sends WebSocket upgrade with X-Session-Token
             hijacks browser connection (http.Hijacker)
             bridges both sides bidirectionally:

   Browser (xterm.js) <--WebSocket--> Backend (TCP bridge) <--WebSocket--> Support <--yamux stream--> PTY on customer system

The tunnel client spawns a PTY (pseudo-terminal) directly on the customer system — no SSH daemon involved. The PTY output is forwarded as raw bytes through the yamux stream back to the browser's xterm.js.

Why TCP hijacking instead of httputil.ReverseProxy?

When the browser opens a WebSocket, it sends an HTTP request with Upgrade: websocket. The server responds with 101 Switching Protocols and from that point the connection is no longer HTTP — it becomes a raw bidirectional byte channel.

httputil.ReverseProxy can't handle this. It's designed for the classic HTTP request/response cycle: read the response from the backend, copy it to the client, close. With a WebSocket there's no "response" to copy — there's a continuous stream of frames in both directions.

Gin (which uses net/http underneath) has the same problem: its ResponseWriter buffers, manages headers, Content-Length... none of which make sense after the 101.

The solution is http.Hijacker: a Go interface that lets you take control of the raw TCP connection from the HTTP server. You're telling Go "I'll handle it from here".

The flow:

  1. Backend receives the WebSocket request from the browser
  2. Opens a direct TCP connection to the support service and sends the same upgrade request
  3. Reads the 101 Switching Protocols from the support service
  4. Calls Hijack() on the browser connection — now it has the raw TCP socket
  5. Sends the 101 to the browser
  6. Two goroutines copy bytes in both directions (io.Copy): browser ↔ support service

No HTTP, no buffering, no overhead. Just bytes flowing through.


Access Patterns & Auth

| Who does what | Auth mechanism | How it works |
|---|---|---|
| System → tunnel | HTTP Basic Auth | system_key:system_secret (SHA256), 3-tier cache (memory → Redis → DB), rate-limited |
| Operator → session CRUD | JWT + RBAC | connect:systems permission, standard middleware chain |
| Operator → web terminal | One-time ticket | JWT exchanged for 30s Redis ticket → GETDEL on use → WebSocket via TCP hijack |
| Operator → UI proxy | Scoped proxy JWT | 8h token with {session_id, service_name, org_role} → SameSite=Strict cookie on subdomain → auto-redirect strips token from URL |
| Backend → support service | Per-session token + INTERNAL_SECRET | X-Session-Token (64-char hex, per-session) + shared SUPPORT_INTERNAL_SECRET for service-level auth, constant-time validation |

Security Highlights

- 🔑 **No shared secrets**: each session gets its own token; compromising one doesn't affect others
- 🎫 **Terminal ticket**: 30s TTL, single-use (GETDEL), JWT never touches the URL
- 🍪 **Proxy cookie**: token arrives as ?token=, gets stored as an HttpOnly SameSite=Strict cookie; URL is cleaned via redirect + Referrer-Policy: no-referrer
- **Constant-time comparisons**: crypto/subtle for all token validations; no timing attacks
- 🛡 **SSRF protection**: tunnel blocks cloud metadata (169.254.x.x), link-local, multicast, loopback
- 🖼 **Frame protection**: CSP frame-ancestors 'self' on proxied responses; prevents clickjacking
- 🔄 **Cache invalidation**: secret regeneration → Redis pub/sub → flush memory + Redis caches instantly
- 📝 **Audit trail**: every operator action logged: who, when, what service, access type

Subdomain Proxy

Each service gets its own browser origin — no URL rewriting needed:

https://nethvoice-ui--c37f4ce78b024a1fb123456789abcdef.support.my.nethesis.it/
        ────────────  ──────────────────────────────── ───────────────────────
        service name  session UUID (no dashes, 32 hex)   configured domain

Requires: DNS wildcard *.support.{domain} + matching wildcard SSL certificate + SUPPORT_PROXY_DOMAIN env var.


Inter-service Communication

Backend ──INTERNAL_SECRET────▶ Support Service    (service-level auth, shared secret)
Backend ──X-Session-Token────▶ Support Service    (per-request, per-session scope)
Backend ──Redis pub/sub──────▶ Support Service    (close sessions, add_services, cache invalidation)
Support ──yamux COMMAND──────▶ Tunnel Client      (server-initiated: add_services)
Support ──yamux stream───────▶ Tunnel Client      (proxied HTTP, terminal)
Support ──WebSocket 4000─────▶ Tunnel Client      (graceful close, no reconnect)

Components & Files

| Component | Path | Purpose |
|---|---|---|
| Support service | services/support/ | WebSocket tunnels, yamux, session DB, service proxy |
| Backend APIs | backend/methods/support_proxy.go | Terminal ticket, proxy token, subdomain proxy, session CRUD |
| Frontend | frontend/src/components/support/ | Session dashboard, service list, terminal (xterm.js) |
| Proxy | proxy/nginx.conf | Subdomain routing, tunnel endpoint exposure |
| DB schema | backend/database/migrations/009_*, 018_*–021_* | support_sessions, support_access_logs, diagnostics column |
| CI/CD | .github/workflows/, render.yaml, deploy.sh | Build, test, deploy support service |

Related Issue: #[ISSUE_NUMBER]

🚀 Testing Environment

To trigger a fresh deployment of all services in the PR preview environment, comment:

update deploy

To download the tunnel-client binary, see: #47 (comment)

Automatic PR environments:

✅ Merge Checklist

Code Quality:

  • Backend Tests
  • Collect Tests
  • Sync Tests
  • Frontend Tests

Builds:

  • Backend Build
  • Collect Build
  • Sync Build
  • Frontend Build

@edospadoni edospadoni deployed to feature/support-service - my-frontend-qa PR #47 March 10, 2026 08:00 — with Render Active
@edospadoni edospadoni deployed to feature/support-service - my-backend-qa PR #47 March 10, 2026 08:00 — with Render Active
@github-actions

🔗 Redirect URIs Added to Logto

The following redirect URIs have been automatically added to the Logto application configuration:

Redirect URIs:

  • https://my-proxy-qa-pr-47.onrender.com/login-redirect

Post-logout redirect URIs:

  • https://my-proxy-qa-pr-47.onrender.com/login

These will be automatically removed when the PR is closed or merged.


github-actions bot commented Mar 10, 2026

🤖 My API structural change detected

Preview documentation

Structural change details

Added (13)

  • DELETE /support-sessions/{id}
  • GET /support-sessions
  • GET /support-sessions/diagnostics
  • GET /support-sessions/{id}
  • GET /support-sessions/{id}/diagnostics
  • GET /support-sessions/{id}/logs
  • GET /support-sessions/{id}/proxy/{service}/{path}
  • GET /support-sessions/{id}/services
  • GET /support-sessions/{id}/terminal
  • PATCH /support-sessions/{id}/extend
  • POST /support-sessions/{id}/proxy-token
  • POST /support-sessions/{id}/services
  • POST /support-sessions/{id}/terminal-ticket

Modified (5)

  • GET /systems
    • Response modified: 200
      • Content type modified: application/json
        • Property modified: data
          • Property modified: systems
  • GET /systems/{id}
    • Response modified: 200
      • Content type modified: application/json
        • Property modified: data
          • Property added: support_session_id
  • POST /systems
    • Response modified: 201
      • Content type modified: application/json
        • Property modified: data
          • Property added: support_session_id
  • POST /systems/{id}/regenerate-secret
    • Response modified: 200
      • Content type modified: application/json
        • Property modified: data
          • Property added: support_session_id
  • PUT /systems/{id}
    • Response modified: 200
      • Content type modified: application/json
        • Property modified: data
          • Property added: support_session_id

@edospadoni edospadoni force-pushed the feature/support-service branch from c62b877 to 007bd6d Compare March 10, 2026 10:53

edospadoni commented Mar 11, 2026

tunnel-client binary (linux/amd64)

Download:

tunnel-client.zip

Quick start

# Make it executable
chmod +x tunnel-client-linux-amd64

# Run it
./tunnel-client-linux-amd64 \
  --url wss://my-proxy-qa-pr-47.onrender.com/support/api/tunnel \
  --key <SYSTEM_KEY> \
  --secret <SYSTEM_SECRET>

Parameters

| Flag | Env var | Description |
|---|---|---|
| -u, --url | SUPPORT_URL | WebSocket tunnel URL (required) |
| -k, --key | SYSTEM_KEY | System key from registration (required) |
| -s, --secret | SYSTEM_SECRET | System secret from registration (required) |
| -n, --node-id | NODE_ID | Cluster node ID, auto-detected on NS8 |
| -r, --redis-addr | REDIS_ADDR | Redis address, auto-detected on NS8 |
| --static-services | STATIC_SERVICES | Manual service definition: name=host:port[:tls],... |
| --exclude | EXCLUDE_PATTERNS | Comma-separated glob patterns to exclude services |
| --tls-insecure | TLS_INSECURE | Skip TLS certificate verification |
| --discovery-interval | DISCOVERY_INTERVAL | Service re-discovery interval (default 5m) |
| --reconnect-delay | RECONNECT_DELAY | Base reconnect delay (default 5s) |
| --max-reconnect-delay | MAX_RECONNECT_DELAY | Max reconnect delay (default 5m) |
| --diagnostics-dir | DIAGNOSTICS_DIR | Directory with diagnostic plugin scripts (default /usr/share/my/diagnostics.d) |
| --diagnostics-plugin-timeout | DIAGNOSTICS_PLUGIN_TIMEOUT | Timeout per diagnostic plugin (default 10s) |
| --diagnostics-total-timeout | DIAGNOSTICS_TOTAL_TIMEOUT | Max time to wait for all diagnostics before giving up (default 30s) |

Service discovery modes

The tunnel-client auto-detects the environment:

  • NS8: discovers services from Redis + Traefik routes
  • NethSecurity: discovers services from OpenWrt/nginx config
  • Static: define services manually with --static-services

Diagnostics plugin system

At connect time, the tunnel-client collects a health snapshot and sends it to MY over the tunnel. Operators see the results directly in the support session popover — before opening a terminal or proxy — so they have immediate context on the system state.

How it works:

  1. When the tunnel-client starts, it runs all diagnostic plugins in parallel with the WebSocket connection (no delay to the connection itself)
  2. After sending the service manifest, it waits up to --diagnostics-total-timeout (default 30s) for the plugins to finish, then pushes the aggregated report over a dedicated yamux stream
  3. The support service stores the report on the session; MY shows it in the session popover

Built-in plugin (system): always runs regardless of configuration. Collects:

  • OS name and version (from /etc/os-release)
  • CPU load averages (1m / 5m / 15m from /proc/loadavg)
  • RAM usage (from /proc/meminfo, warning >85%, critical >95%)
  • Root disk usage (warning >85%, critical >95%)
  • System uptime

External plugins: any executable file placed in /usr/share/my/diagnostics.d/ is run automatically. Files are executed in alphabetical order, each with its own timeout. This allows NS8 modules, NethSecurity, and third-party integrations to ship their own health checks independently.

Each plugin must:

  • Write a JSON object to stdout
  • Signal severity via exit code: 0 = ok, 1 = warning, 2 = critical
#!/bin/bash
# /usr/share/my/diagnostics.d/10-myservice.sh

STATUS="ok"
SUMMARY="all good"

if ! systemctl is-active --quiet myservice; then
  STATUS="critical"
  SUMMARY="myservice is not running"
fi

echo "{\"id\":\"myservice\",\"name\":\"My Service\",\"status\":\"$STATUS\",\"summary\":\"$SUMMARY\"}"
exit $([ "$STATUS" = "ok" ] && echo 0 || echo 2)

The overall session status shown in MY is the worst status across all plugins (critical > warning > ok). If a plugin exceeds its timeout it is marked timeout and does not block the others.

If --diagnostics-dir points to a directory that does not exist, only the built-in system plugin runs — no error, no configuration needed on systems that have not installed any plugins yet.

Environment variables

All flags can also be passed as env vars:

export SUPPORT_URL=wss://my-proxy-qa-pr-47.onrender.com/support/api/tunnel
export SYSTEM_KEY=<your-key>
export SYSTEM_SECRET=<your-secret>
./tunnel-client-linux-amd64

Support session CRUD, WebSocket terminal with one-time tickets, subdomain proxy
with body rewriting, access logging, RBAC with connect:systems permission,
database migrations, and security hardening from penetration test findings.
Support sessions table with pagination and sorting, xterm.js web terminal
with multi-tab support, service dropdown with multi-node grouping,
connect:systems permission guard, and i18n translations.
Add support service routing in nginx proxy, Render.com deployment config,
CI pipeline with tunnel-client Docker image and rolling dev release,
release workflow with tunnel-client binary and SBOM, connect:systems RBAC permission.
Allows manually created services (not from Blueprint) to be reached
from PR preview environments by setting their env var to a full FQDN.
…kend

Address 27 findings from security audit: prevent double-close panic with
sync.Once, fix TOCTOU race in session creation with DB transaction, add
gzip bomb protection, limit manifest size/rate, validate service names,
use full session UUID in subdomain proxy, add org_role to proxy tokens,
harden WebSocket origin checks, add session rate limiting, fix concurrent
read/write safety, and multiple other hardening improvements.
…kend

Address 23 findings from penetration testing report on the support service:
- SSRF/DNS rebinding prevention with IP validation and DNS resolution checks
- Open redirect fix via protocol-relative URL sanitization
- CORS restriction from AllowAllOrigins to localhost-only in debug mode
- HSTS, CSP, X-Content-Type-Options security headers in nginx proxy
- InternalSecret middleware for defense-in-depth inter-service auth
- PTY environment variable sanitization to prevent credential leakage
- Cookie rewriting to prevent cross-session domain leakage
- Global memory budget (50MB) for gzip decompression (bomb mitigation)
- CONNECT protocol newline injection prevention with service name validation
- Container hardening with nginx-unprivileged and non-root users
- Input validation for node_id and service names
- Nginx server_name regex anchoring for multi-environment support
- Rate limiter single-instance design documentation
- Non-functional default secrets in .env.example files
Add pid directive to /tmp/nginx.pid and create writable cache directories
so nginx can run as non-root user without permission errors.
Add https://*.nethesis.it to connect-src so the frontend can reach
the Logto identity provider for OIDC flows.
Embed the support session ID directly in system list and detail
endpoints to avoid N+1 API calls when checking session status per system.
Show a clickable headset icon next to system name when an active support
session exists. The popover displays session status, dates, and connected
operators with per-node terminal badges. Backend now tracks terminal
disconnect times via access log lifecycle (insert returns ID, disconnect
updates disconnected_at).
…able rate limits

Refactor the tunnel-client from a single 1181-line main.go into organized
internal packages (config, connection, discovery, models, stream, terminal).
Rename traefik.go to nethserver.go with updated function names and log messages.
Replace YAML config with EXCLUDE_PATTERNS env var / --exclude flag for service
filtering. Improve api-cli error logging to include stderr output. Add
configurable rate limiting via env vars (RATE_LIMIT_TUNNEL_PER_IP,
RATE_LIMIT_TUNNEL_PER_KEY, RATE_LIMIT_SESSION_PER_ID, RATE_LIMIT_WINDOW)
with session limit raised from 100 to 500 req/min. Add build-tunnel-client
and run-tunnel-client Makefile targets.
Shift migrations to avoid conflict with 017_inventory_fk_set_null added on main.
@edospadoni edospadoni force-pushed the feature/support-service branch from f56f203 to 2fbfa44 Compare March 19, 2026 09:32
At connect time, the tunnel-client collects a health report and pushes
it to the support service over a dedicated yamux stream. Operators see
the results in the session popover before opening a terminal or proxy.

Built-in system plugin always runs (CPU load, RAM, disk, uptime, OS
info). External plugins can be dropped as executables in
/usr/share/my/diagnostics.d/ - NS8 modules and NethSecurity can ship
their own health checks independently. Each plugin writes JSON to
stdout and signals severity via exit code (0=ok, 1=warning, 2=critical).

The overall session status is the worst status across all plugins.
Diagnostics run in parallel with the WebSocket connection to avoid
adding latency. A per-plugin timeout (default 10s) and a total timeout
(default 30s) prevent slow plugins from blocking the session.

- tunnel-client: new internal/diagnostics package (runner + models),
  built-in system check, DIAGNOSTICS yamux stream after manifest
- support service: acceptControlStream distinguishes DIAGNOSTICS header
  from manifest JSON, SaveDiagnostics() stores JSONB on session
- backend: GET /api/support-sessions/:id/diagnostics with RBAC scoping,
  migration 021 adds diagnostics + diagnostics_at columns
- frontend: diagnostics section in SupportSessionPopover with status
  dot and per-plugin summary rows
Operators can now inject arbitrary host:port services into a running
tunnel session without reconnection, enabling access to LAN devices
(IP phones, switches) through the support proxy.

- Backend: POST /support-sessions/:id/services with RBAC, validation,
  and Redis pub/sub dispatch (add_services action)
- Support service: SendCommandToSession() opens outbound yamux stream,
  writes COMMAND 1\n + JSON payload, waits for OK/ERROR
- Tunnel-client: accept loop pre-reads first line to route COMMAND vs
  CONNECT streams; thread-safe serviceStore with sync.RWMutex
- Frontend: Add Service modal with name/target/label/TLS fields; 1500ms
  delay before re-fetching services to account for async round-trip
- OpenAPI: documented new endpoint with Conflict response component
- README: added COMMAND stream table, Static Service Injection section
Fixes 10 security issues identified in the pen-test review of the
static service injection and diagnostics features:

- SSRF bypass in applyAddServices (HMAC-signed Redis commands, server
  pre-check, and client-side validateTarget)
- Diagnostics JSON schema validation, 512 KB size cap, and DB-enforced
  rate limit across reconnections
- Diagnostic plugins rejected if not owned by root or writable by
  others; sanitized environment strips credentials
- host:port validation uses net.SplitHostPort with numeric range check
- DIAGNOSTICS stream version validated as exact "DIAGNOSTICS 1"
- serviceStore total cap (500) prevents unbounded growth
- Diagnostics goroutine starts only after yamux session is established
Remote apps (NethVoice, NethCTI) proxied through different subdomains
make cross-origin API calls that require CORS headers and shared cookie
authentication across sibling subdomains of the same support session.

Backend:
- Move CORS middleware from router to /api group so it does not
  intercept /support-proxy/* routes
- Add CORS preflight (OPTIONS 204) and response headers for
  same-session sibling subdomains (validated by session slug match)
- Scope proxy cookie to .support.{domain} with SameSite=Lax so it
  is shared across all service subdomains of the same session
- Remove per-service token validation: session ID match is sufficient
  since users have session-level access

Support service:
- Fix non-deterministic hostname rewriting in buildHostRewriteMap:
  when multiple services share the same original hostname, the current
  service's proxy subdomain is always preferred, keeping API calls
  same-origin and letting Traefik handle path-based routing
@edospadoni edospadoni force-pushed the feature/support-service branch from d683765 to 50624ac Compare March 20, 2026 10:43
…er display

Add GET /api/support-sessions/diagnostics?system_id=X endpoint that returns
diagnostics for all active sessions of a system grouped by node, with an
overall_status reflecting the worst across all nodes. Update the frontend
popover to show collapsible per-node sections for multi-node NS8 clusters
while keeping the flat list for single-node systems.