feat(tls): add mkcert-based TLS for HTTPS on port 8443#158

Draft
bussyjd wants to merge 41 commits into main from feat/mkcert-tls

@bussyjd commented Feb 13, 2026

Summary

  • Add mkcert-based TLS certificate generation for *.obol.stack wildcard domain
  • New internal/tls package handles cert generation, path helpers, and K8s Secret creation
  • obol stack init generates trusted certs via mkcert; obol stack up creates the TLS Secret and patches the Traefik Gateway with a websecure HTTPS listener on port 8443
  • OpenClaw HTTPRoutes attach to both web (HTTP) and websecure (HTTPS) listeners
  • Dashboard URL auto-detects HTTPS when certs are present
  • Import gateway token from ~/.openclaw/openclaw.json during setup (fixes first-deploy validation)
  • Default model switched to gpt-oss:20b-cloud (Ollama cloud)

Context

OpenClaw's Control UI requires crypto.subtle (Web Crypto API) for device identity, and browsers only expose it in secure contexts (HTTPS or localhost). Without TLS, http://<instance>.obol.stack:8080 forces a fallback to allowInsecureAuth: true (token-only auth, no device pairing).

This PR enables https://<instance>.obol.stack:8443 so crypto.subtle is available and full device auth works. HTTP fallback on port 8080 remains functional.

Note on trust store: mkcert -install runs during obol stack init. It is interactive and may prompt for OS keychain access. JAVA_HOME is stripped from the environment to avoid keytool errors on systems with broken Java keystores.
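
A minimal sketch of the JAVA_HOME stripping, assuming mkcert is invoked through os/exec. The helper names withoutEnv and mkcertInstallCmd are hypothetical and not taken from the internal/tls package.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// withoutEnv returns a copy of env with every "KEY=value" entry for key removed.
// Matching is case-sensitive, which is adequate on macOS and Linux.
func withoutEnv(env []string, key string) []string {
	prefix := key + "="
	out := make([]string, 0, len(env))
	for _, kv := range env {
		if strings.HasPrefix(kv, prefix) {
			continue
		}
		out = append(out, kv)
	}
	return out
}

// mkcertInstallCmd builds, but does not run, "mkcert -install" with
// JAVA_HOME removed so mkcert skips the Java keystore.
func mkcertInstallCmd(mkcertPath string) *exec.Cmd {
	cmd := exec.Command(mkcertPath, "-install")
	cmd.Env = withoutEnv(os.Environ(), "JAVA_HOME")
	// Wire up the terminal because the command may prompt for a password.
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	return cmd
}

func main() {
	cmd := mkcertInstallCmd("mkcert")
	fmt.Println(cmd.Args) // a real caller would call cmd.Run() here
}
```

Matching on the "JAVA_HOME=" prefix, including the equals sign, keeps unrelated variables such as JAVA_HOME_OLD intact.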

Closes #155

Test plan

  • go test ./... — all tests pass (TLS, stack helmfile patching, overlay generation)
  • obol stack init generates certs to <configDir>/tls/
  • obol stack up creates obol-stack-tls Secret in traefik namespace
  • curl https://obol.stack:8443/ — TLS 1.3, certificate verified, 200 OK
  • curl https://openclaw-default.obol.stack:8443/ — wildcard SAN match, 200 OK
  • curl http://obol.stack:8080/ — HTTP fallback still works, 200 OK
  • Inference waterfall: HTTPS → OpenClaw → Ollama cloud → response
  • Fresh install test on clean machine (verify mkcert -install prompts correctly)
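
On the wildcard SAN item: a *.obol.stack entry matches exactly one extra label, so it covers openclaw-default.obol.stack but not the bare obol.stack, and the certificate needs both names for the first two curl checks to pass. The sketch below illustrates that matching rule with Go's crypto/x509. The covers helper is hypothetical and only exercises hostname matching, not chain verification.

```go
package main

import (
	"crypto/x509"
	"fmt"
)

// covers reports whether a certificate carrying the given DNS SANs
// would be accepted for host under standard hostname matching.
func covers(sans []string, host string) bool {
	cert := &x509.Certificate{DNSNames: sans}
	return cert.VerifyHostname(host) == nil
}

func main() {
	wildcardOnly := []string{"*.obol.stack"}
	both := []string{"obol.stack", "*.obol.stack"}

	for _, host := range []string{"obol.stack", "openclaw-default.obol.stack", "a.b.obol.stack"} {
		fmt.Printf("%-30s wildcard only: %-5v with apex: %v\n",
			host, covers(wildcardOnly, host), covers(both, host))
	}
}
```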

bussyjd and others added 30 commits January 12, 2026 12:26
Update dependency versions to latest stable releases:
- kubectl: 1.31.0 → 1.35.0
- helm: 3.19.1 → 3.19.4
- helmfile: 1.2.2 → 1.2.3
- k9s: 0.32.5 → 0.50.18
- helm-diff: 3.9.11 → 3.14.1

k3d remains at 5.8.3 (already current).
Replace nginx-ingress controller with Traefik 38.0.2 using Kubernetes
Gateway API for routing. This addresses the nginx-ingress deprecation
(end of maintenance March 2026).

Changes:
- Remove --disable=traefik from k3d config to use k3s built-in Traefik
- Replace nginx-ingress helm release with Traefik 38.0.2 in infrastructure
- Configure Gateway API provider with cross-namespace routing support
- Add GatewayClass and Gateway resources via Traefik helm chart
- Convert all Ingress resources to HTTPRoute format:
  - eRPC: /rpc path routing
  - obol-frontend: / path routing
  - ethereum: /execution and /beacon path routing with URL rewrite
  - aztec: namespace-based path routing with URL rewrite
  - helios: namespace-based path routing with URL rewrite
- Disable legacy Ingress in service helm values

Closes #125
Add Cloudflare Tunnel integration to expose obol-stack services publicly
without port forwarding or static IPs. Uses quick tunnel mode for MVP.

Changes:
- Add cloudflared Helm chart (internal/embed/infrastructure/cloudflared/)
- Add tunnel management package (internal/tunnel/)
- Add CLI commands: obol tunnel status/restart/logs
- Integrate cloudflared into infrastructure helmfile

The tunnel deploys automatically with `obol stack up` and provides a
random trycloudflare.com URL accessible via `obol tunnel status`.

Future: Named tunnel support for persistent URLs (obol tunnel login)
Update documentation to reflect the upgraded dependency versions
in obolup.sh. This keeps the documentation in sync with the actual
pinned versions used by the bootstrap installer.
# Conflicts:
#	internal/embed/infrastructure/helmfile.yaml
Introduce the inference marketplace foundation: an x402-enabled reverse
proxy that wraps any OpenAI-compatible inference service with USDC
micropayments via the x402 protocol.

Components:
- internal/inference/gateway.go: net/http reverse proxy with x402 middleware
- cmd/inference-gateway/: standalone binary for containerisation
- cmd/obol/inference.go: `obol inference serve` CLI command
- internal/embed/networks/inference/: helmfile network template deploying
  Ollama + gateway + HTTPRoute (auto-discovered by existing CLI)
- Dockerfile.inference-gateway: distroless multi-stage build

Provider: obol network install inference --wallet-address 0x... --model llama3.2:3b
Consumer: POST /v1/chat/completions with X-PAYMENT header (USDC on Base)
feat(inference): add x402 pay-per-inference gateway (Phase 1)
- Remove unused $publicDomain variable from helmfile.yaml (caused
  Helmfile v1 gotmpl pre-processing to fail on .Values.* references)
- Fix eRPC secretEnv: chart expects plain strings, not secretKeyRef
  maps; move OBOL_OAUTH_TOKEN to extraEnv with valueFrom
- Fix obol-frontend escaped quotes in gotmpl (invalid \\" in operand)
Replace the in-cluster Ollama Deployment/PVC/Service with an
ExternalName Service that routes ollama.llm.svc.cluster.local to the
host machine's Ollama server. LLMSpy and all consumers use the stable
cluster-internal DNS name; the ExternalName target is resolved during
stack init via the {{OLLAMA_HOST}} placeholder:

  k3d  → host.k3d.internal
  k3s  → node gateway IP (future)

This avoids duplicating the model cache inside the cluster and
leverages the host's GPU/VRAM for inference.

Also updates CopyDefaults to accept a replacements map, following
the same pattern used for k3d.yaml placeholder resolution.
refactor(llm): proxy to host Ollama via ExternalName Service
The obol-agent deployment in the agent namespace fails with
ImagePullBackOff because its container image is not publicly
accessible. Wrap the template in a Helm conditional
(obolAgent.enabled) defaulting to false so it no longer deploys
automatically. The manifest is preserved for future use — set
obolAgent.enabled=true in the base chart values to re-enable.
fix(infra): disable obol-agent from default stack deployment
Add GitHub Actions workflow to build and publish the OpenClaw container
image to ghcr.io/obolnetwork/openclaw from the upstream openclaw/openclaw
repo at a pinned version. Renovate watches for new upstream releases and
auto-opens PRs to bump the version file.

Closes #142
Add integration-okr-1 and feat/openclaw-ci to push triggers for testing.
Remove after verifying the workflow runs successfully — limit to main only.
The pinned SHAs from charon-dkg-sidecar were stale and caused the
security-scan job to fail at setup.
bussyjd and others added 11 commits February 10, 2026 14:52
ci(openclaw): Docker image build workflow with Renovate auto-bump
* feat(openclaw): add OpenClaw CLI and embedded chart for Obol Stack

Adds `obol openclaw` subcommands to deploy and manage OpenClaw AI agent
instances on the local k3d cluster. The chart is embedded via go:embed
for development use; the canonical chart lives in ObolNetwork/helm-charts.

CLI commands:
  openclaw up      - Create and deploy an instance
  openclaw sync    - Re-deploy / update an existing instance
  openclaw token   - Retrieve the gateway token
  openclaw list    - List deployed instances
  openclaw delete  - Remove an instance
  openclaw skills  - Sync skills from a local directory

The embedded Helm chart supports:
  - Pluggable model providers (Anthropic, OpenAI, Ollama)
  - Chat channels (Telegram, Discord, Slack)
  - Skills injection via ConfigMap + init container
  - RBAC, Gateway API HTTPRoute, values schema validation

* feat(openclaw): integrate OpenClaw into stack setup with config import

OpenClaw is now deployed automatically as a default instance during
`obol stack up`. Adds ~/.openclaw/openclaw.json detection and import,
interactive provider selection for direct CLI usage, and idempotent
re-sync behavior for the default instance.

* fix: resolve CRD conflicts, OpenClaw command, HTTPRoute spec, and KUBECONFIG propagation

- Remove gateway-api-crds presync hook; Traefik v38+ manages its own CRDs
- Fix Ethereum HTTPRoute: use single PathPrefix match (Gateway API spec)
- Fix OpenClaw chart command to match upstream Dockerfile (node openclaw.mjs)
- Update OpenClaw image tag to match GHCR published format (no v prefix)
- Add KUBECONFIG env to helmfile subprocess in stack.go (aligns with all other packages)

* feat(openclaw): detect and import existing ~/.openclaw workspace + bump to v2026.2.9

Auto-detect existing OpenClaw installations during `obol stack up` and
`obol openclaw up`. When ~/.openclaw/ contains a workspace directory with
personality files (SOUL.md, AGENTS.md, etc.), copies them into the pod's
PVC after deployment. Auto-skips interactive provider prompts when an
existing config with providers is detected.

Also bumps the chart image to v2026.2.9 to match the CI-published image.

* feat(openclaw): add setup wizard and dashboard commands

Add `obol openclaw setup <id>` which port-forwards to the deployed
gateway and runs the native OpenClaw onboard wizard via PTY. The wizard
provides the full onboarding experience (personality, channels, skills,
providers) against the running k8s instance.

Add `obol openclaw dashboard <id>` which port-forwards and opens the
web dashboard in the browser with auto-injected gateway token.

Implementation details:
- Port-forward lifecycle manager with auto-port selection
- PTY-based wizard with raw terminal mode for @clack/prompts support
- Sliding-window marker detection to exit cleanly when wizard completes
- Proper PTY shutdown sequence (close master -> kill -> wait) to avoid
  hang caused by stdin copy goroutine blocking cmd.Wait()
- Refactored Token() into reusable getToken() helper
- findOpenClawBinary() searches PATH then cfg.BinDir with install hints
- obolup.sh gains install_openclaw() for npm-based binary management

* feat(llm,openclaw): llmspy universal proxy + openclaw CLI passthrough

Route all cloud API traffic through llmspy as a universal gateway:
- Add Anthropic/OpenAI providers to llm.yaml (ConfigMap + Secret + envFrom)
- New `internal/llm` package with ConfigureLLMSpy() for imperative patching
- New `obol llm configure` command for standalone provider setup
- OpenClaw overlay routes through llmspy:8000/v1 instead of direct cloud APIs
- Bump llmspy image to obol fork rc.2 (fixes SQLite startup race)

Add `obol openclaw cli <id> -- <args>` passthrough:
- Remote-capable commands (gateway, acp, browser, logs) via port-forward
- Local-only commands (doctor, models, config) via kubectl exec
- Replace PTY-based setup wizard with non-interactive helmfile sync flow
- Remove creack/pty and golang.org/x/term dependencies

* fix(openclaw): rename up→onboard, fix api field and macOS host resolution

- Rename `obol openclaw up` to `obol openclaw onboard`
- Set api: "openai-completions" in llmspy-routed overlay (fixes
  "No API provider registered for api: undefined" in OpenClaw)
- Use host.docker.internal on macOS for Ollama ExternalName service
  (host.k3d.internal doesn't resolve on Docker Desktop)

* feat(openclaw): detect Ollama availability before offering it in setup wizard

SetupDefault() now probes the host Ollama endpoint before deploying with
Ollama defaults — skips gracefully when unreachable so users without
Ollama can configure a cloud provider later via `obol openclaw setup`.
interactiveSetup() dynamically shows a 3-option menu (Ollama/OpenAI/
Anthropic) when Ollama is detected, or a 2-option menu (OpenAI/Anthropic)
when it isn't.

* docs: add LLM configuration architecture to CLAUDE.md

Document the two-tier model: global llmspy gateway (cluster-wide keys
and provider routing) vs per-instance OpenClaw config (overlay values
pointing at llmspy or directly at cloud APIs). Includes data flow
diagram, summary table, and key source files reference.
Update Anthropic models to include Opus 4.6, replace retiring GPT-4o
with GPT-5.2, add next-step guidance to NOTES.txt, and clarify gateway
token and skills injection comments per CTO review feedback.
Sync _helpers.tpl, validate.yaml, and values.yaml comments to match
the helm-charts repo. Key changes:
- Remove randAlphaNum gateway token fallback (require explicit value)
- Add validation: gateway token required for token auth mode
- Add validation: RBAC requires serviceAccount.name when create=false
- Add validation: initJob requires persistence.enabled=true
- Align provider and gateway token comments
Add a local dnsmasq-based DNS resolver that enables wildcard hostname
resolution for per-instance routing (e.g., openclaw-myid.obol.stack)
without manual /etc/hosts entries.

- New internal/dns package: manages dnsmasq Docker container on port 5553
- macOS: auto-configures /etc/resolver/obol.stack (requires sudo once)
- Linux: prints manual DNS configuration instructions
- stack up: starts DNS resolver (idempotent, non-fatal on failure)
- stack purge: stops DNS resolver and removes system resolver config
- stack down: leaves DNS resolver running (cheap, persists across restarts)

Closes #150
DNS resolver: add systemd-resolved integration for Linux. On Linux,
dnsmasq binds to 127.0.0.2:53 (avoids systemd-resolved's stub on
127.0.0.53:53) and a resolved.conf.d drop-in forwards *.obol.stack
queries. On macOS, behavior is unchanged (port 5553 + /etc/resolver).

Also fixes dnsmasq startup with --conf-file=/dev/null to ignore
Alpine's default config which enables local-service (rejects queries
from Docker bridge network).

Fix llmspy image tag: 3.0.32-obol.1-rc.2 does not exist on GHCR,
corrected to 3.0.32-obol.1-rc.1.
…153)

OpenClaw's control UI rejects WebSocket connections with "1008: control ui
requires HTTPS or localhost (secure context)" when running behind Traefik
over HTTP. This adds:

- Chart values and _helpers.tpl rendering for controlUi.allowInsecureAuth
  and controlUi.dangerouslyDisableDeviceAuth gateway settings
- trustedProxies chart value for reverse proxy IP allowlisting
- Overlay generation injects controlUi settings for both imported and
  fresh install paths
- RBAC ClusterRole/ClusterRoleBinding for frontend OpenClaw instance
  discovery (namespaces, pods, configmaps, secrets)
…lowInsecureAuth

The dangerouslyDisableDeviceAuth flag is completely redundant when running
behind Traefik over HTTP: the browser's crypto.subtle API is unavailable
in non-secure contexts (non-localhost HTTP), so the Control UI never sends
device identity at all. Setting dangerouslyDisableDeviceAuth only matters
when the browser IS in a secure context but you want to skip device auth —
which doesn't apply to our Traefik proxy case.

allowInsecureAuth alone is sufficient: it allows the gateway to accept
token-only authentication when device identity is absent. Token auth
remains fully enforced — connections without a valid gateway token are
still rejected.

Security analysis:
- Token/password auth: still enforced (timing-safe comparison)
- Origin check: still enforced (same-origin validation)
- Device identity: naturally skipped (browser can't provide it on HTTP)
- Risk in localhost k3d context: Low (no external attack surface)
- OpenClaw security audit classification: critical (general), but
  acceptable for local-only dev stack

Refs: plans/security-audit-controlui.md, plans/trustedproxies-analysis.md
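
For readers unfamiliar with the "timing-safe comparison" item, the idea can be sketched in Go as below. This is not OpenClaw's gateway code, and tokenMatches is a hypothetical helper. It hashes both values first so that the constant-time compare always sees equal-length inputs.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

// tokenMatches compares a presented token with the expected one without
// leaking, through timing, how many leading bytes matched.
// An empty expected token never matches, so a missing config rejects everyone.
func tokenMatches(presented, expected string) bool {
	if expected == "" {
		return false
	}
	p := sha256.Sum256([]byte(presented))
	e := sha256.Sum256([]byte(expected))
	return subtle.ConstantTimeCompare(p[:], e[:]) == 1
}

func main() {
	const expected = "example-gateway-token" // placeholder value
	fmt.Println(tokenMatches("example-gateway-token", expected))
	fmt.Println(tokenMatches("wrong-token", expected))
}
```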
Enable crypto.subtle in browsers by serving *.obol.stack over HTTPS.
This allows OpenClaw's Control UI to use Web Crypto API for device
identity (Ed25519 keypair) instead of falling back to token-only auth.

- New internal/tls package: cert generation, K8s Secret management
- obolup.sh: install mkcert v1.4.4 as optional dependency
- stack init: generate wildcard cert for *.obol.stack via mkcert
- stack up: create obol-stack-tls Secret, patch helmfile for websecure
  Gateway listener with HTTPS on port 8443
- openclaw: add websecure parentRef, HTTPS dashboard URL when certs exist
- Import gateway token from ~/.openclaw/openclaw.json on setup
- Default model switched to gpt-oss:20b-cloud (Ollama cloud)

HTTP fallback on port 8080 remains fully functional.
allowInsecureAuth kept for HTTP-only environments.

Closes #155
Base automatically changed from feat/wildcard-dns-resolver to integration-okr-1 February 13, 2026 14:49
@bussyjd bussyjd marked this pull request as draft February 16, 2026 18:51
Base automatically changed from integration-okr-1 to main February 17, 2026 18:47