bug: 403 GraphQL subscription auth failures in bot websocket client #363

@flexus-teams

Description

Original Logs

403: Whoops your key didn't work (2).
That looks bad, my key doesn't work: {'message': "403: Whoops your key didn't work (2).", ... 'path': ['bot_confirm_exists']}
3 exceptions in 5 min, exiting

Error Summary

Multiple isolated bot pods (tasktopus, botticelli, boss, frog, bob, karen) repeatedly fail GraphQL subscription/websocket auth with the same 403 message and restart. The failures happen in the client-side websocket/subscription runner used by stexe/btexe.

Stacktrace

flexus_client_kit/ckit_service_exec.py: run_typical_single_subscription_with_restart_on_network_errors
flexus_client_kit/ckit_bot_exec.py: subscription/auth handling around bot_confirm_exists

Root Cause

  • File: flexus_client_kit/ckit_service_exec.py:46-56
  • Function: run_typical_single_subscription_with_restart_on_network_errors
  • Why: The handler treats any error whose message contains "403:" as transient: it logs the failure, waits 60 seconds, and retries until three exceptions within five minutes force an exit. An invalid key cannot be fixed by retrying, so the process keeps exiting and restarting, which puts multiple bot pods into CrashLoopBackOff.
  • Git blame: @oleg Klimov in 4983917c / cc57e582 / 2821d11b (2025-10 to 2026-01 changes)
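
To illustrate why a permanent 403 uses up the retry budget so quickly, here is a minimal sketch of a "3 exceptions in 5 min" sliding window. The class name ExceptionWindow and its parameters are hypothetical; the issue does not show how ckit_service_exec.py actually maintains exception_times, so treat this as an assumption about the behavior rather than the real implementation.

```python
import time
from collections import deque


class ExceptionWindow:
    """Hypothetical helper: counts exceptions inside a sliding time window."""

    def __init__(self, max_exceptions: int = 3, window_seconds: float = 300.0):
        self.max_exceptions = max_exceptions
        self.window_seconds = window_seconds
        self._times = deque()

    def record(self, now=None) -> bool:
        """Record one exception; return True once the budget is exhausted."""
        now = time.monotonic() if now is None else now
        self._times.append(now)
        # Drop timestamps that have fallen out of the window.
        while self._times and now - self._times[0] > self.window_seconds:
            self._times.popleft()
        return len(self._times) >= self.max_exceptions


# With a 60-second sleep between attempts, a key that never works
# reaches the limit on the third failure, about two minutes in.
window = ExceptionWindow()
results = [window.record(now=t) for t in (0.0, 60.0, 120.0)]
```

Under these assumptions, every restart of a pod with a bad key repeats the same sequence, which matches the "3 exceptions in 5 min, exiting" line in the logs.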

Code Snippet

            err_str = str(e)
            if "460:" in err_str:
                logger.error("%s", e)
                sys.exit(1)
            elif "403:" in err_str:
                logger.error("Authentication failed - key doesn't work: %s", e)
            else:
                nothing = isinstance(e, gql.transport.exceptions.TransportError)
                logger.info("got %s (attempt %d/3), sleep 60...", type(e).__name__, len(exception_times), exc_info=(not nothing))
            await ckit_shutdown.wait(60)
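
One possible direction, sketched under assumptions: classify the error before deciding whether to sleep and retry, and handle "403:" the same way the snippet already handles "460:". The names ErrorAction, FATAL_MARKERS and classify_error are hypothetical and not part of flexus_client_kit. Whether a 403 should exit immediately or be handled some other way is a decision for the maintainers.

```python
import enum


class ErrorAction(enum.Enum):
    EXIT_NOW = "exit_now"   # not recoverable by retrying
    RETRY = "retry"         # possibly transient, sleep and try again


# Assumption: both markers indicate errors that a retry cannot fix.
FATAL_MARKERS = ("460:", "403:")


def classify_error(err_str: str) -> ErrorAction:
    """Map an error message to the action the retry loop should take."""
    if any(marker in err_str for marker in FATAL_MARKERS):
        return ErrorAction.EXIT_NOW
    return ErrorAction.RETRY


action = classify_error("403: Whoops your key didn't work (2).")
```

Exiting on the first 403 would not by itself stop the pod from being restarted, but it would make the failure explicit instead of spending retries on a key that cannot start working.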

Affected

  • Pods: tasktopus, botticelli, boss, frog, bob, karen (bot pods)
  • Namespaces: isolated
  • Occurrences: repeated across many pods
