Skip to content

Fix dialog lifecycle race on GNOME 46+ (refocused follow-up to #9)#11

Open
tmetz1987 wants to merge 2 commits intonick-redwill:mainfrom
tmetz1987:fix/gnome49-lifecycle
Open

Fix dialog lifecycle race on GNOME 46+ (refocused follow-up to #9)#11
tmetz1987 wants to merge 2 commits intonick-redwill:mainfrom
tmetz1987:fix/gnome49-lifecycle

Conversation

@tmetz1987
Copy link
Copy Markdown

Follow-up to #9 — refocused to only the part you indicated you'd consider reviewing again, with a fix attempting to address the GNOME 49 silent-fail you observed.

What changed vs. #9

The two commits you rejected on scope grounds — the screen shield lightbox override and the Mutter PowerSaveMode DPMS wake — have been dropped from this PR entirely. I hear you on both counts:

  • The lightbox and DPMS fixes go beyond "simple video playback" and step on territory that Unblank lock screen already owns. I also confirmed the conflict you described — when both extensions are active, the lightbox override wins and your video never shows. That's clearly not acceptable upstream.
  • I'm keeping those two changes as local-only patches on my Pi, where Unblank isn't installed and the physical DPMS blank is visible because of the slow first-frame time. They won't come back in this PR.

So this PR contains just the lifecycle refactor (refactor: defer player setup until screen shield is active) plus one small follow-up commit that tries to fix the "works on first lock, silent fail on subsequent" regression you hit on GNOME 49.

Commit 1 — lifecycle refactor (unchanged from #9)

Same rationale as before: hooking Main.screenShield._dialog directly at enable() time is unreliable because the dialog can be null (never locked yet) or a dead reference (recreated between enable() and the next lock cycle). Moving the player spawn and injection into a Main.screenShield::active-changed handler guarantees we inject into the live dialog every time.

Validated on Raspberry Pi 5 / Ubuntu 24.04 / GNOME Shell 46.0 / GJS 1.80.2 / Wayland. All upstream behavior preserved: connector-based window mapping, _forceFullscreen debug path, multi-monitor position signals, and the GNOME 48 TapAction replacement from #7.

Commit 2 — bounded, cancelable retry (new, tries to address your GNOME 49 observation)

On your feedback re: GNOME 49:

It appears to work on the first lock, but then fails silently on subsequent attempts.

I can't reproduce this directly — I don't have a GNOME 49 machine — so I went back and re-read the code for any path that could produce exactly that symptom. I think I found one:

// Original retry, from the first commit:
_injectIntoDialog() {
    const dialog = Main.screenShield._dialog;
    if (!dialog) {
        GLib.timeout_add(GLib.PRIORITY_DEFAULT, 100, () => {
            if (this._lockActive) this._injectIntoDialog();
            return GLib.SOURCE_REMOVE;
        });
        return;
    }
    ...
}

Two latent issues here that I missed in #9:

  1. The retry is single-shot. It schedules exactly one 100ms retry. If _dialog isn't populated by the time that one retry fires, the whole injection silently gives up. Nothing logs, nothing throws.
  2. The retry timer id is never saved, so _teardownForUnlock() can't cancel it. If unlock happens while the retry is pending, the retry still fires — potentially into partially-torn-down state.

Hypothesis for the second-lock symptom on GNOME 49: on the first lock of a session, _dialog is either already present or arrives within 100ms, so the happy path catches it. On unlock, GNOME 49 may tear the dialog down. On the second lock, _dialog is recreated asynchronously and is slower than 100ms, so the single retry fires before it's ready and we silently give up. From the user's perspective: first lock shows video, second lock shows the default GNOME lock screen with no error.

This is still a hypothesis — I have no way to verify the exact timing on GNOME 49 from here. But the bounded-retry fix is defensible regardless of whether it turns out to be the root cause:

  • Retries up to 20 × 100ms (~2s total) instead of once.
  • Tracks the timer id in this._injectRetryId and cancels it in _teardownForUnlock().
  • Logs a console.warn if we exhaust retries, so if it ever fails again the mode is debuggable instead of silent.
  • Happy path is unchanged: if _dialog is present on the first attempt (as it is on GNOME 46 in my testing), the retry code is never reached.

If this doesn't fix it on your GNOME 49 setup, I'd really appreciate two things:

  1. The output of journalctl --user -b -o cat /usr/bin/gnome-shell | grep -i lockscreen after a failing lock cycle — the new console.warn should reveal whether we're giving up in the retry path or somewhere else entirely.
  2. Whether it's the exact same dialog instance on second lock (Main.screenShield._dialog === <previous>) or a new one. That'd tell us if the dialog is being recreated at all on 49.

I'm happy to iterate further. I'd rather resolve this for real than force it through.

Test plan

  • GNOME 46 / Wayland / Pi 5: repeated Super+L lock/unlock cycles, no leaked timers or wrapper actors
  • gnome-extensions disable while locked → cleanly reverts
  • gnome-extensions enable → works on next lock
  • Extension toggled on while already locked (the Main.screenShield.active early-run path in enable())
  • Needs reviewer verification on GNOME 49 — specifically the second and subsequent locks in a session

Moves the player spawn, window interception, and screen shield dialog
injection out of enable() and into a new _setupForLock() method that
runs whenever Main.screenShield emits active-changed with active=true.
A matching _teardownForUnlock() runs on unlock and on disable().

Reason: hooking Main.screenShield._dialog directly at enable() time is
unreliable on GNOME 46 Wayland. The dialog object can be null (if the
extension is enabled before the user has ever locked) or a dead
reference (if GNOME recreates the dialog between enable() and the next
lock cycle). On Raspberry Pi 5 / Ubuntu 24.04 this manifested as a
black screen on Super+L with no visible video, because the injections
were targeting a destroyed dialog instance. Listening to active-changed
guarantees we inject into the current, live dialog on every lock.

_injectIntoDialog() includes a short poll fallback because
active-changed can fire a hair before _dialog is populated in some
GNOME versions. All upstream behavior is preserved: connector-based
window-actor mapping, _forceFullscreen debug path, multi-monitor
position signals, and the GNOME 48 TapAction replacement.
The original `_injectIntoDialog()` retry path had two issues that
could cause the extension to silently fail on GNOME versions where
`_dialog` is recreated (and populated asynchronously) on each lock:

  1. It only retried once after 100ms. If `_dialog` wasn't ready
     within that single tick, the retry returned and no injection
     ever happened.

  2. The retry timer id was never saved, so `_teardownForUnlock()`
     could not cancel a pending retry. If unlock happened during
     the 100ms wait, the retry would fire into a half-torn-down
     state on the next lock.

This matches the reported symptom in nick-redwill#9 ("works on first lock but
fails silently on subsequent attempts" on GNOME 49) — though I
cannot reproduce it directly, since my test environment is GNOME
46 / Raspberry Pi 5 / Ubuntu 24.04.

The fix:

  - Cap the retry loop at ~2s (20 × 100ms) and log if we give up,
    so the failure mode becomes debuggable instead of silent.
  - Track the timer id in `this._injectRetryId` and cancel it in
    `_teardownForUnlock()`, and reset `_injectAttempts` there too.
  - Initialise both fields in `_resetLockState()` for symmetry.

No behaviour change on the happy path: if `_dialog` is present on
the first attempt (as it always was in my GNOME 46 testing), the
retry code is never reached.
@nick-redwill
Copy link
Copy Markdown
Owner

Hi, thanks for the update!

Unfortunately, the issue with videos not playing is still present.

I did some digging and noticed that the Main.screenShield::active-changed signal does not fire after the first lock, even though the Main.screenShield.active value is clearly changing. I’m not sure why this happens, but it means we probably should not rely on this signal.

At this point, the only reliable approach seems to be polling the state every N ms (e.g. every 100 ms), checking Main.screenShield.active and Main.screenShield.dialog. Or we might only need to check the dialog, since active is already set to true by the time we intercept windows.

That said, I’m a bit curious about your experience. In my testing, I’ve never encountered a situation where windows were intercepted before the dialog was initialized, and no users have reported issues like that either. From your previous PR, it seems like you did run into this case.

How often does this happen for you and under what conditions?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants