Skip to content

WIP: fix(tracing): finish active trace on crash#1667

Open
jpnurmi wants to merge 16 commits intomasterfrom
jpnurmi/fix/trace-finish
Open

WIP: fix(tracing): finish active trace on crash#1667
jpnurmi wants to merge 16 commits intomasterfrom
jpnurmi/fix/trace-finish

Conversation

@jpnurmi
Copy link
Copy Markdown
Collaborator

@jpnurmi jpnurmi commented Apr 23, 2026

Before this PR, an in-flight trace was silently dropped on crash. And even with a manual sentry_transaction_finish right before the crash, the crash event still floated at the trace root rather than nesting under the active span — sentry_transaction_finish clears scope->span / scope->transaction_object, so by the time the crash handler captures its event, scope-apply falls through to the stale propagation context.

This PR auto-finishes the active span/transaction on crash with status aborted, and preserves scope->span / scope->transaction_object across the finish, so the trace reaches Sentry and the crash nests under it.

Before (manual finish) After (auto-finish)
image image

For comparison:

Comment thread src/sentry_tracing.c
Comment thread src/sentry_tracing.c
Comment thread src/sentry_tracing.c
Comment thread src/sentry_tracing.c
Comment thread src/sentry_tracing.c Outdated
Comment thread src/sentry_tracing.c
Comment thread src/sentry_core.c
@jpnurmi jpnurmi force-pushed the jpnurmi/fix/trace-finish branch from 1764758 to 570ed57 Compare April 24, 2026 15:35
@jpnurmi jpnurmi changed the title WIP: fix(tracing): finish active trace on crash fix(tracing): finish active trace on crash Apr 25, 2026
jpnurmi and others added 12 commits April 27, 2026 10:12
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The prior crash-path trace finish only closed `scope->span` (the deepest
active span) and its root transaction. Intermediate spans between them —
e.g. Qt event dispatch handlers nested above a crashing slot — were never
finished, never added to `tx.spans`, and ended up orphaning the crash
event at the trace root in Sentry's UI.

Track every span on its root transaction under `children_mutex` at
`sentry__span_new`, deregister on normal finish, and drain the list in
leaf-first order inside `sentry__trace_finish`. Matches sentry-cocoa's
`SentryTracer` `_children` + `finishForCrash` and sentry-java's
`SentryTracer.forceFinish`.

Preserve `scope->span` / `scope->transaction_object` across the drain so
the subsequent crash event inherits the full trace context from scope
(mirrors cocoa's `finishTracer:shouldCleanUp:NO`). Without this, the
crash event fell through to the stale propagation context and Sentry
could not nest it under the active span chain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Trim WHAT-narration comments in sentry__trace_finish and its tests, keeping
the sentry-cocoa alignment anchor and the non-obvious "detach to skip
remove-scan" rationale. Warn on the tracking-list allocation failure so a
silent crash-finalize gap is audible. Use the typedef name for the forward
declaration in sentry_sampling_context.h for consistency with the rest of
the codebase.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Split sentry__trace_finish into save_active_trace / restore_active_trace
(as a pair around the scope mutation) and finish_children for the atomic
children swap + finish loop, so the top-level function reads as a short
narrative. No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sentry_transaction_finish_ts consumes its argument, so the ref that
save_active_trace took for active_tx is released by the finish call.
restore_active_trace was decref'ing it again. Harmless in crash context
where the process exits, but a real refcount underflow otherwise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sentry_span_finish_ts consumes its argument (decref at sentry_core.c:1607
on success, :1611 on fail), so the explicit sentry__span_decref after the
finish call released the children-list ref a second time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The transaction's `children` list held an incref'd reference on each live
span, while each span holds a ref on its transaction via `span->transaction`.
Any code path that decref'd a span without going through `sentry_span_finish_ts`
(e.g. the `span_data` unit test using the low-level `sentry__span_new` /
`sentry__span_decref` API) left both sides stuck at refcount 1, leaking both.

Make `tx->children` weak: `sentry__span_new` no longer increfs on add, and
`sentry__span_decref` unlists the span from the children list on its final
drop. `sentry__transaction_remove_child` correspondingly no longer decrefs.
The drain path in `sentry__trace_finish` continues to work — the spans on
the swapped-out list are alive via their other refs (scope / saved_span /
user var), and `sentry_span_finish_ts` consumes one of those refs as the
caller's, exactly as in the non-crash flow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After the weak-children refactor, `finish_children` handed each span to
`sentry_span_finish_ts` without a caller ref. `finish_ts` consumes one
ref on its argument, so when the drained span had only the user's own ref
outstanding, finish_ts would drop it to zero and free the span — leaving
the user's pointer dangling.

Incref inside the drain loop so finish_ts consumes "our" ref, leaving user
refs intact. Also update the `trace_finish` unit test to release the refs
it held on `tx`/`child`/`grand` (without the children-list strong ref,
unfinished spans now properly leak if the user forgets to decref).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SENTRY_WITH_SCOPE_MUT calls sentry__scope_flush_unlock on exit, which
invokes the backend's flush_scope_func — file I/O in both crashpad and
native backends. sentry__trace_finish runs from signal-handler context
in the inproc/breakpad/native crash handlers, so the flush is unsafe
there. Use the NO_FLUSH variant the codebase provides for this exact
situation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Call restore_active_trace before the early return so any refs taken by
save_active_trace are released even if active_tx is NULL. No-op under
the current invariant (saved_span->transaction is always non-NULL), but
defensive if the invariant ever changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jpnurmi jpnurmi force-pushed the jpnurmi/fix/trace-finish branch from 570ed57 to 264a69b Compare April 27, 2026 08:13
Comment thread src/sentry_tracing.c
sentry_span_finish_ts(child, end_ts);
}
sentry_free(children);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iterating weak children list outside mutex risks UAF

Medium Severity

finish_children releases children_mutex before iterating the swapped-out array, but the entries are weak pointers. A concurrent sentry__span_decref on another thread can free a span between the unlock and the access in the loop, so sentry__span_incref(child) then dereferences freed memory. The swap only prevents sentry_span_finish_ts from rescanning the list, not concurrent frees. On UNIX signal contexts the mutex is intentionally a no-op via sentry__block_for_signal_handler, so even the swap itself is unsynchronized there, widening the window into a secondary crash inside the crash handler.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 264a69b. Configure here.

Comment thread src/sentry_tracing.c
tx->children = NULL;
tx->children_count = 0;
tx->children_cap = 0;
sentry__mutex_unlock(&tx->children_mutex);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Crash handler may deadlock on children_mutex

Medium Severity

sentry__trace_finish acquires tx->children_mutex in finish_children, and sentry_span_finish_ts re-acquires it via sentry__transaction_remove_child for every closed child. If a sibling thread is mid-sentry__span_new or mid-remove and holds this mutex when the inproc backend's signal-context handler fires, the handler blocks forever inside the crash path. This turns a crash into a hang on every concurrent-tracing+crash scenario for the inproc backend, and lengthens the window where the rest of the crash pipeline (write marker, on_crash, capture event) cannot run.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit ba3de42. Configure here.

@jpnurmi jpnurmi changed the title fix(tracing): finish active trace on crash WIP: fix(tracing): finish active trace on crash Apr 27, 2026
Comment on lines +182 to +189
if (!sentry_value_is_null(transaction)) {
transaction = sentry__prepare_transaction_value(
options, transaction, nullptr);
if (!sentry_value_is_null(transaction)
&& !sentry__envelope_add_transaction(
envelope, transaction)) {
sentry_value_decref(transaction);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: When a crash occurs, the transaction's event ID overwrites the crash event's ID in the envelope header, causing the crash to be misidentified by the Sentry backend.
Severity: HIGH

Suggested Fix

The logic for adding items to an envelope should be revised. Either ensure that the crash event's ID is preserved in the envelope header, or revert to not mixing event and transaction items in the same envelope for crash reports. One approach could be to make sentry__envelope_add_transaction conditionally set the event ID, similar to sentry__envelope_add_event.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: src/backends/sentry_backend_breakpad.cpp#L182-L189

Potential issue: In the `inproc` and `breakpad` backends, the crash handling logic adds
both a crash event and the active transaction to the same envelope. The
`sentry__envelope_add_transaction` function unconditionally overwrites the envelope's
`event_id` with the transaction's ID. Since the crash event is added first, its ID is
replaced. According to the Sentry envelope specification, the envelope header's
`event_id` takes precedence, which means the Sentry backend will use the transaction's
ID to identify the crash, leading to incorrect processing and routing of the crash
event.

Also affects:

  • src/backends/sentry_backend_inproc.c:1192~1199

Comment thread src/backends/sentry_backend_breakpad.cpp Outdated
Comment thread src/sentry_envelope.c Outdated
Comment thread src/backends/native/sentry_crash_daemon.c Outdated
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 4 total unresolved issues (including 2 from previous reviews).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 6a2e71b. Configure here.

sentry__capture_envelope(
disk_transport, tx_envelope, options);
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

External crash reporter drops finished transaction

Medium Severity

The transaction value from sentry__trace_finish is leaked and its trace is not sent in crash handling. This occurs when an external crash reporter is launched (in inproc and breakpad backends) or when the before_send_func hook discards the crash event (in the native backend), as the transaction is not properly managed in these paths.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 6a2e71b. Configure here.

disk_transport, options->run);
sentry_transport_free(disk_transport);
} else {
sentry_value_decref(transaction);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Session leaked when disk transport creation fails

Low Severity

The refactor moves sentry__envelope_new() and sentry__envelope_add_session(envelope, session) inside the if (disk_transport) branch. Previously the envelope wrapped the session unconditionally and was freed via sentry_envelope_free(envelope) when the disk transport could not be created, which also released the session. Now, when sentry_new_disk_transport returns null, only transaction is decref'd while session is never wrapped or freed, leaking it.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 6a2e71b. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant