WIP: fix(tracing): finish active trace on crash#1667
WIP: fix(tracing): finish active trace on crash#1667
Conversation
eaaeff4 to
e78541b
Compare
1764758 to
570ed57
Compare
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The prior crash-path trace finish only closed `scope->span` (the deepest active span) and its root transaction. Intermediate spans between them — e.g. Qt event dispatch handlers nested above a crashing slot — were never finished, never added to `tx.spans`, and ended up orphaning the crash event at the trace root in Sentry's UI. Track every span on its root transaction under `children_mutex` at `sentry__span_new`, deregister on normal finish, and drain the list in leaf-first order inside `sentry__trace_finish`. Matches sentry-cocoa's `SentryTracer` `_children` + `finishForCrash` and sentry-java's `SentryTracer.forceFinish`. Preserve `scope->span` / `scope->transaction_object` across the drain so the subsequent crash event inherits the full trace context from scope (mirrors cocoa's `finishTracer:shouldCleanUp:NO`). Without this, the crash event fell through to the stale propagation context and Sentry could not nest it under the active span chain. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Trim WHAT-narration comments in sentry__trace_finish and its tests, keeping the sentry-cocoa alignment anchor and the non-obvious "detach to skip remove-scan" rationale. Warn on the tracking-list allocation failure so a silent crash-finalize gap is audible. Use the typedef name for the forward declaration in sentry_sampling_context.h for consistency with the rest of the codebase. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Split sentry__trace_finish into save_active_trace / restore_active_trace (as a pair around the scope mutation) and finish_children for the atomic children swap + finish loop, so the top-level function reads as a short narrative. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sentry_transaction_finish_ts consumes its argument, so the ref that save_active_trace took for active_tx is released by the finish call. restore_active_trace was decref'ing it again. Harmless in crash context where the process exits, but a real refcount underflow otherwise. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sentry_span_finish_ts consumes its argument (decref at sentry_core.c:1607 on success, :1611 on fail), so the explicit sentry__span_decref after the finish call released the children-list ref a second time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The transaction's `children` list held an incref'd reference on each live span, while each span holds a ref on its transaction via `span->transaction`. Any code path that decref'd a span without going through `sentry_span_finish_ts` (e.g. the `span_data` unit test using the low-level `sentry__span_new` / `sentry__span_decref` API) left both sides stuck at refcount 1, leaking both. Make `tx->children` weak: `sentry__span_new` no longer increfs on add, and `sentry__span_decref` unlists the span from the children list on its final drop. `sentry__transaction_remove_child` correspondingly no longer decrefs. The drain path in `sentry__trace_finish` continues to work — the spans on the swapped-out list are alive via their other refs (scope / saved_span / user var), and `sentry_span_finish_ts` consumes one of those refs as the caller's, exactly as in the non-crash flow. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After the weak-children refactor, `finish_children` handed each span to `sentry_span_finish_ts` without a caller ref. `finish_ts` consumes one ref on its argument, so when the drained span had only the user's own ref outstanding, finish_ts would drop it to zero and free the span — leaving the user's pointer dangling. Incref inside the drain loop so finish_ts consumes "our" ref, leaving user refs intact. Also update the `trace_finish` unit test to release the refs it held on `tx`/`child`/`grand` (without the children-list strong ref, unfinished spans now properly leak if the user forgets to decref). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SENTRY_WITH_SCOPE_MUT calls sentry__scope_flush_unlock on exit, which invokes the backend's flush_scope_func — file I/O in both crashpad and native backends. sentry__trace_finish runs from signal-handler context in the inproc/breakpad/native crash handlers, so the flush is unsafe there. Use the NO_FLUSH variant the codebase provides for this exact situation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Call restore_active_trace before the early return so any refs taken by save_active_trace are released even if active_tx is NULL. No-op under the current invariant (saved_span->transaction is always non-NULL), but defensive if the invariant ever changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This reverts commit b0c1aca.
570ed57 to
264a69b
Compare
| sentry_span_finish_ts(child, end_ts); | ||
| } | ||
| sentry_free(children); | ||
| } |
There was a problem hiding this comment.
Iterating weak children list outside mutex risks UAF
Medium Severity
finish_children releases children_mutex before iterating the swapped-out array, but the entries are weak pointers. A concurrent sentry__span_decref on another thread can free a span between the unlock and the access in the loop, so sentry__span_incref(child) then dereferences freed memory. The swap only prevents sentry_span_finish_ts from rescanning the list, not concurrent frees. On UNIX signal contexts the mutex is intentionally a no-op via sentry__block_for_signal_handler, so even the swap itself is unsynchronized there, widening the window into a secondary crash inside the crash handler.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 264a69b. Configure here.
| tx->children = NULL; | ||
| tx->children_count = 0; | ||
| tx->children_cap = 0; | ||
| sentry__mutex_unlock(&tx->children_mutex); |
There was a problem hiding this comment.
Crash handler may deadlock on children_mutex
Medium Severity
sentry__trace_finish acquires tx->children_mutex in finish_children, and sentry_span_finish_ts re-acquires it via sentry__transaction_remove_child for every closed child. If a sibling thread is mid-sentry__span_new or mid-remove and holds this mutex when the inproc backend's signal-context handler fires, the handler blocks forever inside the crash path. This turns a crash into a hang on every concurrent-tracing+crash scenario for the inproc backend, and lengthens the window where the rest of the crash pipeline (write marker, on_crash, capture event) cannot run.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit ba3de42. Configure here.
| if (!sentry_value_is_null(transaction)) { | ||
| transaction = sentry__prepare_transaction_value( | ||
| options, transaction, nullptr); | ||
| if (!sentry_value_is_null(transaction) | ||
| && !sentry__envelope_add_transaction( | ||
| envelope, transaction)) { | ||
| sentry_value_decref(transaction); | ||
| } |
There was a problem hiding this comment.
Bug: When a crash occurs, the transaction's event ID overwrites the crash event's ID in the envelope header, causing the crash to be misidentified by the Sentry backend.
Severity: HIGH
Suggested Fix
The logic for adding items to an envelope should be revised. Either ensure that the crash event's ID is preserved in the envelope header, or revert to not mixing event and transaction items in the same envelope for crash reports. One approach could be to make sentry__envelope_add_transaction conditionally set the event ID, similar to sentry__envelope_add_event.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: src/backends/sentry_backend_breakpad.cpp#L182-L189
Potential issue: In the `inproc` and `breakpad` backends, the crash handling logic adds
both a crash event and the active transaction to the same envelope. The
`sentry__envelope_add_transaction` function unconditionally overwrites the envelope's
`event_id` with the transaction's ID. Since the crash event is added first, its ID is
replaced. According to the Sentry envelope specification, the envelope header's
`event_id` takes precedence, which means the Sentry backend will use the transaction's
ID to identify the crash, leading to incorrect processing and routing of the crash
event.
Also affects:
src/backends/sentry_backend_inproc.c:1192~1199
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
There are 4 total unresolved issues (including 2 from previous reviews).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 6a2e71b. Configure here.
| sentry__capture_envelope( | ||
| disk_transport, tx_envelope, options); | ||
| } | ||
| } |
There was a problem hiding this comment.
External crash reporter drops finished transaction
Medium Severity
The transaction value from sentry__trace_finish is leaked and its trace is not sent in crash handling. This occurs when an external crash reporter is launched (in inproc and breakpad backends) or when the before_send_func hook discards the crash event (in the native backend), as the transaction is not properly managed in these paths.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 6a2e71b. Configure here.
| disk_transport, options->run); | ||
| sentry_transport_free(disk_transport); | ||
| } else { | ||
| sentry_value_decref(transaction); |
There was a problem hiding this comment.
Session leaked when disk transport creation fails
Low Severity
The refactor moves sentry__envelope_new() and sentry__envelope_add_session(envelope, session) inside the if (disk_transport) branch. Previously the envelope wrapped the session unconditionally and was freed via sentry_envelope_free(envelope) when the disk transport could not be created, which also released the session. Now, when sentry_new_disk_transport returns null, only transaction is decref'd while session is never wrapped or freed, leaking it.
Reviewed by Cursor Bugbot for commit 6a2e71b. Configure here.


Before this PR, an in-flight trace was silently dropped on crash. And even with a manual
sentry_transaction_finishright before the crash, the crash event still floated at the trace root rather than nesting under the active span —sentry_transaction_finishclearsscope->span/scope->transaction_object, so by the time the crash handler captures its event, scope-apply falls through to the stale propagation context.This PR auto-finishes the active span/transaction on crash with status
aborted, and preservesscope->span/scope->transaction_objectacross the finish, so the trace reaches Sentry and the crash nests under it.For comparison:
sentry_finishAndSaveTransaction→finishForCrashUncaughtExceptionHint→forceFinish(ABORTED)