Fix spark application spans status on sql analysis failure #10981
aboitreaud wants to merge 4 commits into master
Add lastSqlFailed tracking to AbstractDatadogSparkListener for SQL calls (e.g. SparkSession.sql()) that throw exceptions during Catalyst analysis, before any Spark job is submitted. This ensures finishApplication() can mark the application span as ERROR even when no job/stage/task events fire.
The error priority in finishApplication() is: throwable (from caller) > exitCode != 0 > lastJobFailed > lastSqlFailed
Add unit tests to verify that SQL failures mark application spans as ERROR, and that job failures take precedence over SQL failures.
Fixes: Spark application traces marked SUCCESS when SQL analysis fails
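The priority order stated in this commit message can be illustrated with a small, self-contained sketch. This is not the code from the PR: the class name ApplicationStatusSketch, the method errorSource(), and the string labels are hypothetical, and only the two flag names and the ordering come from the commit message.

```java
import java.util.Optional;

// Hypothetical stand-in for the listener state; not the real AbstractDatadogSparkListener.
class ApplicationStatusSketch {
  // Flags named in the commit message.
  boolean lastJobFailed;
  boolean lastSqlFailed;

  /**
   * Returns a label for the highest-priority error source, or empty when
   * nothing failed. Order: throwable > exitCode != 0 > lastJobFailed > lastSqlFailed.
   */
  Optional<String> errorSource(Throwable throwable, int exitCode) {
    if (throwable != null) {
      return Optional.of("throwable");
    }
    if (exitCode != 0) {
      return Optional.of("exitCode");
    }
    if (lastJobFailed) {
      return Optional.of("lastJobFailed");
    }
    if (lastSqlFailed) {
      return Optional.of("lastSqlFailed");
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    ApplicationStatusSketch status = new ApplicationStatusSketch();
    // A SQL analysis failure with no job events: only lastSqlFailed is set.
    status.lastSqlFailed = true;
    System.out.println(status.errorSource(null, 0));
    // A later job failure takes precedence over the SQL failure.
    status.lastJobFailed = true;
    System.out.println(status.errorSource(null, 0));
  }
}
```

Because the SQL flag is checked last, it only decides the span status when no stronger signal exists, which matches the unit test described above where a job failure wins over a SQL failure.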
Add SparkSqlFailureAdvice that intercepts SparkSession.sql() method calls and propagates any exceptions (e.g. AnalysisException) to the listener via the new onSqlFailure() callback. This ensures SQL analysis failures that occur before any Spark job is submitted are captured and can be reported as ERROR in the application span.
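The interception pattern described in this commit (observe the exception, notify the listener, let the exception continue to the caller) can be sketched in plain Java. The real change uses bytecode instrumentation on SparkSession.sql(); the wrapper below is only an analogy, and SqlFailureListenerSketch, SqlFailureInterceptSketch, and callAndReport() are hypothetical names. Only the onSqlFailure() callback name comes from the commit message.

```java
import java.util.function.Supplier;

// Hypothetical listener surface; only the onSqlFailure() name is taken from the PR.
interface SqlFailureListenerSketch {
  void onSqlFailure(Throwable error);
}

class SqlFailureInterceptSketch {
  /**
   * Runs the given call. If it throws, the listener is notified first and the
   * same exception is rethrown, so the caller's behavior is unchanged.
   */
  static <T> T callAndReport(Supplier<T> sqlCall, SqlFailureListenerSketch listener) {
    try {
      return sqlCall.get();
    } catch (RuntimeException e) {
      if (listener != null) {
        listener.onSqlFailure(e);
      }
      throw e;
    }
  }

  public static void main(String[] args) {
    SqlFailureListenerSketch listener =
        error -> System.out.println("SQL failure reported: " + error.getMessage());
    try {
      // Stand-in for a SparkSession.sql() call that fails during analysis.
      callAndReport(
          () -> {
            throw new IllegalStateException("table or view not found");
          },
          listener);
    } catch (IllegalStateException e) {
      System.out.println("Exception still reached the caller");
    }
  }
}
```

The point of the sketch is that reporting is a side effect only: the exception is never swallowed, so user code that catches AnalysisException keeps working as before.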
Benchmarks
Startup
Summary
Found 0 performance improvements and 0 performance regressions! Performance is the same for 63 metrics, 8 unstable metrics.
Startup time reports for petclinic
gantt
title petclinic - global startup overhead: candidate=1.61.0-SNAPSHOT~28234053ea, baseline=1.61.0-SNAPSHOT~78288e9218
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.064 s) : 0, 1064360
Total [baseline] (10.926 s) : 0, 10925634
Agent [candidate] (1.067 s) : 0, 1067286
Total [candidate] (11.069 s) : 0, 11068566
section appsec
Agent [baseline] (1.245 s) : 0, 1245319
Total [baseline] (11.094 s) : 0, 11093785
Agent [candidate] (1.244 s) : 0, 1243886
Total [candidate] (11.123 s) : 0, 11123015
section iast
Agent [baseline] (1.234 s) : 0, 1233571
Total [baseline] (11.208 s) : 0, 11208247
Agent [candidate] (1.224 s) : 0, 1223511
Total [candidate] (11.275 s) : 0, 11274906
section profiling
Agent [baseline] (1.181 s) : 0, 1180632
Total [baseline] (10.894 s) : 0, 10894382
Agent [candidate] (1.18 s) : 0, 1179789
Total [candidate] (10.924 s) : 0, 10923949
gantt
title petclinic - break down per module: candidate=1.61.0-SNAPSHOT~28234053ea, baseline=1.61.0-SNAPSHOT~78288e9218
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.198 ms) : 0, 1198
crashtracking [candidate] (1.218 ms) : 0, 1218
BytebuddyAgent [baseline] (633.819 ms) : 0, 633819
BytebuddyAgent [candidate] (635.703 ms) : 0, 635703
AgentMeter [baseline] (29.569 ms) : 0, 29569
AgentMeter [candidate] (29.958 ms) : 0, 29958
GlobalTracer [baseline] (258.21 ms) : 0, 258210
GlobalTracer [candidate] (258.907 ms) : 0, 258907
AppSec [baseline] (32.017 ms) : 0, 32017
AppSec [candidate] (32.174 ms) : 0, 32174
Debugger [baseline] (60.49 ms) : 0, 60490
Debugger [candidate] (60.878 ms) : 0, 60878
Remote Config [baseline] (597.861 µs) : 0, 598
Remote Config [candidate] (614.242 µs) : 0, 614
Telemetry [baseline] (8.833 ms) : 0, 8833
Telemetry [candidate] (8.069 ms) : 0, 8069
Flare Poller [baseline] (3.539 ms) : 0, 3539
Flare Poller [candidate] (3.569 ms) : 0, 3569
section appsec
crashtracking [baseline] (1.18 ms) : 0, 1180
crashtracking [candidate] (1.21 ms) : 0, 1210
BytebuddyAgent [baseline] (658.179 ms) : 0, 658179
BytebuddyAgent [candidate] (657.301 ms) : 0, 657301
AgentMeter [baseline] (12.115 ms) : 0, 12115
AgentMeter [candidate] (12.091 ms) : 0, 12091
GlobalTracer [baseline] (257.406 ms) : 0, 257406
GlobalTracer [candidate] (257.148 ms) : 0, 257148
AppSec [baseline] (177.455 ms) : 0, 177455
AppSec [candidate] (177.23 ms) : 0, 177230
Debugger [baseline] (66.136 ms) : 0, 66136
Debugger [candidate] (66.126 ms) : 0, 66126
Remote Config [baseline] (622.405 µs) : 0, 622
Remote Config [candidate] (629.193 µs) : 0, 629
Telemetry [baseline] (8.334 ms) : 0, 8334
Telemetry [candidate] (8.385 ms) : 0, 8385
Flare Poller [baseline] (3.577 ms) : 0, 3577
Flare Poller [candidate] (3.561 ms) : 0, 3561
IAST [baseline] (24.109 ms) : 0, 24109
IAST [candidate] (24.052 ms) : 0, 24052
section iast
crashtracking [baseline] (1.198 ms) : 0, 1198
crashtracking [candidate] (1.213 ms) : 0, 1213
BytebuddyAgent [baseline] (801.287 ms) : 0, 801287
BytebuddyAgent [candidate] (793.495 ms) : 0, 793495
AgentMeter [baseline] (11.674 ms) : 0, 11674
AgentMeter [candidate] (11.379 ms) : 0, 11379
GlobalTracer [baseline] (247.942 ms) : 0, 247942
GlobalTracer [candidate] (246.223 ms) : 0, 246223
AppSec [baseline] (26.553 ms) : 0, 26553
AppSec [candidate] (26.44 ms) : 0, 26440
Debugger [baseline] (69.824 ms) : 0, 69824
Debugger [candidate] (70.488 ms) : 0, 70488
Remote Config [baseline] (518.721 µs) : 0, 519
Remote Config [candidate] (521.738 µs) : 0, 522
Telemetry [baseline] (9.615 ms) : 0, 9615
Telemetry [candidate] (9.179 ms) : 0, 9179
Flare Poller [baseline] (3.473 ms) : 0, 3473
Flare Poller [candidate] (3.364 ms) : 0, 3364
IAST [baseline] (25.344 ms) : 0, 25344
IAST [candidate] (25.247 ms) : 0, 25247
section profiling
crashtracking [baseline] (1.17 ms) : 0, 1170
crashtracking [candidate] (1.167 ms) : 0, 1167
BytebuddyAgent [baseline] (681.753 ms) : 0, 681753
BytebuddyAgent [candidate] (681.53 ms) : 0, 681530
AgentMeter [baseline] (8.978 ms) : 0, 8978
AgentMeter [candidate] (9.007 ms) : 0, 9007
GlobalTracer [baseline] (214.621 ms) : 0, 214621
GlobalTracer [candidate] (214.509 ms) : 0, 214509
AppSec [baseline] (32.209 ms) : 0, 32209
AppSec [candidate] (32.266 ms) : 0, 32266
Debugger [baseline] (65.5 ms) : 0, 65500
Debugger [candidate] (65.751 ms) : 0, 65751
Remote Config [baseline] (565.092 µs) : 0, 565
Remote Config [candidate] (560.417 µs) : 0, 560
Telemetry [baseline] (7.709 ms) : 0, 7709
Telemetry [candidate] (7.692 ms) : 0, 7692
Flare Poller [baseline] (3.472 ms) : 0, 3472
Flare Poller [candidate] (3.491 ms) : 0, 3491
ProfilingAgent [baseline] (93.898 ms) : 0, 93898
ProfilingAgent [candidate] (93.208 ms) : 0, 93208
Profiling [baseline] (94.448 ms) : 0, 94448
Profiling [candidate] (93.753 ms) : 0, 93753
Startup time reports for insecure-bank
gantt
title insecure-bank - global startup overhead: candidate=1.61.0-SNAPSHOT~28234053ea, baseline=1.61.0-SNAPSHOT~78288e9218
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.066 s) : 0, 1065533
Total [baseline] (8.833 s) : 0, 8833004
Agent [candidate] (1.054 s) : 0, 1054469
Total [candidate] (8.805 s) : 0, 8804731
section iast
Agent [baseline] (1.234 s) : 0, 1233545
Total [baseline] (9.563 s) : 0, 9563201
Agent [candidate] (1.224 s) : 0, 1224414
Total [candidate] (9.52 s) : 0, 9519594
gantt
title insecure-bank - break down per module: candidate=1.61.0-SNAPSHOT~28234053ea, baseline=1.61.0-SNAPSHOT~78288e9218
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.225 ms) : 0, 1225
crashtracking [candidate] (1.189 ms) : 0, 1189
BytebuddyAgent [baseline] (636.098 ms) : 0, 636098
BytebuddyAgent [candidate] (627.116 ms) : 0, 627116
AgentMeter [baseline] (29.607 ms) : 0, 29607
AgentMeter [candidate] (29.354 ms) : 0, 29354
GlobalTracer [baseline] (257.618 ms) : 0, 257618
GlobalTracer [candidate] (256.235 ms) : 0, 256235
AppSec [baseline] (32.026 ms) : 0, 32026
AppSec [candidate] (31.707 ms) : 0, 31707
Debugger [baseline] (59.757 ms) : 0, 59757
Debugger [candidate] (59.31 ms) : 0, 59310
Remote Config [baseline] (587.277 µs) : 0, 587
Remote Config [candidate] (588.025 µs) : 0, 588
Telemetry [baseline] (8.064 ms) : 0, 8064
Telemetry [candidate] (8.017 ms) : 0, 8017
Flare Poller [baseline] (4.25 ms) : 0, 4250
Flare Poller [candidate] (4.927 ms) : 0, 4927
section iast
crashtracking [baseline] (1.201 ms) : 0, 1201
crashtracking [candidate] (1.187 ms) : 0, 1187
BytebuddyAgent [baseline] (800.811 ms) : 0, 800811
BytebuddyAgent [candidate] (795.373 ms) : 0, 795373
AgentMeter [baseline] (11.629 ms) : 0, 11629
AgentMeter [candidate] (11.375 ms) : 0, 11375
GlobalTracer [baseline] (247.729 ms) : 0, 247729
GlobalTracer [candidate] (246.357 ms) : 0, 246357
AppSec [baseline] (26.733 ms) : 0, 26733
AppSec [candidate] (27.244 ms) : 0, 27244
Debugger [baseline] (68.484 ms) : 0, 68484
Debugger [candidate] (67.293 ms) : 0, 67293
Remote Config [baseline] (536.619 µs) : 0, 537
Remote Config [candidate] (521.779 µs) : 0, 522
Telemetry [baseline] (10.887 ms) : 0, 10887
Telemetry [candidate] (10.172 ms) : 0, 10172
Flare Poller [baseline] (3.931 ms) : 0, 3931
Flare Poller [candidate] (3.668 ms) : 0, 3668
IAST [baseline] (25.516 ms) : 0, 25516
IAST [candidate] (25.241 ms) : 0, 25241
Load
Summary
Found 1 performance improvements and 0 performance regressions! Performance is the same for 18 metrics, 17 unstable metrics.
Request duration reports for petclinic
gantt
title petclinic - request duration [CI 0.99] : candidate=1.61.0-SNAPSHOT~28234053ea, baseline=1.61.0-SNAPSHOT~78288e9218
dateFormat X
axisFormat %s
section baseline
no_agent (19.28 ms) : 19081, 19479
. : milestone, 19280,
appsec (19.56 ms) : 19362, 19759
. : milestone, 19560,
code_origins (17.713 ms) : 17537, 17888
. : milestone, 17713,
iast (17.825 ms) : 17646, 18004
. : milestone, 17825,
profiling (18.542 ms) : 18356, 18728
. : milestone, 18542,
tracing (17.768 ms) : 17590, 17946
. : milestone, 17768,
section candidate
no_agent (19.287 ms) : 19090, 19485
. : milestone, 19287,
appsec (18.957 ms) : 18762, 19151
. : milestone, 18957,
code_origins (18.087 ms) : 17903, 18272
. : milestone, 18087,
iast (17.833 ms) : 17655, 18011
. : milestone, 17833,
profiling (18.968 ms) : 18776, 19161
. : milestone, 18968,
tracing (17.976 ms) : 17799, 18153
. : milestone, 17976,
Request duration reports for insecure-bank
gantt
title insecure-bank - request duration [CI 0.99] : candidate=1.61.0-SNAPSHOT~28234053ea, baseline=1.61.0-SNAPSHOT~78288e9218
dateFormat X
axisFormat %s
section baseline
no_agent (1.188 ms) : 1177, 1200
. : milestone, 1188,
iast (3.12 ms) : 3079, 3160
. : milestone, 3120,
iast_FULL (5.801 ms) : 5743, 5859
. : milestone, 5801,
iast_GLOBAL (3.467 ms) : 3413, 3522
. : milestone, 3467,
profiling (2.148 ms) : 2129, 2167
. : milestone, 2148,
tracing (1.819 ms) : 1804, 1834
. : milestone, 1819,
section candidate
no_agent (1.202 ms) : 1190, 1214
. : milestone, 1202,
iast (3.129 ms) : 3088, 3169
. : milestone, 3129,
iast_FULL (5.706 ms) : 5650, 5762
. : milestone, 5706,
iast_GLOBAL (3.574 ms) : 3519, 3629
. : milestone, 3574,
profiling (1.934 ms) : 1916, 1951
. : milestone, 1934,
tracing (1.763 ms) : 1750, 1777
. : milestone, 1763,
Dacapo
Summary
Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics.
Execution time for tomcat
gantt
title tomcat - execution time [CI 0.99] : candidate=1.61.0-SNAPSHOT~28234053ea, baseline=1.61.0-SNAPSHOT~78288e9218
dateFormat X
axisFormat %s
section baseline
no_agent (1.479 ms) : 1467, 1490
. : milestone, 1479,
appsec (2.522 ms) : 2467, 2577
. : milestone, 2522,
iast (2.261 ms) : 2192, 2331
. : milestone, 2261,
iast_GLOBAL (2.323 ms) : 2252, 2393
. : milestone, 2323,
profiling (2.091 ms) : 2036, 2146
. : milestone, 2091,
tracing (2.084 ms) : 2030, 2139
. : milestone, 2084,
section candidate
no_agent (1.474 ms) : 1463, 1486
. : milestone, 1474,
appsec (3.755 ms) : 3536, 3974
. : milestone, 3755,
iast (2.267 ms) : 2197, 2336
. : milestone, 2267,
iast_GLOBAL (2.317 ms) : 2247, 2388
. : milestone, 2317,
profiling (2.095 ms) : 2040, 2150
. : milestone, 2095,
tracing (2.074 ms) : 2019, 2128
. : milestone, 2074,
Execution time for biojava
gantt
title biojava - execution time [CI 0.99] : candidate=1.61.0-SNAPSHOT~28234053ea, baseline=1.61.0-SNAPSHOT~78288e9218
dateFormat X
axisFormat %s
section baseline
no_agent (15.371 s) : 15371000, 15371000
. : milestone, 15371000,
appsec (14.823 s) : 14823000, 14823000
. : milestone, 14823000,
iast (18.425 s) : 18425000, 18425000
. : milestone, 18425000,
iast_GLOBAL (18.184 s) : 18184000, 18184000
. : milestone, 18184000,
profiling (15.187 s) : 15187000, 15187000
. : milestone, 15187000,
tracing (15.001 s) : 15001000, 15001000
. : milestone, 15001000,
section candidate
no_agent (15.21 s) : 15210000, 15210000
. : milestone, 15210000,
appsec (14.658 s) : 14658000, 14658000
. : milestone, 14658000,
iast (18.585 s) : 18585000, 18585000
. : milestone, 18585000,
iast_GLOBAL (17.954 s) : 17954000, 17954000
. : milestone, 17954000,
profiling (14.637 s) : 14637000, 14637000
. : milestone, 14637000,
tracing (14.991 s) : 14991000, 14991000
. : milestone, 14991000,
pawel-big-lebowski left a comment
The code looks good to me. Would it be possible to include an end-to-end test which:
- creates sparkSession = SparkSession.builder()..
- runs sparkSession.sql("""...) with, for example, a table that is not found
- checks that the trace contains the expected error attributes.
Similar tests exist in AbstractSpark32SqlTest.
What Does This Do
Context: SparkSession.sql() calls that throw during Catalyst analysis (e.g. AnalysisException for missing tables) fail before any Spark job/stage events fire. Our current instrumentation never sees them, so the spark.application span stays green.
Motivation
Make sure that Spark application spans are marked as errors when a call such as spark.sql().show() fails during SQL analysis.