Skip to content

Conversation

charlesmyu
Copy link
Contributor

@charlesmyu charlesmyu commented Oct 16, 2025

What Does This Do

  • Instruments the fromSparkPlan function to:
    • Parse the plan parameter into a map of String properties
    • Update the meta field of returned SparkPlanInfo with those properties
  • Creates a Spark21XPlanUtils class with a extractPlanProduct method that parses a SparkPlan object and returns the properties as a <String, String> map
  • Creates a AbstractSparkPlanUtils class with a parsePlanProduct method that parses the various Objects extracted by extractPlanProduct to return a comprehensible string representation
    • This is extended by Spark21XPlanUtils
    • Updates the toJson function in SparkSQLUtils to write a JSON object if possible, otherwise just write a string
  • Updates tests to reflect these changes & additions

Motivation

The SparkPlan houses additional details about its execution that is useful to visualize for operators to use. Extract these into spans so they can be ingested.

Additional Notes

This PR leverages the existing meta field in the SparkPlanInfo class. This should be safe as we don't overwrite the field if any data exists, and it is currently only used for ScanExec node details. Furthermore since this class appears to be primarily intended as an abstraction for informational purposes, any faulty updates to the object shouldn't result in any breaking issues.

Also note that we use the Product API to obtain the key names (using productElementName), however this was only made available in Scala 2.13. As a result the Scala 2.12 instrumentation uses arbitrary _dd.unknown_key.X names for the keys, so the values can at least be extracted.

Worth mentioning that this PR does not introduce traversal of the physical plan itself into the tracer - this is left to Spark itself. This is because the recursive fromSparkPlan method is instrumented, meaning as each node is built the tracer is invoked to parse it, and we expressly filter out any potential QueryPlan nodes when performing the parsing.

Contributor Checklist

Jira ticket: DJM-974

@charlesmyu charlesmyu added type: enhancement Enhancements and improvements inst: apache spark Apache Spark instrumentation labels Oct 16, 2025
@datadog-official
Copy link

datadog-official bot commented Oct 16, 2025

🎯 Code Coverage
Patch Coverage: 0.00%
Total Coverage: 59.89% (+0.12%)

View detailed report

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 3619b77 | Docs | Was this helpful? Give us feedback!

@pr-commenter
Copy link

pr-commenter bot commented Oct 16, 2025

Benchmarks

Startup

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master charles.yu/djm-974/extract-spark-plan-product
git_commit_date 1761174241 1761189637
git_commit_sha f42a6e9 3619b77
release_version 1.55.0-SNAPSHOT~f42a6e9909 1.55.0-SNAPSHOT~3619b77e1b
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1761191611 1761191611
ci_job_id 1193103858 1193103858
ci_pipeline_id 80044964 80044964
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-0-f0hiff2a 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-0-f0hiff2a 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
module Agent Agent
parent None None

Summary

Found 1 performance improvements and 3 performance regressions! Performance is the same for 51 metrics, 10 unstable metrics.

scenario Δ mean execution_time candidate mean execution_time baseline mean execution_time
scenario:startup:insecure-bank:iast:Flare Poller worse
[+133.708µs; +296.406µs] or [+3.240%; +7.182%]
4.342ms 4.127ms
scenario:startup:petclinic:iast:IAST better
[-9.568ms; -6.188ms] or [-27.892%; -18.037%]
26.426ms 34.304ms
scenario:startup:petclinic:iast:Telemetry worse
[+169.044µs; +468.848µs] or [+2.028%; +5.623%]
8.656ms 8.337ms
scenario:startup:petclinic:profiling:Telemetry worse
[+7.645ms; +8.240ms] or [+92.876%; +100.105%]
16.174ms 8.231ms
Startup time reports for insecure-bank
gantt
    title insecure-bank - global startup overhead: candidate=1.55.0-SNAPSHOT~3619b77e1b, baseline=1.55.0-SNAPSHOT~f42a6e9909

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.031 s) : 0, 1031329
Total [baseline] (8.685 s) : 0, 8685047
Agent [candidate] (1.016 s) : 0, 1015991
Total [candidate] (8.684 s) : 0, 8684116
section iast
Agent [baseline] (1.166 s) : 0, 1165936
Total [baseline] (9.347 s) : 0, 9346832
Agent [candidate] (1.159 s) : 0, 1159400
Total [candidate] (9.351 s) : 0, 9350953
Loading
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.031 s -
Agent iast 1.166 s 134.606 ms (13.1%)
Total tracing 8.685 s -
Total iast 9.347 s 661.785 ms (7.6%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.016 s -
Agent iast 1.159 s 143.408 ms (14.1%)
Total tracing 8.684 s -
Total iast 9.351 s 666.838 ms (7.7%)
gantt
    title insecure-bank - break down per module: candidate=1.55.0-SNAPSHOT~3619b77e1b, baseline=1.55.0-SNAPSHOT~f42a6e9909

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.462 ms) : 0, 1462
crashtracking [candidate] (1.451 ms) : 0, 1451
BytebuddyAgent [baseline] (703.77 ms) : 0, 703770
BytebuddyAgent [candidate] (693.636 ms) : 0, 693636
GlobalTracer [baseline] (244.777 ms) : 0, 244777
GlobalTracer [candidate] (241.736 ms) : 0, 241736
AppSec [baseline] (32.338 ms) : 0, 32338
AppSec [candidate] (32.179 ms) : 0, 32179
Debugger [baseline] (6.355 ms) : 0, 6355
Debugger [candidate] (6.413 ms) : 0, 6413
Remote Config [baseline] (679.085 µs) : 0, 679
Remote Config [candidate] (709.569 µs) : 0, 710
Telemetry [baseline] (15.713 ms) : 0, 15713
Telemetry [candidate] (9.271 ms) : 0, 9271
Flare Poller [baseline] (5.02 ms) : 0, 5020
Flare Poller [candidate] (9.426 ms) : 0, 9426
section iast
crashtracking [baseline] (1.489 ms) : 0, 1489
crashtracking [candidate] (1.491 ms) : 0, 1491
BytebuddyAgent [baseline] (828.654 ms) : 0, 828654
BytebuddyAgent [candidate] (822.226 ms) : 0, 822226
GlobalTracer [baseline] (233.55 ms) : 0, 233550
GlobalTracer [candidate] (232.785 ms) : 0, 232785
IAST [baseline] (30.964 ms) : 0, 30964
IAST [candidate] (26.696 ms) : 0, 26696
AppSec [baseline] (30.697 ms) : 0, 30697
AppSec [candidate] (34.822 ms) : 0, 34822
Debugger [baseline] (6.06 ms) : 0, 6060
Debugger [candidate] (6.124 ms) : 0, 6124
Remote Config [baseline] (613.287 µs) : 0, 613
Remote Config [candidate] (607.359 µs) : 0, 607
Telemetry [baseline] (8.396 ms) : 0, 8396
Telemetry [candidate] (8.644 ms) : 0, 8644
Flare Poller [baseline] (4.127 ms) : 0, 4127
Flare Poller [candidate] (4.342 ms) : 0, 4342
Loading
Startup time reports for petclinic
gantt
    title petclinic - global startup overhead: candidate=1.55.0-SNAPSHOT~3619b77e1b, baseline=1.55.0-SNAPSHOT~f42a6e9909

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.04 s) : 0, 1039502
Total [baseline] (10.925 s) : 0, 10925125
Agent [candidate] (1.017 s) : 0, 1017157
Total [candidate] (10.673 s) : 0, 10673226
section appsec
Agent [baseline] (1.21 s) : 0, 1209602
Total [baseline] (11.027 s) : 0, 11027076
Agent [candidate] (1.196 s) : 0, 1195832
Total [candidate] (11.067 s) : 0, 11067278
section iast
Agent [baseline] (1.163 s) : 0, 1163406
Total [baseline] (11.123 s) : 0, 11123026
Agent [candidate] (1.151 s) : 0, 1150827
Total [candidate] (10.967 s) : 0, 10966845
section profiling
Agent [baseline] (1.182 s) : 0, 1182351
Total [baseline] (10.908 s) : 0, 10908413
Agent [candidate] (1.161 s) : 0, 1161389
Total [candidate] (11.093 s) : 0, 11092681
Loading
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.04 s -
Agent appsec 1.21 s 170.099 ms (16.4%)
Agent iast 1.163 s 123.904 ms (11.9%)
Agent profiling 1.182 s 142.848 ms (13.7%)
Total tracing 10.925 s -
Total appsec 11.027 s 101.951 ms (0.9%)
Total iast 11.123 s 197.901 ms (1.8%)
Total profiling 10.908 s -16.712 ms (-0.2%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.017 s -
Agent appsec 1.196 s 178.675 ms (17.6%)
Agent iast 1.151 s 133.67 ms (13.1%)
Agent profiling 1.161 s 144.232 ms (14.2%)
Total tracing 10.673 s -
Total appsec 11.067 s 394.052 ms (3.7%)
Total iast 10.967 s 293.619 ms (2.8%)
Total profiling 11.093 s 419.455 ms (3.9%)
gantt
    title petclinic - break down per module: candidate=1.55.0-SNAPSHOT~3619b77e1b, baseline=1.55.0-SNAPSHOT~f42a6e9909

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.499 ms) : 0, 1499
crashtracking [candidate] (1.454 ms) : 0, 1454
BytebuddyAgent [baseline] (709.023 ms) : 0, 709023
BytebuddyAgent [candidate] (694.349 ms) : 0, 694349
GlobalTracer [baseline] (246.837 ms) : 0, 246837
GlobalTracer [candidate] (241.917 ms) : 0, 241917
AppSec [baseline] (32.656 ms) : 0, 32656
AppSec [candidate] (32.326 ms) : 0, 32326
Debugger [baseline] (6.433 ms) : 0, 6433
Debugger [candidate] (6.431 ms) : 0, 6431
Remote Config [baseline] (692.873 µs) : 0, 693
Remote Config [candidate] (707.406 µs) : 0, 707
Telemetry [baseline] (13.145 ms) : 0, 13145
Telemetry [candidate] (9.378 ms) : 0, 9378
Flare Poller [baseline] (7.901 ms) : 0, 7901
Flare Poller [candidate] (9.351 ms) : 0, 9351
section appsec
crashtracking [baseline] (1.496 ms) : 0, 1496
crashtracking [candidate] (1.453 ms) : 0, 1453
BytebuddyAgent [baseline] (730.078 ms) : 0, 730078
BytebuddyAgent [candidate] (719.667 ms) : 0, 719667
GlobalTracer [baseline] (237.478 ms) : 0, 237478
GlobalTracer [candidate] (234.565 ms) : 0, 234565
IAST [baseline] (25.143 ms) : 0, 25143
IAST [candidate] (24.898 ms) : 0, 24898
AppSec [baseline] (175.208 ms) : 0, 175208
AppSec [candidate] (175.002 ms) : 0, 175002
Debugger [baseline] (5.954 ms) : 0, 5954
Debugger [candidate] (6.142 ms) : 0, 6142
Remote Config [baseline] (638.798 µs) : 0, 639
Remote Config [candidate] (628.454 µs) : 0, 628
Telemetry [baseline] (8.496 ms) : 0, 8496
Telemetry [candidate] (8.438 ms) : 0, 8438
Flare Poller [baseline] (3.916 ms) : 0, 3916
Flare Poller [candidate] (3.932 ms) : 0, 3932
section iast
crashtracking [baseline] (1.465 ms) : 0, 1465
crashtracking [candidate] (1.455 ms) : 0, 1455
BytebuddyAgent [baseline] (825.578 ms) : 0, 825578
BytebuddyAgent [candidate] (815.506 ms) : 0, 815506
GlobalTracer [baseline] (234.457 ms) : 0, 234457
GlobalTracer [candidate] (231.245 ms) : 0, 231245
IAST [baseline] (34.304 ms) : 0, 34304
IAST [candidate] (26.426 ms) : 0, 26426
AppSec [baseline] (27.057 ms) : 0, 27057
AppSec [candidate] (34.994 ms) : 0, 34994
Debugger [baseline] (6.129 ms) : 0, 6129
Debugger [candidate] (6.111 ms) : 0, 6111
Remote Config [baseline] (593.674 µs) : 0, 594
Remote Config [candidate] (610.057 µs) : 0, 610
Telemetry [baseline] (8.337 ms) : 0, 8337
Telemetry [candidate] (8.656 ms) : 0, 8656
Flare Poller [baseline] (4.156 ms) : 0, 4156
Flare Poller [candidate] (4.219 ms) : 0, 4219
section profiling
crashtracking [baseline] (1.463 ms) : 0, 1463
crashtracking [candidate] (1.424 ms) : 0, 1424
BytebuddyAgent [baseline] (732.04 ms) : 0, 732040
BytebuddyAgent [candidate] (721.012 ms) : 0, 721012
GlobalTracer [baseline] (221.871 ms) : 0, 221871
GlobalTracer [candidate] (217.694 ms) : 0, 217694
AppSec [baseline] (33.427 ms) : 0, 33427
AppSec [candidate] (32.253 ms) : 0, 32253
Debugger [baseline] (13.907 ms) : 0, 13907
Debugger [candidate] (6.433 ms) : 0, 6433
Remote Config [baseline] (705.226 µs) : 0, 705
Remote Config [candidate] (720.404 µs) : 0, 720
Telemetry [baseline] (8.231 ms) : 0, 8231
Telemetry [candidate] (16.174 ms) : 0, 16174
Flare Poller [baseline] (4.163 ms) : 0, 4163
Flare Poller [candidate] (4.091 ms) : 0, 4091
ProfilingAgent [baseline] (110.712 ms) : 0, 110712
ProfilingAgent [candidate] (108.027 ms) : 0, 108027
Profiling [baseline] (111.325 ms) : 0, 111325
Profiling [candidate] (109.481 ms) : 0, 109481
Loading

Load

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master charles.yu/djm-974/extract-spark-plan-product
git_commit_date 1761174241 1761189637
git_commit_sha f42a6e9 3619b77
release_version 1.55.0-SNAPSHOT~f42a6e9909 1.55.0-SNAPSHOT~3619b77e1b
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1761191263 1761191263
ci_job_id 1193103859 1193103859
ci_pipeline_id 80044964 80044964
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-1-i9h8uurg 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-1-i9h8uurg 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 2 performance improvements and 4 performance regressions! Performance is the same for 6 metrics, 12 unstable metrics.

scenario Δ mean http_req_duration Δ mean throughput candidate mean http_req_duration candidate mean throughput baseline mean http_req_duration baseline mean throughput
scenario:load:insecure-bank:iast:high_load better
[-1.300ms; -0.955ms] or [-12.893%; -9.467%]
unstable
[-6.147op/s; +121.084op/s] or [-1.334%; +26.283%]
8.955ms 518.156op/s 10.083ms 460.688op/s
scenario:load:insecure-bank:profiling:high_load better
[-860.951µs; -540.774µs] or [-9.060%; -5.691%]
unstable
[-33.875op/s; +111.313op/s] or [-6.934%; +22.785%]
8.802ms 527.250op/s 9.503ms 488.531op/s
scenario:load:insecure-bank:iast_GLOBAL:high_load worse
[+773.539µs; +1182.866µs] or [+7.588%; +11.603%]
unstable
[-95.466op/s; +16.341op/s] or [-20.944%; +3.585%]
11.173ms 416.250op/s 10.195ms 455.812op/s
scenario:load:petclinic:iast:high_load worse
[+0.945ms; +1.756ms] or [+2.180%; +4.052%]
unstable
[-10.284op/s; +6.376op/s] or [-9.640%; +5.977%]
44.684ms 104.725op/s 43.334ms 106.679op/s
scenario:load:petclinic:tracing:high_load worse
[+1.803ms; +2.615ms] or [+4.100%; +5.945%]
unstable
[-13.344op/s; +0.576op/s] or [-12.390%; +0.535%]
46.195ms 101.312op/s 43.986ms 107.696op/s
scenario:load:petclinic:profiling:high_load worse
[+2.300ms; +3.305ms] or [+4.885%; +7.021%]
unstable
[-13.631op/s; +0.014op/s] or [-13.537%; +0.014%]
49.877ms 93.888op/s 47.075ms 100.696op/s
Request duration reports for petclinic
gantt
    title petclinic - request duration [CI 0.99] : candidate=1.55.0-SNAPSHOT~3619b77e1b, baseline=1.55.0-SNAPSHOT~f42a6e9909
    dateFormat X
    axisFormat %s
section baseline
no_agent (36.628 ms) : 36334, 36923
.   : milestone, 36628,
appsec (48.039 ms) : 47615, 48464
.   : milestone, 48039,
code_origins (43.38 ms) : 43021, 43739
.   : milestone, 43380,
iast (43.334 ms) : 42967, 43701
.   : milestone, 43334,
profiling (47.075 ms) : 46647, 47502
.   : milestone, 47075,
tracing (43.986 ms) : 43628, 44345
.   : milestone, 43986,
section candidate
no_agent (35.98 ms) : 35693, 36266
.   : milestone, 35980,
appsec (49.449 ms) : 49013, 49885
.   : milestone, 49449,
code_origins (43.018 ms) : 42651, 43384
.   : milestone, 43018,
iast (44.684 ms) : 44298, 45071
.   : milestone, 44684,
profiling (49.877 ms) : 49374, 50381
.   : milestone, 49877,
tracing (46.195 ms) : 45801, 46590
.   : milestone, 46195,
Loading
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 36.628 ms [36.334 ms, 36.923 ms] -
appsec 48.039 ms [47.615 ms, 48.464 ms] 11.411 ms (31.2%)
code_origins 43.38 ms [43.021 ms, 43.739 ms] 6.751 ms (18.4%)
iast 43.334 ms [42.967 ms, 43.701 ms] 6.706 ms (18.3%)
profiling 47.075 ms [46.647 ms, 47.502 ms] 10.447 ms (28.5%)
tracing 43.986 ms [43.628 ms, 44.345 ms] 7.358 ms (20.1%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 35.98 ms [35.693 ms, 36.266 ms] -
appsec 49.449 ms [49.013 ms, 49.885 ms] 13.47 ms (37.4%)
code_origins 43.018 ms [42.651 ms, 43.384 ms] 7.038 ms (19.6%)
iast 44.684 ms [44.298 ms, 45.071 ms] 8.705 ms (24.2%)
profiling 49.877 ms [49.374 ms, 50.381 ms] 13.898 ms (38.6%)
tracing 46.195 ms [45.801 ms, 46.59 ms] 10.216 ms (28.4%)
Request duration reports for insecure-bank
gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.55.0-SNAPSHOT~3619b77e1b, baseline=1.55.0-SNAPSHOT~f42a6e9909
    dateFormat X
    axisFormat %s
section baseline
no_agent (4.392 ms) : 4338, 4446
.   : milestone, 4392,
iast (10.083 ms) : 9909, 10256
.   : milestone, 10083,
iast_FULL (14.516 ms) : 14220, 14812
.   : milestone, 14516,
iast_GLOBAL (10.195 ms) : 10015, 10375
.   : milestone, 10195,
profiling (9.503 ms) : 9343, 9662
.   : milestone, 9503,
tracing (7.979 ms) : 7854, 8104
.   : milestone, 7979,
section candidate
no_agent (4.279 ms) : 4231, 4326
.   : milestone, 4279,
iast (8.955 ms) : 8809, 9102
.   : milestone, 8955,
iast_FULL (14.243 ms) : 13956, 14530
.   : milestone, 14243,
iast_GLOBAL (11.173 ms) : 10973, 11373
.   : milestone, 11173,
profiling (8.802 ms) : 8665, 8939
.   : milestone, 8802,
tracing (8.18 ms) : 8054, 8305
.   : milestone, 8180,
Loading
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 4.392 ms [4.338 ms, 4.446 ms] -
iast 10.083 ms [9.909 ms, 10.256 ms] 5.691 ms (129.6%)
iast_FULL 14.516 ms [14.22 ms, 14.812 ms] 10.124 ms (230.5%)
iast_GLOBAL 10.195 ms [10.015 ms, 10.375 ms] 5.803 ms (132.1%)
profiling 9.503 ms [9.343 ms, 9.662 ms] 5.111 ms (116.4%)
tracing 7.979 ms [7.854 ms, 8.104 ms] 3.587 ms (81.7%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 4.279 ms [4.231 ms, 4.326 ms] -
iast 8.955 ms [8.809 ms, 9.102 ms] 4.677 ms (109.3%)
iast_FULL 14.243 ms [13.956 ms, 14.53 ms] 9.965 ms (232.9%)
iast_GLOBAL 11.173 ms [10.973 ms, 11.373 ms] 6.894 ms (161.1%)
profiling 8.802 ms [8.665 ms, 8.939 ms] 4.523 ms (105.7%)
tracing 8.18 ms [8.054 ms, 8.305 ms] 3.901 ms (91.2%)

Dacapo

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master charles.yu/djm-974/extract-spark-plan-product
git_commit_date 1761174241 1761189637
git_commit_sha f42a6e9 3619b77
release_version 1.55.0-SNAPSHOT~f42a6e9909 1.55.0-SNAPSHOT~3619b77e1b
See matching parameters
Baseline Candidate
application biojava biojava
ci_job_date 1761191834 1761191834
ci_job_id 1193103860 1193103860
ci_pipeline_id 80044964 80044964
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-2-7sb214wi 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-2-7sb214wi 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 1 performance improvements and 0 performance regressions! Performance is the same for 10 metrics, 1 unstable metrics.

scenario Δ mean execution_time candidate mean execution_time baseline mean execution_time
scenario:dacapo:tomcat:appsec better
[-1.462ms; -1.118ms] or [-38.925%; -29.767%]
2.466ms 3.756ms
Execution time for tomcat
gantt
    title tomcat - execution time [CI 0.99] : candidate=1.55.0-SNAPSHOT~3619b77e1b, baseline=1.55.0-SNAPSHOT~f42a6e9909
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.482 ms) : 1470, 1493
.   : milestone, 1482,
appsec (3.756 ms) : 3536, 3976
.   : milestone, 3756,
iast (2.223 ms) : 2159, 2286
.   : milestone, 2223,
iast_GLOBAL (2.259 ms) : 2195, 2322
.   : milestone, 2259,
profiling (2.078 ms) : 2025, 2130
.   : milestone, 2078,
tracing (2.042 ms) : 1992, 2092
.   : milestone, 2042,
section candidate
no_agent (1.481 ms) : 1469, 1492
.   : milestone, 1481,
appsec (2.466 ms) : 2415, 2517
.   : milestone, 2466,
iast (2.212 ms) : 2149, 2275
.   : milestone, 2212,
iast_GLOBAL (2.262 ms) : 2198, 2326
.   : milestone, 2262,
profiling (2.504 ms) : 2335, 2673
.   : milestone, 2504,
tracing (2.034 ms) : 1985, 2084
.   : milestone, 2034,
Loading
  • baseline results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 1.482 ms [1.47 ms, 1.493 ms] -
appsec 3.756 ms [3.536 ms, 3.976 ms] 2.274 ms (153.5%)
iast 2.223 ms [2.159 ms, 2.286 ms] 740.893 µs (50.0%)
iast_GLOBAL 2.259 ms [2.195 ms, 2.322 ms] 776.818 µs (52.4%)
profiling 2.078 ms [2.025 ms, 2.13 ms] 595.836 µs (40.2%)
tracing 2.042 ms [1.992 ms, 2.092 ms] 560.319 µs (37.8%)
  • candidate results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 1.481 ms [1.469 ms, 1.492 ms] -
appsec 2.466 ms [2.415 ms, 2.517 ms] 985.232 µs (66.5%)
iast 2.212 ms [2.149 ms, 2.275 ms] 731.694 µs (49.4%)
iast_GLOBAL 2.262 ms [2.198 ms, 2.326 ms] 781.352 µs (52.8%)
profiling 2.504 ms [2.335 ms, 2.673 ms] 1.023 ms (69.1%)
tracing 2.034 ms [1.985 ms, 2.084 ms] 553.785 µs (37.4%)
Execution time for biojava
gantt
    title biojava - execution time [CI 0.99] : candidate=1.55.0-SNAPSHOT~3619b77e1b, baseline=1.55.0-SNAPSHOT~f42a6e9909
    dateFormat X
    axisFormat %s
section baseline
no_agent (15.346 s) : 15346000, 15346000
.   : milestone, 15346000,
appsec (14.833 s) : 14833000, 14833000
.   : milestone, 14833000,
iast (18.631 s) : 18631000, 18631000
.   : milestone, 18631000,
iast_GLOBAL (18.096 s) : 18096000, 18096000
.   : milestone, 18096000,
profiling (15.335 s) : 15335000, 15335000
.   : milestone, 15335000,
tracing (15.313 s) : 15313000, 15313000
.   : milestone, 15313000,
section candidate
no_agent (15.258 s) : 15258000, 15258000
.   : milestone, 15258000,
appsec (15.248 s) : 15248000, 15248000
.   : milestone, 15248000,
iast (18.587 s) : 18587000, 18587000
.   : milestone, 18587000,
iast_GLOBAL (17.583 s) : 17583000, 17583000
.   : milestone, 17583000,
profiling (15.244 s) : 15244000, 15244000
.   : milestone, 15244000,
tracing (15.342 s) : 15342000, 15342000
.   : milestone, 15342000,
Loading
  • baseline results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 15.346 s [15.346 s, 15.346 s] -
appsec 14.833 s [14.833 s, 14.833 s] -513.0 ms (-3.3%)
iast 18.631 s [18.631 s, 18.631 s] 3.285 s (21.4%)
iast_GLOBAL 18.096 s [18.096 s, 18.096 s] 2.75 s (17.9%)
profiling 15.335 s [15.335 s, 15.335 s] -11.0 ms (-0.1%)
tracing 15.313 s [15.313 s, 15.313 s] -33.0 ms (-0.2%)
  • candidate results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 15.258 s [15.258 s, 15.258 s] -
appsec 15.248 s [15.248 s, 15.248 s] -10.0 ms (-0.1%)
iast 18.587 s [18.587 s, 18.587 s] 3.329 s (21.8%)
iast_GLOBAL 17.583 s [17.583 s, 17.583 s] 2.325 s (15.2%)
profiling 15.244 s [15.244 s, 15.244 s] -14.0 ms (-0.1%)
tracing 15.342 s [15.342 s, 15.342 s] 84.0 ms (0.6%)

@charlesmyu charlesmyu force-pushed the charles.yu/djm-974/extract-spark-plan-product branch 3 times, most recently from dc41615 to d9d6213 Compare October 16, 2025 22:43
Comment on lines 28 to 29
// Should really only return valid JSON types (Array, Map, String, Boolean, Number, null)
public Object parsePlanProduct(Object value) {
Copy link
Contributor Author

@charlesmyu charlesmyu Oct 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love that this method returns an Object instead of something definite like a JSON node (or even just a String). The end goal is to allow any JSON object (other than null, which we filter out) to be serialized into a string using writeObjectToString, and this seemed like the most straightforwards way to achieve that. There's probably some more idiomatic way I'm missing - happy to hear about it if anyone has ideas!

@charlesmyu charlesmyu force-pushed the charles.yu/djm-974/extract-spark-plan-product branch from d9d6213 to 54ab1ad Compare October 16, 2025 22:47
@charlesmyu charlesmyu force-pushed the charles.yu/djm-974/extract-spark-plan-product branch from 54ab1ad to 0279fff Compare October 17, 2025 04:53
public static void exit(
@Advice.Return(readOnly = false) SparkPlanInfo planInfo,
@Advice.Argument(0) SparkPlan plan) {
if (planInfo.metadata().size() == 0) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By using the existing metadata on the DataSourceScanExec nodes, we open ourselves to a bit of inconsistency in the JSON parsing:

"meta": {
    "Format": "Parquet",
    "Batched": true,
    ...,
    "DataFilters": "[CASE WHEN PULocationID#28 IN (236,132,161) THEN true ELSE isnotnull(PULocationID#28) END]"
},

Specifically the lists are not quoted & escaped, which means when we read out the field it's treated as a string rather than a JSON native array. Ideally we would parse this ourselves and upsert it so we can control that formatting, but obviously there's a risk of the parsing going wrong and impacting something that actually uses the field. Leaning slightly towards keeping the formatting as-is in favour of not touching existing fields but happy to hear any other thoughts on this...

// An extension of how Spark translates `SparkPlan`s to `SparkPlanInfo`, see here:
// https://github.com/apache/spark/blob/v3.5.0/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlanInfo.scala#L54
public class Spark213PlanUtils extends AbstractSparkPlanUtils {
public Map<String, String> extractPlanProduct(TreeNode plan) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the OpenLineage connector we had a special facet for storing serialized LogicalPlan of the query. This was the most problematic feature we ever had. Because the plan can contain everything. For example, if a user creates in memory few gigabyte dataframe, then this becomes a node in a logical plan. And OpenLineage connector tried to serliaze it and failed the whole Spark driver.

This PR seems to be doing same thing for the physical plan. I think we shouldn't serialize the object when we don't know what's inside.

Copy link
Contributor Author

@charlesmyu charlesmyu Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chatted about this over a call, summarizing for posterity:

  • Worth clarifying that this function does not traverse the tree itself; we leave that up to Spark because we instrument the recursive fromSparkPlan method
  • We should avoid serializing anything we don't know about arbitrarily, especially using toString(). Since we are taking the full product of the TreeNode we could get some enormous structure (e.g. improbable, but maybe an array of all the data) and toString() would then attempt to serialize all of that data
    • Instead we should lean solely on simpleString() which is safe by default and default to not serializing otherwise. We could then only serialize other TreeNodes and leave out any unknown or unexpected data structures
    • With this change it would even be safe to parse the child QueryPlan nodes because it would no longer output the long physical plan, and instead print the one line string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 177 to 185
// Parse any nested objects to a standard form that ignores the keys
// Right now we do this by just asserting the key set and none of the values
static Object parseNestedMetaObject(Object value) {
if (value instanceof Map) {
return value.keySet()
} else {
return value
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was driving me nuts - there must be a better way to accomplish this without a ton of additional code... The issue is that for the Spark32 suite of tests, the expectations for the meta fields use named keys, but when we run the tests using Scala 2.12 we expect those to all show up as _dd.unknown_key.*. I added a (not great) way around that in assertSQLPlanEquals, which worked fine until we started getting nested maps that can have unknown keys. e.g.:

"meta": {
  "_dd.unparsed" : "any",
  "outputPartitioning" : {
    "HashPartitioning" : {
      "numPartitions" : 2,
      "expressions" : [ "string_col#28" ]
    }
  },
  "shuffleOrigin" : "ENSURE_REQUIREMENTS"
},

Where the numPartitions and expressions keys would show up as _dd.unknown_key.* in Scala 2.12. Initially I went for a recursive approach but that ended up feeling very bloated, so I abandoned it in favour of a subpar keyset check (i.e. only check that HashPartitioning exists in the map).

No false impressions that this is any good - let me know if there's a better way I'm missing, if just the key check is okay (only applies to the test suite running Scala 2.12/Spark 3.2.0, the other two suites compare everything as expected), or if we just have to put up with the recursive approach...

Copy link
Contributor Author

@charlesmyu charlesmyu Oct 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the approach to compare lists of values instead of whatever I had put before - a bit cleaner and simpler to follow. Has its own downsides (e.g. not perfect comparisons as some stable keys are eliminated, and the containsAll comparison can be fooled) but at least it attempts to compare values and is much easier to maintain. Given it's on an older version of Scala that will no longer be supported for new Spark versions, I think this should probably be fine.

c68b356 (#9783)

Copy link
Contributor

@pawel-big-lebowski pawel-big-lebowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went through the first round of reading and left some comments.
Pls let me know do you think about it.

public abstract class AbstractSparkPlanUtils {
private final int MAX_DEPTH = 4;
private final int MAX_LENGTH = 50;
private final ObjectMapper mapper =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AbstractDatadogSparkListener already contains ObjectMapper. Pls try reusing the existing one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking the same thing and was looking at it briefly. From my understanding because the two classes are instantiated in separate advice classes it's fairly difficult to pass references to one another in any way.

I was even hoping to just find a potential way to share an ObjectMapper across all instantiations of AbstractSparkPlanUtils (maybe using Context Stores?) but it didn't seem super feasible... Would be very happy to be corrected on this, though

Copy link
Contributor Author

@charlesmyu charlesmyu Oct 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just kidding, I did not realize you can just call static fields/methods directly from an abstract class...

8bf8488

// Use reflection rather than native `instanceof` for classes added in later Spark versions
private boolean instanceOf(Object value, Class[] classes) {
for (Class cls : classes) {
if (cls != null && cls.isInstance(value)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid any class loading issue, it may make sense to compare class names only

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but am only concerned about how we would deal with inheritance, since we compare parent classes fairly frequently (e.g. BuildSide, Attribute, BroadcastMode, etc). I'm not sure it makes the most sense (especially as Spark evolves) to enumerate all the child classes explicitly, but maybe we could use reflection to traverse the tree? Not sure if that negates the benefits :/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ended up settling on String comparison with traversal of interfaces & parent classes to avoid class loading errors. Added a negative cache to potentially help consistently traversing classes we know we won't find anything from - let me know if that makes any sense! 047880a (#9783)


public static class SparkPlanInfoAdvice {
@Advice.OnMethodExit(suppress = Throwable.class, onThrowable = Throwable.class)
public static void exit(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking loudly: would it make sense to hide this behind feature flague in config? Reasons to do so:

  • we can let it bake on dd internal jobs for some time,
  • users can turn it off in case of unexpected issues.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, added! 55b917b


if (unparsed.size() > 0) {
// For now, place what we can't parse here with the types so we're aware of them
args.put("_dd.unparsed", unparsed);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to create separate SafeSparkPlanSerializer class? Seems like this is what this all is about.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, do we mean to use SafeSparkPlanSerializer to store the list of parsed and unparsed orgs to then serialize? Or for it to handle the actual serialization (e.g. parsePlanProduct) itself?

This did make me realize some of the function names are outdated, though. Updated them (extractPlanProduct -> safeParseTreeNode, parsePlanProduct -> safeParseObjectToJson)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarified over a quick call; renamed class to AbstractSparkPlanSerializer as well to address this!

8bf8488

@charlesmyu charlesmyu marked this pull request as ready for review October 21, 2025 21:20
@charlesmyu charlesmyu requested a review from a team as a code owner October 21, 2025 21:20
@charlesmyu charlesmyu requested a review from a team as a code owner October 21, 2025 21:20
@charlesmyu charlesmyu requested a review from mhlidd October 21, 2025 21:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

inst: apache spark Apache Spark instrumentation type: enhancement Enhancements and improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants