-
Notifications
You must be signed in to change notification settings - Fork 314
Reuse SpanBuilder within a Thread #9537
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Initial performance experiment The idea is to store a CoreSpanBuilder per thread, since usually only SpanBuilder is in use at a given time per thread -- and CoreSpanBuilder isn't thread safe This simple change provides a giant boost in small heaps Improving Spring petclinic throughput from -39% to -19% with 80m heap
|
Hi! 👋 Thanks for your pull request! 🎉 To help us review it, please make sure to:
If you need help, please check our contributing guidelines. |
|
🎯 Code Coverage 🔗 Commit SHA: 3d9c527 | Docs | Was this helpful? Give us feedback! |
BenchmarksStartupParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 54 metrics, 11 unstable metrics. Startup time reports for petclinicgantt
title petclinic - global startup overhead: candidate=1.54.0-SNAPSHOT~3d9c52713f, baseline=1.55.0-SNAPSHOT~cd02b0c326
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.033 s) : 0, 1033376
Total [baseline] (9.815 s) : 0, 9814883
Agent [candidate] (1.027 s) : 0, 1027356
Total [candidate] (10.755 s) : 0, 10754602
section appsec
Agent [baseline] (1.199 s) : 0, 1199341
Total [baseline] (10.962 s) : 0, 10962309
Agent [candidate] (1.201 s) : 0, 1201085
Total [candidate] (10.897 s) : 0, 10897153
section iast
Agent [baseline] (1.166 s) : 0, 1165797
Total [baseline] (11.088 s) : 0, 11087553
Agent [candidate] (1.16 s) : 0, 1159991
Total [candidate] (11.166 s) : 0, 11165763
section profiling
Agent [baseline] (1.177 s) : 0, 1176925
Total [baseline] (10.961 s) : 0, 10960777
Agent [candidate] (1.173 s) : 0, 1172709
Total [candidate] (10.943 s) : 0, 10942802
gantt
title petclinic - break down per module: candidate=1.54.0-SNAPSHOT~3d9c52713f, baseline=1.55.0-SNAPSHOT~cd02b0c326
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.478 ms) : 0, 1478
crashtracking [candidate] (1.47 ms) : 0, 1470
BytebuddyAgent [baseline] (704.314 ms) : 0, 704314
BytebuddyAgent [candidate] (698.065 ms) : 0, 698065
GlobalTracer [baseline] (246.81 ms) : 0, 246810
GlobalTracer [candidate] (245.778 ms) : 0, 245778
AppSec [baseline] (32.85 ms) : 0, 32850
AppSec [candidate] (32.42 ms) : 0, 32420
Debugger [baseline] (6.422 ms) : 0, 6422
Debugger [candidate] (6.344 ms) : 0, 6344
Remote Config [baseline] (686.034 µs) : 0, 686
Remote Config [candidate] (675.405 µs) : 0, 675
Telemetry [baseline] (9.434 ms) : 0, 9434
Telemetry [candidate] (10.189 ms) : 0, 10189
Flare Poller [baseline] (9.912 ms) : 0, 9912
Flare Poller [candidate] (10.971 ms) : 0, 10971
section appsec
crashtracking [baseline] (1.468 ms) : 0, 1468
crashtracking [candidate] (1.473 ms) : 0, 1473
BytebuddyAgent [baseline] (720.646 ms) : 0, 720646
BytebuddyAgent [candidate] (721.134 ms) : 0, 721134
GlobalTracer [baseline] (235.814 ms) : 0, 235814
GlobalTracer [candidate] (237.164 ms) : 0, 237164
IAST [baseline] (25.103 ms) : 0, 25103
IAST [candidate] (24.938 ms) : 0, 24938
AppSec [baseline] (175.778 ms) : 0, 175778
AppSec [candidate] (175.851 ms) : 0, 175851
Debugger [baseline] (6.106 ms) : 0, 6106
Debugger [candidate] (6.148 ms) : 0, 6148
Remote Config [baseline] (639.31 µs) : 0, 639
Remote Config [candidate] (625.454 µs) : 0, 625
Telemetry [baseline] (8.573 ms) : 0, 8573
Telemetry [candidate] (8.482 ms) : 0, 8482
Flare Poller [baseline] (3.93 ms) : 0, 3930
Flare Poller [candidate] (3.896 ms) : 0, 3896
section iast
crashtracking [baseline] (1.491 ms) : 0, 1491
crashtracking [candidate] (1.467 ms) : 0, 1467
BytebuddyAgent [baseline] (826.036 ms) : 0, 826036
BytebuddyAgent [candidate] (820.823 ms) : 0, 820823
GlobalTracer [baseline] (234.456 ms) : 0, 234456
GlobalTracer [candidate] (234.235 ms) : 0, 234235
IAST [baseline] (27.745 ms) : 0, 27745
IAST [candidate] (26.808 ms) : 0, 26808
AppSec [baseline] (34.442 ms) : 0, 34442
AppSec [candidate] (35.359 ms) : 0, 35359
Debugger [baseline] (6.263 ms) : 0, 6263
Debugger [candidate] (6.195 ms) : 0, 6195
Remote Config [baseline] (618.169 µs) : 0, 618
Remote Config [candidate] (605.222 µs) : 0, 605
Telemetry [baseline] (8.804 ms) : 0, 8804
Telemetry [candidate] (8.752 ms) : 0, 8752
Flare Poller [baseline] (4.233 ms) : 0, 4233
Flare Poller [candidate] (4.194 ms) : 0, 4194
section profiling
crashtracking [baseline] (1.479 ms) : 0, 1479
crashtracking [candidate] (1.484 ms) : 0, 1484
BytebuddyAgent [baseline] (725.92 ms) : 0, 725920
BytebuddyAgent [candidate] (724.954 ms) : 0, 724954
GlobalTracer [baseline] (222.576 ms) : 0, 222576
GlobalTracer [candidate] (222.099 ms) : 0, 222099
AppSec [baseline] (32.767 ms) : 0, 32767
AppSec [candidate] (32.319 ms) : 0, 32319
Debugger [baseline] (8.457 ms) : 0, 8457
Debugger [candidate] (7.448 ms) : 0, 7448
Remote Config [baseline] (742.09 µs) : 0, 742
Remote Config [candidate] (684.549 µs) : 0, 685
Telemetry [baseline] (13.888 ms) : 0, 13888
Telemetry [candidate] (15.121 ms) : 0, 15121
Flare Poller [baseline] (5.011 ms) : 0, 5011
Flare Poller [candidate] (4.193 ms) : 0, 4193
ProfilingAgent [baseline] (111.869 ms) : 0, 111869
ProfilingAgent [candidate] (110.266 ms) : 0, 110266
Profiling [baseline] (112.527 ms) : 0, 112527
Profiling [candidate] (110.886 ms) : 0, 110886
Startup time reports for insecure-bankgantt
title insecure-bank - global startup overhead: candidate=1.54.0-SNAPSHOT~3d9c52713f, baseline=1.55.0-SNAPSHOT~cd02b0c326
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.034 s) : 0, 1033562
Total [baseline] (8.732 s) : 0, 8731760
Agent [candidate] (1.028 s) : 0, 1028126
Total [candidate] (8.743 s) : 0, 8742826
section iast
Agent [baseline] (1.152 s) : 0, 1152100
Total [baseline] (9.363 s) : 0, 9362668
Agent [candidate] (1.159 s) : 0, 1158825
Total [candidate] (9.343 s) : 0, 9342912
gantt
title insecure-bank - break down per module: candidate=1.54.0-SNAPSHOT~3d9c52713f, baseline=1.55.0-SNAPSHOT~cd02b0c326
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.471 ms) : 0, 1471
crashtracking [candidate] (1.463 ms) : 0, 1463
BytebuddyAgent [baseline] (702.948 ms) : 0, 702948
BytebuddyAgent [candidate] (698.244 ms) : 0, 698244
GlobalTracer [baseline] (247.063 ms) : 0, 247063
GlobalTracer [candidate] (246.268 ms) : 0, 246268
AppSec [baseline] (32.909 ms) : 0, 32909
AppSec [candidate] (32.511 ms) : 0, 32511
Debugger [baseline] (6.447 ms) : 0, 6447
Debugger [candidate] (6.385 ms) : 0, 6385
Remote Config [baseline] (689.943 µs) : 0, 690
Remote Config [candidate] (701.344 µs) : 0, 701
Telemetry [baseline] (9.569 ms) : 0, 9569
Telemetry [candidate] (9.45 ms) : 0, 9450
Flare Poller [baseline] (11.042 ms) : 0, 11042
Flare Poller [candidate] (11.804 ms) : 0, 11804
section iast
crashtracking [baseline] (1.498 ms) : 0, 1498
crashtracking [candidate] (1.487 ms) : 0, 1487
BytebuddyAgent [baseline] (815.611 ms) : 0, 815611
BytebuddyAgent [candidate] (819.544 ms) : 0, 819544
GlobalTracer [baseline] (231.55 ms) : 0, 231550
GlobalTracer [candidate] (233.879 ms) : 0, 233879
IAST [baseline] (26.731 ms) : 0, 26731
IAST [candidate] (26.908 ms) : 0, 26908
AppSec [baseline] (35.357 ms) : 0, 35357
AppSec [candidate] (35.668 ms) : 0, 35668
Debugger [baseline] (6.18 ms) : 0, 6180
Debugger [candidate] (6.164 ms) : 0, 6164
Remote Config [baseline] (601.723 µs) : 0, 602
Remote Config [candidate] (621.737 µs) : 0, 622
Telemetry [baseline] (8.777 ms) : 0, 8777
Telemetry [candidate] (8.852 ms) : 0, 8852
Flare Poller [baseline] (4.299 ms) : 0, 4299
Flare Poller [candidate] (4.194 ms) : 0, 4194
LoadParameters
See matching parameters
SummaryFound 4 performance improvements and 0 performance regressions! Performance is the same for 8 metrics, 12 unstable metrics.
Request duration reports for insecure-bankgantt
title insecure-bank - request duration [CI 0.99] : candidate=1.54.0-SNAPSHOT~3d9c52713f, baseline=1.55.0-SNAPSHOT~cd02b0c326
dateFormat X
axisFormat %s
section baseline
no_agent (4.226 ms) : 4177, 4275
. : milestone, 4226,
iast (9.753 ms) : 9590, 9916
. : milestone, 9753,
iast_FULL (14.012 ms) : 13729, 14296
. : milestone, 14012,
iast_GLOBAL (10.616 ms) : 10426, 10806
. : milestone, 10616,
profiling (8.988 ms) : 8840, 9137
. : milestone, 8988,
tracing (7.7 ms) : 7590, 7811
. : milestone, 7700,
section candidate
no_agent (4.265 ms) : 4209, 4320
. : milestone, 4265,
iast (9.968 ms) : 9801, 10135
. : milestone, 9968,
iast_FULL (14.393 ms) : 14106, 14679
. : milestone, 14393,
iast_GLOBAL (10.359 ms) : 10177, 10540
. : milestone, 10359,
profiling (8.657 ms) : 8527, 8787
. : milestone, 8657,
tracing (7.866 ms) : 7746, 7987
. : milestone, 7866,
Request duration reports for petclinicgantt
title petclinic - request duration [CI 0.99] : candidate=1.54.0-SNAPSHOT~3d9c52713f, baseline=1.55.0-SNAPSHOT~cd02b0c326
dateFormat X
axisFormat %s
section baseline
no_agent (36.804 ms) : 36513, 37096
. : milestone, 36804,
appsec (49.015 ms) : 48572, 49459
. : milestone, 49015,
code_origins (45.454 ms) : 45047, 45861
. : milestone, 45454,
iast (45.002 ms) : 44628, 45376
. : milestone, 45002,
profiling (48.763 ms) : 48317, 49209
. : milestone, 48763,
tracing (44.266 ms) : 43889, 44644
. : milestone, 44266,
section candidate
no_agent (35.572 ms) : 35295, 35850
. : milestone, 35572,
appsec (47.493 ms) : 47059, 47926
. : milestone, 47493,
code_origins (43.277 ms) : 42896, 43657
. : milestone, 43277,
iast (43.825 ms) : 43443, 44208
. : milestone, 43825,
profiling (48.413 ms) : 47960, 48866
. : milestone, 48413,
tracing (43.587 ms) : 43219, 43954
. : milestone, 43587,
DacapoParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics. Execution time for biojavagantt
title biojava - execution time [CI 0.99] : candidate=1.54.0-SNAPSHOT~3d9c52713f, baseline=1.55.0-SNAPSHOT~cd02b0c326
dateFormat X
axisFormat %s
section baseline
no_agent (15.615 s) : 15615000, 15615000
. : milestone, 15615000,
appsec (15.16 s) : 15160000, 15160000
. : milestone, 15160000,
iast (18.59 s) : 18590000, 18590000
. : milestone, 18590000,
iast_GLOBAL (18.08 s) : 18080000, 18080000
. : milestone, 18080000,
profiling (15.458 s) : 15458000, 15458000
. : milestone, 15458000,
tracing (15.064 s) : 15064000, 15064000
. : milestone, 15064000,
section candidate
no_agent (15.664 s) : 15664000, 15664000
. : milestone, 15664000,
appsec (15.06 s) : 15060000, 15060000
. : milestone, 15060000,
iast (18.512 s) : 18512000, 18512000
. : milestone, 18512000,
iast_GLOBAL (18.238 s) : 18238000, 18238000
. : milestone, 18238000,
profiling (15.227 s) : 15227000, 15227000
. : milestone, 15227000,
tracing (15.033 s) : 15033000, 15033000
. : milestone, 15033000,
Execution time for tomcatgantt
title tomcat - execution time [CI 0.99] : candidate=1.54.0-SNAPSHOT~3d9c52713f, baseline=1.55.0-SNAPSHOT~cd02b0c326
dateFormat X
axisFormat %s
section baseline
no_agent (1.477 ms) : 1465, 1488
. : milestone, 1477,
appsec (3.75 ms) : 3530, 3969
. : milestone, 3750,
iast (2.215 ms) : 2151, 2279
. : milestone, 2215,
iast_GLOBAL (2.274 ms) : 2209, 2338
. : milestone, 2274,
profiling (2.064 ms) : 2012, 2115
. : milestone, 2064,
tracing (2.037 ms) : 1987, 2087
. : milestone, 2037,
section candidate
no_agent (1.484 ms) : 1472, 1496
. : milestone, 1484,
appsec (3.728 ms) : 3508, 3947
. : milestone, 3728,
iast (2.226 ms) : 2162, 2290
. : milestone, 2226,
iast_GLOBAL (2.263 ms) : 2199, 2327
. : milestone, 2263,
profiling (2.088 ms) : 2035, 2142
. : milestone, 2088,
tracing (2.038 ms) : 1988, 2088
. : milestone, 2038,
|
| public static final String OPTIMIZED_MAP_ENABLED = "optimized.map.enabled"; | ||
| public static final String TAG_NAME_UTF8_CACHE_SIZE = "tag.name.utf8.cache.size"; | ||
| public static final String TAG_VALUE_UTF8_CACHE_SIZE = "tag.value.utf8.cache.size"; | ||
| public static final String SPAN_BUILDER_REUSE_ENABLED = "span.builder.reuse.enabled"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this is best class to be placing these in. Or if this is the proper namespace. dd.trace... might be better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree dd.trace prefix fits better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❔ question: Quick question for my understanding: how is that supposed to work with manual tracing API? (DD or OTel)
Let’s say:
var spanBuilder = tracer.spanBuilder("span1");
// …
var span = tracer.spanBuilder("span2").startSpan();
// …
spanBuilder.setAttribute("key", "value").startSpan(); The second startSpan() call will start the span with the wrong name ("span2" instead of "span1") and there is nothing we can do on the API to prevent such use case.
The solution that I put in place yesterday was that I track whether or not the SpanBuilder is "in-use". A SpanBuilder is considered "in -use" from the point that it is So So the 'in-use' check solves the case mentioned in the original comment. You'll still get two distinct SpanBuilders. But there are still other cases that worry me, specifically, I'm worried that you can actually call build on the SpanBuilder multiple times. That's not idiomatic, but it would probably work today. My solution doesn't handle that. I don't believe our own instrumentation produces multiple Spans per builder today, so we could solve the problem by having two slightly different sets of rules: one for automatic and one for manual. We can accomplish that by having two sets of methods: one set that will reuse SpanBuilders that we use in automatic instrumentation, and another set that always creates a new SpanBuilder that we use in manual instrumentation (and the OTel bridge). UPDATE: buildSpan can maintain the existing semantics including allowing building multiple spans. That does mean that we have to update some instrumentation to get the full benefit, but at least, we know the change is safe. |
Refactored code, so tests work regardless of Config
…ace-java into dougqh/spanbuilder-pooling
To avoid breaking any potential code that builds multiple spans from the same SpanBuilder, updated the SpanBuilder pooling approach Introduced a new method singleSpanBuilder which can build one and only one span, this method can be used by automatic instrumentation as an optimization. singleSpanBuilder is now used inside the startSpan convenience methods, since we know they only build and return one span. Any automatic instrumentation using startSpan gets the optimization for free. buildSpan maintains its original semantics, so all existing continues to work as is.
…ace-java into dougqh/spanbuilder-pooling
In a microbenchmark, buildSpan was performing worse than previously. To address, that shortcoming and to clean-up the code... Made CoreSpanBuilder abstract and introduced two child classes: MultiSpanBuilder and ReusableSingleSpanBuilder MultiSpanBuilder is used by buildSpan ReusableSingleSpanBuilder is used by singleSpanBuilder / startSpan (indirectly)
Co-authored-by: Brice Dutheil <[email protected]>
…tion/api/AgentTracer.java Co-authored-by: Brice Dutheil <[email protected]>
Co-authored-by: Brice Dutheil <[email protected]>
Fixing assertion with blackhole test Moved the inUse tracking to start (rather than buildSpan)
- exposed bug with not setting inUse in init
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just left several minor comments.
components/environment/src/test/java/datadog/environment/ThreadUtilsTest.java
Outdated
Show resolved
Hide resolved
| MethodHandle h_startVThread; | ||
| try { | ||
| h_startVThread = | ||
| MethodHandles.lookup() | ||
| .findStatic( | ||
| Thread.class, | ||
| "startVirtualThread", | ||
| MethodType.methodType(Thread.class, Runnable.class)); | ||
| } catch (NoSuchMethodException | IllegalAccessException e) { | ||
| throw new IllegalStateException(e); | ||
| } | ||
|
|
||
| try { | ||
| return (Thread) h_startVThread.invoke(runnable); | ||
| } catch (Throwable e) { | ||
| throw new IllegalStateException(e); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a very minor nitpick: I guess we can simplify code by removing try/catch?
We already checking for V-threads before calling this function.
|
|
||
| @Override | ||
| public CoreSpanBuilder withTag(final String tag, final Number number) { | ||
| public final CoreSpanBuilder withTag(final String tag, final Number number) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still curious why we are using final so often? It will force Java compiler to produce more optimal code?
|
Also noticed in build logs on CI:
|
Seeing if removing naming conflict with Groovy tests, fix Jacoco coverage calculation
…ace-java into dougqh/spanbuilder-pooling
Yes, that's what I've been trying to figure out. I think it might have been an issue with my creating a CoreTracerTest.java in addition to CoreTracerTest.groovy. |
What Does This Do
This changes reuses a SpanBuilder within a thread.
Motivation
SpanBuilders are one of the major causes of allocation within the client library, but typically only one SpanBuilder is needed at a time in a thread. By reseting and reusing the same SpanBuilder, tracing allocation can be reduced significantly.
By reducing allocation, the garbage collector needs to run less frequently improving the maximum sustainable throughput of the host application. In local tests with Spring Petclinic, this reduces the throughput penalty with tiny heaps (<= 80m) significantly from -40% to -20%. On larger heaps, the gain is more modest, since garbage collector has less of an impact on the application.
In this initial version, the thread local cache isn't used in virtual threads. This restriction exists to avoid creating more memory churn than we save, since virtual threads are created more often than regular threads.