Skip to content

Commit f2afa2a

Browse files
committed
Add in opentelemetry tracing as a feature
This is an evolution on the previous tracing support I added a few years back. See CONTRIBUTING for details. The rs_tracing crate is functionally fine but very limited in features compared to the tracing crate that provides the developer interface for opentelemetry instrumentation. As we start looking at performance again I think this tooling will be useful.
1 parent 89f1037 commit f2afa2a

File tree

14 files changed

+468
-12
lines changed

14 files changed

+468
-12
lines changed

CONTRIBUTING.md

Lines changed: 64 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -217,8 +217,8 @@ break our testing (like `RUSTUP_TOOLCHAIN`, `SHELL`, `ZDOTDIR`, `RUST_BACKTRACE`
217217
But if you want to debug locally, you may need backtrace. `RUSTUP_BACKTRACE`
218218
is used like `RUST_BACKTRACE` to enable backtraces of failed tests.
219219

220-
**NOTE**: This is a backtrace for the test, not for any rustup process running
221-
in the test
220+
**NOTE**: This is a backtrace for the test, not for any subprocess invocation of
221+
rustup process running in the test
222222

223223
```bash
224224
$ RUSTUP_BACKTRACE=1 cargo test --release --test cli-v1 -- remove_toolchain_then_add_again
@@ -273,3 +273,65 @@ test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 26 filtered out
273273

274274
error: test failed, to rerun pass '--test cli-v1'
275275
```
276+
277+
## Tracing
278+
279+
The feature "otel" can be used when building rustup to turn on Opentelemetry
280+
tracing with a Jaeger exporter.
281+
282+
The normal Jaeger environment variables (`ENV_AGENT_HOST`, `ENV_AGENT_PORT`),
283+
can be used to customise its behaviour, but often the simplest thing is to just
284+
run a Jaeger docker container on the same host:
285+
286+
```sh
287+
docker run -d --name jaeger -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 -e COLLECTOR_OTLP_ENABLED=true -p 6831:6831/udp -p 6832:6832/udp -p 5778:5778 -p 16686:16686 -p 4317:4317 -p 4318:4318 -p 14250:14250 -p 14268:14268 -p 14269:14269 -p 9411:9411 jaegertracing/all-in-one:latest
288+
```
289+
290+
Then build rustup-init with tracing:
291+
292+
```sh
293+
cargo build --features=otel
294+
```
295+
296+
Run the operation you want to analyze:
297+
298+
```sh
299+
RUSTUP_FORCE_ARG0="rustup" ./target/debug/rustup-init.exe show
300+
```
301+
302+
And [look in Jaeger for a trace](http://localhost:16686/search?service=rustup).
303+
304+
### Adding instrumentation
305+
306+
The `otel` feature uses conditional compilation to only add function instrument
307+
when enabled. Instrumenting a currently uninstrumented function is mostly simply
308+
done like so:
309+
310+
```rust
311+
#[cfg_attr(feature = "otel", tracing::instrument(skip_all))]
312+
```
313+
314+
`skip_all` is not required, but some core structs don't implement Debug yet, and
315+
others have a lot of output in Debug : tracing adds some overheads, so keeping
316+
spans lightweight can help avoid frequency bias in the results - where
317+
parameters with large debug in frequently called functions show up as much
318+
slower than they are.
319+
320+
Some good general heuristics:
321+
322+
- Do instrument slow blocking functions
323+
- Do instrument functions with many callers or that call many different things,
324+
as these tend to help figure the puzzle of what-is-happening
325+
- Default to not instrumenting thin shim functions (or at least, only instrument
326+
them temporarily while figuring out the shape of a problem)
327+
- Be way of debug build timing - release optimisations make a huge difference,
328+
though debug is a lot faster to iterate on. If something isn't a problem in
329+
release don't pay it too much heed in debug.
330+
331+
### Caveats
332+
333+
Cross-thread propogation isn't connected yet. This will cause instrumentation in
334+
a thread to make a new root span until it is fixed. If any Tokio runtime-related
335+
code gets added in those threads this will also cause a panic. We have a couple
336+
of threadpools in use today; if you need to instrument within that context, use
337+
a thunk to propogate the tokio runtime into those threads.

0 commit comments

Comments
 (0)