Switch to OpenTracing's ContextVarsScopeManager
#18849
Conversation
Instead of our own `LogContextScopeManager`
class LogContextScopeManagerTestCase(TestCase):
class TracingScopeTestCase(TestCase):
Even though we've removed the `LogContextScopeManager` now, I've decided to keep these tests around. They seem like good sanity checks to make sure our tracing still works with Twisted regardless of what we try to use.
These changes are being introduced in #18849 instead.
🚀
Thanks for the review @anoadragon453 🐍
…e while writing bytes to the request (#18804)

This will make it easy to see how much time is taken up, by filtering on the `write_bytes_to_request` operation in Jaeger.

Spawning from #17722

The `write_bytes_to_request` span won't show up in the trace until #18849 is merged.

Note: It's totally fine for a child span to finish after the parent. See https://opentracing.io/specification/#references-between-spans which shows "Child Span D" outliving the "Parent Span".
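To illustrate that note concretely, here is a small sketch of my own using the OpenTracing reference implementation's test tracer (not Synapse's tracing helpers); the span names are only illustrative:

```python
# Illustrative only: a child span finishing after its parent is valid per the
# OpenTracing spec (see the "references between spans" link above).
from opentracing.mocktracer import MockTracer

tracer = MockTracer()

parent = tracer.start_span("servlet_request")
child = tracer.start_span("write_bytes_to_request", child_of=parent)

parent.finish()  # the parent finishes first ...
child.finish()   # ... and the child outlives it; both still land in the same trace

assert len(tracer.finished_spans()) == 2
```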
Before #18849, we were using our own custom `LogContextScopeManager` which tied the tracing scope to the `LoggingContext`. Since we created a new `BackgroundProcessLoggingContext` any time we `run_as_background_process(...)`, the trace for the background work was separate from the trace that kicked off the work, as expected (e.g. a request trace is separate from the background process we kicked off to fetch more messages from the federation).

Since we've now switched to the `ContextVarsScopeManager` (in #18849), the tracing scope now crosses the `LoggingContext` boundaries (and thread boundaries) without a problem. This means we end up with request traces that include all of the background work that we've kicked off, bloating the trace and making it hard to understand what's going on.

This PR separates the traces again, back to how things were before. Additionally, things are even better now since I added some cross-link references between the traces to easily be able to jump between them.

Follow-up to #18849

---

In the before, you can see that the trace is blown up by the background process (`bgproc.qwer`). In the after, we now only have a little cross-link marker span (`start_bgproc.qwer`) to jump to the background process trace.

Before | After
--- | ---
<some image> | <some image>

### Testing strategy

1. Run a Jaeger instance (https://www.jaegertracing.io/docs/1.6/getting-started/)
   ```shell
   $ docker run -d --name jaeger \
     -e COLLECTOR_ZIPKIN_HTTP_PORT=9411 \
     -p 5775:5775/udp \
     -p 6831:6831/udp \
     -p 6832:6832/udp \
     -p 5778:5778 \
     -p 16686:16686 \
     -p 14268:14268 \
     -p 9411:9411 \
     jaegertracing/all-in-one:1.59.0
   ```
1. Configure Synapse to use tracing: `homeserver.yaml`
   ```yaml
   ## Tracing ##
   opentracing:
     enabled: true
     jaeger_config:
       sampler:
         type: const
         param: 1
       logging: false
   ```
1. Make sure the optional `opentracing` dependency is installed: `poetry install --extras all`
1. In the `VersionsRestServlet`, modify it to kick off a dummy background process (easy to test this way)
   ```python
   from synapse.metrics.background_process_metrics import run_as_background_process

   async def _qwer() -> None:
       await self.clock.sleep(1)

   run_as_background_process("qwer", "test_server", _qwer)
   ```
1. Run Synapse: `poetry run synapse_homeserver --config-path homeserver.yaml`
1. Fire off a version request: `curl http://localhost:8008/_matrix/client/versions`
1. Visit http://localhost:16686/search to view the traces
   - Select the correct service
   - Look for the `VersionsRestServlet` operation
   - Press the 'Find Traces' button
   - Select the relevant trace
   - Notice how the trace isn't bloated
   - Look for the `start_bgproc.qwer` span cross-linking to the background process
   - Jump to the other trace using the cross-link reference -> `bgproc.qwer`
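For readers unfamiliar with the cross-link mechanism, the following is a rough sketch of the idea using plain OpenTracing references; the span names `start_bgproc.qwer` and `bgproc.qwer` come from the description above, but the code is illustrative and not the actual implementation from this PR:

```python
# Illustrative sketch of cross-linking two traces with a FOLLOWS_FROM reference
# (not the actual Synapse implementation).
import opentracing
from opentracing.mocktracer import MockTracer

tracer = MockTracer()

# Root span of the separate background-process trace.
bg_span = tracer.start_span("bgproc.qwer", ignore_active_span=True)

# Small marker span in the request trace that carries a reference to the
# background span, giving the UI a link to jump across instead of inlining
# all of the background work into the request trace.
with tracer.start_active_span(
    "start_bgproc.qwer",
    references=[opentracing.follows_from(bg_span.context)],
):
    pass

# ... the background work runs, and eventually:
bg_span.finish()
```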
This reverts commit 27fc338.

Conflicts:
	synapse/logging/context.py
	tests/logging/test_opentracing.py
…19007)

Revert #18849

Go back to our custom `LogContextScopeManager` after trying OpenTracing's `ContextVarsScopeManager`.

Fix #19004

### Why revert?

For reference, with the normal reactor, `ContextVarsScopeManager` worked just as well as our custom `LogContextScopeManager` as far as I can tell (and even better in some cases). But since Twisted appears to not fully support `ContextVar`s, it doesn't work as expected in all cases. Compounding things, `ContextVarsScopeManager` was causing errors with the experimental `SYNAPSE_ASYNC_IO_REACTOR` option.

Since we're not getting the full benefit that we originally desired, we might as well revert and figure out alternatives for extending the logcontext lifetimes to support the use case we were trying to unlock (c.f. #18804).

See #19004 (comment) for more info.

### Does this require backporting and patch releases?

No. Since `ContextVarsScopeManager` operates just as well with the normal reactor and was only causing actual errors with the experimental `SYNAPSE_ASYNC_IO_REACTOR` option, I don't think this requires us to backport and make patch releases at all.

### Maintain cross-links between main trace and background process work

In order to maintain the functionality introduced in #18932 (cross-links between the background process trace and the currently active trace), we also needed a small change. Previously, when we were using `ContextVarsScopeManager`, it tracked the tracing scope across logcontext changes without issue. Now that we're using our own custom `LogContextScopeManager` again, we need to capture the active span from the logcontext before we reset to the sentinel context because of the `PreserveLoggingContext()` below.

Added some tests to ensure we maintain the `run_as_background_process` tracing behavior regardless of the tracing scope manager we use.
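A hedged sketch of the ordering constraint described above, assuming `opentracing` and Synapse's `PreserveLoggingContext`; the span name and overall shape are illustrative rather than the literal change:

```python
# Illustrative sketch (not the literal Synapse change): with LogContextScopeManager
# the active scope lives on the logcontext, so the calling span must be captured
# *before* PreserveLoggingContext() resets to the sentinel logcontext.
import opentracing
from synapse.logging.context import PreserveLoggingContext

def kick_off_background_work() -> None:
    tracer = opentracing.global_tracer()

    # 1. Read the active span while the request's logcontext is still current.
    calling_span = tracer.active_span

    # 2. Only now switch to the sentinel context; under LogContextScopeManager,
    #    tracer.active_span is None from here on.
    with PreserveLoggingContext():
        references = (
            [opentracing.follows_from(calling_span.context)]
            if calling_span is not None
            else None
        )
        # 3. The background-process root span still gets its cross-link back to
        #    the request, thanks to the span captured in step 1.
        with tracer.start_span("bgproc.example", references=references):
            ...  # the background work would be scheduled here
```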
…_available`

This test is still expected to fail with our current `LogContextScopeManager` implementation, but there were some flaws in how we were handling the logcontext before (like pumping the reactor in a non-sentinel logcontext).

This test is originally from #18849, spawning from #18804 (comment).
Switch to OpenTracing's `ContextVarsScopeManager` instead of our own custom `LogContextScopeManager`. This is now possible because the linked Twisted issue from this comment is resolved:
synapse/synapse/logging/scopecontextmanager.py
Lines 42 to 43 in 40edb10
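For orientation, here is a minimal sketch of what the switch roughly amounts to when constructing the Jaeger tracer; the config values and service name are illustrative, and this is not the exact Synapse diff:

```python
# Minimal sketch: pass OpenTracing's stock ContextVarsScopeManager to the Jaeger
# tracer instead of the custom LogContextScopeManager. Config values and the
# service name below are illustrative.
from jaeger_client.config import Config as JaegerConfig
from opentracing.scope_managers.contextvars import ContextVarsScopeManager

jaeger_config = JaegerConfig(
    config={"sampler": {"type": "const", "param": 1}, "logging": False},
    service_name="synapse-example",  # illustrative
    scope_manager=ContextVarsScopeManager(),  # was: LogContextScopeManager()
)
tracer = jaeger_config.initialize_tracer()
```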
This PR is spawning from exploring different possibilities to solve the scope loss problem I was encountering in #18804 (comment). This appears to solve the problem and I've added the additional test from there to this PR ✅

Related to #10342 (previously matrix-org/synapse#10342) where we also want to change our `LoggingContext` machinery to be `ContextVar` based.
Testing strategy
- Configure Synapse to use tracing: `homeserver.yaml`
- Make sure the optional `opentracing` dependency is installed: `poetry install --extras all`
- Run Synapse: `poetry run synapse_homeserver --config-path homeserver.yaml`
- `GET http://localhost:8008/_matrix/client/versions` and other `Servlet`'s

I tested a few endpoints and the traces seem to look normal.
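As a lighter-weight sanity check that needs neither a running homeserver nor Jaeger (a toy snippet of my own, not part of this PR), the scope manager's basic behaviour can be observed with the library's test tracer:

```python
# Toy illustration: ContextVarsScopeManager tracks the active span through
# nested activations, backed by contextvars rather than the LoggingContext.
from opentracing.mocktracer import MockTracer
from opentracing.scope_managers.contextvars import ContextVarsScopeManager

tracer = MockTracer(scope_manager=ContextVarsScopeManager())

with tracer.start_active_span("request") as outer:
    assert tracer.active_span is outer.span

    with tracer.start_active_span("db_lookup") as inner:
        # The inner span is now active (and is a child of the request span).
        assert tracer.active_span is inner.span

    # When the inner scope closes, the outer span becomes active again.
    assert tracer.active_span is outer.span
```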
Dev notes