
Conversation


@MadLittleMods MadLittleMods commented Aug 20, 2025

Switch to OpenTracing's ContextVarsScopeManager instead of our own custom LogContextScopeManager.

This is now possible because the linked Twisted issue from this comment is resolved:

It would be nice just to use opentracing's ContextVarsScopeManager,
but currently that doesn't work due to https://twistedmatrix.com/trac/ticket/10301.

This PR spawned from exploring different possibilities to solve the scope-loss problem I was encountering in #18804 (comment). Switching scope managers appears to solve the problem, and I've added the additional test from there to this PR ✅


Related to #10342 (previously matrix-org/synapse#10342), where we also want to change our LoggingContext machinery to be ContextVar-based.
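For illustration, here is a minimal, stdlib-only sketch of the idea behind a ContextVar-backed scope manager. The `Scope`/`ScopeManager` classes below are simplified stand-ins, not opentracing's or Synapse's real classes; the point is just that the active scope lives in a `contextvars.ContextVar` rather than being tied to the `LoggingContext`.

```python
import contextvars
from typing import Optional

# The active tracing scope lives in a ContextVar, so it follows the
# current execution context automatically.
_active = contextvars.ContextVar("active_scope", default=None)

class Scope:
    def __init__(self, span: str, to_restore: Optional["Scope"]) -> None:
        self.span = span
        self._to_restore = to_restore

    def close(self) -> None:
        # Restore whatever scope was active before this one.
        _active.set(self._to_restore)

class ScopeManager:
    def activate(self, span: str) -> Scope:
        scope = Scope(span, _active.get())
        _active.set(scope)
        return scope

    @property
    def active(self) -> Optional[Scope]:
        return _active.get()

mgr = ScopeManager()
outer = mgr.activate("request")
inner = mgr.activate("db_query")
assert mgr.active.span == "db_query"
inner.close()
assert mgr.active.span == "request"
outer.close()
assert mgr.active is None
```

The real `ContextVarsScopeManager` ships with the `opentracing` package; the sketch above only mirrors its activate/restore behavior.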

Testing strategy

  1. Run a Jaeger instance (https://www.jaegertracing.io/docs/1.6/getting-started/)
      ```shell
      $ docker run -d --name jaeger \
        -e COLLECTOR_ZIPKIN_HTTP_PORT=9411 \
        -p 5775:5775/udp \
        -p 6831:6831/udp \
        -p 6832:6832/udp \
        -p 5778:5778 \
        -p 16686:16686 \
        -p 14268:14268 \
        -p 9411:9411 \
        jaegertracing/all-in-one:1.59.0
      ```
  2. Configure Synapse to use tracing in `homeserver.yaml`:
      ```yaml
      ## Tracing ##
      opentracing:
        enabled: true
        jaeger_config:
          sampler:
            type: const
            param: 1
          logging: false
      ```
  3. Make sure the optional `opentracing` dependency is installed: `poetry install --extras all`
  4. Run Synapse: `poetry run synapse_homeserver --config-path homeserver.yaml`
  5. Make some requests: `GET http://localhost:8008/_matrix/client/versions`
  6. Visit http://localhost:16686/search to view the traces
      - Select the correct service
      - Select a relevant 'Operation', like one of the servlets
      - Press the 'Find Traces' button

I tested a few endpoints and the traces seem to look normal.

Dev notes

SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.logging.test_opentracing

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct (run the linters)



```diff
-class LogContextScopeManagerTestCase(TestCase):
+class TracingScopeTestCase(TestCase):
```

Even though we've removed the LogContextScopeManager now, I've decided to keep these tests around. They seem like good sanity checks to make sure our tracing still works with Twisted regardless of which scope manager we use.
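For intuition, the property those sanity checks protect can be shown with a small asyncio analogue. This is not the Twisted test itself, just an illustration that a `ContextVar`-held scope stays isolated per task even when concurrent work interleaves across an `await`:

```python
import asyncio
import contextvars

# Stand-in for the "active tracing scope" a scope manager would track.
active_scope = contextvars.ContextVar("active_scope", default="<none>")

async def traced(name: str) -> str:
    active_scope.set(name)
    await asyncio.sleep(0)      # another task may run here
    return active_scope.get()   # must still be our own scope

async def main() -> list:
    # Each gathered task gets its own copy of the context.
    return await asyncio.gather(traced("a"), traced("b"))

print(asyncio.run(main()))  # prints ['a', 'b']
```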

@MadLittleMods MadLittleMods changed the title Switch to OpenTracing ContextVarsScopeManager Switch to OpenTracing's ContextVarsScopeManager Aug 20, 2025
@MadLittleMods MadLittleMods force-pushed the madlittlemods/switch-to-opentracing-context-vars-scope-manager branch from a9d7008 to b1f308c August 20, 2025 20:47
MadLittleMods added a commit that referenced this pull request Aug 20, 2025
These changes are being introduced in #18849
instead.
@MadLittleMods MadLittleMods marked this pull request as ready for review August 20, 2025 21:09
@MadLittleMods MadLittleMods requested a review from a team as a code owner August 20, 2025 21:09

@anoadragon453 anoadragon453 left a comment


🚀

@MadLittleMods MadLittleMods merged commit 27fc338 into develop Aug 27, 2025
111 of 117 checks passed
@MadLittleMods MadLittleMods deleted the madlittlemods/switch-to-opentracing-context-vars-scope-manager branch August 27, 2025 16:41
@MadLittleMods

Thanks for the review @anoadragon453 🐍

MadLittleMods added a commit that referenced this pull request Aug 27, 2025
…e while writing bytes to the request (#18804)

This will let us easily see how much time is taken up by writing bytes to the
request, by filtering on the `write_bytes_to_request` operation in Jaeger.

Spawning from #17722

The `write_bytes_to_request` span won't show up in the trace until
#18849 is merged.

Note: It's totally fine for a span child to finish after the parent. See
https://opentracing.io/specification/#references-between-spans which
shows "Child Span D" outliving the "Parent Span"
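The note about span lifetimes can be illustrated with a tiny stdlib-only stand-in (a hypothetical `Span` class, not opentracing's API): nothing requires a child span to finish before its parent does.

```python
import time

class Span:
    """Minimal stand-in span (hypothetical; not opentracing's API)."""
    def __init__(self, name: str, parent: "Span | None" = None) -> None:
        self.name = name
        self.parent = parent
        self.start = time.monotonic()
        self.end: "float | None" = None

    def finish(self) -> None:
        self.end = time.monotonic()

parent = Span("request")
child = Span("write_bytes_to_request", parent=parent)
parent.finish()  # the parent span ends first...
child.finish()   # ...and the child outlives it, which the spec allows
assert child.end >= parent.end
```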
MadLittleMods added a commit that referenced this pull request Sep 26, 2025
Before #18849, we were using
our own custom `LogContextScopeManager`, which tied the tracing scope to
the `LoggingContext`. Since we created a new
`BackgroundProcessLoggingContext` any time we called
`run_as_background_process(...)`, the trace for the background work was
separate from the trace that kicked off the work, as expected (e.g. a
request trace is separate from the background process we kicked off to fetch
more messages over federation).

Since we've now switched to the `ContextVarsScopeManager` (in
#18849), the tracing scope now
crosses the `LoggingContext` boundaries (and thread boundaries) without
a problem. This means we end up with request traces that include all of
the background work that we've kicked off bloating the trace and making
it hard to understand what's going on.

This PR separates the traces again, as they were before.
Additionally, things are even better now: I added some cross-link
references between the traces so it's easy to jump between them.

Follow-up to #18849

---

In the before, you can see that the trace is blown up by the background
process (`bgproc.qwer`).

In the after, we now only have a little cross-link marker span
(`start_bgproc.qwer`) to jump to background process trace.

Before | After
---  | ---
<some image> | <some image>
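The cross-link idea can be modeled with a stdlib-only sketch (hypothetical dict-shaped spans, not Synapse's or opentracing's real API): the request trace keeps only a tiny marker span that records a reference to the background trace, instead of inlining all of its spans.

```python
import uuid

def new_trace_id() -> str:
    return uuid.uuid4().hex

request_trace_id = new_trace_id()
background_trace_id = new_trace_id()

# Instead of putting the background spans inside the request trace, the
# request trace holds a short marker span pointing at the other trace.
marker_span = {
    "trace_id": request_trace_id,
    "operation": "start_bgproc.qwer",
    "references": [
        {"ref_type": "follows_from", "trace_id": background_trace_id},
    ],
}

# The two traces stay separate but remain navigable in either direction.
assert marker_span["trace_id"] != background_trace_id
assert marker_span["references"][0]["trace_id"] == background_trace_id
```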



### Testing strategy

1. Run a Jaeger instance
(https://www.jaegertracing.io/docs/1.6/getting-started/)
    ```shell
    $ docker run -d --name jaeger \
      -e COLLECTOR_ZIPKIN_HTTP_PORT=9411 \
      -p 5775:5775/udp \
      -p 6831:6831/udp \
      -p 6832:6832/udp \
      -p 5778:5778 \
      -p 16686:16686 \
      -p 14268:14268 \
      -p 9411:9411 \
      jaegertracing/all-in-one:1.59.0
    ```
1. Configure Synapse to use tracing in `homeserver.yaml`:
    ```yaml
    ## Tracing ##
    opentracing:
      enabled: true
      jaeger_config:
        sampler:
          type: const
          param: 1
        logging: false
    ```
1. Make sure the optional `opentracing` dependency is installed: `poetry install --extras all`
1. Modify the `VersionsRestServlet` to kick off a dummy background process (easy to test this way):
    ```python
    from synapse.metrics.background_process_metrics import run_as_background_process

    async def _qwer() -> None:
        await self.clock.sleep(1)

    run_as_background_process("qwer", "test_server", _qwer)
    ```
1. Run Synapse: `poetry run synapse_homeserver --config-path homeserver.yaml`
1. Fire off a versions request: `curl http://localhost:8008/_matrix/client/versions`
1. Visit http://localhost:16686/search to view the traces
    - Select the correct service
    - Look for the `VersionsRestServlet` operation
    - Press the 'Find Traces' button
    - Select the relevant trace
    - Notice how the trace isn't bloated
    - Look for the `start_bgproc.qwer` span cross-linking to the background process
    - Jump to the other trace using the cross-link reference -> `bgproc.qwer`
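The dummy background process in the steps above is detached from the request that spawns it. A hedged asyncio analogue of that fire-and-forget shape (not Synapse's `run_as_background_process` implementation) looks like:

```python
import asyncio

async def _qwer() -> None:
    # Stand-in for the dummy background work from the steps above.
    await asyncio.sleep(0.01)

async def handle_versions_request() -> str:
    # Fire and forget: the handler returns without awaiting the task,
    # which is why the background work belongs in its own trace.
    asyncio.create_task(_qwer())
    return "versions response"

async def main() -> str:
    result = await handle_versions_request()
    await asyncio.sleep(0.05)  # give the background task time to finish
    return result

print(asyncio.run(main()))  # prints: versions response
```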
MadLittleMods added a commit that referenced this pull request Oct 1, 2025
This reverts commit 27fc338.

Conflicts:
	synapse/logging/context.py
	tests/logging/test_opentracing.py
@MadLittleMods MadLittleMods added the A-Tracing OpenTracing (maybe OpenTelemetry in the future) label Oct 2, 2025
MadLittleMods added a commit that referenced this pull request Oct 2, 2025
…19007)

Revert #18849

Go back to our custom `LogContextScopeManager` after trying
OpenTracing's `ContextVarsScopeManager`.

Fix #19004

### Why revert?

For reference, with the normal reactor, `ContextVarsScopeManager` worked
just as well as our custom `LogContextScopeManager` as far as I can tell
(and even better in some cases). But since Twisted appears not to fully
support `ContextVar`s, it doesn't work as expected in all cases.
Compounding things, `ContextVarsScopeManager` was causing errors with
the experimental `SYNAPSE_ASYNC_IO_REACTOR` option.

Since we're not getting the full benefit that we originally desired, we
might as well revert and figure out alternatives for extending the
logcontext lifetimes to support the use case we were trying to unlock
(c.f. #18804).

See
#19004 (comment)
for more info.


### Does this require backporting and patch releases?

No. Since `ContextVarsScopeManager` behaves just as well with the
normal reactor and was only causing actual errors with the experimental
`SYNAPSE_ASYNC_IO_REACTOR` option, I don't think this requires us to
backport and make patch releases at all.



### Maintain cross-links between main trace and background process work

In order to maintain the functionality introduced in #18932 (cross-links between the background process trace and currently active trace), we also needed a small change.

Previously, when we were using `ContextVarsScopeManager`, it tracked the tracing scope across the logcontext changes without issue. Now that we're using our own custom `LogContextScopeManager` again, we need to capture the active span from the logcontext before we reset to the sentinel context because of the `PreserveLoggingContext()` below.

Added some tests to ensure we maintain the `run_as_background_process` tracing behavior regardless of which tracing scope manager we use.
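The "capture before resetting" pattern described above can be sketched with stdlib-only stand-ins (all names here are illustrative, not Synapse's real API):

```python
class LogContext:
    """Illustrative stand-in for Synapse's LoggingContext."""
    def __init__(self, span=None):
        self.span = span

SENTINEL = LogContext()  # stand-in for the sentinel logcontext
current_context = LogContext(span="request-span")

def start_background_process():
    global current_context
    # Capture the active span *before* the context is reset to the
    # sentinel, so the background trace can still cross-link back to
    # the request trace.
    captured_span = current_context.span
    current_context = SENTINEL  # PreserveLoggingContext() analogue
    return captured_span

assert start_background_process() == "request-span"
assert current_context is SENTINEL
```

Capturing after the reset would instead see the sentinel's empty span, losing the cross-link.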
MadLittleMods added a commit that referenced this pull request Oct 2, 2025
…_available`

This test is still expected to fail with our current `LogContextScopeManager`
implementation, but there were some flaws in how we were handling
the logcontext before (like pumping the reactor in a non-sentinel logcontext).

This test is originally from
#18849
spawning from #18804 (comment)