@MadLittleMods MadLittleMods commented Aug 11, 2025

Instrument _ByteProducer with tracing. This makes it easy to see how much time is spent writing bytes to the request by filtering on the write_bytes_to_request operation in Jaeger.

Spawning from #17722

The write_bytes_to_request span won't show up in the trace until #18849 is merged.

Before: 2025-08-11_17-40 (screenshot)
After: 2025-08-20_15-50 (screenshot)

Note: It's totally fine for a child span to finish after its parent. See
https://opentracing.io/specification/#references-between-spans which shows "Child Span D" outliving the "Parent Span"
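The shape of the instrumentation (mirroring the manual `__enter__`/`__exit__` approach from matrix-org/synapse#15888 mentioned in the dev notes below) can be sketched with a stdlib-only toy. `FakeSpan`, `ByteProducerSketch`, and the chunk handling are all illustrative stand-ins, not the real Synapse code:

```python
class FakeSpan:
    """Stand-in for an opentracing span context manager (illustrative only)."""

    def __init__(self, operation_name):
        self.operation_name = operation_name
        self.finished = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.finished = True
        return False


class ByteProducerSketch:
    """Toy model of the idea: open a span when the producer starts and
    close it once all bytes are written, even though the writes happen
    across several separate reactor callbacks."""

    def __init__(self, chunks):
        self._chunks = list(chunks)
        self.written = b""
        # Enter the span manually; it must outlive any single callback.
        self._span = FakeSpan("write_bytes_to_request")
        self._span.__enter__()

    def resume_producing(self):
        # The reactor calls this repeatedly; write one chunk per call.
        if self._chunks:
            self.written += self._chunks.pop(0)
        if not self._chunks:
            # All bytes written: exit the span manually.
            self._span.__exit__(None, None, None)


producer = ByteProducerSketch([b"hello ", b"world"])
producer.resume_producing()  # span still open
producer.resume_producing()  # final chunk written; span finished
print(producer._span.finished)  # True
```

Because the span is entered and exited by hand rather than via a `with` block, it naturally spans multiple callbacks, which is also why the child span can outlive its parent in the trace.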

Testing strategy

  1. Run a Jaeger instance (https://www.jaegertracing.io/docs/1.6/getting-started/)
    $ docker run -d --name jaeger \
      -e COLLECTOR_ZIPKIN_HTTP_PORT=9411 \
      -p 5775:5775/udp \
      -p 6831:6831/udp \
      -p 6832:6832/udp \
      -p 5778:5778 \
      -p 16686:16686 \
      -p 14268:14268 \
      -p 9411:9411 \
      jaegertracing/all-in-one:1.59.0
  2. Configure Synapse to use tracing:
    homeserver.yaml
    ## Tracing ##
    opentracing:
      enabled: true
      jaeger_config:
        sampler:
          type: const
          param: 1
        logging: false
  3. Make sure the optional opentracing dependency is installed: poetry install --extras all
  4. Run Synapse: poetry run synapse_homeserver --config-path homeserver.yaml
  5. Make some requests: GET http://localhost:8008/_matrix/client/versions
  6. Visit http://localhost:16686/search to view the traces
    • Select the correct service
    • Press 'Find Traces' button

Dev notes

SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.logging.test_opentracing.LogContextScopeManagerTestCase.test_run_in_background_active_scope_still_available

PR where we did a similar thing by managing __enter__ and __exit__ manually on a start_active_span context manager for tracing: matrix-org/synapse#15888

PRs where our custom LogContextScopeManager/_LogContextScope was changed:


Discussion below where we were running into scope loss problems on the LoggingContext which our custom LogContextScopeManager/_LogContextScope used -> #18804 (comment)


Log context docs: https://github.com/element-hq/synapse/blob/9135d78b88a429cf0220d6a93bdac7485a3a0f88/docs/log_contexts.md


Twisted docs on Producers (Producers and Consumers: Efficient High-Volume Streaming): https://docs.twisted.org/en/twisted-16.2.0/core/howto/producers.html


  • ^.* INFO - -> ``
  • (\S*?)=(\S*) -> \n $1=$2
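These read as find-and-replace patterns for cleaning up captured log output (an interpretation; the notes give no surrounding context). Applied with Python's `re` module against a made-up log line, they would look like:

```python
import re

line = "2025-08-20 15:50:01 INFO - request=GET code=200"

# `^.* INFO - ` -> ``: strip everything up to and including "INFO - ".
line = re.sub(r"^.* INFO - ", "", line)

# `(\S*?)=(\S*)` -> `\n $1=$2`: put each key=value pair on its own line.
line = re.sub(r"(\S*?)=(\S*)", r"\n \1=\2", line)

print(line)
```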

Todo

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct (run the linters)

@MadLittleMods MadLittleMods changed the title Intrument _ByteProducer with tracing Instrument _ByteProducer with tracing Aug 11, 2025
@MadLittleMods MadLittleMods changed the title Instrument _ByteProducer with tracing Instrument _ByteProducer with tracing to measure potential dead time while writing bytes to the request Aug 13, 2025
@MadLittleMods MadLittleMods requested a review from richvdh August 14, 2025 16:30
@richvdh richvdh removed their request for review August 15, 2025 08:34
MadLittleMods added a commit that referenced this pull request Aug 20, 2025
@MadLittleMods MadLittleMods marked this pull request as ready for review August 20, 2025 22:30
@MadLittleMods MadLittleMods requested a review from a team as a code owner August 20, 2025 22:30
@anoadragon453 anoadragon453 left a comment


Thanks for the detailed explanation and screenshots!

MadLittleMods added a commit that referenced this pull request Aug 27, 2025
Switch to OpenTracing's `ContextVarsScopeManager` instead of our own
custom `LogContextScopeManager`.

This is now possible because the linked Twisted issue from the comment
in our custom `LogContextScopeManager` is resolved:
https://twistedmatrix.com/trac/ticket/10301

This PR spawned from exploring different possibilities to solve the
`scope` loss problem I was encountering in
#18804 (comment).
This appears to solve the problem and I've added the additional test
from there to this PR ✅
@MadLittleMods MadLittleMods merged commit 356cc4a into develop Aug 27, 2025
44 checks passed
@MadLittleMods MadLittleMods deleted the madlittlemods/17722-trace-byte-producer branch August 27, 2025 17:26
@MadLittleMods

Thanks for the review @anoadragon453 🐆

@MadLittleMods MadLittleMods added the A-Tracing OpenTracing (maybe OpenTelemetry in the future) label Oct 2, 2025
MadLittleMods added a commit that referenced this pull request Oct 2, 2025
…19007)

Revert #18849

Go back to our custom `LogContextScopeManager` after trying
OpenTracing's `ContextVarsScopeManager`.

Fix #19004

### Why revert?

For reference, with the normal reactor, `ContextVarsScopeManager` worked
just as well as our custom `LogContextScopeManager` as far as I can tell
(and even better in some cases). But since Twisted appears not to fully
support `ContextVar`s, it doesn't work as expected in all cases.
Compounding things, `ContextVarsScopeManager` was causing errors with
the experimental `SYNAPSE_ASYNC_IO_REACTOR` option.
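The kind of propagation gap at play can be shown with plain `asyncio` and no Twisted at all: an event-loop callback runs in a copy of the scheduling context, so a `ContextVar` set inside it never becomes visible to the code that scheduled it. A minimal illustration (not Synapse code):

```python
import asyncio
import contextvars

current_scope = contextvars.ContextVar("current_scope", default=None)


def callback():
    # call_soon runs this in a *copy* of the scheduling context, so this
    # set() is invisible to the coroutine that scheduled the callback.
    current_scope.set("span-from-callback")


async def main():
    loop = asyncio.get_running_loop()
    loop.call_soon(callback)
    await asyncio.sleep(0)  # yield so the callback actually runs
    return current_scope.get()


result = asyncio.run(main())
print(result)  # None: the value set in the callback did not propagate
```

A scope manager built on `ContextVar`s depends on the framework propagating the context along the logical chain of callbacks, which is exactly where this falls apart.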

Since we're not getting the full benefit that we originally desired, we
might as well revert and figure out alternatives for extending the
logcontext lifetimes to support the use case we were trying to unlock
(c.f. #18804).

See
#19004 (comment)
for more info.


### Does this require backporting and patch releases?

No. Since `ContextVarsScopeManager` works just as well with the
normal reactor and was only causing actual errors with the experimental
`SYNAPSE_ASYNC_IO_REACTOR` option, I don't think this requires us to
backport and make patch releases at all.



### Maintain cross-links between main trace and background process work

In order to maintain the functionality introduced in #18932 (cross-links between the background process trace and currently active trace), we also needed a small change.

Previously, when we were using `ContextVarsScopeManager`, it tracked the tracing scope across the logcontext changes without issue. Now that we're using our own custom `LogContextScopeManager` again, we need to capture the active span from the logcontext before we reset to the sentinel context because of the `PreserveLoggingContext()` below.

Added some tests to ensure we maintain the `run_as_background` tracing behavior regardless of the tracing scope manager we use.
MadLittleMods added a commit that referenced this pull request Oct 2, 2025
…_available`

This test is still expected to fail with our current `LogContextScopeManager`
implementation, but there were some flaws in how we were handling
the logcontext before (like pumping the reactor in a non-sentinel logcontext).

This test is originally from
#18849
spawning from #18804 (comment)
Labels
A-Tracing OpenTracing (maybe OpenTelemetry in the future)