Skip to content

fix(sdk-trace-base): nanosecond precision of span start time #5804

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

EmrysMyrddin
Copy link

Which problem is this PR solving?

If no startTime is provided at span creation, the Span startTime will have a millisecond precision. OTEL spec allows up to nanosecond precision, which is the default for all major languages implementations.

Fixes # (issue)

Short description of the changes

The problem reside in the Clock Drifting correction logic.

To avoid clock drifting, we try to avoid the use of HR times. While it provides an nanosecond precision level, it can drift from actual NTP time by multiple seconds.

The problem is that by default, if nothing is provided by the user, the time used is Date.now() instead of performance.now(), even if there is a whole logic to avoid drifting.

I have refactored the logic to clearly separate the handling of user input and the drift correction. This way, it's clearer what the code is doing (I had a very hard time understanding what was going one, I had to go read the PR comments).

Now, the start time is in nanosecond precision by default.

Type of change

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

I've tested this with my otel integration from Hive Gateway (graphql-hive/gateway#1343).

Checklist:

  • Followed the style guidelines of this project
  • Unit tests have been added: Not sure how to test this, any help apreciated
  • [ ] Documentation has been updated

@EmrysMyrddin EmrysMyrddin requested a review from a team as a code owner July 18, 2025 21:40
Copy link

linux-foundation-easycla bot commented Jul 18, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: EmrysMyrddin / name: Valentin Cocaud (0fe3c79)

@EmrysMyrddin EmrysMyrddin force-pushed the fix/nanos-span-start-time branch from 9c84977 to e1a43e6 Compare July 18, 2025 21:44
@EmrysMyrddin EmrysMyrddin force-pushed the fix/nanos-span-start-time branch from e1a43e6 to 0fe3c79 Compare July 18, 2025 21:47
if (typeof input === 'number') {
if (input <= otperformance.now()) {
// must be the result of `performance.now`, which is the time since program start
// translate it to an actual timestamp with hight resolution
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// translate it to an actual timestamp with hight resolution
// translate it to an actual timestamp with high resolution

@dyladan
Copy link
Member

dyladan commented Jul 22, 2025

Reviewing this now but I have to warn you there be demons in this part of the code. Time handling on different platforms is not at all consistent or obvious. Most notably, the performance timer cannot be counted on to run continuously. It is monotonic, so it never goes backwards, but it is paused in all sorts of non-obvious situations.

@EmrysMyrddin
Copy link
Author

Yes, I've found out that this is a very tricky subject... But the current implementation makes things a bit weird for any operation with sub millisecond duration. Because it breaks actual order of spans (all spans starting during the same millisecond are seen as starting at the start of this millisecond, which will lead to confusion that the shortest span ended first, which is not always true).

@EmrysMyrddin
Copy link
Author

I've for example seen that on CloudFlare Workers, performance.now() doesn't change until an IO operation occurs, to workaround side channels cache timing attacks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants