Skip to content

Conversation

@adamint
Copy link
Member

@adamint adamint commented Apr 14, 2025

Description

Changed the visible span data structure to a sorted set with a comparer based on start time. Previously, was backed by a data structure with no defined order.

Fixes #8731 and fixes #8799

Checklist

  • Is this feature complete?
    • Yes. Ready to ship.
    • No. Follow-up changes expected.
  • Are you including unit tests for the changes and scenario tests if relevant?
    • Yes
    • No
  • Did you add public API?
    • Yes
      • If yes, did you have an API Review for it?
        • Yes
        • No
      • Did you add <remarks /> and <code /> elements on your triple slash comments?
        • Yes
        • No
    • No
  • Does the change make any security assumptions or guarantees?
    • Yes
      • If yes, have you done a threat model and had a security review?
        • Yes
        • No
    • No
  • Does the change require an update in our Aspire docs?

@adamint
Copy link
Member Author

adamint commented Apr 14, 2025

This can be tested manually using the following diff:

[playground/Stress/Stress.ApiService/Program.cs]

using System.Threading.Channels;
 using System.Xml.Linq;
 using Microsoft.AspNetCore.Mvc;
+using OpenTelemetry.Trace;
 using Stress.ApiService;

 var builder = WebApplication.CreateBuilder(args);
@@ -15,7 +16,15 @@
 builder.AddServiceDefaults();

 builder.Services.AddOpenTelemetry()
-    .WithTracing(tracing => tracing.AddSource(TraceCreator.ActivitySourceName, ProducerConsumer.ActivitySourceName))
+    .WithTracing(tracing =>
+    {
+        tracing.AddSource(builder.Environment.ApplicationName)
+            .AddAspNetCoreInstrumentation()
+            // Uncomment the following line to enable gRPC instrumentation (requires the OpenTelemetry.Instrumentation.GrpcNetClient package)
+            //.AddGrpcClientInstrumentation()
+            .AddHttpClientInstrumentation()
+            .AddSource("Services.Api");
+    })
     .WithMetrics(metrics => metrics.AddMeter(TestMetrics.MeterName));
 builder.Services.AddSingleton<TestMetrics>();

@@ -285,4 +294,60 @@ await foreach (var message in channel.Reader.ReadAllAsync(cancellationToken))
     return $"Created {TraceCount} traces.";
 });

+app.MapGet("/weatherforecast", async () =>
+    {
+        var forecast = Enumerable.Range(1, 5).Select(index =>
+                new WeatherForecast
+                (
+                    DateOnly.FromDateTime(DateTime.Now.AddDays(index)),
+                    Random.Shared.Next(-20, 55),
+                    "Sample Text"
+                ))
+            .ToArray();
+        ActivitySource source = new("Services.Api", "1.0.0");
+        ActivitySource.AddActivityListener(new ActivityListener
+        {
+            ShouldListenTo = _ => true,
+            Sample = (ref ActivityCreationOptions<ActivityContext> options) => ActivitySamplingResult.AllData,
+        });
+        using var activity = source.StartActivity("ValidateAndUpdateCacheService.ExecuteAsync");
+        await Task.Delay(100);
+        Debug.Assert(activity is not null);
+        using var innerActivity = source.StartActivity("ValidateAndUpdateCacheService.activeUser", ActivityKind.Interna
l, parentContext: activity.Context);
+        await Task.Delay(100);
+        Debug.Assert(innerActivity is not null);
+        using (source.StartActivity("Perform1", ActivityKind.Internal, parentContext: innerActivity.Context))
+        {
+            await Task.Delay(10);
+        }
+        using (source.StartActivity("Perform2", ActivityKind.Internal, parentContext: innerActivity.Context))
+        {
+            await Task.Delay(20);
+        }
+        using (source.StartActivity("Perform3", ActivityKind.Internal, parentContext: innerActivity.Context))
+        {
+            await Task.Delay(30);
+        }
+        using var innerActivity2 = source.StartActivity("ValidateAndUpdateCacheService.activeUser", ActivityKind.Intern
al, parentContext: activity.Context);
+        await Task.Delay(100);
+        Debug.Assert(innerActivity2 is not null);
+
+        using (source.StartActivity("Perform1", ActivityKind.Internal, parentContext: innerActivity2.Context))
+        {
+            await Task.Delay(30);
+        }
+        using (source.StartActivity("Perform2", ActivityKind.Internal, parentContext: innerActivity2.Context))
+        {
+            await Task.Delay(20);
+        }
+        using (source.StartActivity("Perform3", ActivityKind.Internal, parentContext: innerActivity2.Context))
+        {
+            await Task.Delay(10);
+        }
+        return forecast;
+    })
+    .WithName("GetWeatherForecast");
+
 app.Run();
+
+public record WeatherForecast(DateOnly Date, int TemperatureC, string Summary);```

Copy link
Member

@JamesNK JamesNK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a unit test please.

@dotnet-policy-service dotnet-policy-service bot added the needs-author-action An issue or pull request that requires more info or actions from the author. label Apr 14, 2025
@danmoseley
Copy link
Member

When unit test exists please create backport PR

@adamint adamint requested a review from JamesNK April 15, 2025 16:03
@adamint
Copy link
Member Author

adamint commented Apr 15, 2025

/backport to release/9.2

@dotnet-policy-service dotnet-policy-service bot removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Apr 15, 2025
@github-actions
Copy link
Contributor

Started backporting to release/9.2: https://github.com/dotnet/aspire/actions/runs/14474078723

@github-actions
Copy link
Contributor

@adamint backporting to "release/9.2" failed, the patch most likely resulted in conflicts:

$ git am --3way --empty=keep --ignore-whitespace --keep-non-patch changes.patch

Applying: Ensure visible trace spans stay sorted by start time
Applying: Add grid sort order test
error: sha1 information is lacking or useless (src/Aspire.Dashboard/Components/Pages/TraceDetail.razor.cs).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Patch failed at 0002 Add grid sort order test
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!

@danmoseley
Copy link
Member

Did you verify the test fails without your fix here?

@adamint
Copy link
Member Author

adamint commented Apr 15, 2025

Did you verify the test fails without your fix here?

It failed when I ran it, but only ran once. I will retry to confirm

edit: seems my test run was a fluke. I've fixed the test, and improved the fix so that visibleViewModels only has one responsibility and does not have to ensure ordering.

@adamint
Copy link
Member Author

adamint commented Apr 15, 2025

I'm adding a fix to #8799 because the same chunk of code contains both bugs.

Debug.Assert(_spanWaterfallViewModels != null);

var visibleViewModels = new HashSet<SpanWaterfallViewModel>();
var visibleViewModels = new List<SpanWaterfallViewModel>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the change here still matter if you're using it just for a contains predicat on _spanWaterfallViewModels.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it does. I found that hash collisions on different span view model objects were common (somehow?).

Copy link
Member

@JamesNK JamesNK Apr 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it does. I found that hash collisions on different span view model objects were common (somehow?).

Because the code is looping through each item in _spanWaterfallViewModels and is then looking at its children. The children are also in _spanWaterfallViewModels so a list ends up with values being added many times if there is nesting.

I tested with the data from the big trace stress test and the list ended up with 6000+ items in the list even though there were only 230ish spans.

What breaks when this is a hashset?

Copy link
Member

@JamesNK JamesNK Apr 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this code more. I think it should look like this:

var visibleViewModels = new HashSet<SpanWaterfallViewModel>();
foreach (var viewModel in _spanWaterfallViewModels)
{
    // Span is always hidden or has already been found to be visible.
    // Skip evaluating span again to avoid unnecessary work.
    if (viewModel.IsHidden || visibleViewModels.Contains(viewModel))
    {
        continue;
    }

    if (viewModel.MatchesFilter(_filter, GetResourceName, out var matchedDescendents))
    {
        visibleViewModels.Add(viewModel);
        foreach (var descendent in matchedDescendents.Where(d => !d.IsHidden))
        {
            visibleViewModels.Add(descendent);
        }
    }
}

The extra if check means VMs that are never visible or already visible aren't evaluated over and over. But I think that change could be made later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the code

@JamesNK
Copy link
Member

JamesNK commented Apr 16, 2025

Add a command plus endpoint to the stress playground project for repoing this issue. Should be the code from: #8771 (comment) (although I don't think you need to add all the instrumentation. Just .AddSource("Services.Api") should work)

I think it's ok to backport code in the playground project. It doesn't ship anywhere and it's important to have a way for folks to do a functional test that the problem is fixed.

@adamint
Copy link
Member Author

adamint commented Apr 16, 2025

@danmoseley now fixes #8799 too

@adamint adamint requested a review from JamesNK April 16, 2025 14:31
Copy link
Member

@JamesNK JamesNK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks ok. I tested the branch with the problem data, and tested expanding/collapsing, but someone else double checking would be good.

@danmoseley
Copy link
Member

@balachir something to test around on the next vendor pass perhaps. (in main or 9.3)

@balachir
Copy link

Thanks @danmoseley , we'll test this. Including @v-elenafeng for awareness.

@danmoseley danmoseley changed the title Ensure visible trace spans stay sorted by start time Ensure visible trace spans stay sorted by start time and fix Traces view expand/collapse icons Apr 18, 2025
@JamesNK JamesNK force-pushed the dev/adamint/8731-traces-sorting-bug branch from e0e4b2b to b7f1e9e Compare April 21, 2025 23:41
@JamesNK JamesNK force-pushed the dev/adamint/8731-traces-sorting-bug branch from b7f1e9e to 695e8b8 Compare April 21, 2025 23:41
@JamesNK
Copy link
Member

JamesNK commented Apr 21, 2025

/backport to release/9.2

@github-actions
Copy link
Contributor

Started backporting to release/9.2: https://github.com/dotnet/aspire/actions/runs/14583534045

@github-actions
Copy link
Contributor

@JamesNK backporting to "release/9.2" failed, the patch most likely resulted in conflicts:

$ git am --3way --empty=keep --ignore-whitespace --keep-non-patch changes.patch

Applying: Ensure visible trace spans stay sorted by start time
Using index info to reconstruct a base tree...
M	src/Aspire.Dashboard/Components/Pages/TraceDetail.razor.cs
M	tests/Aspire.Dashboard.Components.Tests/Pages/TraceDetailsTests.cs
Falling back to patching base and 3-way merge...
Auto-merging tests/Aspire.Dashboard.Components.Tests/Pages/TraceDetailsTests.cs
CONFLICT (content): Merge conflict in tests/Aspire.Dashboard.Components.Tests/Pages/TraceDetailsTests.cs
Auto-merging src/Aspire.Dashboard/Components/Pages/TraceDetail.razor.cs
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Patch failed at 0001 Ensure visible trace spans stay sorted by start time
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!

@JamesNK JamesNK enabled auto-merge (squash) April 21, 2025 23:48
@JamesNK JamesNK merged commit 97a7e52 into dotnet:main Apr 21, 2025
174 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators May 22, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

4 participants