Skip to content

Conversation

Wraith2
Copy link
Contributor

@Wraith2 Wraith2 commented Sep 9, 2025

Builds on the stable base provided by #3534 and adds the optional async-continue capability back in. It is optional, it is not on by default in this build. To use it you must set a switch: AppContext.SetSwitch("Switch.Microsoft.Data.SqlClient.UseCompatibilityAsyncBehaviour", false); this will use the new multiplexer and async-continue support already defaults to enabled.

This changes the way that continue is used from the original implementation. Originally continue was always available which proved to cause problems with routes through the code that did not expect to be able to fail causing lost context. This new approach runs as if continue is disabled until a function is called which explicitly requests continue mode to be enabled. The only functions that can do this are the ones that read multi-packet string or binary data. The request is cleared when the read completes so it cannot affect other reads from the same packet.

Benchmark results:

All Reads were 10 Mib of data.

BinaryRead

Method UseContinue Mean Error StdDev Gen0 Gen1 Gen2 Allocated
Async False 5,840.26 ms 94.748 ms 88.627 ms 2000.0000 1000.0000 1000.0000 40.81 MB
StreamAsync False 46.91 ms 0.925 ms 1.692 ms 3083.3333 1833.3333 1833.3333 51.8 MB
Sync False 26.87 ms 0.534 ms 1.042 ms 1562.5000 906.2500 906.2500 30.73 MB
Async True 50.78 ms 1.003 ms 1.561 ms 2181.8182 2090.9091 909.0909 40.82 MB
StreamAsync True 58.23 ms 1.114 ms 2.469 ms 2888.8889 1666.6667 1666.6667 51.8 MB
Sync True 26.50 ms 0.509 ms 0.522 ms 1562.5000 906.2500 906.2500 30.73 MB

StringRead

Method UseContinue Mean Error StdDev Gen0 Gen1 Gen2 Allocated
Async False 5,832.31 ms 114.512 ms 191.324 ms 1000.0000 - - 50.8 MB
Sync False 34.32 ms 0.686 ms 1.026 ms 1000.0000 400.0000 400.0000 40.72 MB
Async True 66.26 ms 0.960 ms 1.895 ms 1625.0000 1500.0000 375.0000 50.81 MB
Sync True 35.14 ms 0.694 ms 1.215 ms 1000.0000 400.0000 400.0000 40.72 MB

XmlRead

Method UseContinue Mean Error StdDev Gen0 Gen1 Gen2 Allocated
Sync False 92.20 ms 1.334 ms 1.183 ms 3166.6667 3000.0000 1000.0000 38.57 MB
Async False 182,897.65 ms 3,447.733 ms 5,053.643 ms 31360000.0000 31012000.0000 9640000.0000 360347.15 MB
Sync True 89.87 ms 0.803 ms 0.712 ms 3166.6667 3000.0000 1000.0000 38.57 MB
Async True 128.25 ms 2.192 ms 3.213 ms 4800.0000 4600.0000 1400.0000 56.39 MB

Here are the benchmark files that I used to get these numbers. You'll need connection strings and to generate a random 10Mib xml files from somewhere on the web. Benchmarks.zip
The async xml numbers are pretty unbelievable. If you replicate or improve the benches it would be good to get perf corroboration.

@ErikEJ this will be the build you'll want to test and then possible make available to people, as discussed on the previous PR.
@edwardneal if you have the resources could you run your fantastic test suite against this version and see if you can identify any new bugs?
@dotnet/sqlclientdevteam can you run CI please.

@Wraith2 Wraith2 requested a review from a team as a code owner September 9, 2025 11:14
@Wraith2 Wraith2 changed the title add continue request usage Add continue request usage Sep 9, 2025
@ErikEJ
Copy link
Contributor

ErikEJ commented Sep 9, 2025

@Wraith2 This looks awesome - looks like you also fixed the issue with increased allocations..

I will give this a try against the repro VM asap

@paulmedynski
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@Wraith2
Copy link
Contributor Author

Wraith2 commented Sep 9, 2025

@Wraith2 This looks awesome - looks like you also fixed the issue with increased allocations..

That was fixed in the original fix so it's already merged to main. 4b891d8

Since i've just reviewed the other PR i'll cc a couple of people who were interested in it and may want to test this if it generates artifacts, @MichelZ @rhuijben

@edwardneal
Copy link
Contributor

Excellent! I've built the PR locally and kicked off my stress tests, including the XML case. These will likely take a few days to fully run, but I'll post the results when available.

@Wraith2
Copy link
Contributor Author

Wraith2 commented Sep 9, 2025

I've built the PR locally and kicked off my stress tests, including the XML case

The performance on the xml is shockingly bad. When I realised just how slow it was going on the current main branch I thought I must have either written the test to loop 1000 times or somehow broken SqlCachedReader.

@ErikEJ
Copy link
Contributor

ErikEJ commented Sep 10, 2025

@backstromjoel @valentiniliescu @nilzzzzzz @CSharpFiasco @luca-domenichini @warappa @stevendarby @wjrogers @AnderssonPeter @p10tyr @Eli-Black-Work @BradBarnich @igbenic

You have all expressed an interest in this performance issue - please test this PR build!

@gnegno84
Copy link

Builds on the stable base provided by #3534 and adds the optional async-continue capability back in. It is optional, it is not on by default in this build. To use it you must set a switch: AppContext.SetSwitch("Switch.Microsoft.Data.SqlClient.UseCompatibilityAsyncBehaviour", false); this will use the new multiplexer and async-continue support already defaults to enabled.

On our side, we tried to repro one issue that was happening with 6.1.0 (randomly receiving null values for a json serialized string field).
I've tested with the artifact in this PR and I can confirm that the issue IS NOT happening anymore with both approaches:

  • UseCompatibilityAsyncBehaviour set to true (expected)
  • UseCompatibilityAsyncBehaviour set to false

From a performance perspective, I cannot add valuable info as our test was quite simple and not benchmarked.
Thanks for the updates.

@ErikEJ
Copy link
Contributor

ErikEJ commented Sep 10, 2025

@Wraith2 The 6.11.0-pull.125205 artifcat verified OK against my Sweden repro VM.

@codecov
Copy link

codecov bot commented Sep 10, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 0.00%. Comparing base (cd3dbd1) to head (df52b34).
⚠️ Report is 24 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #3605       +/-   ##
==========================================
- Coverage   65.48%       0   -65.49%     
==========================================
  Files         275       0      -275     
  Lines       61518       0    -61518     
==========================================
- Hits        40288       0    -40288     
+ Misses      21230       0    -21230     
Flag Coverage Δ
addons ?
netcore ?
netfx ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ErikEJ
Copy link
Contributor

ErikEJ commented Sep 10, 2025

@Wraith2 How do I set an appcontext swith for an entire xUnit test assembly?

@Wraith2
Copy link
Contributor Author

Wraith2 commented Sep 10, 2025

Probably in the ctor but because of the caching in the library once you've set it and the value has been read you can't change the value. To do that you'll need to use private reflection as we do in the tests here.

Copy link
Contributor

@paulmedynski paulmedynski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one comment!

@paulmedynski paulmedynski self-assigned this Sep 12, 2025
@paulmedynski
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@ErikEJ
Copy link
Contributor

ErikEJ commented Sep 14, 2025

@Wraith2 The 6.11.0-pull.125205 artifact verified OK against the EF Core tests

@ErikEJ
Copy link
Contributor

ErikEJ commented Sep 14, 2025

@Wraith2 FYI, I added this class to the test project:

using Xunit.Sdk;
[assembly: TestFramework("Microsoft.EntityFrameworkCore.AssemblyFixture", "Microsoft.EntityFrameworkCore.SqlServer.FunctionalTests")]

namespace Microsoft.EntityFrameworkCore;

public sealed class AssemblyFixture : XunitTestFramework
{
    public AssemblyFixture(IMessageSink messageSink)
        :base(messageSink)
    {
        AppContext.SetSwitch("Switch.Microsoft.Data.SqlClient.UseCompatibilityAsyncBehaviour", false);
    }
}

@edwardneal
Copy link
Contributor

The stress-testing passed as expected, but it's worth noting that we currently need two AppContext switches to enable this:

AppContext.SetSwitch("Switch.Microsoft.Data.SqlClient.UseCompatibilityAsyncBehaviour", false);
AppContext.SetSwitch("Switch.Microsoft.Data.SqlClient.UseCompatibilityProcessSni", false);

If the latter switch is not explicitly set, we see increased memory usage but not the performance improvements.

Async and sync are very close to parity in normal circumstances, remote SQL Server instances saw nearly identical results in my benchmarks. I noticed that in situations where the SQL Server is able to keep up with the supply of PLP data, the speed improvements here mean that SqlClient is able to call code paths in TryReadByteArray which copy packet buffers around when building the snapshot. This is the current speed bottleneck as far as I can tell.

While nothing's needed for this PR, it looks like SQL Server will try to adjust the size of the PLP data blocks it sends to align with the number of bytes left in each TDS packet. With that in mind, we could avoid copying memory; if a read will take us precisely to the end of the current TDS packet, we could construct a correctly-sized ReadOnlyMemory<byte> over the TDS packet buffer, pass that up to the snapshot directly and allocate a new buffer for the next packet.

I also saw a minor (~5%) performance bump in XML data when I changed the SqlCachedBuffer.MaxChunkSize constant to 8192 (a power of two which is greater than the default packet size.)

@ErikEJ
Copy link
Contributor

ErikEJ commented Sep 15, 2025

@edwardneal Thanks for the info about the switches, I was not aware of that.

Let me try to add both to the EF Core test suite

@Wraith2
Copy link
Contributor Author

Wraith2 commented Sep 15, 2025

The "Switch.Microsoft.Data.SqlClient.UseCompatibilityAsyncBehaviour" switch code is currently:

        public static bool UseCompatibilityAsyncBehaviour
        {
            get
            {
                if (UseCompatibilityProcessSni)
                {
                    // If ProcessSni compatibility mode has been enabled then the packet
                    // multiplexer has been disabled. The new async behaviour using continue
                    // point capture is only stable if the multiplexer is enabled so we must
                    // return true to enable compatibility async behaviour using only restarts.
                    return true;
                }

                if (s_useCompatibilityAsyncBehaviour == Tristate.NotInitialized)
                {
                    if (!AppContext.TryGetSwitch(UseCompatibilityAsyncBehaviourString, out bool returnedValue) || returnedValue)
                    {
                        s_useCompatibilityAsyncBehaviour = Tristate.True;
                    }
                    else
                    {
                        s_useCompatibilityAsyncBehaviour = Tristate.False;
                    }
                }
                return s_useCompatibilityAsyncBehaviour == Tristate.True;
            }
        }

which unless I can't mentally process boolean logic (it's happened) means that it defaults to true and you have to turn it off. However because "Switch.Microsoft.Data.SqlClient.UseCompatibilityProcessSni" is currently false the short circuit is active.
So as I stated in the first post If you turn OFF compatibility sni you should automatically get the new continue-async behaviour.

If you're settings both flags explicitly you'll have to manage your combinations and that's likely what's happening in our tests but for casual users it should only require one settings.

@Wraith2
Copy link
Contributor Author

Wraith2 commented Sep 22, 2025

Is this waiting for anything other than a second review @dotnet/sqlclientdevteam ?

@mdaigle mdaigle self-assigned this Sep 22, 2025
@mdaigle
Copy link
Contributor

mdaigle commented Sep 22, 2025

@Wraith2 I've been out for a couple weeks. I'll start taking a look tomorrow!

@cremor
Copy link

cremor commented Sep 23, 2025

Thanks for all your work on this!

Assuming no problems are found, will this PR be included in a 6.1.x release?
If yes, will it stay opt-in as stated in the PR description?

Those questions are especially relevant for all users who don't know about this problem and just get SqlClient as an EF dependency. EF 10 uses SqlClient 6.1 and won't update to any non-LTS version.

And a small PS: You bolded the wrong lines in the "XmlRead" performance result table in the PR description.

@edwardneal edwardneal mentioned this pull request Sep 23, 2025
Copy link
Contributor

@mdaigle mdaigle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks reasonable, but I'm struggling to see what actually changed in the logic. It seems like the if (isPlp) check just shifted down a level. The request continue behavior you described in the PR description was already present, wasn't it? The only related change I see is the addition of RequestContinue(true) within TryReadByteArrayWithContinue. I don't see any spot that newly checks whether a continue has been requested, unless I'm missing it.

@Wraith2
Copy link
Contributor Author

Wraith2 commented Sep 23, 2025

The only important change is the addition of the calls to RequestContinue. Everything needed by those calls was already present but could not be activated. The request enables continue mode only for the read of the data that we know supports it. The revert of the flag, the conditional logic around ContinueEnabled, it was all already present but was previously only capable of working in async-retry mode.

@rizi
Copy link

rizi commented Sep 24, 2025

@Wraith2 where should I add this switch call in my code (not tests)?
E.g in my DB Context constructor or my Program.cs (Asp.net core).

Which version of SqlClient will take this switch into account? Any timeline when it will be released?

Thx for your hard work!

Br

@paulmedynski paulmedynski added this to the 7.0-preview2 milestone Sep 24, 2025
@paulmedynski paulmedynski merged commit 481a2c4 into dotnet:main Sep 24, 2025
236 checks passed
@Wraith2
Copy link
Contributor Author

Wraith2 commented Sep 24, 2025

@Wraith2 where should I add this switch call in my code (not tests)? E.g in my DB Context constructor or my Program.cs (Asp.net core).

That's your choice and will depend on how you structure your code. As long as the switch value is set before the first read is called then it will be active, the value is cached so once set it cannot be changed.

Which version of SqlClient will take this switch into account? Any timeline when it will be released?

The version released from the main branch. I have no details on timelines or version numbers. I'm just a contributor.

@rizi
Copy link

rizi commented Sep 24, 2025

@Wraith2 where should I add this switch call in my code (not tests)? E.g in my DB Context constructor or my Program.cs (Asp.net core).

Which version of SqlClient will take this switch into account? Any timeline when it will be released?

Thx for your hard work!

Br

Thx for the clarification.

@paulmedynski
Copy link
Contributor

@rizi - I've added this to the 7.0.0 Preview 2 milestone, release expected mid/late October.

@rizi
Copy link

rizi commented Sep 24, 2025

@rizi - I've added this to the 7.0.0 Preview 2 milestone, release expected mid/late October.

Thx for the info

edwardneal added a commit to edwardneal/SqlClient that referenced this pull request Sep 24, 2025
See line 6778 of change to netcore TdsParser in dotnet#3605
@Wraith2 Wraith2 deleted the add-speed branch October 20, 2025 21:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants