Skip to content
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
90 changes: 49 additions & 41 deletions src/libraries/System.Private.CoreLib/src/System/IO/StreamReader.cs
Original file line number Diff line number Diff line change
Expand Up @@ -865,65 +865,73 @@ private int ReadBuffer(Span<char> userBuffer, out bool readToUserBuffer)
return task;
}

private async Task<string?> ReadLineAsyncInternal()
private Task<string?> ReadLineAsyncInternal()
{
if (_charPos == _charLen && (await ReadBufferAsync(CancellationToken.None).ConfigureAwait(false)) == 0)
{
return null;
}

StringBuilder? sb = null;
return _charPos == _charLen
? NoDataInTheBuffer()
: ConsumeBufferedData();

do
async Task<string?> NoDataInTheBuffer()
{
char[] tmpCharBuffer = _charBuffer;
int tmpCharLen = _charLen;
int tmpCharPos = _charPos;
int i = tmpCharPos;
if (await ReadBufferAsync(CancellationToken.None).ConfigureAwait(false) == 0)
return null;

return await ConsumeBufferedData().ConfigureAwait(false);
}
Copy link
Member

@stephentoub stephentoub Dec 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this make the NoDataInTheBuffer case more expensive? Does the perf test being run stress this case?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does but it might not be the most used one case, at least that's what I hope for. The benchmarks -> dotnet/performance@529f33c#diff-64fc887dbc1598f2ea871592946f4077593925f80b22cab6f586ea7e1e1b0723

async Task<string?> ConsumeBufferedData()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a theory you can share for why this improves perf? There are three cases here:

  • There's no data in the buffer and we're at EOF.
  • There's no data in the buffer but there's remaining data.
  • There's data in the buffer.

In all three cases, previously you would be calling a single async method. In all three cases, now you're still calling at least one async method, and in the second case, it's now calling a second and doing more work than before. The third case is just adding an extra layer of indirection. And the first case should only occur once per file.

Copy link
Author

@Trapov Trapov Dec 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that I could cut the state-machine in half (and eleminate awaits) and switch in two paths without any state-machines where in one part the third case is making less work and in another part the first/second ones (NoDataInBuffer) are doing more work (but they're not the most used ones probably), at least that was my reasoning behind it. But thinking about it again and looking at the perf-results not every case is a winner. Specificaly short lines are the worst.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Method Job Toolchain LineLengthRange Mean Error StdDev Median Min Max Ratio RatioSD Gen 0 Gen 1 Allocated
ReadLineAsync Job-VRCDUV /testhost-main/net7.0-OSX-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun [ 0, 0] 535.71 us 3.930 us 3.281 us 534.51 us 529.92 us 541.75 us 1.00 0.00 187.5000 2.1552 1,155 KB
ReadLineAsync Job-UBMMAT /testhost/net7.0-OSX-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun [ 0, 0] 512.69 us 4.709 us 4.174 us 512.40 us 504.80 us 519.53 us 0.96 0.01 187.5000 2.0161 1,156 KB
ReadLineAsync Job-VRCDUV /testhost-main/net7.0-OSX-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun [ 0, 1024] 29.21 us 0.369 us 0.345 us 29.12 us 28.65 us 29.84 us 1.00 0.00 10.4360 0.1160 65 KB
ReadLineAsync Job-UBMMAT /testhost/net7.0-OSX-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun [ 0, 1024] 19.96 us 0.153 us 0.143 us 19.95 us 19.70 us 20.18 us 0.68 0.01 10.4828 0.1588 65 KB
ReadLineAsync Job-VRCDUV /testhost-main/net7.0-OSX-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun [ 1, 1] 320.86 us 1.957 us 1.634 us 321.01 us 317.07 us 323.00 us 1.00 0.00 125.0000 1.3021 771 KB
ReadLineAsync Job-UBMMAT /testhost/net7.0-OSX-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun [ 1, 1] 322.20 us 1.850 us 1.640 us 322.16 us 319.25 us 325.77 us 1.00 0.01 125.0000 1.2755 772 KB
ReadLineAsync Job-VRCDUV /testhost-main/net7.0-OSX-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun [ 1, 8] 169.12 us 2.478 us 2.069 us 169.55 us 164.25 us 172.48 us 1.00 0.00 55.7796 0.6720 344 KB
ReadLineAsync Job-UBMMAT /testhost/net7.0-OSX-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun [ 1, 8] 172.79 us 2.950 us 2.760 us 172.04 us 168.83 us 177.28 us 1.02 0.02 55.7065 0.6793 344 KB
ReadLineAsync Job-VRCDUV /testhost-main/net7.0-OSX-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun [ 9, 32] 58.74 us 0.469 us 0.391 us 58.80 us 58.05 us 59.53 us 1.00 0.00 17.9228 0.2298 110 KB
ReadLineAsync Job-UBMMAT /testhost/net7.0-OSX-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun [ 9, 32] 56.41 us 0.530 us 0.470 us 56.53 us 55.72 us 57.13 us 0.96 0.01 18.0201 0.2281 110 KB
ReadLineAsync Job-VRCDUV /testhost-main/net7.0-OSX-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun [ 33, 128] 35.15 us 0.317 us 0.281 us 35.13 us 34.73 us 35.60 us 1.00 0.00 9.5291 - 59 KB
ReadLineAsync Job-UBMMAT /testhost/net7.0-OSX-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun [ 33, 128] 27.54 us 0.216 us 0.192 us 27.53 us 27.33 us 27.93 us 0.78 0.01 9.6831 0.1100 59 KB
ReadLineAsync Job-VRCDUV /testhost-main/net7.0-OSX-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun [ 129, 1024] 29.31 us 0.279 us 0.233 us 29.27 us 28.95 us 29.81 us 1.00 0.00 10.5932 0.1177 65 KB
ReadLineAsync Job-UBMMAT /testhost/net7.0-OSX-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun [ 129, 1024] 20.26 us 0.105 us 0.088 us 20.29 us 20.05 us 20.34 us 0.69 0.01 10.6041 - 65 KB
ReadLineAsync Job-VRCDUV /testhost-main/net7.0-OSX-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun [1025, 2048] 31.22 us 0.457 us 0.405 us 31.31 us 30.56 us 31.99 us 1.00 0.00 14.9457 0.3706 92 KB
ReadLineAsync Job-UBMMAT /testhost/net7.0-OSX-Release-x64/shared/Microsoft.NETCore.App/7.0.0/corerun [1025, 2048] 20.79 us 0.195 us 0.182 us 20.75 us 20.50 us 21.13 us 0.67 0.01 14.9402 0.4150 92 KB

@stephentoub dotnet/performance@529f33c#diff-64fc887dbc1598f2ea871592946f4077593925f80b22cab6f586ea7e1e1b0723

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented out of benchmarks the sync method for cleaner results between main and this branch

{
StringBuilder? sb = null;
do
{
char ch = tmpCharBuffer[i];
char[] tmpCharBuffer = _charBuffer;
int tmpCharLen = _charLen;
int tmpCharPos = _charPos;
int i = tmpCharPos;

// Note the following common line feed chars:
// \n - UNIX \r\n - DOS \r - Mac
if (ch == '\r' || ch == '\n')
do
{
string s;
char ch = tmpCharBuffer[i];

if (sb != null)
{
sb.Append(tmpCharBuffer, tmpCharPos, i - tmpCharPos);
s = sb.ToString();
}
else
// Note the following common line feed chars:
// \n - UNIX \r\n - DOS \r - Mac
if (ch is '\r' or '\n')
Copy link
Member

@stephentoub stephentoub Dec 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This produces identical IL and asm, doesn't it? I would not expect a perf improvement from this. What will be different and might yield a micro-improvement is ch == '\r' | ch == '\n', removing a (likely-well-predicted) branch.

{
s = new string(tmpCharBuffer, tmpCharPos, i - tmpCharPos);
}
string s;

if (sb != null)
{
sb.Append(tmpCharBuffer, tmpCharPos, i - tmpCharPos);
s = sb.ToString();
}
else
{
s = new string(tmpCharBuffer, tmpCharPos, i - tmpCharPos);
}

_charPos = tmpCharPos = i + 1;
_charPos = tmpCharPos = i + 1;

if (ch == '\r' && (tmpCharPos < tmpCharLen || (await ReadBufferAsync(CancellationToken.None).ConfigureAwait(false)) > 0))
{
tmpCharPos = _charPos;
if (_charBuffer[tmpCharPos] == '\n')
if (ch == '\r' && (tmpCharPos < tmpCharLen || (await ReadBufferAsync(CancellationToken.None).ConfigureAwait(false)) > 0))
{
_charPos = ++tmpCharPos;
tmpCharPos = _charPos;
if (_charBuffer[tmpCharPos] == '\n')
{
_charPos = ++tmpCharPos;
}
}

return s;
}

return s;
}
i++;
} while (i < tmpCharLen);

i++;
} while (i < tmpCharLen);
i = tmpCharLen - tmpCharPos;
sb ??= new StringBuilder(i + 80);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamsitnik I was wondering whether it would be possible to do pooling of the StringBuilder instance, to avoid this allocation in long line cases?

Copy link
Member

@adamsitnik adamsitnik Dec 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nietras it's definitely a good idea! Perhaps we could even replace the private char array ( _charBuffer) with a StringBuilder, write the chars directly to it and avoid the need of appending? The only thing I am not sure of is what side effects using StringBuilder.ToString(int startIndex, int length) might have

sb.Append(tmpCharBuffer, tmpCharPos, i);
} while (await ReadBufferAsync(CancellationToken.None).ConfigureAwait(false) > 0);

i = tmpCharLen - tmpCharPos;
sb ??= new StringBuilder(i + 80);
sb.Append(tmpCharBuffer, tmpCharPos, i);
} while (await ReadBufferAsync(CancellationToken.None).ConfigureAwait(false) > 0);

return sb.ToString();
return sb.ToString();
}
}

public override Task<string> ReadToEndAsync()
Expand Down