Description
This was originally a PR (dotnet/coreclr#1429), turned into an issue as a result of the comments there.
The idea is to avoid allocating a 4 KB buffer per stream whenever we need to work with a large number of files.
Consider the following code:
foreach (var file in System.IO.Directory.GetFiles(dirToCheck, "*.dat"))
{
using (var fs = new FileStream(file, FileMode.Open))
{
// do something to read from the stream
}
}
The problem is that each FileStream instance allocates its own independent buffer. If we are reading 10,000 files, that results in roughly 40 MB (10,000 × 4 KB) being allocated, even if we are very careful about allocations in general.
See also: dotnet/corefx#2929
The major problem is that FileStream allocates its own buffer(s) and provides no way to manage them. Creating a large number of FileStream instances, or doing big writes with WriteAsync, allocates a lot of temporary buffers and generates a lot of GC pressure.
As I see it, there are a few options here:
- Add a constructor that takes an external buffer. This would be the only buffer the stream uses; if a bigger buffer is required, the stream throws instead of allocating a new one.
- Add a pool of buffers that will be used. Something like the following code:
[ThreadStatic] private static Stack<byte[]>[] _buffersBySize;

// Takes a buffer from the current thread's pool, allocating a new one if the pool is empty.
private static byte[] GetBuffer(int requestedSize)
{
    if (_buffersBySize == null)
        _buffersBySize = new Stack<byte[]>[32];
    var actualSize = PowerOfTwo(requestedSize);
    var pos = MostSignificantBit(actualSize); // index of the stack for this size
    if (_buffersBySize[pos] == null)
        _buffersBySize[pos] = new Stack<byte[]>();
    if (_buffersBySize[pos].Count == 0)
        return new byte[actualSize];
    return _buffersBySize[pos].Pop();
}
// Returns a buffer to the current thread's pool so a later GetBuffer call can reuse it.
private static void ReturnBuffer(byte[] buffer)
{
    var actualSize = PowerOfTwo(buffer.Length);
    if (actualSize != buffer.Length)
        return; // can't pool a buffer of a strange size (probably an error)
    if (_buffersBySize == null)
        _buffersBySize = new Stack<byte[]>[32];
    var pos = MostSignificantBit(actualSize);
    if (_buffersBySize[pos] == null)
        _buffersBySize[pos] = new Stack<byte[]>();
    _buffersBySize[pos].Push(buffer);
}
The idea here is that each thread has its own set of buffers, and FileStream takes its buffers from there. The Dispose method returns them to the pool of the thread it runs on, so there is no requirement to create and dispose a stream on the same thread. (Although, to be fair, we'll probably need to handle the case where a dedicated disposal thread disposes all streams, since all buffers would then pile up in that one thread's pool.)
The benefit here is that the pool does not affect the external API, whereas accepting an external buffer means adding a new, publicly visible constructor.
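The pool code above calls PowerOfTwo and MostSignificantBit without defining them. A minimal sketch of what they might look like follows. It assumes PowerOfTwo rounds up to the nearest power of two and MostSignificantBit returns the zero-based index of the highest set bit. The class name BufferSizeHelpers and both of those semantics are assumptions for illustration, not part of the original proposal.

```csharp
using System;

public static class BufferSizeHelpers
{
    // Assumed semantics: smallest power of two that is >= value (minimum 1).
    public static int PowerOfTwo(int value)
    {
        if (value <= 1)
            return 1;
        if (value > (1 << 30))
            throw new ArgumentOutOfRangeException(nameof(value)); // next power of two would overflow int
        int result = 1;
        while (result < value)
            result <<= 1;
        return result;
    }

    // Assumed semantics: zero-based index of the highest set bit of a positive value.
    public static int MostSignificantBit(int value)
    {
        if (value <= 0)
            throw new ArgumentOutOfRangeException(nameof(value));
        int pos = 0;
        while ((value >>= 1) != 0)
            pos++;
        return pos;
    }
}
```

With these definitions, a request for 4096 bytes stays at 4096 and maps to stack index 12, while a request for 5000 bytes is rounded up to 8192 and maps to index 13. The largest possible index is 30, which fits in the 32-element array used above.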