Avoid unnecessary allocations when using FileStream #15088

@ayende

This was originally a PR (dotnet/coreclr#1429), turned into an issue as a result of the comments there.

The idea is to avoid a 4 KB buffer allocation per stream when we need to work with a large number of files.
Consider the following code:

foreach (var file in System.IO.Directory.GetFiles(dirToCheck, "*.dat"))
{
    using (var fs = new FileStream(file, FileMode.Open))
    {
        // do something to read from the stream
    }
}

The problem is that each instance of FileStream will allocate an independent buffer. If we are reading 10,000 files, that will result in 40MB(!) being allocated, even if we are very careful about allocations in general.
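As a partial mitigation only, the caller can ask for the smallest possible internal buffer and read through one array that is reused for every file. The sketch below assumes a runtime where passing `bufferSize: 1` to the `FileStream` constructor effectively bypasses the internal buffer (documented as disabling buffering on recent .NET versions; behaviour on older runtimes may differ). `UnbufferedRead` and `CountBytes` are hypothetical names used for illustration.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of System.IO.
static class UnbufferedRead
{
    // Reads a whole file through a caller-owned buffer and returns the byte count.
    public static long CountBytes(string path, byte[] sharedBuffer)
    {
        // Assumption: bufferSize: 1 keeps FileStream from allocating its default 4 KB buffer.
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize: 1))
        {
            long total = 0;
            int read;
            // Every read lands directly in the shared array owned by the caller.
            while ((read = fs.Read(sharedBuffer, 0, sharedBuffer.Length)) > 0)
                total += read;
            return total;
        }
    }
}
```

A loop over many files would allocate `sharedBuffer` once and pass the same array to every call. This does not help with the buffers that the asynchronous write path allocates.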

See also: dotnet/corefx#2929

The major problem is that FileStream allocates its own buffer(s) and provides no real way to manage them. Creating a large number of FileStream instances, or doing big writes using WriteAsync, allocates a lot of temporary buffers and generates a lot of GC pressure.

As I see it, there are a few options here:

  • Add a constructor that takes an external buffer to use. This will be the sole buffer used, and if a bigger buffer is required, the stream will throw instead of allocating a new buffer.
  • Add a pool of buffers that will be used. Something like the following code:
  // One set of stacks per thread; index = position of the highest set bit of the buffer size.
  [ThreadStatic] private static Stack<byte[]>[] _buffersBySize;
  
  // PowerOfTwo and MostSignificantBit are helpers that are not shown here.
  private static byte[] GetBuffer(int requestedSize)
  {
      if(_buffersBySize == null)
          _buffersBySize = new Stack<byte[]>[32];
  
      // round the request up so that buffers of similar size share a bucket
      var actualSize = PowerOfTwo(requestedSize);
      var pos = MostSignificantBit(actualSize);
  
      if(_buffersBySize[pos] == null)
          _buffersBySize[pos] = new Stack<byte[]>();
  
      // nothing pooled on this thread yet, so allocate a fresh buffer
      if(_buffersBySize[pos].Count == 0)
          return new byte[actualSize];
  
      return _buffersBySize[pos].Pop();
  }
  
  private static void ReturnBuffer(byte[] buffer)
  {
      var actualSize = PowerOfTwo(buffer.Length);
      if(actualSize != buffer.Length)
          return; // can't put a buffer of strange size here (probably an error)
  
      if(_buffersBySize == null)
          _buffersBySize = new Stack<byte[]>[32];
  
      var pos = MostSignificantBit(actualSize);
  
      if(_buffersBySize[pos] == null)
          _buffersBySize[pos] = new Stack<byte[]>();
  
      _buffersBySize[pos].Push(buffer);
  }

The idea here is that each thread has its own set of buffers, and we'll take the buffers from there. The Dispose method will return them to the buffer set of whichever thread calls it. Note that there is no requirement to use the same thread for creation and disposal. (Although, to be fair, we'll probably need to handle the case where a dedicated disposal thread is used and all streams are disposed on it.)
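To make the cross-thread behaviour concrete, here is a self-contained sketch of the same per-thread pool. The bodies of `PowerOfTwo` and `MostSignificantBit` are my assumptions about what those helpers do (round up to a power of two, and return the index of the highest set bit); `ThreadBufferPool`, `BucketFor` and `CountOnCurrentThread` are hypothetical names added only so the effect can be observed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, self-contained variant of the per-thread pool.
static class ThreadBufferPool
{
    [ThreadStatic] private static Stack<byte[]>[] _buffersBySize;

    // Assumed behaviour: round up to the next power of two (valid for 1..2^30).
    private static int PowerOfTwo(int value)
    {
        if (value < 1 || value > (1 << 30))
            throw new ArgumentOutOfRangeException(nameof(value));
        int result = 1;
        while (result < value)
            result <<= 1;
        return result;
    }

    // Assumed behaviour: index of the highest set bit (4096 maps to 12).
    private static int MostSignificantBit(int value)
    {
        int pos = 0;
        while ((value >>= 1) != 0)
            pos++;
        return pos;
    }

    // Returns the current thread's stack for a power-of-two size, creating it lazily.
    private static Stack<byte[]> BucketFor(int powerOfTwoSize)
    {
        if (_buffersBySize == null)
            _buffersBySize = new Stack<byte[]>[32];
        int pos = MostSignificantBit(powerOfTwoSize);
        if (_buffersBySize[pos] == null)
            _buffersBySize[pos] = new Stack<byte[]>();
        return _buffersBySize[pos];
    }

    public static byte[] GetBuffer(int requestedSize)
    {
        int actualSize = PowerOfTwo(requestedSize);
        Stack<byte[]> bucket = BucketFor(actualSize);
        return bucket.Count == 0 ? new byte[actualSize] : bucket.Pop();
    }

    public static void ReturnBuffer(byte[] buffer)
    {
        int length = buffer.Length;
        // Ignore empty or non-power-of-two buffers; they did not come from this pool.
        if (length == 0 || (length & (length - 1)) != 0)
            return;
        BucketFor(length).Push(buffer);
    }

    // Diagnostic only: how many buffers of this size the calling thread holds.
    public static int CountOnCurrentThread(int powerOfTwoSize)
    {
        return BucketFor(powerOfTwoSize).Count;
    }
}
```

If a buffer is taken on one thread and returned on another, it ends up in the second thread's stacks, and the first thread has to allocate again on its next request. That is the disposal-thread case mentioned above: with a single thread disposing every stream, its stacks would grow without bound unless they are capped.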

The benefit here is that the pool stays an internal implementation detail and doesn't change the external API, whereas the external-buffer option adds new public API surface.
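For comparison, the external-buffer option could look roughly like this from the caller's side. This is only a sketch of the contract (one caller-supplied buffer, and an exception instead of a larger allocation). `ExternalBufferReader` is a hypothetical wrapper, not an existing or proposed `FileStream` constructor.

```csharp
using System;
using System.IO;

// Hypothetical type illustrating the "external buffer" contract; not part of System.IO.
sealed class ExternalBufferReader : IDisposable
{
    private readonly Stream _inner;
    private readonly byte[] _buffer;

    public ExternalBufferReader(Stream inner, byte[] externalBuffer)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
        _buffer = externalBuffer ?? throw new ArgumentNullException(nameof(externalBuffer));
    }

    // Reads up to 'count' bytes into the external buffer and returns the filled part.
    public ArraySegment<byte> Read(int count)
    {
        // The external buffer is the only buffer: larger requests throw rather than allocate.
        if (count < 0 || count > _buffer.Length)
            throw new ArgumentOutOfRangeException(nameof(count), "Request exceeds the external buffer.");
        int read = _inner.Read(_buffer, 0, count);
        return new ArraySegment<byte>(_buffer, 0, read);
    }

    public void Dispose()
    {
        _inner.Dispose();
    }
}
```

The caller owns the array and can reuse it across any number of streams, which is the point of the option. The cost is that the buffer becomes part of the public contract.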

Labels

  • api-approved: API was approved in API review, it can be implemented
  • area-System.IO
  • blocking: Marks issues that we want to fast track in order to unblock other important work
  • enhancement: Product code improvement that does NOT require public API changes/additions
  • tenet-performance: Performance related issue
