Background and motivation
The BlobBuilder type is a mix between:
- Trying to emulate the underlying mechanics and allocation profile of StringBuilder.
- Being extensible so that consumers of System.Reflection.Metadata can control allocations of BlobBuilder (for example, with pooling).
In its current configuration it doesn't fully achieve either of these goals, for the following reasons:
- BlobBuilder has no enforced maximum internal chunk size. Instead, during write operations it uses a much simpler strategy: fill the rest of the current BlobBuilder, then allocate a single BlobBuilder to hold the remainder. That results in lots of large object heap (LOH) allocations during build.
- Many types in System.Reflection.Metadata have no mechanism for consumers to provide derived BlobBuilder instances and instead allocate BlobBuilder directly. This subverts consumers' attempts to pool allocations.
- The LinkSuffix / LinkPrefix APIs can silently mix the types of BlobBuilder instances in a chain. That makes advanced caching, such as pooling array allocations, impossible, because types with different caching strategies get silently inserted into the chain. When these insertions happen, the byte[] buffers underlying the instances are swapped.
- There is no mechanism to control the underlying byte[] allocation, which prevents these arrays from being pooled. Only the BlobBuilder instances can be pooled, so their underlying byte[] is managed inefficiently: it can't be reused while the containing BlobBuilder is at rest. This is in contrast to StringBuilder, which leverages ArrayPool<char> for allocations.
- There is no easy mechanism for derived types to control zeroing of the underlying byte[] when a BlobBuilder instance from a pool is reused. This can lead to hard-to-diagnose issues such as 99244.
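To make the first problem concrete, here is a simplified model of how a single WriteBytes call could be split into newly allocated chunks, comparing the current strategy (one chunk for the whole remainder) with a fixed cap in the spirit of the proposed MaxChunkSize. The PlanChunks helper, the 256 bytes of free space and the 64 KB cap are hypothetical; the model tracks payload sizes only and ignores how the real BlobBuilder sizes its arrays. It relies on the fact that arrays of 85,000 bytes or more are allocated on the LOH.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Arrays of 85,000 bytes or more are allocated on the large object heap.
const int LohThreshold = 85_000;

// Hypothetical model of splitting one write across chunks.
// freeInCurrent: bytes still free in the current chunk.
// maxChunkSize: null models the current behavior (one chunk for the whole remainder);
//               a value models a cap like the proposed MaxChunkSize.
// Returns the sizes of the chunks that must be newly allocated.
List<int> PlanChunks(int freeInCurrent, int byteCount, int? maxChunkSize)
{
    var chunks = new List<int>();
    int remaining = byteCount - Math.Min(freeInCurrent, byteCount);
    while (remaining > 0)
    {
        int size = maxChunkSize is int max ? Math.Min(max, remaining) : remaining;
        chunks.Add(size);
        remaining -= size;
    }
    return chunks;
}

var uncapped = PlanChunks(freeInCurrent: 256, byteCount: 1_000_000, maxChunkSize: null);
var capped = PlanChunks(freeInCurrent: 256, byteCount: 1_000_000, maxChunkSize: 64 * 1024);

int uncappedLoh = uncapped.Count(s => s >= LohThreshold);
int cappedLoh = capped.Count(s => s >= LohThreshold);

Console.WriteLine($"uncapped: {uncapped.Count} chunk(s), {uncappedLoh} on the LOH"); // uncapped: 1 chunk(s), 1 on the LOH
Console.WriteLine($"capped: {capped.Count} chunk(s), {cappedLoh} on the LOH");       // capped: 16 chunk(s), 0 on the LOH
```

The trade-off is visible in the counts: the cap keeps every chunk off the LOH, at the cost of allocating many more chunks for the same write.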
The proposed changes below are meant to address these problems so that consumers of System.Reflection.Metadata can do the following:
- Control the allocation of all BlobBuilder instances used in an emit pass.
- Control and manage the underlying byte[] in the BlobBuilder.
- Detect when BlobBuilder instances are linked with BlobBuilder instances of a different type.
Using the changes below, I've been able to significantly improve the allocation profile of VBCSCompiler. When building a solution on the scale of compilers.slnf (~500 compilation events across small, medium, and large projects), I was able to remove ~200 MB of LOH allocations for byte[] and reduce GC pause time by 1.5%.
API Proposal
```diff
namespace System.Reflection.Metadata;

public class BlobBuilder
{
+    /// <summary>
+    /// The byte array underpinning the <see cref="BlobBuilder"/>. This can only be accessed on
+    /// the head of a chain of <see cref="BlobBuilder"/> instances. Calling the setter resets
+    /// <see cref="Length"/> to zero.
+    /// </summary>
+    protected byte[] Buffer { get; set; }
+
+    /// <summary>
+    /// Derived types can override this to restrict the maximum chunk size allocated when writing
+    /// a contiguous set of bytes through the WriteBytes APIs. When unset, the default is to allocate
+    /// a single chunk for the rest of the bytes that don't fit into the current chunk.
+    /// </summary>
+    protected virtual int? MaxChunkSize => null;
+
+    /// <summary>
+    /// Gets or sets the capacity of the <see cref="BlobBuilder"/>.
+    /// </summary>
+    public int Capacity { get; set; }
+
+    protected BlobBuilder(byte[] buffer);
+
+    /// <summary>
+    /// This method is called in <see cref="LinkSuffix"/> or <see cref="LinkPrefix"/> on both the
+    /// current instance and the target of the link method. This allows derived types to
+    /// detect when a link is being made between two different types of <see cref="BlobBuilder"/>
+    /// and take appropriate action.
+    /// </summary>
+    /// <remarks>
+    /// This method is called before the underlying buffers are swapped.
+    /// </remarks>
+    protected virtual void BeforeSwap(BlobBuilder other);
+
+    /// <summary>
+    /// Derived types can override this to control the allocation when <see cref="Capacity"/> is
+    /// changed.
+    /// </summary>
+    protected virtual void SetCapacity(int capacity);
+
+    protected void WriteBytes(ReadOnlySpan<byte> buffer);
}

public class MetadataBuilder
{
+    public MetadataBuilder(
+        int userStringHeapStartOffset,
+        int stringHeapStartOffset,
+        int blobHeapStartOffset,
+        int guidHeapStartOffset,
+        Func<int, BlobBuilder>? createBlobBuilderFunc);
}

public class DebugDirectoryBuilder
{
+    public DebugDirectoryBuilder(BlobBuilder blobBuilder);
}

public class ManagedPEBuilder
{
+    /// <summary>
+    /// Derived types can override this to control how <see cref="BlobBuilder"/> instances are
+    /// allocated during the emit pass. This allows consumers to pool <see cref="BlobBuilder"/>
+    /// instances more effectively.
+    /// </summary>
+    protected virtual BlobBuilder CreateBlobBuilder(int? minimumSize = null);
}
```
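The Buffer and SetCapacity hooks are what would let a derived type back a BlobBuilder with ArrayPool<byte> rather than fresh allocations. Because the proposed members don't exist yet, the sketch below models one possible override with plain locals: buffer and length stand in for the proposed Buffer property and the number of bytes written, and the SetCapacity and Write local functions are hypothetical, not the proposal's implementation. Returning the old array with clearArray: true is one way to address the zeroing problem described in the background section.

```csharp
using System;
using System.Buffers;

// Hypothetical stand-ins for the proposed protected Buffer property and the
// number of bytes written so far.
byte[] buffer = ArrayPool<byte>.Shared.Rent(256);
int length = 0;

// Sketch of what an ArrayPool-backed SetCapacity override might do: rent a
// larger array, copy the bytes written so far, and hand the old array back.
// clearArray: true zeroes the old array so the next renter never sees stale data.
void SetCapacity(int capacity)
{
    if (capacity < length)
        throw new ArgumentOutOfRangeException(nameof(capacity));

    byte[] newBuffer = ArrayPool<byte>.Shared.Rent(capacity); // may be larger than requested
    buffer.AsSpan(0, length).CopyTo(newBuffer);
    ArrayPool<byte>.Shared.Return(buffer, clearArray: true);
    buffer = newBuffer;
}

// Hypothetical write path: grow through SetCapacity when the bytes don't fit.
void Write(ReadOnlySpan<byte> bytes)
{
    if (length + bytes.Length > buffer.Length)
        SetCapacity(Math.Max(buffer.Length * 2, length + bytes.Length));

    bytes.CopyTo(buffer.AsSpan(length));
    length += bytes.Length;
}

Write(new byte[] { 1, 2, 3 });
Write(new byte[300]); // grows the buffer if the initial rental is too small
Console.WriteLine(length);    // 303
Console.WriteLine(buffer[0]); // 1
```

Since ArrayPool<byte>.Rent can return a larger array than requested, a real override would likely report buffer.Length, not the requested value, as its capacity. When the builder itself goes back to its pool, its buffer would be returned the same way.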
API Usage
A full implementation of a PooledBlobBuilder is available on a branch that also contains the other changes necessary to use this new API.
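The proposed MetadataBuilder constructor takes a Func<int, BlobBuilder> so that every builder it allocates can come from the caller. The sketch below shows one shape such a factory might take, using only the existing BlobBuilder API (the constructor, WriteInt32, Count and Clear). The pool, the ReturnToPool helper and the reading of the int argument as a minimum-size hint are assumptions. Note that it pools whole BlobBuilder instances only, which is the limited approach the Alternative Designs section argues is insufficient on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection.Metadata;

// Hypothetical pool behind a factory with the same shape as the proposed
// createBlobBuilderFunc parameter. The int is assumed to be a minimum-size hint.
var pool = new Stack<BlobBuilder>();
int allocated = 0;

Func<int, BlobBuilder> createBlobBuilder = minimumSize =>
{
    if (pool.Count > 0)
        return pool.Pop(); // reuse a previously returned builder

    allocated++;
    return new BlobBuilder(Math.Max(minimumSize, 256));
};

// Hypothetical helper: clear the previous content, then make the builder available again.
void ReturnToPool(BlobBuilder builder)
{
    builder.Clear();
    pool.Push(builder);
}

var first = createBlobBuilder(1024);
first.WriteInt32(42);
ReturnToPool(first);

var second = createBlobBuilder(1024);
Console.WriteLine(ReferenceEquals(first, second)); // True
Console.WriteLine(second.Count);                   // 0
Console.WriteLine(allocated);                      // 1
```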
Alternative Designs
One alternative design is to not expose control over the underlying byte[] allocation and have consumers focus on pooling BlobBuilder instances only. That provides some benefit, but it is inefficient: a large number of byte[] buffers sit unused inside pooled BlobBuilder instances, so other parts of the program end up allocating their own arrays instead.
Risks
There are a few risks to consider:
- Teams other than Roslyn can provide derived instances of BlobBuilder, ManagedPEBuilder, etc. These changes are careful to ensure that those consumers are not impacted: the behavior of the existing code changes only when the new hooks are used in derived types.
- The changes might not hook every place where BlobBuilder instances are allocated. That would mean LinkSuffix / LinkPrefix are called with differing types, limiting the potential gains. In my local tests I hooked BeforeSwap so that it fails when linked with a different type, and I was able to successfully rebuild Roslyn with these changes, so I'm confident these hooks are thorough.
- Taking advantage of BlobBuilder.MaxChunkSize significantly increases the number of BlobBuilder instances allocated during emit. Consumers that use it will need to adjust their pooling strategies. However, the new APIs give consumers the flexibility to pursue several strategies here.