Skip to content

Generate strings instead of Roslyn SyntaxNodes in interop source generators #95882

@jkoritzinsky

Description

@jkoritzinsky

Since the start of the interop source generators, we've always decided to use Roslyn SyntaxNodes as both our source-of-truth as well as our go-to emit strategy. This choice has provided us a number of benefits:

  • C# constructs are represented the same way as the C# compiler represents them.
  • We can easily define an API surface that specifies "this API returns a name/expression/statement" without building our own type hierarchy.
  • We can copy tokens from the source of truth to our emitted code.
  • We never try emitting code that is only understood by Roslyn compilers newer than we can safely confirm we'll run against as the APIs don't exist.
  • We could reuse the code in a code-fix to inline the source-generated code.

However, it has also lead to a number of problems:

  • The SyntaxFactory APIs are very verbose. We've basically had to add comments above each block to explain what the code is doing.
  • NormalizeWhitespace (what we use to handle adding whitespace) is extremely slow.
  • We have at times accidentally added additional GC roots to the SyntaxNodes in the source SyntaxTrees due to how we structure some of our internal data types.
  • The source generator API surface requires us to pass strings to Roslyn, so we need to go from SyntaxNode to string for Roslyn to parse anyway.
  • We have a few places where we produce invalid syntax nodes because writing out the truly accurate node is a pain (in particular type names).
  • We never did the code-fix reuse idea and I doubt we'll ever do so (and we can't without additional work due to the above item).

Additionally, the Roslyn team recommends using a string-based approach.

Finally, @Sergio0694 updated ComputeSharp to use an "indented text writer" approach and saw a ~460x performance improvement in CPU time and a ~36x Memory footprint improvement. Even if we only get a fraction of that improvement due to our desire to have an extensibility model, doing the work to rewrite the APIs seems worthwhile.

Additionally, with that amount of improvement, we may also see build-time improvements for assemblies with lots of LibraryImports, such as CoreLib.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    Status

    No status

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions