StringUtilities based on vectorized helpers provided by core libraries

`StringUtilities` currently uses a vectorized approach based on hardware intrinsics and doesn't cover Arm.
In https://github.com/dotnet/aspnetcore/pull/44040 there's an attempt to use the xplat-intrinsics, but in the meantime https://github.com/dotnet/runtime/issues/28230 got done, so the best option would be to base `StringUtilities` on these new APIs so that the custom vectorized code can go away (cf. https://github.com/dotnet/aspnetcore/pull/44040#issuecomment-1272062476).

With the ASCII-APIs it could look like https://github.com/dotnet/aspnetcore/compare/d2a1c23d90d4e69df47616b59b7215b3da8cc9f6...25c56209d2e3e8b9f8922b260aff5d2271f5d021, but there are some pieces missing. Copied from https://github.com/dotnet/aspnetcore/pull/44040#issuecomment-1289517635:

## What's missing to achieve this?

### ASCII

In `StringUtilities`, only ASCII values in the range `(0x00, 0x80)` are considered valid, i.e. `0x00` is rejected.
`Ascii.ToUtf16`, in contrast, treats the whole ASCII range `[0x00, 0x80)` as valid.

Thus something like 
```diff
namespace System.Buffers.Text
{
    public static class Ascii
    {
        // existing methods
-       public static OperationStatus ToUtf16(ReadOnlySpan<byte> source, Span<char> destination, out int bytesConsumed, out int charsWritten);
+       public static OperationStatus ToUtf16(ReadOnlySpan<byte> source, Span<char> destination, out int bytesConsumed, out int charsWritten, bool treatNullAsInvalid = false);
    }
}
```
is needed.
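
Until such a parameter exists, the stricter range can be approximated on top of the existing API. The following is only a minimal sketch, assuming the `Ascii.ToUtf16` overload that shipped in .NET 8 (in `System.Text`, with just `out int charsWritten`); the helper name `TryGetAsciiNonNull` is hypothetical, and the extra pass over the input for the `0x00` check is exactly the overhead a built-in `treatNullAsInvalid` would avoid.

```csharp
using System;
using System.Buffers;
using System.Text;

static class AsciiNonNull
{
    // Hypothetical helper: widens ASCII bytes to UTF-16 and rejects 0x00,
    // i.e. it accepts only bytes in the range (0x00, 0x80).
    public static bool TryGetAsciiNonNull(ReadOnlySpan<byte> source, Span<char> destination)
    {
        // Separate pass to reject 0x00 (span IndexOf is itself vectorized).
        if (source.IndexOf((byte)0) >= 0)
        {
            return false;
        }

        // Assumed .NET 8 shape: returns InvalidData on the first non-ASCII byte
        // and DestinationTooSmall if the destination cannot hold the result.
        OperationStatus status = Ascii.ToUtf16(source, destination, out int charsWritten);
        return status == OperationStatus.Done && charsWritten == source.Length;
    }
}
```

Whether the additional scan is acceptable on the hot path would have to be measured.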

### Latin1

I don't know how hot Latin1 is here, but as it's special-cased in https://github.com/dotnet/aspnetcore/blob/d3259f92851e4772d3230177be5b71be20d3ff6d/src/Servers/Kestrel/Core/src/Internal/Infrastructure/HttpUtilities.cs#L140-L143, I think it's hot enough to be optimized. Besides that, the standard `Encoding.Latin1` can't be used on its own, as `0x00` is considered invalid here.

Thus basically the same as for ASCII above applies, i.e.

```diff
namespace System.Buffers.Text
{
+   public static class Latin1
+   {
+       // other methods similar as Ascii?
+       public static OperationStatus ToUtf16(ReadOnlySpan<byte> source, Span<char> destination, out int bytesConsumed, out int charsWritten, bool treatNullAsInvalid = false);
+   }
}
```

If the type `Latin1` seems too heavy or too niche, an alternative is something like https://github.com/dotnet/aspnetcore/compare/25c56209d2e3e8b9f8922b260aff5d2271f5d021...e3afae2dcf436dfe357b5a961f0184e51681325e, where Latin1 bytes are expanded to UTF-16 via `Ascii.ToUtf16` and, if a non-ASCII byte is encountered, the remainder is handled in a scalar way. This is a naive approach that should be perf-tested. I don't have numbers on how likely Latin1 inputs with bytes in the range `[0x80, 0xFF]` are, but if they are rare then that (simple) approach should be good enough.
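
The fallback idea can be sketched roughly as follows. This is not the code from the linked commit, just an illustration under assumptions: it uses the `Ascii.ToUtf16` overload that shipped in .NET 8 (`System.Text`, with only `out int charsWritten`), and the helper name `TryGetLatin1NonNull` is hypothetical.

```csharp
using System;
using System.Buffers;
using System.Text;

static class Latin1NonNull
{
    // Hypothetical helper: widens Latin1 bytes to UTF-16 and rejects 0x00.
    public static bool TryGetLatin1NonNull(ReadOnlySpan<byte> source, Span<char> destination)
    {
        if (destination.Length < source.Length || source.IndexOf((byte)0) >= 0)
        {
            return false;
        }

        // Fast path: the vectorized ASCII widening handles the leading ASCII run.
        OperationStatus status = Ascii.ToUtf16(source, destination, out int charsWritten);
        if (status == OperationStatus.Done)
        {
            return true;
        }

        // Slow path: a byte in [0x80, 0xFF] was hit. Latin1 maps 1:1 onto
        // U+0000..U+00FF, so the remainder is widened byte by byte.
        for (int i = charsWritten; i < source.Length; i++)
        {
            destination[i] = (char)source[i];
        }

        return true;
    }
}
```

With this shape, a single early non-ASCII byte pushes the whole remainder onto the scalar loop, which is why the distribution of real inputs matters for the perf test.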


Latin1 special case from the `HttpUtilities.cs` lines linked above:

    if (ReferenceEquals(encoding, Encoding.Latin1))
    {
        return span.GetLatin1StringNonNullCharacters();
    }
