For p/invokes, the character set is encoded into the metadata for the method. As a result, adding a new encoding, such as UTF-8, is complex and far-reaching. The current experience (ANSI means UTF-8 on Unix) is odd and confusing. The p/invoke source generator should be used to improve this experience.
We’d like to:
- Avoid proliferating the pattern of ‘use `CharSet.Ansi` on Unix to get UTF-8’
- Allow specifying the character set to use for all parameters of a method
  - Instead of needing `MarshalAs` on each parameter
- Avoid adding to the `CharSet` enumeration
  - Don’t want inconsistent support and don’t want to implement new support in all the places that currently use it
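For context, the two patterns these goals push back on can be sketched with the built-in `DllImport`. This is only an illustration: `nativelib`, `method_ansi` and `method_utf8` are hypothetical names, and the byte helpers exist just to show what each conversion produces without calling into native code.

```csharp
using System;
using System.Runtime.InteropServices;

internal static class StatusQuoSketch
{
    // Hypothetical native library and exports, for illustration only.
    // Pattern 1: CharSet.Ansi, which the runtime treats as UTF-8 on Unix.
    [DllImport("nativelib", CharSet = CharSet.Ansi)]
    internal static extern int method_ansi(string s);

    // Pattern 2: explicit UTF-8, repeated on every string parameter.
    [DllImport("nativelib")]
    internal static extern int method_utf8(
        [MarshalAs(UnmanagedType.LPUTF8Str)] string a,
        [MarshalAs(UnmanagedType.LPUTF8Str)] string b);

    // Bytes produced by the ANSI conversion (platform dependent).
    internal static byte[] AnsiBytes(string s)
    {
        IntPtr p = Marshal.StringToCoTaskMemAnsi(s);
        try { return ReadNullTerminated(p); }
        finally { Marshal.FreeCoTaskMem(p); }
    }

    // Bytes produced by the explicit UTF-8 conversion (same on every platform).
    internal static byte[] Utf8Bytes(string s)
    {
        IntPtr p = Marshal.StringToCoTaskMemUTF8(s);
        try { return ReadNullTerminated(p); }
        finally { Marshal.FreeCoTaskMem(p); }
    }

    // Copies a NUL-terminated native buffer into a managed array, without the terminator.
    private static byte[] ReadNullTerminated(IntPtr p)
    {
        int length = 0;
        while (Marshal.ReadByte(p, length) != 0) length++;
        byte[] bytes = new byte[length];
        Marshal.Copy(p, bytes, 0, length);
        return bytes;
    }
}
```

Comparing `AnsiBytes` and `Utf8Bytes` for a non-ASCII string on Windows and on Unix is a quick way to see why the `CharSet.Ansi` pattern is confusing: only the UTF-8 helper is stable across platforms.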
Our current thinking is to:
- Remove `CharSet` field
- Add `MarshalStringsUsing` field
  - `Type`
  - Should be a type that could be used with `MarshalUsing`/`NativeMarshalling` attributes for custom marshalling of strings
  - New APIs can be added for common marshallers (which can use things like `System.Text.Encoding` under the hood)
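To make "can use things like `System.Text.Encoding` under the hood" concrete, here is a rough sketch of what a UTF-8 string marshaller type might do. The type name and its members (`ToNativeValue`, `ToManaged`, `FreeNative`) are assumptions for illustration; the proposal does not specify the required shape, and a real marshaller would have to follow whatever contract the generator defines.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical marshaller shape; not the API proposed in this issue.
internal struct Utf8StringMarshallerSketch
{
    private IntPtr _native;

    // Encodes the managed string as NUL-terminated UTF-8 in native memory.
    public Utf8StringMarshallerSketch(string managed)
    {
        if (managed == null)
        {
            _native = IntPtr.Zero;
            return;
        }

        byte[] bytes = Encoding.UTF8.GetBytes(managed);
        _native = Marshal.AllocCoTaskMem(bytes.Length + 1);
        Marshal.Copy(bytes, 0, _native, bytes.Length);
        Marshal.WriteByte(_native, bytes.Length, 0); // NUL terminator
    }

    // Pointer that would be passed to the native function.
    public IntPtr ToNativeValue() => _native;

    // Decodes the native buffer back into a managed string (null for a null pointer).
    public string ToManaged() => Marshal.PtrToStringUTF8(_native);

    // Releases the native buffer; safe to call when nothing was allocated.
    public void FreeNative()
    {
        Marshal.FreeCoTaskMem(_native);
        _native = IntPtr.Zero;
    }
}
```

A user-defined type such as the `Wtf8String` mentioned in this proposal would presumably follow the same pattern with a different encoder.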
Example:
```csharp
// UTF-8 - equivalent to explicitly specifying [MarshalAs(UnmanagedType.LPUTF8Str)] on string parameters
[GeneratedDllImport("lib", MarshalStringsUsing = typeof(System.Runtime.InteropServices.Encoding.Utf8StringMarshalling))]
public static partial int Method(string s);

// UTF-16 - equivalent to CharSet.Unicode behaviour in built-in
[GeneratedDllImport("lib", MarshalStringsUsing = typeof(System.Runtime.InteropServices.Encoding.Utf16StringMarshalling))]
public static partial int Method(string s);

// Error - invalid encoding
[GeneratedDllImport("lib", MarshalStringsUsing = typeof(int))]
public static partial int Method(string s);

// User-defined marshalling
[GeneratedDllImport("lib", MarshalStringsUsing = typeof(MyCustomMarshal.Wtf8String))]
public static partial int Method(string s);
```

Where:
```csharp
// .NET can provide:
namespace System.Runtime.InteropServices.Encoding
{
    // UTF-16 with endianness based on the current platform
    struct Utf16StringMarshalling { ... }

    // UTF-8
    struct Utf8StringMarshalling { ... }

    // ANSI
    [SupportedOSPlatform("windows")]
    struct AnsiStringMarshalling { ... }

    ...
}

// User can define:
namespace MyCustomMarshal
{
    struct Wtf8String { ... }
}
```

Other considerations:
- Naming: `Unicode` vs `Utf16`
  - .NET has usually used the (Windows-centric) term Unicode to refer to UTF-16. Naming the struct `...Utf16StringMarshalling` would be correct and in line with our cross-platform focus, but `UnicodeStringMarshalling` would be more consistent with existing APIs.
- Auto (UTF-8 on Unix, UTF-16 on Windows)
  - We expect usage to be low. If necessary, users can define different p/invokes and call the desired one conditionally (for example, using the `OperatingSystem` APIs)
- Defaults:
  - The source generator requires specifying marshalling information for string/char.
    - Requires the intention to be made clear and removes hidden assumptions, but can make declarations more verbose
  - The source generator does not check / reconcile higher level settings like `DefaultCharSetAttribute`.
- `ExactSpelling`: uses `CharSet` to probe for entry point on Windows, doesn’t mean anything on Unix
  - The source generator could require exact spelling for entry point names
    - Would be in the spirit of avoiding propagating some of the Windows-centric aspects of DllImport
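As a sketch of the "define different p/invokes and call the desired one conditionally" workaround for Auto, the following uses the built-in `DllImport` with one UTF-16 and one UTF-8 declaration and picks between them with `OperatingSystem.IsWindows()`. The class and method names are made up for illustration; `lstrlenW` and `strlen` were chosen only because they exist on the respective platforms. Both declarations set `ExactSpelling = true`, so the entry point name is used as written and no name probing takes place.

```csharp
using System;
using System.Runtime.InteropServices;

internal static class AutoEncodingSketch
{
    // Windows: UTF-16 string, exact entry point name.
    [DllImport("kernel32", EntryPoint = "lstrlenW", ExactSpelling = true)]
    private static extern int LengthUtf16([MarshalAs(UnmanagedType.LPWStr)] string s);

    // Unix: UTF-8 string, exact entry point name.
    [DllImport("libc", EntryPoint = "strlen", ExactSpelling = true)]
    private static extern nuint LengthUtf8([MarshalAs(UnmanagedType.LPUTF8Str)] string s);

    // Chooses the declaration at run time, mirroring what CharSet.Auto does implicitly.
    public static int NativeLength(string s) =>
        OperatingSystem.IsWindows()
            ? LengthUtf16(s)
            : checked((int)LengthUtf8(s));
}
```

Note that the two paths count different units (UTF-16 code units on Windows, UTF-8 bytes on Unix), so results only agree for ASCII input. That kind of platform difference is exactly what an explicit, per-declaration encoding makes visible.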
AaronRobinsonMSFT