Skip to content

Conversation

@huoyaoyuan
Copy link
Member

Replace with standard or shared ones.

Please confirm their behavior and performance characteristics are desired.

@ghost ghost added community-contribution Indicates that the PR has been added by a community member area-VM-coreclr labels Dec 16, 2023
size_t destLen = minipal_get_length_utf8_to_utf16(utf8str, sourceLen, 0);

HRESULT hr = FString::Utf8_Unicode_Length(utf8str, & allAscii, & length);
LPWSTR buffer = (LPWSTR) AllocThrows((length + 1) * sizeof(WCHAR));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be LPSTR (UTF8).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This statement is unchanged (with whitespace off).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

length -> destLen

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
LPWSTR buffer = (LPWSTR) AllocThrows((length + 1) * sizeof(WCHAR));
CHAR16_T* buffer = (CHAR16_T*) AllocThrows((length + 1) * sizeof(CHAR16_T));

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dn-u16.h uses WCHAR while minipal/utf-8 uses CHAR16_T. They are not treated as compatible although both typedef-d from wchar_t. Is any reason that compiler supported char16_t can't be used?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some platforms do not support char16_t, which is why we went with the least supported type to handle coreclr and mono simultaneously.

#ifdef TARGET_WINDOWS
typedef wchar_t CHAR16_T;
#else
typedef unsigned short CHAR16_T;
#endif

During this refactoring, it's better to directly allocate the target type; CHAR16_T for UTF16 or char for UTF8 to avoid further confusion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some platforms do not support char16_t,

I was asking for the exact platforms and GCC/Clang versions we need to target.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

macOS doesn't support char16_t so you can cross the whole IsApplePlatform series off the list. Our least supported versions support C11 and hence char16_t on Linux, so only OS headers are relevant here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

macOS doesn't support char16_t

Asked a friend to confirm that char16_t is supported with latest build tool on macOS. The mentions of macOS doesn't support char16_t are quite old. On cppreference, char16_t is marked as supported by Apple clang. There's also documentation about CChar16 in Swift doc.

Can you confirm again about this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in C mode, we had some problems. I don't remember the details offhand. This is the quick way to find out what doesn't work in the CI:

 #ifdef TARGET_WINDOWS
 typedef wchar_t CHAR16_T;
 #else
- typedef unsigned short CHAR16_T;
+ typedef char16_t CHAR16_T; 
 #endif 

(note that we'd still need CHAR16_T for Windows, so perhaps it's better to keep this investigation separate from this PR)

@am11
Copy link
Member

am11 commented Dec 16, 2023

Thanks for starting this work. Ideally we should remove the intermediate wrappers for string conversion and directly use minipal/utf8.h in coreclr/mono/corehost, including windows targets; but it will take a while to get there. (I started the grand change in a branch, but realized that it's best to do in batches to ease up the review like you are doing here)

Co-authored-by: Adeel Mujahid <[email protected]>
@huoyaoyuan
Copy link
Member Author

Now tests are passing. I need some assist about link failure on macos.

@am11
Copy link
Member

am11 commented Dec 26, 2023

I need some assist about link failure on macos.

Try main...am11:runtime:patch_gh-96099.

@huoyaoyuan
Copy link
Member Author

Hmm it is linked in Unix pal, but not minipal. Will it cause duplicated symbols on Linux?

@huoyaoyuan
Copy link
Member Author

Since the buffer will always be sufficient, the resultant error code could only be NO_UNICODE_TRANSLATION.
I assume using errno is desired since it's in latest minipal implementation.

Copy link
Member

@AaronRobinsonMSFT AaronRobinsonMSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@am11 anything else?

Copy link
Member

@am11 am11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@AaronRobinsonMSFT AaronRobinsonMSFT merged commit bc81c55 into dotnet:main Jan 4, 2024
@huoyaoyuan huoyaoyuan deleted the str-op branch January 5, 2024 02:12
@janvorli janvorli mentioned this pull request Jan 19, 2024
tommcdon added a commit to tommcdon/runtime that referenced this pull request Jan 21, 2024
@tommcdon tommcdon mentioned this pull request Jan 21, 2024
tommcdon added a commit that referenced this pull request Jan 21, 2024
* Revert "Forward minipal_get_length_utf16_to_utf8 and minipal_convert_utf16_to_utf8 from DAC (#97055)"

This reverts commit 1263107.

* Revert "Cleanup some string operating functions (#96099)"

This reverts commit bc81c55.
tmds pushed a commit to tmds/runtime that referenced this pull request Jan 23, 2024
* Revert "Forward minipal_get_length_utf16_to_utf8 and minipal_convert_utf16_to_utf8 from DAC (dotnet#97055)"

This reverts commit 1263107.

* Revert "Cleanup some string operating functions (dotnet#96099)"

This reverts commit bc81c55.
@github-actions github-actions bot locked and limited conversation to collaborators Feb 4, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-VM-coreclr community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants