-
Notifications
You must be signed in to change notification settings - Fork 14k
Create UTF-8 version of OsStr/OsString
#147932
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
r? @ibraheemdev rustbot has assigned @ibraheemdev. Use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good to me, just a few small suggestions.
r? tgross35
@rustbot author
|
Reminder, once the PR becomes ready for a review, use |
087647c to
7e2b76e
Compare
|
This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers. |
|
Unless WASI and/or Redox OS platform maintainers speak up here, I am creating separate PRs for those platforms for more focused attention. |
|
Thanks! @bors r+ rollup |
Rollup of 7 pull requests Successful merges: - #141445 (Add `FromIterator` impls for `ascii::Char`s to `String`s) - #142339 (Add NonNull pattern types) - #147768 (Code refactoring on hir report_no_match_method_error) - #147788 (const Cell methods) - #147932 (Create UTF-8 version of `OsStr`/`OsString`) - #147933 (os_str: Make platform docs more consistent) - #147948 (PassWrapper: Access GlobalValueSummaryInfo::SummaryList via getter for LLVM 22+) r? `@ghost` `@rustbot` modify labels: rollup
Rollup of 7 pull requests Successful merges: - rust-lang/rust#141445 (Add `FromIterator` impls for `ascii::Char`s to `String`s) - rust-lang/rust#142339 (Add NonNull pattern types) - rust-lang/rust#147768 (Code refactoring on hir report_no_match_method_error) - rust-lang/rust#147788 (const Cell methods) - rust-lang/rust#147932 (Create UTF-8 version of `OsStr`/`OsString`) - rust-lang/rust#147933 (os_str: Make platform docs more consistent) - rust-lang/rust#147948 (PassWrapper: Access GlobalValueSummaryInfo::SummaryList via getter for LLVM 22+) r? `@ghost` `@rustbot` modify labels: rollup
Rollup of 7 pull requests Successful merges: - rust-lang/rust#141445 (Add `FromIterator` impls for `ascii::Char`s to `String`s) - rust-lang/rust#142339 (Add NonNull pattern types) - rust-lang/rust#147768 (Code refactoring on hir report_no_match_method_error) - rust-lang/rust#147788 (const Cell methods) - rust-lang/rust#147932 (Create UTF-8 version of `OsStr`/`OsString`) - rust-lang/rust#147933 (os_str: Make platform docs more consistent) - rust-lang/rust#147948 (PassWrapper: Access GlobalValueSummaryInfo::SummaryList via getter for LLVM 22+) r? `@ghost` `@rustbot` modify labels: rollup
Rollup of 7 pull requests Successful merges: - rust-lang/rust#141445 (Add `FromIterator` impls for `ascii::Char`s to `String`s) - rust-lang/rust#142339 (Add NonNull pattern types) - rust-lang/rust#147768 (Code refactoring on hir report_no_match_method_error) - rust-lang/rust#147788 (const Cell methods) - rust-lang/rust#147932 (Create UTF-8 version of `OsStr`/`OsString`) - rust-lang/rust#147933 (os_str: Make platform docs more consistent) - rust-lang/rust#147948 (PassWrapper: Access GlobalValueSummaryInfo::SummaryList via getter for LLVM 22+) r? `@ghost` `@rustbot` modify labels: rollup
Create UTF-8 version of `OsStr`/`OsString` Implement a UTF-8 version of `OsStr`/`OsString`, in addition to the existing bytes and WTF-8 platform-dependent encodings. This is applicable for several platforms, but I've currently only implemented it for Motor OS: - WASI uses Unicode paths, but currently reexports the Unix bytes-assuming `OsStrExt`/`OsStringExt` traits. - [wasi:filesystem](https://wa.dev/wasi:filesystem) APIs: > Paths are passed as interface-type `strings`, meaning they must consist of a sequence of Unicode Scalar Values (USVs). Some filesystems may contain paths which are not accessible by this API. - In [wasi-filesystem#17](WebAssembly/wasi-filesystem#17 (comment)), it was decided that applications can use any Unicode transformation format, so we're free to use UTF-8 (and probably already do). This was chosen over specifically UTF-8 or an ad hoc encoding which preserves paths not representable in UTF-8. > The current API uses strings for filesystem paths, which contains sequences of Unicode scalar values (USVs), which applications can work with using strings encoded in UTF-8, UTF-16, or other Unicode encodings. > > This does mean that the API is unable to open files which do not have well-formed Unicode encodings, which may want separate APIs for handling such paths or may want something like the arf-strings proposal, but if we need that we should file a new issue for it. - As of Redox OS [0.7.0](https://www.redox-os.org/news/release-0.7.0/), "All paths are now required to be UTF-8, and the kernel enforces this". This appears to have been implemented in commit [d331f72f](https://gitlab.redox-os.org/redox-os/kernel/-/commit/d331f72f2a51fa577072f24bc2587829fd87368b) (Use UTF-8 for all paths, 2021-02-14). Redox does not have `OsStrExt`/`OsStringExt`. - Motor OS guarantees that its OS strings are UTF-8 in its [current `OsStrExt`/`OsStringExt` traits](https://github.com/moturus/rust/blob/a828ffcf5f04be5cdd91b5fad608102eabc17ec7/library/std/src/os/motor/ffi.rs), but they're still internally bytes like Unix. This is an alternate approach to rust-lang#147797, which reuses the existing bytes `OsString` and relies on the safety properties of `from_encoded_bytes_unchecked`. Compared to that, this also gains efficiency from propagating the UTF-8 invariant to the whole type, as it never needs to test for UTF-8 validity. Note that Motor OS currently does not build until rust-lang#147930 merges. cc `@tgross35` (for earlier review) cc `@alexcrichton,` `@rylev,` `@loganek` (for WASI) cc `@lasiotus` (for Motor OS) cc `@jackpot51` (for Redox OS)
…iaskrgr Rollup of 7 pull requests Successful merges: - rust-lang#141445 (Add `FromIterator` impls for `ascii::Char`s to `String`s) - rust-lang#142339 (Add NonNull pattern types) - rust-lang#147768 (Code refactoring on hir report_no_match_method_error) - rust-lang#147788 (const Cell methods) - rust-lang#147932 (Create UTF-8 version of `OsStr`/`OsString`) - rust-lang#147933 (os_str: Make platform docs more consistent) - rust-lang#147948 (PassWrapper: Access GlobalValueSummaryInfo::SummaryList via getter for LLVM 22+) r? `@ghost` `@rustbot` modify labels: rollup
Create UTF-8 version of `OsStr`/`OsString` Implement a UTF-8 version of `OsStr`/`OsString`, in addition to the existing bytes and WTF-8 platform-dependent encodings. This is applicable for several platforms, but I've currently only implemented it for Motor OS: - WASI uses Unicode paths, but currently reexports the Unix bytes-assuming `OsStrExt`/`OsStringExt` traits. - [wasi:filesystem](https://wa.dev/wasi:filesystem) APIs: > Paths are passed as interface-type `strings`, meaning they must consist of a sequence of Unicode Scalar Values (USVs). Some filesystems may contain paths which are not accessible by this API. - In [wasi-filesystem#17](WebAssembly/wasi-filesystem#17 (comment)), it was decided that applications can use any Unicode transformation format, so we're free to use UTF-8 (and probably already do). This was chosen over specifically UTF-8 or an ad hoc encoding which preserves paths not representable in UTF-8. > The current API uses strings for filesystem paths, which contains sequences of Unicode scalar values (USVs), which applications can work with using strings encoded in UTF-8, UTF-16, or other Unicode encodings. > > This does mean that the API is unable to open files which do not have well-formed Unicode encodings, which may want separate APIs for handling such paths or may want something like the arf-strings proposal, but if we need that we should file a new issue for it. - As of Redox OS [0.7.0](https://www.redox-os.org/news/release-0.7.0/), "All paths are now required to be UTF-8, and the kernel enforces this". This appears to have been implemented in commit [d331f72f](https://gitlab.redox-os.org/redox-os/kernel/-/commit/d331f72f2a51fa577072f24bc2587829fd87368b) (Use UTF-8 for all paths, 2021-02-14). Redox does not have `OsStrExt`/`OsStringExt`. - Motor OS guarantees that its OS strings are UTF-8 in its [current `OsStrExt`/`OsStringExt` traits](https://github.com/moturus/rust/blob/a828ffcf5f04be5cdd91b5fad608102eabc17ec7/library/std/src/os/motor/ffi.rs), but they're still internally bytes like Unix. This is an alternate approach to rust-lang#147797, which reuses the existing bytes `OsString` and relies on the safety properties of `from_encoded_bytes_unchecked`. Compared to that, this also gains efficiency from propagating the UTF-8 invariant to the whole type, as it never needs to test for UTF-8 validity. Note that Motor OS currently does not build until rust-lang#147930 merges. cc `@tgross35` (for earlier review) cc `@alexcrichton,` `@rylev,` `@loganek` (for WASI) cc `@lasiotus` (for Motor OS) cc `@jackpot51` (for Redox OS)
…iaskrgr Rollup of 7 pull requests Successful merges: - rust-lang#141445 (Add `FromIterator` impls for `ascii::Char`s to `String`s) - rust-lang#142339 (Add NonNull pattern types) - rust-lang#147768 (Code refactoring on hir report_no_match_method_error) - rust-lang#147788 (const Cell methods) - rust-lang#147932 (Create UTF-8 version of `OsStr`/`OsString`) - rust-lang#147933 (os_str: Make platform docs more consistent) - rust-lang#147948 (PassWrapper: Access GlobalValueSummaryInfo::SummaryList via getter for LLVM 22+) r? `@ghost` `@rustbot` modify labels: rollup
Implement a UTF-8 version of
OsStr/OsString, in addition to the existing bytes and WTF-8 platform-dependent encodings.This is applicable for several platforms, but I've currently only implemented it for Motor OS:
OsStrExt/OsStringExttraits.OsStrExt/OsStringExt.OsStrExt/OsStringExttraits, but they're still internally bytes like Unix.This is an alternate approach to #147797, which reuses the existing bytes
OsStringand relies on the safety properties offrom_encoded_bytes_unchecked. Compared to that, this also gains efficiency from propagating the UTF-8 invariant to the whole type, as it never needs to test for UTF-8 validity.Note that Motor OS currently does not build until #147930 merges.
cc @tgross35 (for earlier review)
cc @alexcrichton, @rylev, @loganek (for WASI)
cc @lasiotus (for Motor OS)
cc @jackpot51 (for Redox OS)