Normalize ident #66670
r? @davidtwco (rust_highfive has picked a reviewer for you, use r? to override) |
r? @estebank |
cc @Manishearth |
lgtm from a unicode standpoint
hopefully there are no other places where we create Symbols?
cc @petrochenkov re. Symbols |
912227a to 0e3036b
At least there's one in the |
@Manishearth shouldn't we have a warning when encountering non-NFC chars? Shouldn't we, for the sake of third-party tools, attempt to "soft-enforce" byte-comparable tokens? These questions are possibly unrelated to this PR in particular, and I should have brought them up in the mega-thread. |
The PR LGTM, but I can foresee issues cropping up when involving macros. |
@estebank no, non-NFC chars appear for all kinds of reasons |
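To make the byte-comparison concern concrete, here is a minimal Python sketch. It is an illustration only: this is not the compiler's code, and the identifier is made up. It shows two spellings of the same identifier that differ byte-for-byte but compare equal once both go through NFC, using the standard library's `unicodedata.normalize`.

```python
import unicodedata

# Two spellings of the same (made-up) identifier "café":
precomposed = "caf\u00e9"   # 'é' as a single code point, U+00E9
decomposed = "cafe\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT


def nfc(ident: str) -> str:
    """Return the NFC form of an identifier."""
    return unicodedata.normalize("NFC", ident)


# The raw strings differ, so a byte-wise comparison treats them as distinct.
raw_equal = precomposed.encode("utf-8") == decomposed.encode("utf-8")

# After NFC both spellings collapse to the same code point sequence.
normalized_equal = nfc(precomposed) == nfc(decomposed)

print(raw_equal, normalized_equal)
```

A tool that compares tokens byte-wise without normalizing first would see two different names here, which is the situation the question about "byte-comparable tokens" is worried about.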
@bors try @rust-timer queue |
Awaiting bors try build completion |
Normalize ident Perform unicode normalization on identifiers. Resolving the first bullet point in #55467.
☀️ Try build successful - checks-azure |
Queued ce0c4f6 with parent a44774c, future comparison URL. |
Mmm, is the timer task still queued? |
@bors try @rust-timer queue |
@crlf0710: 🔑 Insufficient privileges: not in try users |
Insufficient permissions to issue commands to rust-timer. |
@bors try @rust-timer queue |
Awaiting bors try build completion |
🔒 Merge conflict: This pull request and the master branch diverged in a way that cannot be automatically merged. Please rebase on top of the latest master branch, and let the reviewer approve again. You may also read Git Rebasing to Resolve Conflicts by Drew Blessing for a short tutorial. Please avoid the "Resolve conflicts" button on GitHub. |
49f3bc9 to 27e7a1b
Rebased and added the dependency according to @Mark-Simulacrum's instructions. @estebank |
@bors r+ |
📌 Commit 27e7a1b has been approved by |
☀️ Test successful - checks-azure |
I have two questions / concerns:
|
Following the RFC, since we're using NFC normalization, I think the answer to the first question is "no it's not possible". NFC normalization is basically reordering and regrouping, so nothing really disappears. |
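A small Python sketch of the "reordering and regrouping" point, as an illustration only (the sample strings are my own, not taken from the RFC). Combining marks typed in different orders end up in one canonical order, a base letter plus a mark may be regrouped into a single code point, and each non-empty sample stays non-empty.

```python
import unicodedata


def nfc(s: str) -> str:
    """Return the NFC form of a string."""
    return unicodedata.normalize("NFC", s)


# Reordering: the same two combining marks typed in either order.
# U+0301 is COMBINING ACUTE ACCENT, U+0323 is COMBINING DOT BELOW.
acute_then_dot = "a\u0301\u0323"
dot_then_acute = "a\u0323\u0301"
same_after_nfc = nfc(acute_then_dot) == nfc(dot_then_acute)

# Regrouping: 'e' + COMBINING ACUTE ACCENT becomes the single code point U+00E9.
regrouped = nfc("e\u0301")

# Nothing disappears: each sample is still non-empty after normalization.
samples = [acute_then_dot, dot_then_acute, "e\u0301", "x"]
all_non_empty = all(len(nfc(s)) > 0 for s in samples)
```

This only checks a handful of samples, so it demonstrates the behaviour rather than proving it in general.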
For the second question, I think that's a question for a potential "shepherd" for RFC 2457. Currently we don't have one, but it would be best if there were one. cc #55467 In my personal opinion, though, I think it's not too wrong to gate at the AST after macro expansion, since macros and procedural macros can generate new identifiers, which need to be gated too. I basically know nothing about the |
By the way, I believe there is some remaining work on applying this normalization to the user data sent to
I think I need some mentoring here. @petrochenkov could you give some instructions when you have time? I tried to read the code in |
@crlf0710 It's emitted twice in So, the normalization (and also gating, arguably) needs to happen in the same places. |
@petrochenkov do you mean https://github.com/rust-lang/rust/blob/master/src/libsyntax_expand/proc_macro_server.rs#L330 ? I'm a little confused. After normalization, if the identifier changes, should I intern, get a new |
@crlf0710 |
Great, let me create a branch and PR it. |
By the way, the RFC says the normalization is performed during parsing rather than lexing. On the other hand, if the normalization is performed during lexing, that would be the first case in which we do not preserve original tokens by design (except for a big hack with doc comments in macros). |
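The difference between the two stages can be sketched in Python as follows. This is a toy model, not rustc's lexer or parser: `Token`, `lex_idents`, and `parse_ident` are hypothetical names, and the whitespace-splitting "lexer" is only a stand-in. The lexer keeps each identifier's original spelling, and normalization happens only when the parser turns a token into a name.

```python
import unicodedata
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    # Original spelling, exactly as it appeared in the source text.
    text: str


def lex_idents(source: str) -> List[Token]:
    """Toy lexer (hypothetical): split on whitespace, keep spellings untouched."""
    return [Token(word) for word in source.split()]


def parse_ident(token: Token) -> str:
    """Toy parser step (hypothetical): normalize to NFC when producing a name."""
    return unicodedata.normalize("NFC", token.text)


source = "caf\u00e9 cafe\u0301"
tokens = lex_idents(source)
names = [parse_ident(t) for t in tokens]

# The tokens still carry two different original spellings...
tokens_differ = tokens[0].text != tokens[1].text
# ...while the parsed names refer to the same identifier.
names_match = names[0] == names[1]
```

If `lex_idents` normalized instead, the second token's original spelling would no longer be recoverable from the token stream, which is the loss of original tokens described above.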
…henkov Add symbol normalization for proc_macro_server. Follow up for rust-lang#66670, finishing the first bullet point in rust-lang#55467. r? @petrochenkov
Perform unicode normalization on identifiers. Resolving the first bullet point in #55467.