- 
                Notifications
    You must be signed in to change notification settings 
- Fork 471
Fix unhandled cases for exotic idents #6658
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
e875735    to
    b7269a6      
    Compare
  
    406746c    to
    c582cf4      
    Compare
  
    
      
        
              This comment was marked as resolved.
        
        
      
    
  This comment was marked as resolved.
| @zth @cristianoc can you review this? these changes have a quite broad impact on other modules and may need to be understood by other compiler devs. | 
| let buffer = Buffer.create 20 in | ||
| let startPos = position scanner in | ||
| let startOff = scanner.offset in | ||
| let closed = ref false in | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really necessary to implement with closed? It seems like I only need to wrap \" and " at the end of my existing implementation if it's not an infix operator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes a path to skip when a token seq like \\\"\n is entered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't \\\"\n catched by this existing line?
| '\n' | '\r' ->
      (* line break *)
      let endPos = position scanner in
      scanner.err ~startPos ~endPos
        (Diagnostics.message "A quoted identifier can't contain line breaks.");
      next scannerThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then it will possibly be unsafe to substring
It could obviously be replaced with unwrap_ident, but it has the same kind of assertion exists in it. I did it this way because it is simpler. There is no reason for the scanner to be more inefficient.
| I don't have bandwidth to review this right now, sorry for keeping you waiting. @cristianoc you got time to review this? | 
7f35700    to
    5d38f55      
    Compare
  
    Since infix operators are weird constraints, scanner shouldn't touch it. Therefore, the printer still needs to distinguish whether the given ident is infix-like or not.
| cc @cristianoc we need your eyes on this. | 
cbc01bd    to
    81f544b      
    Compare
  
    | No worry, I just requested his review in person | 
Duplicates rescript-lang#6658, but with much smaller changes As follows @cristianoc review, I tried to reduce possible affected surface to only exotic uident.
Duplicates rescript-lang#6658, but with much smaller changes Reflecting @cristianoc ' review, I tried to reduce possible affected surface to only exotic uident.
Duplicates rescript-lang#6658, but with much smaller changes Reflecting on @cristianoc ' review, I tried to reduce possible affected surface to only exotic uident.
| Close this in favor of #6777 But avoiding to use  | 
Duplicates #6658, but with much smaller changes Reflecting on @cristianoc ' review, I tried to reduce possible affected surface to only exotic uident.
Changes
Note this is an important feature for Gentype(#6196) and Library Mode(#6210), which are assumed to be used by users on the JS side.
Rationale
In OCaml AST, there are two separate identifier classes Lident(lowercase ident) and Uident(uppercase ident), depending on whether the first letter of the identifier is lowercase or uppercase.
This is useful internally to distinguish whether the identifier is for a module or a variable/type binding.
e.g. https://github.com/rescript-lang/rescript-compiler/blob/17d383f/jscomp/ml/path.ml#L87-L91
ReScript supports exotic labels for defining bindings starting with uppercase letters. The ReScript parser removes the surrounding wrapper when parsing exotic labels and forces it to be a Lident node.
So this produces
Lident(name = "Upper")node in OCaml AST.However, the OCaml type system uses the previously mentioned
is_uidentfor path analysis. As a result, exotic idents that start with an uppercase letter are unintentionally treated as modules internally. These are cases OCaml does not assume, and it ends up raising invariant errors deep in the runtime. (#6539)OCaml itself has no knowledge of exotic ident, so it cannot determine whether the original declaration was exotic or not. For compatibility reasons, the AST cannot be changed to support it.
In the compiler, knowledge of exotic idents existed only in the printer. Here again, since there is no information passed from the grammar, some heavy logic (e.g.
printIdentLike) is needed everywhere to recheck whether a label should be treated as exotic or not.This change solves the problem by introducing several assumptions.
When parsing exotic idents, the parser passes the surrounding chars as-is. So OCaml will treat it as a Lident too (because
'\' <> A .. Z), and we can easily recheck whether an ident name is treated as exotic or not by checking whether the first letter is the backslash.Extoic uident is explicitly not supported (e.g.
module \"lowercase") . There is no way to distinguish between uppercase lident and lowercase uident without breaking change on AST.It's not the parser's job to provide "pretty labels", so we'll have to do that elsewhere where we need it. For example, the printer normalizes names using the
ext_ident.unwrap_exoticutility where necessary.