- 
                Notifications
    You must be signed in to change notification settings 
- Fork 1.1k
Make CheckUnused not slow. #20321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make CheckUnused not slow. #20321
Conversation
933d8fa    to
    9ad7463      
    Compare
  
    It is used for every single tree in `CheckUnused`, so this is worth it.
It is not efficient when the results are always used exactly once.
`Tree`s have structural equality. Even if `==` should be able to exit quickly either because of `eq` or an early difference, sets systematically call `hashCode`, which is going to recurse into the entire structure.
It is pointless to sort a list before converting it into a Set.
9ad7463    to
    6587ab4      
    Compare
  
    | Sure, I will review this today. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The refactor looks good to me. I added some minor suggestions that might make the code easier to read.
Sorry for taking so long with the review, the end of last week was a public holiday in Poland.
| val selector = selData.selector | ||
|  | ||
| if !selector.isWildcard then | ||
| if !altName.forall(explicitName => selector.rename == explicitName.toTermName) then | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be changed to altName.exists(explicitName => selector.rename != explicitName.toTermName)? (It would read like the comment then)
| // If the symbol is accessible in this scope without an import, do not register it for unused import analysis | ||
| val notForImport1 = | ||
| notForImport | ||
| || (!name.exists(_.toTermName != sym.name.toTermName) && sym.isAccessibleAsIdent) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly to isInImport: can this first part be name.forall(_.toTermName == sym.name.toTermName)? (Less negation should make it simpler)
| if !isConstructorOfSynth(sym) && !doNotRegister(sym) then | ||
| if sym.isConstructor && sym.exists then | ||
| registerUsed(sym.owner, None) // constructor are "implicitly" imported with the class | ||
| def registerUsed(sym: Symbol, name: Option[Name], notForImport: Boolean = false, isDerived: Boolean = false)(using Context): Unit = | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the logic harder to follow because of the not in notForImport, can we flip the value of this flag to make it mean "possibly form an import"?
Instead of dealing with entire `tpd.Import`s at the end of the scope, we eagerly flatten them into individual `ImportSelector`s. We store them along with some data, including a mutable flag for whether a selector has been used. This allows to dramatically simplify `isInImport`, as well as more aggressively cache the resolution of selectors. We also get rid of the `IdentityHashMap`. The algorithm is still O(n*m) where n is the number of imports in a scope, and m the number of references found in that scope. It is not entirely clear to me whether the previous logic was already O(n*m) or worse (it may have included an additional p factor for the number of possible selections from a given qualifier). Regardless, it is already quite a bit faster than before, thanks to smaller constant factors.
That test does not rely on any information dependent on the import selectors. It only relies on information at the use site. Eagerly checking it means we do not put as many symbols into the `usedInScope` set, which is good because it is one of the complexity factors of the unused-imports analysis.
6587ab4    to
    8553bfc      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, glad to see the logic can be simplified significantly.
| @sjrd thank you very much for your work on this. You're going to make a lot of people happy with this! Out of curiosity, do you happen to know what release will these changes be included in? | 
| 3.5.0-RC1 | 
Backports #20321 to the LTS branch. PR submitted by the release tooling. [skip ci]
It doesn't mean that it's fast yet, but it is already a significant step in that direction. In particular, this goes in the direction of addressing #19671.
The most important commit is "Simplify the logic for checking unused imports.", whose commit message follows:
Instead of dealing with entire
tpd.Imports at the end of the scope, we eagerly flatten them into individualImportSelectors. We store them along with some data, including a mutable flag for whether a selector has been used.This allows to dramatically simplify
isInImport, as well as more aggressively cache the resolution of selectors. We also get rid of theIdentityHashMap.The algorithm is still
O(n*m)where n is the number of imports in a scope, and m the number of references found in that scope. It is not entirely clear to me whether the previous logic was alreadyO(n*m)or worse (it may have included an additionalpfactor for the number of possible selections from a given qualifier).Regardless, it is already quite a bit faster than before, thanks to smaller constant factors.