perf: Cache trait solving across queries in the same revision #20527
Conversation
Again, I'm +1 on this, as it would improve performance without too much regression in memory spikes in typical use-case scenarios, unless you open all the files across the project without editing anything.
The test failure is because different DBs get allocated the exact same address... I wonder what's the best way to solve this. Salsa uses a per-db nonce, but it's not exposed. We could make our own, but it's not nice...
Caching trait solving can do a lot for speed. Unfortunately, it also consumes a huge amount of memory. Therefore, as part of the migration to the new solver, Jack Huey disabled caching of trait solving (he made the query transparent).

This PR proposes a middle ground: do cache trait solving, but only within the same revision. This is safe because the inputs cannot change during a revision.

The result is hopefully much better performance for features that do a bulk of trait solving and also repeat the same queries (e.g. inference followed by IDE features).

There is another limitation: results are only cached within the same thread, to remove the need for synchronization, which would be expensive. More measurements are required to check whether it would be better to use a synchronized global cache, or perhaps to stay with a thread-local cache but batch multiple feature requests (highlighting, inlay hints, etc.) for the same file to the same thread.

Alongside the actual cache we store the revision, because we need to verify it (we can't eagerly clear caches when incrementing the revision), and also the address of the db, to prevent multiple dbs from interleaving (this is mostly relevant in tests, although injected highlighting also uses a new db, so it may be better to move it to a separate thread).

This "games" analysis-stats into being both way faster and using way more memory: the former because analysis-stats doesn't increment revisions, so all queries share the cache and the hit ratio is unrealistically good; the latter because, without revision increments, the cache is never cleared. Neither is representative of a typical IDE scenario.
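To make the scheme concrete, here is a minimal sketch of a revision-scoped, thread-local solve cache. Every name here (`SolverCache`, `solve_cached`, `db_addr`) and the goal/solution types are illustrative stand-ins, not the actual rust-analyzer or salsa API:

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Illustrative stand-ins; the real rust-analyzer types differ.
type Revision = u64;
type Goal = String;
type Solution = bool;

/// Per-thread cache, valid only for a single (db, revision) pair.
struct SolverCache {
    /// Address of the db, so entries from different dbs never interleave.
    db_addr: usize,
    revision: Revision,
    entries: HashMap<Goal, Solution>,
}

thread_local! {
    static CACHE: RefCell<Option<SolverCache>> = RefCell::new(None);
}

fn solve_cached(
    db_addr: usize,
    revision: Revision,
    goal: Goal,
    solve: impl FnOnce(&Goal) -> Solution,
) -> Solution {
    CACHE.with(|cache| {
        let mut cache = cache.borrow_mut();
        // Lazy invalidation: nothing runs when the revision is bumped, so
        // every lookup verifies the stored db and revision, and rebuilds
        // the cache when either has changed.
        let valid = matches!(
            &*cache,
            Some(c) if c.db_addr == db_addr && c.revision == revision
        );
        if !valid {
            *cache = Some(SolverCache {
                db_addr,
                revision,
                entries: HashMap::new(),
            });
        }
        let entries = &mut cache.as_mut().unwrap().entries;
        if let Some(&solution) = entries.get(&goal) {
            return solution;
        }
        let solution = solve(&goal);
        entries.insert(goal, solution);
        solution
    })
}
```

Because the cache lives in a `thread_local!`, lookups need no synchronization at all; the price is that each thread warms its own cache independently.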
Force-pushed from 79803a7 to 1c8a07c
I went ahead and solved it by making our own nonce. Not pretty, but it works.
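For illustration only (this is not the code from the PR), such a nonce can be minted from a global atomic counter:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// A process-unique id handed out once per database instance, so a cache
/// entry from a dropped db can never be mistaken for one belonging to a
/// new db that happens to be allocated at the same address.
#[derive(Clone, Copy, PartialEq, Eq)]
struct DbNonce(u32);

impl DbNonce {
    fn new() -> DbNonce {
        static COUNTER: AtomicU32 = AtomicU32::new(0);
        // Relaxed is enough: we only need uniqueness, not ordering.
        DbNonce(COUNTER.fetch_add(1, Ordering::Relaxed))
    }
}
```

With something like this, the cache can validate a nonce instead of the raw db address used in the earlier sketch.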
@rust-lang/rust-analyzer What is your opinion on this? @ShoyuVanilla already said they want this.
Yes, I've tried working with next-gen-solver r-a and it's quite unworkable due to so many hangs. I haven't tried this PR's branch yet - will do tomorrow 😅 - but I strongly agree that we need some action like this. Edit: I think the best approach is to find out what's going on with borrowck, as in this comment: #20503 (comment)
@rust-lang/rust-analyzer This is a second call. If I don't hear any objections, I'll merge this, based on the approval from @ShoyuVanilla and because the current lack of caching renders r-a essentially unusable.
I'm okay with doing this for now; we can revisit other caching strategies later on, if need be.
@davidbarsky FWIW, I don't think we could find a fundamentally better caching strategy. I see some possible strategies:
However, I do think it's possible to optimize more within this strategy, for example by batching requests for the same file to the same thread, as I said above.
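As a rough sketch of that batching idea (everything here, including `FileShardedPool`, is hypothetical), a pool could hash the file id to pick a worker, so all requests for one file land on the same thread and share its solver cache:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::mpsc;
use std::thread;

type FileId = u32;
type Task = Box<dyn FnOnce() + Send>;

/// Hypothetical pool that routes all tasks for the same file to the same
/// worker thread, so they share that thread's solver cache.
struct FileShardedPool {
    senders: Vec<mpsc::Sender<Task>>,
}

impl FileShardedPool {
    fn new(workers: usize) -> FileShardedPool {
        let senders = (0..workers)
            .map(|_| {
                let (tx, rx) = mpsc::channel::<Task>();
                // Detached worker: runs tasks until its sender is dropped.
                thread::spawn(move || {
                    for task in rx {
                        task();
                    }
                });
                tx
            })
            .collect();
        FileShardedPool { senders }
    }

    fn spawn(&self, file: FileId, task: Task) {
        let mut hasher = DefaultHasher::new();
        file.hash(&mut hasher);
        let shard = (hasher.finish() as usize) % self.senders.len();
        // All requests for `file` (highlighting, inlay hints, ...) land on
        // the same worker, maximizing thread-local cache hits.
        self.senders[shard].send(task).unwrap();
    }
}
```

This would keep the synchronization-free thread-local design while letting highlighting, inlay hints, and other features for the same file reuse each other's solve results.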
This feels like a band-aid (especially the thread-local caches), but I think we need it. We'll probably have to find a better way to evaluate memory usage, though.
I agree it's kind of a band-aid, but it's not that bad (just a few LOC), and the gains are huge.
We can clear the solver cache before measuring memory, just like we bump the revision to trigger LRU eviction.
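Reusing the hypothetical thread-local `CACHE` from the earlier sketch, such a measurement hook could be as simple as the following; note it only clears the calling thread's cache, so a real version would have to signal every worker thread:

```rust
/// Hypothetical hook run right before a memory measurement, mirroring how
/// the revision is bumped to trigger LRU eviction. Clears only the calling
/// thread's cache.
fn clear_solver_cache_for_measurement() {
    CACHE.with(|cache| *cache.borrow_mut() = None);
}
```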
ShoyuVanilla left a comment
Since there are no objections, I’ll go ahead and merge this. Given that we’re pausing stable releases, I think we should try out many experiments to improve rust-analyzer before we resume.