-
Notifications
You must be signed in to change notification settings - Fork 13.9k
interpret: Unify projections for MPlaceTy, PlaceTy, OpTy #114011
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The Miri subtree was changed cc @rust-lang/miri Some changes occurred to the CTFE / Miri engine cc @rust-lang/miri Some changes occurred to the CTFE / Miri engine cc @rust-lang/miri |
src/tools/miri/src/helpers.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This loop is unfortunate, it makes aggregate traversal n^2 in the number of fields. But at least this only affects SB (and I doubt it is a limiting factor there). I didn't find a way to avoid this without introducing overhead for the common case (where we don't care about the order). The only nice way I came up with required return-position impl trait in a trait.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't entirely understand why we hooked visit_aggregate for these optimizations, they seem entirely implementable by hooking visit_value...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is wild, took me quite a while to debug. It is triggered in tests/ui/const_prop/issue-86351.rs.
c996252 to
f3f4cf4
Compare
|
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
|
⌛ Trying commit f3f4cf470cd07fb0c53a261205edd3665d230038 with merge 5bc60a1ed7d987a7881444c4e80ed9ed697acd95... |
This comment has been minimized.
This comment has been minimized.
f3f4cf4 to
7338980
Compare
This comment has been minimized.
This comment has been minimized.
7338980 to
2c296dc
Compare
|
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
|
⌛ Trying commit 2c296dcc079483d1cbc5845068f75b5645f16a72 with merge a367073b0b085658182079202c18c2b339640bbf... |
This comment has been minimized.
This comment has been minimized.
2c296dc to
a0631dc
Compare
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This failed somewhere during bootstrap. Sadly I don't know how to reproduce it.
This comment has been minimized.
This comment has been minimized.
|
Some changes occurred in src/tools/clippy cc @rust-lang/clippy |
|
@bors try |
|
The command can be found in the details of every perf run: in https://perf.rust-lang.org/detailed-query.html?commit=945ab669165d66b3ab6088bece01844e36f6a029&base_commit=fc8a3e357a0a5e317132e5ff8858ec70970fb07a&benchmark=ctfe-stress-5-opt&scenario=full it says "Local Profile (diff)", that command can then be run within the https://github.com/rust-lang/rustc-perf repo (after building it in release mode) |
This comment has been minimized.
This comment has been minimized.
8187dc4 to
d042263
Compare
|
Okay, thanks. I can't promise when I will have the time to take a look at that. |
|
Here's the entire cachegrind diff for the perf run I linked above: so not very helpful I guess. Optimizer noise? Or actually new work being done? |
|
Thanks! eval_place became more expensive? That is very strange. I would have expected it to become cheaper since it doesn't need to write_immediate_no_validate is now doing an extra case distinction (but that's just a discriminant read so should be cheap) and it is doing the force_allocation that previously were done during projection, so that one becoming more expensive is somewhat expected. It should just be balanced by projection becoming cheaper... |
d042263 to
d127600
Compare
|
@bors r+ Perf is mixed, and I'd rather have the CTFE stress test regress a bit while generally having improvements, and massively improving the CTFE logic. |
|
☀️ Test successful - checks-actions |
|
Finished benchmarking commit (4fc6b33): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDNext Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 651.438s -> 650.083s (-0.21%) |
|
Given that we have a bunch of improvements, in particular some on primary benchmarks, and given the code quality improvements, I am totally fine with accepting this perf loss on the ctfe stress test. |
For ~forever, we didn't really have proper shared code for handling projections into those three types. This is mostly because
PlaceTyprojections require&mut self: they might have toforce_allocateto be able to represent a project part-way into a local.This PR finally fixes that, by enhancing
Place::Localwith anoffsetso that such an optimized place can point into a part of a place without having requiring an in-memory representation. If we later write to that place, we will still doforce_allocate-- for now we don't have an optimized path inwrite_immediatethat would avoid allocation for partial overwrites of immediately stored locals. But inwrite_immediatewe have&mut selfso at least this no longer pollutes all our type signatures.(Ironically, I seem to distantly remember that many years ago,
Place::Localdid have anoffset, and I removed it to simplify things. I guess I didn't realize why it was so useful... I am also not sure if this was actually used to achieve place projection on&selfback then.)The
offsethad typeOption<Size>, whereNonerepresent "no projection was applied". This is needed because locals can be unsized (when they are arguments) butPlace::Localcannot store metadata: if the offset isNone, this refers to the entire local, so we can use the metadata of the local itself (which must be indirect); if a projection gets applied, since the local is indirect, it will turn into aPlace::Ptr. (Note that even for indirect locals we can havePlace::Local: when the local appears in MIR, we always start withPlace::Local, and only checkframe.localslater. We could eagerly normalize toPlace::Ptrbut I don't think that would actually simplify things much.)Having done all that, we can finally properly abstract projections: we have a new
Projectabletrait that has the basic methods required for projecting, and then all projection methods are implemented for anything that implements that trait. We can even implement it forImmTy! (Not that we need that, but it seems neat.) The visitor can be greatly simplified; it doesn't need its own trait any more but it can use theProjectabletrait. We also don't need the separateMutvisitor any more; that was required only to reflect that projections onPlaceTyneeded&mut self.It is possible that there are some more
&mut selfthat can now become&self... I guess we'll notice that over time.r? @oli-obk