Description
When struct copies are transformed into block copies, it can lead to store-forwarding stalls if the block copy reads padding that was never written. #96524 and #100750 (comment) are examples that show some of the potential cost of these stalls. #99835 (comment) has some discussion as well.
It's possible to generate these struct copies without accessing any padding, but only at the expense of larger code, which is probably slower if the source was also written as a block op. It is therefore not clear what the right trade-off is.
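To make the two copy strategies concrete, here is a minimal C sketch. The `Padded` struct and the helper names are hypothetical and only model the shape of the problem; they are not what the JIT emits. It assumes a typical 64-bit ABI where `uint64_t` is 8-byte aligned, so 4 bytes of padding follow `tag`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct. On typical 64-bit ABIs, 4 padding bytes
   sit between `tag` and `value` (an assumption, not a guarantee). */
typedef struct {
    uint32_t tag;
    uint64_t value;
} Padded;

/* Block copy: reads and writes all sizeof(Padded) bytes,
   including padding that the producer may never have written. */
static void copy_block(Padded *dst, const Padded *src) {
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof *src);
}

/* Field-wise copy: touches only the bytes that belong to fields,
   at the cost of one load/store pair per field. */
static void copy_fields(Padded *dst, const Padded *src) {
    assert(dst != NULL && src != NULL);
    dst->tag = src->tag;
    dst->value = src->value;
}

/* Number of bytes in the struct that belong to no field. */
static size_t padding_bytes(void) {
    return sizeof(Padded) - sizeof(uint32_t) - sizeof(uint64_t);
}
```

Both helpers produce the same field values; they differ only in whether the padding bytes are read. A compiler may lower `copy_block` to one or two wide moves, which is the form that can overlap narrower earlier stores to `tag` and `value`.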
I am also not sure how good CPUs are at reconstructing the source from several stores. For example, if we wrote both Span<T>._reference and Span<T>._length as 8 bytes, would a 16-byte SIMD read still stall? If it doesn't, then perhaps we could alleviate some issues by cheaply extending some stores to cover padding as well.
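The two access patterns in question can be sketched in C as follows. `SpanLike` and `Tagged` are hypothetical stand-ins (the real `Span<T>` layout is not reproduced here), and the sketch assumes a 64-bit little-endian target where `value` sits at offset 8. It only shows the shape of the stores and loads; it does not measure whether a stall occurs, and a C compiler is free to lower the `memcpy` calls differently.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a (reference, length) pair,
   with the length widened to 8 bytes. */
typedef struct {
    void   *reference;
    int64_t length;
} SpanLike;

/* Two separate 8-byte stores on a 64-bit target. */
static void write_fields(SpanLike *s, void *ref, int64_t len) {
    s->reference = ref;
    s->length = len;
}

/* One 16-byte read of the whole struct on a 64-bit target;
   compilers often lower this to a single SIMD load and store. */
static void read_wide(SpanLike *dst, const SpanLike *src) {
    memcpy(dst, src, sizeof *src);
}

/* Hypothetical struct with 4 padding bytes after `tag`
   (assumes 8-byte alignment of uint64_t). */
typedef struct {
    uint32_t tag;
    uint64_t value;
} Tagged;

/* Widened store: writes `tag` plus the 4 padding bytes in one
   8-byte store, so no byte of the struct is left unwritten.
   Assumes little-endian layout and offsetof(Tagged, value) == 8. */
static void store_tag_widened(Tagged *t, uint32_t tag) {
    uint64_t wide = tag;            /* upper 32 bits are zero */
    memcpy(t, &wide, sizeof wide);  /* covers bytes 0..7 */
}
```

With the widened store, a later block copy of `Tagged` reads only bytes that were written by one of two 8-byte stores. Whether that is enough to avoid the stall is exactly the open question above.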
cc @dotnet/jit-contrib @stephentoub