-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Optimize FMA codegen base on the overwritten #58196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsThis is for #12984. @kunalspathak @tannergooding, thanks!
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some questions and suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some comments. Did you run the superpmi asmdiff?
No, I haven't. What is this for? |
Pretty much all Jit changes are run through diffs (and SPMI is probably the most convenient tool for getting them), so that we can asses the impact on the generated code and how much existing test coverage we have. |
@SingleAccretion, I ran the SuperPMI.py with asmdiffs and saw quite some errors, most of which are from "JIT.HardwareIntrinsics.Arm.Helpers:FPRSqrtStepFused(float,float):float" or other similar tests. How can I find these tests to debug? And, it's very confusing that my change is suppose to only work for xarch, why Arm tests are complaining. |
@weilinwa One of the nicest things with SPMI is that it makes debugging easy. When you encountered errors (I presume asserts), the tool should've printed a "reproduction command", with the path to the native SPMI executable and a list of
I am not sure why that is either. |
@SingleAccretion , I got the "Error: no baseline JIT found" when run the |
Yes. I believe we only have prebuilt Jits for the |
Correct way to use this is: This will do |
|
@kunalspathak @tannergooding, I've modified the code logic to check different Asm diffsbenchmarks.run.windows.x64.checked.mch:Detail diffscoreclr_tests.pmi.windows.x64.checked.mch:Detail diffslibraries.pmi.windows.x64.checked.mch:Detail diffslibraries_tests.pmi.windows.x64.checked.mch:Detail diffs |
|
@tannergooding, I have a question about In instructions for FMA of scalar values like My questions is, do we need to ensure |
@weilinwa, that's a great question. The TL;DR; is that yes we do need to ensure Normally we provide two versions of the scalar function where this matters, such as: When we do this, the upper bits come from For We'd need to expose |
|
|
||
| srcCount += 1; | ||
| srcCount += BuildDelayFreeUses(emitOp2, emitOp1); | ||
| srcCount += emitOp3->isContained() ? BuildOperandUses(emitOp3) : BuildDelayFreeUses(emitOp3, emitOp1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a lot smaller and easier to follow now 🎉
src/coreclr/jit/lsraxarch.cpp
Outdated
| if (containedOpNum == 1 && !copiesUpperBits) | ||
| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the !copiesUpperBits check needed? If we are copiesUpperBits then containedOpNum shouldn't be 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this should be true. Do we need to add an assert before to ensure that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you already have that assert a few lines up: https://github.com/dotnet/runtime/pull/58196/files/5ca658ebf53beee4d84446236f9391fba9e3e935#diff-9626112837daf480c93d401a587012b3e398dd90a953c870797d472cff36839dR2342
assert(!copiesUpperBits || !op1->isContained());There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe replace it with assert(containedOpNum != 1 || !copiesUpperBits); to also cover the regOptional case
src/coreclr/jit/lsraxarch.cpp
Outdated
| // Intrinsics with CopyUpperBits semantics must have op1 as target | ||
| if (containedOpNum == 1 && !copiesUpperBits) | ||
| { | ||
| if (resultOpNum != 3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if this were the positive case and there were an assert that resultOpNum != containedOpNum
Therefore, if containedOpNum == 1 then resultOpNum can only be 0, 2, or 3
If it's 3, then swapping op1/op3 is sufficient
If it's 2, then swapping op2/op3 is needed first
If it's 0, then it doesn't matter what we do so its fine to not swap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible that containedOpNum ==0 and resultOpNum==0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If none of the operands are overwritten and none are last use, then containedOpNum == 0.
I think we probably won't also get containedOpNum == 0 because VFMADD should support general-purpose loads as well and so RegOptional should probably be true for at least one case. But in general its better to check and account for possible future changes, scenarios, or nodes that are introduced
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because op lastUse could be updated after lowering, there are cases that we have resultOpNum == containedOpNum when they are not 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make multiple ops contained in lowering or change that in lsra?
src/coreclr/jit/lsraxarch.cpp
Outdated
| } | ||
| else | ||
| { | ||
| assert(containedOpNum == 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think its possible for containedOpNum to be 0 and so we should check for this explicitly.
| std::swap(emitOp2, emitOp3); | ||
|
|
||
| srcCount += op3->isContained() ? BuildOperandUses(op3) : BuildDelayFreeUses(op3, op1); | ||
| if (resultOpNum == 3 && !copiesUpperBits) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just capturing a comment, I don't think we need to do anything in this PR.
I think the logic around copiesUpperBits could be simplified a bit so we don't need these extra checks everywhere. That is, if copiesUpperBits is true, then resultOpNum doesn't matter if its not 1 so maybe we should be forcing resultOpNum to be 0 in that case (that is if copiesUpperBits == true and resultOpNum != 1, then treat it as 0, because no matter what we do, op1 cannot be swapped or moved about and op2/op3 will be delay free or contained).
| // op1 = (op1 * op2) + [op3] or op2 = (op1 * op2) + [op3] | ||
| // ? = (op1 * op2) + [op3] or ? = (op1 * op2) + op3 | ||
| // 213 form: XMM1 = (XMM2 * XMM1) + [XMM3] | ||
| isCommutative = copiesUpperBits; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't isCommutative be !copiesUpperBits? We can't swap anything if copiesUpperBits == true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I used it inaccurately here to barely control if we should enter the branch.
| regNumber op1Reg = emitOp1->GetRegNum(); | ||
| regNumber op2Reg = emitOp2->GetRegNum(); | ||
|
|
||
| if (isCommutative && (op1Reg != targetReg) && (op2Reg == targetReg)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this block still needed given the above handling?
It feels like we should already be covering this under the last block, which is op3 or nothing is contained/spilled so:
if (!copiesUpperBits && (targetReg == op2Reg))
{
std::swap(emitOp1, emitOp2);
}Then everything should be in the right place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it if (!copiesUpperBits && (targetReg == op2Reg)) not if (copiesUpperBits && (targetReg == op2Reg))? I thought we need to ensure targetReg is op1Reg only when copiesUpperBits is true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because emitOp1 is already op1, so if copiesUpperBits == true, then we don't want to change anything.
When its false, we only need to swap if the target reg is op2Reg.
|
@tannergooding, could you please take a look at the latest code when you have time? I resolved almost all of your comments except the resultOpNum and containedOpNum assertion. Thanks! |
| op1Reg = op3->GetRegNum(); | ||
| op2Reg = op2->GetRegNum(); | ||
| op3 = op1; | ||
| if (targetReg == op3NodeReg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs to be !copiesUpperBits && (targetReg == op3NodeReg)
Otherwise, copiesUpperBits can be true since op1 is not Contained or UsedFromSpillTemp and therefore swapping emitOp1 isn't correct.
| // op1 = (op1 * op2) + [op3] or op2 = (op1 * op2) + [op3] | ||
| // ? = (op1 * op2) + [op3] or ? = (op1 * op2) + op3 | ||
| // 213 form: XMM1 = (XMM2 * XMM1) + [XMM3] | ||
| if (targetReg == op2NodeReg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise, I think this needs to be if (!copiesUpperBits && (targetReg == op2NodeReg)) for the same reason.
I think we also don't need the below section doing if (!copiesUpperBits && (emitOp2->GetRegNum() == targetReg)) as it will have already been covered up here.
|
Everything looks good except for the two related callouts in codegen. Looks like there is also a merge conflict, like due to #59912. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all LGTM. CC. @kunalspathak or @echesakovMSFT could you give a second review and merge if everything looks good to you as well
|
@kunalspathak @echesakovMSFT, could you take a look when you have some time? Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to uncomment the 2 asserts and run the test to make sure they are not hit.
| if (containedOpNum == 1) | ||
| { | ||
| // resultOpNum might change between lowering and lsra, comment out assertion for now. | ||
| // assert(containedOpNum != resultOpNum); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to uncomment this assert?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assertion cannot be uncommented because the last use value could change after lowering step. I left them here for follow up work if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please create a issue for it and add the link to the issue in the comment here?
| } | ||
| else if (containedOpNum == 3) | ||
| { | ||
| // assert(containedOpNum != resultOpNum); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here?
Co-authored-by: Kunal Pathak <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @weilinwa for your patience and commitment. This looks good to me.
|
@weilinwa - I noticed superpmi.py replay failure on linux/x64. Can you double check if it is from your change? |
This is for #12984. @kunalspathak @tannergooding, thanks!