Skip to content

Conversation

@SingleAccretion
Copy link
Contributor

@SingleAccretion SingleAccretion commented May 16, 2021

It used to be that in the importer, whether the cast was to be marked as GTF_UNSIGNED was decided exclusively based on the incoming opcode. However, the flag only makes sense for casts from integral sources, and it turns out morph had a bug where it failed to clear this flag which resulted in
bad codegen.

The bug went as follows: gtMorphCast turns casts from an FP type into a small integer into a chain of casts: CAST(small integer <- FP) => CAST(small integer <- CAST(TYP_INT <- FP)). On 32 bit platforms, the code failed to clear the GTF_UNSIGNED flag from the original tree, which meant that the outer cast thought it had TYP_UINT as the source. This matters for checked casts: conv.ovf.i2.un(-2.0d), which is a legitimate conversion, was interpreted
wrongly as an overflowing one as the resulting codegen only checked the upper bound via an unsigned compare.

The fix is two-fold: clear GTF_UNSIGNED for GT_CAST nodes with FP sources on creation and unify the 64 bit and 32 bit paths in gtMorphCast, which, after the removal of GTF_UNSIGNED, are identical.

This is a zero-diff change across all SPMI collections for Windows x64, Linux x64, Linux ARM64.

This is not a zero-diff change for Windows x86. Instances of bad codegen have been corrected in some tests.

Diffs for the aforementioned tests:

Top method regressions (bytes):
          12 (13.64% of base) : 235214.dasm - ConvTest:test25():short
          12 (13.64% of base) : 235147.dasm - ConvTest:test25():short
          10 (11.63% of base) : 235213.dasm - ConvTest:test24():byte
          10 (11.63% of base) : 235146.dasm - ConvTest:test24():byte
           9 ( 4.46% of base) : 86456.dasm - DevDiv_578217.Program:ILGEN_METHOD(ushort,float,int,ushort,ubyte,short,int):short
           7 ( 2.27% of base) : 86411.dasm - ILGEN_CLASS:ILGEN_METHOD(int):ubyte
           7 (10.45% of base) : 235247.dasm - ConvTest:test25():short
           7 (10.45% of base) : 235181.dasm - ConvTest:test25():short
           5 ( 7.69% of base) : 235246.dasm - ConvTest:test24():byte
           5 ( 2.67% of base) : 84471.dasm - FullProof:Test():int
           5 ( 7.69% of base) : 235180.dasm - ConvTest:test24():byte

Top method improvements (bytes):
         -67 (-33.00% of base) : 86409.dasm - ILGEN_CLASS:ILGEN_METHOD(short,ushort,ushort,int):float

ConvTest (JIT\Methodical\NaN\r8nanconv_il_r\r8nanconv), FullProof (JIT\Regression\VS-ia64-JIT\V2.0-Beta2\b309576\bug2) and DevDiv_578217 diffs are as expected, still investigating what happened in the ILGEN method - it looks like UMOD was not transfomed into CORINFO_HELP_ULMOD - weird code, see below.

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 16, 2021
@SingleAccretion SingleAccretion force-pushed the Simplify-fgMorphCast-Phase-1 branch 2 times, most recently from f9e7b3e to 5a245e5 Compare May 18, 2021 09:52
@SingleAccretion
Copy link
Contributor Author

SingleAccretion commented May 18, 2021

So, the UMOD case is weird because the code that is responsible for it is questionable:

runtime/src/coreclr/jit/morph.cpp

Lines 12618 to 12662 in 5a0edac

/* If this is an unsigned long mod with op2 which is a cast to long from a
constant int, then don't morph to a call to the helper. This can be done
faster inline using idiv.
*/
noway_assert(op2);
if ((typ == TYP_LONG) && opts.OptEnabled(CLFLG_CONSTANTFOLD) &&
((tree->gtFlags & GTF_UNSIGNED) == (op1->gtFlags & GTF_UNSIGNED)) &&
((tree->gtFlags & GTF_UNSIGNED) == (op2->gtFlags & GTF_UNSIGNED)))
{
if (op2->gtOper == GT_CAST && op2->AsCast()->CastOp()->gtOper == GT_CNS_INT &&
op2->AsCast()->CastOp()->AsIntCon()->gtIconVal >= 2 &&
op2->AsCast()->CastOp()->AsIntCon()->gtIconVal <= 0x3fffffff &&
(tree->gtFlags & GTF_UNSIGNED) == (op2->AsCast()->CastOp()->gtFlags & GTF_UNSIGNED))
{
tree->AsOp()->gtOp2 = op2 = fgMorphCast(op2);
noway_assert(op2->gtOper == GT_CNS_NATIVELONG);
}
if (op2->gtOper == GT_CNS_NATIVELONG && op2->AsIntConCommon()->LngValue() >= 2 &&
op2->AsIntConCommon()->LngValue() <= 0x3fffffff)
{
tree->AsOp()->gtOp1 = op1 = fgMorphTree(op1);
noway_assert(op1->TypeGet() == TYP_LONG);
// Update flags for op1 morph
tree->gtFlags &= ~GTF_ALL_EFFECT;
tree->gtFlags |= (op1->gtFlags & GTF_ALL_EFFECT); // Only update with op1 as op2 is a constant
// If op1 is a constant, then do constant folding of the division operator
if (op1->gtOper == GT_CNS_NATIVELONG)
{
tree = gtFoldExpr(tree);
}
// We may fail to fold
if (!tree->OperIsConst())
{
tree->AsOp()->CheckDivideByConstOptimized(this);
}
return tree;
}
}

It appears that the intention was to check whether the operands of UMOD are unsigned, but that doesn't make a lot of sense because GTF_UNSIGNED is not responsible for that, and the fact that the node is a UMOD is already enough - it treats its operands as unsigned. In any case, the diff is caused by this "do GTF_UNSIGNEDs match" logic.
Base:

Before:
               [000028] ------------                            +--*  UMOD      long  
               [000025] ---------U--                            |  +--*  CAST      long <- ulong <- double
               [000024] ------------                            |  |  \--*  NEG       double
               [000023] ------------                            |  |     \--*  LCL_VAR   double V11 loc7         
               [000027] ------------                            |  \--*  CNS_LNG   long   0x000000002487c9b5
                                                                
After:                                                          
               [000028] --CX-+------                            +--*  CALL help long   HELPER.CORINFO_HELP_ULMOD
               [000025] --C--+---U-- arg0                       |  +--*  CALL help long   HELPER.CORINFO_HELP_DBL2ULNG
               [000024] -----+------ arg0                       |  |  \--*  NEG       double
               [000023] -----+------                            |  |     \--*  LCL_VAR   double V11 loc7         
               [000027] -----+------ arg1                       |  \--*  CNS_LNG   long   0x000000002487c9b5

Diff:

Before:
               [000028] ------------                            +--*  UMOD      long  
               [000025] ------------                            |  +--*  CAST      long <- ulong <- double
               [000024] ------------                            |  |  \--*  NEG       double
               [000023] ------------                            |  |     \--*  LCL_VAR   double V11 loc7         
               [000027] ------------                            |  \--*  CNS_LNG   long   0x000000002487c9b5
                                                                
After:                                                          
               [000028] --C--+------                            +--*  UMOD      long  
               [000025] --C--+------                            |  +--*  CALL help long   HELPER.CORINFO_HELP_DBL2ULNG
               [000024] -----+------ arg0                       |  |  \--*  NEG       double
               [000023] -----+------                            |  |     \--*  LCL_VAR   double V11 loc7         
               [000027] ------------                            |  \--*  CNS_LNG   long   0x000000002487c9b5

While it is weird, it should be correct.

Another note on the weird UMOD code: it is testing for a cast from a constant - that may be a remnant of the fact ldc.i4+conv.i8 used to not be folded.

Edit: sure enough, commenting out the "is op2 a cast from a constant" is a no-diff change for Windows x64 and x86.

@SingleAccretion
Copy link
Contributor Author

Ready for review.

cc @sandreenko for another morph change...

@sandreenko sandreenko marked this pull request as ready for review May 18, 2021 20:52
@sandreenko
Copy link
Contributor

sandreenko commented May 18, 2021

Thanks, @SingleAccretion, how did you find this issue?

It is great that you have added a new test for such cases.

I will try to take a look this week, but such change would require a second approval, I think, PTAL @dotnet/jit-contrib

@SingleAccretion
Copy link
Contributor Author

how did you find this issue?

From experimenting with fgMorphCast, I suppose. I noticed that these two branches were doing almost the same thing, made them do the same thing, and while inspecting the diffs realized the (previous) codegen was wrong.

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are calling GenTreeCast constructor:

GenTreeCast(var_types type, GenTree* op, bool fromUnsigned, var_types castType DEBUGARG(bool largeNode = false))
: GenTreeOp(GT_CAST, type, op, nullptr DEBUGARG(largeNode)), gtCastType(castType)
{
gtFlags |= fromUnsigned ? GTF_UNSIGNED : GTF_EMPTY;
}

and after that we call CanonicalizeCastNode to clear the flag in some cases, is it worth it?
One option is to do

inline GenTreeCast* Compiler::gtNewCastNode(var_types typ, GenTree* op1, bool fromUnsigned, var_types castType)
{
  if (varTypeIsFloating(cast->CastOp())) { fromUnsigned = false; }
  GenTreeCast* cast = new (this, GT_CAST) GenTreeCast(typ, op1, fromUnsigned, castType);

or add this logic to GenTreeCast constructor (note it will affect more cases then).

Often I set conditional breakpoints on flag values so I would prefer not to change it back and forth without a reason.

Have you tried to add assert(!varTypeIsFloating(cast->CastOp() || !fromUnsigned) and fix the callers who call it with wrong value of bool fromUnsigned?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tried to add assert(!varTypeIsFloating(cast->CastOp() || !fromUnsigned) and fix the callers who call it with wrong value of bool fromUnsigned?

Yes, in fact that was the original version of the fix, and the only place I had to fix was the importer. I went back and forth on this a few times, and ultimately decided to have the change initially be with this more conservative implementation, because it is "guaranteed" (not really because I did not add it to the constructor, but nothing calls the constructor directly) to always return "correct" result to the caller. I suppose I should revisit it and run the assertty version through the CI. Maybe it turns out more callers are getting it wrong and it would be better to normalize centrally (regardless of that, the flag flipping will be fixed).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose I should revisit it and run the assertty version through the CI. Maybe it turns out more callers are getting it wrong and it would be better to normalize centrally (regardless of that, the flag flipping will be fixed).

I would prefer this if you have time, if not please open a follow-up issue to investigate/clean it and we will merge this PR as is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer this

I would as well, so we'll investigate it as part of this PR.

Verifying that checked '.un' casts from floats to small types are not treated as casts from unsigned types.
It used to be that in the importer, whether the cast was to
be marked as GTF_UNSIGNED was decided exclusively based on
the incoming opcode. However, the flag only makes sense
for casts from integral sources, and it turns out morph
had a bug where it failed to clear this flag which resulted in
bad codegen.

The bug went as follows: "gtMorphCast" turns casts from an FP
type to a small integer into a chain of casts:
CAST(small integer <- FP) => CAST(small integer <- CAST(TYP_INT <- FP)).
On 32 bit platforms, the code failed to clear the GTF_UNSIGNED flag
from the original tree, which meant that the outer cast thought it
had TYP_UINT as the source. This matters for checked casts:
conv.ovf.i2.un(-2.0d), which is a legitimate conversion, was interpreted
wrongly as an overflowing one as the resulting codegen only checked the
upper bound via an unsigned compare.

The fix is two-fold: clear GTF_UNSIGNED for GT_CAST nodes with FP sources
on creation and unify the 64 bit and 32 bit paths in "gtMorphCast",
which, after the removal of GTF_UNSIGNED handling, are identical.

This is a zero-diff change across all SPMI collections for Windows x64, Linux x64,
Linux ARM64.

This **is not** a zero-diff change for Windows x86.
Instances of bad codegen have been corrected in some tests.
@SingleAccretion SingleAccretion force-pushed the Simplify-fgMorphCast-Phase-1 branch from a0ff6e3 to 395d829 Compare May 21, 2021 20:53
Instead of normalizing GTF_UNSIGNED for FP sources in "gtNewCastNode",
assert that it is not set in GenTreeCast's constructor and fix the
importer to respect that constraint.
@SingleAccretion SingleAccretion force-pushed the Simplify-fgMorphCast-Phase-1 branch from 395d829 to a1a0850 Compare May 21, 2021 20:57
Copy link
Contributor

@sandreenko sandreenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@sandreenko sandreenko merged commit 34e6bb4 into dotnet:main May 24, 2021
@SingleAccretion SingleAccretion deleted the Simplify-fgMorphCast-Phase-1 branch May 26, 2021 20:03
@ghost ghost locked as resolved and limited conversation to collaborators Jun 25, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants