Skip to content

Conversation

@quic-sanirudh
Copy link
Contributor

@quic-sanirudh quic-sanirudh commented Mar 26, 2024

When Ramp and Broadcast nodes are created with fixed length lanes,
they're fixed to int32 dtype since DLDataType always supports only
uint16 lanes.

@quic-sanirudh quic-sanirudh force-pushed the intimm_deepequal_compare_fix branch from bcc0e1d to 0675ed4 Compare March 26, 2024 11:37
@tqchen
Copy link
Member

tqchen commented Mar 26, 2024

In this particular case (deep equality), i think type do matter, so it would be great instead to fix the cases that would depend on the relaxed behavior.

I know we had some i64/i32 issues, and general rule of thumb now is to try to be explicit as much as possible and that helps to reduce errors

@quic-sanirudh
Copy link
Contributor Author

quic-sanirudh commented Mar 26, 2024

In this particular case (deep equality), i think type do matter, so it would be great instead to fix the cases that would depend on the relaxed behavior.

I know we had some i64/i32 issues, and general rule of thumb now is to try to be explicit as much as possible and that helps to reduce errors

Oh okay, thanks for the feedback @tqchen. The cases we started seeing was that some expressions were not getting simplified properly after RampNode lanes were changed to PrimExpr. I narrowed down the exact simplification that was failing was actually a rewrite simplify rule here.

I realized that the simplification was not happening because lanes between broadcast and RampNode in my case had different types, so a couple other solutions I thought would apply here is to either fix the RampNode constructor to stick to some fixed dtype for lanes (something like int32/int16, since lanes in DLDataType anyways only supports uint16 dtype), or to update the simplify rules here to try the same rules with an PVar<IntImm> lanes type in case of fixed length vectors.

But if dtype does matter, then should we update the PEqualChecked<IntImm> to also check for dtypes?

@tqchen
Copy link
Member

tqchen commented Mar 26, 2024

ah oK, i think in this case we should try to come up with a rule for lanes. In this particular case i think it was related to SVE changes, so also cc @ekalda

I think having a fixed dtype probably makes sense then we handle cast for related cases (keep i32 and i64 iterators in mind

@ekalda
Copy link
Contributor

ekalda commented Mar 26, 2024

Yeah I think fixing the dtype is a good idea, it would hopefully avoid this kind of problems in the future as well. Out of interest, what were the mismatching dtypes of the two compared IntImmNodes that you observed @quic-sanirudh?

@quic-sanirudh
Copy link
Contributor Author

Yeah I think fixing the dtype is a good idea, it would hopefully avoid this kind of problems in the future as well. Out of interest, what were the mismatching dtypes of the two compared IntImmNodes that you observed @quic-sanirudh?

Thanks @ekalda. I'll update the PR to fix the dtypes in RampNode (and perhaps the broadcast node as well).

The dtypes in my case were int32 and int64. The expression I saw was something like this (slightly simpler version)
T.Broadcast(c, 128) + T.Ramp(T.int64(0), T.int64(1), T.int64(128))

The RampNode seems to get the int64 lanes because the all the iterators in our case is by default int64, but the broadcast seems to be inserted during the evaluation of AddNode in op.cc here

When Ramp and Broadcast nodes are created with fixed length lanes,
they're fixed to int32 dtype since DLDataType always supports only
uint16 lanes.
@quic-sanirudh quic-sanirudh force-pushed the intimm_deepequal_compare_fix branch from 0675ed4 to 888bbf0 Compare March 27, 2024 08:46
@quic-sanirudh
Copy link
Contributor Author

I've updated the PR to fix the dtype of lanes as we discussed. Let me know if this looks good.

Copy link
Contributor

@ekalda ekalda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thank you for the fix @quic-sanirudh! Hopefully we won't see these kind of silent simplification failures in the future now 🤞

@quic-sanirudh quic-sanirudh changed the title [TIR] Modify IntImmNode deep_equal to match regardless of type [TIR] Ramp and Broadcast lanes fixed to int32 dtype Mar 27, 2024
@quic-sanirudh
Copy link
Contributor Author

Thanks for the review @ekalda

@tqchen
Copy link
Member

tqchen commented Mar 27, 2024

@quic-sanirudh can you add a test case about i64 iterator type with Ramp? I know fixing to i32 likely will resolve issues when most programs are i32 iterator, we should make sure it is i64 compatible

tvm.tir.Ramp(x, 1, 4),
),
TestCase(
tvm.tir.Broadcast(x, tvm.tir.IntImm(dtype="int64", value=4)) + tvm.tir.Ramp(0, 1, 4),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually meant that the value being broadcasted and ramp base being i64. but lanes remains i32, this would be a more common case in our setting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah okay, sorry for the confusion. I'll update the test cases to check that as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tqchen I've updated the test case, let me know if this looks good.

@quic-sanirudh
Copy link
Contributor Author

@tvm-bot rerun

@tqchen tqchen merged commit b5fda2d into apache:main Apr 1, 2024
thaisacs pushed a commit to thaisacs/tvm that referenced this pull request Apr 3, 2024
* [TIR] Ramp and Broadcast lanes fixed to int32 dtype

When Ramp and Broadcast nodes are created with fixed length lanes,
they're fixed to int32 dtype since DLDataType always supports only
uint16 lanes.

* Add test cases for int64 type lanes

* Update test case with int64 iterators
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants