
Conversation

@tkonolige
Contributor

Subexpression elimination used a comparison of each subexpression to every other subexpression to determine which to eliminate. This resulted in an O(N^2) algorithm that was slow when there were many LetStmts. Instead we now hash each subexpression and compare the hashes. We reuse structural hashing without remapping free variables to uniquely determine each subexpression.
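In other words, the change replaces the all-pairs comparison with hash-based grouping. Roughly, the idea is the following (a minimal standalone sketch only, with toy std::string expressions standing in for PrimExpr and std::hash standing in for TVM's structural hash):

```cpp
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

// Toy stand-in for a TIR subexpression (a PrimExpr in TVM).
using Expr = std::string;

int main() {
  std::vector<Expr> subexprs = {"x+y", "a*b", "x+y", "a*b", "x+y"};

  // Bucket subexpressions by hash instead of comparing every pair:
  // O(N) expected instead of O(N^2). The unordered_map hashes each key
  // once and only deep-compares keys that land in the same bucket.
  std::unordered_map<Expr, size_t> counts;
  for (const Expr& e : subexprs) ++counts[e];

  for (const auto& entry : counts)
    if (entry.second > 1)
      std::cout << entry.first << " appears " << entry.second
                << " times -> candidate for elimination\n";
}
```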

@masahi @FranckQC

Contributor

@AndrewZhaoLuo AndrewZhaoLuo left a comment


Thanks for doing this. I have noticed this pass is a little slow when running some synthetic tests with lots of subexpressions.

@tkonolige
Contributor Author

@AndrewZhaoLuo comments addressed

Contributor

@AndrewZhaoLuo AndrewZhaoLuo left a comment


LGTM

@FranckQC
Contributor

Hi,

Thank you for your interest in improving the pass!
I haven't worked on the CSE pass for a while, so I am probably a little bit rusty on it, but I think there are quite a few problems with this PR.

1)
The role of the function SyntacticToSemanticComputations() is to transform a ComputationTable (that is to say, a std::unordered_map<PrimExpr, size_t, StructuralHash, ExprDeepEqual>, i.e. a hashtable mapping PrimExpr to size_t, using StructuralHash for the hashing and ExprDeepEqual for the equality test) into a vector of std::pair<PrimExpr, size_t> where semantically equivalent elements have been merged.

The whole need for this function comes from having collected syntactical entities in a structure that is efficient for that purpose (the ComputationTable, a hashtable where retrieving an element is fast, in constant time), which later of course needs to be transformed into a collection (a vector) where equivalent terms (like x+y and y+x) are merged together (and their counters added). These merged entries are what I call semantical entities.

This notion of being semantically equivalent could be anything: identifying terms modulo associativity [(x+y)+z with x+(y+z)], modulo commutativity [x+y with y+x], or anything else, with any equivalence relation. It's completely customizable by just changing the EquivalentTerms() function.

Sure, at the moment this function EquivalentTerms() just calls the syntactical equality EqualTerms(), but the whole pass has been written with the idea that we could later on replace it with anything else, to make the pass even more powerful. You can see that the function SyntacticToSemanticComputations() that you have changed used to call std::find_if with the predicate being precisely this very customizable EquivalentTerms(). This is lost with these changes, and the pass no longer identifies equivalent terms.
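To make the structure concrete, here is a minimal standalone sketch of the shape of this function (toy std::string expressions instead of PrimExpr; an illustration only, not TVM's actual code):

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Expr = std::string;  // toy stand-in for PrimExpr

// Customizable equivalence relation; today it is plain syntactic equality,
// but it could identify x+y with y+x, etc.
bool EquivalentTerms(const Expr& a, const Expr& b) { return a == b; }

// Merge equivalent entries of the table into a vector, adding their counters.
// The std::find_if scan over already-merged entries is what makes this O(N^2).
std::vector<std::pair<Expr, size_t>> SyntacticToSemantic(
    const std::unordered_map<Expr, size_t>& table) {
  std::vector<std::pair<Expr, size_t>> result;
  for (const auto& kv : table) {
    auto it = std::find_if(result.begin(), result.end(), [&](const auto& p) {
      return EquivalentTerms(p.first, kv.first);
    });
    if (it != result.end()) it->second += kv.second;  // merge with its class
    else result.push_back(kv);                        // first of its class
  }
  return result;
}
```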

2)
There is something else which I don't quite get about your changes. Why do you even bother to transform the ComputationTable (which is a std::unordered_map<PrimExpr, size_t, StructuralHash, ExprDeepEqual>) into a std::unordered_map<PrimExpr, size_t, ExprDeepHashStruct, ExprDeepEqual>?
The lines 753 to 763 (https://github.com/apache/tvm/pull/11423/files#diff-f83c46530e92c628fd309499ceffa32cb2fb0505633ac0e754e2bdef4d518962R753-R762) are basically transforming a hashtable (table) into the exact same hashtable (equiv_computations). This code is doing nothing.

If the pass is really taking too long for many people, I could try to help improve it, if that's needed. There will necessarily be a limit to what we can gain at compile time, as improving runtime speed (or creating opportunities for it, which is what CSE does) often has a price to pay at compile time. But we can certainly see what we can do.

What do you think?

@tkonolige
Contributor Author

Thanks for helping to clarify things @FranckQC! I think I'm a little confused still.

Regarding 1), you are saying that right now SyntacticToSemanticComputations just happens to use ExprDeepEqual, but we intend to change it in the future? And so, right now (in main), SyntacticToSemanticComputations is essentially doing nothing? If so, could we have SyntacticToSemanticComputations just be a no-op for now?

Even if we switch SyntacticToSemanticComputations to a no-op, I worry that we are going to hit this O(N^2) problem in the future. Is there a way to get around the all-pairs comparison?

Regarding 2), I didn't realize ComputationTable was already a std::unordered_map<PrimExpr, size_t, StructuralHash, ExprDeepEqual>. Whoops :).

@AndrewZhaoLuo
Contributor

AndrewZhaoLuo commented May 26, 2022

@FranckQC thanks for elucidating the intention of the original design. I see why, for an arbitrary semantic equivalence function, you may need to have N^2 comparisons.

Is it possible to reduce the number of comparisons by using a canonical form? E.g. (x + y) + z and z + (y + x) would always reduce to (x + y) + z. Or is this too limiting, or impossible for catching some optimizations? If we could do things this way we would have O(N) comparisons (see the sketch below).
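Something like this toy sketch, say (flat sums of variables only; a real version would need a recursive canonicalizer over TIR, and all the names here are made up):

```cpp
#include <algorithm>
#include <iostream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Toy canonicalization: sort the operands of a flat commutative sum, so
// "z+y+x" and "x+z+y" both map to "x+y+z". Real TIR terms would need a
// recursive rewrite and care around non-commutative operators.
std::string Canonicalize(const std::string& sum) {
  std::vector<std::string> ops;
  std::stringstream ss(sum);
  for (std::string tok; std::getline(ss, tok, '+');) ops.push_back(tok);
  std::sort(ops.begin(), ops.end());
  std::string out;
  for (size_t i = 0; i < ops.size(); ++i) out += (i ? "+" : "") + ops[i];
  return out;
}

int main() {
  std::vector<std::string> terms = {"x+y+z", "z+y+x", "a+b"};
  // Hash the canonical form: equivalent terms collide in O(N) expected
  // time, with no pairwise comparison needed.
  std::unordered_map<std::string, size_t> groups;
  for (const auto& t : terms) ++groups[Canonicalize(t)];
  for (const auto& g : groups) std::cout << g.first << " x" << g.second << "\n";
}
```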

Regardless, at the moment this has O(N^2) cost for little benefit (as we only do equality checks), so I propose to comment out the original implementation, replace it with a fast O(N) one / a no-op, and add a suitable comment.

Contributor

@AndrewZhaoLuo AndrewZhaoLuo left a comment


Turning off the merge approval until we hash this out.

@FranckQC
Contributor

FranckQC commented May 26, 2022

Hi everyone,

@tkonolige :

Regarding 1), you are saying that right now SyntacticToSemanticComputations just happens to use ExprDeepEqual, but we intend to change it in the future? And so, right now (in main), SyntacticToSemanticComputations is essentially doing nothing? If so, could we have SyntacticToSemanticComputations just be a no-op for now?

Hum, no, I wouldn't say it does nothing. It still transforms a hashtable into a vector. How it does so depends on the EquivalentTerms() function.

SyntacticToSemanticComputations() transforms a table of syntactic entities (where x+y is not the same thing as y+x, even if you were to work modulo commutativity) into a vector of "semantical entities", where equivalent computations have been merged together, according to the notion of "being equivalent". This function is building equivalence classes, if you like. And this notion of "being equivalent" is customizable, implemented by the EquivalentTerms() function, which at the moment simply calls EqualTerms(). So at the moment, the only way to have x Equiv y is for x and y to be syntactically the same term. But that's not a requirement, and it could be any equivalence relation, because the pass has been written to work with any equivalence relation, in order to be able to produce even more commonings.

As a use case of this kind of thing:
For instance, in just 3 lines of code, @yuanfz98 proposed in a PR to change the equivalence relation to identify terms modulo associativity and distributivity (of * over +). Sadly I didn't have enough time to discuss this PR at the time (it got quickly closed). Some people were afraid it would add too much computational complexity to the pass. The complexity of the pass itself wouldn't change (we already look at each pair!), but this time it would take advantage of that. However, this new equivalence function relied on a pseudo-normalisation function (arith::Analyzer::Simplify, see discussion here), which would necessarily need some time to (try to) normalize the terms being manipulated. It should not take too long, as they did not implement a full decision procedure, just some quick attempts at applying rewrite rules (known patterns that often lead to the most simplified term, though that's not guaranteed). Said differently, arith::Analyzer::Simplify is correct but not complete.
It could still take too long in its current form, I don't know, I didn't try it as it was. But I think there's a way to make that more efficient.
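For what it's worth, the core of that proposed equivalence was essentially of this shape (an illustrative sketch built on TVM's existing arith::Analyzer and ExprDeepEqual; the function name is made up, and this is not the exact code from that PR):

```cpp
#include <tvm/arith/analyzer.h>
#include <tvm/tir/analysis.h>

// Two terms are considered equivalent when their simplified forms are
// structurally equal. Since arith::Analyzer::Simplify is correct but not
// complete, this only finds equivalences the rewriter happens to normalize.
bool EquivalentModuloSimplify(const tvm::PrimExpr& a, const tvm::PrimExpr& b,
                              tvm::arith::Analyzer* analyzer) {
  return tvm::tir::ExprDeepEqual()(analyzer->Simplify(a),
                                   analyzer->Simplify(b));
}
```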

@AndrewZhaoLuo and @tkonolige

I think I should be able to address the issue without completely removing the possibility of having more interesting comparisons in the future.
I'll give it a go this weekend if you're happy with that.

Thanks.

@tkonolige
Contributor Author

@FranckQC since this is currently causing issues, how about we remove the code for now? When you figure out a solution we can add it back in.

@FranckQC
Contributor

If that's solved by tomorrow, would that work for you?

@tkonolige
Contributor Author

Sure, that is probably fine. Thank you for the quick turnaround.

@AndrewZhaoLuo
Contributor

I think the main issue I have is the O(N^2) runtime on a default pass. We can get linear time if we plan things well, e.g. by using syntactic canonical forms. I haven't dug too deeply, but if this normalization stuff does something like that, I'm fine with it.

@tkonolige
Contributor Author

@FranckQC any update on a solution?

@FranckQC
Contributor

Just starting to work on it now. Was busy with other things earlier today.
A PR will come this evening/night.

@tkonolige
Contributor Author

tkonolige commented May 27, 2022

I don't think @AndrewZhaoLuo or I will be back to review this until Tuesday so don't feel rushed.

@FranckQC
Contributor

Just a small update to let you know that I have a patch almost ready. I'm testing it now (both functionally and performance-wise) to make sure everything is OK.
I will update again soon.

@tkonolige
Contributor Author

@FranckQC any update?

@FranckQC
Contributor

FranckQC commented Jun 3, 2022

Sorry, it's currently being reviewed in our downstream repo. I hope to open the PR here this afternoon!

@FranckQC
Contributor

FranckQC commented Jun 5, 2022

Hi,

The PR is ready here: #11574
Apologies, it took me a little longer than I expected.
The only failing test seems to be flaky, as it's unrelated to the changes introduced.

Thanks!

@tkonolige
Contributor Author

Thanks @FranckQC.

@tkonolige tkonolige closed this Jun 6, 2022