Skip to content

Conversation

@viirya
Copy link
Member

@viirya viirya commented Apr 30, 2021

What changes were proposed in this pull request?

This patch fixes Invoke expression when the target object has more than one method with the given method name.

Why are the changes needed?

Invoke will find out the method on the target object with given method name. If there are more than one method with the name, currently it is undeterministic which method will be used. We should add the condition of parameter number when finding the method.

Does this PR introduce any user-facing change?

Yes, fixed a bug when using Invoke on a object where more than one method with the given method name.

How was this patch tested?

Unit test.

@github-actions github-actions bot added the SQL label Apr 30, 2021
@viirya
Copy link
Member Author

viirya commented Apr 30, 2021

cc @cloud-fan @maropu

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Nice catch!
Thank you, @viirya .

@SparkQA
Copy link

SparkQA commented Apr 30, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42617/

@SparkQA
Copy link

SparkQA commented Apr 30, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42617/

case ObjectType(cls) =>
val m = cls.getMethods.find(_.getName == encodedFunctionName)
val m = cls.getMethods.find { m =>
m.getName == encodedFunctionName && m.getParameterCount == arguments.length
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we check parameter types too? also curious why we don't use cls.getMethod for the purpose.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to use getMethod

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually Invoke uses getMethod for target object with data types other than ObjectType. I'm not sure why we need to do a separate getMethod during evaluation for the cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And note method uses encodedFunctionName to find method, but in eval the original functionName is used.

Copy link
Member Author

@viirya viirya Apr 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be safer to keep original pattern. But switched to use getMethod and see what CI tells.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like a very ancient bug from Spark 1.x.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, @viirya . According to the Affected Version of SPARK-35278, are we going to cut another RC (Apache Spark 2.4.8 RC4) for this?

@SparkQA
Copy link

SparkQA commented Apr 30, 2021

Test build #138103 has finished for PR 32404 at commit 9fc152c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Apr 30, 2021

Caused by: sbt.ForkMain$ForkError: java.lang.RuntimeException: Couldn't find apply on interface scala.Function1
	at scala.sys.package$.error(package.scala:30)
	at org.apache.spark.sql.catalyst.expressions.objects.Invoke.liftedTree1$1(objects.scala:333)
	at org.apache.spark.sql.catalyst.expressions.objects.Invoke.method$lzycompute(objects.scala:329)
	at org.apache.spark.sql.catalyst.expressions.objects.Invoke.method(objects.scala:327)
	at org.apache.spark.sql.catalyst.expressions.objects.Invoke.doGenCode(objects.scala:359)
	at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:147)

Switched to getMethod seems causing some test failures.

@viirya
Copy link
Member Author

viirya commented Apr 30, 2021

BTW, @viirya . According to the Affected Version of SPARK-35278, are we going to cut another RC (Apache Spark 2.4.8 RC4) for this?

Ideally, yes. Though seems no one complains this, and this is an ancient bug...

@SparkQA
Copy link

SparkQA commented Apr 30, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42623/

@SparkQA
Copy link

SparkQA commented Apr 30, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42623/

@viirya
Copy link
Member Author

viirya commented Apr 30, 2021

Let me know if you prefer to skip 2.4 backport.

@viirya
Copy link
Member Author

viirya commented Apr 30, 2021

Oh, the problem of using getMethod is for some cases, the parameter types is not known in advance. Like the failed tests, the argument types from arguments is java.lang.Object. So we cannot reliably using name + parameter types to get method.

@SparkQA
Copy link

SparkQA commented Apr 30, 2021

Test build #138097 has finished for PR 32404 at commit 3829aaa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@transient lazy val method = targetObject.dataType match {
case ObjectType(cls) =>
val m = cls.getMethods.find(_.getName == encodedFunctionName)
val m = cls.getMethods.find { m =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we use filter instead of find, and fail if more than one match is found?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about:

  1. getMethod with param types
  2. If not found, turn to find one method with parameter number
  3. If more than one is found, or no one can be found, fails it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for that improved one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Now I'm wondering whether StaticInvoke has the same issue since it uses getDeclaredMethod.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StaticInvoke calls getDeclaredMethod with argClasses. I think it may also suffer the issue if the arg class is Object.

Let me see if I can come out a test case and fix it (might be in other PR).

As said above, I think it is safer to try to combine two method looking up in master only.

@SparkQA
Copy link

SparkQA commented Apr 30, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42628/

@SparkQA
Copy link

SparkQA commented Apr 30, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42628/

@SparkQA
Copy link

SparkQA commented Apr 30, 2021

Test build #138107 has finished for PR 32404 at commit 8eed4d1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Let me know if you prefer to skip 2.4 backport.

For that, it's totally up to your decision, @viirya , because you are the release manager of Apache Spark 2.4.8. I support your decision.

@SparkQA
Copy link

SparkQA commented Apr 30, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42636/

@SparkQA
Copy link

SparkQA commented Apr 30, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42636/

@SparkQA
Copy link

SparkQA commented May 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42643/

@SparkQA
Copy link

SparkQA commented May 1, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42643/

@SparkQA
Copy link

SparkQA commented May 1, 2021

Test build #138121 has finished for PR 32404 at commit 176b103.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

sys.error(s"Couldn't find $encodedFunctionName on $cls")
} else if (m.length > 1) {
// More than one matched method signature. Exclude synthetic one, e.g. generic one.
val realMethods = m.filter(!_.isSynthetic)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot filter out synthetic ones in L336?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if it is possible that there is only one method and it is also a synthetic one. If we filter synthetic ones at L336, we may miss it?

@maropu
Copy link
Member

maropu commented May 1, 2021

Looks fine otherwise.

@SparkQA
Copy link

SparkQA commented May 1, 2021

Test build #138122 has finished for PR 32404 at commit bfaaf32.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • abstract class BaseClass[T]
  • class ConcreteClass extends BaseClass[Int] with Serializable

@viirya
Copy link
Member Author

viirya commented May 1, 2021

Thanks @cloud-fan @dongjoon-hyun @maropu. Merging to master/3.1/3.0.

There is conflicts in 2.4. Will manually backport.

@viirya viirya closed this in 6ce1b16 May 1, 2021
viirya added a commit that referenced this pull request May 1, 2021
…of parameters

### What changes were proposed in this pull request?

This patch fixes `Invoke` expression when the target object has more than one method with the given method name.

### Why are the changes needed?

`Invoke` will find out the method on the target object with given method name. If there are more than one method with the name, currently it is undeterministic which method will be used. We should add the condition of parameter number when finding the method.

### Does this PR introduce _any_ user-facing change?

Yes, fixed a bug when using `Invoke` on a object where more than one method with the given method name.

### How was this patch tested?

Unit test.

Closes #32404 from viirya/verify-invoke-param-len.

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Liang-Chi Hsieh <[email protected]>
(cherry picked from commit 6ce1b16)
Signed-off-by: Liang-Chi Hsieh <[email protected]>
viirya added a commit that referenced this pull request May 1, 2021
…of parameters

### What changes were proposed in this pull request?

This patch fixes `Invoke` expression when the target object has more than one method with the given method name.

### Why are the changes needed?

`Invoke` will find out the method on the target object with given method name. If there are more than one method with the name, currently it is undeterministic which method will be used. We should add the condition of parameter number when finding the method.

### Does this PR introduce _any_ user-facing change?

Yes, fixed a bug when using `Invoke` on a object where more than one method with the given method name.

### How was this patch tested?

Unit test.

Closes #32404 from viirya/verify-invoke-param-len.

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Liang-Chi Hsieh <[email protected]>
(cherry picked from commit 6ce1b16)
Signed-off-by: Liang-Chi Hsieh <[email protected]>
viirya added a commit that referenced this pull request May 1, 2021
…mber of parameters

### What changes were proposed in this pull request?

This patch fixes `Invoke` expression when the target object has more than one method with the given method name. This is 2.4 backport of #32404.

### Why are the changes needed?

`Invoke` will find out the method on the target object with given method name. If there are more than one method with the name, currently it is undeterministic which method will be used. We should add the condition of parameter number when finding the method.

### Does this PR introduce _any_ user-facing change?

Yes, fixed a bug when using `Invoke` on a object where more than one method with the given method name.

### How was this patch tested?

Unit test.

Closes #32412 from viirya/SPARK-35278-2.4.

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Liang-Chi Hsieh <[email protected]>
@dongjoon-hyun
Copy link
Member

+1, late LGTM with the last commit.

@cloud-fan
Copy link
Contributor

@viirya can we look into which callers do not provide the concrete parameter types when building Invoke? I think it's better to fix them, so that we can do a proper method resolution in Invoke, as a Java class may have several methods with the same name and number of arguments, but different argument types.

@viirya
Copy link
Member Author

viirya commented May 5, 2021

@cloud-fan Okay. Let me find them out and fix them.

@viirya
Copy link
Member Author

viirya commented May 5, 2021

@cloud-fan Just to confirm. So you mean we only allow Invoke to run the method with exact parameter type matches, right?

For example, calling a method func(input: Object) with argument Tuple2 will be disallowed.

@viirya
Copy link
Member Author

viirya commented May 5, 2021

@cloud-fan Seems to me we still need to support it. One example is UserDefinedType.

For example, VectorUDT extends UserDefinedType[Vector]. VectorUDT is responsible for serialization of dense and sparse vectors. It overrides def serialize(obj: Vector) that takes both SparseVector and DenseVector as input parameters.

@cloud-fan
Copy link
Contributor

For example, calling a method func(input: Object) with argument Tuple2 will be disallowed.

This should be allowed, because we do provide the concrete parameter type (Tuple2). What should be forbidden is something like func(s: String) and the parameter type is Object.

I think I get your point, we should probably use a util function to look up the method, instead of writing the code by ourselves, to handle cases like search the inheritance hierarchy, argument type casting, etc.

How about https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/reflect/MethodUtils.html#getMatchingMethod-java.lang.Class-java.lang.String-java.lang.Class...-

@viirya
Copy link
Member Author

viirya commented May 6, 2021

For example, calling a method func(input: Object) with argument Tuple2 will be disallowed.

This should be allowed, because we do provide the concrete parameter type (Tuple2). What should be forbidden is something like func(s: String) and the parameter type is Object.

I think I get your point, we should probably use a util function to look up the method, instead of writing the code by ourselves, to handle cases like search the inheritance hierarchy, argument type casting, etc.

How about https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/reflect/MethodUtils.html#getMatchingMethod-java.lang.Class-java.lang.String-java.lang.Class...-

It looks promising. Not aware there is an util function we can use. So my question would be should we use it in backports? Or only replacing with our code with the util in master only?

@cloud-fan
Copy link
Contributor

I think this PR itself is pretty conservative and is good to backport. The new code is clearly better than the old code without regression. I'd suggest to only change to the util method in the master branch, for both invoke and static invoke.

@cloud-fan
Copy link
Contributor

BTW, AFAIK invoke and static invoke are only used by internal places such as the encoder and UDT. It's not really a bug for <= 3.1, but a bug for 3.2 as we use it to execute the new UDF.

@viirya
Copy link
Member Author

viirya commented May 6, 2021

I think this PR itself is pretty conservative and is good to backport. The new code is clearly better than the old code without regression. I'd suggest to only change to the util method in the master branch, for both invoke and static invoke.

I see. Then I don't touch invoke in old branches again and keep this backport as it is. Let's use the util method in master only.

@viirya
Copy link
Member Author

viirya commented May 6, 2021

BTW, AFAIK invoke and static invoke are only used by internal places such as the encoder and UDT. It's not really a bug for <= 3.1, but a bug for 3.2 as we use it to execute the new UDF.

Rarely but maybe possibly developers will use invoke/static invoke for custom encoder? As this is already backport to 2.4/3.0/3.1, I think it should be fine to keep as it is? For static invoke #32413, it is also conservative, so I guess it may be also good for backport?

@cloud-fan
Copy link
Contributor

For static invoke #32413, it is also conservative, so I guess it may be also good for backport?

Yea I think so

flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021
…of parameters

This patch fixes `Invoke` expression when the target object has more than one method with the given method name.

`Invoke` will find out the method on the target object with given method name. If there are more than one method with the name, currently it is undeterministic which method will be used. We should add the condition of parameter number when finding the method.

Yes, fixed a bug when using `Invoke` on a object where more than one method with the given method name.

Unit test.

Closes apache#32404 from viirya/verify-invoke-param-len.

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Liang-Chi Hsieh <[email protected]>
(cherry picked from commit 6ce1b16)
Signed-off-by: Liang-Chi Hsieh <[email protected]>
fishcus pushed a commit to fishcus/spark that referenced this pull request Jan 12, 2022
…of parameters

### What changes were proposed in this pull request?

This patch fixes `Invoke` expression when the target object has more than one method with the given method name.

### Why are the changes needed?

`Invoke` will find out the method on the target object with given method name. If there are more than one method with the name, currently it is undeterministic which method will be used. We should add the condition of parameter number when finding the method.

### Does this PR introduce _any_ user-facing change?

Yes, fixed a bug when using `Invoke` on a object where more than one method with the given method name.

### How was this patch tested?

Unit test.

Closes apache#32404 from viirya/verify-invoke-param-len.

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Liang-Chi Hsieh <[email protected]>
(cherry picked from commit 6ce1b16)
Signed-off-by: Liang-Chi Hsieh <[email protected]>
@viirya viirya deleted the verify-invoke-param-len branch December 27, 2023 18:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants