-
Couldn't load subscription status.
- Fork 28.9k
[SPARK-24213][ML] Fix for Int id type for PowerIterationClustering in spark.ml #21274
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #90389 has finished for PR 21274 at commit
|
| dataset.schema($(idCol)).dataType match { | ||
| case _: LongType => | ||
| uncastPredictions | ||
| case otherType => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not directly use
case intType: IntType: so that make a stronger restriction ?
Or do we need to support other types besides int/long ?
|
test this please |
|
Test build #90419 has finished for PR 21274 at commit
|
|
Test build #90420 has finished for PR 21274 at commit
|
| case otherType => | ||
| uncastPredictions.select(col($(idCol)).cast(otherType).alias($(idCol))) | ||
| throw new IllegalArgumentException(s"PowerIterationClustering had an unexpected error: " + | ||
| s"ID col was found to be of type $otherType, despite initial schema checks. Please " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: ${otherType.simpleString}
|
Test build #90427 has finished for PR 21274 at commit
|
|
LGTM. ! |
| case _: LongType => | ||
| uncastPredictions | ||
| case _: IntegerType => | ||
| uncastPredictions.withColumn($(idCol), col($(idCol)).cast(LongType)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it be
case _: IntegerType => uncastPredictions.withColumn($(idCol), col($(idCol)).cast(IntegerType))
Otherwise it is not necessary for casting. right? Because prediction already has id as Long type and dataset has id as IntegerType. So, we need to cast prediction.id to IntegerType. right?
Correct me if I am wrong.
|
This issue actually brings up a problem with the Transformer approach for PIC. Just commented more here: https://issues.apache.org/jira/browse/SPARK-15784 Thank you for pushing back! |
What changes were proposed in this pull request?
PIC in spark.ml has tests for "id" type IntegerType, but those tests did not fully check the result. This patch:
How was this patch tested?
Existing unit tests, with fixes