Skip to content

Conversation

@jkbradley
Copy link
Member

What changes were proposed in this pull request?

PIC in spark.ml has tests for "id" type IntegerType, but those tests did not fully check the result. This patch:

  • fixes the unit test to break for the existing implementation
  • fixes the implementation

How was this patch tested?

Existing unit tests, with fixes

@SparkQA
Copy link

SparkQA commented May 8, 2018

Test build #90389 has finished for PR 21274 at commit cd02df5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

dataset.schema($(idCol)).dataType match {
case _: LongType =>
uncastPredictions
case otherType =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not directly use
case intType: IntType: so that make a stronger restriction ?
Or do we need to support other types besides int/long ?

@jkbradley
Copy link
Member Author

test this please

@SparkQA
Copy link

SparkQA commented May 9, 2018

Test build #90419 has finished for PR 21274 at commit e504c3e.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 9, 2018

Test build #90420 has finished for PR 21274 at commit e504c3e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

case otherType =>
uncastPredictions.select(col($(idCol)).cast(otherType).alias($(idCol)))
throw new IllegalArgumentException(s"PowerIterationClustering had an unexpected error: " +
s"ID col was found to be of type $otherType, despite initial schema checks. Please " +
Copy link
Contributor

@mgaido91 mgaido91 May 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ${otherType.simpleString}

@SparkQA
Copy link

SparkQA commented May 9, 2018

Test build #90427 has finished for PR 21274 at commit 3e40a92.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@WeichenXu123
Copy link
Contributor

LGTM. !

case _: LongType =>
uncastPredictions
case _: IntegerType =>
uncastPredictions.withColumn($(idCol), col($(idCol)).cast(LongType))
Copy link
Contributor

@shahidki31 shahidki31 May 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be

case _: IntegerType => uncastPredictions.withColumn($(idCol), col($(idCol)).cast(IntegerType))

Otherwise it is not necessary for casting. right? Because prediction already has id as Long type and dataset has id as IntegerType. So, we need to cast prediction.id to IntegerType. right?
Correct me if I am wrong.

@jkbradley
Copy link
Member Author

This issue actually brings up a problem with the Transformer approach for PIC. Just commented more here: https://issues.apache.org/jira/browse/SPARK-15784

Thank you for pushing back!

@jkbradley jkbradley closed this May 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants