Skip to content

Conversation

@zero323
Copy link
Member

@zero323 zero323 commented Mar 13, 2017

What changes were proposed in this pull request?

This commit moved distinct in its intended place to avoid duplicated predictions and adds unit test covering the issue.

How was this patch tested?

Unit tests.

@SparkQA
Copy link

SparkQA commented Mar 13, 2017

Test build #74467 has finished for PR 17283 at commit 6183093.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@hhbyyh hhbyyh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. The code change looks good. Only some minor things in the unit test.

Just personal opinion, the dataset in the unit test can be more specific, like train a model with
"1 3",
"2 3"
and predict with Array(1, 2), then we can just
assert(prediction == Seq("3")).
Also we can set support and confidence intentionally to avoid future change of their default values.
The current dataset is fine, but I cannot just tell which rules will be generated from it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"prediction"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe "FPGrowth prediction should not contain duplicates"

@zero323 zero323 force-pushed the SPARK-19940 branch 2 times, most recently from 1edf3fa to 7a44dbe Compare March 14, 2017 00:22
@SparkQA
Copy link

SparkQA commented Mar 14, 2017

Test build #74469 has finished for PR 17283 at commit 1edf3fa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 14, 2017

Test build #74470 has finished for PR 17283 at commit 7a44dbe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@hhbyyh hhbyyh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those will be all from me. You can wait for committers to see if there's more suggestion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can merge this line to the last statement.

model.transform(
    ...
    ).first().getAs[Seq[String]]("prediction")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be a violation of style. Not sure if we need the jira id here as this is self-explanatory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: assert(prediction.size === Seq("3")) may be more clear.

- Apply predictions after flatMap in ml.FPGrowthModel.transform
- Add tests
@SparkQA
Copy link

SparkQA commented Mar 14, 2017

Test build #74519 has finished for PR 17283 at commit d292896.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Copy link
Member

Thanks for fixing this issue! LGTM
Merging with master

Stating the JIRA number for a bug fix is reasonable, though it's most useful if the bug appears in an actual release.

@asfgit asfgit closed this in d4a637c Mar 14, 2017
@zero323 zero323 deleted the SPARK-19940 branch March 16, 2017 18:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants