-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-19940][ML][MINOR] FPGrowthModel.transform should skip duplicated items #17283
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #74467 has finished for PR 17283 at commit
|
hhbyyh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. The code change looks good. Only some minor things in the unit test.
Just personal opinion, the dataset in the unit test can be more specific, like train a model with
"1 3",
"2 3"
and predict with Array(1, 2), then we can just
assert(prediction == Seq("3")).
Also we can set support and confidence intentionally to avoid future change of their default values.
The current dataset is fine, but I cannot just tell which rules will be generated from it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"prediction"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe "FPGrowth prediction should not contain duplicates"
1edf3fa to
7a44dbe
Compare
|
Test build #74469 has finished for PR 17283 at commit
|
|
Test build #74470 has finished for PR 17283 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those will be all from me. You can wait for committers to see if there's more suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can merge this line to the last statement.
model.transform(
...
).first().getAs[Seq[String]]("prediction")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be a violation of style. Not sure if we need the jira id here as this is self-explanatory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: assert(prediction.size === Seq("3")) may be more clear.
- Apply predictions after flatMap in ml.FPGrowthModel.transform - Add tests
|
Test build #74519 has finished for PR 17283 at commit
|
|
Thanks for fixing this issue! LGTM Stating the JIRA number for a bug fix is reasonable, though it's most useful if the bug appears in an actual release. |
What changes were proposed in this pull request?
This commit moved
distinctin its intended place to avoid duplicated predictions and adds unit test covering the issue.How was this patch tested?
Unit tests.