-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-19939] [ML] Add support for association rules in ML #17280
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #74458 has finished for PR 17280 at commit
|
|
I'll focus on #17130 first and resolve conflict here. |
|
Test build #74610 has finished for PR 17280 at commit
|
|
Test build #75715 has finished for PR 17280 at commit
|
|
I'll update this after FPGrowth examples and doc merged #17130, since there'll be some conflicts. update: Done merging the conflict |
-better code
|
Test build #77829 has finished for PR 17280 at commit
|
|
Test build #78005 has started for PR 17280 at commit |
|
test this please |
|
Test build #78010 has finished for PR 17280 at commit
|
| | [r]| 3| | ||
| | [y]| 3| | ||
| |[y, x]| 3| | ||
| +------+----+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems to change the result quite a bit, is this expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or rather, why is https://github.com/apache/spark/pull/17280/files#diff-b6dbf16870bd2cca9b4140df8aebd681L189 changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
set a higher minSupport to avoid the 0.3333333... in the support column.
| override def load(path: String): FPGrowthModel = { | ||
| val metadata = DefaultParamsReader.loadMetadata(path, sc, className) | ||
| implicit val format = DefaultFormats | ||
| val numTrainingRecords = (metadata.metadata \ "numTrainingRecords").extract[Long] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this break backward compatibility for loading?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it does now. If saving numTrainingRecords to FPGrowthModel looks good, I can update the load logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're adding numTrainingRecords to FPGrowthModel and there isn't a proper default number, I suggest we break the backward model loading compatibility. Otherwise we need to fill numTrainingRecords with an incorrect value and will likely create a maintenance trap.
|
Thanks for taking a look @MLnick |
|
Please advice if this is a good feature to add. If not I'll close it. Thanks. |
|
Test build #92514 has finished for PR 17280 at commit
|
|
Test build #93814 has finished for PR 17280 at commit
|
|
Test build #93842 has finished for PR 17280 at commit
|
|
Updated to support backward model loading compatibility. |
|
Test build #95316 has finished for PR 17280 at commit
|
|
Test build #95318 has finished for PR 17280 at commit
|
|
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
|
Is there any plan to revive this PR? It would be great if we can the support of rules exposed. |
What changes were proposed in this pull request?
jira: https://issues.apache.org/jira/browse/SPARK-19939
Adding another essential characteristic for the Association Rule in Spark ml.fpm.
Support is an indication of how frequently the itemset of an association rule appears in the database and suggests if the rules are generally applicable to the dateset. refer to wiki for more details.
Before adding support:
After adding support:
Thus to allow a better understanding for the generated association rules. This is a new feature and was not included in the original function parity PR.
How was this patch tested?
existing and new unit test