Skip to content

Conversation

@hhbyyh
Copy link
Contributor

@hhbyyh hhbyyh commented Mar 13, 2017

What changes were proposed in this pull request?

jira: https://issues.apache.org/jira/browse/SPARK-19939
Adding another essential characteristic for the Association Rule in Spark ml.fpm.

Support is an indication of how frequently the itemset of an association rule appears in the database and suggests if the rules are generally applicable to the dateset. refer to wiki for more details.

Before adding support:

rules confidence
beer -> soda 0.5
pecan -> milk 0.6

After adding support:

rules confidence support
beer -> soda 0.5 0.3
pecan -> milk 0.6 0.01

Thus to allow a better understanding for the generated association rules. This is a new feature and was not included in the original function parity PR.

How was this patch tested?

existing and new unit test

@SparkQA
Copy link

SparkQA commented Mar 13, 2017

Test build #74458 has finished for PR 17280 at commit 9580aa9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hhbyyh
Copy link
Contributor Author

hhbyyh commented Mar 15, 2017

I'll focus on #17130 first and resolve conflict here.

@SparkQA
Copy link

SparkQA commented Mar 15, 2017

Test build #74610 has finished for PR 17280 at commit 1225496.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 11, 2017

Test build #75715 has finished for PR 17280 at commit 3f780dc.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hhbyyh
Copy link
Contributor Author

hhbyyh commented Apr 17, 2017

I'll update this after FPGrowth examples and doc merged #17130, since there'll be some conflicts.

update: Done merging the conflict

garis added a commit to garis/DataMiningBackBlazeCustomSpark that referenced this pull request May 2, 2017
@SparkQA
Copy link

SparkQA commented Jun 9, 2017

Test build #77829 has finished for PR 17280 at commit 180b43d.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 13, 2017

Test build #78005 has started for PR 17280 at commit eb25f28.

@shaneknapp
Copy link
Contributor

test this please

@SparkQA
Copy link

SparkQA commented Jun 14, 2017

Test build #78010 has finished for PR 17280 at commit eb25f28.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

| [r]| 3|
| [y]| 3|
|[y, x]| 3|
+------+----+
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems to change the result quite a bit, is this expected?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set a higher minSupport to avoid the 0.3333333... in the support column.

override def load(path: String): FPGrowthModel = {
val metadata = DefaultParamsReader.loadMetadata(path, sc, className)
implicit val format = DefaultFormats
val numTrainingRecords = (metadata.metadata \ "numTrainingRecords").extract[Long]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this break backward compatibility for loading?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it does now. If saving numTrainingRecords to FPGrowthModel looks good, I can update the load logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're adding numTrainingRecords to FPGrowthModel and there isn't a proper default number, I suggest we break the backward model loading compatibility. Otherwise we need to fill numTrainingRecords with an incorrect value and will likely create a maintenance trap.

@hhbyyh
Copy link
Contributor Author

hhbyyh commented Jan 17, 2018

Thanks for taking a look @MLnick

@hhbyyh
Copy link
Contributor Author

hhbyyh commented Mar 21, 2018

Please advice if this is a good feature to add. If not I'll close it. Thanks.

@SparkQA
Copy link

SparkQA commented Jul 1, 2018

Test build #92514 has finished for PR 17280 at commit ce6959a.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 30, 2018

Test build #93814 has finished for PR 17280 at commit 40cf449.

  • This patch fails Scala style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 1, 2018

Test build #93842 has finished for PR 17280 at commit 34f883f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hhbyyh
Copy link
Contributor Author

hhbyyh commented Aug 1, 2018

Updated to support backward model loading compatibility.
@MLnick @jkbradley

@SparkQA
Copy link

SparkQA commented Aug 27, 2018

Test build #95316 has finished for PR 17280 at commit 9e2854a.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 28, 2018

Test build #95318 has finished for PR 17280 at commit 733c7ff.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@github-actions
Copy link

github-actions bot commented Feb 1, 2020

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Feb 1, 2020
@github-actions github-actions bot closed this Feb 2, 2020
@gungor2
Copy link

gungor2 commented Apr 20, 2021

Is there any plan to revive this PR? It would be great if we can the support of rules exposed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants