Skip to content

Conversation

@zero323
Copy link
Member

@zero323 zero323 commented Apr 7, 2017

What changes were proposed in this pull request?

Document fpGrowth in:

  • vignettes
  • programming guide
  • code example

How was this patch tested?

Manual tests.

@SparkQA
Copy link

SparkQA commented Apr 7, 2017

Test build #75588 has finished for PR 17557 at commit 27e94fd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 7, 2017

Test build #75593 has started for PR 17557 at commit 30949a1.

@SparkQA
Copy link

SparkQA commented Apr 7, 2017

Test build #75602 has finished for PR 17557 at commit 2f4c70e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 7, 2017

Test build #75605 has finished for PR 17557 at commit eb4939d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! - I'd prefer example with real data...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by "real"? Something human readable (e.g. milk, bread, butter) or some standard pattern mining dataset? If the former one then it is not a problem. If the latter one I am not aware of any dataset which would be safe enough on the license side.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something that is not coded in 3 lines ;)
reading from a file if we could - if there isn't any dataset that we can license to use, can we anonymize an existing one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something that is not coded in 3 lines ;)

That's for sure :) For now I am more trying to figure out how to present this to make it useful. For ML guide we can safely reuse data/mllib but I don't think we can do the same with vignette unless we bring sample_fpgrowth.txt as a package data.

@SparkQA
Copy link

SparkQA commented Apr 9, 2017

Test build #75633 has finished for PR 17557 at commit 03afd08.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zero323 zero323 changed the title [SPARK-20208][WIP][R][DOCS] Document R fpGrowth support [SPARK-20208][R][DOCS] Document R fpGrowth support Apr 9, 2017
@zero323
Copy link
Member Author

zero323 commented Apr 11, 2017

@felixcheung For vignette I used a bit larger synthetic dataset which should show all the features implemented by fpm. For examples I used the same data as #17130.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps it's slightly less clear, since there are 3 references to "items" (or really, just the SparkDataFrame and its column name), which "items" L923 is referring to?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@felixcheung Updated.

BTW There is a JIRA tracking SQL functions parity, isn't there?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean making sure we have all the SQL functions in R?
we don't, actually, since it's a evolving tasks - there are constantly new functions being added.

I think you are referring to split - yes we should probably add that in R too

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could you please rename the dataframe to df like the other example you have too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was pretty sure I've seen one :) split and array. There are of course name conflicts involved (spark.array?) but it would be really useful to have these.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's possible but I'm fairly certain there are still quite a few functions we have missed over the year that are not in JIRA.

feel free to add them - would appreciate your help.

agree array is a bit tricky - I'd rather not having to diverge because of consistency with array_contain function and so on, but I can see spark.array might be an approach. Or perhaps array_col?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe create_array (like PySpark create_map)?

@SparkQA
Copy link

SparkQA commented Apr 17, 2017

Test build #75854 has finished for PR 17557 at commit 4c02933.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 17, 2017

Test build #75859 has finished for PR 17557 at commit ab251f1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@felixcheung felixcheung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except one issue

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops. missed this - this should be {r}

- Vignettes.
- Programming guide.
- Code example.
@SparkQA
Copy link

SparkQA commented Apr 18, 2017

Test build #75893 has finished for PR 17557 at commit 3445815.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Apr 19, 2017
## What changes were proposed in this pull request?

Document  fpGrowth in:

- vignettes
- programming guide
- code example

## How was this patch tested?

Manual tests.

Author: zero323 <[email protected]>

Closes #17557 from zero323/SPARK-20208.

(cherry picked from commit 702d85a)
Signed-off-by: Felix Cheung <[email protected]>
@felixcheung
Copy link
Member

merged to master/2.2

@asfgit asfgit closed this in 702d85a Apr 19, 2017
@zero323
Copy link
Member Author

zero323 commented Apr 19, 2017

Thanks @felixcheung!

@zero323 zero323 deleted the SPARK-20208 branch April 20, 2017 20:43
peter-toth pushed a commit to peter-toth/spark that referenced this pull request Oct 6, 2018
## What changes were proposed in this pull request?

Document  fpGrowth in:

- vignettes
- programming guide
- code example

## How was this patch tested?

Manual tests.

Author: zero323 <[email protected]>

Closes apache#17557 from zero323/SPARK-20208.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants