@asokadiggs
Contributor

Documentation for dropDuplicates() and drop_duplicates() is one and the same. Resolved the error in the example for drop_duplicates using the same approach used for groupby and groupBy, by indicating that dropDuplicates and drop_duplicates are aliases.

@asokadiggs changed the title from "Update dropDuplicates() documentation" to "[SPARK-10782] Update dropDuplicates documentation" on Sep 28, 2015
@asokadiggs
Contributor Author

FYI: I ran make html in spark/python/docs to regenerate the HTML docs and visually verified the change in the generated documentation.

@asokadiggs changed the title from "[SPARK-10782] Update dropDuplicates documentation" to "[SPARK-10782] [Python] Update dropDuplicates documentation" on Sep 28, 2015
@SparkQA

SparkQA commented Sep 29, 2015

Test build #1822 has finished for PR 8930 at commit 279d620.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Sep 29, 2015

I was going to say that I don't think this addresses the problem you identified, but then I realized there's no separate declaration of drop_duplicates, so I probably don't understand where that's coming from. If it must necessarily clone the dropDuplicates docs, then I agree with your note. Looks like this needs a style fix, though; see the test result.

@asokadiggs
Contributor Author

Correct, there is no separate declaration for drop_duplicates. A section in dataframe.py, at around lines 1280-1285, sets drop_duplicates = dropDuplicates and groupby = groupBy for pandas compatibility. Because each alias is the same function object as the original, it shares the original's docstring, so there is no way to document drop_duplicates or groupby separately.

I implemented the same solution for drop_duplicates as is already implemented for groupby. At least for me it's readable and understandable, so it works.
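For readers unfamiliar with this kind of aliasing, here is a minimal sketch of the pattern. The `DataFrame` class below is a hypothetical stand-in, not the real `pyspark.sql.DataFrame`, and the docstring wording is illustrative only; the point is that a class-level alias is the same function object, so it can only carry one docstring.

```python
class DataFrame(object):
    """Hypothetical stand-in for a DataFrame; holds rows in a plain list."""

    def __init__(self, rows):
        self.rows = list(rows)

    def dropDuplicates(self):
        """Return a new DataFrame with duplicate rows removed.

        drop_duplicates is an alias for dropDuplicates.
        (Illustrative wording, assumed for this sketch.)
        """
        seen = set()
        unique = []
        for row in self.rows:
            # Keep only the first occurrence of each row, preserving order.
            if row not in seen:
                seen.add(row)
                unique.append(row)
        return DataFrame(unique)

    # Class-level alias: both names are bound to the same function object,
    # so the alias cannot have a docstring of its own.
    drop_duplicates = dropDuplicates


df = DataFrame([1, 1, 2, 1])
print(df.drop_duplicates().rows)
print(DataFrame.drop_duplicates.__doc__ == DataFrame.dropDuplicates.__doc__)
```

Since the alias has no docstring of its own, the only place to mention it is inside the shared docstring, which is the approach already used for groupby and groupBy.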

I found the test result. I'll figure out the whitespace error and resubmit.

@davies
Contributor

davies commented Sep 29, 2015

LGTM

@SparkQA

SparkQA commented Sep 29, 2015

Test build #1825 has finished for PR 8930 at commit 279d620.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit closed this in c1ad373 on Sep 29, 2015