

@zhengruifeng
Contributor

What changes were proposed in this pull request?

Refactor Spearman correlation in `DataFrame.corr` to:

  1. support missing values;
  2. add parameter `min_periods`;
  3. enable Arrow execution, since it no longer depends on `VectorUDT`;
  4. support lazy evaluation.
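The pandas semantics targeted by items 1 and 2 can be sketched in plain Python. This is a minimal, hypothetical illustration (the helpers `average_ranks`, `pearson` and `spearman` are not part of `pyspark.pandas`), assuming pandas-style pairwise deletion. Rows where either value is missing are dropped, ranks are computed on the surviving rows with ties sharing their average rank, and the result is NaN when fewer than `min_periods` complete pairs remain. Spearman is then the Pearson correlation of those ranks.

```python
import math
from typing import List, Optional, Sequence


def _is_missing(v: Optional[float]) -> bool:
    # Treat both None and float NaN as missing values.
    return v is None or (isinstance(v, float) and math.isnan(v))


def average_ranks(values: Sequence[float]) -> List[float]:
    # 1-based ranks; tied values share the mean of the positions they span.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # a constant column has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y, min_periods: int = 1) -> float:
    # Hypothetical helper: pairwise deletion keeps only rows where both values exist.
    pairs = [(a, b) for a, b in zip(x, y) if not _is_missing(a) and not _is_missing(b)]
    # Fewer than 2 complete pairs can never give a defined correlation.
    if len(pairs) < max(min_periods, 2):
        return float("nan")
    xs, ys = zip(*pairs)
    return pearson(average_ranks(xs), average_ranks(ys))
```

For example, `spearman([1.0, 2.0, 3.0, 4.0, None], [10.0, 20.0, float("nan"), 40.0, 50.0])` uses only the three complete rows and returns 1.0, while passing `min_periods=4` to the same call returns NaN.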

Why are the changes needed?

To make its behavior the same as pandas.

Does this PR introduce any user-facing change?

Yes, this is an API change: a new parameter, `min_periods`, is supported.

How was this patch tested?

Added unit tests.

@zhengruifeng
Contributor Author

Merged into master, thanks @HyukjinKwon

zhengruifeng deleted the ps_df_spearman branch September 15, 2022 01:54
HyukjinKwon pushed a commit that referenced this pull request Sep 22, 2022
### What changes were proposed in this pull request?
Remove `pyspark.pandas.ml`

### Why are the changes needed?
`pyspark.pandas.ml` is no longer needed, since we implemented correlations based on Spark SQL:

1. Pearson correlation implemented in #37845
2. Spearman correlation implemented in #37874
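Correlation fits naturally into Spark SQL because Pearson's coefficient can be computed from a handful of sums gathered in a single pass, which map onto `COUNT`/`SUM`-style aggregates (Spark SQL also provides a built-in `corr` aggregate). The plain-Python sketch below shows that formulation; it is an illustration only, not the code from #37845 or #37874, and `pearson_from_sums` is a hypothetical name. Feeding it average ranks instead of raw values yields Spearman.

```python
import math
from typing import Iterable, Optional, Tuple


def pearson_from_sums(rows: Iterable[Tuple[Optional[float], Optional[float]]]) -> float:
    # Hypothetical helper: accumulate the six sums an aggregate query could compute.
    n = sx = sy = sxx = syy = sxy = 0.0
    for x, y in rows:
        if x is None or y is None:
            continue  # pairwise deletion: skip rows with a missing value in either column
        n += 1
        sx += x
        sy += y
        sxx += x * x
        syy += y * y
        sxy += x * y
    if n < 2:
        return float("nan")
    cov = sxy - sx * sy / n  # n times the (population) covariance
    vx = sxx - sx * sx / n   # n times the variance of x
    vy = syy - sy * sy / n   # n times the variance of y
    if vx <= 0 or vy <= 0:
        return float("nan")  # a constant column has no defined correlation
    return cov / math.sqrt(vx * vy)
```

This one-pass form can lose precision when values are large relative to their spread, which is why production engines often use incremental (Welford-style) updates instead.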

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Updated test suites.

Closes #37968 from zhengruifeng/ps_del_ml.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>