[SPARK-40510][PS] Implement ddof in Series.cov
#37953
itholic
left a comment
Looks good otherwise
python/pyspark/pandas/series.py
Outdated
min_periods : int, optional
    Minimum number of observations needed to have a valid result.
ddof : int, default 1
    Delta degrees of freedom. The divisor used in calculations
nit: there are two spaces between sentences.
good catch
python/pyspark/pandas/series.py
Outdated
return sdf.select(SF.covar(F.col(sdf.columns[0]), F.col(sdf.columns[1]), ddof)).head(1)[
    0
][0]
Can we make it a bit prettier?
e.g.
return sdf.select(
    SF.covar(F.col(sdf.columns[0]), F.col(sdf.columns[1]), ddof)).head(1)[0][0]
this was changed by dev/reformat-python...
Yeah, sometimes black reformats the code so that it looks uglier 😂
if not isinstance(ddof, int):
    raise TypeError("ddof must be integer")
Maybe we need to add a negative test case?
nice, will update soon
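A negative test along these lines might look like the sketch below. It is only an illustration: `cov` here is a hypothetical plain-Python stand-in that mirrors the `ddof` type check from the diff, so the example runs without Spark. The actual test added to the PR may be structured differently.

```python
import unittest


def cov(xs, ys, ddof=1):
    """Hypothetical stand-in for Series.cov (not the PR's implementation)."""
    # Same guard as in the reviewed diff: reject non-integer ddof.
    if not isinstance(ddof, int):
        raise TypeError("ddof must be integer")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Assumed convention: divide the summed cross-deviations by (n - ddof).
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - ddof)


class CovDdofTest(unittest.TestCase):
    def test_non_integer_ddof_raises(self):
        # Negative case: a float ddof should be rejected with TypeError.
        with self.assertRaisesRegex(TypeError, "ddof must be integer"):
            cov([1.0, 2.0], [3.0, 4.0], ddof=1.5)

    def test_string_ddof_raises(self):
        # Negative case: a string ddof should be rejected as well.
        with self.assertRaisesRegex(TypeError, "ddof must be integer"):
            cov([1.0, 2.0], [3.0, 4.0], ddof="1")
```

The tests can be run with `python -m unittest` against the module that contains them.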
also cc @HyukjinKwon
Merged into master, thanks @HyukjinKwon @itholic for the reviews!
What changes were proposed in this pull request?
Implement `ddof` in `Series.cov` by switching to `SF.covar`.
Why are the changes needed?
For API coverage.
Does this PR introduce any user-facing change?
Yes, `ddof` is supported now.
How was this patch tested?
Added UT.
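For context on what `ddof` changes, the formula below is a sketch that assumes `Series.cov` follows the pandas convention, in which the divisor is `N - ddof` and `N` is the number of paired observations. This is not stated in the PR text itself.

```latex
\operatorname{cov}(x, y) = \frac{1}{N - \mathrm{ddof}} \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})
```

Under that assumption, the default `ddof=1` gives the sample covariance, and `ddof=0` gives the population covariance.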