-
-
Notifications
You must be signed in to change notification settings - Fork 19.3k
API/ENH/DEPR: Series.unique returns Series #24108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hello @h-vetinari! Thanks for submitting the PR.
|
h-vetinari
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some inline notes
| if isinstance(self, ABCSeries): | ||
| uniqs = self.unique(raw=True) | ||
| else: | ||
| uniqs = self.unique() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, raw=True is unfortunately not compatible with the Index case
| >>> pd.Series([pd.Timestamp('2016-01-01') | ||
| ... for _ in range(3)]).unique(raw=False) | ||
| 0 2016-01-01 | ||
| dtype: datetime64[ns] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is clearly superior output...
| dtype: category | ||
| Categories (3, object): [a < b < c] | ||
| """ | ||
| result = super(Series, self).unique() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The branch for raw=True is exactly the same as before, but the diff is messed up because of the changed indentation.
| if isinstance(duplicated, ABCSeries) and method != pd.unique: | ||
| result = method(duplicated, raw=True) | ||
| else: | ||
| result = method(duplicated) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit awkward, I'll admit, but it's the only way I found of keeping the parametrisation while avoiding to raise the FutureWarning.
| We see that the values of `animals` get reconstructed correctly, but | ||
| the index does not match yet -- consequently, the last step is to | ||
| correctly set the index. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point, it's be really neat to have #22225 available to use .set_index...
Codecov Report
@@ Coverage Diff @@
## master #24108 +/- ##
==========================================
- Coverage 92.21% 92.16% -0.05%
==========================================
Files 161 161
Lines 51684 51723 +39
==========================================
+ Hits 47658 47673 +15
- Misses 4026 4050 +24
Continue to review full report at Codecov.
|
1 similar comment
Codecov Report
@@ Coverage Diff @@
## master #24108 +/- ##
==========================================
- Coverage 92.21% 92.16% -0.05%
==========================================
Files 161 161
Lines 51684 51723 +39
==========================================
+ Hits 47658 47673 +15
- Misses 4026 4050 +24
Continue to review full report at Codecov.
|
jreback
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right this is way too big.
the question about the return value of .unique needs to be answered first; return an .array of the result (for Index and Series) is probably the most reasonable change and is mostly backward compatible
however better to bring this up on the issue (for unique return value)
This is what this PR is about, since the issue had stalled for >1 months despite several pings.
I disagree with this quite strongly:
Would you mind chiming in there then? |
|
good idea, but closing as stale |
|
@jreback @gfyoung @simonjayhawkins |
|
i am not averse to changing the return type of .unique but cannot be linked to return_inverse which likely has less support PRs need to do exactly 1 thing as have been stated many times |
|
@jreback Unfortunately, the |
|
@h-vetinari maybe i still don’t see it why is returning a Series from .unique() actually useful? |
I know this PR is too big, but I need it to initiate discussion before
v.0.24cutoff.I've been working on adding a
return_inversetouniquesince mid June (as fast as possible). I know that changing the return type ofSeries.uniqueis potentially a controversial issue, but I truly believe that this should be done for v.0.24 (as otherwise the behavior is locked in past 1.0 and who knows if it ever changes then).IMO, it's even a necessity if pandas ever wants to support
return_inverseforunique. Here's a few arguments from working on this for a while:.uniqueis the obvious place for an inverse (as opposed to.duplicated, which I was directed to work on first, ENH: add return_inverse to duplicated for DataFrame/Series/Index/MultiIndex #21645)Seriesonly works well if the return type ofSeries.uniqueis aSeries, see API/ENH: overhaul/unify/improve .unique #22824 (comment). This is not even the strongest reason IMO to change the type, as.uniqueis a patchwork currently (see rest OP of API/ENH: overhaul/unify/improve .unique #22824):Index.uniqueis already anIndex(and changing that back tondarraywould be equally disruptive, and wouldn't lead anyhwere re:reconstruction)Series.uniquealready special-casesCategoricaland (effectively)DatetimeArray.The reason I'm pushing this WIP for discussion is that this is obviously needs a deprecation cycle, and I really think this should be part of
v.0.24. I'm sorry for the late timing, but I've been working as fast as feedback speed allowed (as often as politeness allowed me to ping) - #21645 was lying around mostly finished for 2 month, the cython backend (#22986 / #23400) took about 2 months, and I haven't been able to get an answer at #22824 for about 5 weeks (e.g. @jorisvandenbossche seems to be very busy or simply not available)As for the PR itself, I only wanted to change the return-type in this PR, but
Series.uniquetouches several important paths:Series.uniqueIndexOpsMixin.uniqueand henceIndex.uniqueCategorical.uniqueEA.uniqueSo, as a demo here, I've adapted the first three, and it's a separate issue that the EA contract should support the possibility for
return_inverse. This is also something that should IMO needs to make it tov.0.24.git diff upstream/master -u -- "*.py" | flake8 --diffreturn_inversetopd.uniquereturn_inversetoIndex.uniquereturn_inversetoCategorical.uniquerawtoSeries.uniqueunique-calls through class hierarchy and just directly call the cython methods fromSeries.unique, which would allow introducing the deprecation without introducing thereturn_inversekwarg yet.