@jbrockmendel
Member

Makes astype_nansafe for (td64|dt64) -> (object|str|string) match DTA/TDA/Series behavior.

The medium-term goal (weeks) is to get rid of Block._astype altogether and just use astype_nansafe, which, among other things, will be helpful for ArrayManager.

This changes Series[dt64].astype("string") behavior in a way that causes a new xfail in test_astype_roundtrip, but as discussed in #36153 that test is already wrong for other reasons.

This also has a side-effect of changing Series(dt64, dtype="Sparse[object]") behavior, discussed in #38508 as possibly not-desirable.
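
A minimal sketch of what "match DTA/TDA/Series behavior" amounts to, using only public pandas API: cast the same values through the Series and through its backing array, then compare the results. `astype_paths_agree` is a hypothetical helper written for this illustration. It does not call the internal `astype_nansafe`, and whether every pair agrees may depend on the installed pandas version.

```python
import pandas as pd


def astype_paths_agree(values, dtype):
    """Cast via the Series and via its backing array, then compare element-wise.

    Hypothetical helper for illustration; not part of pandas.
    """
    ser = pd.Series(values)
    via_series = list(ser.astype(dtype))
    via_array = list(ser.array.astype(dtype))
    return via_series == via_array


dti = pd.date_range("2021", periods=3)
tdi = pd.timedelta_range("1 Day", periods=3)

# The target dtypes named in the PR description: object, str and "string".
for values in (dti, tdi):
    for dtype in (object, str, "string"):
        agree = astype_paths_agree(values, dtype)
        print(f"{type(values).__name__} -> {dtype!r}: {agree}")
```

Any pair that prints `False` would point at a code path where the Series and the array disagree, which is the kind of inconsistency this PR is aiming to remove.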

@jreback
Contributor

jreback commented Dec 19, 2020

> This also has a side-effect of changing Series(dt64, dtype="Sparse[object]") behavior, discussed in #38508 as possibly not-desirable.

Where is this test case changed?

@jreback added the Dtype Conversions (Unexpected or buggy dtype conversions) and Refactor (Internal refactoring of code) labels Dec 19, 2020
@jreback added this to the 1.3 milestone Dec 19, 2020
@jbrockmendel
Member Author

> where is this test case changed?

We don't have a test for this; #38508 introduced new ones.

@jreback merged commit 9a46a4b into pandas-dev:master Dec 21, 2020
@jbrockmendel deleted the ref-blk-astype-3 branch December 21, 2020 17:29
luckyvs1 pushed a commit to luckyvs1/pandas that referenced this pull request Jan 20, 2021
@simonjayhawkins
Member

> This changes Series[dt64].astype("string") behavior in a way that causes a new xfail in test_astype_roundtrip, but as discussed in #36153 that test is already wrong for other reasons.

I'm not sure about the new behaviour. I think this should at least have a release note, if not be reverted.

old behaviour

>>> pd.__version__
'1.2.4'
>>> 
>>> tdi = pd.timedelta_range("1 Day", periods=3)
>>> ser = pd.Series(tdi)
>>> ser
0   1 days
1   2 days
2   3 days
dtype: timedelta64[ns]
>>> ser.astype("string")
0    1 days
1    2 days
2    3 days
dtype: string
>>> 
>>> dti = pd.date_range("2021", periods=3)
>>> ser = pd.Series(dti)
>>> ser
0   2021-01-01
1   2021-01-02
2   2021-01-03
dtype: datetime64[ns]
>>> ser.astype("string")
0    2021-01-01
1    2021-01-02
2    2021-01-03
dtype: string

new behaviour

>>> pd.__version__
'1.3.0.dev0+1567.g67c9385787'
>>> 
>>> tdi = pd.timedelta_range("1 Day", periods=3)
>>> ser = pd.Series(tdi)
>>> ser
0   1 days
1   2 days
2   3 days
dtype: timedelta64[ns]
>>> ser.astype("string")
0     86400000000000 nanoseconds
1    172800000000000 nanoseconds
2    259200000000000 nanoseconds
dtype: string
>>> 
>>> dti = pd.date_range("2021", periods=3)
>>> ser = pd.Series(dti)
>>> ser
0   2021-01-01
1   2021-01-02
2   2021-01-03
dtype: datetime64[ns]
>>> ser.astype("string")
0    2021-01-01T00:00:00.000000000
1    2021-01-02T00:00:00.000000000
2    2021-01-03T00:00:00.000000000
dtype: string
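
The strings in the new output have the same shape as NumPy's own rendering of `timedelta64` and `datetime64` scalars. That suggests (an assumption, not something confirmed in this thread) that the values were stringified element by element from the raw NumPy data rather than through pandas' own formatting. A small sketch using only NumPy:

```python
import numpy as np

# One day expressed in nanoseconds, the unit pandas stores internally.
td = np.timedelta64(86_400_000_000_000, "ns")
# Midnight on 2021-01-01 at nanosecond resolution.
dt = np.datetime64("2021-01-01", "ns")

print(str(td))  # 86400000000000 nanoseconds
print(str(dt))  # 2021-01-01T00:00:00.000000000
```

Both strings line up with the first row of each `astype("string")` result in the new behaviour.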
>>> 

@jreback
Contributor

jreback commented May 10, 2021

@simonjayhawkins can you open a new issue? This is going to be very hard to revert, but I agree the original behavior is correct, so we should fix it.

@simonjayhawkins
Member

#41409
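
A regression check for the behaviour described above as correct could be sketched as follows. The function name is hypothetical and this is not the test added for the follow-up issue; it only encodes the "old behaviour" output shown earlier in the thread and assumes a pandas version in which that behaviour is in effect.

```python
import pandas as pd


def check_astype_string_keeps_pandas_formatting():
    """Hypothetical check: astype("string") should give the pandas-style text."""
    # Timedeltas should read "1 days", not a raw nanosecond count.
    tdi = pd.timedelta_range("1 Day", periods=3)
    result = pd.Series(tdi).astype("string")
    expected = pd.Series(["1 days", "2 days", "3 days"], dtype="string")
    pd.testing.assert_series_equal(result, expected)

    # Datetimes at midnight should read as plain dates, without a time part.
    dti = pd.date_range("2021", periods=3)
    result = pd.Series(dti).astype("string")
    expected = pd.Series(
        ["2021-01-01", "2021-01-02", "2021-01-03"], dtype="string"
    )
    pd.testing.assert_series_equal(result, expected)


check_astype_string_keeps_pandas_formatting()
```

On a build showing the new behaviour, `assert_series_equal` would raise an `AssertionError` for the first comparison.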
