API: reconsider returning read-only arrays from DataFrame/Series .array/.values/__array__

**Context**: during the implementation of the Copy-on-Write feature (https://github.com/pandas-dev/pandas/issues/48998), there was the idea to make returned arrays read-only for APIs that return underlying arrays (`.values`, `to_numpy()`, `__array__`).

This was initially done only for numpy arrays (the first two PRs below), and recently also for columns backed by ExtensionArrays, both when returning the EA itself (`.values` / `.array`) and when returning it as a numpy array (`to_numpy()`, `__array__`):

- https://github.com/pandas-dev/pandas/pull/51082
- https://github.com/pandas-dev/pandas/pull/53704
- https://github.com/pandas-dev/pandas/pull/61925

The idea behind returning a read-only array is as follows: with Copy-on-Write, the guarantee we provide is that mutating one _pandas_ object (Series, DataFrame) doesn't update another pandas object (whose data is shared as an implementation detail). But users can still easily get a numpy array that is a view on the underlying data, and mutate that one. At that point, we have no control over how this mutation propagates: it might update more objects than just the one from which the user obtained the array, for example if other Series/DataFrames were sharing data with this object under CoW.

Example to illustrate this:

```python
# creating a dataframe and a derived dataframe through some operation
# (that in this case didn't need to copy)
>>> df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
>>> df2 = df.sort_values(by="a").reset_index()

# getting a column and mutating this -> CoW gets triggered and only `ser` is changed, not `df`
>>> ser = df["a"]
>>> ser[0] = 100
>>> ser
0    100
1      2
2      3
Name: a, dtype: int64
>>> df
   a  b
0  1  4
1  2  5
2  3  6

# however, when the code mutates the numpy array it got from the series (or dataframe)
# (through .values, or np.asarray(ser), etc), then even the derived `df2` is silently mutated
>>> ser = df["a"]
>>> arr = ser.values
>>> arr.flags.writeable = True  # <-- this is now needed because we made .values readonly
>>> arr[0] = 100
>>> df2
   index    a  b
0      0  100  4
1      1    2  5
2      2    3  6
```

Right now, because the returned arrays are read-only, I have to include `arr.flags.writeable = True` to make this work (otherwise `arr[0] = 100` in the example above would raise an error about the array being read-only).
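To make the mechanics of that flag concrete, here is a minimal numpy-only sketch (no pandas involved; `base` and `view` are names made up for illustration). It shows that a read-only view rejects writes, and that once the flag is flipped back, a write through the view also changes the array it shares memory with:

```python
import numpy as np

base = np.array([1, 2, 3])
view = base[:]                   # a view: shares memory with `base`
view.flags.writeable = False     # mark only the view as read-only

error = None
try:
    view[0] = 100                # rejected while the flag is off
except ValueError as exc:
    error = str(exc)             # numpy reports the destination as read-only

# Flipping the flag back is allowed because `base` itself is writeable
view.flags.writeable = True
view[0] = 100                    # this write goes through to `base` as well
shared = np.shares_memory(base, view)
```

So the read-only flag only guards against accidental writes; it is not a hard barrier for code that explicitly opts out.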

But if we didn't make the returned arrays read-only, this would work without that extra line, and such mutations of the underlying numpy array would propagate unpredictably to other pandas Series/DataFrame objects.
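For comparison, a small sketch of the pattern that avoids the problem regardless of whether the returned array is read-only: copy the array before mutating it. This is only an illustration, not something the PRs above prescribe; it relies on `ndarray.copy()` always returning a new, writeable array.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Copy explicitly, so the array no longer shares memory with any pandas object
arr = df["a"].to_numpy().copy()
arr[0] = 100                     # only the copy changes

original_values = df["a"].tolist()   # the dataframe still holds 1, 2, 3
```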



