@@ -46,8 +46,8 @@ of elements to display is five, but you may pass a custom number.
 
 .. _basics.attrs:
 
-Attributes and the raw ndarray(s)
----------------------------------
+Attributes and Underlying Data
+------------------------------
 
 pandas objects have a number of attributes enabling you to access the metadata
 
@@ -65,14 +65,43 @@ Note, **these attributes can be safely assigned to**!
    df.columns = [x.lower() for x in df.columns]
    df
 
-To get the actual data inside a data structure, one need only access the
-**values** property:
+Pandas objects (:class:`Index`, :class:`Series`, :class:`DataFrame`) can be
+thought of as containers for arrays, which hold the actual data and do the
+actual computation. For many types, the underlying array is a
+:class:`numpy.ndarray`. However, pandas and third-party libraries may *extend*
+NumPy's type system to add support for custom arrays
+(see :ref:`basics.dtypes`).
+
+To get the actual data inside an :class:`Index` or :class:`Series`, use
+the **array** property:
+
+.. ipython:: python
+
+   s.array
+   s.index.array
+
+Depending on the data type (see :ref:`basics.dtypes`), :attr:`~Series.array`
+may be either a NumPy array or an :ref:`ExtensionArray <extending.extension-types>`.
+If you know you need a NumPy array, use :meth:`~Series.to_numpy`
+or :func:`numpy.asarray`.
 
 .. ipython:: python
 
-   s.values
-   df.values
-   wp.values
+   s.to_numpy()
+   np.asarray(s)
+
+For Series and Indexes backed by NumPy arrays (like we have here), this will
+be the same as :attr:`~Series.array`. When the Series or Index is backed by
+a :class:`~pandas.api.extensions.ExtensionArray`, :meth:`~Series.to_numpy`
+may involve copying data and coercing values.
+
+Getting the "raw data" inside a :class:`DataFrame` is possibly a bit more
+complex. When your ``DataFrame`` only has a single data type for all the
+columns, :meth:`DataFrame.to_numpy` will return the underlying data:
+
+.. ipython:: python
+
+   df.to_numpy()
 
 If a DataFrame or Panel contains homogeneously-typed data, the ndarray can
 actually be modified in-place, and the changes will be reflected in the data
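The accessors introduced in the hunk above can be exercised with a short sketch, assuming pandas 0.24 or later. The objects `s`, `cat` and `df` are hypothetical stand-ins, not the ones built earlier in basics.rst. Because the exact type returned by `.array` for NumPy-backed data has changed between pandas releases, the sketch only inspects `.array` for the categorical case.

```python
import numpy as np
import pandas as pd

# Hypothetical illustrative objects (not the s/df defined earlier in basics.rst)
s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
cat = pd.Series(['x', 'y', 'x'], dtype='category')
df = pd.DataFrame(np.arange(6.0).reshape(3, 2), columns=['one', 'two'])

# NumPy-backed Series: to_numpy() and np.asarray() both give a plain ndarray
arr = s.to_numpy()
same = np.asarray(s)

# Extension-backed Series: .array is the extension array itself (a Categorical),
# while to_numpy() converts it to an object-dtype NumPy array
cat_array = cat.array
cat_numpy = cat.to_numpy()

# A DataFrame with a single dtype converts to a 2-D ndarray of that dtype
values = df.to_numpy()

print(type(cat_array).__name__)    # Categorical
print(values.shape, values.dtype)  # (3, 2) float64
```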
@@ -87,6 +116,21 @@ unlike the axis labels, cannot be assigned to.
 strings are involved, the result will be of object dtype. If there are only
 floats and integers, the resulting array will be of float dtype.
 
+In the past, pandas recommended :attr:`Series.values` or :attr:`DataFrame.values`
+for extracting the data from a Series or DataFrame. You'll still find references
+to these in old code bases and online. Going forward, we recommend avoiding
+``.values`` and using ``.array`` or ``.to_numpy()``. ``.values`` has the following
+drawbacks:
+
+1. When your Series contains an :ref:`extension type <extending.extension-types>`, it's
+   unclear whether :attr:`Series.values` returns a NumPy array or the extension array.
+   :attr:`Series.array` will always return the actual array backing the Series,
+   while :meth:`Series.to_numpy` will always return a NumPy array.
+2. When your DataFrame contains a mixture of data types, :attr:`DataFrame.values` may
+   involve copying data and coercing values to a common dtype, a relatively expensive
+   operation. :meth:`DataFrame.to_numpy`, being a method, makes it clearer that the
+   returned NumPy array may not be a view on the same data in the DataFrame.
+
 .. _basics.accelerate:
 
 Accelerated operations
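Drawback 1 in the hunk above (what `.values` returns depends on the dtype) can be seen directly. This is a sketch assuming a recent pandas; the behavior that `.values` on tz-aware data yields naive `datetime64` values in UTC follows the pandas `Series.values` documentation, and the Series names are illustrative only.

```python
import numpy as np
import pandas as pd

num = pd.Series([1.5, 2.5])
cat = pd.Series(['a', 'b'], dtype='category')
tz = pd.Series(pd.date_range('2013-01-01', periods=2, tz='US/Eastern'))

# .values: an ndarray for one Series, an extension array for another
print(type(num.values).__name__)  # ndarray
print(type(cat.values).__name__)  # Categorical

# For tz-aware data, .values gives naive datetime64 values (converted to UTC)
print(tz.values.dtype)

# .array always returns the array actually backing the Series ...
print(type(tz.array).__name__)

# ... and .to_numpy() always returns a NumPy array; for tz-aware data it is an
# object array of Timestamps that keep their time zone
print(tz.to_numpy()[0].tz)
```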
@@ -541,7 +585,7 @@ will exclude NAs on Series input by default:
 .. ipython:: python
 
    np.mean(df['one'])
-   np.mean(df['one'].values)
+   np.mean(df['one'].to_numpy())
 
 :meth:`Series.nunique` will return the number of unique non-NA values in a
 Series:
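The two calls in the hunk above differ because `np.mean` defers to `Series.mean` (which skips NaN by default) when handed a Series, but runs NumPy's own mean on the raw ndarray. A minimal sketch with a hypothetical one-column frame standing in for the document's `df`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the document's df
df = pd.DataFrame({'one': [1.0, np.nan, 3.0]})

# Series input: np.mean calls Series.mean, which excludes NaN
series_mean = np.mean(df['one'])

# ndarray input: NumPy's mean propagates NaN
array_mean = np.mean(df['one'].to_numpy())

# np.nanmean is the NumPy-level way to ignore NaN
nan_mean = np.nanmean(df['one'].to_numpy())

print(series_mean, array_mean, nan_mean)  # 2.0 nan 2.0
```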
@@ -839,7 +883,7 @@ Series operation on each column or row:
 
    tsdf = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
                        index=pd.date_range('1/1/2000', periods=10))
-   tsdf.values[3:7] = np.nan
+   tsdf.iloc[3:7] = np.nan
 
 .. ipython:: python
 
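The switch from `tsdf.values[3:7] = np.nan` to `tsdf.iloc[3:7] = np.nan` in the hunk above writes through pandas' own positional indexer rather than through an array that may or may not share memory with the frame. The sketch below assumes a seeded generator in place of `np.random.randn` for reproducibility; the `mixed` frame is a hypothetical example showing that, for mixed dtypes, the extracted ndarray is a new array, so writes to it never reach the DataFrame.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded stand-in for np.random.randn
tsdf = pd.DataFrame(rng.standard_normal((10, 3)), columns=['A', 'B', 'C'],
                    index=pd.date_range('1/1/2000', periods=10))

# Positional assignment: rows 3, 4, 5 and 6 become NaN in every column
tsdf.iloc[3:7] = np.nan
print(tsdf.isna().sum())  # four missing values per column

# Mixed int/float frame: to_numpy() must build a new float array,
# so modifying it leaves the DataFrame untouched
mixed = pd.DataFrame({'a': [1, 2], 'b': [0.5, 1.5]})
raw = mixed.to_numpy()
raw[0, 0] = -1.0
print(mixed.loc[0, 'a'])  # 1
```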
@@ -1875,17 +1919,29 @@ dtypes
 ------
 
 For the most part, pandas uses NumPy arrays and dtypes for Series or individual
-columns of a DataFrame. The main types allowed in pandas objects are ``float``,
-``int``, ``bool``, and ``datetime64[ns]`` (note that NumPy does not support
-timezone-aware datetimes).
-
-In addition to NumPy's types, pandas :ref:`extends <extending.extension-types>`
-NumPy's type-system for a few cases.
-
-* :ref:`Categorical <categorical>`
-* :ref:`Datetime with Timezone <timeseries.timezone_series>`
-* :ref:`Period <timeseries.periods>`
-* :ref:`Interval <indexing.intervallindex>`
+columns of a DataFrame. NumPy provides support for ``float``,
+``int``, ``bool``, ``timedelta64[ns]`` and ``datetime64[ns]`` (note that NumPy
+does not support timezone-aware datetimes).
+
+Pandas and third-party libraries *extend* NumPy's type system in a few places.
+This section describes the extensions pandas has made internally.
+See :ref:`extending.extension-types` for how to write your own extension that
+works with pandas. See :ref:`ecosystem.extensions` for a list of third-party
+libraries that have implemented an extension.
+
+The following table lists all of pandas' extension types. See the respective
+documentation sections for more on each type.
+
+=================== ========================= ================== ============================= =============================
+Kind of Data        Data Type                 Scalar             Array                         Documentation
+=================== ========================= ================== ============================= =============================
+tz-aware datetime   :class:`DatetimeTZDtype`  :class:`Timestamp` :class:`arrays.DatetimeArray` :ref:`timeseries.timezone`
+Categorical         :class:`CategoricalDtype` (none)             :class:`Categorical`          :ref:`categorical`
+period (time spans) :class:`PeriodDtype`      :class:`Period`    :class:`arrays.PeriodArray`   :ref:`timeseries.periods`
+sparse              :class:`SparseDtype`      (none)             :class:`arrays.SparseArray`   :ref:`sparse`
+intervals           :class:`IntervalDtype`    :class:`Interval`  :class:`arrays.IntervalArray` :ref:`advanced.intervalindex`
+nullable integer    :class:`Int64Dtype`, ...  (none)             :class:`arrays.IntegerArray`  :ref:`integer_na`
+=================== ========================= ================== ============================= =============================
 
 Pandas uses the ``object`` dtype for storing strings.
 
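Each row of the extension-type table in the hunk above can be checked by building a small Series of that kind and inspecting its `.dtype` and `.array`. This is a sketch assuming a recent pandas where all of these classes are exposed as `pd.*Dtype` and `pd.arrays.*`; the sample values are arbitrary.

```python
import pandas as pd

# One small Series per row of the extension-type table (arbitrary sample data)
examples = {
    'tz-aware datetime': pd.Series(pd.date_range('2000-01-01', periods=2, tz='US/Eastern')),
    'categorical': pd.Series(['a', 'b'], dtype='category'),
    'period': pd.Series(pd.period_range('2000-01', periods=2, freq='M')),
    'sparse': pd.Series(pd.arrays.SparseArray([0, 0, 1])),
    'interval': pd.Series(pd.interval_range(start=0, end=2)),
    'nullable integer': pd.Series([1, None, 3], dtype='Int64'),
}

for kind, s in examples.items():
    # .dtype is the extension dtype; .array is the extension array behind it
    print(f'{kind:18} {type(s.dtype).__name__:18} {type(s.array).__name__}')
```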
@@ -1983,13 +2039,13 @@ from the current type (e.g. ``int`` to ``float``).
    df3
    df3.dtypes
 
-The ``values`` attribute on a DataFrame return the *lower-common-denominator* of the dtypes, meaning
+:meth:`DataFrame.to_numpy` will return the *lowest-common-denominator* of the dtypes, meaning
 the dtype that can accommodate **ALL** of the types in the resulting homogeneous dtyped NumPy array. This can
 force some *upcasting*.
 
 .. ipython:: python
 
-   df3.values.dtype
+   df3.to_numpy().dtype
 
 astype
 ~~~~~~
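The upcasting described in the hunk above can be sketched with two hypothetical frames, assuming a recent pandas: integer and float columns share `float64`, while adding a string column forces the whole array up to `object`.

```python
import numpy as np
import pandas as pd

# int64 + float64 columns: the common dtype is float64
nums = pd.DataFrame({'i': [1, 2], 'f': [0.5, 1.5]})
print(nums.to_numpy().dtype)  # float64

# A string column forces every value up to object
mixed = nums.assign(s=['x', 'y'])
print(mixed.to_numpy().dtype)  # object
```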
@@ -2211,11 +2267,11 @@ dtypes:
                       'float64': np.arange(4.0, 7.0),
                       'bool1': [True, False, True],
                       'bool2': [False, True, False],
-                      'dates': pd.date_range('now', periods=3).values,
+                      'dates': pd.date_range('now', periods=3),
                       'category': pd.Series(list("ABC")).astype('category')})
    df['tdeltas'] = df.dates.diff()
    df['uint64'] = np.arange(3, 6).astype('u8')
-   df['other_dates'] = pd.date_range('20130101', periods=3).values
+   df['other_dates'] = pd.date_range('20130101', periods=3)
    df['tz_aware_dates'] = pd.date_range('20130101', periods=3, tz='US/Eastern')
    df
 
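For the naive `dates` and `other_dates` columns in the hunk above, dropping `.values` yields the same `datetime64` column either way. The difference shows up with time zones: per the pandas documentation, `.values` on tz-aware data returns naive values converted to UTC. A sketch assuming a recent pandas, with illustrative variable names:

```python
import pandas as pd

idx = pd.date_range('20130101', periods=3, tz='US/Eastern')

# Passing the DatetimeIndex itself keeps the time zone in the column dtype
kept = pd.Series(idx)

# .values returns naive datetime64 values in UTC, so the time zone is lost
dropped = pd.Series(idx.values)

print(kept.dtype)
print(dropped.dtype)

# Same instants, but only `kept` still knows it is US/Eastern
print(kept.iloc[0].tz_convert('UTC').tz_localize(None) == dropped.iloc[0])
```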