Code Sample, a copy-pastable example if possible
In [2]: pd.DataFrame([1, 2], columns=range(3))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/home/nobackup/repo/pandas/pandas/core/internals.py in create_block_manager_from_blocks(blocks, axes)
4844 blocks = [make_block(values=blocks[0],
-> 4845 placement=slice(0, len(axes[0])))]
4846
/home/nobackup/repo/pandas/pandas/core/internals.py in make_block(values, placement, klass, ndim, dtype, fastpath)
3192
-> 3193 return klass(values, ndim=ndim, placement=placement)
3194
/home/nobackup/repo/pandas/pandas/core/internals.py in __init__(self, values, placement, ndim)
124 'Wrong number of items passed {val}, placement implies '
--> 125 '{mgr}'.format(val=len(self.values), mgr=len(self.mgr_locs)))
126
ValueError: Wrong number of items passed 1, placement implies 3
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-2-4ad51ebcfae4> in <module>()
----> 1 pd.DataFrame([1, 2], columns=range(3))
/home/nobackup/repo/pandas/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
403 else:
404 mgr = self._init_ndarray(data, index, columns, dtype=dtype,
--> 405 copy=copy)
406 else:
407 mgr = self._init_dict({}, index, columns, dtype=dtype)
/home/nobackup/repo/pandas/pandas/core/frame.py in _init_ndarray(self, values, index, columns, dtype, copy)
536 values = maybe_infer_to_datetimelike(values)
537
--> 538 return create_block_manager_from_blocks([values], [columns, index])
539
540 @property
/home/nobackup/repo/pandas/pandas/core/internals.py in create_block_manager_from_blocks(blocks, axes)
4852 blocks = [getattr(b, 'values', b) for b in blocks]
4853 tot_items = sum(b.shape[0] for b in blocks)
-> 4854 construction_error(tot_items, blocks[0].shape[1:], axes, e)
4855
4856
/home/nobackup/repo/pandas/pandas/core/internals.py in construction_error(tot_items, block_shape, axes, e)
4829 raise ValueError("Empty data passed with indices specified.")
4830 raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 4831 passed, implied))
4832
4833
ValueError: Shape of passed values is (1, 2), indices imply (3, 2)
Problem description
(From #18626 (comment) )
#18819 (now fixed) disabled calls such as pd.Series([1], index=range(3)); the same result can be obtained with pd.Series(1, index=range(3)), which is less ambiguous.
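For reference, a minimal sketch of the less ambiguous spelling mentioned above, in which a scalar is broadcast along the index. The variable name is illustrative only.

```python
import pandas as pd

# A scalar value is repeated for every label of the index.
s = pd.Series(1, index=range(3))
print(s.tolist())
```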
In principle, the same reasoning should lead us to disable pd.DataFrame([[1, 2]], index=range(3)). But that can't be replaced as comfortably, because pd.DataFrame([1, 2], index=range(3)) aligns vertically, and it could not be otherwise: 1d objects are treated as Series, and Series in DataFrames are mainly columns, not rows. Moreover, this pattern is probably widely used in existing code, and it also appears in tests:
pandas/pandas/tests/frame/test_apply.py, line 139 in 6cacdde:
expected = DataFrame([self.frame.mean()], index=self.frame.index)
pandas/pandas/tests/indexes/test_multi.py, line 3248 in 6cacdde:
df0 = pd.DataFrame([[1, 2]], index=idx0)
pandas/pandas/tests/reshape/test_reshape.py, line 499 in 6cacdde:
df = DataFrame([[10, 11]], index=midx)
So I think the best way to proceed is:
- allow 1d objects to be broadcast horizontally (not just aligned vertically)
- clearly document the above, and the fact that 2d objects of length 1 are broadcast vertically instead
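The two broadcasting directions in this proposal can be sketched explicitly with NumPy, without relying on the constructor's behaviour (which is exactly what is under discussion). The helpers broadcast_columns and broadcast_rows are hypothetical names used for illustration; they are not part of pandas.

```python
import numpy as np
import pandas as pd


def broadcast_columns(values, columns):
    """Hypothetical helper: repeat a 1d sequence across the given columns."""
    col = np.asarray(values).reshape(-1, 1)      # shape (n, 1)
    data = np.tile(col, (1, len(columns)))       # shape (n, len(columns))
    return pd.DataFrame(data, columns=columns)


def broadcast_rows(row, index):
    """Hypothetical helper: repeat a single row down the given index."""
    arr = np.asarray(row).reshape(1, -1)         # shape (1, m)
    data = np.tile(arr, (len(index), 1))         # shape (len(index), m)
    return pd.DataFrame(data, index=index)


# 1d input broadcast horizontally: each value fills a whole row.
wide = broadcast_columns([1, 2], range(3))

# 2d input of length 1 broadcast vertically: the row is repeated per label.
tall = broadcast_rows([1, 2], range(3))

print(wide)
print(tall)
```

Here wide has the shape that pd.DataFrame([1, 2], columns=range(3)) would produce under the proposal, while tall mirrors the vertical broadcasting that pd.DataFrame([[1, 2]], index=range(3)) performs in the version reported here.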
Expected Output
In [3]: pd.DataFrame([[1]*3, [2]*3], columns=range(3))
Out[3]:
   0  1  2
0  1  1  1
1  2  2  2
Output of pd.show_versions()
In [3]: pd.show_versions()
INSTALLED VERSIONS
commit: 7ec74e5
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-6-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8
pandas: 0.23.0.dev0+798.g7ec74e5f7
pytest: 3.5.0
pip: 9.0.1
setuptools: 39.0.1
Cython: 0.25.2
numpy: 1.14.1
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.5.6
patsy: 0.5.0
dateutil: 2.7.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.0dev
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0
openpyxl: 2.3.0
xlrd: 1.0.0
xlwt: 1.3.0
xlsxwriter: 0.9.6
lxml: 4.1.1
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1