Skip to content

Conversation

@jorisvandenbossche
Copy link
Member

I think ideally the DataFrame does not need to be aware of how the underlying manager stores the data (as 2D, transposed or not), so moving the logic of ensuring 2D/transposed values from the DataFrame set_item-related method into BlockManager.iset.

This will help for the ArrayManager, so we don't have to re-reshape there.

@jorisvandenbossche jorisvandenbossche added Refactor Internal refactoring of code Indexing Related to indexing on series/frames, not to indexes themselves Internals Related to non-user accessible pandas implementation labels Feb 10, 2021
@jorisvandenbossche jorisvandenbossche added this to the 1.3 milestone Feb 10, 2021
@jreback jreback merged commit 83479e1 into pandas-dev:master Feb 10, 2021
@jorisvandenbossche jorisvandenbossche deleted the am-iset branch February 10, 2021 14:44

class TestReshaping(BaseNumPyTests, base.BaseReshapingTests):
@skip_nested
@pytest.mark.skip(reason="Incorrect expected.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand how testing with the PandasArrays works, but, within the merge implementation, we set a column in the resulting dataframe (the column with the key values) using an Index.
And doing an extract_array on an Int64Index gives a PandasDtype("int64") column when running those tests. And because in this PR I changed it such that the values being set are not converted to a 1D array before ending up in BlockManager.iset, the extension dtype is now preserved. And so the expected result from the base test is wrong

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this turns out to be a PITA in a bunch of places. could just monkeypatch extract_array to extract PandasArray too

"'self.flags.allows_duplicate_labels' is False."
)
value = self._sanitize_column(value)
value = _maybe_atleast_2d(value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this removed all the usages of maybe_atleast_2d, so can remove the function

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, will remove that in one of my next PRs

if value.ndim == 2:
value = value.T

if value.ndim == self.ndim - 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can become an elif

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed, will also include in a next PR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it needs to be an if, because this case needs to end up in the final else to define the value_getitem function

if value.ndim == 2:
value = value.T

if value.ndim == self.ndim - 1 and not is_extension_array_dtype(value.dtype):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elif

@jreback
Copy link
Contributor

jreback commented Feb 10, 2021

sorry for so quick @jbrockmendel

will wait in future

@jorisvandenbossche
Copy link
Member Author

Yeah, I explicitly marked Brock for review ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Indexing Related to indexing on series/frames, not to indexes themselves Internals Related to non-user accessible pandas implementation Refactor Internal refactoring of code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants