Skip to content

Conversation

@SaladRaider
Copy link

What do these changes do?

Implement dropna when axis is tuple or list.

Copy link
Owner

@kunalgosar kunalgosar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the addition! A few comments though.

cp = ray_df.copy()
result = ray_df.dropna(how='all', axis=[0, 1])
result2 = ray_df.dropna(how='all', axis=(0, 1))
expected = pd_df.dropna(how='all').dropna(how='all', axis=1)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need this. For now, just comparing ray and pandas is fine.


ray_df_equals_pandas(ray_df.dropna(axis=1, how='any'),
pd_df.dropna(axis=1, how='any'))
assert ray_df_equals_pandas(ray_df.dropna(axis=1, how='any'),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch, thanks!

raise NotImplementedError(
"To contribute to Pandas on Ray, please visit "
"github.com/ray-project/ray.")
axis = set([pd.DataFrame()._get_axis_number(ax) for ax in axis])
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't put it in a set here.

"To contribute to Pandas on Ray, please visit "
"github.com/ray-project/ray.")
axis = set([pd.DataFrame()._get_axis_number(ax) for ax in axis])
result = self
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is necessary in this case b/c the for loop makes reference to result. this is consistent with the pandas source code.

"github.com/ray-project/ray.")
axis = set([pd.DataFrame()._get_axis_number(ax) for ax in axis])
result = self
for ax in axis:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment here that this is inefficient since it forces the DataFrame to be built as an intermediate and should be fixed later.

if not inplace:
return result

return self._update_inplace(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pass _block_partitions in instead, it's more efficient.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

assert ray_df_equals_pandas(result, expected)
assert ray_df_equals_pandas(result2, expected)

assert ray_df_equals_pandas(result, expected)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this code duplicated?

assert ray_df_equals_pandas(result2, expected)
assert ray_df_equals(ray_df, cp)

inp = ray_df.copy()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You already make a copy above, use it since none of those operations mutated the copy.


inp = ray_df.copy()
inp.dropna(how='all', axis=(0, 1), inplace=True)
assert ray_df_equals_pandas(inp, expected)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split inplace test from other test and model after the other dropna tests.

Copy link
Owner

@kunalgosar kunalgosar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few more!


@pytest.fixture
def test_dropna_multiple_axes_inplace(ray_df, pd_df):
ray_df = ray_df.copy()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change the name here to ray_df_copy


assert ray_df_equals_pandas(ray_df, pd_df)

ray_df.dropna(how='all', axis=(0, 1), inplace=True)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create a new copy of the original dfs first, else this operates on the result of the previous dropna call.


return self._update_inplace(
block_partitions=result._block_partitions,
columns=result._col_metadata.index,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.columns here

return self._update_inplace(
block_partitions=result._block_partitions,
columns=result._col_metadata.index,
index=result._row_metadata.index
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.index here

"To contribute to Pandas on Ray, please visit "
"github.com/ray-project/ray.")
result = self
for ax in axis: # TODO: inefficient, df built as intermediate
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer comment: # TODO(kunalgosar): this builds an intermediate dataframe, which does unnecessary computation

Also, put comment on its own line.

Copy link
Owner

@kunalgosar kunalgosar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!! Thanks

@kunalgosar kunalgosar merged commit 16faf64 into kunalgosar:filter May 4, 2018
kunalgosar added a commit that referenced this pull request May 6, 2018
* implement filter

* begin implementation of dropna

* implement dropna

* docs and tests

* resolving comments

* resolving merge

* add error checking to dropna

* fix update inplace call

* Implement multiple axis for dropna (#13)

* Implement multiple axis for dropna

* Add multiple axis dropna test

* Fix using dummy_frame in dropna

* Clean up dropna multiple axis tests

* remove unnecessary axis modification

* Clean up dropna tests

* resolve comments

* fix lint
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants