[BUG] sort_values failed after using dropna #2488

@hoarjour

Describe the bug
When I call sort_values(ignore_index=True) after dropna, Mars raises a TypeError:

import numpy as np
import mars.dataframe as md

a = md.Series([1, 3, 2, np.nan, np.nan])
a.dropna().sort_values(ignore_index=True).execute()

The same operation works in pandas:

import numpy as np
import pandas as pd

b = pd.Series([1, 3, 2, np.nan, np.nan])
b.dropna().sort_values(ignore_index=True)

To Reproduce
To help us reproduce this bug, please provide the information below:

  1. Your Python version: 3.8.0
  2. The version of Mars you use: 0.6.11
  3. Versions of crucial packages, such as numpy, scipy and pandas: pandas: 1.1.3
  4. Full stack of the error.
ValueError                                Traceback (most recent call last)
c:\users\hoa'r'jou'r\appdata\local\programs\python\python38\lib\site-packages\pandas\core\dtypes\common.py in ensure_python_int(value)
    170     try:
--> 171         new_value = int(value)
    172         assert new_value == value

ValueError: cannot convert float NaN to integer

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
<ipython-input-18-f7e878c753c1> in <module>
      1 a = md.Series([1,3,2,np.nan,np.nan])
----> 2 a.dropna().sort_values(ignore_index=True).execute()

c:\users\hoa'r'jou'r\appdata\local\programs\python\python38\lib\site-packages\mars\dataframe\sort\sort_values.py in series_sort_values(series, axis, ascending, inplace, kind, na_position, ignore_index, parallel_kind, psrs_kinds)
    317                              parallel_kind=parallel_kind, psrs_kinds=psrs_kinds,
    318                              output_types=[OutputType.series], gpu=series.op.is_gpu())
--> 319     sorted_series = op(series)
    320     if inplace:
    321         series.data = sorted_series.data

c:\users\hoa'r'jou'r\appdata\local\programs\python\python38\lib\site-packages\mars\utils.py in _inner(*args, **kwargs)
    454         def _inner(*args, **kwargs):
    455             with self:
--> 456                 return func(*args, **kwargs)
    457 
    458         return _inner

c:\users\hoa'r'jou'r\appdata\local\programs\python\python38\lib\site-packages\mars\dataframe\sort\sort_values.py in __call__(self, a)
     97         assert self.axis == 0
     98         if self.ignore_index:
---> 99             index_value = parse_index(pd.RangeIndex(a.shape[0]))
    100         else:
    101             if isinstance(a.index_value.value, IndexValue.RangeIndex):

c:\users\hoa'r'jou'r\appdata\local\programs\python\python38\lib\site-packages\pandas\core\indexes\range.py in __new__(cls, start, stop, step, dtype, copy, name)
    100             raise TypeError("RangeIndex(...) must be called with integers")
    101 
--> 102         start = ensure_python_int(start) if start is not None else 0
    103 
    104         if stop is None:

c:\users\hoa'r'jou'r\appdata\local\programs\python\python38\lib\site-packages\pandas\core\dtypes\common.py in ensure_python_int(value)
    172         assert new_value == value
    173     except (TypeError, ValueError, AssertionError) as err:
--> 174         raise TypeError(f"Wrong type {type(value)} for value {value}") from err
    175     return new_value
    176 

TypeError: Wrong type <class 'float'> for value nan
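
The traceback shows that a.shape[0] reaches pd.RangeIndex as a float NaN, which suggests the length of the series is not yet known after dropna. The failing call can be reproduced with pandas alone, as in the rough sketch below. The helper range_index_for_length is hypothetical and only illustrates one possible guard; it is not the actual Mars fix.

```python
import math

import pandas as pd


def try_range_index(n):
    """Return the name of the exception pd.RangeIndex(n) raises, or "ok"."""
    try:
        pd.RangeIndex(n)
    except TypeError as exc:
        return type(exc).__name__
    return "ok"


def range_index_for_length(n):
    """Hypothetical guard: build a RangeIndex only when the length is known.

    Returns None for a NaN length so the caller can defer index creation
    until the real length is available.
    """
    if isinstance(n, float) and math.isnan(n):
        return None
    return pd.RangeIndex(n)


# A NaN length triggers the same TypeError seen at the end of the traceback.
failure = try_range_index(float("nan"))

# A known integer length works as usual.
known = range_index_for_length(3)
unknown = range_index_for_length(float("nan"))
```

With this guard, a caller would have to handle the None case, for example by assigning the index only after the sorted result has been computed.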

