[BUG] sort_values failed after using dropna #2488

Open
hoarjour opened this issue Sep 27, 2021 · 5 comments · May be fixed by #3367

Comments

@hoarjour
Contributor

hoarjour commented Sep 27, 2021

Describe the bug
When I call sort_values(ignore_index=True) after dropna, it raises a TypeError:

import numpy as np
import mars.dataframe as md

a = md.Series([1, 3, 2, np.nan, np.nan])
a.dropna().sort_values(ignore_index=True).execute()

The same operation works in pandas:

import numpy as np
import pandas as pd

b = pd.Series([1, 3, 2, np.nan, np.nan])
b.dropna().sort_values(ignore_index=True)

To Reproduce
To help us reproduce this bug, please provide the information below:

  1. Your Python version: 3.8.0
  2. The version of Mars you use: 0.6.11
  3. Versions of crucial packages, such as numpy, scipy and pandas: pandas: 1.1.3
  4. Full stack of the error.
ValueError                                Traceback (most recent call last)
c:\users\hoa'r'jou'r\appdata\local\programs\python\python38\lib\site-packages\pandas\core\dtypes\common.py in ensure_python_int(value)
    170     try:
--> 171         new_value = int(value)
    172         assert new_value == value

ValueError: cannot convert float NaN to integer

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
<ipython-input-18-f7e878c753c1> in <module>
      1 a = md.Series([1,3,2,np.nan,np.nan])
----> 2 a.dropna().sort_values(ignore_index=True).execute()

c:\users\hoa'r'jou'r\appdata\local\programs\python\python38\lib\site-packages\mars\dataframe\sort\sort_values.py in series_sort_values(series, axis, ascending, inplace, kind, na_position, ignore_index, parallel_kind, psrs_kinds)
    317                              parallel_kind=parallel_kind, psrs_kinds=psrs_kinds,
    318                              output_types=[OutputType.series], gpu=series.op.is_gpu())
--> 319     sorted_series = op(series)
    320     if inplace:
    321         series.data = sorted_series.data

c:\users\hoa'r'jou'r\appdata\local\programs\python\python38\lib\site-packages\mars\utils.py in _inner(*args, **kwargs)
    454         def _inner(*args, **kwargs):
    455             with self:
--> 456                 return func(*args, **kwargs)
    457 
    458         return _inner

c:\users\hoa'r'jou'r\appdata\local\programs\python\python38\lib\site-packages\mars\dataframe\sort\sort_values.py in __call__(self, a)
     97         assert self.axis == 0
     98         if self.ignore_index:
---> 99             index_value = parse_index(pd.RangeIndex(a.shape[0]))
    100         else:
    101             if isinstance(a.index_value.value, IndexValue.RangeIndex):

c:\users\hoa'r'jou'r\appdata\local\programs\python\python38\lib\site-packages\pandas\core\indexes\range.py in __new__(cls, start, stop, step, dtype, copy, name)
    100             raise TypeError("RangeIndex(...) must be called with integers")
    101 
--> 102         start = ensure_python_int(start) if start is not None else 0
    103 
    104         if stop is None:

c:\users\hoa'r'jou'r\appdata\local\programs\python\python38\lib\site-packages\pandas\core\dtypes\common.py in ensure_python_int(value)
    172         assert new_value == value
    173     except (TypeError, ValueError, AssertionError) as err:
--> 174         raise TypeError(f"Wrong type {type(value)} for value {value}") from err
    175     return new_value
    176 

TypeError: Wrong type <class 'float'> for value nan
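
Reading the traceback, the failure seems to come from pandas rather than from the sort itself: `a.shape[0]` reaches `pd.RangeIndex` as a float NaN, presumably because the length of the series is not known before `dropna` has actually run. A minimal sketch with pandas only (no Mars involved; `try_range_index` is a hypothetical helper written for this illustration) that triggers the same error:

```python
import numpy as np
import pandas as pd


def try_range_index(size):
    """Build a RangeIndex from `size`, returning the exception instead of raising."""
    try:
        return pd.RangeIndex(size)
    except TypeError as exc:
        return exc


# A known integer size works as expected.
ok = try_range_index(3)

# A NaN size (standing in for an unknown length) is rejected by pandas.
failed = try_range_index(np.nan)

print(type(ok).__name__)
print(type(failed).__name__, failed)
```

If this matches what Mars does internally, any operation that leaves the row count unknown would hit the same error once `ignore_index=True` is set.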

Expected behavior
A clear and concise description of what you expected to happen.

Additional context
Add any other context about the problem here.

@hekaisheng
Contributor

hekaisheng commented Sep 27, 2021

Please copy-paste your code and error message instead of screenshots.

@qinxuye qinxuye added mod: dataframe type: bug Something isn't working labels Sep 27, 2021
@qinxuye qinxuye added this to the v0.8.0b2 milestone Sep 27, 2021
@wjsi
Member

wjsi commented Oct 8, 2021

This can be fixed by parsing pd.RangeIndex(-1) when the size of a dimension is unknown.
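
For reference, a short sketch of what `pd.RangeIndex(-1)` looks like on the pandas side. It is a valid, empty index whose `stop` is -1, which is why it can serve as a placeholder when the real length is not yet available. How Mars interprets that placeholder downstream is not shown here.

```python
import pandas as pd

# Placeholder for an unknown length: start=0, stop=-1, step=1.
placeholder = pd.RangeIndex(-1)

# A regular index for a known length of 3.
known = pd.RangeIndex(3)

print(placeholder.start, placeholder.stop, len(placeholder))
print(list(known))
```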

@wjsi wjsi added the pr welcome label Oct 8, 2021
@qinxuye qinxuye added the good first issue Good for newcomers label Oct 8, 2021
@qinxuye qinxuye modified the milestones: v0.8.0b2, v0.8.0rc1 Oct 9, 2021
@qinxuye qinxuye modified the milestones: v0.8.0rc1, v0.9.0a1 Oct 23, 2021
@qinxuye qinxuye modified the milestones: v0.9.0a1, v0.9.0a2 Dec 16, 2021
@qinxuye qinxuye removed this from the v0.9.0a2 milestone Jan 29, 2022
@DanielGoman

Hello :)
I'm a beginner to open source and I'd like to resolve this issue.
Is it still relevant?

@qinxuye
Collaborator

qinxuye commented Oct 11, 2022

> Hello :) I'm a beginner to open source and I'd like to resolve this issue. Is it still relevant?

You're very welcome to try fixing this. Feel free to ask questions if you run into any issues.

Shaun2h added a commit to Shaun2h/mars that referenced this issue Oct 7, 2023
@Shaun2h Shaun2h mentioned this issue Oct 7, 2023
@Shaun2h

Shaun2h commented Oct 7, 2023

Hello. I'm new to the open source pull request thing, but I've forked the repository and sent out a pull request at #3363.

I would note that running black as suggested for linting also edited mars/learn/contrib/lightgbm/tests/test_classifier.py.

Edits at a glance:
mars\dataframe\sort\sort_values.py
Lines 111 - 114
From:

    def __call__(self, a):
        assert self.axis == 0
        if self.ignore_index:
            index_value = parse_index(pd.RangeIndex(a.shape[0]))
        else:
            if isinstance(a.index_value.value, IndexValue.RangeIndex):
                index_value = parse_index(pd.Index([], dtype=np.int64))
            else:
                index_value = a.index_value
    -snip-

To:

    def __call__(self, a):
        assert self.axis == 0
        if self.ignore_index:
            if type(a.shape[0]) != int:
                index_value = parse_index(pd.RangeIndex(-1))
            else:
                index_value = parse_index(pd.RangeIndex(a.shape[0]))
        else:
            if isinstance(a.index_value.value, IndexValue.RangeIndex):
                index_value = parse_index(pd.Index([], dtype=np.int64))
            else:
                index_value = a.index_value
    -snip-
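
One possible refinement, offered only as a sketch: `type(x) != int` is also true for NumPy integers such as `np.int64`, so if a shape entry ever arrived as a NumPy integer it would be treated as unknown. An `isinstance` check against `numbers.Integral` accepts both. The helper name `index_for_ignore_index` is hypothetical and is not part of Mars or of the pull request.

```python
import numbers

import numpy as np
import pandas as pd


def index_for_ignore_index(size):
    """Hypothetical helper: pick the RangeIndex to use when ignore_index=True.

    Any integral size (Python int or NumPy integer) yields a real RangeIndex;
    anything else, such as NaN for an unknown length, yields the -1 placeholder.
    """
    if isinstance(size, numbers.Integral):
        return pd.RangeIndex(int(size))
    return pd.RangeIndex(-1)


# The strict type comparison treats a NumPy integer as "not an int".
strict_check_rejects_numpy_int = type(np.int64(3)) != int

print(strict_check_rejects_numpy_int)
print(len(index_for_ignore_index(np.int64(3))))
print(index_for_ignore_index(np.nan).stop)
```

Whether Mars shapes can actually contain NumPy integers is an assumption here; if they are always Python ints or NaN, the original check behaves the same.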

Gist - Code to recreate problem + some notes (since it's an old issue)
https://gist.github.com/Shaun2h/cf294782c840eaa1223caf2e4ad5bfd0

@vineethsaivs vineethsaivs linked a pull request Oct 10, 2024 that will close this issue