Inconsistent behaviour when assigning to series? #25548

Open
Tracked by #1 ...
mwiebusch78 opened this issue Mar 5, 2019 · 7 comments
Labels
good first issue Indexing Related to indexing on series/frames, not to indexes themselves Needs Tests Unit test(s) needed to prevent regressions

Comments

@mwiebusch78

I noticed that .loc and __setitem__ behave very differently when assigning one series to a sub-range of another series:

>>> s = pd.Series(0.0, index=list('abcd'))
>>> s1 = pd.Series(1.0, index=list('ab'))
>>> s2 = pd.Series(2.0, index=list('xy'))
>>> s[['a', 'b']] = s2
>>> s  # index labels of s2 are ignored, as expected
a    2.0
b    2.0
c    0.0
d    0.0
dtype: float64
>>> s.loc[['a', 'b']] = s2
>>> s  # not expected!!
a    NaN
b    NaN
c    0.0
d    0.0
dtype: float64
>>> s.loc[['a', 'b']] = s1
>>> s  # everything's fine if the indices match
a    1.0
b    1.0
c    0.0
d    0.0
dtype: float64

I'm not sure if this is intended behaviour but it seems odd.

I'm on pandas v. 0.24.1
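A minimal sketch of the reported behaviour, limited to .loc, whose label alignment the later comments confirm as intended. It assumes a recent pandas and leaves out the plain s[[...]] = ... form, because the comments below disagree about how that form behaves across versions.

```python
import pandas as pd

s = pd.Series(0.0, index=list("abcd"))
s1 = pd.Series(1.0, index=list("ab"))
s2 = pd.Series(2.0, index=list("xy"))

# .loc aligns the right-hand Series on its index labels before writing.
# s2 only has the labels 'x' and 'y', so nothing matches 'a' or 'b'
# and the aligned values are all missing (NaN).
no_overlap = s.copy()
no_overlap.loc[["a", "b"]] = s2
print(no_overlap)

# s1 has exactly the labels 'a' and 'b', so every value is found.
overlap = s.copy()
overlap.loc[["a", "b"]] = s1
print(overlap)
```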

@WillAyd
Member

WillAyd commented Mar 6, 2019

Not sure I agree with the expectation, but this is rather nuanced. I think the first sample should raise a SettingWithCopyWarning. @TomAugspurger, any thoughts on your end?

@WillAyd WillAyd added the Indexing Related to indexing on series/frames, not to indexes themselves label Mar 6, 2019
@TomAugspurger
Contributor

I'm not sure what the rules are for setitem. It seems like labels are ignored when the lengths are the same?

In [48]: s3 = pd.Series([1, 2], index=['a', 'b'])

In [49]: target = s.copy()

In [50]: target[['a', 'b']] = s3; target
Out[50]:
a    1.0
b    2.0
c    0.0
d    0.0
dtype: float64

In [51]: target = s.copy()

In [52]: target[['a', 'b']] = s3[['b', 'a']]; target
Out[52]:
a    2.0
b    1.0
c    0.0
d    0.0
dtype: float64

But differing lengths trigger an alignment (outputs 2 and 3, though 3 is already aligned)?

I wouldn't expect a SettingWithCopyWarning on the first one. The target isn't (possibly) a copy of another object. This all happens in a single call to __setitem__, so it's fine (as opposed to x = s[['a', 'b']]; x = s2).
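For comparison, a small sketch of what .loc does with the same kinds of right-hand sides, assuming a recent pandas. It covers only .loc, not the plain __setitem__ path discussed above. The 'z' label in the longer Series is an illustrative addition.

```python
import pandas as pd

s = pd.Series(0.0, index=list("abcd"))
s3 = pd.Series([1.0, 2.0], index=["a", "b"])

# Reversing the order of s3 does not change what .loc writes:
# each value is matched to the target by label, not by position.
reordered = s.copy()
reordered.loc[["a", "b"]] = s3[["b", "a"]]

# A right-hand Series with extra labels is reindexed to the selected
# labels ('a', 'b'); the unmatched label 'z' is simply dropped.
longer = s.copy()
longer.loc[["a", "b"]] = pd.Series([10.0, 20.0, 30.0], index=["b", "a", "z"])

print(reordered)
print(longer)
```

With .loc, neither the order nor the length of the right-hand Series matters: only its labels do.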

@phofl
Member

phofl commented Nov 7, 2020

This seems to be consistent now; the three assignments return:

a    NaN
b    NaN
c    0.0
d    0.0
dtype: float64
a    NaN
b    NaN
c    0.0
d    0.0
dtype: float64
a    1.0
b    1.0
c    0.0
d    0.0
dtype: float64

Is this the expected output now?

@phofl phofl added Needs Tests Unit test(s) needed to prevent regressions and removed Needs Tests Unit test(s) needed to prevent regressions labels Nov 7, 2020
@mroeschke mroeschke added the Bug label Jun 27, 2021
@phofl
Member

phofl commented Apr 18, 2023

This is expected

@phofl phofl added Needs Tests Unit test(s) needed to prevent regressions and removed Bug labels Apr 18, 2023
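Since the issue carries the Needs Tests label, a regression test could pin the expected .loc behaviour down. This is a minimal sketch that assumes pytest-style collection of test_* functions and the public pandas.testing.assert_series_equal helper. The test name is hypothetical, and this is not the test that pandas actually added.

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_series_equal


def test_loc_setitem_series_aligns_on_labels():
    # Hypothetical name; checks that .loc aligns a Series value on labels.
    s = pd.Series(0.0, index=list("abcd"))

    # Non-overlapping labels: alignment yields NaN for the selected rows.
    result = s.copy()
    result.loc[["a", "b"]] = pd.Series(2.0, index=list("xy"))
    expected = pd.Series([np.nan, np.nan, 0.0, 0.0], index=list("abcd"))
    assert_series_equal(result, expected)

    # Matching labels in a different order: values follow their labels.
    result = s.copy()
    result.loc[["a", "b"]] = pd.Series([2.0, 1.0], index=["b", "a"])
    expected = pd.Series([1.0, 2.0, 0.0, 0.0], index=list("abcd"))
    assert_series_equal(result, expected)
```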
@srkds
Contributor

srkds commented Apr 22, 2023

I ran the same example on pandas 2.0.0 and got the same result. Is the NaN output the intended behaviour, or should .loc give the following?

>>> s.loc[['a', 'b']] = s2
>>> s
a    2.0
b    2.0
c    0.0
d    0.0
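If you want the 2.0 output, one workaround is to pass .loc a plain array instead of a Series, for example with Series.to_numpy(). This is a sketch, not something the maintainers prescribe in this thread. With no index left to align on, the values are written by position.

```python
import pandas as pd

s = pd.Series(0.0, index=list("abcd"))
s2 = pd.Series(2.0, index=list("xy"))

# to_numpy() drops the index labels, so .loc has nothing to align and
# writes the two values positionally into the rows labelled 'a' and 'b'.
s.loc[["a", "b"]] = s2.to_numpy()
print(s)
```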

@DhruvBShetty
Contributor

take

@DhruvBShetty
Copy link
Contributor

The behaviour of .loc[[...]] differs between an Index of boolean dtype and an Index of any other dtype.

import pandas as pd

series1 = pd.Series(['a', 'b', 'c', 'd'], index=[1, 2, 3, 4])
series2 = pd.Series(['a', 'b', 'c', 'd'], index=[True, True, True, False])

print(series1.loc[[2]])
2    b
dtype: object

print(series2.loc[[False, True, False, False]])
True    b
dtype: object
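The difference comes from how .loc reads a list of booleans. When the list has the same length as the Series, pandas treats it as a positional mask, even if the Index itself holds booleans. A small sketch that reuses the two Series from above:

```python
import pandas as pd

series1 = pd.Series(["a", "b", "c", "d"], index=[1, 2, 3, 4])
series2 = pd.Series(["a", "b", "c", "d"], index=[True, True, True, False])

# Integer labels: [2] is looked up as a label, returning the row labelled 2.
by_label = series1.loc[[2]]

# A full-length list of booleans is treated as a positional mask, even though
# series2's index holds booleans: only the second row (label True) is kept.
by_mask = series2.loc[[False, True, False, False]]

print(by_label)
print(by_mask)
```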

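This is the expected behaviour: when a list of booleans has the same length as the index, .loc treats it as a mask, so series2 is selected by position rather than by its True/False labels.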

Should we add tests for assigning to series2.loc[[...]] (a Series with a boolean index) from series3, another Series with a boolean index? Example below:

series3 = pd.Series(['e', 'f'], index=[True, False])
series2.loc[[True, False, False, True]] = series3
print(series2)
True     e
True     b
True     c
False    f
dtype: object
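The example above also shows alignment at work. The mask selects positions 0 and 3, whose labels are True and False, and because the right-hand side is a Series, .loc matches its values to those labels. A sketch with a hypothetical variant of series3 whose labels are reversed:

```python
import pandas as pd

series2 = pd.Series(["a", "b", "c", "d"], index=[True, True, True, False])

# Hypothetical variant of series3 with its labels in the opposite order.
series3_reversed = pd.Series(["f", "e"], index=[False, True])

# The mask picks positions 0 and 3 (labels True and False). .loc aligns the
# Series on those labels first, so 'e' (label True) goes to position 0 and
# 'f' (label False) to position 3, the same result as with series3 above.
series2.loc[[True, False, False, True]] = series3_reversed
print(series2)
```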

If so, should it be one test or two separate tests (one for indexes of other dtypes, one for boolean indexes)?
I have already written the test using the series_with_simple_index fixture; it needed an if condition to handle the boolean case.

Development

No branches or pull requests

7 participants