Exception when using loc[boolean] in assignment #1044
Thanks @gshimansky for the report! I can reproduce this locally and will get this fixed for the next release. Thanks again for reporting! Here's the Traceback for future reference:
Hi. I wrote another reproducer that results in a quite different exception stack trace (I think because it uses a different indexing engine in pandas). The only difference is that integers are used in the index, yet the error message is quite different.

```python
#import pandas as pd  # uncomment (and comment out the Modin import) to run on plain pandas
import ray
# Environment-specific: plasma store backed by huge pages
ray.init(huge_pages=True, plasma_directory="/mnt/hugepages")
import modin.pandas as pd

df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=[1, 2, 3],
                  columns=['max_speed', 'shield'])
print(df)
condition = df['shield'] > 6
print(condition)
df.loc[condition, 'new_col'] = df.loc[condition, 'max_speed']
print(df)
```
Interesting @gshimansky, would you post the
Ok
I added a second reproducer because this is exactly the error I am getting. When I wrote the first reproducer, I saw that the errors were different, but I couldn't make it produce the exception I get in my code. Today I found the difference: in the first reproducer, pandas uses
Thanks @gshimansky, there may be two bugs here. I'll dig into this and get back to you. The indexing logic is a bit complex and dense because pandas allows so many different ways of using
I dug into this a bit, and it's going to be a difficult edge case to solve. The challenge is creating a new column and applying a boolean mask in the same assignment. One option is to insert the new column first, then correct the values after the column has been created. Another option is to reindex the new
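A minimal plain-pandas sketch of the two options from the comment above, shown as user-level code rather than as Modin's internal fix. The helper names are hypothetical. The second option is cut off in this thread, so "reindex the assigned values to the full row index, then mask them" is an assumption about what was meant. Swapping in `import modin.pandas as pd` might give a user-side workaround, but that is an untested assumption.

```python
import numpy as np
import pandas as pd


def assign_masked_two_step(df, mask, column, values):
    """Hypothetical helper, option 1: insert the column first, then fill masked rows."""
    out = df.copy()
    out[column] = np.nan            # create the new column (all NaN) up front
    out.loc[mask, column] = values  # the column now exists: a plain masked update
    return out


def assign_masked_reindex(df, mask, column, values):
    """Hypothetical helper, option 2: align the values to every row, then mask them.

    Assumes `values` is a Series, since it relies on Series.reindex.
    """
    out = df.copy()
    full = values.reindex(out.index)  # NaN for labels the Series does not contain
    out[column] = full.where(mask)    # keep values only where the mask is True
    return out


df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=[1, 2, 3],
                  columns=['max_speed', 'shield'])
condition = df['shield'] > 6
new_values = df.loc[condition, 'max_speed']
print(assign_masked_two_step(df, condition, 'new_col', new_values))
print(assign_masked_reindex(df, condition, 'new_col', new_values))
```

Option 1 touches the frame twice but only ever updates a column that already exists. Option 2 builds the whole column in a single assignment, at the cost of requiring the values to be a reindexable Series.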
Would this special case slow down all other
Detecting the booleans isn't the issue in this case (we already do that); it's creating a new column with a boolean index value to another

```python
import modin.pandas as pd

df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=[1, 2, 3],
                  columns=['max_speed', 'shield'])
print(df)
condition = df['shield'] > 6
new_values = df.loc[condition, 'max_speed']  # A Series with one value
df.loc[condition, 'new_col'] = new_values  # Create a new column with the subset of values
```

I tested some other weird things, like if the boolean index doesn't line up with the new values (changing the
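For reference, here is a sketch of what plain pandas does with the snippet above, which is the behavior Modin would need to match. It runs against pandas, not Modin. The second Series (values 10 and 20 at labels 2 and 3) is a made-up input illustrating the "doesn't line up" case, not the one tested in the comment.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=[1, 2, 3],
                  columns=['max_speed', 'shield'])
condition = df['shield'] > 6                 # True only for label 3
new_values = df.loc[condition, 'max_speed']  # Series with one value, at label 3

# Case from the comment: the mask and the assigned values line up
aligned = df.copy()
aligned.loc[condition, 'new_col'] = new_values
print(aligned)

# Illustrative misaligned case: the Series has labels 2 and 3,
# but the mask selects only label 3
misaligned = df.copy()
misaligned.loc[condition, 'new_col'] = pd.Series([10, 20], index=[2, 3])
print(misaligned)
```

In both cases pandas aligns the assigned Series on row labels. Rows the mask does not select get NaN in the new column, which is why it becomes float64. Labels present in the Series but not selected by the mask (label 2 in the second case) are ignored.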
Yes, I didn't realize that

Pushing to the next release.
Hi, since this issue has been open for quite some time, I wanted to know if there's a workaround for this. My code is doing something like:

and this is producing an error.

@kunal-gohrani Thanks for posting! Does
System information
- OS: Ubuntu 19.10
- Modin installed from: source
- Modin version: Git master revision 0.7.0+3.g7528bf3
- Python version: 3.7.5
Describe the problem
The following reproducer code works in pandas but raises an exception on assignment:
Source code / logs