.loc can't work effectively when I try to append a list to dataframe #2953
Comments
Hi @ifrozenwhale, thanks for the report! This looks like an issue with empty dataframes: I cannot reproduce it when the dataframe has content, only when it is empty. We are actively working on improving the handling of empty dataframes. Thanks again for the report, we will get this fixed!
I think I have encountered this same (or similar) issue. Unsure if I should open a new issue or not. Same thing: it works with pandas, but not with Modin.
Environment
[tool.poetry.dependencies]
python = "3.9.9"
modin = { extras = ["ray"], version = "0.12.1" }
notebook = "6.4.6"
Full context
Tutorial: Using Pandas with Large Data Sets in Python
Issue
converted_obj = pd.DataFrame() # DOES NOT WORK
# converted_obj = df_obj.copy() # WORKS
for col in df_obj.columns:
    num_unique_values = len(df_obj[col].unique())
    num_total_values = len(df_obj[col])
    if num_unique_values / num_total_values < 0.5:
        converted_obj.loc[:,col] = df_obj[col].astype('category')
    else:
        converted_obj.loc[:,col] = df_obj[col]
IndexError Traceback (most recent call last)
/tmp/ipykernel_689608/3414656741.py in <module>
5 num_total_values = len(df_obj[col])
6 if num_unique_values / num_total_values < 0.5:
----> 7 converted_obj.loc[:,col] = df_obj[col].astype('category')
8 else:
9 converted_obj.loc[:,col] = df_obj[col]
~/Devel/redqueen/projects/icicles/.venv/lib/python3.9/site-packages/modin/pandas/indexing.py in __setitem__(self, key, item)
659 else:
660 row_lookup, col_lookup = self._compute_lookup(row_loc, col_loc)
--> 661 super(_LocIndexer, self).__setitem__(
662 row_lookup,
663 col_lookup,
~/Devel/redqueen/projects/icicles/.venv/lib/python3.9/site-packages/modin/pandas/indexing.py in __setitem__(self, row_lookup, col_lookup, item, axis)
315 # should be handled in a fastpath with `df[col] = item`.
316 if axis == 0:
--> 317 self.df[self.df.columns[col_lookup][0]] = item
318 # This is True when we are assigning to a full row. We want to reuse the setitem
319 # mechanism to operate along only one axis for performance reasons.
~/Devel/redqueen/projects/icicles/.venv/lib/python3.9/site-packages/pandas/core/indexes/base.py in __getitem__(self, key)
4614 key = np.asarray(key, dtype=bool)
4615
-> 4616 result = getitem(key)
4617 if not is_scalar(result):
4618 # error: Argument 1 to "ndim" has incompatible type "Union[ExtensionArray,
IndexError: index -1 is out of bounds for axis 0 with size 0
UPDATE
Changing:
Thanks @bstivers for confirming the issue! One minor comment:
Modin doesn't copy in the same way as pandas: its memory model is copy-on-write, which means that no physical copy is created until you write to the objects themselves. In the workflow you show, I would expect the memory consumption to be the same as if the empty dataframe case worked. The fix for this issue is slated for the next release. Thanks for the nicely reproducible examples, they really help us when trying to fix these cases!
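For readers who need a workaround in the meantime, here is a minimal sketch of the "start from a copy" variant that the report above marks as working. The function name `convert_low_cardinality`, the `threshold` parameter, and the sample data are illustrative assumptions, not part of the original report. It is written against plain pandas and has not been verified against Modin 0.12.1. It assigns with `converted_obj[col] = ...` rather than `.loc[:, col] = ...`, because in recent pandas versions a `.loc` assignment to an existing column may write in place and keep the old dtype.

```python
import pandas as pd  # assumption: swap for `import modin.pandas as pd` to try this under Modin


def convert_low_cardinality(df_obj, threshold=0.5):
    """Return a copy of df_obj with low-cardinality columns cast to 'category'.

    Hypothetical helper for illustration only.
    """
    # Start from a copy instead of an empty DataFrame, which is the case
    # that raised the IndexError in the report.
    converted_obj = df_obj.copy()
    for col in df_obj.columns:
        num_unique_values = len(df_obj[col].unique())
        num_total_values = len(df_obj[col])
        if num_unique_values / num_total_values < threshold:
            # Replace the whole column so the new dtype is kept.
            converted_obj[col] = df_obj[col].astype("category")
    return converted_obj


# Illustrative sample data (not from the original report).
df_obj = pd.DataFrame(
    {
        "team": ["a", "b", "a", "a", "b"],  # 2 unique / 5 total -> category
        "note": ["v", "w", "x", "y", "z"],  # 5 unique / 5 total -> unchanged
    }
)
converted = convert_low_cardinality(df_obj)
print(converted.dtypes)
```

Columns that stay above the threshold are simply carried over by the copy, so the `else` branch from the original loop is no longer needed.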
Duplicate of #3764
System information
Describe the problem
When I try to append a list to a dataframe as a new row using .loc, it raises an IndexError. Just like this:
Source code / logs
but pandas works.
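The reporter's original snippet is not preserved in this thread, so the following is only a guess at the kind of pattern being described: appending a list as a new row to an empty dataframe through `.loc`. The column names and values are made up for illustration. It runs under plain pandas, which is the behaviour the report says works.

```python
import pandas as pd  # plain pandas; the report says the Modin equivalent raised an IndexError

# Assumption: an empty dataframe with known columns, as in the maintainers' diagnosis.
df = pd.DataFrame(columns=["a", "b"])

# Setting with enlargement: assigning to a label that does not exist yet adds a row.
df.loc[len(df)] = [1, 2]
df.loc[len(df)] = [3, 4]

print(df)
```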