Fix `add_column` on datasets with indices mapping #3647

mariosasko · 2022-01-28T13:06:29Z

My initial idea was to avoid the `flatten_indices` call and reorder the new column instead. In the end I decided to follow `concatenate_datasets` and call `flatten_indices`, which avoids padding when `dataset._indices.num_rows != dataset._data.num_rows`.
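To illustrate the idea, here is a minimal sketch in plain Python. It is a toy model and not the code in this PR: the dict-of-lists "table", the list-based indices mapping, and both helper functions are hypothetical stand-ins for the Arrow data and indices tables. It only shows why flattening first keeps the new column aligned with the visible rows.

```python
# Toy model (hypothetical, not the `datasets` implementation): a "dataset" is
# a dict of columns plus an optional list of row indices into those columns.

def flatten_indices(data, indices):
    """Materialize the rows selected by `indices` so no mapping is needed."""
    if indices is None:
        return data
    return {name: [values[i] for i in indices] for name, values in data.items()}


def add_column(data, indices, name, column):
    """Flatten first, then attach the new column to the flattened data."""
    flat = flatten_indices(data, indices)
    # After flattening, the number of stored rows equals the number of visible rows.
    num_rows = len(next(iter(flat.values()))) if flat else len(column)
    if len(column) != num_rows:
        raise ValueError(f"expected {num_rows} values, got {len(column)}")
    new_data = dict(flat)
    new_data[name] = list(column)
    return new_data, None  # the result no longer carries an indices mapping


data = {"text": ["a", "b", "c", "d"]}  # 4 stored rows
indices = [3, 1]                       # only 2 visible rows, reordered
new_data, new_indices = add_column(data, indices, "label", [0, 1])
print(new_data)  # {'text': ['d', 'b'], 'label': [0, 1]}
```

Without the flatten step, the two-element column would have to be padded or reordered to match the four stored rows, which is the case this change avoids.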

Fix #3599

lhoestq

Thanks for the fix! Do you think we can include this in today's patch release?


mariosasko · 2022-01-28T14:19:39Z

Sure, let's include this in today's release.

lhoestq · 2022-01-28T14:41:34Z

Cool! The Windows CI should be fixed on master now, feel free to merge :)

mariosasko added 2 commits January 27, 2022 14:23

Flatten indices in add_column if indices table exists

6e6490c

Add test

fce6aaf

lhoestq approved these changes Jan 28, 2022


Review comment on tests/test_arrow_dataset.py (outdated, resolved)

Address review comment

d00a646

Merge branch 'master' of github.com:huggingface/datasets into fix-3599

4c60db0

mariosasko merged commit e8cd145 into master Jan 28, 2022

mariosasko deleted the fix-3599 branch January 28, 2022 15:35
