[BUG] DataPanel.from_huggingface does not work #201
Comments
seyuboglu added a commit that referenced this issue on Dec 1, 2021
Should be fixed with #205
seyuboglu added a commit that referenced this issue on Feb 17, 2022
* delete nn
* Add support for loading train and test set in cifar10" (#193)
* Fix issue where tensor columns can't be indexed with pandas series (#195)
* Update cifar10 to support test set too (#196)
* Fix bacckwards compat issue with base_dir and gcs_image_column (#197)
* Support backwards compatibility with nn (#198)
* Bump version (#199)
* Update contributing to support new dev main structure (#203)
* Add args, kwargs to ColumnIOMixin._read_data (#204)
Co-authored-by: Jesse Vig
* Fix from_huggingface and add tests (#205) closes #201
* allow_pickle=true when loading numpy block (#206)
* Add downloader to ImageColumn (#207)
* Remove default addition of index (#208)
* Remove default addition of index
* Fix provenance tests
* Add DEW contrib to registry (#209)
* Catch ConnectionResetError (#210)
* Add inaturalist to contrib (#211)
* Add inaturalist to contrib
* Add annotations to intarualist
* Fix issue where arraycolumns can't be saved with jsonlines (#214)
* Update the docs and add user guide. (#215)
* Add contrib for enron (#217)
* Fix PIL attribute error on list column representation (#218)
* mmap path bug fix (#219)
* Downgrade pytorch dependency bound (#220)
* Fix issue with subclassing datapanel _state_keys (#224)
* Use multiple slices instead of pa.Table.take in ArrowBlock (#226)
* Fix issue where boolean list can't index (#227)
* Add support for AudioColumn (#222)
* Add waterbirds (#228)
* Add use guide to indexing and stubs for remaining sections (#225)
* Docs/build fix (#230)
* Bump version (#231)
Co-authored-by: Karan Goel
Co-authored-by: Jesse Vig
Co-authored-by: Khaled Saab
seyuboglu added a commit that referenced this issue on Jul 22, 2022
* delete nn
* Add support for loading train and test set in cifar10" (#193)
* Fix issue where tensor columns can't be indexed with pandas series (#195)
* Update cifar10 to support test set too (#196)
* Fix bacckwards compat issue with base_dir and gcs_image_column (#197)
* Support backwards compatibility with nn (#198)
* Bump version (#199)
* Update contributing to support new dev main structure (#203)
* Add args, kwargs to ColumnIOMixin._read_data (#204)
Co-authored-by: Jesse Vig
* Fix from_huggingface and add tests (#205) closes #201
* allow_pickle=true when loading numpy block (#206)
* Add downloader to ImageColumn (#207)
* Remove default addition of index (#208)
* Remove default addition of index
* Fix provenance tests
* Add DEW contrib to registry (#209)
* Catch ConnectionResetError (#210)
* Add inaturalist to contrib (#211)
* Add inaturalist to contrib
* Add annotations to intarualist
* Fix issue where arraycolumns can't be saved with jsonlines (#214)
* Update the docs and add user guide. (#215)
* Add contrib for enron (#217)
* Fix PIL attribute error on list column representation (#218)
* mmap path bug fix (#219)
* Downgrade pytorch dependency bound (#220)
* Fix issue with subclassing datapanel _state_keys (#224)
* Use multiple slices instead of pa.Table.take in ArrowBlock (#226)
* Fix issue where boolean list can't index (#227)
* Add support for AudioColumn (#222)
* Add waterbirds (#228)
* Add use guide to indexing and stubs for remaining sections (#225)
* Docs/build fix (#230)
* Bump version (#231)
* Audioset DataPanel (#229)
* Add the audioset dataset
* Add AudioColumn to audioset datapanel
* Fix issue where old datapanels didn't have formatter state (#233)
* Make audioset datapanels relational (#235)
* Add coco, mir, and pascal (#239)
* Make write only write columns in datapanel (#240)
* Enforce contiguous index in pandas columns (#244)
* Fix issue where ray pickle fails on lazy loader (#245)
* Add support for groupby operation
* Reorganize the implementation of datasets (#246)
* Add support for persistent configuration (#247)
* Implement sort for data panel and columns (#237)
* Add emb module (#249)
* clusterby stuff
* Add clusterby
* clusterby stuff
* Add clusterby
* Add embed op (#248)
* Autoformat
Co-authored-by: Sam Randall
* Reorganize ops code (#250)
* Update CI to include 3.9 and 3.10 and to drop 3.7
* Add sample (#251)
* Update ci.yml
* Add several HAPI datasets (#252)
* Update styling of docs (#253)
* Bump version (#254)
* Remove fastbpe
Co-authored-by: Karan Goel
Co-authored-by: Jesse Vig
Co-authored-by: Khaled Saab
Co-authored-by: Priya2698
Co-authored-by: sam-randall
Co-authored-by: Hannah Kim
Co-authored-by: Sam Randall
Describe the bug
DataPanel.from_huggingface does not work
To Reproduce
import meerkat as mk
mk.DataPanel.from_huggingface('boolq')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/anaconda3/envs/meerkat-test/lib/python3.8/site-packages/meerkat/datapanel.py", line 342, in from_huggingface
    return dict(
  File "/opt/anaconda3/envs/meerkat-test/lib/python3.8/site-packages/meerkat/datapanel.py", line 344, in <lambda>
    lambda t: (t[0], cls(t[1])),
  File "/opt/anaconda3/envs/meerkat-test/lib/python3.8/site-packages/meerkat/datapanel.py", line 78, in __init__
    self.data = data
  File "/opt/anaconda3/envs/meerkat-test/lib/python3.8/site-packages/meerkat/datapanel.py", line 152, in data
    self._set_data(value)
  File "/opt/anaconda3/envs/meerkat-test/lib/python3.8/site-packages/meerkat/datapanel.py", line 146, in _set_data
    raise ValueError(
ValueError: Cannot set DataPanel `data` to object of type <class 'datasets.arrow_dataset.Dataset'>.
System Information
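For anyone reading the traceback: it suggests that each split is passed straight to the constructor, and the data setter rejects the type it receives. Below is a rough, self-contained sketch of that failure mode and one possible way around it. It does not use meerkat or datasets, and it is not the fix from #205. FakeDataset, MiniPanel, to_dict and from_splits are all hypothetical names made up for illustration.

```python
# Hypothetical stand-ins only: nothing here is meerkat's or datasets' real API.
class FakeDataset:
    """Stand-in for a split object the panel does not know how to ingest."""

    def __init__(self, columns):
        self._columns = columns

    def to_dict(self):
        # Return a plain {column name: list of values} mapping.
        return dict(self._columns)


class MiniPanel:
    """Toy container whose data setter only accepts a dict of columns."""

    def __init__(self, data):
        self.data = data  # goes through the property setter below

    @property
    def data(self):
        return self._data

    @data.setter
    def data(self, value):
        if not isinstance(value, dict):
            raise ValueError(
                f"Cannot set MiniPanel `data` to object of type {type(value)}."
            )
        self._data = value

    @classmethod
    def from_splits(cls, splits):
        # Convert each split to a plain dict before calling the constructor,
        # so the setter never sees the unsupported type.
        return {name: cls(split.to_dict()) for name, split in splits.items()}


splits = {"train": FakeDataset({"question": ["a", "b"], "answer": [True, False]})}

# Passing the split object directly mirrors the failing path in the traceback.
try:
    MiniPanel(splits["train"])
    failed = False
except ValueError:
    failed = True

# Converting first succeeds.
panels = MiniPanel.from_splits(splits)
```

Whether the real fix converts the split or teaches the setter about the new type is not stated in this thread; the sketch only shows why the constructor call in the lambda raises.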