Downloading and preparing dataset Unknown size (download: Unknown size, generated: Unknown size, total: Unknown size) to C:\Users\WSUIDGEE\tensorflow_datasets\higgs\2.0.0...
Extraction completed...: 0 file [00:00, ? file/s]████████████████████████████████████████| 1/1 [00:00<00:00, 157.03 url/s]
Dl Size...: 100%|█████████████████████████████████████████████| 2816407858/2816407858 [00:00<00:00, 300620199629.49 MiB/s]
Dl Completed...: 100%|████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 96.44 url/s]
Generating splits...: 0%| | 0/1 [00:00<?, ? splits/s]
Traceback (most recent call last):
File "c:\Users\WSUIDGEE\Documents\FP\AutoSparse\main.py", line 105, in <module>
evaluate_configuration(
File "c:\Users\WSUIDGEE\Documents\FP\AutoSparse\main.py", line 87, in evaluate_configuration
ds = Dataset(dataset)
^^^^^^^^^^^^^^^^
File "c:\Users\WSUIDGEE\Documents\FP\AutoSparse\datasets.py", line 17, in __init__
trains_ds, vals_ds, test_ds = self.__load_dataset(dataset_name, k_folds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "c:\Users\WSUIDGEE\Documents\FP\AutoSparse\datasets.py", line 46, in __load_dataset
ds_builder.download_and_prepare()
File "C:\Users\WSUIDGEE\Documents\FP\AutoSparse\venv\Lib\site-packages\tensorflow_datasets\core\logging\__init__.py", line 168, in __call__
return function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\WSUIDGEE\Documents\FP\AutoSparse\venv\Lib\site-packages\tensorflow_datasets\core\dataset_builder.py", line 691, in download_and_prepare
self._download_and_prepare(
File "C:\Users\WSUIDGEE\Documents\FP\AutoSparse\venv\Lib\site-packages\tensorflow_datasets\core\dataset_builder.py", line 1584, in _download_and_prepare
future = split_builder.submit_split_generation(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\WSUIDGEE\Documents\FP\AutoSparse\venv\Lib\site-packages\tensorflow_datasets\core\split_builder.py", line 341, in submit_split_generation
return self._build_from_generator(**build_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\WSUIDGEE\Documents\FP\AutoSparse\venv\Lib\site-packages\tensorflow_datasets\core\split_builder.py", line 417, in _build_from_generator
utils.reraise(e, prefix=f'Failed to encode example:\n{example}\n')
File "C:\Users\WSUIDGEE\Documents\FP\AutoSparse\venv\Lib\site-packages\tensorflow_datasets\core\split_builder.py", line 415, in _build_from_generator
example = self._features.encode_example(example)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\WSUIDGEE\Documents\FP\AutoSparse\venv\Lib\site-packages\tensorflow_datasets\core\features\features_dict.py", line 243, in encode_example
utils.reraise(
File "C:\Users\WSUIDGEE\Documents\FP\AutoSparse\venv\Lib\site-packages\tensorflow_datasets\core\features\features_dict.py", line 241, in encode_example
example[k] = feature.encode_example(example_value)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\WSUIDGEE\Documents\FP\AutoSparse\venv\Lib\site-packages\tensorflow_datasets\core\features\tensor_feature.py", line 175, in encode_example
example_data = np.array(example_data, dtype=np_dtype)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: Failed to encode example:
{'class_label': '1.000000000000000000e+00', 'lepton_pT': '3.647371232509613037e-01', 'lepton_eta': '1.489144206047058105e+00', 'lepton_phi': '3.394368290901184082e-01', 'missing_energy_magnitude': '1.493860602378845215e+00', 'missing_energy_phi': '-1.723330497741699219e+00', 'jet_1_pt': '7.524616718292236328e-01', 'jet_1_eta': '-2.802605032920837402e-01', 'jet_1_phi': '-4.207125604152679443e-01', 'jet_1_b-tag': '2.173076152801513672e+00', 'jet_2_pt': '', 'jet_2_eta': None, 'jet_2_phi': None, 'jet_2_b-tag': None, 'jet_3_pt': None, 'jet_3_eta': None, 'jet_3_phi': None, 'jet_3_b-tag': None, 'jet_4_pt': None, 'jet_4_eta': None, 'jet_4_phi': None, 'jet_4_b-tag': None, 'm_jj': None, 'm_jjj': None, 'm_lv': None, 'm_jlv': None, 'm_bb': None, 'm_wbb': None, 'm_wwbb': None}
In <Tensor> with name "jet_2_pt":
could not convert string to float: ''
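The shape of the failing example (one empty string, then `None` for every remaining field) is what Python's `csv.DictReader` produces when a row is cut off right after a comma. The sketch below reproduces that pattern with a shortened, made-up header. Whether tfds reads the file this way, and whether a truncated row is the actual cause here, are assumptions; the built-in `float` stands in for the NumPy conversion shown in the traceback.

```python
import csv
import io

# Hypothetical, shortened header; the example in the log has 29 keys.
FIELDNAMES = ["class_label", "lepton_pT", "jet_2_pt", "jet_2_eta"]

# A row that was cut off right after the second comma.
truncated = io.StringIO("1.0e+00,3.6e-01,\n")

# DictReader fills fields missing from a short row with None (its default restval).
row = next(csv.DictReader(truncated, fieldnames=FIELDNAMES))
print(row)

try:
    float(row["jet_2_pt"])  # the empty string cannot be parsed as a number
except ValueError as err:
    message = str(err)
    print(message)
```

Under that reading, the first field after the cut would be `''` and everything after it `None`, which matches `jet_2_pt` and the fields that follow it in the log.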
Expected behavior
I expect the dataset to be downloaded and prepared such that I can quickly load it in the future.
Additional context
I am new to using tfds, but other datasets (e.g. MNIST, CIFAR10) work as intended.
The dataset is not supposed to have missing values, according to https://archive.ics.uci.edu/dataset/280/higgs
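One way to check that claim against the local copy is to count the fields on each line of the extracted CSV. This is a minimal sketch, assuming the file is plain comma-separated text with 29 columns per row (the number of keys in the failing example in the log); `find_malformed_rows` is a hypothetical helper, not part of tfds.

```python
import csv
import io

EXPECTED_COLUMNS = 29  # number of keys in the failing example in the log


def find_malformed_rows(lines, expected=EXPECTED_COLUMNS, limit=10):
    """Return up to `limit` (line_number, field_count) pairs for rows that
    have the wrong number of fields or contain an empty field."""
    bad = []
    for line_number, fields in enumerate(csv.reader(lines), start=1):
        if len(fields) != expected or "" in fields:
            bad.append((line_number, len(fields)))
            if len(bad) >= limit:
                break
    return bad


# Tiny in-memory demonstration with 3 columns instead of 29.
sample = io.StringIO("1,2,3\n4,5,\n6,7\n8,9,10\n")
malformed = find_malformed_rows(sample, expected=3)
print(malformed)  # [(2, 3), (3, 2)]
```

For the real file, pass an open file handle instead of the in-memory sample, for example `open(path, newline="")`, where `path` is wherever the extracted CSV ended up on your machine.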
Could this be a Windows-specific issue? I can't reproduce it locally: download_and_prepare succeeds for me. If the problem persists, you could also try filtering out the missing values (example).
If you find a fix for Windows, please feel free to push a PR that fixes the issue :) Thanks!
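The filtering suggested above could look roughly like the following. This is a sketch under assumptions: rows arrive as dictionaries shaped like the one in the log, and `is_complete` and `iter_complete_rows` are hypothetical helpers, not tfds functions. Silently dropping rows can hide a truncated or corrupted download, so the sketch also records which rows were skipped.

```python
def is_complete(row):
    """True if every value is a non-empty string that parses as a float."""
    for value in row.values():
        if value is None or value == "":
            return False
        try:
            float(value)
        except ValueError:
            return False
    return True


def iter_complete_rows(rows, skipped):
    """Yield only complete rows; append the index of each skipped row to `skipped`."""
    for index, row in enumerate(rows):
        if is_complete(row):
            yield row
        else:
            skipped.append(index)


# Made-up rows with two of the column names from the log.
rows = [
    {"class_label": "1.0", "jet_2_pt": "0.75"},
    {"class_label": "1.0", "jet_2_pt": ""},
    {"class_label": "0.0", "jet_2_pt": None},
]
skipped = []
kept = list(iter_complete_rows(rows, skipped))
print(len(kept), skipped)  # 1 [1, 2]
```

If many rows end up in `skipped`, re-downloading the file is probably a better fix than filtering.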
Short description
The Higgs dataset cannot be used, probably because it contains unexpected missing values.
Environment information
Operating System: Windows 11
Python version: 3.11.1
tensorflow-datasets/tfds-nightly version: tensorflow-datasets 4.9.4
tensorflow/tf-nightly version: tensorflow 2.16.1
Does the issue still exist with the latest tfds-nightly package (pip install --upgrade tfds-nightly)? Yes.
Reproduction instructions
Logs