I wasn't present at the discussions about how this should behave, so perhaps it's functioning exactly as expected, but it seems a little odd that the user can configure the splits in the config file such that some data will not be included in any splits.
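To make the concern concrete, here is a minimal sketch of a check that reports how much of the data set ends up in no split. The function name `unassigned_fraction` and the parameter names `validate_size` and `test_size` are hypothetical and are not taken from the fibad code; only `train_size` appears in this thread.

```python
import math


def unassigned_fraction(train_size: float, validate_size: float, test_size: float) -> float:
    """Return the fraction of the data set that falls in no split.

    Hypothetical helper for illustration; not part of fibad.
    """
    total = train_size + validate_size + test_size
    # Fractions that add up to more than 1.0 cannot describe disjoint splits.
    if total > 1.0 and not math.isclose(total, 1.0):
        raise ValueError(f"Split fractions sum to {total}, which exceeds 1.0")
    # Clamp at zero so floating-point noise never yields a negative leftover.
    return max(0.0, 1.0 - total)


# With these fractions, roughly a fifth of the data is in no split.
leftover = unassigned_fraction(0.6, 0.1, 0.1)
```

Whether a non-zero leftover should raise an error, log a warning, or be accepted silently is a design decision for the maintainers; the sketch only shows how the situation could be detected.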
I have a related issue with HSCDataSet splits that I will bundle here. It occurs when trying to load the entirety of the HSC 0.25 < z < 0.50 dataset with train_size = 0.8 and the other two sizes each set to 0.1:
[2024-12-09 07:11:00,625 fibad.data_sets.hsc_data_set:INFO] HSC Data set loader has 8088376 objects
[2024-12-09 07:11:02,224 fibad.data_sets.hsc_data_set:INFO] HSC Data Set Splits loaded are:
[2024-12-09 07:11:02,225 fibad.data_sets.hsc_data_set:INFO] test split contains 808838 items
[2024-12-09 07:11:02,225 fibad.data_sets.hsc_data_set:INFO] train split contains 6470701 items
[2024-12-09 07:11:02,225 fibad.data_sets.hsc_data_set:INFO] validate split contains 1 items
[2024-12-09 07:11:04,178 fibad.data_sets.hsc_data_set:INFO] Test split contains 808838 items
[2024-12-09 07:11:04,179 fibad.data_sets.hsc_data_set:INFO] Train split contains 6470701 items
[2024-12-09 07:11:04,179 fibad.data_sets.hsc_data_set:INFO] Validation split contains 808837 items
Interestingly, the first set of printed messages has the wrong number of items in the validation set: it reports 1 item, whereas the second set reports 808837. Also, why are there two sets of print statements, with the only difference being the capitalisation of the first letter?
It's being printed twice because there are actually two approaches to splitting that happen in the HSCDataSet class right now. The code will be making use of the second method (the one that is producing the test=808k, train=6.47M, val=808k), but both have been kept for the time being.
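For reference, the counts from the second set of log lines add up to the full 8088376 objects. The sketch below shows one allocation that reproduces those numbers: round the train and test counts, then give the remainder to validation. This is an assumption about how the numbers could arise, not a description of what HSCDataSet actually does, and `split_counts` is a hypothetical name.

```python
def split_counts(n_items: int, train_size: float, test_size: float) -> dict:
    """Allocate item counts so that every item lands in exactly one split.

    Hypothetical helper for illustration; not part of fibad.
    """
    n_train = round(n_items * train_size)
    n_test = round(n_items * test_size)
    # Validation takes whatever is left, so the three counts always sum to n_items.
    n_validate = n_items - n_train - n_test
    return {"train": n_train, "test": n_test, "validate": n_validate}


counts = split_counts(8088376, train_size=0.8, test_size=0.1)
# counts == {"train": 6470701, "test": 808838, "validate": 808837}
```

Assigning the remainder to one split avoids the off-by-one gaps that appear when all three counts are rounded independently.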
Given your first comment, @aritraghsh09, it would be good for me to go back and make sure that the second method properly allows users to define a small fraction of a large dataset.