[BUGFIX] argilla: support multi label values #5625
+    image_columns = []
+    class_label_columns = []
+    class_label_sequence_columns = []

     for name, feature in hf_dataset.features.items():
         if isinstance(feature, Image):
-            casted_features[Image].append(name)
-        if isinstance(feature, ClassLabel):
-            casted_features[ClassLabel].append(name)
+            image_columns.append(name)
+        elif isinstance(feature, ClassLabel):
+            class_label_columns.append(name)
+        elif isinstance(feature, Sequence) and isinstance(feature.feature, ClassLabel):
+            class_label_sequence_columns.append(name)

+    if image_columns:
+        hf_dataset = _cast_images_as_urls(hf_dataset, image_columns)

+    if class_label_columns:
+        hf_dataset = _cast_classlabels_as_strings(hf_dataset, class_label_columns)

-    for feature_type, columns in casted_features.items():
-        hf_dataset = FEATURE_CASTERS[feature_type](hf_dataset, columns)
+    if class_label_sequence_columns:
+        hf_dataset = _cast_class_label_sequence_as_string_list(hf_dataset, class_label_sequence_columns)
All of this logic could be simplified by using a single batched map that converts the values of every affected column in one pass.
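A minimal sketch of what such a batched conversion might look like, written in plain Python so it runs without the datasets library. The names convert_batch, _convert_value, and label_names are hypothetical and do not come from this PR. The sketch assumes each batch arrives as a mapping from column name to a list of values (the shape a batched map passes to its function) and that negative ids mean "no label".

```python
from typing import Any, Dict, List, Optional


def _convert_value(value: Any, names: List[str]) -> Any:
    """Convert one class id, or a (nested) sequence of ids, to label strings."""
    if value is None:
        return None
    if isinstance(value, (list, tuple)):
        # A sequence of class labels: convert every element.
        return [_convert_value(item, names) for item in value]
    if value < 0:
        # Assumption: negative ids mark a missing label.
        return None
    return names[value]


def convert_batch(
    batch: Dict[str, List[Any]],
    label_names: Dict[str, List[str]],
) -> Dict[str, List[Any]]:
    """Convert class ids to strings for every column listed in label_names.

    batch maps column name -> list of values for that batch.
    label_names maps column name -> ordered label names for that column
    (hypothetical input; with datasets this could come from ClassLabel.names).
    Columns not listed in label_names are passed through unchanged.
    """
    converted: Dict[str, List[Any]] = {}
    for column, values in batch.items():
        names: Optional[List[str]] = label_names.get(column)
        if names is None:
            converted[column] = values
        else:
            converted[column] = [_convert_value(v, names) for v in values]
    return converted


if __name__ == "__main__":
    names = {"label": ["negative", "positive"], "tags": ["a", "b", "c"]}
    batch = {"text": ["x", "y"], "label": [1, 0], "tags": [[0, 2], []]}
    print(convert_batch(batch, names))
```

Because the same function handles both single class labels and sequences of them, one pass could replace the separate ClassLabel branches. A real integration would also need to update the columns' feature types so the strings are not encoded back into class ids, which this sketch leaves out.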
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files
@@            Coverage Diff             @@
##  feat/argilla-direct-feature-branch    #5625    +/-   ##
==========================================================
+ Coverage                      91.20%   91.24%   +0.03%
==========================================================
  Files                            150      150
  Lines                           6242     6246       +4
==========================================================
+ Hits                            5693     5699       +6
+ Misses                           549      547       -2
IIUC, this allows us to ingest a sequence of class labels as suggestions for a LabelQuestion, and not as a multi-label question.
LGTM
3fbbc6e into feat/argilla-direct-feature-branch
Description
This PR fixes errors raised when a column containing a sequence of class labels is mapped to suggestions.
NOTE: This PR does not infer multi-label questions from a sequence of class labels. It only prevents errors when such a column is used as a suggestion (or as another kind of property in Argilla).