Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUGFIX] argilla: support multi label values #5625

Conversation

frascuchon
Copy link
Member

Description

This PR fixes problems when a class label sequence is mapped as suggestions.

NOTE: This PR does not infer a sequence of class labels as multi-label questions. Only prevent errors when a sequence of class labels column is used as a suggestion (or other kinds of properties in argilla)

Type of change

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested

Checklist

  • I added relevant documentation
  • I followed the style guidelines of this project
  • I did a self-review of my code
  • I made corresponding changes to the documentation
  • I confirm My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

@frascuchon frascuchon requested a review from burtenshaw October 24, 2024 08:22
@frascuchon frascuchon changed the base branch from develop to feat/argilla-direct-feature-branch October 24, 2024 08:22
Comment on lines +275 to +294
image_columns = []
class_label_columns = []
class_label_sequence_columns = []

for name, feature in hf_dataset.features.items():
if isinstance(feature, Image):
casted_features[Image].append(name)
if isinstance(feature, ClassLabel):
casted_features[ClassLabel].append(name)
image_columns.append(name)
elif isinstance(feature, ClassLabel):
class_label_columns.append(name)
elif isinstance(feature, Sequence) and isinstance(feature.feature, ClassLabel):
class_label_sequence_columns.append(name)

if image_columns:
hf_dataset = _cast_images_as_urls(hf_dataset, image_columns)

if class_label_columns:
hf_dataset = _cast_classlabels_as_strings(hf_dataset, class_label_columns)

for feature_type, columns in casted_features.items():
hf_dataset = FEATURE_CASTERS[feature_type](hf_dataset, columns)
if class_label_sequence_columns:
hf_dataset = _cast_class_label_sequence_as_string_list(hf_dataset, class_label_sequence_columns)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All this logic could be improved by using a general map with batches and convert the values on each map.

Copy link

codecov bot commented Oct 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.24%. Comparing base (7b85b45) to head (ceb9f06).

Additional details and impacted files
@@                          Coverage Diff                           @@
##           feat/argilla-direct-feature-branch    #5625      +/-   ##
======================================================================
+ Coverage                               91.20%   91.24%   +0.03%     
======================================================================
  Files                                     150      150              
  Lines                                    6242     6246       +4     
======================================================================
+ Hits                                     5693     5699       +6     
+ Misses                                    549      547       -2     
Flag Coverage Δ
argilla-server 91.24% <ø> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@burtenshaw burtenshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iiuc, this allows us to invest a sequence of class labels as suggestions to a LabelQuestion and not a multilabel.

Lgtm

@frascuchon frascuchon merged commit 3fbbc6e into feat/argilla-direct-feature-branch Oct 25, 2024
6 of 7 checks passed
@frascuchon frascuchon deleted the fix/argilla/support-multi-label-values branch October 25, 2024 16:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants