feat: add dataset split to be used along with row idx when external_id is not provided on mapping #5616

Merged

Member

@jfcalvo jfcalvo commented Oct 21, 2024

Description

This PR adds the name of the imported dataset split to the generated external_id, which is used when the import mapping does not specify a value for external_id.

When the train split of a dataset is imported and no external_id is provided, the external_id is built from the split name and the row index:

  • train_0: first row of train split.
  • train_1: second row of train split.
  • ...
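
The naming scheme above can be sketched in a few lines. This is only an illustration and not the code merged in this PR: the helper name `build_external_id` and its parameters (`row`, `split`, `row_idx`, `external_id_column`) are hypothetical.

```python
from typing import Any, Mapping, Optional


def build_external_id(
    row: Mapping[str, Any],
    split: str,
    row_idx: int,
    external_id_column: Optional[str] = None,
) -> str:
    """Hypothetical helper: pick the external_id for one imported row."""
    # If the mapping names a column for external_id, use that value as-is.
    if external_id_column is not None:
        return str(row[external_id_column])
    # Otherwise fall back to "<split>_<row index>", e.g. "train_0".
    return f"{split}_{row_idx}"


rows = [{"text": "first"}, {"text": "second"}]
ids = [build_external_id(row, "train", idx) for idx, row in enumerate(rows)]
print(ids)
```

With no mapped column the two rows get `train_0` and `train_1`; a mapped column takes precedence over the generated value.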

This avoids duplicated rows when another split is later imported into the same dataset. For example, if the test split is imported afterwards into the same dataset, the external_id values will be:

  • train_0: first row of train split.
  • train_1: second row of train split.
  • ...
  • test_0: first row of test split.
  • test_1: second row of test split.
  • ...
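
A small simulation may make the collision concrete. It is a sketch under stated assumptions, not the server code: the store is modelled as a plain dict keyed by external_id, and the `import_split` function with its `use_split` flag is hypothetical, with `use_split=False` standing in for ids built from the row index alone.

```python
from typing import Dict, List


def import_split(
    records: Dict[str, str],
    split: str,
    rows: List[str],
    use_split: bool = True,
) -> None:
    """Hypothetical import: store each row under its generated external_id."""
    for row_idx, row in enumerate(rows):
        # With use_split=False the id is the bare row index (assumed old scheme).
        external_id = f"{split}_{row_idx}" if use_split else str(row_idx)
        records[external_id] = row  # same external_id means the same record


# Ids prefixed with the split name: train and test rows stay distinct.
with_split: Dict[str, str] = {}
import_split(with_split, "train", ["a", "b"])
import_split(with_split, "test", ["c", "d"])

# Ids from the row index only: the test rows reuse the ids of the train rows.
index_only: Dict[str, str] = {}
import_split(index_only, "train", ["a", "b"], use_split=False)
import_split(index_only, "test", ["c", "d"], use_split=False)

print(sorted(with_split))
print(sorted(index_only))
```

In this model the prefixed scheme keeps four records, while the index-only scheme ends with two because the second import reuses `0` and `1`.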

Refs argilla-io/roadmap#21

Type of change

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested

  • Adding additional tests.

Checklist

  • I added relevant documentation
  • I followed the style guidelines of this project
  • I did a self-review of my code
  • I made corresponding changes to the documentation
  • I confirm my changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

@jfcalvo jfcalvo requested a review from frascuchon October 21, 2024 13:46

codecov bot commented Oct 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.19%. Comparing base (098d36b) to head (249e5d8).
Report is 1 commits behind head on feat/argilla-direct-feature-branch.

Additional details and impacted files
@@                          Coverage Diff                           @@
##           feat/argilla-direct-feature-branch    #5616      +/-   ##
======================================================================
+ Coverage                               91.18%   91.19%   +0.01%     
======================================================================
  Files                                     150      150              
  Lines                                    6260     6261       +1     
======================================================================
+ Hits                                     5708     5710       +2     
+ Misses                                    552      551       -1     
Flag Coverage Δ
argilla-server 91.19% <100.00%> (+0.01%) ⬆️


Member

@frascuchon frascuchon left a comment

Ref #5617

@frascuchon frascuchon merged commit c916930 into feat/argilla-direct-feature-branch Oct 22, 2024
5 of 6 checks passed
@frascuchon frascuchon deleted the feat/add-split-to-row-idx branch October 22, 2024 13:24
2 participants