feat: add dataset split to be used along with row idx when `external_id` is not provided on mapping #5616

jfcalvo · 2024-10-21T13:46:41Z

Description

This PR adds the imported dataset split to the value used as external_id when the import mapping does not specify a value for external_id.

When importing the train split of a dataset without providing an external_id, the external_id is computed as follows:

train_0: first row of train split.
train_1: second row of train split.
...

This avoids row duplication when another split is imported into the same dataset. For example, if we later import the test split for the same dataset, the external_id values will be:

train_0: first row of train split.
train_1: second row of train split.
...
test_0: first row of test split.
test_1: second row of test split.
...
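
The fallback described above can be sketched roughly as follows. This is only an illustration of the naming scheme, not the code changed in this PR: the function names `build_external_id` and `external_ids_for_split` and the `external_id_column` parameter are hypothetical and do not come from the argilla-server codebase.

```python
from typing import Any, Iterable, List, Mapping, Optional


def build_external_id(
    split: str,
    row_idx: int,
    row: Mapping[str, Any],
    external_id_column: Optional[str] = None,  # hypothetical stand-in for the mapping's external_id
) -> str:
    """Return the mapped external_id if one is configured, else "<split>_<row_idx>"."""
    if external_id_column is not None:
        return str(row[external_id_column])
    # Fallback: prefix the row index with the split name so that rows from
    # different splits of the same dataset never share an external_id.
    return f"{split}_{row_idx}"


def external_ids_for_split(
    split: str,
    rows: Iterable[Mapping[str, Any]],
    external_id_column: Optional[str] = None,
) -> List[str]:
    """Compute the external_id for every row of one imported split."""
    return [
        build_external_id(split, idx, row, external_id_column)
        for idx, row in enumerate(rows)
    ]


train_ids = external_ids_for_split("train", [{"text": "a"}, {"text": "b"}])
test_ids = external_ids_for_split("test", [{"text": "c"}, {"text": "d"}])
```

With the row index alone, the first row of train and the first row of test would both map to the same external_id; with the split prefix the two lists never overlap.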

Refs argilla-io/roadmap#21

Type of change

New feature (non-breaking change which adds functionality)

How Has This Been Tested

Additional tests were added.

Checklist

I added relevant documentation
I followed the style guidelines of this project
I did a self-review of my code
I made corresponding changes to the documentation
I confirm my changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

codecov · 2024-10-21T13:52:58Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.19%. Comparing base (098d36b) to head (249e5d8).
Report is 1 commits behind head on feat/argilla-direct-feature-branch.

Additional details and impacted files

@@                          Coverage Diff                           @@
##           feat/argilla-direct-feature-branch    #5616      +/-   ##
======================================================================
+ Coverage                               91.18%   91.19%   +0.01%     
======================================================================
  Files                                     150      150              
  Lines                                    6260     6261       +1     
======================================================================
+ Hits                                     5708     5710       +2     
+ Misses                                    552      551       -1

Flag	Coverage Δ
argilla-server	`91.19% <100.00%> (+0.01%)`	⬆️

Flags with carried forward coverage won't be shown.

frascuchon

Ref #5617

feat: when no external_id is provided on import mapping the split and…

249e5d8

… the row idx is used

jfcalvo requested a review from frascuchon October 21, 2024 13:46

frascuchon approved these changes Oct 22, 2024

frascuchon merged commit c916930 into feat/argilla-direct-feature-branch Oct 22, 2024
5 of 6 checks passed

frascuchon deleted the feat/add-split-to-row-idx branch October 22, 2024 13:24
