feat: basic Excel/tabular integrations for importing/exporting data #1888

Closed
Tracked by #31
davidberenstein1957 opened this issue Nov 11, 2022 · 13 comments
Labels: stale, type: enhancement

Comments

@davidberenstein1957
Member

davidberenstein1957 commented Nov 11, 2022

Is your feature request related to a problem? Please describe.
One of our clients wanted to involve a more diverse team of non-technical people in the annotation process, some of whom might not have any programming experience.

Describe the solution you'd like
Be able to import/export Excel files.

Describe alternatives you've considered
Upload through the Python client, but that sadly doesn't work.

Additional context
N.A.

@davidberenstein1957 added the "type: enhancement" label on Nov 11, 2022
@dvsrepo
Member

dvsrepo commented Nov 11, 2022

This is related to #1870, as I understand you are referring to the UI. That said, supporting data upload from Excel can be much more complex and trickier than it seems. I guess that even CSV would be fine for users without programming knowledge, but we can discuss value vs. complexity. Exporting to Excel should be fine and easy to do (for datasets that are not huge, of course).

Does the comment regarding the Python client mean that you tried pandas and from_pandas and it didn't work? Or that it is, of course, not possible for users without Python skills to do so?
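
To make the CSV option concrete, here is a minimal sketch of a round trip between records and CSV text. It uses only the Python standard library and does not call the Argilla client; the record layout (`id`, `text`, `annotation`) and the helper names `export_csv` and `import_csv` are assumptions made up for illustration.

```python
import csv
import io

# Hypothetical record layout; these field names are assumptions.
FIELDS = ["id", "text", "annotation"]


def export_csv(records):
    """Serialize a list of dict records to CSV text (missing values become empty cells)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        writer.writerow({name: record.get(name, "") for name in FIELDS})
    return buffer.getvalue()


def import_csv(text):
    """Parse CSV text back into dict records; an empty annotation cell becomes None."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["annotation"] = row["annotation"] or None
        rows.append(row)
    return rows
```

Note that every value comes back as a string after import, so numeric ids or dates would need explicit conversion. That is part of why a spreadsheet upload is trickier than a spreadsheet export.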

@davidberenstein1957
Member Author

It is indeed related. It is more about being able to do some basic data importing/downloading without programming knowledge.

@Amelie-V added this to the 2023 Q1 milestone on Nov 17, 2022
@dhruvsakalley

This is a useful feature, but I would like to point out that even simple exports need some massaging to fit custom needs. Maybe a macro button that can hold some custom Python export logic would be a very generic solution.

@github-actions

github-actions bot commented Jan 7, 2023

This issue is stale because it has been open for 30 days with no activity.

@github-actions bot added the "status: stale" label on Jan 7, 2023
@frascuchon removed the "status: stale" label on Jan 9, 2023
@davidberenstein1957
Member Author

@dvsrepo
Member

dvsrepo commented Feb 12, 2023

@davidberenstein1957 maybe we can add import/export from xls too? wdyt?

@dhruvsakalley

dhruvsakalley commented Feb 13, 2023

Thanks for following up on this. While the Data Manager is a great idea, sometimes you want to run a query before you export the data, perhaps to keep only a few annotations or to apply a date filter. It would be really nice if the results in context could be exported from the UI.

@dvsrepo
Member

dvsrepo commented Feb 13, 2023

Very good suggestion @dhruvsakalley. We will introduce this feature directly in the UI in the future.

As an immediate step, I have included the query in the data manager, with a link to the queries docs. This means

Screenshot 2023-02-13 at 10 55 15

@davidberenstein1957
Member Author

davidberenstein1957 commented Feb 13, 2023 via email

@dhruvsakalley

Thanks Daniel, I think we can sync a bit on the Data Manager. It's a really good way of wrapping data-related ops, and it seems like a good place to build database integrations that enable things like streaming/rolling updates based on SQL triggers. It is also a good place to add label management functionality.

@davidberenstein1957
Member Author

@dhruvsakalley I have fine-tuned this a bit here. Perhaps what you describe would be some kind of listener? I would love to help here from the Argilla side too.

@dhruvsakalley

Apologies for going on a tangent way beyond the scope of the issue, but maybe this helps.

There are many ways to approach how this data lands in Argilla. Sure, an event-driven paradigm might be a good solution, but even some kind of polling mechanism would be just as useful for the majority of cases. Whatever architecture you choose to solve this is fine, but here are some of the needs for a rolling update:

  • Ability to snapshot and restore data, annotations, predictions, rules, and the labeling scheme at a point in time.
  • Handling labeling schema changes along with data changes in an existing dataset. New labels get added and old ones get retired, so there is a need to go back to a point in time to see how data was labeled back then, but also a need to keep the labels relevant to the current set of wanted labels. This can be handled in an OLAP system, but we still need the ability to do label management on the fly from a rolling update perspective.
  • Ability to update/merge just the text, just the metadata, just the annotations, just the predictions, or just the vectors as a bulk operation (not just by id).
    • Managing inserts and updates while choosing to retain the work already put into annotations (almost like handling a git merge conflict).
  • Ability to update/merge rules as a bulk operation.
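
The update/merge need above can be sketched roughly as follows. This is only an illustration under stated assumptions, not how Argilla handles it: records are plain dicts, the matching key defaults to `id` but is configurable, and the function name `merge_records` and the field names are hypothetical.

```python
def merge_records(existing, incoming, fields=("text", "metadata"), key="id"):
    """Bulk-update only `fields` on matched records and keep existing annotations.

    Returns (merged, conflicts). `conflicts` lists the keys of records whose
    incoming annotation disagrees with an existing, non-empty annotation.
    """
    merged = {rec[key]: dict(rec) for rec in existing}
    conflicts = []
    for rec in incoming:
        current = merged.get(rec[key])
        if current is None:
            merged[rec[key]] = dict(rec)  # unseen record: plain insert
            continue
        for name in fields:
            if name in rec:
                current[name] = rec[name]  # overwrite only the selected fields
        old_label = current.get("annotation")
        new_label = rec.get("annotation")
        if old_label is None:
            current["annotation"] = new_label  # nothing to lose, accept incoming
        elif new_label is not None and new_label != old_label:
            conflicts.append(rec[key])  # keep the human work, flag for review
    return list(merged.values()), conflicts
```

Returning the conflicting keys instead of silently overwriting mirrors the git merge conflict analogy: the existing annotation stays in place and a person decides what to do with the flagged records.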

These scenarios emerge when we choose to work with continuously updating systems. I think handling some of these scenarios is a missed opportunity for a lot of the annotation tools that I have come across in the past. A lot of data engineering gets left behind, even though these are fairly standard things that can be abstracted away from the end-user experience.

In an ideal world, all the "human work" aspects of data annotation should be under source control, just like we handle code and documentation, but that's probably a conversation to be had over a beer.


This issue is stale because it has been open for 90 days with no activity.


6 participants