Update Argilla integration for v2.x SDK #2915
Walkthrough
The recent changes enhance the documentation and functionality of the ZenML framework, particularly around the Argilla integration. Documentation has been clarified to better represent tools and their usage, while code updates improve exception handling, workspace management, and usability. Key modifications include upgrading the Argilla package requirement and refining methods to support workspace-specific operations, streamlining dataset management, and improving user interaction with new client features.
Sequence Diagram(s)
sequenceDiagram
    participant User
    participant ArgillaAnnotator
    participant ArgillaAPI
    User->>ArgillaAnnotator: Launch Annotation Interface
    ArgillaAnnotator->>ArgillaAPI: Initialize Client
    ArgillaAnnotator->>ArgillaAPI: Get Datasets (with workspace)
    ArgillaAPI-->>ArgillaAnnotator: Return Datasets
    ArgillaAnnotator->>User: Display Datasets
|
from typing import List

import argilla as rg

from zenml import step, pipeline
from zenml.client import Client
from zenml.logger import get_logger

logger = get_logger(__name__)


@step
def get_data() -> List[str]:
    """Random text data for annotation."""
    texts = [
        "The quick brown fox jumps over the lazy dog.",
        "ZenML is an extensible, open-source MLOps framework.",
        "Argilla helps in managing and annotating data.",
        "Machine learning models require good quality data.",
        "Annotation tools are essential for supervised learning.",
    ]
    return texts


@step
def upload_to_argilla(
    texts: List[str],
    dataset_name: str = "default_dataset",
):
    """Creates a dataset and loads the data to Argilla.

    Args:
        texts: List of text data.
        dataset_name: Name of the dataset in Argilla.
    """
    annotator = Client().active_stack.annotator
    from zenml.integrations.argilla.annotators.argilla_annotator import (
        ArgillaAnnotator,
    )

    if not isinstance(annotator, ArgillaAnnotator):
        raise TypeError(
            "This step can only be used with the Argilla annotator."
        )
    argilla = annotator._get_client()

    # Create or get the dataset in Argilla
    try:
        dataset = annotator.get_dataset(dataset_name=dataset_name)
        if dataset is None:
            settings = rg.Settings(
                guidelines="These are some guidelines.",
                fields=[
                    rg.TextField(
                        name="text",
                    ),
                ],
                questions=[
                    rg.LabelQuestion(
                        name="label",
                        labels=["label_1", "label_2", "label_3"]
                    ),
                ],
                metadata=[
                    rg.TermsMetadataProperty(
                        name="terms",
                        title="Annotation groups",
                    ),
                ],
            )
            dataset = annotator.add_dataset(dataset_name=dataset_name, settings=settings)
        records = [
            rg.Record(
                fields={"text": text}
            )
            for text in texts
        ]
        annotator.add_records(dataset_name=dataset_name, records=records)
    except Exception as e:
        logger.error(f"Error: {e}")
        raise e
    print(dataset)


@pipeline
def data_pipeline():
    data = get_data()
    upload_to_argilla(data)


if __name__ == "__main__":
    data_pipeline()
|
Two quick things now, but I'll take a more thorough look in the morning. Thank you so much for making the updates! I'll also test it out.
Also a few errors from CI: https://github.com/zenml-io/zenml/actions/runs/10290280476/job/28480088504?pr=2915#step:6:41 @sdiazlor |
Co-authored-by: Alex Strick van Linschoten <[email protected]>
@strickvl Thank you! They should be fixed, I had missed those. |
@coderabbitai review |
Actions performed
Review triggered.
|
Actionable comments posted: 1
Outside diff range, codebase verification and nitpick comments (1)
docs/book/component-guide/annotators/argilla.md (1)
63-63: Add missing comma for clarity.
A comma is needed after the `headers` parameter to improve readability.
- `headers` parameter which would include a Hugging Face token.
+ `headers` parameter, which would include a Hugging Face token.
Tools
LanguageTool
[uncategorized] ~63-~63: Possible missing comma found.
Context: ...rivate, you must also set the `headers` parameter which would include a Hugging Face toke...(AI_HYDRA_LEO_MISSING_COMMA)
Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Files selected for processing (6)
- docs/book/component-guide/annotators/annotators.md (1 hunks)
- docs/book/component-guide/annotators/argilla.md (4 hunks)
- docs/mocked_libs.json (1 hunks)
- src/zenml/integrations/argilla/__init__.py (1 hunks)
- src/zenml/integrations/argilla/annotators/argilla_annotator.py (3 hunks)
- src/zenml/integrations/argilla/flavors/argilla_annotator_flavor.py (1 hunks)
Files skipped from review due to trivial changes (1)
- docs/book/component-guide/annotators/annotators.md
Additional context used
Path-based instructions (4)
src/zenml/integrations/argilla/__init__.py (1)
Pattern `src/zenml/**/*.py`: Review the Python code for conformity with Python best practices.
src/zenml/integrations/argilla/flavors/argilla_annotator_flavor.py (1)
Pattern `src/zenml/**/*.py`: Review the Python code for conformity with Python best practices.
docs/book/component-guide/annotators/argilla.md (1)
Pattern `docs/**/*.md`: Review the documentation for readability and clarity.
src/zenml/integrations/argilla/annotators/argilla_annotator.py (1)
Pattern `src/zenml/**/*.py`: Review the Python code for conformity with Python best practices.
Additional comments not posted (15)
src/zenml/integrations/argilla/__init__.py (1)
29-29: Verify compatibility with Argilla v2.0.0.
The update to require `argilla>=2.0.0` may introduce breaking changes. Ensure that the rest of the codebase is compatible with this new version.
src/zenml/integrations/argilla/flavors/argilla_annotator_flavor.py (1)
47-54: Ensure backward compatibility for renamed attributes.
The `extra_headers` attribute has been renamed to `headers`. Verify that this change does not break existing configurations and consider using a deprecation utility if necessary.
docs/mocked_libs.json (1)
229-229: LGTM!
The changes to the mocked libraries reflect a shift towards using a specific exceptions module, which aligns with the updated Argilla integration. This seems intentional and appropriate.
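The `extra_headers` to `headers` rename flagged in this review could be kept backward compatible along the following lines. This is a minimal sketch only: `resolve_headers` is a hypothetical helper, not part of ZenML or Argilla, and it assumes the headers are passed as a plain dictionary, which may not match the actual flavor config.

```python
import warnings
from typing import Any, Dict, Optional


def resolve_headers(
    headers: Optional[Dict[str, Any]] = None,
    extra_headers: Optional[Dict[str, Any]] = None,
) -> Optional[Dict[str, Any]]:
    """Return the effective headers, accepting the old name with a warning.

    Hypothetical helper for illustration; not a ZenML or Argilla API.
    """
    if extra_headers is not None:
        # The old attribute name still works, but callers are told to migrate.
        warnings.warn(
            "`extra_headers` is deprecated; use `headers` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # The new name wins if both are supplied.
        if headers is None:
            headers = extra_headers
    return headers
```

With a shim like this, existing configurations that still set the old name keep working while emitting a `DeprecationWarning`, and the new name takes precedence when both are given.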
src/zenml/integrations/argilla/annotators/argilla_annotator.py (12)
109-124: LGTM!
The method correctly incorporates `**kwargs` for workspace-specific operations, enhancing flexibility.
127-145: LGTM!
The changes effectively support workspace-specific dataset retrieval, improving functionality.
147-168: LGTM!
The method correctly implements workspace-specific logic for retrieving dataset names, enhancing its capabilities.
170-195: LGTM!
The new method `_get_data_by_status` is well-implemented, providing clear functionality to filter records by status within a workspace context.
197-227: LGTM!
The refactoring to use `_get_data_by_status` enhances clarity and maintainability in computing dataset statistics.
230-244: LGTM!
The `launch` method is a valuable addition, providing a user-friendly way to access the annotation interface with proper error handling.
246-296: LGTM!
The method improvements, including requiring `settings` and supporting workspace creation, enhance functionality and robustness.
299-335: LGTM!
The method correctly incorporates workspace context for adding records, with appropriate logging and error handling.
336-366: LGTM!
The method enhancements for workspace-specific dataset retrieval improve flexibility and error handling.
370-397: LGTM!
The method improvements for workspace-specific dataset deletion enhance functionality and error handling.
398-421: LGTM!
The refactoring to use `_get_data_by_status` enhances clarity and maintainability in retrieving labeled data.
422-443: LGTM!
The refactoring to use `_get_data_by_status` enhances clarity and maintainability in retrieving unlabeled data.
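The review describes `_get_data_by_status` as a single helper that filters records by status within a workspace, reused for labeled data, unlabeled data, and statistics. The sketch below illustrates that idea in plain Python only. It does not call the Argilla SDK, and the record shape, the `filter_by_status` name, and the status strings are assumptions for illustration, not the PR's implementation.

```python
from typing import Dict, List, Optional

# Stand-in for an annotation record; the real integration uses Argilla objects.
Record = Dict[str, str]


def filter_by_status(
    records: List[Record],
    status: str,
    workspace: Optional[str] = None,
) -> List[Record]:
    """Keep records with the given status, optionally within one workspace."""
    return [
        r
        for r in records
        if r["status"] == status
        and (workspace is None or r["workspace"] == workspace)
    ]


# Placeholder data; "completed" and "pending" are assumed status names.
records = [
    {"text": "a", "status": "completed", "workspace": "team_a"},
    {"text": "b", "status": "pending", "workspace": "team_a"},
    {"text": "c", "status": "completed", "workspace": "team_b"},
]

# Labeled and unlabeled views share one helper instead of two code paths.
labeled = filter_by_status(records, "completed", workspace="team_a")
unlabeled = filter_by_status(records, "pending")
```

Routing both views through one function is what lets the statistics, labeled-data, and unlabeled-data methods differ only in the status they pass.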
@schustmi we'll have to look into conditionally ignoring / not installing argilla on our Python 3.8 instances on slow CI. |
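One way to express the conditional install mentioned here is a small version gate in the CI setup script. This is a hedged sketch: `should_install_argilla` is a hypothetical helper, and the (3, 9) minimum is an assumption inferred from the comment about Python 3.8, not a value taken from this PR.

```python
import sys
from typing import Sequence


def should_install_argilla(
    version_info: Sequence[int] = sys.version_info,
) -> bool:
    """Return True when the interpreter is new enough for the integration.

    Hypothetical CI helper; the (3, 9) floor is an assumed minimum.
    """
    return tuple(version_info[:2]) >= (3, 9)
```

A CI script could call this before installing the integration's requirements and skip the Argilla tests when it returns `False`.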
🦭 - thanks so much @sdiazlor for your contribution, I tested this in action and everything works as expected. |
Describe changes
I updated the Argilla integration to work with the new Argilla 2.0 release.
NOTES:
cc @strickvl
Pre-requisites
Please ensure you have done the following:
develop and the open PR is targeting develop. If your branch wasn't based on develop, read the Contribution guide on rebasing branch to develop.
Types of changes
Summary by CodeRabbit
Documentation Updates
New Features
Dependency Updates
`argilla` package to `>=2.0.0`.