Add telemetry pipeline run ends #2377

Merged
merged 13 commits into from
Feb 1, 2024

Conversation

@htahir1 htahir1 commented Jan 30, 2024

Describe changes

I added an analytics event that is captured when a pipeline run ends, and added some information on pipelines to the docs.

Pre-requisites

Please ensure you have done the following:

  • I have read the CONTRIBUTING.md document.
  • If my change requires a change to docs, I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • I have based my new branch on develop and the open PR is targeting develop. If your branch wasn't based on develop, read the contribution guide on rebasing a branch to develop.
  • If my changes require changes to the dashboard, these changes are communicated/requested.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Other (add details above)

Summary by CodeRabbit

  • New Features
    • Added a new analytics event to track when a pipeline run ends.
  • Documentation
    • Updated the user guide with a new warning hint and example for refreshing pipeline run states.
  • Refactor
    • Updated pipeline execution and analytics tracking logic for improved performance and accuracy.

@github-actions github-actions bot added the internal and enhancement labels Jan 30, 2024

coderabbitai bot commented Jan 30, 2024

Important

Auto Review Skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository.

To trigger a single review, invoke the @coderabbitai review command.

Walkthrough

The recent updates focus on enhancing analytics and run tracking within the system. A new warning hint guides users on pipeline run states, and the introduction of a RUN_PIPELINE_ENDED event improves analytics tracking. Changes in the pipeline execution logic involve the shift from using a run variable to a run_id for better identification and tracking. Additionally, the analytics tracking mechanism has been bolstered with a new import and functionality in the SQL Zen Store to support this enhanced tracking.

Changes

| File Path | Change Summary |
| --- | --- |
| docs/.../fetching-pipelines.md | Introduced a warning hint about pipeline run state and an example for refreshing the pipeline run from the client. |
| src/zenml/analytics/enums.py | Added RUN_PIPELINE_ENDED to AnalyticsEvent enum class. |
| src/zenml/.../pipeline.py | Replaced run variable with run_id and updated _get_pipeline_analytics_metadata to accept an optional run_id. |
| src/zenml/zen_stores/... | Updated to include a new import for track_handler and added code to track analytics within a context manager. |

htahir1 commented Jan 30, 2024

@CodeRabbit review

htahir1 commented Jan 30, 2024

@coderabbitai review

htahir1 commented Jan 31, 2024

@coderabbitai review

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 1

Configuration used: .coderabbit.yaml

Commits: files that changed from the base of the PR and between 9f31c6b and 65b999a.
Files selected for processing (4)
  • docs/book/user-guide/starter-guide/fetching-pipelines.md (1 hunks)
  • src/zenml/analytics/enums.py (1 hunks)
  • src/zenml/new/pipelines/pipeline.py (4 hunks)
  • src/zenml/zen_stores/sql_zen_store.py (2 hunks)
Additional comments: 6
src/zenml/analytics/enums.py (1)
  • 23-23: The addition of RUN_PIPELINE_ENDED to the AnalyticsEvent enum class is correctly implemented and follows the naming convention of existing enum values. This change effectively extends the telemetry capabilities to include tracking the end of pipeline runs.
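
For illustration, a minimal sketch of what adding such a member might look like. The string values and the neighbouring `RUN_PIPELINE` member are assumptions made for this example; they are not copied from `src/zenml/analytics/enums.py`.

```python
from enum import Enum


class AnalyticsEvent(str, Enum):
    """Illustrative subset of an analytics event enum (values are assumed)."""

    # Hypothetical pre-existing member, shown only for context.
    RUN_PIPELINE = "Pipeline run"
    # New member marking the end of a pipeline run.
    RUN_PIPELINE_ENDED = "Pipeline run ended"


# Members of a str-mixin enum compare equal to their string value.
event = AnalyticsEvent.RUN_PIPELINE_ENDED
print(event.name, "->", event.value)
```

Mixing in `str` keeps the members directly usable wherever a plain string event name is expected.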
docs/book/user-guide/starter-guide/fetching-pipelines.md (1)
  • 104-112: The documentation update introduces a warning hint effectively, providing valuable information about the state of the pipeline run and how to obtain a refreshed version from the client. This addition enhances user understanding of pipeline run states and their management. However, ensure the code snippet is correctly formatted and clear to the readers.
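
Why a refresh is needed can be sketched without ZenML at all. In the example below, `RunSnapshot` and `FakeClient` are hypothetical stand-ins: the object returned when a run is created is a point-in-time snapshot, so its status goes stale once the store is updated, and only a fresh fetch reflects the final state.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class RunSnapshot:
    """Hypothetical stand-in for a pipeline run response object."""

    id: str
    status: str


class FakeClient:
    """Hypothetical stand-in for a client backed by a server-side store."""

    def __init__(self) -> None:
        self._store: Dict[str, RunSnapshot] = {}

    def create_run(self, run_id: str) -> RunSnapshot:
        snapshot = RunSnapshot(id=run_id, status="running")
        self._store[run_id] = snapshot
        return snapshot

    def finish_run(self, run_id: str) -> None:
        # The store is updated; snapshots handed out earlier are not.
        self._store[run_id] = replace(self._store[run_id], status="completed")

    def get_pipeline_run(self, run_id: str) -> RunSnapshot:
        return self._store[run_id]


client = FakeClient()
run = client.create_run("run-1")
client.finish_run("run-1")

# The original object still shows the old state; re-fetching by ID does not.
refreshed = client.get_pipeline_run(run.id)
print(run.status, "->", refreshed.status)
```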
src/zenml/new/pipelines/pipeline.py (3)
  • 767-769: The method _get_pipeline_analytics_metadata now accepts an optional run_id parameter and includes it in the returned metadata if provided. This change aligns with the PR's objective to enhance telemetry capabilities by tracking the end of pipeline runs more accurately. Ensure that the run_id is being correctly utilized downstream to achieve the intended telemetry enhancements.
  • 762-772: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [742-769]

The changes made to the _run method, including the early assignment of run_id and its inclusion in the _get_pipeline_analytics_metadata method, are logically sound and support the PR's objective of enhancing telemetry capabilities. However, ensure that the removal of the redundant run_id initialization does not affect the logic or functionality of the telemetry enhancements.

  • 762-772: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [742-769]

The implementation of telemetry enhancements, including the handling of run_id and its integration into analytics metadata, appears to be done with consideration for performance and maintainability. The changes are localized and do not introduce unnecessary complexity or security concerns. However, it's crucial to ensure that the run_id is handled securely throughout its lifecycle, especially if it's used in contexts where it could be exposed to external systems or users.
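
The optional `run_id` handling described above could look roughly like the following. This is a sketch under assumptions: the function is a free-standing stand-in for `_get_pipeline_analytics_metadata`, and the `stack_name` field and the `"run_id"` key are placeholders rather than the real metadata layout.

```python
from typing import Any, Dict, Optional
from uuid import UUID, uuid4


def get_pipeline_analytics_metadata(
    stack_name: str, run_id: Optional[UUID] = None
) -> Dict[str, Any]:
    """Build analytics metadata, adding the run ID only when one is given.

    `stack_name` is a placeholder for whatever the real method collects.
    """
    metadata: Dict[str, Any] = {"stack_name": stack_name}
    if run_id is not None:
        # Stringify so the payload stays JSON-serializable.
        metadata["run_id"] = str(run_id)
    return metadata


run_id = uuid4()
without_id = get_pipeline_analytics_metadata("local")
with_id = get_pipeline_analytics_metadata("local", run_id=run_id)
print(sorted(without_id), sorted(with_id))
```

Keeping the parameter optional means existing callers that have no run yet can still use the function unchanged.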

src/zenml/zen_stores/sql_zen_store.py (1)
  • 6695-6708: The implementation of analytics tracking within the _update_pipeline_run_status function is correctly placed and uses the track_handler context manager as expected. This ensures that the event RUN_PIPELINE_ENDED is captured whenever a pipeline run ends, which aligns with the PR's objective to enhance telemetry capabilities by tracking the end of pipeline runs. The metadata being captured, including pipeline_run_id, status, num_steps, start_time, and end_time, is appropriate for monitoring and analytics purposes.
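
A rough, self-contained sketch of the pattern described in this comment. `track_handler_sketch`, `Handler`, and `update_run_status` are hypothetical stand-ins, not the actual `track_handler` or `_update_pipeline_run_status`; the set of "finished" statuses is also an assumption. Only the metadata keys are taken from the review comment.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, Iterator, List, Tuple

# Collected events; a real handler would send them to an analytics backend.
SENT_EVENTS: List[Tuple[str, Dict[str, Any]]] = []


@dataclass
class Handler:
    """Hypothetical handler object that carries the event metadata."""

    event: str
    metadata: Dict[str, Any] = field(default_factory=dict)


@contextmanager
def track_handler_sketch(event: str) -> Iterator[Handler]:
    """Yield a handler and record the event once the block finishes cleanly."""
    handler = Handler(event=event)
    yield handler
    # Reached only if the body raised no exception.
    SENT_EVENTS.append((handler.event, handler.metadata))


def update_run_status(
    run_id: str,
    new_status: str,
    num_steps: int,
    start_time: datetime,
    end_time: datetime,
) -> None:
    """Emit an end-of-run event when the run reaches a finished status."""
    if new_status not in {"completed", "failed"}:  # assumed finished states
        return
    with track_handler_sketch("RUN_PIPELINE_ENDED") as handler:
        handler.metadata = {
            "pipeline_run_id": run_id,
            "status": new_status,
            "num_steps": num_steps,
            "start_time": start_time.isoformat(),
            "end_time": end_time.isoformat(),
        }


started = datetime(2024, 1, 30, 12, 0)
ended = datetime(2024, 1, 30, 12, 5)
update_run_status("run-1", "running", 3, started, ended)    # no event
update_run_status("run-1", "completed", 3, started, ended)  # one event
print(SENT_EVENTS)
```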

src/zenml/new/pipelines/pipeline.py (resolved)
E2E template updates in examples/e2e have been pushed.

```python
from zenml.client import Client

# Get a refreshed version of the run
Client().get_pipeline_run(last_run.id)
```

Maybe we should also not call it last_run. This isn't actually the last run as someone might have created another one in the meantime, but this is exactly the run that was created by calling training_pipeline() in the script above.

"pipeline_run_id": pipeline_run_id,
"status": new_status,
"num_steps": num_steps,
"start_time": pipeline_run.start_time.strftime(

Just FYI: we don't send the start_time of the pipeline run in the other event, so I assume there won't be any way to match on it. Not sure whether we want to add it to the other analytics event, remove it here, or maybe just send the duration here if that is interesting.
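
If the duration option were chosen, it could be derived from the two timestamps along these lines. This is only a sketch: the helper name `run_duration_seconds` is hypothetical, and returning `None` for missing timestamps is an assumption about how unfinished runs should be handled.

```python
from datetime import datetime
from typing import Optional


def run_duration_seconds(
    start_time: Optional[datetime], end_time: Optional[datetime]
) -> Optional[float]:
    """Return the run duration in seconds, or None if either bound is missing."""
    if start_time is None or end_time is None:
        return None
    return (end_time - start_time).total_seconds()


start = datetime(2024, 1, 31, 13, 0, 0)
end = datetime(2024, 1, 31, 13, 4, 30)
print(run_duration_seconds(start, end))  # 270.0
```

A single duration value avoids having to match timestamps across two separate analytics events.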

@htahir1 htahir1 requested a review from schustmi January 31, 2024 13:30
Quickstart template updates in examples/quickstart have been pushed.

@htahir1 htahir1 merged commit 16233b4 into develop Feb 1, 2024
62 checks passed
@htahir1 htahir1 deleted the feature/add-telemetry-pipeline-run-ends branch February 1, 2024 17:41
adtygan pushed a commit to adtygan/zenml that referenced this pull request Mar 21, 2024
* Addd another event

* Added more context

* Docstring

* Auto-update of E2E template

* Docs

* python things

* python things

* python things

* Auto-update of Starter template

* Docstrings

* Linting

* Format

---------

Co-authored-by: GitHub Actions <[email protected]>
Labels: enhancement, internal, run-slow-ci

4 participants