Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update data-pipelines.md #1645

Merged
merged 9 commits into from
Jul 31, 2020
Merged

Update data-pipelines.md #1645

merged 9 commits into from
Jul 31, 2020

Conversation

noahshpak
Copy link
Contributor

Saw a typo and got a bit carried away. I'm a big DVC fan! πŸ’« πŸ‘¨β€πŸ’»

Saw a typo and got a bit carried away. I'm a big DVC fan! πŸ’« πŸ‘¨β€πŸ’»
Copy link
Contributor

@jorgeorpinel jorgeorpinel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @noahshpak we appreciate your interest! I see that typo in the 3rd bullet, thanks for updating the text. But I do have some followup comments (below, one per bullet). Please lmk if you don't have the capacity to deal with this and we'll take it over. Best

content/docs/start/data-pipelines.md Outdated Show resolved Hide resolved
content/docs/start/data-pipelines.md Outdated Show resolved Hide resolved
Comment on lines 301 to 302
- _Continuous Delivery and Continuous Integration (CI/CD) for ML_ - describing
project in way that it can be reproduced (built) is the fist necessary step
before introducing CI/CD systems.
reproducible ML pipelines (builds) facilitates CI/CD systems.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely better. But not sure about "(builds)" What builds? Builds what? "builds" the plural noun? No need to summarize so much that it's ambiguous, let's try to be as explicit as necessary. Same with "facilitates": how does it facilitate them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the feedback! I agree. Hopefully my additions are more explicit.

@noahshpak noahshpak requested a review from jorgeorpinel July 31, 2020 19:12
Copy link
Contributor

@jorgeorpinel jorgeorpinel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A followup here, and let me commit my other changes... I may have more follow-ups ⏳

content/docs/start/data-pipelines.md Outdated Show resolved Hide resolved
content/docs/start/data-pipelines.md Outdated Show resolved Hide resolved
content/docs/start/data-pipelines.md Outdated Show resolved Hide resolved
Copy link
Contributor

@jorgeorpinel jorgeorpinel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more changes I'll apply:

content/docs/start/data-pipelines.md Outdated Show resolved Hide resolved
content/docs/start/data-pipelines.md Outdated Show resolved Hide resolved
content/docs/start/data-pipelines.md Outdated Show resolved Hide resolved
content/docs/start/data-pipelines.md Outdated Show resolved Hide resolved
content/docs/start/data-pipelines.md Outdated Show resolved Hide resolved
@jorgeorpinel jorgeorpinel requested a review from shcheklein July 31, 2020 19:58
@jorgeorpinel
Copy link
Contributor

(Planing to merge the Restyled PR.)

model). Storing these
files in Git makes it easy to version and share.
- _Continuous Delivery and Continuous Integration (CI/CD) for ML_ - reproducible
ML pipelines allow CI/CD systems to retrain models on fresh
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this describes a very specific use case. CI/CD can include running the last stage for example, full retraining, just training remotely instead of local ... etc

Comment on lines 303 to 306
- _Continuous Delivery and Continuous Integration (CI/CD) for ML_ - reproducible
ML pipelines allow CI/CD systems to retrain models on fresh
datasets with identical training, save the results, and even produce reports
about the whole process. See [CML.dev](https://cml.dev/) for some examples.
Copy link
Contributor

@jorgeorpinel jorgeorpinel Jul 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this @shcheklein ?

Suggested change
- _Continuous Delivery and Continuous Integration (CI/CD) for ML_ - reproducible
ML pipelines allow CI/CD systems to retrain models on fresh
datasets with identical training, save the results, and even produce reports
about the whole process. See [CML.dev](https://cml.dev/) for some examples.
- _Continuous Delivery and Continuous Integration (CI/CD) for ML_ - reproducible
ML pipelines allow CI/CD systems to reproduce all or part of the process (e.g.
model training) on production datasets. Using DVC, they can then save the
results, and even produce reports about the whole process. See
[CML.dev](https://cml.dev/) for some examples.

Feel free to edit/commit it and merge the Restyled PR if you agree πŸ™‚ or lmk and I'll do so.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it still describes one very specific case "reproduce all or part of the process ... on production datasets"

let's go back a bit, gusy. What was the problem with initial version? So, that I understand how to help you solve it.

This comment was marked as resolved.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, let's just fix the typo then? if the problem is not clear. Changing language a bit is fine either if it sounds/reads better, but let's not change the essence if there is no clear understanding what exactly are we fixing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's what I meant. The typo is fixed in the suggestion I left. We just need to commit it. Allow me...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me the original version echos reproducibility rather than pipeline flexibility. The piece that matters for CI/CD is DVC can figure out when something important has changed (data, code, etc) and re-run a pipeline correctly. Like what @jorgeorpinel said "pipelines allow CI/CD systems reproduce all or part of the process."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Noah, I'll chat with Ivan and continue this elsewhere if needed, since this has been merged. But feel free to contribute another PR if you prefer πŸ™‚

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem! Again, thanks for all of the feedback

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep. I agree with the pipelines allow CI/CD systems reproduce all or part of the process part. on production datasets and mentioning train make it specific.

The piece that matters for CI/CD is DVC can figure out when something important has changed (data, code, etc) and re-run a pipeline correctly

not sure this it true. Since it's already committed it can force dvc repro for example to guarantee that external system gets the same result. There are other examples when we can "abuse" CI/CD to do the training first time- we don't change anything per-se - everything comes with a commit, system runs it first time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think CML.dev link still make a lot of sense.

- _Continuous Delivery and Continuous Integration (CI/CD) for ML_ - describing
project in way that it can be reproduced (built) is the fist necessary step
projects in way that it can be reproduced (built) is the fist necessary step
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it -> they?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops I merged the REstyled PR #1647. I'll continue this elsewhere... ⏳

@jorgeorpinel jorgeorpinel merged commit b5c5923 into iterative:master Jul 31, 2020
@jorgeorpinel
Copy link
Contributor

Thanks again @noahshpak !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants