Update data-pipelines.md #1645

noahshpak · 2020-07-31T18:23:13Z

Saw a typo and got a bit carried away. I'm a big DVC fan! 💫 👨‍💻

jorgeorpinel

Thanks @noahshpak, we appreciate your interest! I see that typo in the 3rd bullet; thanks for updating the text. But I do have some follow-up comments (below, one per bullet). Please lmk if you don't have the capacity to deal with this and we'll take it over. Best

content/docs/start/data-pipelines.md

jorgeorpinel · 2020-07-31T18:46:26Z

content/docs/start/data-pipelines.md

 - _Continuous Delivery and Continuous Integration (CI/CD) for ML_ - describing
-  project in way that it can be reproduced (built) is the fist necessary step
-  before introducing CI/CD systems.
+  reproducible ML pipelines (builds) facilitates CI/CD systems.


Definitely better, but I'm not sure about "(builds)". What builds? Builds what? Is "builds" the plural noun? There's no need to summarize so much that it becomes ambiguous; let's be as explicit as necessary. Same with "facilitates": how does it facilitate them?

Thank you for the feedback! I agree. Hopefully my additions are more explicit.

content/docs/start/data-pipelines.md

jorgeorpinel

A followup here, and let me commit my other changes... I may have more follow-ups ⏳

content/docs/start/data-pipelines.md

jorgeorpinel

A few more changes I'll apply:

content/docs/start/data-pipelines.md

jorgeorpinel · 2020-07-31T19:59:13Z

(Planning to merge the Restyled PR.)

shcheklein · 2020-07-31T20:30:31Z

content/docs/start/data-pipelines.md

+  model). Storing these
+  files in Git makes it easy to version and share.
+- _Continuous Delivery and Continuous Integration (CI/CD) for ML_ - reproducible
+  ML pipelines allow CI/CD systems to retrain models on fresh


This describes a very specific use case. CI/CD can include, for example, running only the last stage, full retraining, or just training remotely instead of locally, etc.

jorgeorpinel · 2020-07-31T21:56:16Z

content/docs/start/data-pipelines.md

+- _Continuous Delivery and Continuous Integration (CI/CD) for ML_ - reproducible
+  ML pipelines allow CI/CD systems to retrain models on fresh
+  datasets with identical training, save the results, and even produce reports
+  about the whole process. See [CML.dev](https://cml.dev/) for some examples.


How about this @shcheklein ?

Suggested change

-- _Continuous Delivery and Continuous Integration (CI/CD) for ML_ - reproducible
-  ML pipelines allow CI/CD systems to retrain models on fresh
-  datasets with identical training, save the results, and even produce reports
-  about the whole process. See [CML.dev](https://cml.dev/) for some examples.
+- _Continuous Delivery and Continuous Integration (CI/CD) for ML_ - reproducible
+  ML pipelines allow CI/CD systems to reproduce all or part of the process (e.g.
+  model training) on production datasets. Using DVC, they can then save the
+  results, and even produce reports about the whole process. See
+  [CML.dev](https://cml.dev/) for some examples.

Feel free to edit/commit it and merge the Restyled PR if you agree 🙂 or lmk and I'll do so.

It still describes one very specific case: "reproduce all or part of the process ... on production datasets".

Let's go back a bit, guys. What was the problem with the initial version? I'd like to understand it so that I can help you solve it.

So, let's just fix the typo then, if the problem is not clear? Changing the language a bit is also fine if it sounds/reads better, but let's not change the essence if there is no clear understanding of what exactly we are fixing.

Yes, that's what I meant. The typo is fixed in the suggestion I left. We just need to commit it. Allow me...

To me the original version echoes reproducibility rather than pipeline flexibility. The piece that matters for CI/CD is that DVC can figure out when something important has changed (data, code, etc.) and re-run a pipeline correctly. Like what @jorgeorpinel said: "pipelines allow CI/CD systems to reproduce all or part of the process."

Thanks Noah, I'll chat with Ivan and continue this elsewhere if needed, since this has been merged. But feel free to contribute another PR if you prefer 🙂

No problem! Again, thanks for all of the feedback

Yep. I agree with the "pipelines allow CI/CD systems to reproduce all or part of the process" part. It's "on production datasets" and mentioning training that make it specific.

> The piece that matters for CI/CD is DVC can figure out when something important has changed (data, code, etc) and re-run a pipeline correctly

Not sure this is true. Since everything is already committed, the CI/CD system can, for example, force `dvc repro` to guarantee that an external system gets the same result. There are other cases where we can "abuse" CI/CD to do the training for the first time: we don't change anything per se, everything comes with a commit, and the system runs it for the first time.

Also, I think the CML.dev link still makes a lot of sense.
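
The two behaviours debated above (re-running a stage only when one of its dependencies changed, versus forcing a re-run of an already committed pipeline) can be contrasted in a minimal Python sketch. This is an illustration of the idea only, not DVC's implementation: `file_hash`, `should_run`, and `run_stage` are hypothetical names, and the choice of MD5 content hashes kept in a plain dictionary is an assumption made for the example.

```python
import hashlib
from pathlib import Path


def file_hash(path):
    """Return the MD5 hex digest of a file's contents (assumed hash choice)."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def should_run(deps, recorded, force=False):
    """Decide whether a stage needs to run.

    deps:     paths the stage depends on (data, code, ...).
    recorded: mapping of path -> hash saved after the previous run.
    force:    re-run even when no dependency changed.
    """
    if force:
        return True
    # Run if any dependency is new or its content differs from the record.
    return any(recorded.get(str(p)) != file_hash(p) for p in deps)


def run_stage(deps, recorded, command, force=False):
    """Run `command` if needed; return (updated_record, whether_it_ran)."""
    if not should_run(deps, recorded, force):
        return recorded, False
    command()
    return {str(p): file_hash(p) for p in deps}, True
```

In this sketch, a CI job that only cares about changes would call `run_stage` with `force=False`, while a job that wants to confirm an external system reproduces the committed result would pass `force=True`, which is loosely analogous to the forced `dvc repro` mentioned in the comment above.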

shcheklein · 2020-07-31T22:47:04Z

content/docs/start/data-pipelines.md

 - _Continuous Delivery and Continuous Integration (CI/CD) for ML_ - describing
-  project in way that it can be reproduced (built) is the fist necessary step
+  projects in way that it can be reproduced (built) is the fist necessary step


it -> they?

Oops, I merged the Restyled PR #1647. I'll continue this elsewhere... ⏳

jorgeorpinel · 2020-07-31T22:50:51Z

Thanks again @noahshpak !

Update data-pipelines.md

5926097

Saw a typo and got a bit carried away. I'm a big DVC fan! 💫 👨‍💻

restyled-io bot mentioned this pull request Jul 31, 2020

Restyle Update data-pipelines.md #1646

Closed

jorgeorpinel suggested changes Jul 31, 2020

update wording; be more specific; extrapolate on CI/CD

f79c3a0

noahshpak requested a review from jorgeorpinel July 31, 2020 19:12

shcheklein reviewed Jul 31, 2020

jorgeorpinel reviewed Jul 31, 2020

Update content/docs/start/data-pipelines.md

66664be

restyled-io bot mentioned this pull request Jul 31, 2020

Restyle Update data-pipelines.md #1647

Merged

Update content/docs/start/data-pipelines.md

6127df1

jorgeorpinel reviewed Jul 31, 2020

jorgeorpinel reviewed Jul 31, 2020

jorgeorpinel added 3 commits July 31, 2020 14:58

Update content/docs/start/data-pipelines.md

c951b2f

Update content/docs/start/data-pipelines.md

b83ba40

Update content/docs/start/data-pipelines.md

8488996

jorgeorpinel requested a review from shcheklein July 31, 2020 19:58

shcheklein reviewed Jul 31, 2020

Update content/docs/start/data-pipelines.md

0a9ff51

jorgeorpinel reviewed Jul 31, 2020

Update content/docs/start/data-pipelines.md

b5c5923

shcheklein approved these changes Jul 31, 2020

shcheklein reviewed Jul 31, 2020

jorgeorpinel merged commit b5c5923 into iterative:master Jul 31, 2020
