start: Data Management Trail #2894

iesahin · 2021-10-05T16:03:07Z

Adds Data Management Trail to Get Started.

content/docs/start/data-management.md

shcheklein · 2021-10-06T02:05:43Z

content/docs/start/data-management.md

+---
+
+As its name implies, DVC is used to control versions of data. It enables to keep
+track of multiple versions of your datasets.


It's always better to include models. In this case, we might even just say "large files"?

I think the trail name can be "Data and Model Management", BTW. Rename at this early stage?

Sounds good to me, even though "model management" should be about metrics to some extent.

content/docs/start/data-management.md

jorgeorpinel · 2021-10-08T03:48:42Z

content/docs/start/data-and-model-management.md

@@ -0,0 +1,272 @@
+---
+title: Data and Model Management Trail


Model Management seems like a very different thing, e.g. https://www.dominodatalab.com/solutions/model-management/

I'd say either keep it simple with "Data Management" (well known and understood term) or use another word like "Artifact".

The current titles include "Model" as well. We consider models as another kind of file. The link you shared adds some more to it, but most of those aspects of model management are covered in `dvc exp show` or deployment.

But the phrase "model management" is in that title and by itself has a different meaning, which may be confusing for readers and search engines.

shcheklein · 2021-10-11T16:00:16Z

content/docs/start/data-and-model-management.md

+As its name implies, DVC is used to control versions of data. It enables to keep
+track of multiple versions of your datasets.
+
+## Initialize a DVC project


@iesahin please, let's not make any substantial changes to the existing data management trail for now. Let's keep the previous project, keep the structure, and wrap it up in the section (trail). Maybe remove some parts, like experiments.

Rewriting it to use MNIST or to include things like remove/gc (which may even be too much for Get Started, to my mind) is not a top priority at the moment.

I would rather focus on expanding the experiments trail with the next steps (metrics, etc.) and on connecting the trails properly.

Ok. That's fine with me.

Before seeing this comment, I began to write #2919 as a replacement for data-pipelines and it updates the underlying project as well. Should I revert it as well?

I thought our initial decision was to create projects suitable for each of these trails.

What's the scope of changes in your mind? @shcheklein @jorgeorpinel

> I thought our initial decision was to create projects suitable for each of these trails.

Yes, and I'm not opposed to this. I would just try to start with some simple steps: wrap up the existing project into the trail, move metrics properly to the experiments (or keep them here as well, I'm fine with either), and wrap up the experiments trail.

I hope we can keep two projects at most: deep learning (experiments, checkpoints, live metrics) and pipelines (NLP or some data processing is probably a better fit here).

> I hope we can keep two projects at most: deep learning (experiments, checkpoints, live metrics) and pipelines (NLP or some data processing is probably a better fit here).

I believe we can mostly get away with a single project. `example-dvc-experiments` already has a 2-stage pipeline suitable for explaining pipelines.

Yep, that would be fine. But let's first keep it as is, as much as possible, in terms of content/projects. Just rename/move the existing sections under the "Data (and Model?) Management Trail", and keep iterating on the experiments for now. To be honest, it doesn't look like data management lacks any content or needs an immediate rewrite.

iesahin · 2021-10-19T09:53:52Z

I'm closing this. I'll do a quick review of the current docs instead.

start: Data Management Trail

482cb7f

Fixes #2856

shcheklein temporarily deployed to dvc-org-iesahin-issue28-1l1qfv October 5, 2021 16:03 Inactive

iesahin changed the title ~~Iesahin/issue2856~~ start: Data Management Trail Oct 5, 2021

iesahin self-assigned this Oct 5, 2021

shcheklein reviewed Oct 6, 2021

content/docs/start/data-management.md (Outdated)

shcheklein reviewed Oct 6, 2021

content/docs/start/data-management.md (Outdated)

iesahin added 2 commits October 6, 2021 17:37

renamed the file and added content from the current GS

4019a6d

finished the draft except "remove" and renamed sidebar link

16c1dcc

shcheklein temporarily deployed to dvc-org-iesahin-issue28-1l1qfv October 6, 2021 15:48 Inactive

jorgeorpinel reviewed Oct 8, 2021

added content for remove and gc

9dabe48

shcheklein temporarily deployed to dvc-org-iesahin-issue28-1l1qfv October 11, 2021 10:14 Inactive

shcheklein reviewed Oct 11, 2021

shcheklein changed the title ~~start: Data Management Trail~~ fix #2856: Data Management Trail Oct 13, 2021

iesahin changed the title ~~fix #2856: Data Management Trail~~ guide: Data Management Trail Oct 14, 2021

iesahin changed the title ~~guide: Data Management Trail~~ start: Data Management Trail Oct 14, 2021

iesahin closed this Oct 19, 2021

jorgeorpinel deleted the iesahin/issue2856 branch July 29, 2022 17:08
