This repository has been archived by the owner on Dec 8, 2022. It is now read-only.

RFC: Naming conventions for the documentation containers #3

Closed
iesahin opened this issue Apr 5, 2021 · 7 comments · Fixed by #4

Comments

@iesahin
Contributor

iesahin commented Apr 5, 2021

This is actually related to the Doc Containers repository, but this seems a more appropriate place to discuss it:

I read the discussion in iterative/cml#217 and it may be worthwhile to discuss the naming convention for documentation containers.

@shcheklein and I discussed having a single (Docker) repository and maintaining the containers for different Katacoda scenarios as tags. My points in favor of a separate repository for each scenario, in summary:

  • Having different repository names for each container is more explicit. docker pull dvcorg/katacoda-gs-versioning is cleaner than docker pull dvcorg/katacoda:gs-versioning. Repositories have easy-to-remember URLs.
  • Tags usually refer to versions of a single setup. We have different Dockerfiles for different scenarios. Although we can push dvc-doc-containers/katacoda/get-started/01-initialize to dvcorg/katacoda:gs-initialize and .../06-experiments/ to dvcorg/katacoda:gs-experiments, this is not what people usually expect. It's a bit like debian and ubuntu sharing one repository and differing only by tags.
  • There is a build-all.zsh script to build and push, but pushing the containers manually is error-prone. It's very easy to push to the latest tag by mistake or to have a typo in the tag. Asking developers to build and push the containers using only the script seems infeasible to me.
  • In time, we may need to use the tags for versioning. When DVC 3 is out, we can tag all DVC 2 containers appropriately as a reference.
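The difference between the two schemes in the list above can be sketched in a few lines of Python. This is only an illustration: the function names and the `dvc2` version label are hypothetical, and only the `dvcorg/katacoda-gs-versioning` and `dvcorg/katacoda:gs-versioning` forms come from the discussion.

```python
from typing import Optional


def repo_per_scenario(course: str, scenario: str,
                      version: Optional[str] = None, org: str = "dvcorg") -> str:
    """One Docker repository per scenario; the tag stays free for versions."""
    reference = f"{org}/katacoda-{course}-{scenario}"
    return f"{reference}:{version}" if version else reference


def tag_per_scenario(course: str, scenario: str,
                     version: Optional[str] = None, org: str = "dvcorg") -> str:
    """One shared repository; the scenario (and any version) lives in the tag."""
    tag = f"{course}-{scenario}"
    if version:
        # Hypothetical convention: append the version to the scenario tag.
        tag = f"{tag}-{version}"
    return f"{org}/katacoda:{tag}"


print(repo_per_scenario("gs", "versioning"))          # dvcorg/katacoda-gs-versioning
print(tag_per_scenario("gs", "versioning"))           # dvcorg/katacoda:gs-versioning
print(repo_per_scenario("gs", "versioning", "dvc2"))  # dvcorg/katacoda-gs-versioning:dvc2
print(tag_per_scenario("gs", "versioning", "dvc2"))   # dvcorg/katacoda:gs-versioning-dvc2
```

With one repository per scenario, the tag remains available for a version marker; with a shared repository, the scenario and the version have to share the tag.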

Anyhow, I can update the script to push to a single Docker repository if you prefer a single repo with multiple tags.

I haven't pushed the images to https://hub.docker.com/u/dvcorg/ yet. Currently, they all reside in https://hub.docker.com/u/emresult/

When pushed they will look like:

dvcorg/katacoda-base (As the base image which installs DVC and other requirements. All Katacoda images derive from this image.)

dvcorg/katacoda-gs-initialize (for gs = Get Started)

dvcorg/katacoda-gs-versioning
dvcorg/katacoda-gs-accessing
dvcorg/katacoda-gs-params
dvcorg/katacoda-gs-stages
dvcorg/katacoda-gs-experiments

For tutorials in https://katacoda.com/dvc/courses/tutorials, naming will be like:

dvcorg/katacoda-tutorial-versioning
dvcorg/katacoda-tutorial-mnist

and for the examples in https://katacoda.com/dvc/courses/examples

dvcorg/katacoda-example-dvcignore
dvcorg/katacoda-example-dvc-fetch
...

For the containers that will run the examples in the documentation, I plan a naming convention similar to the URLs on the dvc.org site. So a container that runs the examples in https://dvc.org/doc/start/ will be named dvcorg/doc-start, and a container that replays the commands in https://dvc.org/doc/use-cases/data-registries will be named dvcorg/doc-use-cases-data-registries.
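The URL-based convention described above could be derived mechanically. The following is a minimal sketch, assuming the name is simply the URL path segments joined with hyphens; the function name `container_name_for` is hypothetical and not part of any existing script.

```python
from urllib.parse import urlparse


def container_name_for(url: str, org: str = "dvcorg") -> str:
    """Map a dvc.org documentation URL to a container name."""
    # Keep only the non-empty path segments, e.g. ["doc", "start"].
    segments = [part for part in urlparse(url).path.split("/") if part]
    if not segments or segments[0] != "doc":
        raise ValueError(f"not a documentation URL: {url!r}")
    return f"{org}/" + "-".join(segments)


print(container_name_for("https://dvc.org/doc/start/"))
# dvcorg/doc-start
print(container_name_for("https://dvc.org/doc/use-cases/data-registries"))
# dvcorg/doc-use-cases-data-registries
```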

Our goal is to keep the number of containers low by reusing them across the documentation. So REF pages will share a single container for their examples, and in the end we'll (hopefully) have a few images that contain all the example code and data for the whole documentation.

I can update the naming convention to have a dvc prefix, like dvcorg/dvc-katacoda-gs-versioning or dvcorg/dvc-doc-start. CML containers have cml- as their prefix, so this may be clearer, though repeating dvc may be unnecessary.

Any comments, questions? Thank you.

@shcheklein @DavidGOrtega @jorgeorpinel @dberenbaum @dmpetrov

Related: iterative/dvc.org#2318
Related: iterative/dvc.org#2355

@iesahin iesahin changed the title Naming conventions for the documentation containers RFC: Naming conventions for the documentation containers Apr 5, 2021
@0x2b3bfa0
Member

0x2b3bfa0 commented Apr 5, 2021

> Tags usually refer to versions of a single setup. We have different Dockerfiles for different scenarios. Although we can push dvc-doc-containers/katacoda/get-started/01-initialize to dvcorg/katacoda:gs-initialize and .../06-experiments/ to dvcorg/katacoda:gs-experiments, this is not what people usually expect. It's a bit like debian and ubuntu sharing one repository and differing only by tags.

In fact, using a single repository for multiple, loosely related images is not that uncommon: nvidia/cuda releases all the images with different tags under a single repository, including different tag suffixes like -ubuntu20.04 and -centos7 for the image base.

I personally like the idea of keeping as few repositories as possible and using tags to organize the different images, especially when they're for the same project or share a common nexus, like a series of examples or a single program with different but related capabilities.

I would even go a step further and suggest something like dvcorg/examples:katacoda-gs-versioning to keep all the examples under a single namespace. Docker registry addresses are somewhat similar to triple tags, and this naming schema could have the potential advantage of uncluttering the repository list.

> I can update the naming convention to have a dvc prefix, like dvcorg/dvc-katacoda-gs-versioning or dvcorg/dvc-doc-start. CML containers have cml- as their prefix, so this may be clearer, though repeating dvc may be unnecessary.

It looks like dvcorg predates iterative as the organization namespace, but that might change in the future, effectively eliminating the repeated dvc in repository addresses. I'm inclined to think that katacoda- or even example- or doc[umentation]- could be a good prefix for this kind of material, and we could keep the dvc- prefix for more general use cases like production images.


Note: just being nosy, like the uninvited wicked fairy of the tale; please disregard my comments if they are untimely. 😸

@jorgeorpinel

jorgeorpinel commented Apr 5, 2021

Hi @iesahin what other docs besides Katacoda scenarios do you intend https://github.com/iterative/dvc-doc-containers to have containers for? Any specifics?

Also, since this is about Docker container naming conventions, I'd keep it in that other repo. It's not really about documentation but about a Docker implementation detail.

As for the naming questions I think whoever has more experience publishing Docker images can better decide. The one thing you could take from dvc.org/doc is the ideal URL paths, which I would reshape into:

  • dvc/doc/start (Get Started)
  • dvc/doc/cases (Use Cases)
  • dvc/doc/guide (User Guide)
  • dvc/doc/ref(erence) (Cmd/API reference)
  • cml/doc (CML)
  • dvclive/doc (or logml/doc)

See also iterative/dvc.org#144

@iesahin
Contributor Author

iesahin commented Apr 6, 2021

> Note: just being nosy, like the uninvited wicked fairy of the tale; please disregard my comments if they are untimely. 😸

Ah, no, certainly not, this is the kind of comment I would like to have. Thank you very much.

> I personally like the idea of keeping as few repositories as possible and using tags to organize the different images, especially when they're for the same project or share a common nexus, like a series of examples or a single program with different but related capabilities.

In the end, what I would like to have is also as few repositories as possible, but I prefer a one-repository-per-Dockerfile approach, as it's simpler to follow the changes and to explain to new people. When we store the semantics in tags, we'll also need to document them, and possibly need a gatekeeping and validation mechanism for tags as well. What if, for example, I misremember examples:katacoda-gs-versioning as examples:gs-katacoda-versioning after several months? A docker push will silently accept the wrong tag, whereas a wrong repository name would fail, because I cannot create a new repository without admin rights.
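One way to picture the kind of tag validation mentioned above is a small check that runs before any push. This is a sketch under assumptions: the `KNOWN_TAGS` list and the `check_tag` helper are hypothetical, and a real setup would need the list to be maintained somewhere.

```python
import difflib

# Hypothetical allowlist of documented tags for a shared "examples" repository.
KNOWN_TAGS = {
    "katacoda-gs-initialize",
    "katacoda-gs-versioning",
    "katacoda-gs-experiments",
}


def check_tag(tag: str, known=KNOWN_TAGS) -> str:
    """Return the tag if it is documented, otherwise raise with a hint."""
    if tag in known:
        return tag
    # Offer the closest documented tag, if any is similar enough.
    close = difflib.get_close_matches(tag, sorted(known), n=1)
    hint = f" (did you mean {close[0]!r}?)" if close else ""
    raise ValueError(f"unknown tag {tag!r}{hint}")


check_tag("katacoda-gs-versioning")    # accepted
# check_tag("gs-katacoda-versioning")  # would raise ValueError before any push
```

A check like this only helps if every push goes through it, which is the gatekeeping concern raised in the comment.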

The number of repositories bothers me too, but I'd rather use that as motivation to reduce the number of different setups. We may need 10-12 containers in the beginning; after some time, all data/code examples should be merged into 3-4. Actually, the reason for this endeavor is to merge all datasets and examples into as few setups as possible, and keeping them in tags would be like sweeping them under the rug. For example, we would like to have no Katacoda containers at all; they should all use the same Getting Started material, but currently the data in the GS docs and in Katacoda have to be different.

> I'm inclined to think that katacoda- or even example- or doc[umentation]- could be a good prefix for this kind of material, and we could keep the dvc- prefix for more general use cases like production images.

This is a good point. It seems better to use doc- as a prefix in all documentation containers, like dvcorg/doc-katacoda-gs-versioning.

Thank you @0x2b3bfa0

@iesahin
Contributor Author

iesahin commented Apr 6, 2021

> Hi @iesahin what other docs besides Katacoda scenarios do you intend https://github.com/iterative/dvc-doc-containers to have containers for? Any specifics?

Hi @jorgeorpinel. What I have in mind is an associated container for every page that has a code example. This doesn't mean they should all use separate containers; what we actually need is to consolidate all the examples and datasets into as few moving parts as possible. The containers are a means to test this goal, not a goal in themselves. A side benefit may be linking these containers in the documentation, so that users can try the commands right away after a docker run -it dvcorg/doc-gs-experiments.

> Also, since this is about Docker container naming conventions, I'd keep it in that other repo. It's not really about documentation but about a Docker implementation detail.

You're right, but that repository is so new that I hesitated to invite comments there.

Thank you.

@0x2b3bfa0
Member

> In the end, what I would like to have is also as few repositories as possible, but I prefer a one-repository-per-Dockerfile approach, as it's simpler to follow the changes and to explain to new people.

Agreed! That's a good constraint for most of the usual containerized applications, though a repository with a sequential collection of loosely related examples could be a good fit for a single repository approach. In fact, the custom Dockertag files used on the current build system could play nicely with actual tags instead of repository names.

> When we store the semantics in tags, we'll also need to document them, and possibly need a gatekeeping and validation mechanism for tags as well. What if, for example, I misremember examples:katacoda-gs-versioning as examples:gs-katacoda-versioning after several months? A docker push will silently accept the wrong tag, whereas a wrong repository name would fail, because I cannot create a new repository without admin rights.

What do you think about automating both building and publishing with GitHub Actions for continuous delivery? This would eliminate both the permission requirements and the human errors that can come from manual pushes, similar to the current build-all.zsh approach but without user interaction (beyond approval, if required).

> The number of repositories bothers me too, but I'd rather use that as motivation to reduce the number of different setups. We may need 10-12 containers in the beginning; after some time, all data/code examples should be merged into 3-4. Actually, the reason for this endeavor is to merge all datasets and examples into as few setups as possible, and keeping them in tags would be like sweeping them under the rug. For example, we would like to have no Katacoda containers at all; they should all use the same Getting Started material, but currently the data in the GS docs and in Katacoda have to be different.

That's a good point. I lack enough context about the current needs to give any valuable suggestion in that sense, but, definitely, tags should not be used just as a patch to hide a still evolving project structure.

@iesahin
Contributor Author

iesahin commented Apr 7, 2021

> That's a good constraint for most of the usual containerized applications, though a repository with a sequential collection of loosely related examples could be a good fit for a single repository approach. In fact, the custom Dockertag files used on the current build system could play nicely with actual tags instead of repository names.

I wrote that script because I was too lazy to check the repository names (and they kept changing), but yes, we can put `katacoda:gs-versioning` into a `Dockertag` file and it will work fine.

> What do you think about automating both building and publishing with GitHub Actions for continuous delivery?

We already need that, or a `cron` job somewhere, to push the images daily. DVC seems to have a new release every day, so the images become stale quickly.

Apart from CI, however, we'll probably update some images frequently. But on second thought, frequent image updates are mostly needed for Katacoda, where I had to update the images to see the effects on the platform. Other images can be developed locally and pushed once they are done.

So human error seems to be a minor issue to me now. Thank you. @0x2b3bfa0

Today, when I was checking dvc-checkpoints-mnist, it occurred to me that branches can correspond to tags, and we can have a single repository to run the example code. (We plan to base the new examples on top of MNIST and this repository.)

So a hybrid approach may also be feasible.

Something along the lines of:

dvcorg/doc-katacoda-gs:base
dvcorg/doc-katacoda-gs:initialize
...
dvcorg/doc-katacoda-gs:experiments

dvcorg/doc-katacoda-tutorial:base
dvcorg/doc-katacoda-tutorial:mnist
dvcorg/doc-katacoda-tutorial:..

dvcorg/doc-katacoda-examples:base
...

dvcorg/doc-gs:base
dvcorg/doc-gs:versioning
dvcorg/doc-gs:accessing
...

dvcorg/doc-uc:base
...

dvcorg/doc-ref:base
dvcorg/doc-ref:add
dvcorg/doc-ref:stage-add

dvcorg/doc-ug:base
...

If a page doesn't need a specialized container, it can use the `base` version; otherwise it can derive from it and add any specialized setup.
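The fallback rule above can be sketched as a lookup. The `SPECIALIZED_TAGS` table and the `image_for_page` helper are hypothetical; the repository and tag names are taken from the list in this comment.

```python
# Hypothetical registry of pages that have their own specialized image.
SPECIALIZED_TAGS = {
    "dvcorg/doc-gs": {"versioning", "accessing"},
    "dvcorg/doc-ref": {"add", "stage-add"},
}


def image_for_page(repository: str, page: str, specialized=SPECIALIZED_TAGS) -> str:
    """Use the page's own tag if one exists, otherwise fall back to :base."""
    tags = specialized.get(repository, set())
    tag = page if page in tags else "base"
    return f"{repository}:{tag}"


print(image_for_page("dvcorg/doc-ref", "add"))    # dvcorg/doc-ref:add
print(image_for_page("dvcorg/doc-ref", "push"))   # dvcorg/doc-ref:base
print(image_for_page("dvcorg/doc-ug", "remote"))  # dvcorg/doc-ug:base
```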

Docker repository names can also reflect the URLs of dvc.org, like `doc-start` or `doc-command-reference`. That will make them easier to relate to the documentation pages.

@iesahin
Contributor Author

iesahin commented Apr 9, 2021

I think it's better to have a naming scheme that mirrors the organization of the dvc.org documentation, with the following repositories:

dvcorg/doc-start
dvcorg/doc-use-cases
dvcorg/doc-user-guide
dvcorg/doc-command-reference
dvcorg/doc-katacoda

Each repository will have a :base tag, and additional tags will identify individual pages when necessary.

I'm transferring this issue to dvc-doc-containers as a reference.

Thank you 🤝 @0x2b3bfa0 @jorgeorpinel @shcheklein
