This repository has been archived by the owner on Jul 5, 2022. It is now read-only.

GS/Accessing updates (1.0) #28

Closed
jorgeorpinel opened this issue Mar 8, 2021 · 0 comments · Fixed by #35

Comments

@jorgeorpinel
Contributor

https://katacoda.com/dvc/courses/get-started/accessing

Step 1:

  • There is a pre-existing file and directory that are only used much later, which is confusing when you run ls. Could they be downloaded automatically at the step that needs them?
  • "You don't need to be inside a Git or DVC repo to execute it" - Remove, or reuse the dvc get note from https://katacoda.com/dvc/courses/get-started/versioning
  • ls data.xml -> just ls?
  • "accessing data file directly" -> "downloading the data file directly"
  • "or as a data/model registry" -> (data registry pattern)

Step 2:

  • Maybe this whole step should be part of step 1, if we can compress them a little.
  • "Remember those .dvc files dvc add generates?" - Link to https://katacoda.com/dvc/courses/get-started/versioning (step 1 if possible)
  • "dvc.lock that" -> "dvc.lock, which"
  • "in Git, DVC remote storage config saved in Git" -> "in Git, and DVC remote storage config"
  • "needed to access and download" -> "needed to access" - but this whole sentence is too long, could be rewritten.
  • ls -lh -> just ls?
  • "dvc get automated this by reading" - This explanation would make more sense before the wget example
  • .dvc/config and get-started/data.xml.dvc links - Should they open in the in-system IDE instead?
  • "at the dataset-registry you cannot find it" -> "at the dataset-registry, you cannot find the file"
  • "stored in a data storage" -> "stored in a DVC remote"

Step 3:

  • "if you look at the Get Started repository" -> Should be [Data Registry]
  • "dvc get can download them, but how do we first even know what exactly there before downloading (or accessing in other ways we'll cover later)?" -> "We can dvc get them, but how do we even know what data is tracked in a remote DVC repo before accessing it?"
  • "we pass Git URL" -> "we pass a Git URL"
  • "as with dvc get" -> "as dvc get"
  • "Now, you can see the data.xml file. As well" -> "Now we can see data.xml and"
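
For reference, the "list first, then get" flow that the Step 3 rewording describes could be sketched as below. This is only an illustration: the registry URL and the get-started/data.xml path are assumptions based on the Data Registry example, and the helper names (list_command, get_command, main) are hypothetical. It shells out to the dvc CLI, so it needs dvc installed to actually run main().

```python
import subprocess

# Assumed values for illustration (not taken from the scenario text itself).
REPO_URL = "https://github.com/iterative/dataset-registry"
DATA_PATH = "get-started/data.xml"


def list_command(repo_url, path=None):
    """Build a `dvc list` call for a remote repo and an optional subpath."""
    cmd = ["dvc", "list", repo_url]
    if path is not None:
        cmd.append(path)
    return cmd


def get_command(repo_url, path, out=None):
    """Build a `dvc get` call; `-o` chooses where the download is written."""
    cmd = ["dvc", "get", repo_url, path]
    if out is not None:
        cmd += ["-o", out]
    return cmd


def main():
    # First see what is tracked in the remote repo, then download one file.
    subprocess.run(list_command(REPO_URL, "get-started"), check=True)
    subprocess.run(get_command(REPO_URL, DATA_PATH), check=True)
```

Calling main() runs both commands in order; neither requires the current directory to be a Git or DVC repo.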

Step 4:

  • "Alternatively to the command line dvc get" -> "Besides using dvc commands"
  • "with dvc.api" -> "with the Python API" (same link)
  • Install dvc first, I think
  • cat process.py... - Use IDE instead?
  • "Yes, the interface" -> "The interface"
  • "works similar" -> "works the same way"
  • "It doesn't consume space for a file on the file system - it reads data directly into memory" -> "open() doesn't consume space in the file system - it streams data into memory as needed"
  • "Means, you can" -> "This means that you can"
    This 3rd point largely repeats the 2nd one, though, so it may need rephrasing.
  • "interface is the same" -> "the interface is the same"
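
To make the "the interface is the same" point concrete, here is a rough sketch (not the scenario's process.py). The count_lines helper is hypothetical and works on any file-like object; dvc.api.open() is used as a context manager in the same way as the built-in open(). The path and repo arguments are placeholders to be supplied by the caller, and dvc is imported lazily so the helper can be tried without it.

```python
import io


def count_lines(stream):
    """Count lines by iterating over a file-like object one line at a time."""
    return sum(1 for _ in stream)


def count_local_lines(text):
    """Same helper applied to an in-memory text stream."""
    with io.StringIO(text) as fd:
        return count_lines(fd)


def count_remote_lines(path, repo):
    """Same helper applied to a file tracked in a DVC repo."""
    # Lazy import: only this function needs DVC to be installed.
    import dvc.api

    # The file object is read from remote storage as it is iterated;
    # no copy of the file is placed in the workspace.
    with dvc.api.open(path, repo=repo) as fd:
        return count_lines(fd)
```

The only difference between the two callers is which open() provides the file object, which is the point the step text is trying to make.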

Step 5:

  • I'm not sure we even need the pre-existing example-get-started dir. Why have it and overwrite data/data.xml? Just to match https://dvc.org/doc/start/data-access#download? The rest of the scenario doesn't match the GS anyway.
  • "simplified" -> "simplifies"
  • "complexity" -> "the complexity"
  • "How about ..." -> "What about datasets or ML models?"
  • "DVC repositories and dvc import command" -> "A DVC repository and the dvc import command"
  • "The url and rev_lock subfields" - Needs more context (mention data.xml.dvc)
  • git diff -> could just be cat data/data.xml.dvc. It's not clear why we're comparing something.
  • "dvc import, is" -> "dvc import is"
  • "repository this" -> "repository, this"
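
As possible context for the url and rev_lock point, the sketch below pulls those two subfields out of the text of an import .dvc file. The sample content is an assumed, trimmed layout with placeholder hashes, not the scenario's real file, and import_source is a hypothetical helper. It uses a simple regex instead of a YAML parser to stay dependency-free.

```python
import re

# Assumed, trimmed layout of a .dvc file written by `dvc import`.
# The hash values are placeholders, not real ones.
SAMPLE_DVC_FILE = """\
deps:
- path: get-started/data.xml
  repo:
    url: https://github.com/iterative/dataset-registry
    rev_lock: 0000000000000000000000000000000000000000
outs:
- md5: 00000000000000000000000000000000
  path: data.xml
"""


def import_source(dvc_file_text):
    """Return the url and rev_lock subfields found in the .dvc file text."""
    fields = {}
    for key in ("url", "rev_lock"):
        # Match "<indent>key: value" on a line of its own.
        match = re.search(
            rf"^[ \t]*{key}:[ \t]*(\S+)[ \t]*$", dvc_file_text, re.MULTILINE
        )
        if match:
            fields[key] = match.group(1)
    return fields
```

Here url records where the data came from and rev_lock records the source repo commit it was imported at, which is the context the step text could spell out next to cat data/data.xml.dvc.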

Step 6:
Not sure we need it since we've already mentioned and linked to the Data Registry pattern (use case).

@jorgeorpinel jorgeorpinel changed the title GS/Accessing updates GS/Accessing updates (1.0) Mar 8, 2021
@iesahin iesahin mentioned this issue Mar 9, 2021
40 tasks