This repository has been archived by the owner on Jul 5, 2022. It is now read-only.
https://katacoda.com/dvc/courses/get-started/accessing
Step 1:
There is a pre-existing file and a dir that only get used much later, which is confusing when you `ls`. Could they be downloaded automatically when you get to the step that needs them?
`get` note from https://katacoda.com/dvc/courses/get-started/versioning
`ls data.xml` -> just `ls`?
Step 2:
"in Git, DVC remote storage config saved in Git" -> "in Git, and DVC remote storage config"
"needed to access and download" -> "needed to access" - but this whole sentence is too long and could be rewritten.
`ls -lh` -> just `ls`?
"dvc get automated this by reading" - This explanation would make more sense before the `wget` example.
`.dvc/config` and `get-started/data.xml.dvc` links - Should they open in the in-system IDE instead?
"at the dataset-registry you cannot find it" -> "at the dataset-registry, you cannot find the file"
"stored in a data storage" -> "stored in a DVC remote"
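For reference on the `wget` vs. `dvc get` point in Step 2: once you have the remote URL from `.dvc/config` and the `md5` from the `.dvc` file, the object's location in the remote follows directly from the hash. A minimal Python sketch, assuming the DVC 1.x/2.x remote layout (first two hex characters of the MD5 as a directory, the remaining 30 as the file name; DVC 3.x adds a `files/md5/` prefix). `remote_object_url` is a hypothetical helper, not a DVC API, and the URL and hash below are placeholders.

```python
def remote_object_url(remote_url: str, md5: str) -> str:
    """Build the URL of a file object in a DVC remote.

    Hypothetical helper: assumes the DVC 1.x/2.x layout <remote>/<md5[:2]>/<md5[2:]>.
    """
    if len(md5) != 32:
        raise ValueError("expected a 32-character MD5 hex digest")
    # The first two hex characters become the directory, the rest the file name.
    return f"{remote_url.rstrip('/')}/{md5[:2]}/{md5[2:]}"


# Placeholder values: the real ones come from the remote `url` in .dvc/config
# and the `md5` field of the .dvc file.
print(remote_object_url("https://remote.example.com/store", "0123456789abcdef0123456789abcdef"))
```

This is roughly the lookup that `dvc get` automates, which is part of why explaining it before the `wget` example would help.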
Step 3:
"if you look at the Get Started repository" -> Should be [Data Registry]
"`dvc get` can download them, but how do we first even know what exactly there before downloading (or accessing in other ways we'll cover later)?" -> "We can `dvc get` them, but how do we even know what data is tracked in a remote DVC repo before accessing it?"
"we pass Git URL" -> "we pass a Git URL"
"as with `dvc get`" -> "as `dvc get`"
"Now, you can see the `data.xml` file. As well" -> "Now we can see `data.xml` and"
Step 4:
"Alternatively to the command line `dvc get`" -> "Besides using `dvc` commands"
"with dvc.api" -> "with the Python API" (same link)
Install `dvc` first, I think.
`cat process.py`... - Use the IDE instead?
"Yes, the interface" -> "The interface"
"works similar" -> "works the same way"
"It doesn't consume space for a file on the file system - it reads data directly into memory" -> "`open()` doesn't consume space in the file system - it streams data into memory as needed"
"Means, you can" -> "This means that you can"
But this 3rd point is somewhat repetitive with the 2nd one; it may need some rephrasing.
"interface is the same" -> "the interface is the same"
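On the `open()` point in Step 4, a sketch of why streaming matters: code that iterates a file object line by line only holds the current buffered chunk in memory, so the same function works on a local file or on the file-like object returned by `dvc.api.open()`. `count_rows` and `count_remote_rows` are hypothetical names; the repo URL and path follow the dataset-registry example, and treating `data.xml` as one `<row` element per line is an assumption.

```python
import io
from typing import IO


def count_rows(stream: IO[str]) -> int:
    """Count lines that start with '<row', reading one line at a time."""
    # Iterating the file object pulls data in as needed instead of
    # loading the whole file into memory first.
    return sum(1 for line in stream if line.lstrip().startswith("<row"))


def count_remote_rows() -> int:
    """Same logic over a file opened with the DVC Python API (needs DVC installed and network access)."""
    import dvc.api  # imported lazily so count_rows works without DVC installed

    # Assumed example location: get-started/data.xml in the dataset-registry repo.
    with dvc.api.open(
        "get-started/data.xml",
        repo="https://github.com/iterative/dataset-registry",
    ) as fd:
        return count_rows(fd)


# Local stand-in for the remote file, so the sketch runs offline.
sample = io.StringIO('<posts>\n  <row Id="1" />\n  <row Id="2" />\n</posts>\n')
print(count_rows(sample))  # 2
```

Because `count_rows` only depends on the file-object interface, swapping a local `open()` for `dvc.api.open()` needs no other changes.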
Step 5:
I'm not sure we even need the pre-existing `example-get-started` dir. Why have that and overwrite `data/data.xml`? Just to match https://dvc.org/doc/start/data-access#download? The rest of the scenario doesn't match the Get Started guide anyway.
"simplified" -> "simplifies"
"complexity" -> "the complexity"
"How about ..." -> "What about datasets or ML models?"
"DVC repositories and `dvc import` command" -> "A DVC repository and the `dvc import` command"
"The `url` and `rev_lock` subfields" - Needs more context (mention `data.xml.dvc`)
`git diff` -> could just be `cat data/data.xml.dvc`. It's not clear why we're comparing something.
"`dvc import`, is" -> "`dvc import` is"
"repository this" -> "repository, this"
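For the `url` and `rev_lock` point in Step 5: in the `.dvc` file written by `dvc import`, the `deps` entry carries a `repo` section where `url` is the source Git repo and `rev_lock` pins the exact commit the data came from. A minimal sketch that pulls those fields out of an already-parsed `data.xml.dvc` (for example, loaded with PyYAML); the dict below is an illustrative placeholder with dummy hashes, not the real file, and `import_source` is a hypothetical helper.

```python
from typing import Any, Dict, Tuple


def import_source(dvc_file: Dict[str, Any]) -> Tuple[str, str, str]:
    """Return (repo url, pinned commit, source path) from a parsed `dvc import` .dvc file."""
    for dep in dvc_file.get("deps", []):
        repo = dep.get("repo")
        if repo:  # only imported dependencies have a `repo` section
            return repo["url"], repo["rev_lock"], dep["path"]
    raise ValueError("not a `dvc import` file: no dependency with a `repo` section")


# Placeholder contents shaped like data/data.xml.dvc; hash values are dummies.
parsed = {
    "frozen": True,
    "deps": [
        {
            "path": "get-started/data.xml",
            "repo": {
                "url": "https://github.com/iterative/dataset-registry",
                "rev_lock": "0" * 40,
            },
        }
    ],
    "outs": [{"md5": "0" * 32, "path": "data.xml"}],
}

url, rev_lock, path = import_source(parsed)
print(f"{path} from {url} at commit {rev_lock[:7]}")
```

`dvc update` is the command that moves `rev_lock` to a newer commit of the source repo.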
Step 6:
Not sure we need it since we've already mentioned and linked to the Data Registry pattern (use case).