user-guide: restructure #745
Labels
A: docs (Area: user documentation (gatsby-theme-iterative))
C: guide (Content of /doc/user-guide)
status: research (Writing concrete steps for the issue)
website: design (Website graphic design)
The Structure of the Docs
This issue focuses on the structure of the User Guide, because the
structure of the other top-level items is more or less clear.
Get Started
[. . . . .]
Installation
[. . . . .]
how to: "uninstall dvc" for both binary and pip packages #65
Tutorials
[. . . . .]
User Guide
Introduction
Why DVC
Problems in DS/ML
https://dvc.org/doc/understanding-dvc/collaboration-issues
Existing Tools
https://dvc.org/doc/understanding-dvc/existing-tools
https://dvc.org/doc/understanding-dvc/related-technologies
What is DVC
https://dvc.org/doc/understanding-dvc/what-is-dvc
DVC Concepts
https://dvc.org/doc/understanding-dvc/what-is-dvc
user-guide: cache/remote/workspace relationships #53 (comment)
DVC Features
https://dvc.org/doc/understanding-dvc/core-features
https://dvc.org/doc/understanding-dvc/how-it-works
Other Resources
https://dvc.org/doc/understanding-dvc/resources
Maybe some of the resources can be referenced from the other sections in
the Introduction, and this section can then be removed.
Basic Concepts
Explains the basic concepts of DVC in more detail.
Data Management
The core function of DVC is data tracking and management.
https://katacoda.com/dvc/courses/basics/data
Tracking Data Versions
DVC takes advantage of Git's versioning features to keep track of the data
versions.
https://katacoda.com/dvc/courses/basics/versioning
https://dvc.org/doc/use-cases/data-and-model-files-versioning
Sharing Data
DVC makes it easier to share data among people who work on the same
project.
https://katacoda.com/dvc/courses/basics/sharing
https://dvc.org/doc/use-cases/share-data-and-model-files
Stages and Pipelines
DVC has a built-in way to connect ML steps into a DAG and run the full
pipeline end-to-end.
https://katacoda.com/dvc/courses/basics/pipelines
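To illustrate what "connecting steps into a DAG" means, here is a minimal sketch in plain Python. It does not use DVC's API; the stage names and the `run_order` helper are hypothetical, and the code only shows how a dependency graph determines the order in which stages run end-to-end.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical stages: each name maps to the set of stages it depends on.
stages = {
    "prepare": set(),
    "featurize": {"prepare"},
    "train": {"featurize"},
    "evaluate": {"train", "featurize"},
}


def run_order(graph):
    """Return stage names in an order that respects every dependency."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for name in run_order(stages):
        print("would run stage:", name)
```

A guide section on pipelines could use a similar picture to explain why changing an early stage invalidates everything downstream of it.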
Importing Data
Download and track data from another DVC project that is hosted in a Git
repository.
https://katacoda.com/dvc/courses/basics/importing
Dvcignore
https://dvc.org/doc/user-guide/dvcignore
DVC Internals
DVC Files and Directories
https://dvc.org/doc/user-guide/dvc-files-and-directories
user-guide: clearly specify that state db is only used locally #175
Structure of Cache Directory
https://dvc.org/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory
DVC-file Format
https://dvc.org/doc/user-guide/dvc-file-format
guide: describe the how hash values (md5) are calculated #68
How DVC Understands Data Changes
cmd ref: elaborate on how DVC understands a file's status #576
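For the two items above (hash calculation and change detection), a rough sketch of checksum-based change detection could accompany the text. This is a simplified illustration and not DVC's actual implementation; `file_md5` and `has_changed` are hypothetical helper names, and the real details are what issues #68 and #576 ask to document.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_changed(path, recorded_md5):
    """Report whether the file's current hash differs from a recorded one."""
    return file_md5(path) != recorded_md5
```

The idea is that a hash recorded when the data was last tracked can be compared with the hash of the file currently in the workspace; any difference means the content has changed.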
Updating DVC-files
Anonymized Usage Analytics
https://dvc.org/doc/user-guide/analytics
Large Dataset Optimization
Optimizing the DVC setup is important for getting the best performance
when handling large data files.
https://katacoda.com/dvc/courses/basics/performance
https://dvc.org/doc/user-guide/large-dataset-optimization
https://dvc.org/doc/user-guide/update-tracked-files
Using Deduplicating Filesystems
Using Symlinks
Managing External Data
https://dvc.org/doc/user-guide/external-dependencies
https://dvc.org/doc/user-guide/managing-external-data
guide: consolidate external data mgmt guides #520
remote: create separate sections for remote types #499
clarify external outputs section #143
user-guide: clarify differences and use cases for external storage mechanisms? #566
Local Files and Directories
SSH
Amazon S3
Google Cloud Storage
HDFS
HTTP
Managing External Cache
Local Directory
SSH Directory
Amazon S3 Directory
Google Cloud Storage
HDFS
Advanced Data Sharing
Shared Development Server
https://dvc.org/doc/use-cases/shared-development-server
Mounted Cache Dir
guide: using NFS as a remote storage #103
Mounted Remote Storage
Synchronized Remote Storage
user-guide: document data sharing with unsupported storage types #648 (comment)
Managing Experiments
user-guide: add "folders" way of experimentation #159 (comment)
By Tags
By Branches
By Folders
Hybrid
Tips & Tricks / HowTo-s
https://dvc.org/doc/user-guide/running-dvc-on-windows
guide: add "Best Practices" #72
how to: "undo" 'dvc add' + other "how to undo mistakes" recipes #625
Manually Editing DVC-files
How to Use DVC with DB
Use a Bash Script to Create Pipelines
Move a Pipeline to Different Datasets
Update Your Pipeline by Editing DVC-files
how to: "update your pipeline by editing DVC-files" #230
How to Add Output Without Rerunning Stage
cmd ref: improve run and commit / how to add outs/deps without re-running stage? #460
Managing Metrics
user-guide: write about managing metrics #59
How to Avoid/Resolve Conflicts in DVC-Files on Merge
provide guideline or mechanism to resolve conflicts in DVC files on merge #192
How to Store Data on Your Own Server
add "How to store data on your own server" tutorial/guide #54
Using External Data and Cache
how: use DVC when data is stored in an external drive #563
Fix #563: Managing Data Storage On An External Hard Drive #565
Build a Dataset Registry
use-cases: new case study based on Versioning tutorial dataset #674
use-cases: new datasets registry case study, et al. #679
Advanced Remote Scenarios
Using one remote for development and one for production, and switching
between them.
user-guide: add "Define remote through another remote" #237
cmd ref: explain remote://myremote notation in import & run #108
How to Use Jupyter Notebook
blog: DVC with Jupyter notebooks #96
Etc.
Contributing
Contributing Code
https://dvc.org/doc/user-guide/contributing
Contributing Docs
https://dvc.org/doc/user-guide/contributing-docs
Command Reference
[. . . . .]
DVC API
[. . . . .]
FAQ
https://dvc.org/doc/user-guide/running-dvc-on-windows
Some questions from the chat whose answers are not obvious from the rest of
the docs. Maybe they can be categorized.
Changelog
[. . . . .]