start: add meta title, description and key term (data pipelines) [SEO] #1857

Merged: 8 commits, Oct 29, 2020
16 changes: 12 additions & 4 deletions content/docs/start/data-pipelines.md
@@ -1,13 +1,21 @@
-# Data Pipelines
+---
+title: 'Get Started: Data Pipelines'
+description: 'Learn how to build and use DVC pipelines to capture, organize,
+version, and reproduce your data science and machine learning workflows.'
+---
+
+# Get Started: Data Pipelines

Versioning large data files and directories for data science is great, but not
enough. How is data filtered, transformed, or used to train ML models? DVC
introduces a mechanism to capture _data pipelines_ — series of data processes
that produce a final result.

DVC pipelines and their data can also be easily versioned (using Git). This
-allows you to better organize your project, and reproduce your workflow and
-results later exactly as they were built originally!
+allows you to better organize projects, and reproduce your workflow and results
+later — exactly as they were built originally! For example, you could capture a
+simple ETL workflow, organize a data science project, or build a detailed
+machine learning pipeline.

## Pipeline stages

@@ -300,7 +308,7 @@ important problems:
and which commands will generate the pipeline results (such as an ML model).
Storing these files in Git makes it easy to version and share.
- _Continuous Delivery and Continuous Integration (CI/CD) for ML_ - describing
-projects in way that it can be reproduced (built) is the fist necessary step
+projects in way that it can be reproduced (built) is the first necessary step
before introducing CI/CD systems. See our sister project,
[CML](https://cml.dev/) for some examples.
