blog: February community gems #3261

Merged
merged 24 commits into from
Feb 28, 2022
21ad7be
started Fec CGs
flippedcoder Feb 2, 2022
16dd7d2
Merge branch 'master' into blog/feb-comm-gems
flippedcoder Feb 3, 2022
286edc3
added initial draft
flippedcoder Feb 7, 2022
df0407f
fixed the url
flippedcoder Feb 7, 2022
243ac1e
CML/AWS rewording
casperdcl Feb 23, 2022
dd6b64c
Restyle blog: February community gems (#3318)
restyled-io[bot] Feb 23, 2022
e8b90d6
Update content/blog/2022-02-28-february-22-community-gems.md
flippedcoder Feb 24, 2022
9ef1747
Update content/blog/2022-02-28-february-22-community-gems.md
flippedcoder Feb 24, 2022
c647ebd
Update content/blog/2022-02-28-february-22-community-gems.md
flippedcoder Feb 24, 2022
ad4e4b2
Update content/blog/2022-02-28-february-22-community-gems.md
flippedcoder Feb 24, 2022
ddcbdc0
minor update
flippedcoder Feb 24, 2022
9d8c74a
Merge branch 'blog/feb-comm-gems' of https://github.com/iterative/dvc…
flippedcoder Feb 24, 2022
4a1dc85
Update content/blog/2022-02-28-february-22-community-gems.md
flippedcoder Feb 24, 2022
7a12abb
Update content/blog/2022-02-28-february-22-community-gems.md
flippedcoder Feb 24, 2022
347e4c8
Update content/blog/2022-02-28-february-22-community-gems.md
flippedcoder Feb 24, 2022
fbd4d03
Update content/blog/2022-02-28-february-22-community-gems.md
flippedcoder Feb 24, 2022
e69c6fe
Update content/blog/2022-02-28-february-22-community-gems.md
flippedcoder Feb 24, 2022
499052b
Merge branch 'blog/feb-comm-gems' of https://github.com/iterative/dvc…
flippedcoder Feb 24, 2022
128edd2
making minor edits
flippedcoder Feb 24, 2022
4ec9be3
more updates
flippedcoder Feb 24, 2022
cd58197
added cover image and march office hours link
flippedcoder Feb 24, 2022
7c91a70
more feedback updates
flippedcoder Feb 25, 2022
18f35f2
updated email link
flippedcoder Feb 28, 2022
3fe6872
added link to zntrack
flippedcoder Feb 28, 2022
181 changes: 181 additions & 0 deletions content/blog/2022-02-28-february-22-community-gems.md
@@ -0,0 +1,181 @@
---
title: February '22 Community Gems
date: 2022-02-28
description: >
  A roundup of technical Q&A's from the DVC and CML community. This month:
  comparing experiments, working with data, working with pipelines, and more.
descriptionLong: >
  A roundup of technical Q&A's from the DVC and CML community. This month:
  comparing experiments, working with data, working with pipelines, and more.
picture: 2022-01-31/jan-community-gems.png
author: milecia_mcgregor
commentsUrl: https://discuss.dvc.org/t/february-22-community-gems/100132
tags:
- Data Versioning
- DVC Remotes
- DVC API
- DVC Stages
- Community
---

### [Is there a proper way of deleting DVC tracked files from cloud storage?](https://discord.com/channels/485586884165107732/563406153334128681/927618225989111880)

Thanks for the question @fireballpoint1!

You can find the recommended way to delete files from your cloud storage in
[our docs](https://dvc.org/doc/command-reference/gc#removing-data-in-remote-storage).
Be very careful when deleting data from the cloud because the action is
irreversible. Here's an example of a command that removes unused data from both
your local cache and the cloud storage.

```dvc
$ dvc gc --workspace --cloud
```

This command keeps only the files and directories referenced in the current
workspace and removes everything else, including data in the cloud. By default,
it uses the default remote you have set. You can specify a different remote
storage with the `--remote` option like this.

```dvc
$ dvc gc --workspace --cloud --remote name_of_remote
```

### [I'm using DVC experiments for deep learning projects, but I'm running into a problem where the Git index gets corrupted when cache files are above 4 GB. What is the best workaround for this?](https://discord.com/channels/485586884165107732/563406153334128681/928939232033140736)

Great question from @charles.melby-thompson!

This is a known
[issue with experiments](https://github.com/iterative/dvc/issues/6181) and we
highly encourage you to comment on and follow this ticket to let us know what
you need. This happens because DVC automatically tracks `-O`
(`--outs-no-cache`) outputs with Git internally, since it assumes that any
outputs not explicitly listed in your `.gitignore` file are part of the
experiment state that needs to be tracked.

You should be able to explicitly add the `-O` output file(s) to your
`.gitignore` as a workaround in the meantime.
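
As a rough illustration of that workaround, the sketch below appends paths to `.gitignore` only when they are not already listed. The helper name `ignore_outputs` and the path `models/checkpoint.pt` are made up for this example; substitute your own `-O` outputs.

```python
from pathlib import Path


def ignore_outputs(paths, gitignore=".gitignore"):
    """Append each path to the .gitignore file unless it is already listed."""
    target = Path(gitignore)
    text = target.read_text() if target.exists() else ""
    existing = set(text.splitlines())
    # dict.fromkeys drops duplicate inputs while keeping their order.
    missing = [p for p in dict.fromkeys(paths) if p not in existing]
    if missing:
        # Start on a fresh line if the file does not already end with one.
        prefix = "" if text == "" or text.endswith("\n") else "\n"
        with target.open("a") as fh:
            fh.write(prefix + "".join(f"{p}\n" for p in missing))
    return missing


# Hypothetical usage, run from the repository root:
# ignore_outputs(["models/checkpoint.pt"])
```

The function returns the paths it actually added, so running it twice with the same input is harmless.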

### [Is there an easy way to visualize DVC experiment results without using the command line?](https://discord.com/channels/485586884165107732/485596304961962003/930150143259459644)

Good question @LucZ[Mad]!

If you bring those experiments into your regular Git workflow, for example by
using `dvc exp branch` to create a branch for any experiment you want to share,
you could use [DVC Studio](https://studio.iterative.ai/) to visualize them.

We're working on support for viewing any pushed experiments in Studio right now
so if there's anything you want to see, make sure to comment on and follow
[this issue](https://github.com/iterative/studio-support/issues/45).

### [Is it possible to change the CML runner shutdown to stopping the instance after the idle timeout instead of terminating the instance using self-hosted runners on AWS?](https://discord.com/channels/485586884165107732/728693131557732403/933674203796873226)

This is another fantastic question from @jotsif!

Unfortunately no, the instance will be destroyed. If you're trying to preserve
the cache to try and speed up your experimentation time, you could
[check this out](https://aws.amazon.com/premiumsupport/knowledge-center/s3-transfer-data-bucket-instance/)
if you're using S3 for your remote storage.

Just be cautious since an instance that is in the "off" state might still be
considered in use for billing purposes. It's best to let the CML runner
terminate your instance and run `dvc pull` to restore the data.

### [Where can I find more details on how the DVC Studio free version differs from the enterprise version?](https://discord.com/channels/485586884165107732/841856466897469441/933324508570472497)

Thanks for asking @Abdi!

You can find more info about the different
[DVC Studio tiers here](https://studio.iterative.ai/#pricing).

The Free version has all the features most individual users need, like
connecting to ML repositories, creating views, submitting experiments, and
generating plots. The Teams version allows you to create large teams for better
collaboration and sharing of views and settings with everyone. The Enterprise
version is aimed at needs around compliance, dedicated support, and on-premise
installation.

If you are trying to decide which plan to select, please email us at
`[email protected]` and we'll help you figure it out based on your needs.

### [How do you `dvc commit` or get the `dvc status` of each case in a `foreach` stage?](https://discord.com/channels/485586884165107732/563406153334128681/938649682492686366)

It should be enough to do `dvc commit <stagename>@<foreach name>`.

For example, assuming you have a `params.yaml` that looks like this:

```yaml
languages:
- en
- it
- de
- fr
```

and a `dvc.yaml` that looks like this:

```yaml
stages:
  train-model:
    foreach: ${languages}
    do:
      cmd: echo "training '${item}' model"
```

then run:

```dvc
$ dvc commit train-model@en # commits the 'en' stage
```
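
To cover every case rather than just `en`, the stage names can be generated from the list. This is a small sketch, assuming the `languages` list and `train-model` stage from the example above. `foreach_targets` is a hypothetical helper, and passing the same `stage@item` target to `dvc status` is an assumption based on how `dvc commit` addresses the stage.

```python
def foreach_targets(stage, items):
    """Return the per-item stage names that `foreach` generates: <stage>@<item>."""
    return [f"{stage}@{item}" for item in items]


languages = ["en", "it", "de", "fr"]  # mirrors the params.yaml above

for target in foreach_targets("train-model", languages):
    # Print the commands instead of running them, so the sketch has no side effects.
    print(f"dvc status {target}")
    print(f"dvc commit {target}")
```

Printing the commands first makes it easy to review them before pasting them into a shell.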

### [I was wondering whether it was possible to reuse the same `dvc.yaml` file in multiple pipeline folders such that the exact same stages are run with different `params.yaml` files?](https://discord.com/channels/485586884165107732/485596304961962003/939099847288578079)

@louisv, thanks for this question!

It seems like you're looking for the parametrization functionality. You can
learn more about how it works
[in this doc](https://dvc.org/doc/user-guide/project-structure/pipelines-files#templating),
but here's an example of what that might look like in the `dvc.yaml`.

```yaml
stages:
  cleanups:
    foreach: # List of simple values
      - raw1
      - labels1
      - raw2
    do:
      cmd: clean.py "${item}"
      outs:
        - ${item}.cln
```
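
To see what the template above expands to, here is a rough sketch that mimics the `${item}` substitution with Python's `string.Template`. It only imitates the idea and is not how DVC resolves templates internally; `expand_foreach` is a hypothetical helper.

```python
from string import Template


def expand_foreach(stage, items, cmd, outs):
    """Build one concrete stage per item by substituting ${item} in cmd and outs."""
    expanded = {}
    for item in items:
        expanded[f"{stage}@{item}"] = {
            "cmd": Template(cmd).substitute(item=item),
            "outs": [Template(out).substitute(item=item) for out in outs],
        }
    return expanded


stages = expand_foreach(
    "cleanups",
    ["raw1", "labels1", "raw2"],
    cmd='clean.py "${item}"',
    outs=["${item}.cln"],
)
for name, spec in stages.items():
    print(name, spec["cmd"], spec["outs"])
```

Each list entry becomes its own stage (`cleanups@raw1`, `cleanups@labels1`, `cleanups@raw2`) with its own command and output.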

### [Is it possible to change the x-label in DVC Studio?](https://discord.com/channels/485586884165107732/841856466897469441/938857004187943003)

A great question about Studio from @PythonF!

You can set custom properties for your plot in your `dvc.yaml` like this:

```yaml
plots:
  - plots_no_cache.csv:
      cache: false
      x: r
```

You can also use `dvc plots modify` to change which fields are plotted on the x
and y axes (`-x`, `-y`), or the axis labels themselves (`--x-label`,
`--y-label`), using commands similar to the following.

```dvc
$ dvc plots modify plots_no_cache.csv -x r -y q
```

---

https://media.giphy.com/media/h5Ct5uxV5RfwY/giphy.gif

At our March Office Hours Meetup we will be ...! [RSVP for the Meetup here]() to
stay up to date with specifics as we get closer to the event!

[Join us in Discord](https://discord.com/invite/dvwXA2N) to get all your DVC and
CML questions answered!