June gems #1510

Merged
merged 5 commits into from
Jun 30, 2020

Conversation

elleobrien
Contributor

June Gems are here- going to try to get it out while it's still June! :)

@shcheklein shcheklein temporarily deployed to dvc-landing-june-gems-fvxu7n5o June 29, 2020 19:26 Inactive
@elleobrien elleobrien requested a review from shcheklein June 29, 2020 19:26
consuming, and the dependencies and outputs haven't changed. You can use the
`--no-exec` flag to get around this:

```
Member

add dvc

Member

add $ before the command - here and in other places

Member

there are some bugs like this in the previous Gems btw

Contributor Author

good catch. might be good to revise previous gems then too


_Just like this but with technical documentation._

### Q: After I pushed my local data to remote S3 storage, I noticed the file names are different in S3- they're hash values. [Can I make them more meaningful names?](https://discord.com/channels/485586884165107732/563406153334128681/717737163122540585)
Member

no need to mention S3 - we can generalize it

Member

also, would be great to briefly provide motivation, e.g. deduplication, security (files are immutable), GitFlow, etc.

In addition to `dvc list`, mention the data registry article and/or other commands (`dvc get`, `dvc import`, the Python `dvc.api`). All of them provide a holistic data access layer for DVC-tracked objects (files, ML models, directories) that can usually be used as a drop-in replacement for regular data access libraries (e.g. boto or the AWS CLI in the case of S3).

Contributor Author

OK have developed this answer more in the next version, let me know what you think
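
For readers following the deduplication point in this thread, here is a minimal sketch of content-addressed naming. It assumes an MD5 digest of the file contents, split into a two-character directory prefix plus the remainder, which resembles the layout of a DVC cache. The helpers `cache_path`, `push`, and `stored_files` are hypothetical names used only for illustration and are not part of DVC.

```python
import hashlib
import tempfile
from pathlib import Path


def cache_path(content: bytes) -> str:
    """Derive a storage name from the file contents, not the original file name."""
    digest = hashlib.md5(content).hexdigest()
    # Assumed layout: first two hex characters form a directory, the rest the file name.
    return f"{digest[:2]}/{digest[2:]}"


def push(store: Path, content: bytes) -> Path:
    """Write content into the store unless an identical object is already there."""
    target = store / cache_path(content)
    if not target.exists():  # identical contents map to the same path, so nothing is duplicated
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
    return target


def stored_files(store: Path) -> list[str]:
    """List every object in the store, relative to its root."""
    return sorted(p.relative_to(store).as_posix() for p in store.rglob("*") if p.is_file())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp)
        push(store, b"hello")   # e.g. data/train.csv
        push(store, b"hello")   # e.g. a copy under another name: no new object
        push(store, b"hello!")  # changed contents: a new object
        print(stored_files(store))
```

Because the name depends only on the contents, two files with identical contents occupy one object in storage, and any change to the contents yields a new name instead of overwriting the old object. That is why the names in remote storage look like hashes rather than the original file names.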

@shcheklein shcheklein temporarily deployed to dvc-landing-june-gems-fvxu7n5o June 29, 2020 21:46 Inactive
@elleobrien
Contributor Author

@shcheklein revisions are pushed

### Q: After I pushed my local data to remote storage, I noticed the file names are different in my storage repository- they're hash values. [Can I make them more meaningful names?](https://discord.com/channels/485586884165107732/563406153334128681/717737163122540585)

No, but for a good reason! What you're seeing are cached files, and they're
stored in a special format that makes DVC versioning and addressing possible-
Member

format -> way (we don't change format, it might confuse some folks). CSV stays CSV, we only change its name

Contributor Author

good point

Contributor Author

i'm going to say "naming convention"

Contributor Author

@elleobrien elleobrien Jun 29, 2020

This has been updated in the latest commit

@restyled-io restyled-io bot mentioned this pull request Jun 29, 2020
@shcheklein shcheklein temporarily deployed to dvc-landing-june-gems-fvxu7n5o June 29, 2020 23:04 Inactive
@elleobrien
Contributor Author

I think all issues are addressed. Aiming to publish tomorrow AM so let's merge then?

@shcheklein shcheklein temporarily deployed to dvc-landing-june-gems-fvxu7n5o June 29, 2020 23:33 Inactive
@shcheklein shcheklein merged commit 194d5e2 into master Jun 30, 2020
Contributor

@casperdcl casperdcl left a comment

minor comments

DVC cache and remote if the contents of your dataset change frequently.

Generally, we would recommend first trying a plain unzipped directory. DVC is
designed to work with large numbers of files (on the order of millions) and has
Contributor

extra has :)

must set the `endpointurl` too. For example:

```dvc
$ dvc remote add -d myremote s3://mybucket/path/to/dir
Contributor

isn't it best to use long flag names for commands in documentation? --default better than -d? Otherwise people may accidentally change their default remote when copy-pasting this command.

3 participants