use-cases: improve Data Registry case #795
Comments
I would agree if this was a tutorial or get started chapter with reproducible commands but here we are actually talking in a more hypothetical way.
Actually, we first mention that "Instead of adding it to both projects, B can simply import it from A." (implying simple project dependency). Like you noticed, this use case is about data registries so that's why we focus on them. As for the list of advantages, I had the exact same concern with @shcheklein at first haha. Most of them are not specific to data registries. The reason we have them all listed there, though, is that we hope use cases can serve as a bit of marketing, since we imagine they can be the landing pages for some users who are linked to the use case directly from a search engine (the first web page they ever see on our website). So, we are selling DVC as a whole in here.
What's wrong with "partitioned"? We then use the word "parts" several times.
@Suor please notice my comments on some of your feedback above. I've also started addressing your comments in #805 but maybe wait a little before reviewing that, until we have some more agreements in the discussions here. (Only the larger point about the example I haven't addressed or replied to yet.)
It is presented in too abstract a way, I guess: you read through it and have nothing to fix your mind on. And then you make a huge conceptual jump with "Keeping this in mind ...". There is no way that, keeping just the mere possibility of import in mind, I would arrive at a Data Registry in one step. So the whole Data Registry looks like a solution without a problem.
OK. I'm changing the later "can" words in this same hypothetical context to "could". Notice that there's also a "would". I rephrased other parts of the paragraph too. See #805 (review).
@Suor "Keeping this in mind" is supposed to refer to "DVC also includes the [...]". Perhaps we should just remove the abstract example altogether and find another place to talk about project dependency?
I see you also left a clear suggestion for this @Suor... (I missed that comment before. 😅)
So yes, the problem here is that we don't want to talk about project dependency but about data registries. And keep it as short as possible. I think we're assuming people will know/understand the problems that can use this solution. What do you think @shcheklein? Also about my suggested way to address this concern:
Me again. OK, while you guys think about it, I simplified the project inter-dependency mention per the following ideas above:
You may see the exact changes and continue this discussion on #805 (review).
Last item pending here @Suor:
I agree that the intro to the example is a bit weird... It's similar to the old note project A and B example where we tried to just kind of mention something but in no more than one paragraph, so it ended up being too brief perhaps. I like how your suggestion of just stating problem and solving it sounds, but I'm not sure how exactly that would look. In a way this story is meant to state the problem. I'll think about this...
It actually starts from scratch though: 1) dataset split in 2 on a storage server, parts downloaded with
Again, since it's not a tutorial or get started chapter, we don't intend to provide end-to-end reproducible commands. We decided to add the expandable sections in case someone actually ran them and didn't get the expected results.
…logic and readability per #795 (comment)
Alright. I tried to improve the logic of the example: not a major rewrite, but significant rephrasing was involved. Please review PR #805 and let's move this discussion over there. Please open reviews/comments there as needed.
@jorgeorpinel has answered this already, but I think the problem also comes from the way we introduce it. We go from using regular imports/gets to setting up a dedicated data registry, while we should be comparing no DVC at all for data management (meaning ad-hoc conventions and a total mess on S3) with the DVC Data Registry, which effectively provides some "meta" information for the same data on S3. So I would not emphasize that it's a mess to chain multiple imports/gets. It's a mess to not use anything to organize data, and a data registry is just one of the ways to organize it.
Good catch. Will review.
I took a look at the new Data Registry page. Congrats @jorgeorpinel on compiling it and please don't be angry if you get a bit more than you expected :)
Here go some considerations big and small as they occur in the text.
could -> can
Also, why do we push data-registry from the start? Can't one repo simply depend on the other repo? Registry is only one scenario. The majority of advantages listed do not require data registry. I understand that this use case is named "Data Registry", but maybe we can lay some trail to it?