Storing and retrieving data.frames #366
…rage Signed-off-by: Thierry Onkelinx <[email protected]>
Signed-off-by: Thierry Onkelinx <[email protected]>
…orrectly Signed-off-by: Thierry Onkelinx <[email protected]>
Swapping two variables when rewriting a data.frame results in a large diff, even though the information content of the data hasn't changed. Therefore the variables will be reordered to match the original order. Signed-off-by: Thierry Onkelinx <[email protected]>
When a line is moved in a file, the resulting diff is a deletion at the original location and an addition at the new location. Changing the order of the observations in a data.frame does not change the information content. Sorting the data before writing avoids unnecessary diffs. Signed-off-by: Thierry Onkelinx <[email protected]>
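The two commit messages above describe keeping the column order and the row order stable so that rewriting the same data does not produce a diff. A minimal base R sketch of that idea follows. The helper `write_stable()` and its arguments `sorting` and `column_order` are hypothetical names chosen for illustration; they are not taken from this pull request.

```r
# Hypothetical helper, not part of git2r: write a data.frame in a stable,
# diff-friendly form by fixing the column order and sorting the rows.
write_stable <- function(df, file, sorting, column_order = names(df)) {
  stopifnot(
    all(sorting %in% names(df)),
    setequal(column_order, names(df))
  )
  # Restore the original variable order so swapped columns do not create a diff.
  df <- df[, column_order, drop = FALSE]
  # Sort the observations so a changed row order does not create a diff.
  df <- df[do.call(order, unname(as.list(df[sorting]))), , drop = FALSE]
  write.table(df, file = file, sep = "\t", row.names = FALSE, quote = FALSE)
  invisible(file)
}

original <- data.frame(
  id = c(1, 2, 3),
  value = c("a", "b", "c"),
  stringsAsFactors = FALSE
)
# Same information, but with rows shuffled and columns swapped.
shuffled <- original[c(3, 1, 2), c("value", "id")]

f1 <- tempfile()
f2 <- tempfile()
write_stable(original, f1, sorting = "id")
write_stable(shuffled, f2, sorting = "id", column_order = names(original))
same_content <- identical(readLines(f1), readLines(f2))
```

In this sketch both files end up with identical lines, so a version control system would see no change between them.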
…within the sorting variables Signed-off-by: Thierry Onkelinx <[email protected]>
Signed-off-by: Floris Vanderhaeghe <[email protected]>
…tead of "data_repository" Signed-off-by: Thierry Onkelinx <[email protected]>
dir.exists() is not available in R < 3.2.0 Signed-off-by: Thierry Onkelinx <[email protected]>
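One way to cope with the missing `dir.exists()` on older R versions is a small fallback built on `file.info()`. This is only a sketch: `dir_exists_compat()` is a hypothetical name, and the pull request may have solved the problem differently.

```r
# Hypothetical fallback; `dir_exists_compat` is not a git2r or base R function.
dir_exists_compat <- function(paths) {
  # Use the native function when the running R version provides it.
  if (exists("dir.exists", envir = baseenv(), inherits = FALSE)) {
    return(dir.exists(paths))
  }
  # file.info() returns NA in `isdir` for paths that do not exist.
  is_dir <- file.info(paths)$isdir
  !is.na(is_dir) & is_dir
}

existing <- dir_exists_compat(tempdir())
missing <- dir_exists_compat(file.path(tempdir(), "no-such-directory"))
```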
Thanks @ThierryO. I will try to find time in the next few days to go through the pull request.
Hi Thierry,

Thanks for the pull request. I apologize for the delay in reviewing it. The pull request implements functionality to handle

I also have a technical comment. The vignette says that "Git stores the version history under the form of diffs: a list of lines which are deleted and a list of lines which are inserted at a specific line number in a file." That's actually not correct: Git stores the entire content of each file (for efficiency, files may later be compressed into 'pack files'). The diff is calculated on the fly. I have created an example to illustrate this with some functions to read the internal content of a commit, tree and blob using base R. To read more about the Git internals, see https://git-scm.com/book/en/v2/Git-Internals-Git-Objects

Kind regards
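The example mentioned in the comment above is not reproduced on this page. As a rough stand-in, the sketch below builds the payload of a blob object the way the Git Internals chapter describes it (a header `blob <size>`, a NUL byte, then the complete file content, deflated with zlib) and reads it back using base R only. It assumes that `memCompress()` and `memDecompress()` with `type = "gzip"` are an adequate stand-in for Git's zlib compression, and it does not compute the SHA-1 object name or touch a real repository.

```r
# Uncompressed payload of a Git blob: "blob <size>\0<content>".
content <- charToRaw("id\tvalue\n1\ta\n")
header <- c(charToRaw(paste("blob", length(content))), as.raw(0))
payload <- c(header, content)

# Assumption: this in-memory compression stands in for Git's zlib deflate step.
stored <- memCompress(payload, type = "gzip")

# Reading it back: inflate, then split at the first NUL byte.
restored <- memDecompress(stored, type = "gzip")
nul <- which(restored == as.raw(0))[1]
object_header <- rawToChar(restored[seq_len(nul - 1)])
object_body <- rawToChar(restored[-seq_len(nul)])
```

The recovered body is the whole file, not a list of inserted and deleted lines, which is the point made in the comment: a diff has to be computed by comparing two such objects.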
Hi Stefan, I agree that it is not the core business of libgit2. But it seems a nice addition, as stated by you in #303 and by a few other people who reviewed this (ThierryO#1). The extra functionality is quite lightweight, so it shouldn't place much burden on git2r. Putting the functionality in a separate package would be overkill IMHO. It has only a few functions and it would add an extra dependency for the users. Having the functionality in git2r has the benefit that it will be much more likely to be used. I stand corrected on the diff topic.
Pardon my butting in uninvited (I watch this repo). But as someone who depends on git2r in a few ways, I tend to agree with @stewid's inclination to stay a fairly minimal wrapper around libgit2. I think the "git2r extension package" is an interesting idea and don't think it's overkill even for the functionality already in this PR. There is a lot of interest in versioning data, so this PR is squarely in that space. Being its own extension package would give it room to grow organically, as I think some lightweight tools in that space would be very well-received.
After a discussion with my coworkers, we decided to transform this PR into a standalone package: git2rdata.
This PR replaces PR #303 and solves issue #301. The functionality is described in a vignette, hence the extra knitr and rmarkdown dependencies.