Large datasets #15

eirenjacobson · 2017-04-18T22:30:39Z

Some of the data associated with my project is too big to upload to GitHub (>50 MB). Is there a way to keep those folders of data associated with the R project and track them with Git (although: they shouldn't change, if that matters) without actually uploading them to GitHub?

ha0ye · 2017-04-18T23:16:01Z

To store an offline backup, maybe use Git LFS.
Otherwise, you can simply not commit it.
You can also create a .gitignore file so that Git does not track it.
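
A minimal sketch of the .gitignore approach, run in a throwaway repository. The folder name data/ and the file name big.csv are hypothetical stand-ins for the real data folders:

```shell
#!/bin/sh
# Sketch: ignore a hypothetical data/ folder in a throwaway repository.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

mkdir data
echo "placeholder for a large file" > data/big.csv   # hypothetical file name
echo "data/" > .gitignore                            # ignore everything under data/

# git check-ignore exits with status 0 when the path matches an ignore rule.
if git check-ignore -q data/big.csv; then ignored=yes; else ignored=no; fi
echo "ignored: $ignored"

# Ignored files are not listed as untracked by git status.
status=$(git status --porcelain)
echo "$status"
```

This only keeps Git from picking up files that were never committed. Files that are already tracked stay tracked until they are removed from the index.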

eirenjacobson · 2017-04-20T16:37:15Z

Update: I created a .gitignore file and told git to retroactively forget that those data folders and files ever existed (git rm -r --cached). This worked for files, and also worked on my local machine (i.e., the folders and files no longer appeared as tracked by git), but when I pushed to the remote repo, git really, really wanted to upload the contents of those folders even though it shouldn't have known they existed. After a few hours, I gave up, re-cloned, and reconstructed my directory with the .gitignore file present at the initial commit. This worked, and I was able to push to the remote repo without uploading large data files.

ha0ye · 2017-04-20T18:25:09Z

Hmm, it sounds like "git rm -r --cached" should have worked, but if you have intermediate commits with the files added in, those commits (and the files in them) will still be synced when you push to GitHub.
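
To see why the push still wanted the data, here is a small sketch in a throwaway repository (the data/big.csv path and the user name and email are hypothetical). It commits a file, stops tracking it in a later commit, and then checks that the file is gone from the index but still reachable from history:

```shell
#!/bin/sh
# Sketch: a file removed with "git rm --cached" in a later commit
# is still stored in the earlier commit that added it.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"          # hypothetical identity for the demo
git config user.email "example@example.com"

mkdir data
echo "pretend this is large" > data/big.csv
git add data/big.csv
git commit -q -m "Add data by mistake"

echo "data/" > .gitignore
git rm -r -q --cached data                   # stop tracking, keep the files on disk
git add .gitignore
git commit -q -m "Stop tracking data"

# Nothing under data/ is tracked any more...
tracked=$(git ls-files data)
# ...but the file is still reachable from the first commit.
in_history=$(git rev-list --all --objects | grep -c "data/big.csv" || true)
echo "tracked now: '$tracked'"
echo "objects in history matching data/big.csv: $in_history"
```

Because a push sends every commit on the branch, the first commit and its copy of the file go along with it.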

GrantRVD · 2017-04-21T19:07:43Z

@eirenjacobson: Removing files from a git-tracked folder's or repo's history is a non-trivial thing to do, since removing the file also means changing the history of tracking that file, and thus modifying your entire commit history for as long as the large file has been there. Essentially, your only chance to fix the issue is to 'undo' the commit that added the file in the first place, so you either have to catch the problem before making any further commits or you have to roll back your project, which itself introduces the problem of then having to add back in all the changes you want to keep.

For future reference, if you want to do it the easy way (i.e. removing the file after the commit that accidentally added it and before any others), @ha0ye's answer gives you part of the solution. You'll want to run these two commands:

git rm --cached <file_name>
git commit --amend -CHEAD

Note the double dashes for certain arguments and the single dash for -CHEAD (that is, -C HEAD, which reuses the message of the commit being amended). You can list multiple files on the first line instead of <file_name>, separated by spaces. The first line stages the file for removal from tracking (it stays on disk), and the second amends the previous commit (the one that added the file to the git history) so that the specified file(s) don't appear in it anymore. It is not sufficient to just make a new commit with git commit -m <message>, because doing so wouldn't overwrite the commit that added the file, hence the --amend argument.
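
Here is a sketch of that two-command fix in a throwaway repository, assuming the large file was added in the most recent commit. The file names script.R and big.csv, the commit messages, and the user identity are all hypothetical:

```shell
#!/bin/sh
# Sketch: drop an accidentally committed file from the latest commit.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"          # hypothetical identity for the demo
git config user.email "example@example.com"

echo "analysis" > script.R
git add script.R
git commit -q -m "Initial commit"

echo "pretend this is large" > big.csv
echo "more analysis" >> script.R
git add big.csv script.R
git commit -q -m "Update analysis"           # big.csv slipped in by accident

git rm -q --cached big.csv                   # unstage it, keep it on disk
git commit -q --amend -C HEAD                # rewrite the last commit, same message

files=$(git ls-tree -r --name-only HEAD)     # files recorded in the amended commit
msg=$(git log -1 --format=%s)
count=$(git rev-list --count HEAD)
leftover=$(git rev-list --all --objects | grep -c "big.csv" || true)
echo "files in HEAD: $files"
echo "message: $msg, commits: $count, big.csv objects on branches: $leftover"
```

The amended commit replaces the original one, so the branch still has two commits and big.csv is no longer part of what a push would send. This only works cleanly if the original commit has not already been pushed.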

As for your original question about tracking a file without pushing it, there are two ways:

1. Create an entirely separate folder for the data/files, outside of the one connected to your remote repo.
2. Add a new folder to .gitignore, put all the (new) files you don't want pushed to GitHub in that folder, cd to the new folder, and run git init there. That will tell git to track these files separately, and they won't be included in any push to the remote repo connected to the parent directory.
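
The second option could look roughly like this. The sketch uses a throwaway parent repository, and the folder name local-data, the file name big.csv, and the user identity are hypothetical:

```shell
#!/bin/sh
# Sketch: an ignored subfolder with its own, separate Git repository.
set -e
parent=$(mktemp -d)
cd "$parent"
git init -q
git config user.name "Example User"          # hypothetical identity for the demo
git config user.email "example@example.com"

mkdir local-data                             # hypothetical folder for large files
echo "local-data/" > .gitignore
git add .gitignore
git commit -q -m "Ignore local-data"

cd local-data
git init -q                                  # separate repo, no remote attached
git config user.name "Example User"
git config user.email "example@example.com"
echo "pretend this is large" > big.csv
git add big.csv
git commit -q -m "Track data locally"
inner_files=$(git ls-files)                  # tracked by the inner repo

cd ..
outer_files=$(git ls-files)                  # tracked by the parent repo
echo "inner repo tracks: $inner_files"
echo "parent repo tracks: $outer_files"
```

The inner repository has no remote, so its history stays on the local machine unless one is added later.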

eirenjacobson added help wanted question labels Apr 18, 2017

eirenjacobson assigned ha0ye and eirenjacobson Apr 18, 2017

ha0ye closed this as completed Apr 18, 2017

ha0ye reopened this Apr 20, 2017

ha0ye removed the help wanted label May 26, 2017
