Large datasets #15

Open
eirenjacobson opened this issue Apr 18, 2017 · 4 comments

@eirenjacobson
Contributor

eirenjacobson commented Apr 18, 2017

Some of the data associated with my project is too big to upload to GitHub (>50 MB). Is there a way to keep those folders of data associated with the R project and track them with Git (although: they shouldn't change, if that matters) without actually uploading them to GitHub?

@ha0ye
Contributor

ha0ye commented Apr 18, 2017

To store an offline backup, maybe use Git LFS.
Otherwise, you can simply leave it uncommitted.
You can also create a .gitignore file that lists the data folders so that Git does not track them.
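
As a minimal sketch of the .gitignore suggestion, the script below builds a throwaway repo and checks what `git add .` actually stages. The folder name `data/` and the file names `big.csv` and `analysis.R` are made up for illustration; substitute your own paths.

```shell
set -e
repo=$(mktemp -d)                      # throwaway repo for the demo
cd "$repo"
git init -q

mkdir data
printf 'pretend this is large\n' > data/big.csv   # hypothetical data file
printf 'x <- 1\n' > analysis.R                    # hypothetical script

# Ignore everything under data/ BEFORE anything is committed
printf 'data/\n' > .gitignore

git add .
staged=$(git diff --cached --name-only)           # what the next commit would contain
echo "$staged"
```

Only files that have never been committed can be hidden this way; a .gitignore entry has no effect on files Git is already tracking.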

@ha0ye ha0ye closed this as completed Apr 18, 2017
@eirenjacobson
Contributor Author

Update: I created a .gitignore file and told git to retroactively forget that those data folders and files ever existed (git rm -r --cached ). This worked for files, and it also worked on my local machine (i.e., the folders and files no longer appeared as tracked by git), but when I pushed to the remote repo, git really, really wanted to upload the contents of those folders even though it shouldn't have known they existed. After a few hours, I gave up, re-cloned, and reconstructed my directory with the .gitignore file present at the initial commit. This worked, and I was able to push to the remote repo without uploading the large data files.

@ha0ye ha0ye reopened this Apr 20, 2017
@ha0ye
Contributor

ha0ye commented Apr 20, 2017

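Hmm, it sounds like "git rm -r --cached" should have worked, but it only stops tracking the files from that point on. If you have intermediate commits with the files added in, those commits are still part of the history, and their contents will be uploaded when you push to GitHub.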
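
A small sketch of why that happens, using a throwaway repo with made-up names (`data/big.csv`, the demo identity): after `git rm -r --cached`, the file is gone from the index, but the earlier commit still carries it.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"   # placeholder identity for the demo
git config user.name "Demo"

mkdir data
printf 'pretend this is large\n' > data/big.csv
git add data
git commit -q -m "Add data by mistake"

# Later: ignore the folder and stop tracking it (the file stays on disk)
printf 'data/\n' > .gitignore
git rm -r -q --cached data
git add .gitignore
git commit -q -m "Stop tracking data"

tracked=$(git ls-files)                    # files tracked in the latest commit
# Count objects reachable from any branch whose path is data/big.csv
in_history=$(git rev-list --objects --all | grep -c 'data/big.csv' || true)
echo "tracked now: $tracked"
echo "history entries for data/big.csv: $in_history"
```

Because a push sends every commit reachable from the branch, the blob from the first commit goes along with it even though the latest commit no longer tracks the file.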

@GrantRVD

GrantRVD commented Apr 21, 2017

@eirenjacobson: Removing files from a repo's history is a non-trivial thing to do, since removing the file also means changing the history that tracks it, which modifies every commit made since the large file was added. Essentially, the easiest way to fix the issue is to 'undo' the commit that added the file in the first place. So you either have to catch the problem before making any further commits, or you have to roll back your project, which introduces the new problem of having to add back all the changes you want to keep.

For future reference, if you are doing it the easy way (i.e. removing the file after the commit that accidentally added it and before any others), @ha0ye's answer gives you part of the solution. You'll want to run these two commands:

git rm --cached <file_name>
git commit --amend -CHEAD

Note the double dashes on --cached and --amend and the single dash on -C (-CHEAD is the same as -C HEAD, which reuses the previous commit's message). You can list multiple files in the first line in place of <file_name>, separated by spaces. The first line stages the file for removal from the index (the file itself stays on disk), and the second amends the previous commit (the one that added the file to the git history) so that the specified file(s) no longer appear in it. It is not sufficient to just make a new commit with git commit -m <message>, because that would not overwrite the commit that added the file; hence the --amend argument.
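
Here is a rough end-to-end sketch of that two-command fix in a throwaway repo. The file names and the demo identity are invented for illustration, and it assumes the bad commit is the most recent one and has not been pushed yet.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"   # placeholder identity for the demo
git config user.name "Demo"

printf 'x <- 1\n' > analysis.R
git add analysis.R
git commit -q -m "Initial commit"

# The mistake: `git add .` sweeps the data folder into the commit
mkdir data
printf 'pretend this is large\n' > data/big.csv
printf 'y <- 2\n' >> analysis.R
git add .
git commit -q -m "Update analysis"

# The fix, before any further commits are made
git rm -r -q --cached data                 # unstage, keep the file on disk
git commit -q --amend -C HEAD              # rewrite the last commit, same message

commits=$(git rev-list --count HEAD)
in_history=$(git rev-list --objects --all | grep -c 'data/big.csv' || true)
last_msg=$(git log -1 --format=%s)
echo "commits: $commits, history entries for data/big.csv: $in_history"
```

The amended commit replaces the original one on the branch, so a later push would not include the data file. Adding `data/` to .gitignore afterwards keeps the same mistake from recurring.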

As for your original question about tracking a file without pushing it, there are two ways:

  1. Create an entirely separate folder for the data/files, outside of the one connected to your remote repo.
  2. Add a new folder to .gitignore, put all the (new) files you don't want pushed to GitHub in that folder, cd to the new folder, and run git init there. That will tell git to track these files separately, and they won't be included in any push to the remote repo connected to the parent directory.
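
The second option could look roughly like this. The folder name `bigdata` and the file names are hypothetical, and the sketch assumes the folder is added to .gitignore before anything inside it has been committed to the parent repo.

```shell
set -e
parent=$(mktemp -d)
cd "$parent"
git init -q
git config user.email "demo@example.com"   # placeholder identity for the demo
git config user.name "Demo"

# Parent repo: ignore the data folder from the very first commit
mkdir bigdata
printf 'bigdata/\n' > .gitignore
printf 'x <- 1\n' > analysis.R
git add .
git commit -q -m "Initial commit with .gitignore"

# Nested repo: track the data locally, with no remote attached
cd bigdata
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
printf 'pretend this is large\n' > big.csv
git add big.csv
git commit -q -m "Track data locally"

parent_files=$(git -C "$parent" ls-files)  # what the parent repo tracks
nested_files=$(git ls-files)               # what the nested repo tracks
echo "parent: $parent_files"
echo "nested: $nested_files"
```

The nested repo keeps its own history in `bigdata/.git`, so the data is versioned locally while pushes from the parent directory never see it.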

@ha0ye ha0ye removed the help wanted label May 26, 2017
3 participants