Fix very large git repository #77
Comments
@Bisaloo you're right! This is a problem we should address soon so that the repository is easier for users to clone. We've discussed this with @GeraldineGomez and came to the conclusion that the best approach would be to follow the suggestions in this stackoverflow post, i.e. to use either Bear in mind that this shouldn't have a significant impact on either the general functioning of the package or that of previous versions. These files are generated by Stan whenever a model is run for the first time locally, which is why some time ago we created the function For the tests we've opted, for time-efficiency reasons (the tests would take forever if each test file were to run every model, or even one model, every time), to store some benchmark Stan model objects as serialized .json files that can be imported during tests by means of
Yes, I used this approach in the past and it worked well. As mentioned in the previous message, you just have to be very careful during the last step,
Yes, I understand your motivations here but, as you guessed, this is going to be a problem with CRAN since they flag any package larger than 5 MB. However, we might be focusing on the wrong issue. I'm not completely sure it's the best strategy to store so much information in your
I agree with @Bisaloo -- we should alter the Git history as the repo size is huge. I'm not concerned about these files disappearing from the git history as they're not important for the package functioning.
We get down from 1.6 GB to 79 MB by running:
git filter-repo --strip-blobs-bigger-than 4M
Note that this works well in this specific case but should not be applied blindly to any repo. It works here because the only large files are files we want to delete. If we had a mix of large files we want to keep and large files we want to delete, we would need a smarter solution.
Files deleted from history:
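
The cleanup described in this comment could be sketched roughly as follows. This is a minimal sketch, not the exact procedure used on serofoi: the helper name `strip_large_blobs` and its arguments are hypothetical, and it assumes `git filter-repo` is installed. It works on a fresh clone (which `git filter-repo` expects) and prints the repository size before and after with `git count-objects -vH`.

```shell
# Hypothetical helper (not from the thread): strip oversized blobs in a fresh clone.
# Usage: strip_large_blobs <repo-url-or-path> <target-dir> [threshold]
# The threshold defaults to 4M, the value quoted in the comment above.
strip_large_blobs() (
  src=$1
  dest=$2
  threshold=${3:-4M}
  # --no-local only matters when <src> is a local path: it forces a packed,
  # "fresh-looking" clone, which git filter-repo checks for before rewriting.
  git clone --no-local "$src" "$dest" &&
    cd "$dest" &&
    git count-objects -vH &&   # size before the rewrite
    git filter-repo --strip-blobs-bigger-than "$threshold" &&
    git count-objects -vH      # size after the rewrite
)
```

The function body runs in a subshell, so the caller's working directory is left unchanged. Pushing the rewritten history back to the shared remote is deliberately left out, since that is the step that affects every contributor.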
@jamesmbaazam @davidsantiagoquevedo @ben18785 @zmcucunuba Apologies if we closed a PR or deleted a branch in which you were involved. This is an unfortunate side effect of cleaning the git history. The branches don't have any common history with the new Thanks for your understanding!
Noted. Thanks.
The serofoi git repository is currently extremely large, with a total size of around 1.6 GB. This makes it difficult to clone and to contribute to.
A quick analysis reveals the problems come from these files in history:
R/stanmodels/
inst/extdata/stanmodels/
tests/*/fitting_results/
Other files (data/, etc.) also contribute, but their contribution is smaller.
Note that because git keeps the entire history, these files keep contributing to the total repository size even though they have been deleted.
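
An analysis of this kind can be reproduced with plain git commands. The following is a minimal sketch, not necessarily the analysis that was run for this issue; the function name `largest_blobs` is hypothetical. It lists the largest blobs reachable from any ref, including files that were deleted in later commits.

```shell
# Hypothetical helper: list the largest blobs anywhere in history, biggest first.
# Usage: largest_blobs [N]   (N defaults to 10; run inside a git repository)
largest_blobs() {
  # rev-list prints "<sha> <path>" for every object reachable from any ref;
  # cat-file looks up each object's type and size and echoes the path via %(rest).
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    sed -n 's/^blob //p' |   # keep blobs only, leaving "<size-in-bytes> <path>"
    sort -rn |
    head -n "${1:-10}"
}
```

Sizes are reported in bytes and are uncompressed object sizes, so they indicate which paths dominate the history rather than the exact on-disk pack size.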
At this stage, I think it would be worthwhile to remove these files from history to be able to start fresh on a clean, lean repo. However, this comes with some issues: deleting these files will rewrite the entire git history, which means that all contributors will have to delete their local clone and clone the repo again.
This is not completely unheard of. In fact, The Carpentries have done something similar in all their lesson repos recently, but it needs to be done with: 1. lots of caution, 2. the buy-in of all contributors.
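
On the contributor side, re-cloning after a history rewrite could look something like the sketch below. The helper name `fresh_clone` and the `.pre-rewrite` suffix are hypothetical. Instead of deleting the old clone outright, it moves it aside so that any unpushed work stays reachable until it has been checked.

```shell
# Hypothetical helper: set the old clone aside and clone the rewritten repo afresh.
# Usage: fresh_clone <repo-url-or-path> <local-dir>
fresh_clone() {
  url=$1
  dir=$2
  if [ -e "$dir.pre-rewrite" ]; then
    echo "backup $dir.pre-rewrite already exists; remove or rename it first" >&2
    return 1
  fi
  if [ -d "$dir" ]; then
    # Keep the old clone as a backup rather than deleting it.
    mv "$dir" "$dir.pre-rewrite" || return 1
  fi
  git clone "$url" "$dir"
}
```

Once any local-only work has been recovered from the backup, the `.pre-rewrite` directory can be deleted.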