Repository Cleaning #107

Closed
ChrisRackauckas opened this issue Nov 6, 2016 · 16 comments

@ChrisRackauckas
Member

The repository was 750MB. It was pruned down to 250MB, but since this package is now less than 10 lines on master (and will be tagged soon), it's less than ideal that its folder is that large. The documentation was moved to DiffEqDocs, DiffEqTutorials, and DiffEqBenchmarks. The vast majority of the extra stuff is buildup due to these notebooks and pictures. I would like to clean these out of the repository, since this will not break previous versions but will reduce the size to something much more manageable.

However, from the previous mistake (JuliaLang/METADATA.jl#6963 (comment)) it's clear that doing this kind of cleaning will change the SHA1s. Thus for this to really work, I will need to put in a METADATA PR changing the SHA1s in the previous tags.
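
To illustrate why the SHA1s change: a git commit ID is the SHA-1 of the commit object, which names both its tree and its parent commit, so rewriting an early commit changes the ID of every descendant. The following is a minimal Python sketch of that hashing, not anything taken from Pkg or METADATA. The tree IDs, the author identity, and the `commit_id` and `chain` helpers are made up for the illustration, and it assumes the SHA1s recorded for tags are commit IDs.

```python
import hashlib
from typing import List, Optional

# Placeholder tree IDs (hypothetical; real ones are hashes of directory contents).
TREE_WITH_GIFS = "1" * 40   # first commit, including notebooks and pictures
TREE_CLEANED = "2" * 40     # first commit after the large files are stripped
TREE_V1 = "3" * 40          # tagged version 1, identical in both histories
TREE_V2 = "4" * 40          # tagged version 2, identical in both histories


def commit_id(tree: str, parent: Optional[str], message: str) -> str:
    """Hash a commit object the way git does: sha1(b'commit <size>\\0' + body)."""
    ident = "Example Author <author@example.com> 0 +0000"  # hypothetical identity
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines += [f"author {ident}", f"committer {ident}", "", message]
    body = ("\n".join(lines) + "\n").encode()
    return hashlib.sha1(b"commit %d\0" % len(body) + body).hexdigest()


def chain(trees: List[str]) -> List[str]:
    """Return the commit IDs of a linear history with the given trees."""
    ids: List[str] = []
    parent = None
    for i, tree in enumerate(trees):
        parent = commit_id(tree, parent, f"commit {i}")
        ids.append(parent)
    return ids


original = chain([TREE_WITH_GIFS, TREE_V1, TREE_V2])
rewritten = chain([TREE_CLEANED, TREE_V1, TREE_V2])

# Only the first commit's tree was edited, yet every later commit ID differs too,
# because each commit object embeds its parent's ID.
for old, new in zip(original, rewritten):
    print(old[:10], "->", new[:10], "changed" if old != new else "same")
```

In this sketch the two tagged commits keep exactly the same trees, but their commit IDs still differ between the original and rewritten histories, which is why the recorded SHA1s would no longer match.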

@tkelman

@tkelman

tkelman commented Nov 6, 2016

See also the discussion at JuliaPlots/Plots.jl#264. Existing tags are really intended to remain fixed; the SHAs should not be changed. If you do make this kind of change, it could easily cause problems in updating for anyone with an existing clone. Looking at the GitHub traffic, there were apparently 300+ clones, nearly 100 uniques, over the past 2 weeks.

@ChrisRackauckas
Member Author

I still think that 250MB of wasted space (since all of this is just old documentation which is no longer even linked) is a large enough size that this is something that should be handled. I did come up with a long term solution (separate repos for docs/tutorials), it's just that Git histories make this still a problem. So I think that I would still like to try to make something happen.

There will be a major breaking change by tagging this current master. That's why I am thinking of doing these changes together. DiffEq currently has the fortune that its ecosystem depends mostly on itself, and there are not version requirements on it (DifferentialEquations does not have dependents). Unless someone has specifically pinned a version, everyone will be on the latest release except for those on v0.4. So since it will never necessarily be a good thing to mess with the histories, I think fixing this now will be much easier than fixing it later. (This is very different than the Stats/StatsBase change, which did have a lot of dependents at the time of the switch. I think it was a good thing they switched when they did!)

Can we run a test? Make a repository with a simple code and some .gifs in it, take a version, clear the repo of the .gifs, change the tag SHA1s, and see if Pkg.update() handles it?

And maybe SciML/Roadmap#8 should be discussed concurrently.

@ChrisRackauckas
Member Author

@StefanKarpinski mentioned doing the test in JuliaPlots/Plots.jl#264:

> We could just overwrite all the SHA1s for versions in METADATA, but that seems like a huge risk to me – it's basically indistinguishable from an attack on METADATA and I'm not sure how the package manager will react. We could do some testing first and see what happens?

Can I run something like that right now? Can I make/register a repo for this, or would you like to?

@oxinabox

oxinabox commented Nov 6, 2016

If we ever got shallow cloning for Pkg, that would just solve this, right?

And I think in the long term, not using Github as a CDN is pretty desirable.

@ChrisRackauckas
Member Author

Maybe @ViralBShah and @Keno should be in on this as well.

@ChrisRackauckas
Member Author

Can we please give this some consideration? I am going to be tagging a major breaking change in about a week (getting rid of almost all of the code here for the modularization changes #59) but this will still be a very important package in the ecosystem. This major change will also be coupled with blog posts and other forms of social media outreach and so it will be the best time to make such a large change. Could I please have the permission to register and try it on a test repo?

@ViralBShah

ViralBShah commented Nov 12, 2016

The basic takeaway so far has been that there is no way to clean the history at the moment. Your best bet is to create a new package with a different name. :-(

Or wait for Pkg3, which is maybe 3-4 months away.

@ChrisRackauckas
Member Author

> Or wait for Pkg3, which is maybe 3-4 months away.

Pkg3 will solve this? How?

@ViralBShah

@StefanKarpinski

Basically we won't have the existing METADATA, allowing for a clean break.

@ChrisRackauckas
Member Author

Okay, then I can wait and do the repo-clean when that change happens? That works for me. I'm in no rush if there's an easy solution in the pipeline.

@ChrisRackauckas
Member Author

ChrisRackauckas commented Nov 2, 2017

Can we make a move on this, or at least have a very clear move in place for the Pkg3 transition?

The issue here is that the history cannot be edited to get rid of the big files without changing the SHA1s. Some ways to handle this would be:

  1. Edit the history and fix the SHA1s in METADATA
  2. Delete some previous tags from METADATA, and tag a new version after doing history edits that dump all of the oldest versions.
  3. When transitioning to Pkg3, only register a new tag and dump the history then (if the break is actually allowed to be "clean" like that, i.e. breaking the previous METADATA version)
  4. Get single branch cloning or tarball downloads as part of Pkg3, which doesn't solve the problem but masks it.
  5. If Pkg3 isn't tied to SHAs, then it fixes itself in the transition once it's okay to break Pkg installs?

This is going to need to be fixed sometime, and if it needs to be breaking I'd rather not keep putting it off.

@Keno

Keno commented Nov 2, 2017

> If Pkg3 isn't tied to SHAs, then it fixes itself in the transition once it's okay to break Pkg installs?

This is the case, Pkg3 uses tree hashes, so as long as the tree is the same for tagged versions it doesn't care about the commit history.
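
As a rough illustration of the tree-hash point, here is a Python sketch of how git itself hashes blobs and flat trees. It is not Pkg3's actual code, and the file names, file contents, and helper names are hypothetical. A tree ID is computed only from file names, modes, and contents, so it is the same no matter which commit history leads to it.

```python
import hashlib


def git_object_id(kind: bytes, body: bytes) -> str:
    # A git object ID is sha1(b"<kind> <size>\0" + body).
    return hashlib.sha1(kind + b" %d\0" % len(body) + body).hexdigest()


def blob_id(content: bytes) -> str:
    return git_object_id(b"blob", content)


def tree_id(files: dict) -> str:
    """ID of a flat tree of regular (mode 100644) files; no subdirectories."""
    body = b""
    for name in sorted(files, key=str.encode):
        # Each entry is b"<mode> <name>\0" followed by the raw 20-byte blob ID.
        body += b"100644 " + name.encode() + b"\0" + bytes.fromhex(blob_id(files[name]))
    return git_object_id(b"tree", body)


# Hypothetical package contents at a tagged version.
tagged_clean = {"Example.jl": b"module Example end\n"}
tagged_with_gif = {"Example.jl": b"module Example end\n", "plot.gif": b"GIF89a..."}

# No commit or parent information enters the hash: same files, same tree ID.
same_tree = tree_id(tagged_clean) == tree_id(dict(tagged_clean))
# A tag whose tree contained the large file gets a different tree ID once it is stripped.
tree_changes = tree_id(tagged_with_gif) != tree_id(tagged_clean)
print(same_tree, tree_changes)
```

The second comparison is the caveat in the statement above: a tagged version whose tree included the notebooks or pictures would not keep its tree hash if those files were removed from that commit.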

@tkelman

tkelman commented Nov 2, 2017

There's no way to do this in the existing repo without breaking installation for all past releases of Julia though. Would work if Pkg3 was pointing to a new, separate version of the repo.

@YingboMa
Member

YingboMa commented Sep 6, 2018

I think that this issue can be closed with Pkg3.

@ChrisRackauckas
Member Author

I would still like to do a repo clean. Users just aren't affected anymore, though, so its priority is much lower.

@tkelman

tkelman commented Sep 6, 2018

Users of all past versions of Julia and the package would be.
