package has a large filesystem footprint #264

Closed
tanmaykm opened this issue May 18, 2016 · 66 comments
Comments

@tanmaykm

The Plots.jl filesystem footprint is around 110MB, mostly from its git history.

$ ~/.julia/v0.4/Plots $ du -ks .
110248  .
$ ~/.julia/v0.4/Plots $ cd .git/
$ ~/.julia/v0.4/Plots/.git $ du -ks .
108760  .

Most of it seems to come from files that were deleted quite some time back.

size(kB)   packed(kB)  location
27420      7343        examples/meetup/wine.ipynb
11348      3266        examples/meetup/nnet.ipynb
4368       1395        examples/meetup/nnet.ipynb
2709       2482        img/gadfly/gadfly_example_2.gif
2690       2466        img/immerse/immerse_example_2.gif
2630       2400        img/gadfly/gadfly_example_2.gif
2508       697         examples/meetup/wine.ipynb
2399       2274        img/pyplot/pyplot_example_2.gif
2310       2174        img/pyplot/pyplot_example_2.gif
2296       2177        img/pyplot/pyplot_example_2.gif
2234       2117        img/pyplot/pyplot_example_2.gif
1845       1821        examples/meetup/iheartplots.gif
1797       1772        examples/meetup/iheartplots.gif
1761       659         examples/meetup/nnet.ipynb
1568       1092        examples/meetup/wine.ipynb
1483       1047        img/qwt/qwt_example_2.gif
1459       829         examples/palettes.ipynb
1431       1079        examples/meetup/wine.ipynb
1233       944         examples/meetup/wine.ipynb
1203       905         examples/meetup/wine.ipynb
...

It would be good to purge or reduce it in some way.
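
The post does not say how the table above was produced. One common way to get a similar ranking uses only stock git plumbing (`git rev-list` and `git cat-file --batch-check`). The sketch below is an assumption about the method, not the author's actual command, and the function names `rank_blobs` and `largest_blobs` are made up for illustration.

```shell
#!/bin/sh
# Rank "<type> <sha> <size> <size-on-disk> <path>" records by unpacked size
# and print the top N blobs as: size(kB) packed(kB) location.
rank_blobs() {
    awk '$1 == "blob" {
        path = $5
        for (i = 6; i <= NF; i++) path = path " " $i   # re-join paths containing spaces
        printf "%d %d %s\n", $3 / 1024, $4 / 1024, path
    }' | sort -k1,1nr | head -n "${1:-20}"
}

# Walk every object reachable from any ref and feed it to rank_blobs.
# Run from inside the repository (a bare or mirror clone works too).
largest_blobs() {
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(objectsize:disk) %(rest)' |
        rank_blobs "${1:-20}"
}
```

Calling `largest_blobs 20` inside a clone should list the twenty largest blobs anywhere in history, including files deleted long ago. The same path can appear several times because each committed revision of a file is a separate blob.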

@tbreloff
Member

I agree! Do you have any experience with this? Last time I tried to do something like this I nearly corrupted the whole repo (never ever use git lfs). I'd like to do this without screwing anything up.


@KristofferC
Contributor

@tbreloff
Member

Thanks @KristofferC I think this looks like a good option. I'm going to put it off a little longer until my dev work slows down.

@KristofferC
Contributor

You realize that this will make all previous releases void, since all commit SHAs will change. People will not be able to just run Pkg.update. It definitely requires some thought about how best to do it.

@tanmaykm
Author

Yes, it appears difficult to do without some sort of support in the package manager.

@tbreloff
Member

Yes... I assume I'll have to update METADATA with new commit info for each tag? I'm not ready to tackle that mess yet.

@tbreloff
Member

tbreloff commented Oct 3, 2016

@ViralBShah @StefanKarpinski

As mentioned above, the suggestion was to use https://rtyley.github.io/bfg-repo-cleaner/. I think this will require updates to the commit shas in METADATA... do you foresee any other issues?

@tbreloff
Member

tbreloff commented Oct 3, 2016

I see 116MB total... 114MB is in the .git directory, and 1.2MB is from the plotly-latest.min.js file which is no longer bundled with Plots. So without the bloated git history, Plots should be about 1MB.

tom@tom-office-ubuntu:~/.julia/v0.5/Plots$ du  |sort -n |tail -n20
56  ./.git/objects/f6
72  ./src/deprecated/backends
140 ./.git/refs/tags
144 ./src/deprecated
180 ./src/backends
612 ./src
688 ./.git/logs/refs/remotes/cache/pull
688 ./.git/refs/remotes/cache/pull
704 ./.git/logs/refs/remotes/cache
704 ./.git/refs/remotes/cache
736 ./.git/refs/remotes
748 ./.git/logs/refs/remotes
800 ./.git/logs/refs
844 ./.git/logs
912 ./.git/refs
1204    ./deps
108084  ./.git/objects/pack
112648  ./.git/objects
114540  ./.git
116476  .

@ViralBShah
Contributor

@StefanKarpinski says this is impossible. I guess the only real option is to wait for Pkg3. With libgit2, we can't even do shallow clones.

@tbreloff
Member

tbreloff commented Oct 3, 2016

I don't understand why this would be impossible. @StefanKarpinski maybe you can explain a little more? If the repo had a fresh commit history and we updated METADATA appropriately, why wouldn't this work? It might require people to manually delete their local download of Plots, I suppose? (which wouldn't make it impossible, just annoying)

@ViralBShah
Contributor

I think having people remove Plots.jl once to clean this up is OK, while the package is still in relatively early days and before it really explodes.

@ViralBShah
Contributor

Also cc @Keno @tkelman

@StefanKarpinski

We could just overwrite all the SHA1s for versions in METADATA, but that seems like a huge risk to me – it's basically indistinguishable from an attack on METADATA and I'm not sure how the package manager will react. We could do some testing first and see what happens?

@tkelman
Contributor

tkelman commented Oct 3, 2016

It would likely break updating for anyone who has an existing clone. Try testing with single-branch clones first, which wouldn't rewrite the existing tags completely.

@tbreloff
Member

tbreloff commented Oct 3, 2016

If we ever decide to do this, I tried locally and I think these are the non-METADATA steps:

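# back up the installed copy before touching anything
cp -R ~/.julia/v0.5/Plots /tmp/Plots_backup
cd /tmp
# a mirror clone is bare and is created as Plots.jl.git
git clone --mirror git@github.com:tbreloff/Plots.jl.git
# strip blobs over 10K from history, protecting what the tips of the listed branches still contain
java -jar ~/Downloads/bfg-1.12.13.jar --strip-blobs-bigger-than 10K --protect-blobs-from master,dev,backports,sd/dev Plots.jl.git
cd Plots.jl.git
# expire reflogs and garbage-collect the now-unreferenced objects
git reflog expire --expire=now --all && git gc --prune=now --aggressive
# from a mirror clone this force-updates every ref on the remote
git push
# cross fingers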
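
Once history is rewritten, each tagged version in METADATA would need its new commit SHA. Below is a rough sketch of that lookup under two assumptions that should be checked against a real checkout before relying on it: (1) METADATA keeps one `versions/<version>/sha1` file per release, and (2) BFG leaves an `object-id-map.old-new.txt` file of `old new` pairs in its report directory. `remap_shas` is a hypothetical helper; it only prints and never edits METADATA.

```shell
#!/bin/sh
# Usage: remap_shas <METADATA/Plots directory> <object-id map file>
# Prints one line per version: "<version> <old sha> <new sha or not-in-map>".
remap_shas() {
    pkg_dir=$1
    map=$2
    for f in "$pkg_dir"/versions/*/sha1; do
        [ -f "$f" ] || continue                     # glob matched nothing
        ver=$(basename "$(dirname "$f")")           # directory name is the version
        old=$(tr -d '[:space:]' < "$f")             # sha1 file holds a single hash
        new=$(awk -v old="$old" '$1 == old { print $2; exit }' "$map")
        echo "$ver $old ${new:-not-in-map}"
    done
}
```

A `not-in-map` entry means the map has no record of that commit, so it was either left untouched by the rewrite or never part of the rewritten repository; those cases need a manual look.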

@tbreloff
Member

tbreloff commented Oct 3, 2016

I have a local repo that seems to be small and intact:

tom@tom-office-ubuntu:~/.julia/v0.5/Reinforce$ du -h /opt/Plots_purge_test2
44K /opt/Plots_purge_test2/.git/hooks
8.0K    /opt/Plots_purge_test2/.git/logs/refs/heads
12K /opt/Plots_purge_test2/.git/logs/refs
20K /opt/Plots_purge_test2/.git/logs
8.0K    /opt/Plots_purge_test2/.git/refs/heads
4.0K    /opt/Plots_purge_test2/.git/refs/tags
16K /opt/Plots_purge_test2/.git/refs
8.0K    /opt/Plots_purge_test2/.git/info
2.4M    /opt/Plots_purge_test2/.git/objects/pack
8.0K    /opt/Plots_purge_test2/.git/objects/info
2.4M    /opt/Plots_purge_test2/.git/objects
4.0K    /opt/Plots_purge_test2/.git/branches
2.6M    /opt/Plots_purge_test2/.git
28K /opt/Plots_purge_test2/test
8.0K    /opt/Plots_purge_test2/deps
180K    /opt/Plots_purge_test2/src/backends
72K /opt/Plots_purge_test2/src/deprecated/backends
144K    /opt/Plots_purge_test2/src/deprecated
612K    /opt/Plots_purge_test2/src
3.3M    /opt/Plots_purge_test2

@ViralBShah
Contributor

The real question is how will Pkg interact with it. Is there any way to simulate or test that?

@tbreloff
Member

tbreloff commented Oct 3, 2016

Pkg3 was mentioned, but I haven't heard anything about it since JuliaCon... what's the latest?

@tkelman
Contributor

tkelman commented Oct 3, 2016

Pkg3 would only be relevant here if it's planning to move entirely away from using git repos and instead using non-repo tarball downloads of packages.

Updating from an existing install to a repo where the history has been rewritten is not likely to work smoothly. Pkg.rm doesn't actually delete the package which will make this messier to deal with for users, and there's the separate .cache bare clone to worry about.

@tbreloff
Member

tbreloff commented Oct 3, 2016

Just for discussion's sake, what would be the proper way to purge a repo from a user's system so that Pkg would download a fresh copy and start from scratch? (i.e. the most conservative way to make sure it's gone)

@ViralBShah
Contributor

ViralBShah commented Oct 3, 2016

How about creating Plots.jl as a new package with a new name? And then eventually renaming this package to the new package so that we retain the issues and PRs.

The downside is that we lose the nice Plots.jl name.

@tkelman
Contributor

tkelman commented Oct 3, 2016

You'd need to delete Plots from all the .cache folders (there are several, sometimes but not always with symlinks shared between julia versions), all copies of Plots from .trash, and from the Pkg.dir for each version of Julia where it's been installed. A new repo would be safer.
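
To get a feel for how many copies that is on a given machine, a dry-run listing along these lines may help. It assumes the default `~/.julia` layout with per-version directories plus `.cache` and `.trash` folders at most three levels deep; `list_pkg_copies` is a hypothetical helper, and it only prints paths without deleting anything.

```shell
#!/bin/sh
# Usage: list_pkg_copies <package> [julia-home]
# Lists every directory or symlink named <package> under julia-home.
list_pkg_copies() {
    pkg=$1
    root=${2:-"$HOME/.julia"}
    # -maxdepth is supported by GNU and BSD find; symlinks are listed, not followed
    find "$root" -maxdepth 3 \( -type d -o -type l \) -name "$pkg" | LC_ALL=C sort
}
```

Running `list_pkg_copies Plots` and reviewing the output by hand before removing anything is the cautious route, since some of the entries may be symlinks shared between Julia versions.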

@ViralBShah
Contributor

My understanding is that this needs a new repo, if we don't want to inconvenience users - and that there is no clean way otherwise. Shouldn't we do it sooner rather than later? Perhaps UnionOfPlots.jl. :-)

I would hate to lose the issues and such, but perhaps github support can help us migrate them over.

@ViralBShah
Contributor

My understanding on Pkg3 from @StefanKarpinski is that it is expected in the 0.6 release timeframe. However, code readiness and migration to Pkg3 are completely different things, and perhaps even if Pkg3 is ready optimistically by around Jan 2017, it may take another 2-3 months to work out the kinks and migrate.

@tbreloff
Member

The name is not changing. Users can be inconvenienced once if they need to be.


@ViralBShah
Contributor

I will only say that I wish this package was not 100MB. I am ok with whatever solution you choose to go with.

@tbreloff
Member

I'm going to do this today, so prepare for the carnage. My plan:

  • create a backup repo at JuliaPlots/PlotsBackup.jl
  • follow the steps I listed in my earlier comment on this issue (#264)
  • prepare a METADATA PR that removes most old tags and updates the sha for any commits that I keep
  • after the PR is merged, post an announcement to remove and re-install Plots

@tkelman
Contributor

tkelman commented Oct 17, 2016

If you break the existing tags, we'll be redirecting METADATA to point to a fork. Please don't break existing tags. Deleting them from METADATA won't be merged.

@tbreloff
Member

Backed up: https://github.com/JuliaPlots/PlotsBackup.jl

@tkelman would you prefer to change the url to point to the backup until the dust settles?

@tbreloff
Member

Yup... and how many packages do that?

@tkelman
Contributor

tkelman commented Oct 17, 2016

Registered packages, none directly. JuliaBox does. Shipping products do.

> there's very little in the ecosystem that depends on Plots directly

Except users' analyses that they generated for a publication that needs revision a month later.

@Keno

Keno commented Oct 17, 2016

Is it possible to use months-old versions with any of these at this point though?

@tbreloff
Member

If I understand the implications of the output below, only the packages ExperimentalAnalysis and ImplicitEquations will possibly force older versions of Plots to be installed:

tom@tom-office-ubuntu:~/.julia/v0.5/METADATA$ grep Plots */versions/*/requires
ApproxFun/versions/0.1.0/requires:Plots 0.5
ApproxFun/versions/0.2.0/requires:Plots 0.7.5
ApproxFun/versions/0.2.1/requires:Plots 0.7.5
ApproxFun/versions/0.2.2/requires:Plots 0.8.1
ApproxFun/versions/0.3.0/requires:Plots 0.8.2
ApproxFun/versions/0.3.1/requires:Plots 0.8.2
ApproxFun/versions/0.3.2/requires:Plots 0.8.2
ApproxFun/versions/0.3.3/requires:Plots 0.8.2
AverageShiftedHistograms/versions/0.2.0/requires:TextPlots
AverageShiftedHistograms/versions/0.2.1/requires:TextPlots
AverageShiftedHistograms/versions/0.2.2/requires:UnicodePlots
AverageShiftedHistograms/versions/0.3.0/requires:UnicodePlots
AverageShiftedHistograms/versions/0.3.0/requires:Plots
AverageShiftedHistograms/versions/0.4.0/requires:UnicodePlots
AverageShiftedHistograms/versions/0.5.0/requires:UnicodePlots
AverageShiftedHistograms/versions/0.5.1/requires:UnicodePlots
AverageShiftedHistograms/versions/0.5.2/requires:UnicodePlots
BenchmarkProfiles/versions/0.0.1/requires:Plots
ControlSystems/versions/0.1.1/requires:Plots
ControlSystems/versions/0.1.2/requires:Plots
ControlSystems/versions/0.1.3/requires:Plots
ControlSystems/versions/0.1.4/requires:Plots v0.7.4
ControlSystems/versions/0.2.0/requires:Plots v0.7.4
DifferentialEquations/versions/0.0.1/requires:Plots
DifferentialEquations/versions/0.0.2/requires:Plots
DifferentialEquations/versions/0.0.3/requires:Plots
DifferentialEquations/versions/0.1.0/requires:Plots
DifferentialEquations/versions/0.1.1/requires:Plots
DifferentialEquations/versions/0.1.2/requires:Plots
DifferentialEquations/versions/0.1.3/requires:Plots
DifferentialEquations/versions/0.1.4/requires:Plots
DifferentialEquations/versions/0.2.0/requires:Plots
DifferentialEquations/versions/0.2.1/requires:Plots
DifferentialEquations/versions/0.3.0/requires:Plots
DifferentialEquations/versions/0.4.0/requires:Plots 0.9.2
DifferentialEquations/versions/0.4.1/requires:Plots 0.9.2
DifferentialEquations/versions/0.4.2/requires:Plots 0.9.2
EEG/versions/0.0.3/requires:Plots 0.0 0.7
EEG/versions/0.0.4/requires:Plots 0.8.0 0.9.0
EEG/versions/0.1.0/requires:Plots 0.8.0 0.9.0
EEG/versions/0.1.1/requires:Plots
EEG/versions/0.2.0/requires:Plots
ExperimentalAnalysis/versions/0.0.1/requires:Plots 0.0 0.7
ExperimentalAnalysis/versions/0.0.2/requires:Plots 0.0 0.7
ImplicitEquations/versions/0.1.0/requires:Plots 0.5.0 0.5.1
JWAS/versions/0.1.1/requires:Plots
PlotRecipes/versions/0.0.1/requires:Plots
PlotRecipes/versions/0.0.2/requires:Plots
PlotRecipes/versions/0.0.3/requires:Plots
PlotRecipes/versions/0.0.4/requires:Plots
PlotRecipes/versions/0.0.5/requires:Plots
PlotRecipes/versions/0.0.5/requires:StatPlots
PlotRecipes/versions/0.0.6/requires:Plots
PlotRecipes/versions/0.0.6/requires:StatPlots
PlotRecipes/versions/0.1.0/requires:Plots
PlotRecipes/versions/0.1.0/requires:StatPlots
Robotlib/versions/0.0.1/requires:Plots
Robotlib/versions/0.0.2/requires:Plots
StatPlots/versions/0.0.1/requires:Plots
StatPlots/versions/0.0.2/requires:Plots
StatPlots/versions/0.0.3/requires:Plots
StatPlots/versions/0.1.0/requires:Plots
StatPlots/versions/0.1.1/requires:Plots
SymPy/versions/0.2.29/requires:Plots 0.4.0
SymPy/versions/0.2.30/requires:Plots 0.4.0
SymPy/versions/0.2.31/requires:Plots
SymPy/versions/0.2.32/requires:Plots
SymPy/versions/0.2.33/requires:Plots
SymPy/versions/0.2.34/requires:Plots
SymPy/versions/0.2.35/requires:Plots
SymPy/versions/0.2.36/requires:Plots
SymPy/versions/0.2.37/requires:Plots
SymPy/versions/0.2.38/requires:Plots
SymPy/versions/0.2.39/requires:Plots
SymPy/versions/0.2.40/requires:Plots
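
One way to read the listing above mechanically: in the old REQUIRE format a line such as `Plots 0.0 0.7` carries both a lower and an upper bound, so a dependent can only force an old Plots if every one of its registered versions has that upper bound. The sketch below applies that rule to the grep output; it is a heuristic built on that reading of the format, and `capped_dependents` is a hypothetical name.

```shell
#!/bin/sh
# Reads `grep Plots */versions/*/requires` output on stdin and prints the
# packages whose every listed version puts an upper bound on Plots.
capped_dependents() {
    awk -F'[: ]+' '
        $2 == "Plots" {                  # skips TextPlots, UnicodePlots, StatPlots
            split($1, parts, "/")        # parts[1] is the dependent package
            seen[parts[1]] = 1
            if (NF < 4) open_ended[parts[1]] = 1   # no upper bound on this version
        }
        END {
            for (pkg in seen)
                if (!(pkg in open_ended)) print pkg
        }' | LC_ALL=C sort
}
```

From a METADATA checkout this would be used as `grep Plots */versions/*/requires | capped_dependents`. Packages like EEG drop out because their newer versions list Plots without an upper bound.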

@StefanKarpinski

Renaming Plots to PlotsBackup isn't the problem since GitHub handles redirection for you. The problem is then putting a completely unrelated git repo in the place where Plots used to be, which will confuse the installation of anyone who already has Plots installed. We went through this with the Stats/StatsBase renaming and it was rough – it seems like a bad experience to foist on Plots users. I'm not sure about the implications of changing tagged versions but I don't really like it. It seems likely to cause problems. @tbreloff, if you want to go that way, you should make a fork of METADATA and try it (and get some other people to try it as well with Plots previously installed). Otherwise, I think what @tkelman is proposing, using single-branch clones, is the best way to go, although I have some technological doubts there tbh.

@tkelman
Contributor

tkelman commented Oct 17, 2016

> GitHub handles redirection for you

Only if you actually do a rename under the settings. If you just push a separate copy as a brand new repository, then there aren't any redirections.

@tbreloff
Member

Well, the repo size doesn't really bother me. And I've spent about 10 hours too many on this issue. I'm just burnt out with Julia package management. If you guys care so much about the repo size, then let me break old tags. If you don't care enough, then I'll close the issue.

@ViralBShah
Contributor

It is quite likely that Plots.jl has fewer users than the stats packages did. I'd rather make a clean break personally and get a smaller package. Or just leave it as it is - since clearly there isn't a solution that satisfies all constraints.

For my own use, I'll probably make my own clean copy of the Plots.jl repo, if the size bothers me too much going forward. Feel free to close it.

@mkborregaard
Member

It'd be great to have this done, keeping the name and infrastructure of Plots. With the current development of Pkg3, does it look like that will support a good solution?

@KristofferC
Contributor

AFAIU Pkg3 will cause a large enough change that everything has to be "redone", and then the SHAs for old versions could be changed.

@mkborregaard
Member

This will be fixed by Pkg3, I've added wontfix but won't close until Pkg3 is out.

@EMCP

EMCP commented Dec 21, 2017

Repo size is preventing me from using Plots on slower internet connections (which unfortunately I deal with a lot out in rural areas of the USA). Anything that can be done to trim package sizes helps productivity for me and anyone leveraging what I'm doing.

I'm currently waiting for Pkg.add("Plots") to finish and have no idea if I'm close or far, due to Julia's lack of a progress bar.

@KristofferC
Contributor

Pkg3 will use tarballs for standard packages, which tend to be less than a megabyte. So this issue will be resolved when Pkg3 is merged into Base, which will happen for 0.7.

@pkofod
Contributor

pkofod commented Feb 5, 2018

Close this as something that won't be fixed in Pkg2? Pkg3 will be out "soon" and this will be fixed automatically. Until then there is nothing we can do, so...

@KristofferC
Contributor

Latest Plots archive is 192KB which is what will be downloaded in Pkg3. Could keep this open just to have something to close when Pkg3 lands ;)

@pkofod
Contributor

pkofod commented Feb 5, 2018

I think Pkg3 landing will leave you plenty of issues to close, but if you need something to do that afternoon we can keep this open :)

@KristofferC
Contributor

KristofferC commented Feb 5, 2018

As a teaser, this is installing Plots from scratch (in real time):

https://giphy.com/gifs/xUOwG0sbnLhrPjd9hm

@mkborregaard
Member

Holy Moly, that's amazing. @pkofod and @KristofferC we could also close this issue ritually at some point where we're in a position to clink cold beer bottles together?

@pkofod
Contributor

pkofod commented Feb 5, 2018

Sounds like a plan!

@StefanKarpinski

Just wait until the build steps are handled by BinDep2 and we spend some more time optimizing the resolver. Then this will really fly.

@mkborregaard
Member

I think it is time - and I will provide the cold beer bottles to clink together 🎉 ! So, when and where? :-)

@pkofod
Contributor

pkofod commented Aug 8, 2018

Could we do it tomorrow? :)

@pkofod
Contributor

pkofod commented Aug 10, 2018

Do it when you tag :)

@KristofferC
Contributor

Missed the beer but I think this can be closed now anyway 🎉

@mkborregaard
Member

I'll buy both of you beer when I get the chance. And anyone else in this thread :-)

@pkofod
Contributor

pkofod commented Aug 17, 2018

A Plots.jl with a small filesystem footprint... This is a whole new world!
