Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove non-essential files from the distribution #562

Closed
wants to merge 1 commit into from

Conversation

stevecrozz
Copy link
Contributor

Just as I've done with nokogiri recently (sparklemotion/nokogiri#1719), I'd like to trim the size of this package (the one published to rubygems) so that installations are faster and the gem takes up less space on disk. So what I've attempted to do here is include only the files that are essential to kramdown's functionality, plus the software license which seems like it should be included in all distributions.

@gettalong gettalong self-requested a review December 28, 2018 07:51
@gettalong gettalong self-assigned this Dec 28, 2018
@gettalong
Copy link
Owner

I'm okay with removing Rakefile, setup.rb, benchmark/ and doc/ from the distribution.

@stevecrozz
Copy link
Contributor Author

@gettalong are you saying you don't want to remove the test folder? That seems to be the big one. Here's a top-level look at the file sizes in 1.17:

12K   kramdown-1.17.0/AUTHORS
116K  kramdown-1.17.0/benchmark
16K   kramdown-1.17.0/bin
12K   kramdown-1.17.0/CONTRIBUTERS
12K   kramdown-1.17.0/COPYING
32K   kramdown-1.17.0/data
384K  kramdown-1.17.0/doc
992K  kramdown-1.17.0/lib
32K   kramdown-1.17.0/man
20K   kramdown-1.17.0/Rakefile
12K   kramdown-1.17.0/README.md
44K   kramdown-1.17.0/setup.rb
6.9M  kramdown-1.17.0/test
12K   kramdown-1.17.0/VERSION

@gettalong
Copy link
Owner

The difference in the gem size is 51KB, 135.680 bytes with test/, 83.456 bytes without.

@stevecrozz
Copy link
Contributor Author

@gettalong How are you finding those numbers? In terms of the gem package container, I'm seeing:

$ curl -s https://rubygems.org/downloads/kramdown-1.17.0.gem > kramdown-1.17.0.gem
$ du -hs kramdown-1.17.0.gem 
256K	kramdown-1.17.0.gem

Then unpacked, I am seeing something a little different than what I'd posted before:

$ tar xf kramdown-1.17.0.gem 
$ rm kramdown-1.17.0.gem 
$ du -hs *
4.0K	checksums.yaml.gz
248K	data.tar.gz
8.0K	metadata.gz
$ tar zxf data.tar.gz 
$ rm -f checksums.yaml.gz data.tar.gz metadata.gz 
$ du -hs *
4.0K	AUTHORS
60K	benchmark
8.0K	bin
4.0K	CONTRIBUTERS
4.0K	COPYING
16K	data
232K	doc
480K	lib
24K	man
12K	Rakefile
4.0K	README.md
36K	setup.rb
2.5M	test
4.0K	VERSION

@gettalong
Copy link
Owner

gettalong commented Dec 29, 2018

Okay, I had some additional changes. By removing the paths mentioned by me earlier I get 138.752 bytes with the test/ directory, 86.528 without for the gem file.

Unpacked it will be more but if you care about the space for your specific use case I would recommend installing everything and then running a command like rm -rf path_to_gems/*/test path_to_gems/*/spec. This way you will achieve your goal much sooner and easier.

@stevecrozz
Copy link
Contributor Author

That does work well. But what I'm attempting now is something more like a public service.

I'm going to try on an argument here and see if I can convince you about the test folder. One thing I've noticed recently is how large docker images for Ruby apps can be. And I think it reflects poorly on Ruby when folks are setting up a few dozen microservices and the Ruby images are all 20x the size of the Golang services. It is simple enough to remove the files we don't need when building images, but I think this should be the default scenario rather than an opt-in scenario.

I've come to understand the gem as something we build from the original source code repository, but not necessarily the same as the source code repository. It seems analogous to the difference between installing prepackaged binaries and installing from source.

Plus, it is simple enough to grab the full source code corresponding to the gem version you currently have. If that's what is needed, bundler can do it from git, or we could write a script to grab the corresponding archive from: https://github.com/gettalong/kramdown/releases without relying on git or bundler.

If you're still not convinced, that's ok. I can update the PR as you've already requested. Let me know what you think.

One other option that I've considered is to package two different gems, one with all the original source files and one without. So there could be a kramdown and kramdown-full or something like that.

@gettalong
Copy link
Owner

Thanks for taking your time to write down your thoughts!

For me, Rubygems is a distribution mechanism for kramdown, another one would be the repo contents from a certain git tag. In all cases, the tests should be distributed with the source because than you can verify that the library works on your machine as expected.

If you want to create a docker image and want to make it minimal, you will have to do manual steps anyway. For example, if you would use Debian as base, you would need to delete the /usr/share/doc directory because that is created automatically. So for Debian, since they test their packages before distribution, tests are not important on end-user machines but docs are.

It boils down to the priorities one has, in your case, total size of dependencies. To achieve that goal you have to delete everything which is considered unnecessary for your goal, e.g. you would also delete man/man1/kramdown.1 but I would never remove that from the gem.

My conclusio is: Since you have to do manual steps anyway, an additional rm ... in the docker file doesn't really add complexity.

@stevecrozz
Copy link
Contributor Author

Since you mentioned Debian, let me see if it would change your mind if you consider an image that's an order of magnitude smaller.

Here are two Dockerfile that both are descended from ruby:2.6-alpine
https://gist.github.com/stevecrozz/b97671f9f79fcc447f3a1826e3e25080#file-gistfile1-txt

After building them:

kramdown-without-test   latest        7e17683e0227        2 seconds ago       42.2MB
kramdown-with-test      latest        6bf429a9f232        27 minutes ago      42.8MB
ruby                    2.6-alpine    1267b8a8b7a2        7 days ago          41.8MB

Aside from deleting the bundle cache, there are no manual steps needed for me other than to trim the gem itself. Still, I can see where you're coming from and you may simply have a different idea than I do of how you want to use rubygems.org. For now, I'm going to assume that's the case and update this PR.

@ashmaroli
Copy link
Contributor

@stevecrozz Your commit b75c96f is confusing now.
It looks like you're proposing to delete all those files from the repository itself ⚠️ instead of from just the kramdown gem..!!

@stevecrozz
Copy link
Contributor Author

Thanks @ashmaroli. That was a mistake of course. I've pushed a new commit with the correct change. This one removes only parts requested by @gettalong from the gem rather than from the whole git repo.

@gettalong
Copy link
Owner

Thanks! I have merged your commit.

@gettalong gettalong closed this Jan 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants