
Performance issues continued #652

Open
chrono2002 opened this issue May 25, 2020 · 8 comments

Labels
status/ready Issue ready to be worked on. type/support Issue with general questions or troubleshooting.

Comments

@chrono2002

chrono2002 commented May 25, 2020

Hi!
I've kept experimenting with buildpacks and ran into the following issues/considerations:

  1. Why rebuild everything when the application tag changes?

For example, I have app:v1. I change a few lines and publish it as app:v2. Now pack does a complete rebuild instead of just swapping in (copying) the same layers. If you stick to a single tag, you lose the ability to roll back.

  2. Why rebuild the same layer in several apps?

For example, I have app1:v1 and app2:v1. They use the same engine code with only a few lines of difference, yet pack must recompile everything because it cannot use app1's cache for app2. Could you please implement this in the future?

  3. I hit another performance issue with pack doing Sisyphean labor.

I've got an app and two cached layers, 600 MB and 60 MB in size, produced by the compilation during the first run. On subsequent runs the buildpack just checks the manifest and does nothing, exactly like the example in the docs. Yet each time I run pack it spends about ~30 seconds doing something, and it isn't build.sh or detect.sh. It's probably the "Writing tarball..." step, which generates very high IO. So why bother writing when nothing changes? Both cached layers stay the same, and even the app hasn't changed. Why not just ask: "Hey, cache, have you changed?" "Nope, I haven't." "Alrighty then! I won't rewrite."

Here is the log:

root@server:~/buildpacks# time pack --timestamps build app:v1 --builder mybuilder --buildpack mybuildpack --path apps/app --env-file app/.config --no-pull --verbose

2020/05/25 03:14:19.785394 Selected run image stack-run:alpine
2020/05/25 03:14:19.799860 Adding buildpack io.buildpacks.sample.node version 0.0.1 to builder
2020/05/25 03:14:19.799881 Setting custom order
2020/05/25 03:14:19.799889 Saving builder with the following buildpacks:
2020/05/25 03:14:19.799907 -> [email protected]
2020/05/25 03:14:21.332550 Using build cache volume pack-cache-0dafcf80cb15.build
2020/05/25 03:14:21.332581 ===> CREATING
[creator] ---> DETECTING
[creator] ======== Results ========
[creator] pass: [email protected]
[creator] Resolving plan... (try #1)
[creator] io.buildpacks.sample.node 0.0.1
[creator] ---> ANALYZING
[creator] Analyzing image "9c01602fd67903533a6891435931bf94f40e0e9811701fe8760ee40b5041645a"
[creator] Restoring metadata for "io.buildpacks.sample.node:cache" from cache
[creator] Writing layer metadata for "io.buildpacks.sample.node:cache"
[creator] Restoring metadata for "io.buildpacks.sample.node:v13.14.0" from cache
[creator] Writing layer metadata for "io.buildpacks.sample.node:v13.14.0"
[creator] ---> RESTORING
[creator] Restoring data for "io.buildpacks.vkapps.sample:cache" from cache
[creator] Restoring data for "io.buildpacks.vkapps.sample:v13.14.0" from cache
[creator] Retrieving data for "sha256:8d74fc3db08572e0391b241f46a4663b76bb42c316052cd62fc5629393afb192"
[creator] Retrieving data for "sha256:6b81558fa1677a5076928b691d65f1bdda85252d0c2b860e7bf180d0f35b96ce"
[creator] ---> BUILDING
[creator] ---> Node Buildpack
[creator] ---> Deploying Node... cached
[creator] ---> Deploying Node modules... cached
[creator] ---> EXPORTING
[creator] no project metadata found at path './project-metadata.toml', project metadata will not be exported
[creator] Reusing layers from image with id '9c01602fd67903533a6891435931bf94f40e0e9811701fe8760ee40b5041645a'
[creator] Writing tarball for layer "launcher"
[creator] Reusing layer 'launcher'
[creator] Layer 'launcher' SHA: sha256:32608bc6d97e0193edb0360555b3e08dc6dfe1187833d8548cdd9e662213935b
[creator] Layer 'app' SHA: sha256:befebf423bdc47e462a54ce45cc1ed2c3194538970407190f934896f403470f3
[creator] Reusing 1/1 app layer(s)
[creator] Writing tarball for layer "config"
[creator] Adding layer 'config'
[creator] Layer 'config' SHA: sha256:37a2db62023dceeb551e3621a1a9183a5916068723372153608fb09c519237cc
[creator] *** Images (ec6b7efa839b):
[creator] index.docker.io/library/app:v1
[creator]
[creator] *** Image ID: ec6b7efa839b0b1630f60de17b45fc323eb8ed82a10f956a0e77b67f6b3660b9
[creator] Writing tarball for layer "io.buildpacks.sample.node:cache"
[creator] Adding cache layer 'io.buildpacks.sample.node:cache'
[creator] Layer 'io.buildpacks.sample.node:cache' SHA: sha256:80555b16c1bb97a3608b562dc6d50e0ae77677186d358abbaa2eacb542181227
[creator] Writing tarball for layer "io.buildpacks.sample.node:v13.14.0"
[creator] Reusing cache layer 'io.buildpacks.sample.node:v13.14.0'
[creator] Layer 'io.buildpacks.sample.node:v13.14.0' SHA: sha256:8d74fc3db08572e0391b241f46a4663b76bb42c316052cd62fc5629393afb192
2020/05/25 03:14:56.605185 Successfully built image app:v1

real 0m36.844s
user 0m0.064s
sys 0m0.060s

Docker version 19.03.6, build 369ce74a3c
latest pack from master

@chrono2002 chrono2002 added status/triage Issue or PR that requires contributor attention. type/enhancement Issue that requests a new feature or improvement. labels May 25, 2020
@ekcasey
Member

ekcasey commented May 28, 2020

Probably caused by "Writing tarball..." with very high IO. So why bother writing when nothing changes? All two of cached layers stays the same.

@chrono2002 The lifecycle needs to calculate the layer diffID (essentially, generate a tar from a given layer directory and hash it) to figure out whether it can actually reuse a cached layer*. We write the layer tarball at the same time that we calculate the hash so that it is available if we need it. We could consider hashing first and then generating/writing the tarball only when necessary. The cost of traversing and reading a layer dir twice when we do need a new layer might be worth it vs. the cost of writing a layer tar when we don't - but I would want to put some numbers around that. It would depend on how many builds fall into each case, the number and size of files in a layer, etc.
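To make the trade-off concrete, here is a rough shell analogy (only a sketch: the lifecycle does this in Go and normalizes tar headers so the digest is reproducible, and layer.tar plus the layer path below are placeholders):

# Current behaviour: stream the layer into the tarball and the hasher in one
# pass, so the tar is already on disk if the diffID turns out to be new.
layer_dir=/layers/io.buildpacks.sample.node/node   # hypothetical layer path
tar -C "$layer_dir" -cf - . | tee layer.tar | sha256sum

# Alternative discussed above: hash only, and redo the tar step later if the
# digest doesn't match the cached layer (reads the directory twice in that case).
tar -C "$layer_dir" -cf - . | sha256sum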

We don't share caches between builds because of concerns around cache poisoning. Also, these caches don't grow forever; we remove layers that aren't used in a given rebuild. We may be able to implement a less safe, faster cache mode, but it would require some non-trivial design decisions worthy of an RFC.
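As a side note (a sketch, inferred from the log above): this build used the volume pack-cache-0dafcf80cb15.build, and each image appears to get its own such volume, which is why app1's cache isn't visible to app2's build. You can list them with:

docker volume ls --filter name=pack-cache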

*Worth noting: this isn't true for reused cache=false, build=false layers, which can be reused from the image registry itself without a cache (there is some additional complexity in the daemon case that I can explain / we should document, but suffice it to say, it is faster). In general, a layer should only be cache=true if it is needed at build time, or if the presence of the previous layer contents improves performance at build time when the layer needs to be updated.
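For illustration, a minimal sketch of a buildpack marking a layer launch-only so the exporter can reuse it straight from the previous image with no cache involved (the layer name is made up, and the flat launch/build/cache keys assume the 2020-era buildpack API):

layers_dir="$1"   # layers directory passed to bin/build in that API version
mkdir -p "$layers_dir/node"
cat > "$layers_dir/node.toml" <<'EOF'
launch = true
build = false
cache = false
EOF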

@natalieparellano
Member

@chrono2002 If you re-run your pack build with --timestamps, that could give more insight into what is taking so long. We'd be happy to help investigate further.

@natalieparellano
Member

Oh hmm, never mind - I see that was part of your original output. Thinking...

@natalieparellano natalieparellano added type/support Issue with general questions or troubleshooting. and removed status/triage Issue or PR that requires contributor attention. type/enhancement Issue that requests a new feature or improvement. labels May 28, 2020
@natalieparellano
Member

Related discussion here re: timestamps: buildpacks/lifecycle#293

@jromero
Member

jromero commented Jun 3, 2020

This RFC might be of interest to this discussion as it would greatly improve performance: buildpacks/rfcs#85

@chrono2002, it would be nice to have your thoughts there as well.

@chrono2002
Author

The lifecycle needs to calculate the layer diffID (essentially generate a tar from a given layer directory) to figure out whether it can actually reuse a cached layer*. We write a layer tarball at the same time that we calculate the hash so that it is available if we need to use it.

Can I control this somehow? For example, touch a file ./cache_is_changed to make recalculation start, rather than having it recalculate automagically? That's what Steven said: "give the users the ability to decide" :)
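Roughly this hypothetical sketch (cache_is_changed and $layer_dir are made-up names, not an existing lifecycle feature):

if [ -f "$layer_dir/cache_is_changed" ]; then
  # Only re-tar and re-hash the layer when the buildpack marked it as changed.
  tar -C "$layer_dir" -cf - . | tee layer.tar | sha256sum
  rm "$layer_dir/cache_is_changed"
else
  echo "cache unchanged, keep the previous tarball and SHA"
fi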

We don't share caches between builds b/c of concerns around cache poisoning.

That's OK. But if my environment is friendly, can I share caches then? :)

*worth noting: this isn't true for reused cache=false, build=false layers, which can be reused from the image registry itself w/o a cache (there is some additional complexity in the daemon case that i can explain / we should document, but suffice it to say, it is faster)

Please explain. I'd be glad to hear of any possible way to escape this tarring/untarring circle of reincarnation :)

@chrono2002
Author

By the way, I have mounted /var/lib/docker as shm, and the overhead is still ~10 sec.
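For reference, roughly what was done (a sketch; the tmpfs size and the docker restart are assumptions):

systemctl stop docker
mount -t tmpfs -o size=8g tmpfs /var/lib/docker
systemctl start docker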

@chrono2002
Author

I think it's always a disaster without pre-warming. I suggested a few features here to solve that cache attach/detach issue: buildpacks-community/kpack#244

@jromero jromero added the status/ready Issue ready to be worked on. label Mar 3, 2021