
[Enhancement]: improve image loading performance #826

Merged — 19 commits merged into k3d-io:main on Nov 16, 2021

Conversation

@sbaier1 (Contributor) commented Oct 26, 2021

What

This improves image loading speeds by reducing file writes and dependencies.

The tools node is no longer necessary for loading images; they are streamed directly from an input stream to the cluster nodes.
It also adds the missing read-from-stdin feature, for completeness.

Misc: In the initial cluster lookup for the command, I added a resolution step from the node list to the cluster specified in the command, to ensure that the Nodes slice is properly populated with the nodes of the given cluster.
I'd like some insight here as to how this worked prior to my changes: from my debugging, it seemed like the slice was always initialized as nil, so looping over the nodes never actually worked as intended. But that can't possibly be the case(?)

How

Image streaming via Reader in the container runtime interface

I added facilities for getting a Reader for given images in the runtime interface (basically just an interface for save/export endpoints). I think it's fair to assume that any container runtime is going to provide some sort of endpoint for saving/exporting/streaming images, just the same as for loading images.
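As a rough illustration, the runtime-interface addition and a Docker-backed implementation could look like the sketch below. It is based on the GetImageStream signature listed under Implications; all other names (package, struct) are illustrative and not necessarily what this PR implements:

```go
package runtimes

import (
	"context"
	"io"

	"github.com/docker/docker/client"
)

// Runtime gains a facility to stream images out of the container runtime,
// comparable to `docker save`, without an intermediate file on disk.
type Runtime interface {
	GetImageStream(ctx context.Context, images []string) (io.ReadCloser, error)
}

// Docker is a sketch of the docker implementation of that facility.
type Docker struct{}

func (d Docker) GetImageStream(ctx context.Context, images []string) (io.ReadCloser, error) {
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		return nil, err
	}
	// ImageSave streams the requested images as a tar archive.
	return cli.ImageSave(ctx, images)
}
```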

Importing images directly to k3s nodes

I also extended the interfacing with k3s nodes a bit to allow exec-ing commands with stdin streams attached.

This allows us to pipe the incoming image streams directly to the k3s nodes, removing the intermediate write of images to the tools node as well as the startup of the tools node itself, since it is no longer needed for this operation.
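For the exec-with-stdin piece, a minimal sketch against the Docker SDK might look like the following. The helper name and the use of `ctr image import -` inside the node are assumptions for illustration; the actual change is the extended ExecInNode facility mentioned under Implications:

```go
package runtimes

import (
	"context"
	"fmt"
	"io"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/client"
)

// execInNodeWithStdin runs a command inside a node container with an attached
// stdin stream, so an image tar can be piped straight into the node.
func execInNodeWithStdin(ctx context.Context, cli *client.Client, containerID string, cmd []string, stdin io.Reader) error {
	execCfg := types.ExecConfig{
		Cmd:          cmd,
		AttachStdin:  true,
		AttachStdout: true,
		AttachStderr: true,
	}
	execID, err := cli.ContainerExecCreate(ctx, containerID, execCfg)
	if err != nil {
		return fmt.Errorf("failed to create exec: %w", err)
	}
	resp, err := cli.ContainerExecAttach(ctx, execID.ID, types.ExecStartCheck{})
	if err != nil {
		return fmt.Errorf("failed to attach to exec: %w", err)
	}
	defer resp.Close()

	// Pipe the image stream into the exec's stdin, then signal EOF.
	if _, err := io.Copy(resp.Conn, stdin); err != nil {
		return fmt.Errorf("failed to stream image into node: %w", err)
	}
	return resp.CloseWrite()
}
```

Such a helper could then be invoked per node roughly as `execInNodeWithStdin(ctx, cli, nodeContainerID, []string{"ctr", "image", "import", "-"}, imageStream)`, with the image stream coming from GetImageStream.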

In summary, instead of
image load -> start tools node -> copy to tools node -> load to nodes
it's now
image load -> open readers to all sources -> open exec channel and load directly to nodes

Why

Via #386
I am also using Skaffold and observing slow load speeds. Of course, in Skaffold we could also optimize by loading all images at once, but this is a bit of a hassle because Skaffold is currently designed to load images separately, so that it can give progress indication for individual images. With this change, the image loading speed I observed in some rudimentary tests is about 2x faster.

It also seemed to me that k3d writes to disk a bit excessively for this.
From my observations, this greatly improves loading performance in my use case (a single-node k3d cluster, loading multiple images to a single node using separate k3d image import commands).

Implications

The (internal) breaking changes are:

  • the Runtime interface gains the GetImageStream(context.Context, []string) (io.ReadCloser, error) facility for streaming images from the CRI (plus the corresponding implementation for the docker runtime)
  • I extended the ExecInNode facilities to allow attaching a stdin stream to the exec invocation.
  • I am not 100% sure how this affects performance for multi-node cases. I believe it should be pretty much the same, but I don't know whether streaming the image negatively affects performance. Perhaps you have some sort of performance test suite we can run here?

Also, I am fairly new to Go, so if any of my code isn't idiomatic or generally good, please point it out so I can learn and fix it.

@all-contributors please add sbaier1 for code

@sbaier1 sbaier1 changed the title feat(image-load): improve image loading performance refactor(image-load): improve image loading performance Oct 26, 2021
@iwilltry42 iwilltry42 added this to the v5.1.0 milestone Oct 27, 2021
@iwilltry42 (Member) commented:
Hi @sbaier1, thank you for this improvement and for your thorough explanation. Amazing! :)

I think that back then I went for using the tools node to make it easier to work with remote docker daemons.
Example: imagine being connected to a docker daemon running on a remote server and running k3d there. Now you want to use k3d image import: With your solution, would that mean you first stream the image to your local machine and then back to the K3s containers? If that's the case, I assume that import performance would go down quite a bit, especially on a slow connection 🤔

Nonetheless, I think there's a lot to improve around the image importing functionality, so I'm super happy that you're looking into it :)

FWIW: I also still consider myself "new to Go" and I definitely was new to Go when I started writing k3d, so no worries :D

@sbaier1 (Contributor, Author) commented Oct 27, 2021

> I think that back then I went for using the tools node to make it easier to work with remote docker daemons. Example: imagine being connected to a docker daemon running on a remote server and running k3d there. Now you want to use k3d image import: With your solution, would that mean you first stream the image to your local machine and then back to the K3s containers? If that's the case, I assume that import performance would go down quite a bit, especially on a slow connection 🤔

Ahh, that does make sense; perhaps I should add a flag (and maybe an env var?) to the image import command that switches the mode of operation, so the old method is retained as well?

@iwilltry42 (Member) commented:
> Ahh, that does make sense; perhaps I should add a flag (and maybe an env var?) to the image import command that switches the mode of operation, so the old method is retained as well?

This, or even better (or in addition to the flag?): detect whether the docker daemon is local or remote (e.g. an ssh:// or tcp:// connection string with a non-localhost host would be remote). 🤔
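For illustration, such a local-vs-remote check could be a small heuristic on the daemon host string, e.g. taken from DOCKER_HOST or the Docker client's configured host. This is a hypothetical helper, not necessarily what ended up in the PR:

```go
package docker

import (
	"net/url"
	"strings"
)

// isLocalDaemon guesses whether the configured docker daemon runs locally.
// unix:// and npipe:// sockets are treated as local; ssh:// or tcp://
// endpoints pointing at a non-localhost host are treated as remote.
func isLocalDaemon(dockerHost string) bool {
	if dockerHost == "" {
		return true // no host configured: default local socket
	}
	u, err := url.Parse(dockerHost)
	if err != nil {
		return false // be conservative: treat unparseable hosts as remote
	}
	switch u.Scheme {
	case "unix", "npipe", "":
		return true
	case "ssh":
		return false
	case "tcp", "http", "https":
		host := u.Hostname()
		return host == "localhost" || strings.HasPrefix(host, "127.")
	default:
		return false
	}
}
```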

@sbaier1 (Contributor, Author) commented Oct 27, 2021

> This, or even better (or in addition to the flag?): detect whether the docker daemon is local or remote (e.g. an ssh:// or tcp:// connection string with a non-localhost host would be remote). 🤔

Sounds good, I'll give it a try and update the PR when I have time 👍

@iwilltry42 (Member) commented:
Alternatively, we could re-use your logic in the tools node to save the disk write in general 🤔
Meaning that the tools node would receive a list of images and container IDs and would then take over the whole process of saving the images and streaming/piping them directly to the K3s containers.
You wouldn't get rid of the tools node, but I guess there would still be a performance increase, and it would work on all setups. What do you think?

@sbaier1 (Contributor, Author) commented Oct 28, 2021

> Alternatively, we could re-use your logic in the tools node to save the disk write in general 🤔 Meaning that the tools node would receive a list of images and container IDs and would then take over the whole process of saving the images and streaming/piping them directly to the K3s containers. You wouldn't get rid of the tools node, but I guess there would still be a performance increase, and it would work on all setups. What do you think?

That would be something to consider as a follow-up for sure. For now, I added the mode-setting option you proposed, if that is fine with you?

So now there is a mode flag for explicitly setting the loading method, and auto-detection is enabled by default, which falls back to the tools node when a remote host is set.
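For illustration, the resulting mode selection could be modelled roughly like the sketch below. Constant and function names are illustrative; the reviewer's later comment about types.Import suggesting all ImportModes hints at the naming that was actually merged:

```go
package types

// ImportMode selects how `k3d image import` transfers images into the cluster.
type ImportMode string

const (
	ImportModeAutoDetect ImportMode = "auto"       // decide based on whether the docker daemon is local
	ImportModeDirect     ImportMode = "direct"     // stream images straight into the nodes via exec stdin
	ImportModeToolsNode  ImportMode = "tools-node" // previous behaviour: copy to the tools node first
)

// resolveImportMode picks the effective mode when auto-detection is requested.
func resolveImportMode(requested ImportMode, daemonIsLocal bool) ImportMode {
	if requested != ImportModeAutoDetect {
		return requested
	}
	if daemonIsLocal {
		return ImportModeDirect
	}
	return ImportModeToolsNode
}
```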

@sbaier1 (Contributor, Author) commented Nov 4, 2021

@iwilltry42 WDYT?

@iwilltry42 iwilltry42 self-assigned this Nov 8, 2021
@iwilltry42 (Member) left a comment:


I like where this is going 👍
I left some comments here and there, plus some suggestions to make the naming more explicit, which makes working with the code easier, e.g. via auto-completion (typing types.Import will suggest all ImportModes):

Sorry for taking so long to find the time to review this.

Review comments (since resolved) were left on: docs/usage/commands/k3d_image_import.md, cmd/image/imageImport.go, pkg/client/tools.go, and pkg/types/types.go.
@iwilltry42 iwilltry42 modified the milestones: v5.1.0, v5.2.0 Nov 9, 2021
Two further review comments were left on pkg/client/tools.go (since resolved).
@iwilltry42 (Member) commented:
Looks pretty cool so far and I think we're close to the merge with just those two leftover comments to go 👍

@sbaier1 (Contributor, Author) commented Nov 16, 2021

Please make sure the diff here is correct.

I'm not 100% sure I understood what was originally going on there. I simply added an explicit lookup for the nodes of the cluster, but I'm not sure whether that is the correct approach.

Other than that, IMO this is done.

Further review comments were left on cmd/image/imageImport.go and pkg/client/tools.go (since resolved).
@iwilltry42 iwilltry42 changed the title refactor(image-load): improve image loading performance [Enhancement]: improve image loading performance Nov 16, 2021
@iwilltry42 iwilltry42 merged commit d0898bf into k3d-io:main Nov 16, 2021
@iwilltry42 (Member) commented:

Thanks a lot @sbaier1 ! Great enhancement :)

@iwilltry42 iwilltry42 modified the milestones: v5.3.0, v5.1.1 Dec 3, 2021