Proposal: checkpoint image definition #962
Comments
Hi @adrianreber. From the description, I'm thinking an artifact containing the checkpoint, and a subject pointing back to the base image, might be a good model for this. That work only recently merged into main, and we're looking for users to prove out the design. (Note that artifact here includes both the new media type, and using the existing image manifest with a custom config media type, and you'll likely start with the latter for portability.) |
@sudo-bmitch while I think that is an interesting thought to consider the |
@estesp good catch. You're right, if you are spanning repositories or registries, the subject/referrers is a bad match. |
IIRC, folks have surfaced use-cases like this and it was decided that
artifacts _could_ span registries.
It's a bit crazy to think these would have to be on the same registry.
Where the snapshot data may likely even just be on a local ephemeral
registry for a quick relocation migration.
|
nod subject refers may be limited to local sha wrt. the referrers api (as is).. but we can use a different mechanism for base image manifest ref in a new artifact type (or new manifest type) .. you just would not get a list of checkpoints on a base image through the new referrers api (when the checkpoints are remote..).. or we can change up subject (refers) to also support a proxy/redirect pattern with the usual domain/image |
One thing that we need to consider with respect to the base image is that if the base image has changed (e.g., has been updated or removed) the associated checkpoints might not work anymore. For instance, CRIU restore might fail due to missing or changed files. Thus, it would be useful to have a mechanism that allows us to find all checkpoint images in a registry that are associated with a specific base image. |
That's where subject/referrers is useful, since you reference a specific digest, the base cannot change and the registry knows the relationship. Though in this case, that relationship is flipped, delete the base image, and untagged checkpoints would be eligible for GC. But this only works if checkpoints are limited to the same repository, which I don't think we want. I don't think any solution would allow the registry to know there is a checkpoint in a different repository, and especially not in a different registry. |
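To make the lookup problem concrete, here is a minimal sketch of finding checkpoints by base image without registry support: a client that already holds a set of manifests filters them on an annotation recording the base digest. The annotation key and the `checkpoints_for_base` helper are assumptions for illustration only; nothing here is a standardized name or a registry API.

```python
# Assumed annotation key recording the digest of the base image.
# It is illustrative only and not part of any published specification.
BASE_DIGEST_KEY = "org.opencontainers.image.checkpoint.base.digest"


def checkpoints_for_base(manifests: dict, base_digest: str) -> list:
    """Return the references whose manifest names base_digest as its base.

    manifests maps an image reference (str) to its parsed manifest (dict).
    """
    matches = []
    for ref, manifest in manifests.items():
        annotations = manifest.get("annotations") or {}
        if annotations.get(BASE_DIGEST_KEY) == base_digest:
            matches.append(ref)
    return sorted(matches)
```

This only works for manifests the client can already enumerate, which is exactly the limitation discussed above: a registry cannot know about checkpoints stored in another repository or registry.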
Thanks everyone for the feedback. Seems like this is the right place for this discussion. I am not sure I can completely follow the discussion but artifacts sound like a good place for the checkpoint data. Trying to understand how this could work I looked at the https://github.com/opencontainers/image-spec/blob/main/artifact.md and just to make sure I understand it correctly, could a checkpoint image, using artifacts look like this:
{
  "mediaType": "application/vnd.oci.artifact.manifest.v1+json",
  "blobs": [
    {
      "mediaType": "application/gzip",
      "size": 12345,
      "digest": "sha256:12343725d74f4bfb94c9e86d64170f7521aad8221a5de834851470ca142da630"
    },
    {
      "mediaType": "application/json",
      "size": 123,
      "digest": "sha256:56783725d74f4bfb94c9e86d64170f7521aad8221a5de834851470ca142da630"
    }
  ],
  "annotations": {
    "org.opencontainers.image.checkpoint.created": "2022-10-19T14:42:55Z",
    "org.opencontainers.image.checkpoint.name": "<container name>",
    "org.opencontainers.image.checkpoint.checkpointer": "criu",
    "org.opencontainers.image.checkpoint.runtime": "runc",
    "org.opencontainers.image.checkpoint.engine": "cri-o",
    "org.opencontainers.image.checkpoint.base.digest": "sha256:afff3924849e458c5ef....d51",
    "org.opencontainers.image.checkpoint.base.name": "docker.io/library/alpine",
    "org.opencontainers.image.checkpoint.id": "3fc2f9bf82e9",
    "org.opencontainers.image.checkpoint.version": "1"
  }
}
One of the two
There was one sentence in the artifact definition which was a bit confusing:
The checkpoint image, however, would be used by the container runtime. I just wanted to make sure this is the right place for it. Concerning referrers: I also think that the base image could be in any other registry and should be retrieved by the container engine using |
For portability, I'd hold back on using the artifact manifest. We only just merged that and I don't think there are any public registries that have added support. When pushing using the image manifest, you'll want to change the config media type from
I think the key differentiator is they modify how a runtime executes an image, rather than being an image themselves. |
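As a rough illustration of the portable route suggested here, the sketch below builds a regular OCI image manifest whose config descriptor carries a custom media type. The `application/vnd.example.checkpoint.config.v1+json` media type and the helper names are hypothetical; the thread does not settle on a config media type, so treat this as one possible shape rather than the agreed format.

```python
import hashlib

OCI_IMAGE_MANIFEST = "application/vnd.oci.image.manifest.v1+json"
OCI_LAYER_TAR_GZIP = "application/vnd.oci.image.layer.v1.tar+gzip"
# Hypothetical config media type; no name has been agreed on in this thread.
CHECKPOINT_CONFIG = "application/vnd.example.checkpoint.config.v1+json"


def descriptor(media_type: str, blob: bytes) -> dict:
    """Build an OCI content descriptor for a blob held in memory."""
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(blob).hexdigest(),
        "size": len(blob),
    }


def checkpoint_manifest(config_blob: bytes, layer_blob: bytes,
                        annotations: dict) -> dict:
    """Wrap a checkpoint archive in an image manifest with a custom config type."""
    return {
        "schemaVersion": 2,
        "mediaType": OCI_IMAGE_MANIFEST,
        "config": descriptor(CHECKPOINT_CONFIG, config_blob),
        "layers": [descriptor(OCI_LAYER_TAR_GZIP, layer_blob)],
        "annotations": dict(annotations),
    }
```

Because the outer document is still an ordinary image manifest, registries that have not adopted the new artifact manifest should accept it, while clients can tell it apart from a runnable image by the config media type.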
Let's make sure that Kubernetes SIG Architecture is happy to endorse this use of a Kubernetes domain name. |
Things I can imagine needing to track somehow:
|
How can this move forward? It is not really clear to me if the discussions about artifacts resulted in anything specific because I probably do not really understand the artifact discussions. From my point of view it would be good to have a definition of how to store a checkpoint image in a registry. A checkpoint image needs to include:
|
And some more questions
|
Yes, this is needed and would be one of the JSON files I mentioned. In CRI-O we are currently tracking this.
Also a good idea. |
Some questions regarding incremental checkpoints
IMO it is too early to specify anything for incremental checkpoints at this point. AFAIK we don't have any implementation that uses them yet. But we should keep this use-case in mind for a later extension of the specification. |
@hesch CRIU supports incremental checkpoints via the With checkpoint images, we currently include a complete snapshot that is stored in a single layer. To extend the current approach to support incremental checkpoints we could create an image with multiple layers, where each layer includes a snapshot of the memory changes, and the final layer includes the complete checkpoint.
IMHO, the approach described above could enable incremental checkpoints without special metadata. @adrianreber What do you think? |
Extra metadata could be helpful: it lets an implementation spot that a snapshot isn't compatible (because the implementation only supports single layer checkpoints) with less effort. That's especially relevant if early implementations are likely to miss out that support. |
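A small sketch of that trade-off, under assumptions: the `example.checkpoint.incremental` annotation key is invented for illustration, and "more than one layer means incremental" is only the layering idea floated above, not an established rule. With explicit metadata the implementation can reject an unsupported checkpoint from the annotations alone; without it, it has to infer the layout from the layers.

```python
from typing import Optional

# Hypothetical annotation key, used here only for illustration.
INCREMENTAL_KEY = "example.checkpoint.incremental"


def incompatibility(manifest: dict, supports_incremental: bool) -> Optional[str]:
    """Return a reason the checkpoint cannot be restored, or None if it can."""
    annotations = manifest.get("annotations") or {}
    declared = annotations.get(INCREMENTAL_KEY)
    if declared is not None:
        # Explicit metadata: decide without inspecting the layers at all.
        incremental = declared == "true"
    else:
        # No metadata: fall back to guessing from the number of layers.
        incremental = len(manifest.get("layers") or []) > 1
    if incremental and not supports_incremental:
        return "incremental checkpoint, but only single-layer restore is supported"
    return None
```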
@rst0git I also think that approach would be good. If I understand this wiki page correctly, there is also the possibility to have incremental checkpoints with the full |
Any recommendations how we can move this forward? It is not clear to me what the current situation is. It seems nobody is against it. We basically need something to put a couple of binary blobs into the image, a couple of JSON files and some additional metadata in the annotations. Any way we can get this defined? |
@adrianreber, have you considered packing the checkpoint data in an OCI artifact? I was experimenting with it like this (taken with containerd + criu), and it was accepted by Harbor:
|
Thanks for your interest in this topic. It is not really a question of how to do it. We have a working solution and there are many ideas on how to do it. In the end I am open to anything. I would just like to have a standard. We originally looked at the containerd format, but that uses binary protobuf blobs which is a dependency we want to avoid. JSON would work, but not protobuf. The overhead to decode protobuf seems to complicate the format unnecessarily. For a standard we would prefer JSON. We have working container migration in combination with Kubernetes, but currently we use our CRI-O only format. We would just like to have it standardized for better interoperability between runtimes and especially engines. We already migrated containers from Podman to CRI-O and I am pretty positive that it should be doable between many container engines, but a standard would be nice. |
Background
Over the last couple of years we were working on CRIU based checkpointing. We introduced a simple export file format in Podman to easily move a complete checkpoint from one system to another. A complete checkpoint includes the checkpoint files created by CRIU, the current container configuration (spec.dump), the content of the file-system diff against the original image, log files and metadata:
For the Kubernetes use case we used this layout for the checkpoint archives we are creating in CRI-O (merged) and containerd (still under review). Also during the discussion around the KEP introducing checkpointing to Kubernetes (kubernetes/enhancements#2008) containerd's OCI based checkpoint image was mentioned.
Saving the checkpoint data in an image that can be pushed to a registry made a lot of sense to us and we looked at the containerd checkpoint image. Unfortunately it includes containerd internal protobuf dumps which did not seem useful to have in checkpoint images.
So we created another image format by simply copying the tarball which we already have to an OCI image with some metadata. For CRI-O we currently use the following:
We are currently using buildah to create this kind of image:
In Kubernetes we can now point to the image quay.io/adrianreber/checkpoint-test:tag34 and CRI-O, based on the annotations, will detect that a restore is required and not just a create/start.
At this point we can restore a checkpoint image created by Podman in CRI-O and we probably can make sure to understand the image created by containerd. But at this point we already have three slightly different, still compatible implementations (Podman, containerd (not merged) and CRI-O) of the same thing and one (the original from containerd) which is not compatible due to the use of containerd internal protobuf structures.
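The annotation-based detection described above could look roughly like the following sketch. The annotation keys and the `plan_start` function are hypothetical and are not the names CRI-O uses; the point is only that an engine can choose between restore and create/start, and reject an unsupported runtime, before touching the checkpoint data.

```python
# Hypothetical annotation keys; the thread does not fix the final names.
NAME_KEY = "example.checkpoint.name"
RUNTIME_KEY = "example.checkpoint.runtime"


def plan_start(annotations: dict, restorable_runtimes: set) -> str:
    """Return "create-start" for ordinary images and "restore" for checkpoints."""
    if NAME_KEY not in annotations:
        # No checkpoint marker: treat it as a normal image.
        return "create-start"
    runtime = annotations.get(RUNTIME_KEY)
    if runtime is not None and runtime not in restorable_runtimes:
        # Fail early, before pulling or unpacking the checkpoint layer.
        raise ValueError(f"cannot restore a checkpoint created by {runtime!r}")
    return "restore"
```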
To avoid another image format containing checkpoint information we would like to propose a definition of what a checkpoint image should look like and we hope this is the right location for it.
Proposal
Over the last couple of years we have slowly added additional information to the checkpoint archive used by Podman, but for this proposal we want to start with the minimal set of information which we think would be important to have in an image. This is based on the current image we are using for Kubernetes in combination with CRI-O.
We would like to add the following annotations to a checkpoint image so that it can be easily identified as such an image:
criu)
runc and crun that are able to create checkpoints. There is some support in youki. We are working to make sure that runc can restore crun checkpoints and vice-versa, but having this information in the annotations allows the consumer of this checkpoint image to decide early if it can restore this image without looking at the actual checkpoint content.)
I am unsure if a custom media type is needed to describe the layer containing the actual checkpoint data. Looking at what we currently use
I am undecided. If we want to have additional information in the checkpoint we could just put it into this one layer. So far everything we worked with is put in tar archives or just plain files. Having multiple layers with the different content (rootfs diff, /dev/shm content) is a possibility but we can also just put it in one single layer.
Our proposal for what would be in this layer is the following:
checkpoint.criu.
From our work over the last few years creating such images we think this would be enough as a starting point.
I hope this is the right place to bring up such a request. We are open to (almost) any changes. The important part, from our point of view, is that we find a way to have a well defined layout for a checkpoint image because at this point it starts to get complicated to ensure that all involved tools can work together. From our point of view there is no technical reason that a checkpoint created with one tool (either runc, crun or containerd, CRI-O, Podman) should not work with the others. If we have seen problems so far it was always about missing or different metadata.