Re-run initContainers in a Deployment when containers exit on error #3676

Open
szh opened this issue Dec 6, 2022 · 15 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@szh

szh commented Dec 6, 2022

I'm copying this issue from kubernetes/kubernetes#52345 because it seems that this is the appropriate repo for it.

Is this a BUG REPORT or FEATURE REQUEST?:

/kind feature

What happened: A container in a Deployment exits on error; the container is restarted without the initContainer being re-run first.

What you expected to happen: When a container in a Deployment exits on error, the initContainer is re-run before the container is restarted.

How to reproduce it (as minimally and precisely as possible):

Sample spec:

kind: "Deployment"
apiVersion: "extensions/v1beta1"
metadata:
  name: "test"
  labels:
    name: "test"
spec:
  replicas: 1
  selector:
    matchLabels:
      name: "test"
  template:
    metadata:
      name: "test"
      labels:
        name: "test"
    spec:
      initContainers:
        - name: sleep
          image: debian:stretch
          imagePullPolicy: IfNotPresent
          command:
            - sleep
            - 1s
      containers:
        - name: test
          image: debian:stretch
          imagePullPolicy: IfNotPresent
          command:
            - /bin/sh
            - exit 1

Implementation Context:

I have an initContainer that waits for a service running in Kubernetes to detect its existence (via pod annotations) and send it an HTTP request, at which point the initContainer writes the delivered value to disk. On startup, the main container reads this value, "unwraps" it via another service, and stores the unwrapped value in memory.

The value written to disk by the initContainer is a one-time-read value: once it has been used, it expires. The problem is that if the main container ever restarts due to a fatal error, it loses the unwrapped value held in memory and, on startup, tries to unwrap the now-expired value again, leading to a crash loop until I manually delete the pod. Only then is a new pod created, the initContainer runs again, and all is well.

I desire a feature that restarts the entire pod upon container error so that this workflow can function properly.
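
For illustration only, here is a minimal sketch of the pod-template shape this workflow implies; the annotation key, image names, and file paths below are hypothetical placeholders rather than details from the original report:

template:
  metadata:
    labels:
      name: "test"
    annotations:
      example.com/inject-value: "true"          # hypothetical annotation the external service watches
  spec:
    volumes:
      - name: one-time-value
        emptyDir: {}                            # shared scratch space; lives for the lifetime of the pod
    initContainers:
      - name: fetch-value
        image: example.com/value-fetcher:latest # hypothetical image
        volumeMounts:
          - name: one-time-value
            mountPath: /var/run/secret
        # Waits for the external service to deliver the value over HTTP and writes it to
        # /var/run/secret/value. This runs once per pod, not once per container restart.
    containers:
      - name: app
        image: example.com/app:latest           # hypothetical image
        volumeMounts:
          - name: one-time-value
            mountPath: /var/run/secret
        # On startup, reads /var/run/secret/value and "unwraps" it via another service.
        # If this container crashes, the kubelet restarts only this container; the on-disk
        # value has already been consumed, so startup fails again and the pod crash-loops.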

Enhancement Description

  • One-line enhancement description (can be used as a release note):
  • Kubernetes Enhancement Proposal:
  • Discussion Link:
  • Primary contact (assignee):
  • Responsible SIGs:
  • Enhancement target (which target equals to which milestone):
    • Alpha release target (x.y):
    • Beta release target (x.y):
    • Stable release target (x.y):
  • Alpha
    • KEP (k/enhancements) update PR(s):
    • Code (k/k) update PR(s):
    • Docs (k/website) update PR(s):

Please keep this description up to date. This will help the Enhancement Team to track the evolution of the enhancement efficiently.

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Dec 6, 2022
@szh
Author

szh commented Dec 6, 2022

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Dec 6, 2022
@thockin
Member

thockin commented Dec 13, 2022

This is a challenging use-case. How do you trigger this if your app has 2 containers? What if one of them is a sidecar that you (the pod author) don't really know about or control?

It seems to me that initContainer (as defined today) is a poor fit here - your app startup could either do this itself or you can wrap it in another tool/script that does the unwrap and then starts your app. That answer is, itself, somewhat unsatisfying because it means you can't decouple those ideas or those container images or credentials/permissions.
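
As a rough sketch of that wrapper idea (the unwrap-value command, file paths, flags, and image name are hypothetical placeholders), the main container's entrypoint could perform the unwrap itself and then exec the real process, so the step re-runs on every container restart:

containers:
  - name: app
    image: example.com/app:latest   # hypothetical image
    command:
      - /bin/sh
      - -c
      # Re-run the unwrap step on every container start, then hand off to the real process.
      - |
        unwrap-value /var/run/secret/value > /tmp/unwrapped \
          && exec /app/server --unwrapped-value-file=/tmp/unwrapped

The trade-off noted above still applies: the unwrap logic, and whatever credentials it needs, now live in the same image as the app rather than being decoupled into a separate initContainer image.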

@SergeyKanzhelev since "keystone" came up in the sidecar discussion too - this is what I really meant when we started the idea. It doesn't mean "this is an app" vs "this is a sidecar" - it means "if this one goes down, everything goes down". Most pods would not use this feature at all, but those who need it KNOW they need it.

@jpbetz since you're looking at the lifecycle stuff, too.

@sftim
Contributor

sftim commented Dec 22, 2022

@thockin What you term “keystone” containers, I've heard named “essential” (e.g. in https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html#container_definitions)

@thockin
Member

thockin commented Dec 22, 2022 via email

@jpbetz
Contributor

jpbetz commented Jan 13, 2023

I desire a feature that restarts the entire pod upon container error so that this workflow can function properly.

This is the direction I started thinking when I saw this issue. I agree with @thockin that the initContainers are a poor fit. initContainers are containers that initialize the pod and they do exactly that.

Say it was possible to define a Deployment with a restartPolicy=Never pod (today it can only be Always). That would make the desired pod lifecycle clear for this "initContainer initializes a one-time read value" case: if the main container fails, terminate the pod and create a new one to replace it. But it would have the major downside of requiring a new pod to be scheduled each time the main container failed. That's probably not what most people would want?

One alternative would be a sidecar that can produce a "one-time read value". Each time the main container starts, it retrieves a new "one-time read value" from the sidecar. It would then be possible to have a simple process in the main container that retrieves the "one-time read value", writes it to the appropriate location on disk and then starts the main process for the container.
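
A rough sketch of that sidecar shape, assuming a sidecar that can mint a fresh one-time value on demand (the images, port, endpoint, and paths below are hypothetical placeholders):

containers:
  - name: value-provider                     # hypothetical sidecar serving fresh one-time values
    image: example.com/value-provider:latest
    ports:
      - containerPort: 8080
  - name: app
    image: example.com/app:latest            # hypothetical image
    command:
      - /bin/sh
      - -c
      # Each time this container (re)starts, fetch a fresh value from the sidecar over
      # localhost, write it where the app expects it, then exec the main process.
      - |
        wget -qO /tmp/one-time-value http://localhost:8080/value \
          && exec /app/server --value-file=/tmp/one-time-value

Because the fetch happens in the main container's entrypoint rather than in an initContainer, a fresh value is obtained on every container restart without replacing the whole pod.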

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 13, 2023
@Ugzuzg

Ugzuzg commented Apr 14, 2023

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 14, 2023
@SergeyKanzhelev
Member

@Ugzuzg do you plan to work on this for 1.28? I see you removed the stale lifecycle.

@bzhang-liveperson

Wondering if this can make it into 1.29?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 26, 2024
@bzhang-liveperson

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 26, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 25, 2024
@objnf-dev

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 27, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 26, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 25, 2024
@thockin thockin removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Sep 10, 2024
Projects
Status: Not for release
Development

No branches or pull requests

10 participants