
[BUG] Should we restart all pods to resync cache after vapp was updated? #11

Open

Eikykun opened this issue Jan 19, 2022 · 3 comments

@Eikykun (Member) commented Jan 19, 2022

What happened:
If the vapp specHash changes, the proxy container blocks leader election in order to restart the main container. But only the leader pod restarts; the proxy containers in the other pods keep waiting for their main containers to restart.

Why it happened:
In the controller-runtime LeaderElector, leader election runs in two loops:

  1. acquire() the lock
  2. renew() the lock

Only loop 2 panics after catching an error. The leader pod is in loop 2, but the other pods are still in loop 1, so they never see a fatal failure and never restart.
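
For context, the flow in k8s.io/client-go/tools/leaderelection (which controller-runtime delegates to) looks roughly like this; it is a simplified paraphrase of the upstream Run method, not the exact source:

```go
// Simplified sketch of LeaderElector.Run from
// k8s.io/client-go/tools/leaderelection.
func (le *LeaderElector) Run(ctx context.Context) {
	defer le.config.Callbacks.OnStoppedLeading()

	// Loop 1: poll Get/Create on the lock object until we become
	// leader. Non-leader pods block here indefinitely; errors are
	// only retried, never fatal.
	if !le.acquire(ctx) {
		return // ctx was cancelled
	}

	ctx, cancel := context.WithCancel(ctx)
	defer cancel()
	go le.config.Callbacks.OnStartedLeading(ctx)

	// Loop 2: periodically renew the lock. Only a renewal failure
	// here makes Run return, after which the manager exits.
	le.renew(ctx)
}
```

So a proxy that makes lock requests fail can only kill the current leader; the followers just keep retrying acquire().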

What you expected to happen:
My expectation is that all pods restart after the vapp specHash changes...

@FillZpp (Member) commented Jan 20, 2022

@Eikykun Yeah, that would be a serious problem. Maybe a tricky solution:

  1. Return StatusNotFound for the leaderelection Get.
  2. Return a mocked success for the leaderelection Create, without actually sending it to the apiserver. At this point the controller will assume it has become the leader.
  3. Return StatusNotAcceptable for the next Get, so that the controller fails to renew and exits.

The problem is that the controller will consider itself the leader for 1~2s this way. I'm not sure whether that is acceptable, or whether there is a better solution.
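
A hypothetical sketch of that interception in the proxy; every name here is invented for illustration, and a real mock would also have to return a well-formed lock object (e.g. a Lease) in the Create response body so client-go can decode it:

```go
package proxy

import "net/http"

// fakeElectionProxy stands in for the ctrlmesh-proxy interception
// logic described above; it is not the actual project code.
type fakeElectionProxy struct {
	phase int // 0: fake 404 next, 1: fake create next, 2: fail renewals
}

func (p *fakeElectionProxy) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	switch {
	case r.Method == http.MethodGet && p.phase == 0:
		// Step 1: pretend the lock object does not exist.
		p.phase = 1
		w.WriteHeader(http.StatusNotFound)
	case r.Method == http.MethodPost && p.phase == 1:
		// Step 2: report a successful Create without forwarding it to
		// the apiserver; the controller now believes it is the leader.
		// (A real implementation must also write a valid object body.)
		p.phase = 2
		w.WriteHeader(http.StatusCreated)
	default:
		// Step 3: fail every later request on the lock, so the renew
		// loop errors out and the controller container exits.
		w.WriteHeader(http.StatusNotAcceptable)
	}
}
```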

@Eikykun (Member, Author) commented Jan 20, 2022

> @Eikykun Yeah, that would be a serious problem. Maybe a tricky solution:
>
>   1. Return StatusNotFound for the leaderelection Get.
>   2. Return a mocked success for the leaderelection Create, without actually sending it to the apiserver. At this point the controller will assume it has become the leader.
>   3. Return StatusNotAcceptable for the next Get, so that the controller fails to renew and exits.
>
> The problem is that the controller will consider itself the leader for 1~2s this way. I'm not sure whether that is acceptable, or whether there is a better solution.

Only one pod is allowed to be the leader at a time, for safety...
But wouldn't all pods become leaders under this scheme? Maybe we need to design a process that restarts the pods sequentially.

@FillZpp (Member) commented Jan 20, 2022

I'm just thinking... How about triggering the controller container restart with a specific liveness probe?

When the ctrlmesh webhook injects the ctrlmesh-init and ctrlmesh-proxy containers into a pod, it can also set a liveness probe on the original controller container. For example, the probe could check for a file in a shared volume, and ctrlmesh-proxy could then trigger a restart of that container by deleting the file...
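
Roughly what that injected probe could look like, using the corev1 types; the sentinel path, timings, and helper name are all made up for illustration:

```go
package webhook

import (
	corev1 "k8s.io/api/core/v1"
)

// addRestartProbe is an illustrative helper, not ctrlmesh code. It
// attaches a liveness probe that passes while a sentinel file exists;
// ctrlmesh-proxy would delete the file to make kubelet restart the
// container. The container also needs the shared volume mounted.
func addRestartProbe(c *corev1.Container) {
	c.LivenessProbe = &corev1.Probe{
		// k8s.io/api v0.23+; older versions name this field Handler.
		ProbeHandler: corev1.ProbeHandler{
			Exec: &corev1.ExecAction{
				Command: []string{"sh", "-c", "test -f /ctrlmesh/alive"},
			},
		},
		PeriodSeconds:    5,
		FailureThreshold: 1,
	}
}
```

A nice property of this approach: kubelet restarts just the controller container in place, so the pod and ctrlmesh-proxy keep running, and no pod ever has to fake leadership as in the earlier scheme.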
