
[openshift] add support for recycling StatefulSet #1627

Merged · 21 commits · Jun 14, 2021

Conversation

@maorfr (Contributor) commented Jun 10, 2021

part of https://issues.redhat.com/browse/APPSRE-3351

Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden

This PR adds support for applying any additional changes to StatefulSets without incurring downtime.

When we detect a "fields immutable" error for a StatefulSet, we do the following (see the sketch after this list):

  1. delete the StatefulSet with cascade=false so that its pods are not deleted
  2. apply the updated StatefulSet
  3. delete one pod at a time and wait for the new StatefulSet to bring it back up
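For illustration, a minimal sketch of that flow in Python. The helper names and call signatures here (a delete with a cascade flag, get_owned_pods, validate_pod_ready) are assumptions based on this description, not necessarily the exact code in this PR:

def recycle_statefulset(oc, namespace, resource):
    # 1. Delete the StatefulSet without cascading, so its pods survive
    #    as orphans of the old object.
    oc.delete(namespace, 'StatefulSet', resource.name, cascade=False)
    # 2. Apply the updated StatefulSet; it is immediately satisfied because
    #    the expected number of pods already exists.
    oc.apply(namespace, resource)
    # 3. Delete the pods one at a time; the new StatefulSet recreates each
    #    one, and we wait until the replacement reports ready.
    for pod in oc.get_owned_pods(namespace, resource):
        name = pod['metadata']['name']
        oc.delete(namespace, 'Pod', name)
        oc.validate_pod_ready(namespace, name)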

maorfr requested review from rporres, jmelis and apahim on June 10, 2021 08:50
@apahim (Contributor) commented Jun 10, 2021

I don't see tests, yet CI still seems happy. Is that expected? I saw some discussion about not reducing coverage, and this clearly reduces it. Are we in a grace period or something?

@rporres (Contributor) left a comment

Doing this with StatefulSets scares the **** out of me, but this approach is cool. The new SS doesn't do anything as it already has the number of pods it needs. Great.

I would add a good chunk of documentation (basically what you have in the description)

Another thing I don't know how to fix is pods taking a long time to get deleted. I remember Telemeter pods taking a long time to get restarted because they needed to wait a while to upload something. If that's something that is in the Pod spec, we should at least wait that long before declaring the Pod not ready...

@maorfr (Contributor, Author) commented Jun 10, 2021

@rporres do note that when we apply the new STS, it is not happy. This is why we need to delete the pods one by one. After each deletion, the new STS will kick in and create the pod (we are just deleting an orphaned one from the previous STS).

About pods that take a long time to be deleted: this is why I use 20 as the max attempts for the pod ready validation. 20 attempts is 210 seconds of backoff (1 + 2 + ... + 20), which I hope will be enough. If not, we can either extend the attempts here or add support for a configurable number through the saas file (maybe even use a timeout).
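For context on that arithmetic: with a linear backoff that sleeps 1 second after the first failed attempt, 2 after the second, and so on, 20 attempts wait at most 1 + 2 + ... + 20 = 20 * 21 / 2 = 210 seconds. A self-contained sketch of such a decorator, assuming this backoff scheme (not necessarily the project's actual @retry implementation):

import functools
import time

def retry(max_attempts=20):
    # Linear backoff: sleep `attempt` seconds after the attempt-th failure,
    # so the worst-case total wait is max_attempts * (max_attempts + 1) / 2.
    def decorator(func):
        @functools.wraps(func)  # also exposes __wrapped__, used in tests later
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise
                    time.sleep(attempt)
        return wrapper
    return decorator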

@rporres (Contributor) commented Jun 10, 2021

> Do note that when we apply the new STS, it is not happy. This is why we need to delete the pods one by one. After each deletion, the new STS will kick in and create the pod (we are just deleting an orphaned one from the previous STS).

That's my understanding too of what you're trying to achieve here.

@maorfr (Contributor, Author) commented Jun 10, 2021

@apahim Required test coverage of 19.0% reached. Total coverage: 19.14%

@rporres (Contributor) commented Jun 10, 2021

> About pods that take a long time to be deleted: this is why I use 20 as the max attempts for the pod ready validation. 20 attempts is 210 seconds of backoff (1 + 2 + ... + 20), which I hope will be enough. If not, we can either extend the attempts here or add support for a configurable number through the saas file (maybe even use a timeout).

Thanos Receive pods have terminationGracePeriodSeconds set to 900 seconds, so our retry time has to take that into account, or we will declare pods not ready while they're behaving as configured.

I don't think the retry time should be configurable, as it would mean duplicating configuration that the pods already have.

What I don't know yet is how to avoid waiting the full terminationGracePeriodSeconds when the pod has already terminated properly, so that we don't wait longer than we need 🤔

This entry is concise and helpful about the mechanism:

https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-terminating-with-grace
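If the wait ever does need to respect a pod's own configuration, one option (purely a sketch, not part of this PR; the oc.get call shape is an assumption) is to derive the wait budget from terminationGracePeriodSeconds in the pod spec:

DEFAULT_GRACE_SECONDS = 30  # Kubernetes default when the field is unset

def pod_wait_budget(oc, namespace, pod_name, readiness_margin=60):
    # Read the pod's own terminationGracePeriodSeconds (900s for Thanos
    # Receive in the example above) and add a margin for the replacement
    # pod to start and pass its readiness probe.
    pod = oc.get(namespace, 'Pod', pod_name)
    grace = pod['spec'].get('terminationGracePeriodSeconds',
                            DEFAULT_GRACE_SECONDS)
    return grace + readiness_margin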

@apahim (Contributor) commented Jun 10, 2021

> @apahim Required test coverage of 19.0% reached. Total coverage: 19.14%

Oh, so we have some fat to burn, got it. Thanks!

@maorfr (Contributor, Author) commented Jun 10, 2021

@rporres I think I gave you a wrong answer. The waiting time is for a new pod to become ready. Waiting for the pod to be deleted is part of the oc delete step, and I believe we get that for free; oc delete itself will wait for the pod to terminate before returning.

@maorfr (Contributor, Author) commented Jun 10, 2021

Changed the approach a bit to simplify and avoid having to decide between cascading or not.

@rporres (Contributor) commented Jun 10, 2021

> Waiting for the pod to be deleted is part of the oc delete step, and I believe we get that for free

Yes, I also think you're right: oc waits for the pod to be deleted before returning, so I also believe we get that for free.

The other times we could take into account are the readiness/startup probe durations, but I would leave that for another iteration as it can be quite complicated to set up.

@rporres (Contributor) left a comment

Two nits; this looks good otherwise :)

self.validate_pod_ready(namespace, name)

@retry(max_attempts=20)
def validate_pod_ready(self, namespace, name):
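For readers without the diff open, a sketch of what such a readiness check might look like, inferred from the tests later in this thread; the real method in reconcile/utils/oc.py may differ:

class PodNotReadyError(Exception):
    pass

# (a method on the OC class)
@retry(max_attempts=20)
def validate_pod_ready(self, namespace, name):
    # Fetch the pod and require every container to report ready;
    # raising lets @retry back off and try again.
    pod = self.get(namespace, 'Pod', name)
    for status in pod['status']['containerStatuses']:
        if not status['ready']:
            raise PodNotReadyError(name)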
@Piojo (Contributor) commented on the diff

Nit: methods that check some status and force the program to halt if some condition isn't met are usually assertions. In English it reads more clearly if this is called assert_pod_ready(...).

@maorfr (Contributor, Author) replied

I have an old itch from my days as a test automation developer: assertions only happen in tests, and using "assert" in a function name suggests that an actual assertion is being used?

@Piojo (Contributor) commented Jun 14, 2021

Yeah, in my eyes this acts like one: it is called in the middle of some processing and breaks the flow entirely by raising an exception/aborting if some consistency condition isn't met. It is true that this one isn't meant to be disabled with -DNDEBUG.

It was a nit; I just like it when the naming makes it clear that this is an integrity check and that the normal flow of the program can break here.

@Piojo (Contributor) left a comment

By any reasonable standard, apply has become too long. We need to plan to split it.

Then, I am very uncomfortable with the fact that code reductions in other areas are being used as an excuse to not have a single test here. There is a lot of value for future iterations in testing at least get_owned_pods and validate_pod_ready. Simple tests (warning! untested!):

from unittest import TestCase
from unittest.mock import patch, MagicMock

import reconcile.utils.oc as oc

class TestGetOwnedPods(TestCase):
    @patch.object(oc.OC, 'get')
    @patch.object(oc.OC, 'get_obj_root_owner')
    def test_get_owned_pods(self, get_owner, oc_get):

        oc_get.return_value = {
            'items': ['pod1', 'pod2', 'pod3']
        }
        get_owner.side_effect = [
            {
                'kind': 'mykind',
                'metadata': {'name': 'myname'}
            },
            {
                'kind': 'notmykind',
                'metadata': {'name': 'notmyname'},
            },
            {
                'kind': 'mykind',
                'metadata': {'name': 'notmyname'}
            }
        ]

        # 'name' is a reserved MagicMock kwarg, so set it after construction
        resource = MagicMock(kind='mykind')
        resource.name = 'myname'

        o = oc.OC()
        pods = o.get_owned_pods('namespace', resource)
        self.assertEqual(pods, ['pod1'])

class TestValidatePodReady(TestCase):
    @patch.object(oc.OC, 'get')
    def test_validate_pod_ready_all_good(self, oc_get):
        oc_get.return_value = {
            'status': {
                'containerStatuses': [
                    {
                        'name': 'container1',
                        'ready': True,
                    },
                    {
                        'name': 'container2',
                        'ready': True
                    }
                ]
            }
        }
        o = oc.OC()
        # Bypass the retry decorator; __wrapped__ is the undecorated
        # function, so pass the instance explicitly
        oc.OC.validate_pod_ready.__wrapped__(o, 'namespace', 'podname')
        # Feel free to add assertions around the call to oc_get, but
        # they aren't important in this test

    @patch.object(oc.OC, 'get')
    def test_validate_pod_ready_one_missing(self, oc_get):
        oc_get.return_value = {
            'status': {
                'containerStatuses': [
                    {
                        'name': 'container1',
                        'ready': True,
                    },
                    {
                        'name': 'container2',
                        'ready': False
                    }
                ]
            }
        }

        o = oc.OC()
        with self.assertRaises(oc.PodNotReadyError):
            oc.OC.validate_pod_ready.__wrapped__(o, 'namespace', 'podname')

Simple stuff like this will give the next person opening this file (which may be you in 6 months) a lot of confidence in what they're doing. And it will also speed up your own development iterations massively.

Would you consider these 2 test cases for inclusion?

@maorfr (Contributor, Author) commented Jun 13, 2021

Added tests! Thanks for helping get them started!

About apply's length: can you say exactly what makes it too long?
It does one thing (oc apply) and it does it well (including catching errors and overcoming them); it has very few local variables and fewer than 4 levels of indentation. It does have a lot of arguments, but not too many?

I agree that this section is not pretty:

# skip if namespace does not exist (as it will soon)
# do not skip if this is a cluster scoped integration
if namespace != 'cluster' and not oc.project_exists(namespace):
    msg = f"[{cluster}/{namespace}] namespace does not exist (yet)."
    if wait_for_namespace:
        logging.info(msg + ' waiting...')
        wait_for_namespace_exists(oc, namespace)
    else:
        logging.warning(msg)
        return

But otherwise, how would you refactor it?
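For what it's worth, one possible extraction, sketched as a suggestion; the helper name _namespace_ready is hypothetical, and the surrounding apply signature is not shown:

import logging

def _namespace_ready(oc, cluster, namespace, wait_for_namespace):
    # 'cluster' is the pseudo-namespace used by cluster scoped integrations
    if namespace == 'cluster' or oc.project_exists(namespace):
        return True
    msg = f"[{cluster}/{namespace}] namespace does not exist (yet)."
    if wait_for_namespace:
        logging.info(msg + ' waiting...')
        wait_for_namespace_exists(oc, namespace)
        return True
    logging.warning(msg)
    return False

# in apply():
#     if not _namespace_ready(oc, cluster, namespace, wait_for_namespace):
#         return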

@maorfr (Contributor, Author) commented Jun 14, 2021

In any case, we can discuss this in Jira: https://issues.redhat.com/browse/APPSRE-3355

Proceeding to merge this.
