Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

downgrade ambassador #510

Merged
merged 3 commits into from
Apr 17, 2019
Merged

downgrade ambassador #510

merged 3 commits into from
Apr 17, 2019

Conversation

ryandawsonuk
Copy link
Contributor

v0.40.2 of ambassador is more stable for us than 0.50.0. Avoids the grpc problems

Small refactor of the bad graphs tests. It does retries to get a status but then doesn't use that status. This refactor removes an extra call and therefore avoids the risk of the extra call failing

@ryandawsonuk ryandawsonuk changed the title get tests running reliably WIP: get tests running reliably Apr 16, 2019
@ryandawsonuk ryandawsonuk changed the title WIP: get tests running reliably get tests running reliably Apr 16, 2019
@ryandawsonuk ryandawsonuk changed the title get tests running reliably WIP: get tests running reliably Apr 16, 2019
@ryandawsonuk
Copy link
Contributor Author

Current status is am able to get the bad graphs tests working by removing:

-       @Scheduled(fixedDelay = 5000)
-    public void watchv1alpha2() throws JsonProcessingException, ApiException, IOException {
-        logger.debug("The time is now {}", dateFormat.format(new Date()));
-        final String version = VERSIONS[0];
-        runWatch(version);
-    }

from cluster-manager/src/main/java/io/seldon/clustermanager/k8s/SeldonDeploymentWatcher.java. The cause of the problems there is actually that when the namemismatch graph is deployed there's a conflict in the attempt to update k8s in the clustermanager. It hits:

019-04-16 12:53:45.042 ERROR 1 --- [ool-16-thread-1] i.s.c.k8s.KubeCRDHandlerImpl             : Failed to update deployment in kubernetes 

io.kubernetes.client.ApiException: Conflict
        at io.kubernetes.client.ApiClient.handleResponse(ApiClient.java:882) ~[client-java-3.0.0.jar!/:na]
        at io.kubernetes.client.ApiClient.execute(ApiClient.java:798) ~[client-java-3.0.0.jar!/:na]
        at io.kubernetes.client.apis.CustomObjectsApi.replaceNamespacedCustomObjectWithHttpInfo(CustomObjectsApi.java:3751) ~[client-java-api-3.0.0.jar!/:na]
        at io.kubernetes.client.apis.CustomObjectsApi.replaceNamespacedCustomObject(CustomObjectsApi.java:3732) ~[client-java-api-3.0.0.jar!/:na]
        at io.seldon.clustermanager.k8s.KubeCRDHandlerImpl.updateSeldonDeploymentStatus(KubeCRDHandlerImpl.java:149) ~[classes!/:0.2.7-SNAPSHOT]
        at io.seldon.clustermanager.k8s.SeldonDeploymentControllerImpl$FailStatusTask.run(SeldonDeploymentControllerImpl.java:83) [classes!/:0.2.7-SNAPSHOT]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 

This seems to be because we're trying to support two different versions of the CRD and k8s is automatically rewriting the resources to the new version.

Removing the watch on v2 resolves the problem for the bad graphs tests but then we get occasional brief 500 errors during the rolling updates tests. Have got the logs for the components during those runs but they don't show any log of the errors. Not clear where the 500s come from.

I believe we're going to go with not introducing a new version of the CRD for now as we believe we don't actually need to since we're only making additions.

@ryandawsonuk ryandawsonuk changed the title WIP: get tests running reliably get tests running reliably Apr 17, 2019
@ryandawsonuk
Copy link
Contributor Author

CRD change is #513 - merging both and will then check master is ok

@ryandawsonuk ryandawsonuk changed the title get tests running reliably downgrade ambassador Apr 17, 2019
@ryandawsonuk ryandawsonuk merged commit 8d33e21 into master Apr 17, 2019
@ryandawsonuk ryandawsonuk deleted the ambassador_and_test_reliability branch April 17, 2019 10:31
agrski pushed a commit that referenced this pull request Dec 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants