
Change service key to allow container services to always match correctly #4043

Merged: 7 commits merged into SeldonIO:master on Sep 6, 2022

@ukclivecox (Contributor) commented Apr 8, 2022

What this PR does / why we need it:

Services created for each node in the inference graph all used the same label key, which meant only one of them per pod could match. This is fine when the service orchestrator runs in the same pod, because it does not use the Services and calls the containers directly over localhost.

This change creates a separate label for each node name, so every Service always finds its pod correctly.

  • Adds unique service label
  • Adds an example notebook with tests for various transform-model-transform flows.
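The label collision described above can be illustrated with a small simulation. This is a minimal sketch, not the operator's code: it models Kubernetes Service selection as equality matching on pod labels, and the key prefix `svc-key` and the node names are hypothetical. Because a pod holds only one value per label key, Services that share a key overwrite each other's label and only one of them can match; giving each node its own key lets every Service select the pod.

```python
from typing import Dict, List

# Hypothetical label key prefix; the real key used by the Seldon Core
# operator may differ.
LABEL_KEY = "svc-key"


def matches(selector: Dict[str, str], pod_labels: Dict[str, str]) -> bool:
    """Equality-based selector match: every selector key/value must be on the pod."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


def shared_key_labels(nodes: List[str]) -> Dict[str, str]:
    """Old scheme: every graph node writes the same key, so each write overwrites the last."""
    labels: Dict[str, str] = {}
    for node in nodes:
        labels[LABEL_KEY] = node
    return labels


def per_node_labels(nodes: List[str]) -> Dict[str, str]:
    """New scheme: one key per node name, so all labels coexist on the pod."""
    return {f"{LABEL_KEY}-{node}": node for node in nodes}


# Three hypothetical graph nodes running as containers in a single pod.
nodes = ["input-transformer", "model", "output-transformer"]

old_pod = shared_key_labels(nodes)
old_matched = [n for n in nodes if matches({LABEL_KEY: n}, old_pod)]

new_pod = per_node_labels(nodes)
new_matched = [n for n in nodes if matches({f"{LABEL_KEY}-{n}": n}, new_pod)]

print("shared key matches:", old_matched)    # only the last node
print("per-node key matches:", new_matched)  # every node
```

With the shared key, only the Service for the last node written still finds the pod; the per-node keys let all three Services match at once.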

Which issue(s) this PR fixes:

Fixes #4036
Fixes #4302

Special notes for your reviewer:


@ukclivecox requested a review from axsaucedo on April 8, 2022 16:18

@ukclivecox (Contributor, Author)

/test integration

@ukclivecox (Contributor, Author)

/test notebooks

@ukclivecox (Contributor, Author)

/test integration

@ukclivecox (Contributor, Author)

/test notebooks

@ukclivecox (Contributor, Author)

/test integration

@ukclivecox (Contributor, Author)

/test notebooks

@axsaucedo (Contributor) commented Apr 19, 2022

It seems all integration tests passed (not sure why the run is marked as failed when the log here says pass), and only one notebook test failed (that one is often flaky).

Edit: OK, it seems the parallel tests are failing in the integration suite.

@axsaucedo (Contributor)

/test integration

@axsaucedo (Contributor)

/test notebooks

@axsaucedo (Contributor)

OK, it seems the operator upgrade tests and the tracing test are still failing, so there may be a real issue; rerunning to confirm.

@axsaucedo (Contributor)

/test integration

@axsaucedo (Contributor)

/test notebooks

@axsaucedo (Contributor)

/test integration

@axsaucedo (Contributor)

/test notebooks

@axsaucedo (Contributor)

/test notebooks

@axsaucedo (Contributor)

@cliveseldon I've been testing this locally and everything seems to work: the svcorch model does work in 1.14.0-dev. I am seeing some strange behaviour, but it's not clear whether it comes from this PR or whether it's only me. I'm currently testing on one of Clive's branches, and when I run the helm upgrade from 1.13.1 -> 1.14.0-dev it doesn't trigger a model container bounce, although it does for 1.12.0 -> 1.13.1 and for 1.12.0 -> 1.14.0-dev. Is this behaviour consistent for you as well?

@seldondev removed the size/L label on Aug 27, 2022

@ukclivecox (Contributor, Author)

/test notebooks

@ukclivecox (Contributor, Author)

[Screenshot: Screenshot_2022-08-27_15-37-02]

@ukclivecox (Contributor, Author)

/test integration

@axsaucedo (Contributor)

/test integration

@axsaucedo (Contributor)

It seems the only flaky test is the rolling upgrade from 1.14.0. From our discussion, the rolling updates are expected to fail occasionally, so we should be good to merge; re-running to confirm this specific test is flaky.

/test integration

@seldondev (Collaborator)

@cliveseldon: The following test failed, say /retest to rerun them all:

| Test name   | Commit  | Details | Rerun command     |
|-------------|---------|---------|-------------------|
| integration | 3c48938 | link    | /test integration |

@axsaucedo (Contributor)

It was indeed flaky: this time the failing test was test_label_update[1.13.1]. Looks good to merge.
/approve

@axsaucedo merged commit f5c3a29 into SeldonIO:master on Sep 6, 2022

@seldondev (Collaborator)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: axsaucedo

3 participants