475 last operation description #619
Conversation
Changes Unknown when pulling 0f253da on maleck13:475-last-operation-description into openshift:master.
@jmrodri I know you are working on async bind at the moment. I think this PR should wait and be rebased on your work once it lands. Likely I will need to make some changes to support last operation updates in async bind operations also.
Related #542
0f253da to 674aebb
Changes Unknown when pulling 674aebb on maleck13:475-last-operation-description into openshift:master.
674aebb to 3e8ca31
The broker code is in good shape, I think (at least ready for review). I still need help understanding how to test the module.
3e8ca31 to 0175a9a
Test Case: Build broker image from this branch
Example code for the apb is here: https://github.com/maleck13/keycloak-apb/blob/add-last-ops/roles/provision-keycloak-apb/tasks/provision-keycloak.yml#L11
4507c4c to 384da98
Nice! Lots of new tests!
I'm going to look into pulling last_operation info during the $actions. It would be cool to see this running all the time in the gate.
pkg/apb/watch_pod.go
Outdated
}
send(state)
podStatus := pod.Status
log.Debugf("pod [%s] in phase %s", podName, podStatus.Phase)
switch podStatus.Phase {
case apiv1.PodFailed:
	if errorPullingImage(podStatus.ContainerStatuses) {
I think we want a w.Stop() here too before we return.
time.Sleep(time.Duration(apbWatchInterval) * time.Second)
if podEvent.Type == watch.Deleted {
	w.Stop()
Will we only get in here if the pod is deleted before either PodSucceeded or PodFailed? In other words, I think you can produce this when you delete the apb during execution. If that's the case, I think we should either return an error or log that the apb was deleted. What do you think?
Yes, I think you are correct. In most cases the pod would succeed or fail, so if it was somehow deleted it would likely be an error and it should be reported.
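For illustration, a rough sketch of the behaviour being agreed here, written as a self-contained helper. The function name handlePodEvent and the import paths are assumptions for the sketch, not the broker's actual code; the idea is just to show the watcher being stopped and an error being returned when the pod is deleted before reaching a terminal phase.

package apb

import (
	"fmt"

	apiv1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/watch"
)

// handlePodEvent is a hypothetical helper: stop the watcher and report the
// outcome whenever the apb pod reaches a terminal phase or disappears
// mid-execution.
func handlePodEvent(w watch.Interface, podName string, podEvent watch.Event, pod *apiv1.Pod) (done bool, err error) {
	switch pod.Status.Phase {
	case apiv1.PodSucceeded:
		w.Stop()
		return true, nil
	case apiv1.PodFailed:
		w.Stop()
		return true, fmt.Errorf("pod [%s] failed", podName)
	}
	if podEvent.Type == watch.Deleted {
		// The pod went away before succeeding or failing; treat it as an error
		// so the caller can surface it rather than silently stopping.
		w.Stop()
		return true, fmt.Errorf("pod [%s] was deleted before it completed", podName)
	}
	return false, nil
}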
log.Debug("bindjob: returned from apb.Bind")
//read our status updates and send on updated JobMsgs for the subscriber to persist
for su := range stateUpdates {
If we're blocking on reading the bind status updates, is this really async? Do we have to block on this?
So previously the bind function itself was synchronous and would block the job function. Now we have put that into its own goroutine, but we still need to wait until the Job is complete within the Run function, so we block on the status updates channel, as it is only closed once the Bind job is complete.
The Job itself, though, is started in its own goroutine within the work engine and so is still asynchronous:
https://github.com/openshift/ansible-service-broker/pull/619/files#diff-bcf626f58af278748816aca2379c1928R53
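To make the pattern concrete, here is a minimal, runnable sketch of the shape being described; all type and variable names are illustrative, not the broker's real API. The bind work runs in its own goroutine and closes the updates channel when finished, while the job's Run blocks ranging over that channel and persists each update. The job itself is still launched asynchronously by the work engine.

package main

import (
	"fmt"
	"time"
)

// JobState is a stand-in for the broker's job status type.
type JobState struct {
	State       string
	Description string
}

func main() {
	stateUpdates := make(chan JobState)

	// The bind work runs in its own goroutine and closes the channel when done.
	go func() {
		defer close(stateUpdates)
		stateUpdates <- JobState{State: "in progress", Description: "binding credentials"}
		time.Sleep(100 * time.Millisecond) // pretend to do work
		stateUpdates <- JobState{State: "succeeded", Description: "bind complete"}
	}()

	// Run blocks here, persisting each update via the subscriber, until the
	// worker closes the channel.
	for su := range stateUpdates {
		fmt.Printf("persisting update: %+v\n", su)
	}
}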
pkg/broker/unbinding_job.go
Outdated
go func() {
	defer func() {
		metrics.UnbindingJobFinished()
There seems to be the same general workflow for all the actions. Maybe at some point we can break apart the Run function into multiple functions to make the API more granular. It's something we can tackle down the road.
Or potentially abstract it into a single run function, as they all effectively do the same thing.
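As a sketch of what that shared run function might look like (hypothetical names and signatures, just to show the shape): the workflow repeated across the job Run functions, of launching the action in a goroutine and forwarding its status updates to the subscriber, could be factored out once and reused by provision, bind, and unbind.

package main

import "fmt"

// apbAction is a hypothetical signature shared by provision, bind, unbind, etc.:
// each action streams status updates onto the channel it is given.
type apbAction func(updates chan<- string) error

// runJob captures the workflow repeated in each *_job.go Run function:
// run the action in a goroutine and forward every status update to the
// subscriber until the action completes.
func runJob(name string, action apbAction, persist func(string)) error {
	updates := make(chan string)
	errs := make(chan error, 1)

	go func() {
		defer close(updates)
		errs <- action(updates)
	}()

	for u := range updates {
		persist(u)
	}
	return <-errs
}

func main() {
	err := runJob("unbind", func(updates chan<- string) error {
		updates <- "in progress"
		updates <- "succeeded"
		return nil
	}, func(u string) { fmt.Println("persist:", u) })
	fmt.Println("job finished, err =", err)
}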
@rthallisey I have addressed your requested changes and comments.
@maleck13 can you rebase this to pull in some fixes to Travis?
344cfa1 to a3291d9
Looks like this needs a rebase @maleck13. Trying to get through outstanding PRs this week; happy to come back to this when it's green.
61df7eb to 4f0301d
@eriknelson can you have another look at this?
@maleck13 @rthallisey grabbing lunch and I will review. Thanks guys.
log.Debugf(
	"Watching pod [ %s ] in namespace [ %s ] for completion",
	podName,
	namespace,
)

k8scli, err := clients.Kubernetes()
Why don't we get the v1.PodInterface in this method? I think that is better, because right now every caller has to get the client for the sole purpose of passing it into a function. Getting it in this method encapsulates that behaviour, and if it errors we bubble up that error.
If the only reason we are doing it is for testing, then I think we need to do a better job of mocking out the core components rather than passing values into functions. I think we got bitten by doing that before.
If we want unit tests for a method like this, then at some point we will need to use an interface to allow us to mock the interactions with external dependencies.
Having a package-level function used internally as part of this method means we cannot mock it out.
Really this is just a simple form of dependency injection; the package-level functions make this tricky. I would prefer to have a unit test for the logic here than not have one. I took a similar approach to the provision function here:
https://github.com/maleck13/ansible-service-broker/blob/4f0301dac3dae580cb5f393baafaa70a2d859073/pkg/broker/broker.go#L377
except there it is passed in via the constructor. It would be a bigger refactor in this case to have the dependency passed as part of a constructor, as it would likely mean changes all the way up the call chain.
On the point that we need to do a better job of mocking out the core components rather than passing values into functions: longer term it may be worth discussing changing some of the package functions into struct methods and making use of the kubernetes.Interface in place of the *client.ClientSet concrete type.
If we use the kubernetes.Interface and refactor so it can be injected via constructors, it will allow us to have a very powerful set of unit tests: we can control the behavior of the tests by mocking out just the client, as that is generally what underpins all of the external communications. This becomes even more true if we move to CRDs instead of a separate etcd.
I can create a follow-on issue for this.
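For illustration, a hedged sketch of that direction: a small receiver type takes kubernetes.Interface through its constructor, so a unit test can hand it client-go's fake clientset. The type and function names here are hypothetical, and the clientset call signatures assume a client-go version contemporary with this PR (before context arguments were added to the typed clients).

package apb

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/kubernetes/fake"
)

// podWatcher is a hypothetical receiver type: the client is injected through a
// constructor instead of being looked up via a package-level function, so
// tests can substitute a fake.
type podWatcher struct {
	client kubernetes.Interface
}

func newPodWatcher(client kubernetes.Interface) *podWatcher {
	return &podWatcher{client: client}
}

// podPhase only talks to the cluster through the injected interface.
func (p *podWatcher) podPhase(name, namespace string) (corev1.PodPhase, error) {
	pod, err := p.client.CoreV1().Pods(namespace).Get(name, metav1.GetOptions{})
	if err != nil {
		return "", err
	}
	return pod.Status.Phase, nil
}

// examplePodPhaseWithFake shows how a unit test (normally in a _test.go file)
// could exercise podPhase against client-go's fake clientset.
func examplePodPhaseWithFake() (corev1.PodPhase, error) {
	cli := fake.NewSimpleClientset(&corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "apb-pod", Namespace: "demo"},
		Status:     corev1.PodStatus{Phase: corev1.PodSucceeded},
	})
	return newPodWatcher(cli).podPhase("apb-pod", "demo")
}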
@maleck13 Thanks again for your contribution. Had a chance to sit down with @shurley this morning and talk through this one. Overall it's looking pretty good, and it's absolutely a feature we need.
The reasoning is that in the near future, we're expecting to break the apb
Happy to continue the discussion on these points; otherwise I'll make sure we stay on top of updates so this isn't sitting in a PR for too much longer. EDIT: @shawn-hurley raised an important point re: the return channels that requires an amendment to this. Hashing that out and I will update this comment.
See above.
Based on some discussions:
Background: The issue with 1) is that the channel is not going to be returned to a caller until the function has finished, which suggests the need to wrap the work in a goroutine, which in turn requires a channel to allow the caller to synchronize the work. So this PR has opened a really productive dialogue around what, exactly, we want the public interface for. This really isn't in the scope of this PR, so in an effort to bring in your work, it makes sense to merge this, log some issues, and fix them as a follow-up. We do still need a rebase.
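A small sketch of the two interface shapes under discussion (all names are illustrative): option 1 returns the channel, which only works if the function launches its work in a goroutine before returning; option 2, as in this PR, has the caller create the channel and pass it in, and decide how to synchronize on it.

package main

import "fmt"

// Option 1: the function returns the channel, so it must kick off its work in
// a goroutine before returning, otherwise the caller cannot receive anything
// until the work is already done.
func bindReturningChannel() <-chan string {
	updates := make(chan string)
	go func() {
		defer close(updates)
		updates <- "in progress"
		updates <- "succeeded"
	}()
	return updates
}

// Option 2: the caller owns the channel and passes it in.
func bindIntoChannel(updates chan<- string) {
	defer close(updates)
	updates <- "in progress"
	updates <- "succeeded"
}

func main() {
	for u := range bindReturningChannel() {
		fmt.Println("option 1:", u)
	}

	updates := make(chan string)
	go bindIntoChannel(updates)
	for u := range updates {
		fmt.Println("option 2:", u)
	}
}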
@maleck13 please rebase and we're ready to bring this into master.
fix type in SubscriberDAO after code review
rebase and update the unbind and bind jobs to accept state updates
add the sync and async job start to unbind and bind.
Minor updates
address review feedback
add a w.Stop on pod failed and return error on pod deleted
4f0301d to 1cfb140
@eriknelson Great, excited to have this finally merged. Thanks for your detailed comments. Adding some thoughts of my own here:
It's possible I am restating the problem. I generally agree that it would be nice to return a channel from the Job. I played around with this option early on when working on the feature, but it presented me with some issues:
So I went with what seemed the least disruptive approach that kept the broker design the same.
Interesting, I had actually hoped that change would reduce the complexity. You are correct that the status updates are of no value here, as there is no opportunity for them to be seen. Also happy to chat further or jump on a call.
I actually initially thought we weren't using the subscribers for this, but I was incorrect, so I have to thank you for preserving that. Definitely an architecture we want to keep, especially with a revamped engine. I'm not sure I understand this statement though; regardless of whether the caller creates the channel and passes it in, or the apb package returns the channel, the caller (broker job) is going to have to monitor that channel for messages, wrap them in
I'm personally not too concerned about this. As we continue, the OSB domain is going to get pushed into some kind of a Broker Framework, and the APB domain is getting pushed into a
Anyway, we're of the opinion this PR adds quite a bit of value as-is, and most of the feedback is more appropriate for a follow-up that's part of the work moving in the direction I described. I'm planning on posting a proposal with some code on top of this for review from the community, since I think I can better explain myself with actual code. Thanks again for your help!
* add test cases for provision subscriber and job
  fix type in SubscriberDAO after code review
* initial pass at last operation description implementation
  rebase and update the unbind and bind jobs to accept state updates
  add the sync and async job start to unbind and bind.
  Minor updates
  address review feedback
  add a w.Stop on pod failed and return error on pod deleted
Describe what this PR does and why we need it:
Broker implementation for issue 475, last operation description.
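For context, a rough sketch of the last_operation response shape this feature feeds. The struct below follows the Open Service Broker API spec for the state and description fields and may not match the broker's actual type names.

package broker

// LastOperationResponse sketches the OSB last_operation reply; the piece this
// PR adds plumbing for is the human-readable Description alongside the State.
type LastOperationResponse struct {
	State       string `json:"state"`       // "in progress", "succeeded", or "failed"
	Description string `json:"description"` // e.g. "provisioning service, creating route"
}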
Note
This PR is based on top of the previous PR #610, so it is probably better to review once that one is closed out.
Changes proposed in this pull request
Does this PR depend on another PR (Use this to track when PRs should be merged)
depends-on #610
depends-on #671
Which issue this PR fixes (This will close that issue when PR gets merged)
fixes #475