Proposal for Broker relist API #1086
Comments
cc @eriknelson
I suggest that this not be in scope for the initial implementation of this proposal
just so I fully grok it, what is the type of
Just to be clear while the But I definitely need to be able to manually trigger AND have it sync on a regular interval.
TYPO:
TYPO: thus -> this
Does the PUT take any arguments or expect any input? Or is it enough to be a PUT /brokers/sync?
This would be my question: if Also, is Duration defined as a timeout from last sync, i.e., a manual sync will reset that timer?
Yes
While this might not be part of this PR (based on @pmorie's comment), won't this be a separate work item?
Or something like that.
I'm not sure what you mean here, @jmrodri - can you clarify?
@pmorie work item #4 states to add the porcelain command. The proposal also mentioned an "optional" parameter which we deemed not part of this PR. But that optional parameter would still need work to add, so it would seem to need its own work item. Or we can remove that entire sentence from this proposal.
I'm curious why /brokers/sync is not feasible. What are the blockers for it? Furthermore, I'd like to at least address what happens in error cases and how they are communicated to the user. For example, upon initial broker registration, if a broker fails for one reason or another, what should happen? Are we going to set the status to failed? Likewise, what should happen if we get a malformed response from the broker?
I don't like the word sync as it implies two-way communication to me. The platform could be pushing updates to the broker. I know that's not possible with the OSBAPI. With respect to global frequency and validations, I don't understand why we would get rid of the flag rather than use it for the default. Agree on describing the error cases.
There are a few problems with using the cleared status as a signal to update:
Alright, so I learned something today about Kubernetes, thanks to @liggitt, and I think we need to reformat some of the particulars around manual syncing (as well as whether we should be using a checksum). More information coming soon.
I think we need to talk about this in SIG architecture, because we're butting up against a problem other APIs have which is unsolved. Here's the generic formulation of the problem: What is the right API construction to allow:
So, there is a field in This field is used to solve the 'have I reconciled this version of this object's spec yet?' problem as follows:
This seems like a much better solution than using a checksum. However, with the Broker resource, there is an additional wrinkle that we need to be able to indicate that a manual reconciliation has been requested. I think that we should go to SIG architecture to discuss this. @liggitt and I both agree that we would bet on this being the ultimate guidance:
I don't think this needs to block work from happening on this subject, and we should continue with the planned design, but we should hold off on the manual sync part until we get some guidance from SIG architecture (next Monday). @liggitt - does this faithfully recreate what we discussed?
@jmrodri relist, refresh, rereconcile, re-something.
The updated understanding here, #1086 (comment) & #1086 (comment) matches my understanding of what the appropriate way to do this is.
I may have been thinking of the
using something like the Generation field makes sense but I'd like to understand why checksum couldn't be used for the same purpose - meaning, just clear it to force a reconciliation.
On Aug 7th we discussed this and as part of that we touched on the checksum vs generation issue. Also #1095 was created - however the minutes do not show that we formally agreed, or disagreed, with that approach. I suspect we do but I'll add it to an upcoming call just to get final confirmation from the group.
Checksum is in the status, which users don't have write access to.
yea, that comment was written before we chatted about Generation on the call. However, does it make sense for Generation to be outside of Status? It seems to me that that property is more of an internal processing thing and letting a user directly touch it seems a bit scary. I think @arschles's idea of having some kind of URL/route/poke-mechanism would be nicer. This way we also hide the specifics of how a reconcile is managed from the user - meaning, we can switch between checksum and Generation, or anything else, at will but the UX remains the same.
@duglin Generation is in object meta and not user-settable; I wonder if we have our wires crossed.
ah good, for some reason I thought it was outside of Status and therefore in user-data, didn't think about the meta stuff.
I have updated the OP to add details on failure handling and error reporting
I'll be coordinating with @pmorie on an implementation for this.
Circling back before the discussion in today's meeting. Here's an option which I've alluded to but haven't finished writing up anywhere yet:
Advantages of this approach:
I think 'easy for users to understand' is a bit of a matter of opinion. For example, in GCP, to reset a VM instance you do a PUT to .../reset, which resets the machine.
@vaikas-google Yeah, we could definitely have the subresource bump The only drawback I see of having a subresource that bumps
Agree on status wiping not being something I'm thrilled about. Just trying to understand why the original proposal of /sync (with possibility of rename to relist, or whatevs.) is not what we should be pursuing :)
my 2 cents...
Sounds like we're converging gradually on a subresource that bumps a
I think so, my only lingering question is around the newly proposed "generated" flag (or whatever it's called), and I'm wondering whether we can use that instead of having to create a new property which would seem to have a similar effect on what action we take. For example, could the PUT to /sync tweak the "generated" property or the "observedGenerated" property to get us what we want? Perhaps just zeroing out the observedGenerated one? I guess it depends on the semantics we want for reconcile() if a reconcile() is already underway. What's kube's normal m.o. on this?
I still need to finish the write-up for generation... I think that will make things clear. Going to try to do that for the meeting today.
that would be cool - I'd like to have us get that one behind us since I think it touches a lot of places.
Actually I'm looking through the generation issue and I think I can just answer your questions here @duglin
There are actually two pieces of the generation proposal:
We can't clear the Does that make sense?
when you say "give the user permission to change the status", wouldn't that only be true if we asked them to touch the field directly? If we modified the field via a
As of this writing, the original post of this issue is up-to-date and includes specification for
It's a little strange to me we've kept
Can we close this now?
As implemented, this totally ignores the global field set on the CLI. It is also now set as a default in the apiserver, and not from the controller. But not using this value:
/close
Proposal: Broker Sync API
This document is a proposal for adding the API surface area and implementation to service-catalog that would allow syncing an individual broker server's resources.
It is a formalization of issue #705, which introduces most of these ideas.
Problem
There exists a need in the service-catalog system to keep the Kubernetes resources (i.e. ServiceClass) in sync with the broker servers that the system knows about (via Broker resources). This proposal calls this action "syncing".
As of this writing (8/1/2017), the service-catalog controller manager polls all broker servers on the same hard-coded time duration to keep resources up to date.
We need to add the ability for a user to sync a specific, individual broker via two methods:
High Level Solution
We propose the following additions to Broker resources:
- spec.syncBehavior - either Duration or Manual
  - If Duration is specified, the spec.syncDuration field may be set (see below)
  - If Manual is specified, the broker's sync subresource (i.e. /brokers/sync, a sibling to the current /brokers/status) will be used to trigger any future sync
  - Duration is the default
- spec.syncDuration - the frequency, expressed in 1m5s format, at which the controller re-syncs with the broker server. If this field is omitted, the default is 15m. This field is only valid if spec.syncBehavior is set to Duration
- spec.relistRequests - a strictly increasing integer counter that a user can increment to manually trigger a re-sync
- sync subresource (i.e. similar to /brokers/status) - when this subresource is requested with a PUT, the referenced broker's spec.relistRequests field will be incremented, causing a re-sync (see the previous item)
Additional Details
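A minimal Go sketch of the manual trigger described by the spec.relistRequests and sync subresource items: a PUT only bumps the counter, and the controller re-syncs whenever the counter is ahead of the last value it handled. All Go names are hypothetical, including the status field LastHandledRelistRequests, because the proposal does not say where the controller records the last handled value. The real API server would persist the increment as an ordinary update, subject to Kubernetes' optimistic-concurrency checks; that step is omitted here.

```go
package main

import "fmt"

// BrokerSpec holds only the field relevant to manual re-syncs.
// Names are illustrative, not the service-catalog API types.
type BrokerSpec struct {
	RelistRequests int64
}

// BrokerStatus records the last counter value the controller acted on.
// LastHandledRelistRequests is a hypothetical field.
type BrokerStatus struct {
	LastHandledRelistRequests int64
}

type Broker struct {
	Spec   BrokerSpec
	Status BrokerStatus
}

// handleSyncPut models the effect of a PUT on the sync subresource:
// it only increments the counter; the controller performs the re-sync.
func handleSyncPut(b *Broker) {
	b.Spec.RelistRequests++
}

// manualResyncRequested reports whether a user has asked for a re-sync
// that the controller has not yet performed.
func manualResyncRequested(b *Broker) bool {
	return b.Spec.RelistRequests > b.Status.LastHandledRelistRequests
}

// recordResync should be called only after a successful re-sync, so a
// failed attempt leaves the request pending for the next retry.
func recordResync(b *Broker) {
	b.Status.LastHandledRelistRequests = b.Spec.RelistRequests
}

func main() {
	b := &Broker{}
	fmt.Println(manualResyncRequested(b)) // false: nothing requested yet
	handleSyncPut(b)
	handleSyncPut(b)                      // two PUTs before the controller runs
	fmt.Println(manualResyncRequested(b)) // true
	recordResync(b)                       // pretend the re-sync succeeded
	fmt.Println(manualResyncRequested(b)) // false
}
```

Because the trigger is a counter rather than a boolean, several PUTs that arrive before the controller runs collapse into a single re-sync. Nothing needs to be reset on the spec afterwards.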
Duration Sync
If a Broker resource has spec.syncBehavior set to Duration, an operator may still trigger a manual re-sync. Regardless of how a re-sync was executed, the controller will always measure the spec.syncDuration interval from the time of the last successful re-sync.
Sync Frequency Flag
The controller currently has a command-line flag to specify the global sync frequency (as a time duration). This flag should be deprecated and ignored by the controller now, and removed before our beta release. We may optionally add a new flag to specify the default value to use when spec.syncDuration is missing but required.
Porcelain Command (kubectl plugin)
Additionally, since this proposal adds a /brokers/sync subresource, we will need to add a new "porcelain" command, implemented as a kubectl plugin (plugin work is ongoing in PR #840), that exposes the PUT call on this subresource.
Failure Handling
Currently, when the controller encounters a failure, it retries the re-sync with exponential backoff. This behavior should continue regardless of whether the broker is in Duration or Manual sync mode.
Error Reporting
Currently, when the controller performs an initial sync or a re-sync and encounters an error, it writes an error message to the Broker resource's conditions field. It should continue to do so regardless of whether the re-sync was manual or automatic.
Implementation Notes
This section contains details on how we would go about implementing this specific proposal. It can be amended to accommodate changes to the proposal.
Validations
The following validations should be added to the service-catalog API server:
- spec.syncBehavior must be either Duration or Manual
- spec.syncDuration must not be set if spec.syncBehavior is set to Manual
- spec.syncDuration must be a valid value if it is set. Note that the Go time.ParseDuration function is suitable for checking and parsing this value
- spec.relistRequests must be greater than its previous value
Testing
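The validation rules listed above are small enough to test directly. A hedged sketch of one possible validation helper follows; the type and function names are hypothetical. It assumes defaulting has already run, as it does before validation in the Kubernetes API server. It also reads the spec.relistRequests rule as "must not decrease", so updates that leave the counter unchanged are not rejected. The positivity check on spec.syncDuration is likewise an assumption, since the proposal only asks for a valid value.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// brokerSyncSpec holds the user-facing sync fields after defaulting.
// Names are illustrative. SyncDuration keeps the string the user wrote
// (for example "1m5s"); an empty string means the field was not set.
type brokerSyncSpec struct {
	SyncBehavior   string
	SyncDuration   string
	RelistRequests int64
}

// validateSyncSpec checks the rules from the Validations list.
// oldSpec is nil on create and points to the stored spec on update.
func validateSyncSpec(spec brokerSyncSpec, oldSpec *brokerSyncSpec) []error {
	var errs []error

	// spec.syncBehavior must be either Duration or Manual.
	if spec.SyncBehavior != "Duration" && spec.SyncBehavior != "Manual" {
		errs = append(errs, fmt.Errorf("spec.syncBehavior must be Duration or Manual, got %q", spec.SyncBehavior))
	}

	if spec.SyncDuration != "" {
		// spec.syncDuration must not be set when spec.syncBehavior is Manual.
		if spec.SyncBehavior == "Manual" {
			errs = append(errs, errors.New("spec.syncDuration must not be set when spec.syncBehavior is Manual"))
		}
		// spec.syncDuration must parse with time.ParseDuration. Requiring a
		// positive interval is an assumption beyond the proposal's wording.
		if d, err := time.ParseDuration(spec.SyncDuration); err != nil {
			errs = append(errs, fmt.Errorf("spec.syncDuration: %v", err))
		} else if d <= 0 {
			errs = append(errs, fmt.Errorf("spec.syncDuration must be positive, got %q", spec.SyncDuration))
		}
	}

	// spec.relistRequests may only move forward. Unchanged values pass, so
	// updates to other fields are not rejected.
	if oldSpec != nil && spec.RelistRequests < oldSpec.RelistRequests {
		errs = append(errs, fmt.Errorf("spec.relistRequests must not decrease (was %d, got %d)",
			oldSpec.RelistRequests, spec.RelistRequests))
	}
	return errs
}

func main() {
	stored := brokerSyncSpec{SyncBehavior: "Duration", SyncDuration: "15m", RelistRequests: 3}
	update := brokerSyncSpec{SyncBehavior: "Manual", SyncDuration: "1m5s", RelistRequests: 2}
	for _, err := range validateSyncSpec(update, &stored) {
		fmt.Println(err)
	}
}
```

Returning a slice of errors rather than stopping at the first failure lets a single API response report every problem with the submitted spec.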
spec.syncBehavior defaults to Duration and spec.syncDuration defaults to 15m.
Work Items
We believe that the tasks represented to implement this proposal cleanly separate into the following pull requests (PRs) in order:
1. spec.syncBehavior = Duration and spec.syncDuration = 15m
2. /brokers/sync subresource won't be implemented at this point, but spec.syncBehavior = Manual will still be possible because the controller will still pay attention to spec.relistRequests and users can still manually increment spec.relistRequests
3. /brokers/sync subresource. This makes spec.syncBehavior = Manual possible, but not convenient because the "porcelain CLI command" (kubectl plugin) won't yet be available
4. spec.syncBehavior = Manual easy to use
Requesting Reviews From