-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Specify recovery semantics #116
Comments
I think it is clear from 3.2.9 that the responsibility of infrastructure and applications is to be able to satisfy the requirements of the normal LRA life cycle in the presence of failure. |
Note that there is a test in the TCK called acceptTest that tests that participants which return |
@tomjenkinson you are right, I think this issue can be closed as "already fixed" |
@mmusgrov @tomjenkinson |
The LRA_HTTP_RECOVERY_HEADER was added to the Java client API (and was in the original REST part of the spec when the project was in the sandbox) to allow participants to tell the coordinator to contact them on different URLs. This was to support participant recovery. Since we no longer have the REST endpoints and Java Client API it is not so simple to incorporate the same functionality. If we loose the LRA_HTTP_RECOVERY_HEADER in milestone 1.0 then we loose |
BTW even without participant recovery functionality the LRA_HTTP_RECOVERY_HEADER header is still required - it is returned on responses to client requests that result in the enlistment of a participant - it enables the caller and the participant to know that a participant has been enlisted in the LRA. Also note that the header is included when participant callbacks are issued (compensate/complete/etc). I will check that this is clear in the spec. Actually participant recovery functionality is quite easy to keep if we add an @Recover
Response participantHasMoved(
@HeaderParam(LRA_HTTP_RECOVERY_HEADER) String recoveryId) {...}
@Compensate
aMethod(...) {}
etc The implementation sees the presence of the recovery header and replaces the participant (ie uses this new resource to complete/compensate instead of the old one) and puts the new recovery URI in the LRA_HTTP_RECOVERY_HEADER response header so that the caller can move it again if required. This would also fix the requirement I mentioned above about needing something to trigger a recovery attempt (the TCK test called |
+1, except we must not (nor should we need to) be changing the recovery header ID |
My understanding is that business logic iterates outstanding recoveryIds and calls them during the processes recovery sequence. |
@rdebusscher @ochaloup @mstefank and others, I'd like to get agreement on this before Rudy goes on PTO (which I think means by the end of the week). During this weeks hangout the primary objection came from Ondra stating that recovery should not be exposed in the business logic. But this should not be a problem since the method does not need to be exposed to other services because issue #35 can be used to hide it from such users. Martin's objection was that applications should be using service URLs so should never need to change their endpoints. But unless we have good reason we should not restrict which endpoints a service can be contacted on. Note that the reason I have called it recovery and raised the proposal on this github issue is that the feature would typically be used if the participant crashes during the final phase of the protocol (ie when the LRA is in the state completing/compensating) and wishes to come up on a different IP address or port. The fact that participants can also use the feature to trigger replay of the end phase of the protocol on the participant in order to discover the outcome is just a bonus. |
My objection is about giving the user a new annotation. It does not make any difference if the I think the ability to redefine the location for the participant is reasonable. But it would be better not enforcing it with annotation at the participant side. The programmatic API would be a better fit for this. With this proposal the behaviour of the coordinator regarding It could be worthy to consider splitting the capability of the participant location enlistment into two - one for changing location and the other for recovery invocation. |
@mmusgrov @ochaloup @xstefank and all, I have created a scenario using the Participant recovery in this gist https://gist.github.com/rdebusscher/ad914ba528dbe638f926e6bf2fdbd3e9 I cannot see how the LRA gets properly cancelled without specific code in the client (which is not desired) |
In response to Ondra: To your first comment:
The two APIs (programmatic and annotations) are intended to be semantically equivalent. And to the remainder of your comments, we do already do what you are asking for: Your second comment:
is already covered in the specification document
and the LRA javadoc:
The Red Hat implementation exposes this header whenever our coordinator invokes participant callbacks (though I have not said that in the spec yet since we still have this current open issue). Your third comment is invalid:
The LRA specification is focused purely on the annotations and the coordinator is private to our implementation of the spec. All that needs to happen is for the implementation (of the LRA spec) to notice the Recover annotation and check two things 1) that the recovery header is present on the incoming request and 2) that the enclosing resource conforms to our requirements to be a participant (ie has a Compensate method) - it is very similar to how the implementation should enlist a participant (the only difference would be that it should update it To your last comment:
I don't see why that would be necessary: the application is saying "I would like to be asked to compensate at a new location please". Since typically the application uses this recovery behaviour when the LRA is in the state compensating or completing, then after a crash it makes eminent sense for the implementation to replay the termination phase at the new location (it would be surprising if we specified it otherwise). In response to Rudy: I am not altogether clear about your scenario so my response may be off. You say
If service A crashes and comes back up on a different endpoint the method annotated with @recover should be called by something. This will cause the coordinator/implementation to replay the compensation invocation on the participant where the @recover annotation is present. |
I don't know what is behind this intention but I don't see it as a good idea here. If there is a programmatic API then there is no much reason to provide this functionality with the annotation.
I see. When I was reading it I thought it would be good to be clarified but as you summarized that for me I have no more objections.
I do understand. I wrote my comment rather for a consideration and discussion. I understand that this comment nearly touches the interoperability. Here we talk about API not about the message communication protocol.
Ok. It makes sense to me. |
So are there any objections if I go ahead and raise a PR for this so that people can see:
Also I don't think we should be specifying optional behaviour unless there are good reasons to do so. |
The only option I see is that the participant informs the coordinator that it is up (and thus each participant needs some kind of unique ID - the Client Id ?) and that recovery is 'automatic' in the sense it doesn't needs an action from any other system. |
IIRC the idea is that:
@mmusgrov - is that correct? |
@tomjenkinson yes that is how I would imagine it would be used. From a spec implementation point of view the behaviour is very similar to how participants are enlisted, ie the implementation is fully capable of re-registering the participant. It is more burdensome for the application but the presence of the annotation on participants is optional. Note that the implementation has to support the new annotation but it is optional whether or not participants use it. @rdebusscher I think Tom's response addresses your question but I will create a PR to show how I would imagine it works. |
I don't like the idea that each JAX-RS resource needs to be stateful and even store that info on a persistent store so that it can retrieve that info when it is started up on a different server/Location. This means also that some kind of management is needed to know where to store and retrieve this type of data.
This means that the participant needs to be able to contact the Service Registry System to exactly know on which address it is available. Since there is no standard around this, interfaces with all existing and future new Systems needs to be developed/maintained. This is not viable. |
But since you all agree that this seems the only possible way of solving this problem, please go ahead and include it as an OPTIONAL feature. |
I don't think we should be specifying optional behaviour unless there are good reasons to do so. |
If a resource does not want to take advantage of this annotation then that is fine for it to omit it, ie the use of the annotation on a participant is optional. Since it is simple for the LRA spec implementation to implement the new behaviour and since there is no compelling reason to make it optional for the spec it should be mandatory. |
I experimented last night more with this recovery scenario and slightly adjusted the Payara POC to allow participant recovery, even in the case where the service is brought up on another host with another URL. When the participant registers itself at startup of the service with an ID like - the coordinator perfectly know when and where each participant is, also after a crash. So there is no need for the LRA_HTTP_RECOVERY_HEADER or any other annotation within the API. Within the spec we just need to mention that implementation must guarantee that in case of a crash, the compensate and compete calls will occur, also in the case the service is started somewhere else after a crash. |
Sorry for the late entry to this discussion. @rdebusscher I like the idea but the problem is that we don't want to require that there is a coordinator. In routing slip based solution, I don't think the registration on startup is possible to implement. It depends if this should be something to consider as all available implementations now use coordination. On the question of the new annotation I have still not cleared my mind. |
The @recover feature is useful for couple of reasons:
In case 1 the implementation does not need the recovery id since it can work out what the end point is. This new suggestion of a "client id" is changing the spec and we already have a solution for the problem in hand. |
I think this feature could be a useful addition so I am moving it to the next milestone (rather than close it as will not fix). |
Section 3.2.9. Recovery Requirements of the spec document discusses recovery but it is under specified. This issue is to formalise our stance on recovery. The preference is to leave it unspecified since the spec is already clear that the outcome of closing/cancelling an LRA MUST either have an eventual end state otherwise it MUST be logged.
The text was updated successfully, but these errors were encountered: