[CLOUD-2262][EAP7-1192] txn propagation over ejb remoting on OpenShift #175

ochaloup · 2019-02-14T15:38:16Z

https://issues.jboss.org/browse/EAP7-1192
https://issues.jboss.org/browse/CLOUD-2262

The design document for the EAP7-1192/CLOUD-2262. It needs to be review by @tadamski and it could be good to be checked by @fmarchioni

ochaloup · 2019-02-18T14:53:43Z

Just updating the status: Tomek reviewed the PR and I did some small adjustment in the text. Please @fmarchioni let us know what you think from the QA perspective. Thank you.

fmarchioni · 2019-02-18T16:04:55Z

Thanks @ochaloup I have gone through the document.
In terms of QA, we have some tests for subordinate transactions, covering both JMS and JPA combinations (TxPropagationJMSCrashRecoveryTestCase, TxPropagationJPACrashRecoveryTestCase), however the testsuite is based on a combination of instruments to control the start/stop of nodes (Arquillian Container API) when entering precise methods (Byteman). As I understand, this logic will be rewritten using a combination of the XTF API and Byteman to halt the server at specific points.
Regarding the doc I have some questions:

What storage should we test it with ? (file system/ JDBC Storage/ both)
Regarding the transaction stickiness issue, I assume this boils down to using "DNS_PING" as protocol to use the headless Service (ClusterIP=None), right? Do we need any change in the EJB remoting client so that it is then capable to connect to one of them particularly and guarantee stickiness (SFSB) or proper load balancing (SLSB) ? In other words, once configured as described in 5.7.1.2. Configuring DNS_PING it will work out of the box?

ochaloup · 2019-02-19T11:24:56Z

Hi @fmarchioni ,

if you don't mind I will put my answers to points

testcases - yes, it would be good to rewrite/enhance the testcases which are part of the crashrec testsuite (TxPropagationJMSCrashRecoveryTestCase) for the OpenShift
the scope of the testing in TxPropagationJMSCrashRecoveryTestCase is quite extensive. Here we aim rather for basic crash scenarios (like crash at buisness, before prepare is finished, after prepare is finished). We do no expect all fine grained scenarios which are part of the crashrec.
the scenarios should be enahnance with OpenShift specific scenarios - like what happens on rescheduling the pod etc. I summarized the testcases which would be nice to cover somehow at https://docs.google.com/document/d/1BbkjjCPWea7hQJgYPRRIvPKFpGyQPfAm4rBBFj4Eijg/edit#bookmark=id.a8zmvdx04xhm
@fmarchioni please, what are your expectation about testing this? Do you have already some idea what could be part of test plan from your perspective?
It's really good point about the JDBC storage vs. file system. I need to update the text. We currently talk only about the file system storage. The issue of the JDBC storage is that no all data are stored in narayana object store and as the possible next feature request we should provide a way to store all data narayana object store or have a way to configure WFTC to store data in database. The feature request is relevant especially as OpenShift Online runs with the JDBC storages, if I understand correctly.
Yes. You are right. Headless service and DNS_PING (we currently use in our checks jgroups ping and kube ping but we should be using dns ping I expect). The changes in the EJB remoting would not be necessary if we stay with configuration of remote outbound connection. Then only configuration changes will be hopefully necessary.
If we want to use the programatic manner of connecting to the remote server then yes there will be needed changes in EJB.
@tadamski would you be so kind and clarify what is the expected scope here? Do we expect to use EJB programatic approach (possibly the https://issues.jboss.org/browse/JBEAP-16149)?
In other words configuration changes are expected to be needed on top of the dns ping but that will be hopefully only configuration changes.

tadamski · 2019-02-19T12:28:40Z

@fmarchioni @ochaloup EJB-client programmatic configuration should work. As JBEAP-16149 is still in progress, for now, we have to provide the AS with default Elytron configuration that provides appropriate rules for all JTA invocations. The stateful-set template that we are using to deploy services to OpenShift uses that configuration out of the box currently. So to sum up, each configuration should work - if it doesn't we have to analyze it.

ochaloup · 2019-02-19T13:16:16Z

@tadamski ok, I see, I thought the EJB-client programmatic configuration does not work with txn recovery... it should be probably verified.

tadamski · 2019-02-19T13:41:14Z

@fmarchioni we should concentrate of filesystem object-store to make sure that stateful-set persistent storage guarantees are being held

@ochaloup I agree, but we imo we should treat all not working configs as bugs and if they are detected they should be jirified and fixed; my current understanding is that we have workarounds for the known bugs

bstansberry · 2019-02-21T19:33:47Z

@ochaloup I didn't realize this had been submitted and had a link to your branch from a meeting today so i commented there. If I get a chance I'll try and make the same comments here. You can see the comments here but the context of what I was commenting on is lost.

fmarchioni · 2019-02-22T09:34:07Z

Thanks @ochaloup and @bstansberry for your comments.
As discussed yesterday on IRC with @ochaloup, it would be good to know if the current solution design (based on Stateful Sets and DNS_PING) covers also other possible failures, such as Host failure. In this case, for example, can the transaction be recovered on another Node (if any), bound to another Host?
I agree with @bstansberry the supported scenario should be more clearly defined otherwise we risk to end up in a "grey area", where customers' expectations are fully bound to Java EE specifications no matter what the execution environment is.
Adding @tadamski in the loop.

ochaloup · 2019-02-24T09:32:06Z

@bstansberry sorry, for misguiding you. The comments should be available here ochaloup@4e7a523#commitcomment-32415797

I updated the PR based on your comments. Just I need to rewrite the section about hard requirements.

ochaloup · 2019-02-24T09:35:47Z

@fmarchioni yes, I will update the requirements to be clearer. Just when I will be back from PTO.

In this case, for example, can the transaction be recovered on another Node (if any), bound to another Host?

Yes, it should be and we expect it works. Just is still under investigating if this functionality is really out of the box functionality or some changes in remoting/clustering/transactions are needed.
The reason is that for starefulset the hostname is stable only IP address is changed.

bstansberry · 2019-04-11T22:07:24Z

openshift/CLOUD-2262.adoc

+* distributed transactional operations among those applications should be fully supported
+* transactions should recover properly if the transaction is interrupted
+* users would create the applications using provided template, which would hide the complexity associated with operation in cloud
+* users would be able to configure connections between applications by configuring remoting subsystem in 'standalone-openshift.xml'


Is the requirement here that the user provides their own standalone-openshift.xml, or is there a requirement that our images will be able to add necessary elements to the standard standalone-openshift.xml prior to server launch, i.e. by reading env vars and adding management resources?

I think than in tech-preview (CD17) we would like it to be configured by standalone-openshift.xml modification. The target solution should decouple user from OpenShift internals and create the config based on provided data (maybe from template). I think we can implement it as an extension after we are sure that the core functionality is working correctly.

@tadamski So "configured by standalone-openshift.xml modification" means the user provides their own standalone-openshift.xml?

If so, +1. I don't really want to see significant new work on us doing specialized xml config customization on the user's behalf in the WF 17 / WF 18 time frame. There's a lot of work going on related to using Galleon in s2i and we don't have bandwidth to take on a other forms of config customization.

@bstansberry Yes - users have to provide their own standalone-openshift.xml. I'm also in favor of that solution.

Minor point -- it would be good to clarify that a bit in the doc so there's no need to refer to these comments if there's ever any question about this in the future.

TBH I needed to research the code a bit more. I'm working on draft implementation currently and will update the document this week according to what I have learned.

bstansberry · 2019-04-14T17:51:03Z

This LGTM.

@jmesnil @jfdenise FYI, as this conceptually relates to the WildFly Operator
(in that the Operator would want to handle equivalent setups in some fashion
) and, less directly/clearly, to future s2i, as it involves the user bringing their own standalone-openshift.xml (which must be supported for many uses case beyond this, including basic backward compatibility) and perhaps some day wanting to do other config manipulation.

simkam · 2019-05-02T11:37:24Z

@tadamski @ochaloup as discussed on meeting. Can you please clarify scope of this?

users would create the applications using provided template, which would hide the complexity associated with operation in cloud

This means new template in https://github.com/jboss-container-images/jboss-eap-7-openshift-image/tree/eap72-dev/templates, right?

users would be able to configure connections between applications by configuring remoting subsystem in 'standalone-openshift.xml'

As already mentioned by @bstansberry. It would be good to clarify that a bit in the doc.

Are we going to provide some example of 'standalone-openshift.xml'?
Are we going to just document what configuration changes users need to do? Where to document it upstream?
- WildFly documentation?

Also I would like to see mentioned why it should be technical preview and what is missing for full support.

something like users bringing their own 'standalone-openshift.xml' in TP vs our images able to create configuration based on env vars.

ochaloup · 2019-08-12T14:40:51Z

Updated with definition of operator being requirements for the scale-down recovery functionality.

mnovak1 · 2019-08-14T07:25:02Z

@tadamski @ochaloup I have one more question this RFE, does only recovery for EJB remoting on Openshift start to be supported? Does this RFE include all the scenarios which migration pod handled as well?

Also by implementing this RFE, migration pod must not be used, right? I think it should be mentioned as well.

ochaloup · 2019-08-19T10:25:54Z

@mnovak1 yes, correct.

The RFE is considered to include all scenarios which migration pod handled.
The migration pod can't be used together with the operator.
** I updated the text of hard requirements

I don't understand what you mean by 'only recovery for EJB remoting on OpenShift start'? Only EJB remoting is supported - not JTS. One node recovery is supported as well (transaction is not propagated to the different node). Something more which I should add?

mnovak1 · 2019-08-19T10:35:30Z

@ochaloup thanks

I don't understand what you mean by 'only recovery for EJB remoting on OpenShift start'? Only EJB
remoting is supported - not JTS. One node recovery is supported as well (transaction is not propagated to the different node). Something more which I should add?

Sorry, I meant XTS transactions - distributed XA transactions.

ochaloup · 2019-08-19T11:39:47Z

@mnovak1 I will try to clarify this. Let me know if there is something I should elaborate on.

The scope of the RFE is only EJB remoting for distributed transactions[1]. But the XA transactions are supported as part of this RFE implicitly too. The XA transaction is any transaction with multiple resources driven by a transaction manager. The XA transaction can be then distributed over the EJB call to a different transaction manager which plays a role of a subordinate transaction manager and is driven by the top level one (the one which started the transaction).

The XTS (SOAP WebService transactions) is not in the scope of this RFE. Neither RTS (atomic RESTtransactions, probably not so interesting as WFLY provides it still as a dev preview), nor JTS (transactions over IIOP with CORBA interfaces).

[1] WildFly/Narayana.io documentation differs the distributed transactions (a transaction context passed amongst transaction managers) and XA transactions (a transaction manager which manages multiple remote/distributed resources/participants).
Be aware that some resources refers to the XA transaction as a distributed transaction on the internet.

mnovak1 · 2019-08-20T05:51:46Z

@ochaloup Thanks and sorry for confusion. I wanted to be clarified that XTS and JTS (transactions over IIOP with CORBA interfaces) are still not supported by this RFE.

…d requirement, adding points to the implementation plan

… WildFly operator is requirements for the safe recovery

ochaloup · 2019-10-21T13:32:59Z

@bstansberry do you think this could be merged now?

ochaloup force-pushed the EAP7-1192-openshift-ejb-txn-propagation branch 5 times, most recently from 8c8334c to 12a627d Compare February 18, 2019 14:52

ochaloup force-pushed the EAP7-1192-openshift-ejb-txn-propagation branch from 12a627d to 4e7a523 Compare February 19, 2019 11:27

ochaloup force-pushed the EAP7-1192-openshift-ejb-txn-propagation branch from 4e7a523 to 666da3f Compare February 24, 2019 09:31

bstansberry reviewed Apr 11, 2019

View reviewed changes

ochaloup force-pushed the EAP7-1192-openshift-ejb-txn-propagation branch 2 times, most recently from a0bd991 to 770a7e7 Compare June 5, 2019 14:39

ochaloup force-pushed the EAP7-1192-openshift-ejb-txn-propagation branch from 770a7e7 to ed0dda8 Compare August 12, 2019 13:35

[CLOUD-2262][EAP7-1192] txn propagation over ejb remoting on OpenShift

ff66260

tadamski and others added 6 commits October 3, 2019 10:49

[CLOUD-2262][EAP7-1192] Hard requirements

885c63c

[CLOUD-2262][EAP7-1192] update QE contact

e871c24

[CLOUD-2262][EAP7-1192] automatic scale down functionality is the har…

b1244e6

…d requirement, adding points to the implementation plan

[CLOUD-2262][EAP7-1192] scale down functionality is done by operator,…

a69de81

… WildFly operator is requirements for the safe recovery

specify that migration pod is not supported to work with operator

3370c1b

adding info on heuristics outcome

e482dbb

ochaloup force-pushed the EAP7-1192-openshift-ejb-txn-propagation branch from 26bfd02 to e482dbb Compare October 3, 2019 09:14

simkam approved these changes Oct 11, 2019

View reviewed changes

bstansberry merged commit dc6a36c into wildfly:master Nov 4, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CLOUD-2262][EAP7-1192] txn propagation over ejb remoting on OpenShift #175

[CLOUD-2262][EAP7-1192] txn propagation over ejb remoting on OpenShift #175

ochaloup commented Feb 14, 2019 •

edited

Loading

ochaloup commented Feb 18, 2019 •

edited

Loading

fmarchioni commented Feb 18, 2019

ochaloup commented Feb 19, 2019

tadamski commented Feb 19, 2019

ochaloup commented Feb 19, 2019

tadamski commented Feb 19, 2019 •

edited

Loading

bstansberry commented Feb 21, 2019

fmarchioni commented Feb 22, 2019 •

edited

Loading

ochaloup commented Feb 24, 2019

ochaloup commented Feb 24, 2019

bstansberry Apr 11, 2019

tadamski Apr 11, 2019

bstansberry Apr 12, 2019

tadamski Apr 13, 2019

bstansberry Apr 14, 2019

tadamski Apr 15, 2019

bstansberry commented Apr 14, 2019

simkam commented May 2, 2019

ochaloup commented Aug 12, 2019

mnovak1 commented Aug 14, 2019 •

edited

Loading

ochaloup commented Aug 19, 2019 •

edited

Loading

mnovak1 commented Aug 19, 2019

ochaloup commented Aug 19, 2019 •

edited

Loading

mnovak1 commented Aug 20, 2019

ochaloup commented Oct 21, 2019

[CLOUD-2262][EAP7-1192] txn propagation over ejb remoting on OpenShift #175

[CLOUD-2262][EAP7-1192] txn propagation over ejb remoting on OpenShift #175

Conversation

ochaloup commented Feb 14, 2019 • edited Loading

ochaloup commented Feb 18, 2019 • edited Loading

fmarchioni commented Feb 18, 2019

ochaloup commented Feb 19, 2019

tadamski commented Feb 19, 2019

ochaloup commented Feb 19, 2019

tadamski commented Feb 19, 2019 • edited Loading

bstansberry commented Feb 21, 2019

fmarchioni commented Feb 22, 2019 • edited Loading

ochaloup commented Feb 24, 2019

ochaloup commented Feb 24, 2019

bstansberry Apr 11, 2019

Choose a reason for hiding this comment

tadamski Apr 11, 2019

Choose a reason for hiding this comment

bstansberry Apr 12, 2019

Choose a reason for hiding this comment

tadamski Apr 13, 2019

Choose a reason for hiding this comment

bstansberry Apr 14, 2019

Choose a reason for hiding this comment

tadamski Apr 15, 2019

Choose a reason for hiding this comment

bstansberry commented Apr 14, 2019

simkam commented May 2, 2019

ochaloup commented Aug 12, 2019

mnovak1 commented Aug 14, 2019 • edited Loading

ochaloup commented Aug 19, 2019 • edited Loading

mnovak1 commented Aug 19, 2019

ochaloup commented Aug 19, 2019 • edited Loading

mnovak1 commented Aug 20, 2019

ochaloup commented Oct 21, 2019

ochaloup commented Feb 14, 2019 •

edited

Loading

ochaloup commented Feb 18, 2019 •

edited

Loading

tadamski commented Feb 19, 2019 •

edited

Loading

fmarchioni commented Feb 22, 2019 •

edited

Loading

mnovak1 commented Aug 14, 2019 •

edited

Loading

ochaloup commented Aug 19, 2019 •

edited

Loading

ochaloup commented Aug 19, 2019 •

edited

Loading