Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CLOUD-2262][EAP7-1192] txn propagation over ejb remoting on OpenShift #175

Merged

Conversation

ochaloup
Copy link
Contributor

@ochaloup ochaloup commented Feb 14, 2019

https://issues.jboss.org/browse/EAP7-1192
https://issues.jboss.org/browse/CLOUD-2262

The design document for the EAP7-1192/CLOUD-2262. It needs to be review by @tadamski and it could be good to be checked by @fmarchioni

@ochaloup ochaloup force-pushed the EAP7-1192-openshift-ejb-txn-propagation branch 5 times, most recently from 8c8334c to 12a627d Compare February 18, 2019 14:52
@ochaloup
Copy link
Contributor Author

ochaloup commented Feb 18, 2019

Just updating the status: Tomek reviewed the PR and I did some small adjustment in the text. Please @fmarchioni let us know what you think from the QA perspective. Thank you.

@fmarchioni
Copy link

Thanks @ochaloup I have gone through the document.
In terms of QA, we have some tests for subordinate transactions, covering both JMS and JPA combinations (TxPropagationJMSCrashRecoveryTestCase, TxPropagationJPACrashRecoveryTestCase), however the testsuite is based on a combination of instruments to control the start/stop of nodes (Arquillian Container API) when entering precise methods (Byteman). As I understand, this logic will be rewritten using a combination of the XTF API and Byteman to halt the server at specific points.
Regarding the doc I have some questions:

  1. What storage should we test it with ? (file system/ JDBC Storage/ both)
  2. Regarding the transaction stickiness issue, I assume this boils down to using "DNS_PING" as protocol to use the headless Service (ClusterIP=None), right? Do we need any change in the EJB remoting client so that it is then capable to connect to one of them particularly and guarantee stickiness (SFSB) or proper load balancing (SLSB) ? In other words, once configured as described in 5.7.1.2. Configuring DNS_PING it will work out of the box?

@ochaloup
Copy link
Contributor Author

Hi @fmarchioni ,

if you don't mind I will put my answers to points

  • testcases - yes, it would be good to rewrite/enhance the testcases which are part of the crashrec testsuite (TxPropagationJMSCrashRecoveryTestCase) for the OpenShift
  • the scope of the testing in TxPropagationJMSCrashRecoveryTestCase is quite extensive. Here we aim rather for basic crash scenarios (like crash at buisness, before prepare is finished, after prepare is finished). We do no expect all fine grained scenarios which are part of the crashrec.
  • the scenarios should be enahnance with OpenShift specific scenarios - like what happens on rescheduling the pod etc. I summarized the testcases which would be nice to cover somehow at https://docs.google.com/document/d/1BbkjjCPWea7hQJgYPRRIvPKFpGyQPfAm4rBBFj4Eijg/edit#bookmark=id.a8zmvdx04xhm
    @fmarchioni please, what are your expectation about testing this? Do you have already some idea what could be part of test plan from your perspective?
  • It's really good point about the JDBC storage vs. file system. I need to update the text. We currently talk only about the file system storage. The issue of the JDBC storage is that no all data are stored in narayana object store and as the possible next feature request we should provide a way to store all data narayana object store or have a way to configure WFTC to store data in database. The feature request is relevant especially as OpenShift Online runs with the JDBC storages, if I understand correctly.
  • Yes. You are right. Headless service and DNS_PING (we currently use in our checks jgroups ping and kube ping but we should be using dns ping I expect). The changes in the EJB remoting would not be necessary if we stay with configuration of remote outbound connection. Then only configuration changes will be hopefully necessary.
    If we want to use the programatic manner of connecting to the remote server then yes there will be needed changes in EJB.
    @tadamski would you be so kind and clarify what is the expected scope here? Do we expect to use EJB programatic approach (possibly the https://issues.jboss.org/browse/JBEAP-16149)?
  • In other words configuration changes are expected to be needed on top of the dns ping but that will be hopefully only configuration changes.

@ochaloup ochaloup force-pushed the EAP7-1192-openshift-ejb-txn-propagation branch from 12a627d to 4e7a523 Compare February 19, 2019 11:27
@tadamski
Copy link
Contributor

@fmarchioni @ochaloup EJB-client programmatic configuration should work. As JBEAP-16149 is still in progress, for now, we have to provide the AS with default Elytron configuration that provides appropriate rules for all JTA invocations. The stateful-set template that we are using to deploy services to OpenShift uses that configuration out of the box currently. So to sum up, each configuration should work - if it doesn't we have to analyze it.

@ochaloup
Copy link
Contributor Author

@tadamski ok, I see, I thought the EJB-client programmatic configuration does not work with txn recovery... it should be probably verified.

@tadamski
Copy link
Contributor

tadamski commented Feb 19, 2019

@fmarchioni we should concentrate of filesystem object-store to make sure that stateful-set persistent storage guarantees are being held

@ochaloup I agree, but we imo we should treat all not working configs as bugs and if they are detected they should be jirified and fixed; my current understanding is that we have workarounds for the known bugs

@bstansberry
Copy link
Contributor

@ochaloup I didn't realize this had been submitted and had a link to your branch from a meeting today so i commented there. If I get a chance I'll try and make the same comments here. You can see the comments here but the context of what I was commenting on is lost.

@fmarchioni
Copy link

fmarchioni commented Feb 22, 2019

Thanks @ochaloup and @bstansberry for your comments.
As discussed yesterday on IRC with @ochaloup, it would be good to know if the current solution design (based on Stateful Sets and DNS_PING) covers also other possible failures, such as Host failure. In this case, for example, can the transaction be recovered on another Node (if any), bound to another Host?
I agree with @bstansberry the supported scenario should be more clearly defined otherwise we risk to end up in a "grey area", where customers' expectations are fully bound to Java EE specifications no matter what the execution environment is.
Adding @tadamski in the loop.

@ochaloup ochaloup force-pushed the EAP7-1192-openshift-ejb-txn-propagation branch from 4e7a523 to 666da3f Compare February 24, 2019 09:31
@ochaloup
Copy link
Contributor Author

@bstansberry sorry, for misguiding you. The comments should be available here ochaloup@4e7a523#commitcomment-32415797

I updated the PR based on your comments. Just I need to rewrite the section about hard requirements.

@ochaloup
Copy link
Contributor Author

@fmarchioni yes, I will update the requirements to be clearer. Just when I will be back from PTO.

In this case, for example, can the transaction be recovered on another Node (if any), bound to another Host?

Yes, it should be and we expect it works. Just is still under investigating if this functionality is really out of the box functionality or some changes in remoting/clustering/transactions are needed.
The reason is that for starefulset the hostname is stable only IP address is changed.

* distributed transactional operations among those applications should be fully supported
* transactions should recover properly if the transaction is interrupted
* users would create the applications using provided template, which would hide the complexity associated with operation in cloud
* users would be able to configure connections between applications by configuring remoting subsystem in 'standalone-openshift.xml'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the requirement here that the user provides their own standalone-openshift.xml, or is there a requirement that our images will be able to add necessary elements to the standard standalone-openshift.xml prior to server launch, i.e. by reading env vars and adding management resources?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think than in tech-preview (CD17) we would like it to be configured by standalone-openshift.xml modification. The target solution should decouple user from OpenShift internals and create the config based on provided data (maybe from template). I think we can implement it as an extension after we are sure that the core functionality is working correctly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tadamski So "configured by standalone-openshift.xml modification" means the user provides their own standalone-openshift.xml?

If so, +1. I don't really want to see significant new work on us doing specialized xml config customization on the user's behalf in the WF 17 / WF 18 time frame. There's a lot of work going on related to using Galleon in s2i and we don't have bandwidth to take on a other forms of config customization.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bstansberry Yes - users have to provide their own standalone-openshift.xml. I'm also in favor of that solution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor point -- it would be good to clarify that a bit in the doc so there's no need to refer to these comments if there's ever any question about this in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH I needed to research the code a bit more. I'm working on draft implementation currently and will update the document this week according to what I have learned.

@bstansberry
Copy link
Contributor

This LGTM.

@jmesnil @jfdenise FYI, as this conceptually relates to the WildFly Operator
(in that the Operator would want to handle equivalent setups in some fashion
) and, less directly/clearly, to future s2i, as it involves the user bringing their own standalone-openshift.xml (which must be supported for many uses case beyond this, including basic backward compatibility) and perhaps some day wanting to do other config manipulation.

@simkam
Copy link
Contributor

simkam commented May 2, 2019

@tadamski @ochaloup as discussed on meeting. Can you please clarify scope of this?

users would create the applications using provided template, which would hide the complexity associated with operation in cloud

This means new template in https://github.com/jboss-container-images/jboss-eap-7-openshift-image/tree/eap72-dev/templates, right?

users would be able to configure connections between applications by configuring remoting subsystem in 'standalone-openshift.xml'

As already mentioned by @bstansberry. It would be good to clarify that a bit in the doc.

  • Are we going to provide some example of 'standalone-openshift.xml'?
  • Are we going to just document what configuration changes users need to do? Where to document it upstream?
    • WildFly documentation?

Also I would like to see mentioned why it should be technical preview and what is missing for full support.

  • something like users bringing their own 'standalone-openshift.xml' in TP vs our images able to create configuration based on env vars.

@ochaloup ochaloup force-pushed the EAP7-1192-openshift-ejb-txn-propagation branch 2 times, most recently from a0bd991 to 770a7e7 Compare June 5, 2019 14:39
@ochaloup ochaloup force-pushed the EAP7-1192-openshift-ejb-txn-propagation branch from 770a7e7 to ed0dda8 Compare August 12, 2019 13:35
@ochaloup
Copy link
Contributor Author

Updated with definition of operator being requirements for the scale-down recovery functionality.

@mnovak1
Copy link

mnovak1 commented Aug 14, 2019

@tadamski @ochaloup I have one more question this RFE, does only recovery for EJB remoting on Openshift start to be supported? Does this RFE include all the scenarios which migration pod handled as well?

Also by implementing this RFE, migration pod must not be used, right? I think it should be mentioned as well.

@ochaloup
Copy link
Contributor Author

ochaloup commented Aug 19, 2019

@mnovak1 yes, correct.

  • The RFE is considered to include all scenarios which migration pod handled.
  • The migration pod can't be used together with the operator.
    ** I updated the text of hard requirements

I don't understand what you mean by 'only recovery for EJB remoting on OpenShift start'? Only EJB remoting is supported - not JTS. One node recovery is supported as well (transaction is not propagated to the different node). Something more which I should add?

@mnovak1
Copy link

mnovak1 commented Aug 19, 2019

@ochaloup thanks

I don't understand what you mean by 'only recovery for EJB remoting on OpenShift start'? Only EJB
remoting is supported - not JTS. One node recovery is supported as well (transaction is not propagated to the different node). Something more which I should add?

Sorry, I meant XTS transactions - distributed XA transactions.

@ochaloup
Copy link
Contributor Author

ochaloup commented Aug 19, 2019

@mnovak1 I will try to clarify this. Let me know if there is something I should elaborate on.

The scope of the RFE is only EJB remoting for distributed transactions[1]. But the XA transactions are supported as part of this RFE implicitly too. The XA transaction is any transaction with multiple resources driven by a transaction manager. The XA transaction can be then distributed over the EJB call to a different transaction manager which plays a role of a subordinate transaction manager and is driven by the top level one (the one which started the transaction).

The XTS (SOAP WebService transactions) is not in the scope of this RFE. Neither RTS (atomic RESTtransactions, probably not so interesting as WFLY provides it still as a dev preview), nor JTS (transactions over IIOP with CORBA interfaces).

[1] WildFly/Narayana.io documentation differs the distributed transactions (a transaction context passed amongst transaction managers) and XA transactions (a transaction manager which manages multiple remote/distributed resources/participants).
Be aware that some resources refers to the XA transaction as a distributed transaction on the internet.

@mnovak1
Copy link

mnovak1 commented Aug 20, 2019

@ochaloup Thanks and sorry for confusion. I wanted to be clarified that XTS and JTS (transactions over IIOP with CORBA interfaces) are still not supported by this RFE.

@ochaloup ochaloup force-pushed the EAP7-1192-openshift-ejb-txn-propagation branch from 26bfd02 to e482dbb Compare October 3, 2019 09:14
@ochaloup
Copy link
Contributor Author

@bstansberry do you think this could be merged now?

@bstansberry bstansberry merged commit dc6a36c into wildfly:master Nov 4, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants