Skip to content

Commit

Permalink
[CLOUD-2262][EAP7-1192] txn propagation over ejb remoting on OpenShift
Browse files Browse the repository at this point in the history
  • Loading branch information
ochaloup committed Feb 19, 2019
1 parent 346262a commit 4e7a523
Showing 1 changed file with 278 additions and 0 deletions.
278 changes: 278 additions & 0 deletions openshift/CLOUD-2262.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,278 @@
= Transactions propagation and recovery over EJB remoting on OpenShift
:author: Ondrej Chaloupka
:email: [email protected]
:toc: left
:icons: font
:idprefix:
:idseparator: -
:keywords: openshift,transactions,EJB,remoting,recovery

== Overview

The OpenShift JBoss EAP images
https://access.redhat.com/documentation/en-us/red_hat_jboss_enterprise_application_platform/7.0/html-single/red_hat_jboss_enterprise_application_platform_for_openshift/#unsupported_transaction_recovery[do not support transaction handling]

This comment has been minimized.

Copy link
@bstansberry

bstansberry Feb 21, 2019

The primary reference should be 7.2, not 7.0.

This comment has been minimized.

Copy link
@ochaloup

ochaloup Feb 24, 2019

Author Owner

+1

with subordinate transactions. That means it does not support the case
when JBoss EAP instance calls other JBoss EAP instance over EJB remote call
while transaction context is passed along such call. Currently, in such
scenario, the transaction outcome is arbitrary.

=== The problem outline - bare metal example

Let's step back to define the example scenario. We have two bare-metal installations
of JBoss EAP instances. The first JBoss EAP (naming it as `A`) calls the second one
(naming it as `B`) with EJB remote call. Let's depict it as `A -> B`.

In such case, there is a call from one host with an IP address to another host.
The server `A` starts a transaction (we say top-level transaction is started)
and make the EJB call to `B`. Along with this call, the transaction context is passed.
Passing context means there is a transaction id passed along the EJB remote call.
The receiving side (server `B`) obtains the transaction id and it starts
a child transaction on its side (we call it subordinate transaction).
This subordinate transaction have set up the parent transaction being that top-level
one started at server `A`. The subordinate transaction cannot decide independently
about the transaction outcome. The subordinate transaction is driven by the top-level.

. a client calls EJB at server `A`
. top-level transaction of id `top-level-A` is started
. server `A` calls server `B` with EJB call
. transaction context with id `top-level-A` is propagated to server `B`
. server `B` starts subordinate transaction id `subordinate-B` with parent transaction of id `top-level-A`
. server `B` does some business logic, ends the work and return the call back to `B`
. server `B` waits for being called from server `A`
. server `A` finishes business logic, ends the work and start with finishing the transaction `top-level-A`
** finishing the transaction means starting https://developer.jboss.org/wiki/TwoPhaseCommit2PC[two phase commit]
. server `A` calls prepare on all resources it works with, one of them is the remote call to B with the transaction `subordinate-B`
. server `A` makes the remote `prepare` call to server `B`
. server `B` prepares all the resources it had worked with,
the `subordinate-B` is marked as prepared and the server returns back to `A`
. server `A` declares the `top-level-A` to be prepared
. server `A` makes the remote `commit` call to server `B`
. server `B` commits all resources it had worked with,
the `subordinate-B` is marked as committed and returns back to `A`
. server `A` declares the `top-level-A` to be committed

What happens when a crash of some server happens during the transaction processing?
There is a process called recovery manager which periodically tries to finish
transactions in error state.

Let' say the server `A` had crashed somewhere during the work on finishing the transaction.
Then the `subordinate-B` transaction is sitting idle waiting for being instructed
from server `A`. When server `A` is restarted it loads information about
unfinished transactions. The recovery manager at server `A` makes the remote
call to server `B` to finish transaction `subordinate-B`.

If there is the server `B` which crashes then server `A` periodically tries
to connect to server `B` until the server `B` is restarted (with the same
hostname/IP address it was available before the crash). If `B` is restarted
then it accepts the call and the `subordinate-B` could be finished.

On top of that, the Narayana transaction manager works with the notion of `node identifier`.
The `node identifier` is stored as part of the transaction id in the Narayana object store.
That means that the transaction id `subordinate-B` is capable to provide
information that node `A` is the node which started the top level transaction `top-level-A`
which is its parent transaction.

Narayana uses http://narayana.io/docs/project/index.html#d0e9393[presumed abort strategy].
That simply means if there is no information found in the Narayana object store
about the transaction then all discovered participants are commanded to roll-back.
This for example means that if Narayana has no record about transaction `subordinate-B`
in the object store and after restart of server `A` it finds that there is such
transaction it ask the server `B` to roll-back it.+

This comment has been minimized.

Copy link
@bstansberry

bstansberry Feb 21, 2019

This is a bit unclear to me. Narayana has no record about transaction 'subordinate-B' but then the server restarts and it finds there is such a transaction... how?

This comment has been minimized.

Copy link
@ochaloup

ochaloup Feb 24, 2019

Author Owner

I updated the text with

The server `A` finds about existence of the transaction in the server `B` because
the server `A` continuously asks all the registered remote locations if they
have some transactions to finish. Here the server `A` checks if server `B`
has some in-doubt transaction to finish. Server `B` responses that there is
`subordinate-B`. Server `A` does have no record in the objet store
thus commands the server `B` to roll-back it.

Hopefully it will be clearer now.

It's important to say that record to object store is saved only after successful
prepare (as phase of the two-phase commit procedure) happens.

NOTE: Information about unfinished (in-doubt) transactions
is basically stored at three places - Narayana transaction object store
contains in-doubt transaction info,
the configuration of `standalone.xml` stores definition of remote outbound connection
for the server knows the endpoints to connect to
and unfinished transactions at the side of EJB remoting are presented by
files stored with
https://github.com/wildfly/wildfly-transaction-client/blob/1.1.3.Final/src/main/java/org/wildfly/transaction/client/provider/jboss/FileSystemXAResourceRegistry.java[WildFly Transaction Client].

=== The problem outline - OpenShift deployment

If a user deploys an EAP instance to OpenShift he considers smooth use of horizontal
scaling with OpenShift primitives. That's the JBoss EAP image is deployed
under the OpenShift service and a pod is started for it.
The user expects to be able to scale up and scale down the number of EAP instances
running under the service. By default
https://github.com/jboss-container-images/jboss-eap-7-openshift-image/blob/CD15/templates/eap-cd-basic-s2i.json#L298[the offered configuration for EAP]
is with using the `DeploymentConfig` which is backed by the
https://docs.openshift.com/container-platform/3.9/architecture/core_concepts/deployments.html#deployments-and-deployment-configurations[ReplicationController].

Use of the `ReplicationController` means the `service` provides automatic
load balancing with round robin
when multiple instances of EAP are started under the scope of one service.
The pods started by the `ReplicationController` are provided with floating hostname
which is constructed in a way like `eap-app-<arbitrary-hash>`. When the pod is rescheduled
(for example the node is switched off) the pod is started again but with different
hostname. Then the storage of the node is ephemeral by default. But as we said
the recovery expects the data to survive restarting the pod.

Let's show the issues that the OpenShift brings for the transaction propagation.

Storage volatility::
if the pod crashes there is not ensured that the data stored in the Narayana
object store will be available for the newly started pod
Pod hostname volatility::
if the pod crashes it's started with different hostname which means the server `A`
has an issue to contact the server `B` as `B` is assigned at a different "place"
than it was before the restart
Service calls are not transaction sticky::
the service uses load balancing of requests. If we say we have two server instances
started under each service. Let's say the first instance of server `A` calls
the instance of server `B` with the transaction being propagated. It could happen
that the follow-up call for the `prepare/commit` hits the second instance
of server `B` which has no idea about the existence of such transaction.
Scale-down object store orphanage::
if the user decides to scale-down the number of instances under the particular service
then there could be left unfinished records of in-doubt transactions
in the orphaned object store

=== Solution design

The solution is about to be constructed on OpenShift primitives to provides
a similar environment to bare metal.

The https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/[StatefulSet]
brings storage and hostname stability. The `StatefulSet` starts pods with
stable hostname which is not changed even after pod is restarted/rescheduled.
It ensures the same data storage, as it was before pod restart, will be bound to the restarted pod.
`StatefulSet` "deactivates" the service load balancing capabilities and leaves
the application to manage the balancing on its own. Here the JBoss EAP
clustering abilities will be used to ensures the transaction stickiness.
Handling of data from orphaned object store after scale-down anticipates user manual intervention.
The user has to manually deactivate the pod from receiving traffic,
then let the pod to finish all the unfinished transactions and
and then it can turn-off the pod. If he does not do so he can experience
unfinished blocked transactions in database or JMS brokers.

NOTE: The scela-down issues should be possible automatized with use of the
https://github.com/luksa/statefulset-scaledown-controller[StatefulSet Scale-Down Controller].
The usage of this is currently not expected to be in scope of thie feature
request. \+
The StatefulSet Scale-Down Controller is not the Kubernetes/OpenShift native
object but it's an extension provided to manage this kind of situation. It was
productized and is used by AMQ messaging system to migrate messages from the
orphaned pods. See
https://access.redhat.com/documentation/en-us/red_hat_amq/7.2/html/deploying_amq_broker_on_openshift_container_platform/journal-recovery-broker-ocp[Red Hat AMQ Broker documentation]
and the Jira related to this functionality is
https://issues.jboss.org/browse/ENTMQBR-1859[ENTMQBR-1859].

If we take the individual issues this setup is about to solve them.

* _Storage volatility_ is about to be solved by the fact that `StatefulSet`
guarantees to bind the same storage with same data to the re-started pod
* _Pod hostname volatility_ is about to be solved by `StatefulSet` as the
restarted pod remains with the same hostname as it had before restart
* _Service calls are not transaction sticky_ is about to be solved by using
JBoss EAP clustering. The JBoss EAP instances belonging under one service
will establish cluster. This way the EJB remoting client will query the
https://kubernetes.io/docs/concepts/services-networking/service/#headless-services[headless service].
The `headless service` returns hostnames of all instances under the service.
The EJB remoting client is then capable to connect to one of them particularly
and guarantee stickiness for Stateful beans and for transction calls
or uses the proper load balancing capability if Stateless beans are called. +
When a new EAP instances are started then EJB remoting client is capable to gather
new cluster topology and works based on the new setup.
* _Scale-down object store orphanage_ is about to be solved by
manual user intervation. He can't let the pod being removed from the service
until time all the transactions are processed. Hopefully the functionality
of the graceful shutdown that JBoss EAP provides could be used here.

=== Known related issues

In the current setup, the transaction propagation with recovery works only when
a remote outbound connection is used. Up to that, there are some issues on transaction propagation
over EJB which are related to https://issues.jboss.org/browse/JBEAP-13963[JBEAP-13963].

Up to this the programmatic way for defining the EJB remote call (aka. dynamically
call EJB without the use of remote outbound connection configuration for it)
should be possible. That's tracked as issue https://issues.jboss.org/browse/JBEAP-16149[JBEAP-16149].

This comment has been minimized.

Copy link
@bstansberry

bstansberry Feb 21, 2019

If handling this scenario is a non-requirement, please note it as such in Non-Requirements.

This comment has been minimized.

Copy link
@ochaloup

ochaloup Feb 24, 2019

Author Owner

@bstansberry we are currently working on a workaround that specific configuration could make the programmatic way working. Possibly the customer needs it.
I'm not clear about this. @tadamski, please, do you know if the programmatic way is the non-requirement?


For the issues of the subordinate transactions which was fixed recently
there is https://issues.jboss.org/browse/WFTC-52[WFTC-52] which was causing
OOM on the remote side when EJB remoting with transactions was used.

==== Design details

The current proposal works with notion of the file system store to be used.
That's where we use the StatefulSet and expect it provides us the stable file system storage.
The JBDC storage is not covered as part of this analysis.
The transaction manager is capable to store transaction into database (JDBC object store)
but the WildFly Transaction Client stores data only on filesystem so far.
This should be considered as a new feature request.

This comment has been minimized.

Copy link
@bstansberry

bstansberry Feb 21, 2019

This paragraph is an example of what goes in "Non-Requirements".

This comment has been minimized.

Copy link
@ochaloup

ochaloup Feb 24, 2019

Author Owner

+1, added to the non-requirements


== Issue Metadata

=== Issue

* https://issues.jboss.org/browse/CLOUD-2262[CLOUD-2262]
* https://issues.jboss.org/browse/EAP7-1192[EAP7-1192]

=== Related Issues

* https://issues.jboss.org/browse/JBEAP-13963[JBEAP-13963]
* https://issues.jboss.org/browse/JBEAP-16149[JBEAP-16149]
* https://issues.jboss.org/browse/WFTC-52[WFTC-52]
* https://issues.jboss.org/browse/CLOUD-2261[CLOUD-2261]
* https://issues.jboss.org/browse/CLOUD-2542[CLOUD-2542]


=== Dev Contacts

* mailto:[email protected][Tomasz Adamski]
* mailto:[email protected][Ondra Chaloupka]

=== QE Contacts

=== Testing By

[ ] Engineering

[x] QE

* mailto:[email protected][Francesco Marchioni]

=== Affected Projects or Components

* Narayana (transactions)
* EJB
* Remoting
* Elytron

=== Other Interested Projects

* Clustering

== Requirements

=== Hard Requirements

* EJB remote calls have to work when application is deployed on JBoss EAP on OpenShift

This comment has been minimized.

Copy link
@bstansberry

bstansberry Feb 21, 2019

This is too vague. The supported scenario should be described. There's a lot implied in the (excellent) Overview so you don't have to spell everything out, but details like the remote server must exposes a headless service for these invocations are important.

* The transaction consistency has to be guaranteed for the transactions propagated
over the EJB remote calls

This comment has been minimized.

Copy link
@bstansberry

bstansberry Feb 21, 2019

The mechanisms for delivering this feature need to be described via requirements. The overview describes the architectural pieces (StatefulSets, headless services) but what are we providing to support that architecture?

This comment has been minimized.

Copy link
@bstansberry

bstansberry Feb 21, 2019

Same basic point -- how are the remote outbound connection configurations supplied? User brings their own standalone-openshift.xml? Or is some sort of configuration assistance part of the requirements?

This comment has been minimized.

Copy link
@bstansberry

bstansberry Feb 21, 2019

Related to this, if it's not obvious the requirements should also be clear what we're not changing; e.g. if this is provided via a particular template but none of the other templates are impacted.

This comment has been minimized.

Copy link
@ochaloup

ochaloup Feb 24, 2019

Author Owner

@bstansberry I'm adding it to my TODO list and I will update it but later. I'm going for a week for PTO. Thanks for pointing this out.

=== Nice-to-Have Requirements

=== Non-Requirements

== Developer Resources

* https://docs.google.com/document/d/1BbkjjCPWea7hQJgYPRRIvPKFpGyQPfAm4rBBFj4Eijg/edit?usp=sharing[Distributed transaction support in OpenShift]

//== Implementation Plan
////
Delete if not needed. The intent is if you have a complex feature which can
not be delivered all in one go to suggest the strategy. If your feature falls
into this category, please mention the Release Coordinators on the pull
request so they are aware.
////

== Test Plan

== Community Documentation
////
Generally a feature should have documentation as part of the PR to wildfly master, or as a follow up PR if the feature is in wildfly-core. In some cases though the documentation belongs more in a component, or does not need any documentation. Indicate which of these will happen.
////

0 comments on commit 4e7a523

Please sign in to comment.