-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQLException: Unable to enlist connection to existing transaction
when accessing multiple persistence units in the same transaction since 3.8.2
#39283
Comments
Hello! I have the same situation where I'm trying to perform select operations on two datasources in single transaction. Error stacktrace:
This error is reproduced when trying to run the next test:
|
This seems caused by #39072.
|
Downgrading to Agroal 2.2 resolves the issue. So something in Agroal 2.3 broke this. |
Is it safe to downgrade agroal to 2.2 in pom.xml or better return to 3.8.1 and waiting for a fix? |
I think this is an improvement, but this is a pretty big change, meaning that if you have not paid attention to the transaction handling when using multiple data sources you might need several code changes to fix this. And that's why I think it might be a good idea to revert the Agroal upgrade and introduce it in the next minor version upgrade. Because otherwise this might block some users from upgrading to patches containing security fixes. Also this might be worth explaining in https://quarkus.io/guides/transaction as you might come across this issue the moment you add the second datasource in the application. In my case I have several data sources in use and they are in separate modules and I consider them being independent and that's why I have tried to keep the transaction handling separated - event though I have noticed earlier that Quarkus has allowed me to do queries in separate datasources using the same transaction.
or if I needed to have separate transactions inside the same method I could use the programmatic approach:
I hope these tips could be helpful to someone. |
At the very least this needs an entry in the migration guide, both 3.9 and 3.8... looks like we collectively skipped that, sorry @gastaldi . |
To be honest, I think the change should be reverted for There is still the open question whether we can set the agroal version back to |
What is the fundemental change in agroal that caused this ? |
|
It seems that agroal/agroal@342ee87 is the commit that caused the issue to appear. However, this alone does not seem to be the root cause. The root cause seems to be agroal/agroal@ced9e8b. It feels like this The example I gave in my previous comment (#39283 (comment)) does not use XA at all. Thus, I do not understand why |
Thanks @turing85 for the reproducer! I'm sorry that I have to say these codes were working before but not right. In your case, if you want to do the The root case in So I think the change in @maxandersen I think we definily need a document to describe these changes and impacts. |
Thank you @zhfeng for the review. I stroke-through my reproducer. |
@zhfeng comment is spot on. Agroal added the Unfortunately, the current exception does not describe the issue and is not very meaningful. I'll try to improve that. |
The question still is: why was this change made? And should we really include it in a patch-level update of an LTS? Was something broken (as in "transactional properties were violated", not in "actual behaviour was different from documented behaviour") before? |
@zhfeng can you clarify that last question ? If we need to break Lts behaviour we need to have a reason. Trying to grok what was actually happening in previous versions ? Sounds like commits was potentially happening outside a tx. For some users that might be tolerable (bad; I know) so is there a way for those users to get that old behaviour back or are you saying this is literally broken behaviour in all cases? |
Yeah, in previous version, if it involves multi non-XA datasources in a transaction, Narayana Transaction Manager CAN NOT guarantees that DO THEM ALL OR NOTHING. I think it could violate the Atomic property of Transaction. Also this does not work in the crash recovery secenario. And I understand that in some cases, users don't want such a Strong Consistent transaction behavior.
Yeah, I think there is a propery we can set in Narayana to allow multi LastResources like arjPropertyManager.getCoreEnvironmentBean().setAllowMultipleLastResources(true); but it should be set before creating the instance of @mmusgrov What do you think if we can introudce such a propery in |
@zhfeng we could add that property but it is transactionally unsafe so we I doubt we'd add it. |
I am for adding this property. For quarkus |
Even allowing a single one-phase aware resource to join an XA transaction containing two-phase aware resources is transactionally unsafe. Beyond that, allowing two such resources is asking for trouble and we will get many users asking us why the integrity of their data is compromised. I can anticipate that it will end badly and give Quarkus a poor reputation. |
If we really need the updated deps we should add the flag. |
Worse than a patch version of an LTS containing breaking changes...? |
Adding it to the management api/property config implies that Quarkus will support users who fall foul of the consequences of using this behaviour, I would even go so far as saying they'd be better of disabling transactions and winging it instead since that would be better than having naive users thinking that transactions are giving them protection - we already give them the option of disabling recovery, which is a bit odd, and allowing multiple last resources would only compound the problem of allowing transactionally unsafe usage of the extension. I'd be agreeable to telling them that quarkus supports system properties that they can use to enable this behaviour but not for adding it directly to the extension config, ie it would become a workaround for a known defect. Finally as I mentioned above LRCO is unsafe but there is a safe alternative, called Commit Markable Resource, but that only allows a single 1-phase aware resource to join an XA transaction. |
@maxandersen , @mmusgrov So... how do we proceed now? What is the process? |
Can't I just tell you the system property and then then someone updates the docs? |
If I undestand the comment from @zhfeng correctly, this is not sufficient. |
There is no sufficient fix: LRCO is transactionally unsafe. |
Since there is no fix or property that can be exposed on Quarkus side to mitigate the problem, I think that at least the reproducer from @jacopo-cavallarin should be used as a starting point to write a Quarkus blog post about this issue, in order to illustrate how to restructure the code and fix the problematic transactions. |
@zhfeng the narayana-jta extension needs to know about the new class:
Although it is a while since I looked at extension code so I will defer to your expertise. |
@mmusgrov as you can see, the It's because all of the properties have been calculated at build time and stored in a staic Map. The only way is to use private static ConcurrentMap<String, Object> getBeanInstances() {
if (beanInstances == null) {
beanInstances = new ConcurrentHashMap<String, Object>();
}
return beanInstances;
} and replace all of the reference to So the question is do we need such changes to enable system properties at runtime in native mode? or just in build time is enough with the fix of #40250 |
Hi @mmusgrov I open jbosstm/narayana#2248 to make some changes in BeanPopulator. Then we can reload the properties in native mode. |
I am on it (building quarkus takes forever on my machine...)
|
I can confirm that this seems to work. Here is a patch to build with the Subject: [PATCH] use snapshot
---
Index: pom.xml
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/pom.xml b/pom.xml
--- a/pom.xml (revision f07d24763f1ad5cccb6a01db559a67f18ae52026)
+++ b/pom.xml (date 1714151479713)
@@ -20,8 +20,8 @@
<!-- Quarkus versions -->
<quarkus.platform.artifact-id>quarkus-bom</quarkus.platform.artifact-id>
- <quarkus.platform.group-id>io.quarkus.platform</quarkus.platform.group-id>
- <quarkus.platform.version>3.8.2</quarkus.platform.version>
+ <quarkus.platform.group-id>io.quarkus</quarkus.platform.group-id>
+ <quarkus.platform.version>999-SNAPSHOT</quarkus.platform.version>
<!-- Test dependency versions -->
<truth.version>1.4.2</truth.version>
@@ -256,9 +256,9 @@
<scope>import</scope>
</dependency>
<dependency>
- <groupId>${quarkus.platform.group-id}</groupId>
+ <groupId>io.quarkus.platform</groupId>
<artifactId>quarkus-camel-bom</artifactId>
- <version>${quarkus.platform.version}</version>
+ <version>3.8.2</version>
<type>pom</type>
<scope>import</scope>
</dependency> |
trying to follow here but struggling :) whats the status ? |
The work around is to use
|
proposal from a call with @mmusgrov, @Sanne, @zhfeng, @gsmet:
|
@jacopo-cavallarin I/we are quite curious to understand what if anything you had in place when using multiple PU's with no XA support. how did you handle failure? are/were you aware of that you couldn't rely on curious to hear from others too as we would like to figure out how we best handle this critical change in behavior that ensures transactional safety - but also understand some systems are build which can work without as strict constraints. |
We simply weren't aware of this problem and assumed that everything would work as expected. Maybe we've been lucky and haven't encountered this problem because in most cases I can think of we write to only one datasource, and only read from the rest. In addition, enabling XA on our applications is not an easy task, as we also need to enable it on all our affected database instances (and from what I gather this requires a server restart, at least for PostgreSQL). That's why it's very important for us to have a viable workaround while we solve this. |
@jacopo-cavallarin thanks for the info - and fully understood and as of now you can set those system properties running jvm mode. It is though just a workaround we we'll want to add config flag for it to ensure it is more explicitly documented as not recommended. To be clear even write from one and read to another can be problematic. note, you don't necessarily need to enable XA on your application but if you don't you will have to manually manage your transactions because without it you can't just rely on @transactional to get right behavior. Anyhow, your case is why its important we change to error by default since you haven't realized you were running non-transactional. |
I created a draft of the fix we discussed: #40365 . I won't have the time to finalize this today so I would appreciate:
Thanks! |
I was able to verify that it fixes the issue in native. |
I can also verify that it fixes the issue on my reproducer 👍 |
Thank you @turing85! |
A bit late to join the party, but we were also impacted by this when we tried to migrate from 3.8.1 to 3.8.2. We held off on migrating for a bit. Do I understand correctly that we now have an option to go to 3.8.5 and opt-in to the same (but unsafe) behaviour that was there before the fix from agroal? |
Yes. This is what we did in some of our applications.
I am no expert on the topic of XA, so take this with a grain of salt. XA works by a two-phase commit protocol. in the first phase, the transaction manager (aka. narayana) asks every resource "Can I commit?". If a resource responds with "yes" it "commits to commit", i.e. it says "I guarantee that you can commit". If all resoruces in the first phase answer with "yes", then the second phase is entered. In this phase, the transaction manager prompts all resources to commit. Importantly, as soon as one resource commits, all other reosurces have to commit, otherwise the XA-properties are violated. Now, to allow non-XA resources, narayana uses a small trick: it skips those resources in the 1st phase, and commits them first thing in the 2nd phase. This creates a small gap. It could happen that all XA-resources report "can commit" in the 1st phase. Then, in the 2nd phase, the first non-XA resource commits, but the second XA-resource fails to commit. This represents a problem, since the 1st resource cannot roll back (since it is already committed), but the 2nd cannot commit. This violates the XA-properties. If I understand @mmusgrov's comment correctly, there is also a risk if only one non-XA reosurce is used. But I do not understand where the "gap" lies here. Maybe he can elaborate.
If possible, this is probably the best thing to do. |
@turing85 There is a small window where the non-XA resource commits but then the transaction manager crashes just before it writes its commit decision to a transaction log and before it starts telling those XA resources to commit. When the TM restarts it doesn't know about the transaction because it has no log and the non-XA resource doesn't know about the transaction branch because it successfully committed, and the XA resources will have timed out and rolled back - ie there's a heuristic outcome meaning some resources committed while others rolled back. The failure mode is sufficiently rare that most users accept the risk. But once one starts adding more non-XA resources the exposure to failures increases, in a non-linear fashion, because of the extra time and the extra network calls to multiple resources. |
Describe the bug
After updating from 3.8.1 to 3.8.2, some of our tests that insert data in multiple PUs within a single transaction now fail and throw an exception.
That exception is thrown whenever we attempt to access two or more persistence units within a single
@Transactional
method. This worked fine in previous releases.We suspect that the bug is due to Agroal 2.3, since we encountered the same problem weeks ago while attempting to force the 2.3 version on older Quarkus releases.
Expected behavior
The transaction commits successfully without any error.
Actual behavior
The transaction is rolled back and this exception is thrown:
How to Reproduce?
Reproducer: https://github.com/jacopo-cavallarin/agroal-bug-reproducer
Clone the linked repo and follow the instructions in the README
Output of
uname -a
orver
Darwin M0-055116363 23.2.0 Darwin Kernel Version 23.2.0: Wed Nov 15 21:55:06 PST 2023; root:xnu-10002.61.3~2/RELEASE_ARM64_T6020 arm64
Output of
java -version
openjdk version "21.0.1" 2023-10-17 LTS OpenJDK Runtime Environment Temurin-21.0.1+12 (build 21.0.1+12-LTS) OpenJDK 64-Bit Server VM Temurin-21.0.1+12 (build 21.0.1+12-LTS, mixed mode)
Quarkus version or git rev
3.8.2
Build tool (ie. output of
mvnw --version
orgradlew --version
)Apache Maven 3.9.6 (bc0240f3c744dd6b6ec2920b3cd08dcc295161ae) Maven home: /Users/jacopocavallarin/.m2/wrapper/dists/apache-maven-3.9.6-bin/3311e1d4/apache-maven-3.9.6 Java version: 21.0.1, vendor: Eclipse Adoptium, runtime: /Users/jacopocavallarin/.sdkman/candidates/java/21.0.1-tem Default locale: en_IT, platform encoding: UTF-8 OS name: "mac os x", version: "14.2.1", arch: "aarch64", family: "mac"
Additional information
No response
The text was updated successfully, but these errors were encountered: