Patched EclipseLink 2.7.6 having blowups in ReadLockManager.removeReadLock(ReadLockManager.java:97): the implementations of CacheKey.equals (CacheKey.java:331) and CacheKey.hashCode are not safe #2114
We received feedback today from the project team that our patching has improved things, and they no longer need to restart the application server because of these NullPointerExceptions. Normally projects do not experience this problem, but this specific project, for whatever reason, was heavily affected by this bug. It might still be too early to tell whether this is the end of the story, but in the past we have consistently patched this issue in every version from 2.4.2 to 2.6.7. The fixes we implemented there will certainly not cause harm, as they all make the code more robust. However, they do not explain the strange phenomenon of a cache key's primary key becoming null. Unfortunately, we lack the expert knowledge and experience in EclipseLink to provide such an explanation.
I wanted to provide an update regarding our ongoing efforts to address the issue. Unfortunately, our fix is not yet complete. Recently, the project encountered the following stack trace while utilizing our latest patch:
Despite our efforts, it remains possible to encounter I will keep you informed of any progress as we continue to patch and improve the situation.
I am quite certain I understand why our ConcurrencyUtil logic failed inside the org.eclipse.persistence.internal.helper.ConcurrencyUtil.createStringWithSummaryOfReadLocksAcquiredByThread(ReadLockManager, String) method. In my opinion, the reason is straightforward: the data structure we iterate over is not a deep copy. The read lock manager's org.eclipse.persistence.internal.helper.ReadLockManager.getMapThreadToReadLockAcquisitionMetadata() returns an unmodifiable map, but that map is only a read-only view. The underlying map and its list values can still be modified by concurrent threads behind the scenes while we iterate. To address this, getMapThreadToReadLockAcquisitionMetadata() should return a deep clone of the map and of the lists it contains. This explanation makes much more sense. I will show the improved org.eclipse.persistence.internal.helper.ReadLockManager.getMapThreadToReadLockAcquisitionMetadata() in a second.
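To see why an unmodifiable map alone does not help, here is a minimal, single-threaded Java sketch. The class and method names are hypothetical, and String values stand in for ReadLockAcquisitionMetadata. Collections.unmodifiableMap returns a read-only view, so modifying the backing HashMap while the view is being iterated makes the fail-fast iterator throw ConcurrentModificationException, whereas a copy of the map and of its lists does not see later changes. With real concurrent threads the failure is not guaranteed to surface this way; it can also show up as other inconsistent reads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UnmodifiableViewDemo {

    /** Iterates a read-only view while the backing map changes; returns true if that fails. */
    static boolean viewBreaksUnderModification() {
        Map<Long, List<String>> backing = new HashMap<>();
        backing.put(1L, new ArrayList<>(Arrays.asList("lockA")));
        backing.put(2L, new ArrayList<>(Arrays.asList("lockB")));
        // Read-only for the caller, but still a live view of 'backing'.
        Map<Long, List<String>> view = Collections.unmodifiableMap(backing);
        try {
            for (Map.Entry<Long, List<String>> entry : view.entrySet()) {
                // Stands in for another thread acquiring a read lock mid-dump.
                backing.put(3L, new ArrayList<>());
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** Copies the map and each list, so later changes to 'backing' are not visible. */
    static boolean snapshotIsIsolated() {
        Map<Long, List<String>> backing = new HashMap<>();
        backing.put(1L, new ArrayList<>(Arrays.asList("lockA")));
        Map<Long, List<String>> snapshot = new HashMap<>();
        for (Map.Entry<Long, List<String>> entry : backing.entrySet()) {
            snapshot.put(entry.getKey(), new ArrayList<>(entry.getValue()));
        }
        backing.get(1L).add("lockB");      // mutate a live list
        backing.put(2L, new ArrayList<>()); // mutate the live map
        return snapshot.size() == 1 && snapshot.get(1L).size() == 1;
    }

    public static void main(String[] args) {
        System.out.println(viewBreaksUnderModification()); // true
        System.out.println(snapshotIsIsolated());          // true
    }
}
```

A copy is only a consistent snapshot if it is taken while holding the same lock the writers use; the sketch assumes that the read lock manager's mutating methods synchronize on the same monitor as the getter.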
The following is what unbreakable code looks like:
/** Getter for {@link #mapThreadToReadLockAcquisitionMetadata}; returns a deep clone. */
public synchronized Map<Long, List<ReadLockAcquisitionMetadata>> getMapThreadToReadLockAcquisitionMetadata() {
    // We cannot simply return an unmodifiable view here, for two reasons:
    // 1. Code consuming the view might be surprised by changes to the map itself made by this
    //    read lock manager while threads keep acquiring and releasing read locks.
    // 2. The list values contained in the map can also change as threads acquire and
    //    release read locks.
    // Instead, we give ConcurrencyUtil a safe deep clone of the data structure for its
    // massive dumps.
    // (a) Create an independent result map that the caller can safely iterate over.
    Map<Long, List<ReadLockAcquisitionMetadata>> resultMap = new HashMap<>();
    // (b) Deep clone the data structure: copy each list so later changes to the originals
    //     are not visible (the ReadLockAcquisitionMetadata elements themselves are shared).
    for (Map.Entry<Long, List<ReadLockAcquisitionMetadata>> currentEntry : mapThreadToReadLockAcquisitionMetadata
            .entrySet()) {
        List<ReadLockAcquisitionMetadata> deepCopyOfCurrentListOfMetadata = new ArrayList<>(currentEntry.getValue());
        resultMap.put(currentEntry.getKey(), Collections.unmodifiableList(deepCopyOfCurrentListOfMetadata));
    }
    // (c) Even though the result map is a deep clone of our internal state, we still return it
    //     as unmodifiable so that callers are not under the illusion that they can change the
    //     state of the read lock manager from the outside. Any attempt to modify the returned
    //     clone throws an UnsupportedOperationException.
    return Collections.unmodifiableMap(resultMap);
}
The modification to the
I will further strengthen the code of both the read lock manager and ConcurrencyUtil with additional paranoia/safety-check code to make sure I never see a NullPointerException in this code area again. I will provide the patch later today.
In this post, I am presenting the second patch for this bug, which builds upon the first patch previously provided. The associated commit message for these changes is as follows:
0001-TRK-32315-Addressing-Concurrency-Issues-in-ReadLockM.patch, and the modified files themselves are as follows:
About the NPE on primaryKey:
I'm not sure about this approach, because we would silently accept an incorrect value.
Hi, Regarding the issue of Now, on the topic of the NPE working with data from Regarding this specific matter:
So, the situation regarding the read lock metadata is quite clear. As for updates, I am awaiting feedback from the project. They had been running for quite some time before encountering this issue, so it may take several weeks to receive confirmation that the situation has indeed improved. Thank you.
Let me summarize the types of NullPointerExceptions that our patch should be addressing. Pattern A:
Pattern B:
Pattern C:
I do not have any NullPointerException for the CacheKey hashCode method. Patterns B and C are most likely closely related to one another; it is the illusion of a null entry that I explained above. Thanks.
The project team tells me they are happy with the patched EclipseLink, at least for the time being.
Hello, at the beginning you mentioned
We’ve encountered a rare but critical bug in a project using Oracle-patched EclipseLink 2.7.6, which was modified to align with newer versions such as 2.7.9. This bug is associated with the CacheKey class in EclipseLink, which lacks robust equals and hashCode methods. The issue arises when CacheKey.key unexpectedly becomes null, which then leads to NullPointerExceptions.
Historically, we’ve addressed this bug in our internally maintained versions of EclipseLink, specifically versions 2.4.2, 2.6.4, 2.6.5, and 2.6.7. These versions, released with various WebLogic versions, have been patched to include concurrency manager improvements.
Currently, there’s a bugfix for these older versions of EclipseLink that isn’t present in any of the 2.7.X tags. This fix should likely be integrated into the official branches to prevent future occurrences.
Although this bug is infrequent, its potential impact necessitates proactive measures. Here’s a brief overview of the bug we’ve recently re-encountered.
The stack trace has been trimmed on purpose to cover only EclipseLink code and to hide any private information.
The application became unresponsive with all manner of exceptions in EclipseLink, but at the point of origin we could see exceptions with a stack trace like this:
The stack trace indicates that ReadLockManager.removeReadLock(ReadLockManager.java:97) encounters an unexpected scenario: somehow "null" can be added to the ReadLockManager. We do not know how this is possible, but facts are facts.
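As a hedged illustration of how such a null can turn into a NullPointerException during removal, consider java.util.ArrayList.remove(Object), which calls equals on the argument for each element it scans. The sketch below is hypothetical: UnsafeKey and removalBlowsUp are invented names, and whether ReadLockManager.removeReadLock removes cache keys from a List in this way is an assumption, not something shown in this issue.

```java
import java.util.ArrayList;
import java.util.List;

public class RemoveWithUnsafeEquals {

    /** Hypothetical key whose equals() dereferences its own key without a null check. */
    static final class UnsafeKey {
        Object key;

        UnsafeKey(Object key) {
            this.key = key;
        }

        @Override
        public boolean equals(Object other) {
            // Throws NullPointerException once this.key has become null.
            return other instanceof UnsafeKey && key.equals(((UnsafeKey) other).key);
        }

        @Override
        public int hashCode() {
            return key.hashCode(); // equally unsafe
        }
    }

    /** Returns true if removing a key whose field became null throws NullPointerException. */
    static boolean removalBlowsUp() {
        List<UnsafeKey> readLocks = new ArrayList<>();
        UnsafeKey held = new UnsafeKey(7L);
        readLocks.add(new UnsafeKey(1L));
        readLocks.add(held);
        held.key = null; // the "impossible" state reported in this issue
        try {
            // ArrayList.remove(Object) calls held.equals(element) for each element it scans.
            readLocks.remove(held);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(removalBlowsUp()); // true
    }
}
```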
Crucially, CacheKey's equals method is not safeguarded against NullPointerException. Specifically, the issue lies within this code segment:
https://github.com/eclipse-ee4j/eclipselink/blob/2.7.14/foundation/org.eclipse.persistence.core/src/org/eclipse/persistence/internal/identitymaps/CacheKey.java
We have patched the method to look like this:
Notice that the method above will not blow up if the input parameter key is null.
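The patched CacheKey.equals itself is in the attached CacheKey.java. Purely as an illustration of the same idea, here is a minimal sketch on a hypothetical SimpleCacheKey class (not the EclipseLink type): check identity first, then the type, then that both keys are non-null before delegating to the key's own equals. The matching hashCode uses java.util.Objects.hashCode, which returns 0 for null.

```java
import java.util.Objects;

public class NullSafeKeyEquals {

    /** Hypothetical stand-in for CacheKey; not the EclipseLink class. */
    static final class SimpleCacheKey {
        Object key; // mutable, like CacheKey.key, and assumed able to become null

        SimpleCacheKey(Object key) {
            this.key = key;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) {
                return true; // same instance, even if its key is null
            }
            if (!(other instanceof SimpleCacheKey)) {
                return false; // also covers other == null
            }
            SimpleCacheKey that = (SimpleCacheKey) other;
            if (this.key == null || that.key == null) {
                return false; // never dereference a null key
            }
            return this.key.equals(that.key);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(key); // 0 for a null key; consistent with equals above
        }
    }

    public static void main(String[] args) {
        SimpleCacheKey a = new SimpleCacheKey(42L);
        SimpleCacheKey b = new SimpleCacheKey(42L);
        SimpleCacheKey broken = new SimpleCacheKey(null);
        System.out.println(a.equals(b));           // true
        System.out.println(a.equals(broken));      // false
        System.out.println(broken.equals(a));      // false
        System.out.println(broken.equals(broken)); // true
        System.out.println(a.equals(null));        // false
    }
}
```

With this design, a corrupted key matches nothing except its own instance. Whether returning false is preferable to failing loudly is the same trade-off as the concern about silently accepting an incorrect value raised in this thread.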
The hashCode() method is another area that lacked safety. We have patched the implementation of hashCode() to address this issue. Previously, it was implemented as follows:
The code above is prone to failure if this.key is null. While the exact cause is unknown, we are certain that this scenario can occur. To enhance safety, we have revised the implementation to be more robust against null values. The new implementation is as follows:
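The revised hashCode is in the attached CacheKey.java. As a separate, hedged sketch on a hypothetical SimpleCacheKey class: the pre-patch shape is assumed here to be a direct key.hashCode() call, which throws once the key is null, while Objects.hashCode(key) returns 0 instead. The sketch also shows a side effect worth knowing about: if such a key were stored in a hash-based collection, nulling its key changes its hash, so the collection can no longer find it.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class NullSafeHashCodeDemo {

    /** Hypothetical stand-in for CacheKey; not the EclipseLink class. */
    static final class SimpleCacheKey {
        Object key; // mutable; assumed able to become null

        SimpleCacheKey(Object key) {
            this.key = key;
        }

        /** Assumed pre-patch shape: dereferences the key directly. */
        int unsafeHashCode() {
            return key.hashCode(); // NullPointerException when key is null
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(key); // 0 for a null key, never throws
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) {
                return true;
            }
            if (!(other instanceof SimpleCacheKey)) {
                return false;
            }
            return key != null && key.equals(((SimpleCacheKey) other).key);
        }
    }

    public static void main(String[] args) {
        SimpleCacheKey ck = new SimpleCacheKey(42L);
        Set<SimpleCacheKey> set = new HashSet<>();
        set.add(ck);
        ck.key = null;
        System.out.println(ck.hashCode());    // 0, no exception
        System.out.println(set.contains(ck)); // false: stored under the old hash (42)
        try {
            ck.unsafeHashCode();
        } catch (NullPointerException e) {
            System.out.println("unsafe variant throws NPE");
        }
    }
}
```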
We will provide an attachment containing the fully patched CacheKey object.
Additionally, we have made further enhancements to the metadata of the CacheKey object. Specifically, we have reintroduced a segment of code at the beginning of the class that was previously added to earlier versions of EclipseLink. This same code has now been incorporated into version 2.7.6 as follows:
The updated code introduces two new fields, primaryKeyAtTimeOfCreation and cacheKeyInstanceCreationNumber, along with associated metadata. This enhancement allows us to meticulously monitor the lifecycle of a cache key. Both fields are immutable: primaryKeyAtTimeOfCreation captures the original key of the cache key at the moment of its creation, and cacheKeyInstanceCreationNumber gives each object a unique instance creation number. This distinctive number enables us to determine definitively in the equals and hashCode methods whether two objects are the same instance.
Below, you can observe how the new immutable field primaryKeyAtTimeOfCreation is assigned during the object's construction phase:
In summary, we have fortified the class by implementing the following improvements:
Revised equals and hashCode Methods: We revisited the implementation of the equals and hashCode methods to ensure they behave as expected under normal circumstances and remain resilient when encountering null objects or keys.
Enhanced CacheKey Metadata: We have augmented the CacheKey metadata to better trace its challenging lifecycle and to identify the origins of corrupted cache keys in extensive dumps after the key has become null. To achieve this, we introduced new fields to the CacheKey that are established at the time of construction and are immutable. Specifically, these fields are:
protected final Object primaryKeyAtTimeOfCreation;
protected final long cacheKeyInstanceCreationNumber;
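The two immutable fields listed above, together with the getKeyEnsuredToBeNotNull() fallback described next, could be sketched roughly as follows. TrackedCacheKey is a hypothetical class, not the patched EclipseLink CacheKey, and handing out creation numbers from a static AtomicLong is an assumption about how a unique, thread-safe number could be generated.

```java
import java.util.concurrent.atomic.AtomicLong;

public class TrackedCacheKey {

    // Hypothetical global counter (assumption): one unique number per constructed instance.
    private static final AtomicLong INSTANCE_COUNTER = new AtomicLong();

    protected Object key; // mutable, may unexpectedly become null
    protected final Object primaryKeyAtTimeOfCreation;
    protected final long cacheKeyInstanceCreationNumber;

    public TrackedCacheKey(Object primaryKey) {
        this.key = primaryKey;
        // Both values are captured once at construction and never change afterwards.
        this.primaryKeyAtTimeOfCreation = primaryKey;
        this.cacheKeyInstanceCreationNumber = INSTANCE_COUNTER.incrementAndGet();
    }

    public Object getKey() { return key; }

    public void setKey(Object key) { this.key = key; }

    public Object getPrimaryKeyAtTimeOfCreation() { return primaryKeyAtTimeOfCreation; }

    public long getCacheKeyInstanceCreationNumber() { return cacheKeyInstanceCreationNumber; }

    /** Returns the current key, or the creation-time key if the current one has become null. */
    public Object getKeyEnsuredToBeNotNull() {
        return key != null ? key : getPrimaryKeyAtTimeOfCreation();
    }

    public static void main(String[] args) {
        TrackedCacheKey first = new TrackedCacheKey(100L);
        TrackedCacheKey second = new TrackedCacheKey(100L);
        first.setKey(null);
        System.out.println(first.getKeyEnsuredToBeNotNull()); // 100
        System.out.println(first.getCacheKeyInstanceCreationNumber()
                != second.getCacheKeyInstanceCreationNumber()); // true
    }
}
```

Because both fields are final and assigned in the constructor, a dump can still show which primary key a cache key started with, and which instance it is, after key has been set to null. Note that getKeyEnsuredToBeNotNull() can still return null if the key was already null at construction.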
Additionally, for the comprehensive dump analysis that reports on the status of the cache keys, we have added the method:
public Object getKeyEnsuredToBeNotNull()
which returns this.key if it is not null, or this.getPrimaryKeyAtTimeOfCreation() if this.key is null. This methodical approach ensures that the CacheKey class is robust and reliable throughout its lifecycle.
Our patching efforts extended beyond the CacheKey. In the past, we've encountered issues with massive dumps causing failures when executing the method:
This was due to the following line of code, which proved problematic:
To resolve this, we've implemented a safer alternative:
With these changes, the trace message now includes two additional pieces of metadata, cacheKeyInstanceCreationNumber and primaryKeyAtTimeOfCreation, enhancing the diagnostic capabilities as demonstrated in the code snippet above. This ensures a more robust and fail-safe approach to handling cache keys within our system.
In summary, we strongly recommend adopting our enhancements. We plan to test this patched version of EclipseLink 2.7.6 on the current project to determine whether their ongoing problems get resolved.
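The exact lines changed in ConcurrencyUtil and TraceLocalizationResource are in the attached files. As a hypothetical sketch of the general idea, a dump line can be rendered without dereferencing anything that may be null, while carrying the two extra metadata values. The helper name describe and the message layout are invented for illustration; the key fallback mirrors the getKeyEnsuredToBeNotNull() behaviour described above.

```java
import java.util.Locale;

public class CacheKeyDumpLine {

    /** Hypothetical helper: renders one dump line without dereferencing anything that may be null. */
    static String describe(Object key, Object primaryKeyAtTimeOfCreation,
                           long cacheKeyInstanceCreationNumber, Object cachedObject) {
        // Same fallback idea as getKeyEnsuredToBeNotNull(): prefer the live key.
        Object keyForDump = key != null ? key : primaryKeyAtTimeOfCreation;
        String objectType = cachedObject == null ? "null" : cachedObject.getClass().getName();
        // %s renders null as "null" instead of throwing; Locale.ROOT avoids locale-specific digits.
        return String.format(Locale.ROOT,
                "CacheKey [key=%s, currentKeyIsNull=%b, primaryKeyAtTimeOfCreation=%s,"
                        + " cacheKeyInstanceCreationNumber=%d, objectType=%s]",
                keyForDump, key == null, primaryKeyAtTimeOfCreation,
                cacheKeyInstanceCreationNumber, objectType);
    }

    public static void main(String[] args) {
        System.out.println(describe(null, 7L, 3L, "someEntity"));
        // CacheKey [key=7, currentKeyIsNull=true, primaryKeyAtTimeOfCreation=7, cacheKeyInstanceCreationNumber=3, objectType=java.lang.String]
    }
}
```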
In the uploaded TRK-32315_patchedCode.zip you will find our modified files ConcurrencyUtil.java, CacheKey.java and TraceLocalizationResource.java.
TRK-32315_patchedCode.zip
We will keep you posted on this issue, in particular when we make a breakthrough.