Reduce synchronization on field data cache #27365
Conversation
Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?
+@jasontedor for the double-checked locking solution.
test this please
LGTM.
The field data cache can come under heavy contention in cases when lots of search threads are hitting it for doc values. This commit reduces the amount of contention here by using a double-checked locking strategy to only lock when the cache needs to be initialized. Relates #27365
I am not convinced this is a good fix. From my perspective, double-checked locking doesn't apply in this case, since it assumes all objects inside the map are safely published, which is by no means guaranteed. You can apply this pattern if you read a volatile variable to prevent synchronization, but not to access a
Either we use a `ConcurrentHashMap` here or we synchronize the map. But this change is broken depending on the implementation of the
@s1monw this fix is
I completely agree with @s1monw here. This line `IndexFieldDataCache cache = fieldDataCaches.get(fieldName);` does not establish a happens-before edge with any other access to
While your fix might work with some map implementations, you have no guarantee that it does on all JVMs or spec implementations. It might work on all of them, but then only by accident. You are reading from a data structure that is not thread-safe and that is concurrently modified. You are basically using the single-writer multiple-reader principle, but this data structure is only designed for a single writer and a single reader. For instance, if the
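A minimal sketch of the pattern the reviewers object to (class and method names here are illustrative, not the actual Elasticsearch code): double-checked locking over a plain `HashMap` leaves the first read unsynchronized, so it races with concurrent writes and has no happens-before edge guaranteeing that the returned value, or the map's internal structure, is safely published.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only, not the actual Elasticsearch code.
// The unsynchronized cache.get(key) on the fast path races with the
// synchronized cache.put(key, value): a reader thread may observe a
// partially constructed entry or a resizing HashMap in a broken state.
class UnsafeDoubleCheckedCache {
    private final Map<String, Object> cache = new HashMap<>();

    Object get(String key) {
        Object value = cache.get(key);   // racy read of a non-thread-safe map
        if (value == null) {
            synchronized (this) {
                value = cache.get(key);  // re-check under the lock
                if (value == null) {
                    value = new Object();
                    cache.put(key, value);
                }
            }
        }
        return value;
    }
}
```

The single-threaded behavior looks correct, which is exactly why the bug is easy to miss: the failure mode only appears under concurrent reads and writes, and only on some JVMs and hardware memory models.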
I don't think that the issue is related to synchronization, but rather to a misuse of this cache by
@jimczi very good point ++
Good catch @s1monw. @tinder-xli The problem is again the
@jasontedor @s1monw OK, I think I got what you are saying. So it looks like using a `ConcurrentHashMap` will resolve this issue, right? Since it synchronizes on the entry before removing it, `get()` won't see a key which has already been removed from the map.
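A hedged sketch of that suggestion (hypothetical names, not the actual patch): `ConcurrentHashMap` allows lock-free reads that still see safely published values, because its internal writes establish the required happens-before edges, and `computeIfAbsent` runs the loader at most once per key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical sketch, not the actual Elasticsearch patch.
// Reads need no external lock: ConcurrentHashMap guarantees safe
// publication of values, and computeIfAbsent atomically creates a
// missing entry, so concurrent get/remove never expose torn state.
class FieldDataCacheSketch {
    private final ConcurrentMap<String, Object> caches = new ConcurrentHashMap<>();
    private final Function<String, Object> loader;

    FieldDataCacheSketch(Function<String, Object> loader) {
        this.loader = loader;
    }

    Object getForField(String fieldName) {
        // Fast path: plain get, no synchronized block needed.
        Object cache = caches.get(fieldName);
        if (cache == null) {
            // Slow path: atomic create-if-missing, run at most once per key.
            cache = caches.computeIfAbsent(fieldName, loader);
        }
        return cache;
    }

    void clearField(String fieldName) {
        caches.remove(fieldName);
    }
}
```

Compared with double-checked locking over a plain `HashMap`, this shifts the correctness burden onto a data structure that is specified for concurrent single-writer/multi-reader use, rather than relying on unspecified `HashMap` behavior.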
@tinder-xli I think I would rather see the impact of backporting #25644 first.
@jasontedor I don't see a "backport" label on that PR. Is that change going to be in 5.6.x too?
@tinder-xli It does not have the label because we have not decided to backport it. If we do backport it, it would be in 5.6.x.
also I see
It is called only if the document is in a new segment (different from the previous document), so at most it will be called N times, N being the number of segments in the shard. This is why we ensure that the documents are sorted by doc IDs just before the loop. In fact we could call
@tinder-xli You're missing this guard:

```java
// if the reader index has changed we need to get a new doc values reader instance
if (subReaderContext == null || hit.docId() >= subReaderContext.docBase + subReaderContext.reader().maxDoc()) {
    int readerIndex = ReaderUtil.subIndex(hit.docId(), context.searcher().getIndexReader().leaves());
    subReaderContext = context.searcher().getIndexReader().leaves().get(readerIndex);
    data = context.getForField(fieldType).load(subReaderContext);
    values = data.getScriptValues();
}
```

This helps give the semantics that @jimczi is describing.
I had a chat with @colings86 regarding the backport of #25644 to 5.6. |
During our perf test against Elasticsearch, we noticed two synchronized blocks in the call stack of fetching doc values, which I think are unnecessary and cause a serious GC issue for us, especially when the "size" param is large and we fetch docvalue fields.

There is a synchronized block for getting from the `fieldDataCaches` map, which is unnecessary when `cache != null`. And the `getForField` method is called for every hit, so blocking here impacts performance a lot. We suggest changing to double-checked locking and only synchronizing when `cache == null`. We see threads waiting on the `getForField` method in our JFR recording.

`gradle test` all passed locally. For reference, below is a sample query we used for testing:
This PR links to #27350
@jasontedor @s1monw