Ensure to access RecoveryState#fileDetails under lock #43839

Merged
merged 3 commits into from
Jul 3, 2019

Conversation

paulward24
Contributor

The field fileDetails (a HashMap, i.e., not thread safe)

https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/indices/recovery/RecoveryState.java#L679

is accessed only in synchronized methods (in about 20 locations), e.g.:

https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/indices/recovery/RecoveryState.java#L767-L768

That is, even calls to .size() are made under the lock.

This is correct because, according to the JDK documentation:

If multiple threads access a hash map concurrently, and at least
one of the threads modifies the map structurally, it must be
synchronized externally.

https://docs.oracle.com/javase/8/docs/api/java/util/HashMap.html

However, in the 21st location, here:

https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/indices/recovery/RecoveryState.java#L958

the method is not synchronized.

This CR simply adds the synchronized keyword to that method, as in all the other places.
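A minimal sketch of the pattern described above, not the actual RecoveryState code: the class name `FileIndexSketch` and its method names are hypothetical, chosen only for illustration. It shows a HashMap field whose every access, including plain reads such as size() and get(), goes through a synchronized method on the same object.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a class that guards a HashMap with its own monitor.
// This is NOT the real RecoveryState implementation.
class FileIndexSketch {
    // HashMap is not thread safe, so it is only touched inside synchronized methods.
    private final Map<String, Long> fileDetails = new HashMap<>();

    // Structural modification: must hold the lock.
    public synchronized void addFileDetail(String name, long length) {
        fileDetails.put(name, length);
    }

    // Even a simple size() read holds the lock, because another thread
    // may be modifying the map at the same time.
    public synchronized int totalFileCount() {
        return fileDetails.size();
    }

    // The read path that is easy to forget: without synchronized here,
    // get() could run concurrently with put() on another thread.
    public synchronized Long getFileLength(String name) {
        return fileDetails.get(name);
    }
}
```

Because all three methods lock the same object (`this`), a reader calling `getFileLength` can never observe the map in the middle of a `put`. Leaving `synchronized` off any one of them would reintroduce the unsynchronized access this PR removes.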

…0 locations, but not in 1 (the 21st) location.
@dnhatn
Member

dnhatn commented Jul 1, 2019

@paulward24 Can you please sign the CLA? Thank you for reporting and working on this issue.

@dnhatn dnhatn self-requested a review July 1, 2019 21:50
@dnhatn dnhatn added the :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. label Jul 1, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@dnhatn dnhatn added the >bug label Jul 1, 2019
@dnhatn dnhatn self-assigned this Jul 1, 2019
@paulward24
Contributor Author

I signed the CLA.

Thanks for pointing this out!

Member

@dnhatn dnhatn left a comment

LGTM

@dnhatn
Member

dnhatn commented Jul 1, 2019

@elasticmachine test this please

@dnhatn
Member

dnhatn commented Jul 1, 2019

@elasticmachine test this please

@original-brownbear
Member

Jenkins run elasticsearch-ci/1

@paulward24
Contributor Author

I see 2 tests failing.

I don't know why this would happen, but it seems unlikely that they fail because of this patch.

That is, the patch only synchronizes the get() of a HashMap; there is nothing to deadlock or break there.

I am not familiar enough with the internals of Elasticsearch to debug those two tests.

Nhat, can you please take a look?

Thanks!!

@original-brownbear
Member

I think the fix for the failures here is incoming in #43861

@original-brownbear
Member

original-brownbear commented Jul 2, 2019

@paulward24 could you merge master into your branch please so we can try building again? The test failure should be fixed in master now.

@dnhatn
Member

dnhatn commented Jul 2, 2019

@elasticmachine update branch

@paulward24
Contributor Author

I see this "@elasticmachine update branch" from Nhat.

Should I still do the merge?

@dnhatn
Member

dnhatn commented Jul 2, 2019

Yes, please merge master into your branch. Thank you!

@paulward24
Contributor Author

OK, I did. How do I update this CR?

@dnhatn
Member

dnhatn commented Jul 2, 2019

@elasticmachine test this please

@paulward24
Contributor Author

@elasticmachine test this please

@paulward24
Contributor Author

OK, I don't know why these tests fail.

They are unlikely to be related to the patch.

@original-brownbear
Member

Jenkins test this

(@paulward24 yeah, the failures are the result of a temporary infrastructure issue we were experiencing; let's see if it passes :))

@original-brownbear
Member

Jenkins run elasticsearch-ci/packaging-sample

@dnhatn dnhatn merged commit 8e413f8 into elastic:master Jul 3, 2019
@dnhatn
Member

dnhatn commented Jul 3, 2019

@paulward24 Thanks again for working on this :).

@dnhatn dnhatn changed the title HashMap is is not thread safe. Field fileDetails is synchronized in 20 locations, but not in 1 (the 21st) location. Ensure to access RecoveryState#fileDetails under lock Jul 3, 2019
@paulward24
Contributor Author

Thank you Nhat and Armin for all the hard work on this!

Labels
>bug :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. v7.3.0 v8.0.0-alpha1
5 participants