-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve security-crypto threadpool overflow handling #111369
Conversation
Pinging @elastic/es-security (Team:Security) |
Hi @gwbrown, I've created a changelog YAML for you. |
}, listener::onFailure)); | ||
}, exception -> { | ||
// Crypto threadpool queue is full, invalidate this cache entry and make sure nothing is going to wait on it | ||
logger.warn( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned that this might cause log spam, but I don't want to pull it all the way back to debug because I think it's good to be able to easily tell when this is happening.
x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/ApiKeyService.java
Outdated
Show resolved
Hide resolved
...plugin/security/src/test/java/org/elasticsearch/xpack/security/authc/ApiKeyServiceTests.java
Outdated
Show resolved
Hide resolved
...plugin/security/src/test/java/org/elasticsearch/xpack/security/authc/ApiKeyServiceTests.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM good job, nice catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, nice job finding this bug
Prior to this PR, when the security-crypto threadpool queue overflows and rejects API key hashing submissions, a toxic value (specifically, a future which will never be completed) is added to the API key auth cache. This toxic cache value causes future authentication attempts with that API key to fail by timeout, because they will attempt to wait for the toxic future, until that value is invalidated and removed from the cache. Additionally, this will hold on to memory for each request that waits on the toxic future, even after the request has timed out. This PR adds a unit test to replicate this case, and adjusts the code which submits the key hashing task to the security-crypto threadpool to properly handle this point of failure by invalidating the cached future and notifying waiting handlers that the computation has failed.
💔 Backport failed
You can use sqren/backport to manually backport by running |
Prior to this PR, when the security-crypto threadpool queue overflows and rejects API key hashing submissions, a toxic value (specifically, a future which will never be completed) is added to the API key auth cache. This toxic cache value causes future authentication attempts with that API key to fail by timeout, because they will attempt to wait for the toxic future, until that value is invalidated and removed from the cache. Additionally, this will hold on to memory for each request that waits on the toxic future, even after the request has timed out. This PR adds a unit test to replicate this case, and adjusts the code which submits the key hashing task to the security-crypto threadpool to properly handle this point of failure by invalidating the cached future and notifying waiting handlers that the computation has failed.
Prior to this PR, when the security-crypto threadpool queue overflows and rejects API key hashing submissions, a toxic value (specifically, a future which will never be completed) is added to the API key auth cache. This toxic cache value causes future authentication attempts with that API key to fail by timeout, because they will attempt to wait for the toxic future, until that value is invalidated and removed from the cache. Additionally, this will hold on to memory for each request that waits on the toxic future, even after the request has timed out. This PR adds a unit test to replicate this case, and adjusts the code which submits the key hashing task to the security-crypto threadpool to properly handle this point of failure by invalidating the cached future and notifying waiting handlers that the computation has failed.
Prior to this PR, when the security-crypto threadpool queue overflows and rejects API key hashing submissions, a toxic value (specifically, a future which will never be completed) is added to the API key auth cache. This toxic cache value causes future authentication attempts with that API key to fail by timeout, because they will attempt to wait for the toxic future, until that value is invalidated and removed from the cache. Additionally, this will hold on to memory for each request that waits on the toxic future, even after the request has timed out. This PR adds a unit test to replicate this case, and adjusts the code which submits the key hashing task to the security-crypto threadpool to properly handle this point of failure by invalidating the cached future and notifying waiting handlers that the computation has failed.
Prior to this PR, when the security-crypto threadpool queue overflows and rejects API key hashing submissions, a toxic value (specifically, a future which will never be completed) is added to the API key auth cache. This toxic cache value causes future authentication attempts with that API key to fail by timeout, because they will attempt to wait for the toxic future, until that value is invalidated and removed from the cache. Additionally, this will hold on to memory for each request that waits on the toxic future, even after the request has timed out. This PR adds a unit test to replicate this case, and adjusts the code which submits the key hashing task to the security-crypto threadpool to properly handle this point of failure by invalidating the cached future and notifying waiting handlers that the computation has failed.
Prior to this PR, when the security-crypto threadpool queue overflows and rejects API key hashing submissions, a toxic value (specifically, a future which will never be completed) is added to the API key auth cache. This toxic cache value causes future authentication attempts with that API key to fail by timeout, because they will attempt to wait for the toxic future, until that value is invalidated and removed from the cache. Additionally, this will hold on to memory for each request that waits on the toxic future, even after the request has timed out.
This PR adds a unit test to replicate this case, and adjusts the code which submits the key hashing task to the security-crypto threadpool to properly handle this point of failure by invalidating the cached future and notifying waiting handlers that the computation has failed.