Create an ES client per simulated client instead of per worker. #1516
This is a prerequisite for #1067 and also indirectly relates to #1350.
As discussed in the context of API keys, runners shouldn't have to concern themselves with ES transport options; they should just be provided ES client objects that already have the right options set. This PR is the first step in that direction.
Previously, an ES client was created per worker and shared among all of the worker's simulated clients. Now, each worker will instead create distinct ES client objects per simulated client that the worker is responsible for. This unique ES client object is what ultimately gets passed to the runner(s) invoked by that simulated client.
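Here is a minimal sketch of the per-simulated-client arrangement described above. It makes several assumptions. `FakeEsClient`, `create_es_client`, `Worker` and `runner` are hypothetical names, not Rally's code. The per-client `api_key` value is an assumed example of a client-specific setting. The worker loosely plays the role of Rally's `AsyncIoAdapter`, and each coroutine passed to `asyncio.gather` stands in for one simulated client (`AsyncExecutor`).

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeEsClient:
    """Hypothetical stand-in for an ES client (not Rally's or elasticsearch-py's)."""

    hosts: list
    transport_options: dict = field(default_factory=dict)


def create_es_client(hosts, client_id):
    # Hypothetical factory: client-specific settings (here, an assumed per-client
    # API key) are baked in when the client object is constructed.
    return FakeEsClient(hosts=hosts, transport_options={"api_key": f"key-for-client-{client_id}"})


async def runner(es, params):
    # The runner just uses the client it's handed; it never builds transport options.
    return es.transport_options["api_key"]


class Worker:
    """Loosely plays the role of Rally's AsyncIoAdapter in this sketch."""

    def __init__(self, hosts, client_ids):
        # Previously: one client shared by every simulated client of the worker.
        # Now: one distinct client object per client_id.
        self.es_clients = {cid: create_es_client(hosts, cid) for cid in client_ids}

    async def run(self):
        # Each coroutine stands in for one simulated client (AsyncExecutor),
        # and each one receives only its own ES client.
        return await asyncio.gather(
            *(runner(self.es_clients[cid], params={}) for cid in sorted(self.es_clients))
        )


worker = Worker(hosts=["localhost:9200"], client_ids=[0, 1, 2])
print(asyncio.run(worker.run()))  # ['key-for-client-0', 'key-for-client-1', 'key-for-client-2']
```

Per-client settings are fixed when each client is constructed. As a result, the sketch's runner takes only `(es, params)` and never touches transport options.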
In lower-level terms, a worker is basically represented by an `AsyncIoAdapter` object, while a client basically maps to an `AsyncExecutor` object. An `AsyncIoAdapter` will now create an ES client per `client_id` that it's responsible for, and pass it to the `AsyncExecutor` that's instantiated for that `client_id`. The `AsyncExecutor` ultimately invokes the runner(s) that issue requests via the ES client.

This isn't the only possible implementation, though. As the discussion linked above details, version `8.0`+ of the ES Python client offers a new `.options()` method on clients for overriding transport options on a per-request basis. It doesn't mutate the client, but rather returns a new client object with the provided transport options set.

We originally planned on using this new API, either by upgrading our client dependency (which we've paused for now) or by implementing `.options()` ourselves in Rally today. However, after getting into the details more, it seems that `.options()` isn't strictly necessary right now, and may actually make implementation more difficult and risky. Here's why:

1. While the `.options()` method is meant for request-level overrides, creating clients with transport options defined in the first place is still the normal way of doing things. "Overriding default transport settings at request time" is one way to conceptualize what we're trying to do here, but it's still perfectly valid for workers to just instantiate ES clients upfront with the transport options they already know they'll need. Now, at a lower level, individual runners may have a genuine need to override those options (e.g. setting a `request_timeout` or `ignore_status`), but that's not necessary at the level of simulated clients in order to support something like client-specific authentication. The downside could be the overhead of creating potentially many more ES client objects, but...
2. ...`.options()` will do that anyway. As mentioned, `.options()` doesn't mutate the object it's called on, but rather creates a new object of its type. And that leads to the real trickiness: Rally's async ES client objects aren't just vanilla instances of `AsyncElasticsearch`. They also inherit from `RequestContextHolder` and are therefore stateful. My initial experiments with porting `.options()` to Rally quickly resulted in benchmarks hanging forever. I have a sense of why this is happening, but in light of (1) above, it doesn't seem worth the time or risk to pursue, at least not right now. We will need to figure this out eventually for the client upgrade, however, since `.options()` is how individual runners will need to set options like `request_timeout` and `ignore_status`, as mentioned.

So, to sum up, implementing `.options()` ourselves doesn't seem to offer obvious advantages at the moment, and making it work would probably require refactoring the code that Rally relies on for measuring timings and scheduling task execution, which is risky to say the least.

Regarding the potential overhead of creating more clients, I'm curious to hear input on this from the likes of @DJRickyB and @dliappis. If we merge as-is, we'll quickly get some feedback from the nightlies, but I'd think that if this is going to cause problems, it will be with very high client counts. I ran a logging indexing benchmark with 1,000 bulk indexing clients with both `master` and this branch of Rally and found no evidence of this change causing a client-side bottleneck. It's just one data point, so I'm happy to flesh out a larger number of test cases if we feel that's in order, but wanted to get the team's thoughts before investing a bunch of time in that.
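The toy sketch below makes the `.options()` semantics discussed above more concrete. `ToyClient` and `StatefulToyClient` are hypothetical stand-ins, not elasticsearch-py or Rally classes. Implementing `options()` with `copy.copy` is an assumption chosen for illustration. The sketch shows the contract described above: you get a new object with merged options, and the original stays untouched. It also shows one way per-request state on a subclass complicates things, because a shallow copy silently shares mutable state with the original. It does not attempt to reproduce the hang mentioned above.

```python
import copy


class ToyClient:
    """Hypothetical stand-in for an ES client; not elasticsearch-py's API."""

    def __init__(self, **transport_options):
        self.transport_options = dict(transport_options)

    def options(self, **overrides):
        # Assumed implementation: shallow-copy the client, then give the copy
        # its own merged options dict. The original object is left unmodified.
        clone = copy.copy(self)
        clone.transport_options = {**self.transport_options, **overrides}
        return clone


class StatefulToyClient(ToyClient):
    """Hypothetical client that also records per-request state on itself."""

    def __init__(self, **transport_options):
        super().__init__(**transport_options)
        self.request_log = []  # mutable state, e.g. timing bookkeeping

    def perform_request(self, name):
        self.request_log.append((name, self.transport_options.get("request_timeout")))


base = StatefulToyClient(request_timeout=10)
tuned = base.options(request_timeout=60, ignore_status=404)

print(base.transport_options)   # {'request_timeout': 10}
print(tuned.transport_options)  # {'request_timeout': 60, 'ignore_status': 404}

# The shallow copy shares the same request_log list, so a request made through
# `tuned` also shows up on `base`: the two objects are not independent.
tuned.perform_request("search")
print(base.request_log)         # [('search', 60)]
```

A deep copy or a fresh instance would avoid the sharing. The cost is that any state that *should* stay shared would be lost, such as bookkeeping the caller reads back from the original object. Either way, the choice has to be made deliberately for a stateful client class.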