Benchmarks: Re-factored benchmark infra #7602

Closed. aleph-zero wants to merge 17 commits from the feature/bench branch.
Conversation

aleph-zero (Contributor) commented:

Major re-factoring to use a dual-channel strategy for executing
benchmarks: cluster metadata manages lifecycle events, while the
transport channel carries benchmark definitions and results between
master and executor nodes. (A rough sketch follows the list below.)

This commit also includes:

  • Wildcard requests for pause/resume/abort.
  • A fix to the transport request ACTION string to conform to security
    naming conventions.
  • A fix for a bug that incorrectly calculated total requested iterations
    when summary computation was called more than once.
  • A rename of the field 'total_completed_queries' for better readability.
  • Clearing of the slowest-request values when re-calculating summary
    results.
  • Simplified use of barriers and semaphores in the testing logic, allowing
    benchmark execution to be suspended to test various states.
  • Handling of cases where executor nodes drop from the cluster during
    execution.
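As a rough illustration of the dual-channel split described above (a sketch
only; every name below is hypothetical, not one of the PR's actual classes):

import java.util.List;

// Channel 1: a small lifecycle entry published through cluster metadata,
// so every node observes state transitions.
enum BenchmarkLifecycle { INITIALIZING, RUNNING, PAUSED, COMPLETED, ABORTED, FAILED }

final class BenchmarkLifecycleEntry {               // carried in cluster state
    final String benchmarkId;
    final BenchmarkLifecycle state;

    BenchmarkLifecycleEntry(String benchmarkId, BenchmarkLifecycle state) {
        this.benchmarkId = benchmarkId;
        this.state = state;
    }
}

// Channel 2: the bulky payload (benchmark definition, per-node results)
// sent point-to-point over the transport layer between the master and
// the executor nodes.
final class BenchmarkDefinitionRequest {
    final String benchmarkId;
    final List<String> competitorDefinitions;       // serialized query definitions

    BenchmarkDefinitionRequest(String benchmarkId, List<String> competitorDefinitions) {
        this.benchmarkId = benchmarkId;
        this.competitorDefinitions = competitorDefinitions;
    }
}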

Andrew Selden added 2 commits September 4, 2014 15:00
- Major re-factoring to use a dual-channel strategy for executing
  benchmarks (full commit message as in the PR description above).
- Re-construct benchmark search requests to use the request headers and
  context from the originating request, to ensure we execute with the
  proper credentials.
}

return newSearchRequests;
}
aleph-zero (author) commented:
@uboness @javanna Could you comment as to whether this is the correct way to pass the original request headers and context through to sub-searches?
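For context, a hedged sketch of the pattern being asked about; the copy
helper below is an assumption about the 1.x request API, not a verified
method name, and `searchRequests`/`originatingRequest` are assumed in scope:

// Rebuild each search request so sub-searches inherit the headers and
// context of the originating benchmark request; otherwise they would
// execute without the caller's credentials.
List<SearchRequest> newSearchRequests = new ArrayList<>();
for (SearchRequest original : searchRequests) {
    SearchRequest copy = new SearchRequest(original.indices());
    copy.source(original.source());
    copy.copyContextAndHeadersFrom(originatingRequest);   // assumed helper
    newSearchRequests.add(copy);
}
return newSearchRequests;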

Andrew Selden added 5 commits September 4, 2014 16:53
- Minor refactoring in response to PR criticism: subclass AbstractComponent
  instead of AbstractLifecycleComponent, and move a utility method out of
  AbstractBenchmarkService.
- Make the mapping of competitor names to semaphores an immutable map,
  since it never changes.
- Changed the comment about propagating headers and context to be more
  general.
- Added explanatory comments for lock usage in BenchmarkExecutor, plus
  minor readability improvements to conditional logic.
- When computing the slowest requests, use the average time instead of the
  maximum time.

final InternalCoordinatorState ics = new InternalCoordinatorState(request, nodeIds, listener);

ics.onReady = new OnReadyStateChangeListener(ics);
Contributor commented:
I had a really hard time understanding all these *StateChangeListeners, and
I think they should all be folded into the InternalCoordinatorState. It
looks a lot like C, where you pass a struct into functions to work around
the lack of OO concepts. I think it complicates this class a lot; can you
please move it out?

I also don't understand the synchronisation on the ChangeListeners, since
they should not be accessed concurrently. If you want to protect the state,
shouldn't you synchronize on the state instead?
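Roughly the shape being suggested, as a sketch with hypothetical names:

// Fold the transition logic into the per-benchmark state object itself,
// rather than passing the state into free-standing *StateChangeListener
// objects; any synchronization then naturally lives on the state.
final class State {
    private final BenchmarkStartRequest request;    // hypothetical request type

    State(BenchmarkStartRequest request) {
        this.request = request;
    }

    synchronized void onReady() {
        // INITIALIZING -> RUNNING: publish the transition via cluster metadata
    }

    synchronized void onComplete() {
        // RUNNING -> COMPLETED: collect node results and notify the listener
    }
}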

aleph-zero (author) replied:
Latest commit changes this pattern per your comments. Would you take a look and let me know if the changes are in line with your thinking?

Andrew Selden added 4 commits September 18, 2014 16:13
- Moved *StateChangeListener methods into the per-benchmark State class;
  renamed InternalCoordinatorState to State.
- Added BatchedResponder<T> for handling wildcard responses.
- Moved the log() method to toString() in BenchmarkMetaData; made the State
  class private, not protected; replaced a big if block with an
  EnumSet.contains() (see the sketch after this list).
- Simplified concurrency controls for BenchmarkState: removed locks from
  BenchmarkExecutor and BenchmarkState and added concurrency control to
  BenchmarkExecutorService.
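The EnumSet change amounts to something like the following fragment, using
the NodeState constants that appear later in this thread (a sketch, with
java.util.EnumSet assumed imported):

// One terminal-state set replaces a chain of equality checks.
private static final EnumSet<BenchmarkMetaData.Entry.NodeState> TERMINAL_STATES =
        EnumSet.of(BenchmarkMetaData.Entry.NodeState.COMPLETED,
                   BenchmarkMetaData.Entry.NodeState.ABORTED,
                   BenchmarkMetaData.Entry.NodeState.FAILED);

if (TERMINAL_STATES.contains(entry.nodeStateMap().get(nodeId()))) {
    break;
}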
"documentation": "http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/search-benchmark.html",
"methods": ["PUT"],
"url": {
"path": "/_bench",
"path": "/_bench/_submit",
Contributor commented:
Since interfaces changed, it would be great to update docs as well.

aleph-zero (author) replied:
Fixed.


final int numMeasurements = settings.multiplier() * searchRequests.size();
final long[] timeBuckets = new long[numMeasurements];
final long[] docBuckets = new long[numMeasurements];

Contributor commented:
It feels like timeBuckets and docBuckets can be combined into some sort of stats class and a lot of bucket manipulation code sprinkled through BenchmarkExecutor can be nicely encapsulated there.

aleph-zero (author) replied:
Moved these into a Measurements class and cleaned up the code to not pass around arrays.
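A plausible shape for that class (hypothetical sketch; the PR's actual
Measurements API may differ):

// Encapsulates the former long[] timeBuckets / long[] docBuckets pair so
// callers record and summarize measurements without touching raw arrays.
final class Measurements {
    private final long[] times;   // per-iteration latency, in milliseconds
    private final long[] docs;    // per-iteration hit counts

    Measurements(int numMeasurements) {
        this.times = new long[numMeasurements];
        this.docs = new long[numMeasurements];
    }

    void record(int iteration, long tookMillis, long totalHits) {
        times[iteration] = tookMillis;
        docs[iteration] = totalHits;
    }

    long totalTimeMillis() {
        long sum = 0;
        for (long t : times) {
            sum += t;
        }
        return sum;
    }
}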

When all the executor nodes for a running benchmark drop out of the
cluster, we need to force the benchmark to fail and return to the
caller.

Also fixes a bug that could sometimes cause listener.onResponse() to be
called twice.
imotov (Contributor) commented Sep 24, 2014:

A slight variation of the same failing scenario:

1. Start a node with -Des.node.bench=false as a master on port 9200.
2. Start a node with -Des.node.bench=true as a master on port 9201.
3. Start a benchmark by sending it to port 9201:

curl -XPUT 'localhost:9201/_bench/_submit?pretty=true' -d '{
    "name": "my_benchmark",
    "competitors": [ {
        "name": "my_competitor",
        "requests": [ {
            "query": {
                "match": { "_all": "a*" }
            }
        } ]
    } ]
}'

4. While this benchmark is running, kill the node running on port 9200
   using kill -9.
5. Run curl 'localhost:9201/_bench/_status?pretty' and notice that
   my_benchmark is now stuck with COMPLETED status in the cluster state.
6. Run curl -XPOST 'localhost:9201/_bench/_abort/my_benchmark?pretty'. It
   never returns, and in the log files on the server you can see a message
   like this:
org.elasticsearch.ElasticsearchIllegalStateException: benchmark [my_benchmark]: missing internal state
    at org.elasticsearch.action.benchmark.BenchmarkCoordinatorService$5.onResponse(BenchmarkCoordinatorService.java:355)
    at org.elasticsearch.action.benchmark.BenchmarkCoordinatorService$5.onResponse(BenchmarkCoordinatorService.java:341)
    at org.elasticsearch.action.benchmark.BenchmarkStateManager$UpdateTask.clusterStateProcessed(BenchmarkStateManager.java:172)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:466)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:153)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)

builder.field(Fields.TOTAL_QUERIES, totalQueries);
builder.field(Fields.CONCURRENCY, concurrency);
builder.field(Fields.MULTIPLIER, multiplier);
builder.field(Fields.AVG_WARMUP_TIME, avgWarmupTime);
builder.field(Fields.AVG_WARMUP_TIME, Double.valueOf(formatter.format(avgWarmupTime)));
Contributor commented:
Would it make more sense to allow clients to reduce precision if needed? I realize that the json output looks much more readable this way, but it just seems strange and arbitrary to reduce precision to 2 decimal points in the serialization code.
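One alternative that keeps full precision on the wire, sketched below; the
extra field name and the reuse of the ?human flag convention are
assumptions, not the PR's actual behavior:

// Always emit the raw double so clients keep full precision; add a
// rounded, human-readable sibling field only when explicitly requested.
builder.field(Fields.AVG_WARMUP_TIME, avgWarmupTime);
if (params.paramAsBoolean("human", false)) {
    builder.field("avg_warmup_time_pretty", formatter.format(avgWarmupTime));
}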

Andrew Selden added 3 commits September 24, 2014 16:22
- Refactored state management in the executor; all state is now managed in
  the executor service class.
- Added a Measurements class instead of passing arrays of long around
  everywhere.
- Moved all state management into the State class; previously it was
  scattered throughout the executor service.
if (entry.nodeStateMap().get(nodeId()) == BenchmarkMetaData.Entry.NodeState.COMPLETED &&
entry.nodeStateMap().get(nodeId()) == BenchmarkMetaData.Entry.NodeState.ABORTED &&
entry.nodeStateMap().get(nodeId()) == BenchmarkMetaData.Entry.NodeState.FAILED) {
break;
Contributor commented:
Is this possible?
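As written, the condition can never be true, since a node's state is a
single value; presumably a disjunction was intended, along these lines:

// A node's state is one value, so requiring it to equal COMPLETED,
// ABORTED, and FAILED simultaneously always fails. The intended check
// is presumably:
BenchmarkMetaData.Entry.NodeState state = entry.nodeStateMap().get(nodeId());
if (state == BenchmarkMetaData.Entry.NodeState.COMPLETED ||
    state == BenchmarkMetaData.Entry.NodeState.ABORTED ||
    state == BenchmarkMetaData.Entry.NodeState.FAILED) {
    break;
}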

Andrew Selden added 2 commits October 7, 2014 13:11
- Changed API endpoints to not use '_' in the path twice. The old endpoint
  format was '/_bench/_{action}'; the new format is '/_bench/{action}'.
- Detect cases where the master node changed. In such cases we abort the
  benchmark, clean out the cluster state, and send a failure back to the
  caller.
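A minimal sketch of such detection, assuming the 1.x ClusterStateListener
API; the PR's actual hook may differ:

// Watch cluster-state changes: if the elected master changed while a
// benchmark is running, abort it, remove its cluster-state entry, and
// fail the caller's listener.
clusterService.add(new ClusterStateListener() {
    @Override
    public void clusterChanged(ClusterChangedEvent event) {
        if (event.nodesDelta().masterNodeChanged()) {
            // abort running benchmarks, clean out their metadata entries,
            // and send a failure back to the original caller
        }
    }
});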
javanna (Member) commented Mar 20, 2015:

Waiting for #6914, marked as stalled.

jpountz (Contributor) commented Sep 11, 2015:

This PR is already hard to merge due to recent changes, and this won't get better until #6914 is in. Closing: we will have to do things from scratch again anyway.

jpountz closed this Sep 11, 2015.
jasontedor deleted the feature/bench branch August 12, 2016 11:15.