
Concurrent Searching (Experimental) #1500

Merged
merged 2 commits into opensearch-project:main on Mar 24, 2022

Conversation

reta
Collaborator

@reta reta commented Nov 3, 2021

Signed-off-by: Andriy Redko [email protected]

Description

Allows the use of the experimental Apache Lucene low-level API that parallelizes execution of the search across segments.
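
For context, here is a minimal sketch of that Lucene API, not this PR's code: the directory, query, result sizes, and thread pool sizing are illustrative assumptions.

    import java.io.IOException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.search.CollectorManager;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.MatchAllDocsQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.search.TopScoreDocCollector;
    import org.apache.lucene.store.Directory;

    static TopDocs searchConcurrently(Directory directory) throws IOException {
        // Sized to the hardware purely for illustration.
        ExecutorService executor = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try (DirectoryReader reader = DirectoryReader.open(directory)) {
            // Passing an Executor lets the searcher process segment slices in parallel.
            IndexSearcher searcher = new IndexSearcher(reader, executor);
            // A CollectorManager creates one collector per slice and reduces the
            // partial per-slice results (here, TopDocs) into a single merged result.
            CollectorManager<TopScoreDocCollector, TopDocs> manager =
                TopScoreDocCollector.createSharedManager(10, null, Integer.MAX_VALUE);
            return searcher.search(new MatchAllDocsQuery(), manager);
        } finally {
            executor.shutdown();
        }
    }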

  • Implement multi-collectors query context (possibly needs an API change to use CollectorManagers instead)

  • Implement profilers support

    The query profiling is supported, but it has some limitations and differences with respect to naming. The profilers capture the time each collection type has taken and return it as a cumulative summary. The fact that segments may have been searched concurrently is not reflected (that would require a significant overhaul of the profiling implementation), so the "real" perceived time may be smaller than the profilers report. The collector names are replaced with collector manager names (e.g., MinimumScoreCollector vs MinimumCollectorManager).

    To compare, here is the usual search profile

    "collector" : [
       {
         "name" : "MinimumScoreCollector",
         "reason" : "search_min_score",
         "time_in_nanos" : 4702942,
         "children" : [
           {
             "name" : "SimpleTopScoreDocCollector",
             "reason" : "search_top_hits",
             "time_in_nanos" : 3450252
           }
         ]
       }
     ]
    

    vs the one where concurrent segment search is enabled

    "collector" : [
      {
        "name" : "MinimumCollectorManager",
        "reason" : "search_min_score",
        "time_in_nanos" : 74827,
        "children" : [
          {
            "name" : "SimpleTopDocsCollectorManager",
            "reason" : "search_top_hits",
            "time_in_nanos" : 98044
          }
        ]
      }
    ]
    

    Additionally, since TopDocs are merged, the query's shardIndex field is also populated; the usual search profile has it set to -1, for example:

    "query" : [
      {
        "type" : "ConstantScoreQuery",
        "description" : "ConstantScore(SearchAfterSortedDocQuery(sort=<int: \"rank\">, afterDoc=doc=287 score=NaN shardIndex=-1 fields=[-2062713761]))",
        "time_in_nanos" : 1009712,
        ...
      }
    ]
    

    vs the one where concurrent segment search is enabled

    "query" : [
      {
        "type" : "ConstantScoreQuery",
        "description" : "ConstantScore(SearchAfterSortedDocQuery(sort=<int: \"rank\">, afterDoc=doc=287 score=NaN shardIndex=1 fields=[-2062713761]))",
        "time_in_nanos" : 35667,
        ...
      }
    ]
    
  • Implement early termination and timeout support

    The forced early termination is simulated (without abrupt search termination) using Apache Lucene's normal collection termination mechanism. The number of results is reduced to the requested count (if necessary) in a post-search step. Essentially, when early termination is requested and concurrent segment search is enabled, the engine is going to do more work and (very likely) come up with more search hits than were asked for; the surplus is trimmed afterwards (see the sketch after this list).

    If the query is terminated by a timeout, no results are returned, even if returning partial results is acceptable. The reason is that the exception propagation mechanism does not give the reducers a chance to execute.

  • Add benchmarks suite
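
To illustrate the post-search trimming described under the early-termination item above, here is a minimal sketch; the method name is hypothetical and this is not the PR's actual code.

    import java.util.Arrays;

    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;

    // The concurrent path may collect more hits than requested because each segment
    // slice terminates its collection independently; cut the merged TopDocs back to
    // the requested size once all slices have been reduced.
    static TopDocs trimToRequestedSize(TopDocs merged, int requestedSize) {
        if (merged.scoreDocs.length <= requestedSize) {
            return merged;
        }
        ScoreDoc[] trimmed = Arrays.copyOf(merged.scoreDocs, requestedSize);
        return new TopDocs(merged.totalHits, trimmed);
    }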

Splitting into subtasks:

Future Enhancements:

Issues Resolved

Fixes #1286

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@opensearch-ci-bot
Collaborator

Can one of the admins verify this patch?

@opensearch-ci-bot
Collaborator

✅   Gradle Wrapper Validation success 8a4cda05609f86455b7d70be9220234e7af82f7c

@reta
Collaborator Author

reta commented Nov 3, 2021

@anasalkouz still work in progress, but a large chunk of the work has been done already; it would be great to hear your thoughts, thank you.

@opensearch-ci-bot
Collaborator

✅   Gradle Precommit success 8a4cda05609f86455b7d70be9220234e7af82f7c

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure 8a4cda05609f86455b7d70be9220234e7af82f7c
Log 969

Reports 969

@opensearch-ci-bot
Collaborator

✅   Gradle Wrapper Validation success 7119aefcf48eba8c261cce88c86785047d51799b

@opensearch-ci-bot
Collaborator

✅   Gradle Precommit success 7119aefcf48eba8c261cce88c86785047d51799b

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure 7119aefcf48eba8c261cce88c86785047d51799b
Log 970

Reports 970

@opensearch-ci-bot
Collaborator

✅   Gradle Wrapper Validation success 2afb9feabf4377a7e9e7c9fbe99c670f8d426b5b

@opensearch-ci-bot
Collaborator

✅   Gradle Precommit success 2afb9feabf4377a7e9e7c9fbe99c670f8d426b5b

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure 2afb9feabf4377a7e9e7c9fbe99c670f8d426b5b
Log 971

Reports 971

@opensearch-ci-bot
Collaborator

✅   Gradle Wrapper Validation success e149955cd0fc472e3a69a317401b8916b6016317

@opensearch-ci-bot
Collaborator

❌   Gradle Precommit failure e149955cd0fc472e3a69a317401b8916b6016317
Log 1512

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure e149955cd0fc472e3a69a317401b8916b6016317
Log 981

Reports 981

@opensearch-ci-bot
Collaborator

✅   Gradle Wrapper Validation success 0445cca10fe47c28cbb7e0f951eebded76df6fdd

@opensearch-ci-bot
Collaborator

✅   Gradle Precommit success 0445cca10fe47c28cbb7e0f951eebded76df6fdd

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure 0445cca10fe47c28cbb7e0f951eebded76df6fdd
Log 982

Reports 982

@opensearch-ci-bot
Collaborator

✅   Gradle Wrapper Validation success e0dbd38fdfa94699ca5ff66df25bfd42d8664098

@opensearch-ci-bot
Collaborator

✅   Gradle Precommit success e0dbd38fdfa94699ca5ff66df25bfd42d8664098

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure e0dbd38fdfa94699ca5ff66df25bfd42d8664098
Log 984

Reports 984

@opensearch-ci-bot
Collaborator

✅   Gradle Wrapper Validation success c8f34e7d23959f775e8e050bb18a8d941daa175f

@opensearch-ci-bot
Collaborator

✅   Gradle Precommit success c8f34e7d23959f775e8e050bb18a8d941daa175f

@opensearch-ci-bot
Collaborator

✅   Gradle Check success c8f34e7d23959f775e8e050bb18a8d941daa175f
Log 995

Reports 995

@reta
Collaborator Author

reta commented Nov 9, 2021

Preliminary macro benchmarks:

Using Rally with the pmc track:

rally race --track pmc --target-hosts localhost:9200 --pipeline benchmark-only --track-params="force_merge_max_num_segments:20"   --kill-running-processes

Note: the desired throughput for the search tasks (default, term, phrase) has been increased from 20 ops/s to 50 ops/s. The benchmarks were run on the same node as the OpenSearch server.

OpenSearch configuration: 1 node, -Xmx4g, all other JVM options kept unchanged

Hardware: i7 10th gen (12 cores), Ubuntu 21.10

Side-by-side comparison

| Metric | Task | #1 seq | #2 seq | #1 concur | #2 concur | Unit |
| --- | --- | --- | --- | --- | --- | --- |
| Segment count | | 248 | 240 | 262 | 265 | |
| Min Throughput | default | 49.54 | 49.54 | 49.41 | 49.42 | ops/s |
| Mean Throughput | default | 49.59 | 49.59 | 49.48 | 49.48 | ops/s |
| Median Throughput | default | 49.59 | 49.59 | 49.48 | 49.49 | ops/s |
| Max Throughput | default | 49.64 | 49.64 | 49.54 | 49.54 | ops/s |
| 50th percentile latency | default | 8.01098 | 9.04569 | 6.98968 | 6.67651 | ms |
| 90th percentile latency | default | 12.0628 | 17.3791 | 10.4292 | 11.063 | ms |
| 99th percentile latency | default | 21.9861 | 32.1529 | 17.705 | 12.3328 | ms |
| 100th percentile latency | default | 30.4458 | 36.6906 | 24.7913 | 12.392 | ms |
| 50th percentile service time | default | 6.634 | 7.88215 | 5.67168 | 5.32377 | ms |
| 90th percentile service time | default | 10.7652 | 14.7414 | 8.47473 | 9.57959 | ms |
| 99th percentile service time | default | 16.0205 | 17.5388 | 12.6657 | 10.8485 | ms |
| 100th percentile service time | default | 29.1359 | 35.5346 | 23.4873 | 11.27 | ms |
| error rate | default | 0 | 0 | 0 | 0 | % |
| Min Throughput | term | 49.83 | 49.69 | 49.86 | 49.82 | ops/s |
| Mean Throughput | term | 49.85 | 49.72 | 49.88 | 49.84 | ops/s |
| Median Throughput | term | 49.85 | 49.73 | 49.88 | 49.84 | ops/s |
| Max Throughput | term | 49.86 | 49.76 | 49.9 | 49.86 | ops/s |
| 50th percentile latency | term | 10.3155 | 9.07136 | 7.7421 | 7.84862 | ms |
| 90th percentile latency | term | 13.0895 | 10.3129 | 9.42092 | 10.5664 | ms |
| 99th percentile latency | term | 23.3423 | 17.8859 | 27.9807 | 21.3598 | ms |
| 100th percentile latency | term | 28.1488 | 25.9053 | 31.4844 | 31.0409 | ms |
| 50th percentile service time | term | 9.10882 | 8.0295 | 6.21632 | 6.66419 | ms |
| 90th percentile service time | term | 11.5806 | 8.76473 | 7.94414 | 8.36192 | ms |
| 99th percentile service time | term | 15.5951 | 14.6678 | 15.1197 | 14.1959 | ms |
| 100th percentile service time | term | 15.7594 | 23.4487 | 23.4279 | 30.0025 | ms |
| error rate | term | 0 | 0 | 0 | 0 | % |
| Min Throughput | phrase | 49.81 | 49.78 | 49.79 | 49.77 | ops/s |
| Mean Throughput | phrase | 49.83 | 49.8 | 49.81 | 49.8 | ops/s |
| Median Throughput | phrase | 49.84 | 49.8 | 49.82 | 49.81 | ops/s |
| Max Throughput | phrase | 49.85 | 49.83 | 49.83 | 49.83 | ops/s |
| 50th percentile latency | phrase | 6.97764 | 6.50972 | 5.8718 | 5.91395 | ms |
| 90th percentile latency | phrase | 8.76465 | 8.82422 | 8.03951 | 7.22674 | ms |
| 99th percentile latency | phrase | 16.792 | 14.9502 | 11.5943 | 12.4171 | ms |
| 100th percentile latency | phrase | 28.3097 | 29.4971 | 21.9315 | 12.977 | ms |
| 50th percentile service time | phrase | 5.61003 | 5.35381 | 4.51923 | 4.7856 | ms |
| 90th percentile service time | phrase | 7.38752 | 7.42578 | 6.4436 | 5.31315 | ms |
| 99th percentile service time | phrase | 11.0224 | 9.91337 | 9.79194 | 10.3351 | ms |
| 100th percentile service time | phrase | 26.4146 | 27.7785 | 20.5301 | 11.4582 | ms |
| error rate | phrase | 0 | 0 | 0 | 0 | % |

The index_searcher thread pool stats (averaged):

"index_searcher" : {
    "threads" : 12,
    "queue" : 0,
    "active" : 0,
    "rejected" : 0,
    "largest" : 12,
    "completed" : 519319
}

@dblock
Member

dblock commented Mar 23, 2022

We could merge this for 3.0, wdyt @nknize?

@dblock
Member

dblock commented Mar 23, 2022

@andrross another CR please?

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure 491a78f
Log 3716

Reports 3716

@andrross
Member

Test failure #2442

start gradle check

@andrross
Member

start gradle check

@nknize
Collaborator

nknize commented Mar 23, 2022

> We could merge this for 3.0, wdyt @nknize?

Since we're targeting sandbox we can merge in 2.x and don't have to wait for 3.0.

Also, the 2.0 branch is cut (as well as the 2.x branch). So when this is ready we can merge to main and backport to 2.x without having to wait for the version bump.

@opensearch-ci-bot
Collaborator

✅   Gradle Check success 491a78f
Log 3722

Reports 3722

@opensearch-ci-bot
Collaborator

✅   Gradle Check success 491a78f
Log 3723

Reports 3723

@opensearch-ci-bot
Collaborator

✅   Gradle Check success 513706d
Log 3724

Reports 3724

@dblock dblock merged commit b6ca0d1 into opensearch-project:main Mar 24, 2022
@dblock dblock added the backport 2.x (Backport to 2.x branch) label Mar 24, 2022
@dblock
Member

dblock commented Mar 24, 2022

> We could merge this for 3.0, wdyt @nknize?
>
> Since we're targeting sandbox we can merge in 2.x and don't have to wait for 3.0.
>
> Also, the 2.0 branch is cut (as well as the 2.x branch). So when this is ready we can merge to main and backport to 2.x without having to wait for the version bump.

I merged and marked to backport 2.x.

opensearch-trigger-bot bot pushed a commit that referenced this pull request Mar 24, 2022
* Concurrent Searching (Experimental)

Signed-off-by: Andriy Redko <[email protected]>

* Addressingf code review comments

Signed-off-by: Andriy Redko <[email protected]>
(cherry picked from commit b6ca0d1)
@nknize
Collaborator

nknize commented Mar 24, 2022

🎉

@reta
Collaborator Author

reta commented Mar 24, 2022

@dblock @nknize @andrross thanks guys! I will create a follow-up improvement issue to cover aggregations and early termination.

@andrross
Member

Should we create a meta issue listing all the things that need to be done to promote it out of the sandbox?

@reta
Collaborator Author

reta commented Mar 24, 2022

> Should we create a meta issue listing all the things that need to be done to promote it out of the sandbox?

👍 , I will do that shortly

@reta
Collaborator Author

reta commented Mar 24, 2022

#2587

Labels
backport 2.x (Backport to 2.x branch), feature (New feature or request), Indexing & Search, v2.0.0 (Version 2.0.0), WIP (Work in progress)