-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Concurrent Searching (Experimental) #1500
Conversation
Can one of the admins verify this patch? |
✅ Gradle Wrapper Validation success 8a4cda05609f86455b7d70be9220234e7af82f7c |
@anasalkouz still work in progress, but large chunk of the work has been done already, would be great to hear your thoughts, thank you. |
✅ Gradle Precommit success 8a4cda05609f86455b7d70be9220234e7af82f7c |
❌ Gradle Check failure 8a4cda05609f86455b7d70be9220234e7af82f7c |
✅ Gradle Wrapper Validation success 7119aefcf48eba8c261cce88c86785047d51799b |
✅ Gradle Precommit success 7119aefcf48eba8c261cce88c86785047d51799b |
❌ Gradle Check failure 7119aefcf48eba8c261cce88c86785047d51799b |
✅ Gradle Wrapper Validation success 2afb9feabf4377a7e9e7c9fbe99c670f8d426b5b |
✅ Gradle Precommit success 2afb9feabf4377a7e9e7c9fbe99c670f8d426b5b |
❌ Gradle Check failure 2afb9feabf4377a7e9e7c9fbe99c670f8d426b5b |
✅ Gradle Wrapper Validation success e149955cd0fc472e3a69a317401b8916b6016317 |
❌ Gradle Precommit failure e149955cd0fc472e3a69a317401b8916b6016317 |
❌ Gradle Check failure e149955cd0fc472e3a69a317401b8916b6016317 |
✅ Gradle Wrapper Validation success 0445cca10fe47c28cbb7e0f951eebded76df6fdd |
✅ Gradle Precommit success 0445cca10fe47c28cbb7e0f951eebded76df6fdd |
❌ Gradle Check failure 0445cca10fe47c28cbb7e0f951eebded76df6fdd |
✅ Gradle Wrapper Validation success e0dbd38fdfa94699ca5ff66df25bfd42d8664098 |
✅ Gradle Precommit success e0dbd38fdfa94699ca5ff66df25bfd42d8664098 |
❌ Gradle Check failure e0dbd38fdfa94699ca5ff66df25bfd42d8664098 |
✅ Gradle Wrapper Validation success c8f34e7d23959f775e8e050bb18a8d941daa175f |
✅ Gradle Precommit success c8f34e7d23959f775e8e050bb18a8d941daa175f |
✅ Gradle Check success c8f34e7d23959f775e8e050bb18a8d941daa175f |
Preliminary macro benchmarks:Using
Note: desired throughput for search tasks (default, term, phrase) has been increased from 20 ops/s to 50 ops/s. The benchmarks were running on the same node as Opensearch server. Opensearch configuration: 1 node, -Xmx4g, all other JVM options kept unchanged Hardware: i7 10th gen (12 cores), Ubuntu 21.10 Side by side Comparison
The
|
We could merge this for 3.0, wdyt @nknize? |
@andrross another CR please? |
Signed-off-by: Andriy Redko <[email protected]>
Test failure #2442 start gradle check |
start gradle check |
...oncurrent-search/src/main/java/org/opensearch/search/query/ConcurrentQueryPhaseSearcher.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/search/query/EarlyTerminatingListener.java
Outdated
Show resolved
Hide resolved
Since we're targeting sandbox we can merge in 2.x and don't have to wait for 3.0. Also, the 2.0 branch is cut (as well as the 2.x branch). So when this is ready we can merge to main and backport to 2.x without having to wait for the version bump. |
Signed-off-by: Andriy Redko <[email protected]>
I merged and marked to backport 2.x. |
* Concurrent Searching (Experimental) Signed-off-by: Andriy Redko <[email protected]> * Addressingf code review comments Signed-off-by: Andriy Redko <[email protected]> (cherry picked from commit b6ca0d1)
🎉 |
Should we create a meta issue to listing all the things needing to be done to promote it out of the sandbox? |
👍 , I will do that shortly |
Signed-off-by: Andriy Redko <[email protected]> (cherry picked from commit b6ca0d1)
Signed-off-by: Andriy Redko [email protected]
Description
Allows to use experimental Apache Lucene low-level API which allows to parallelize execution of the search across segment.
Implement multi-collectors query context (possibly, needs API change to use
CollectorManager
s instead)Implement profilers support
The query profiling is supported but it has some limitations and differences with respect to naming. The profilers capture the time each collection type has taken and return it as the cumulative summary. The fact that segments may have been searched concurrently is not reflected (it requires significant overhauling of the profiling implementation), so the "real" perceived time may be smaller then profilers will report. The collector names are replaced with collector manager names (fe
MinimumScoreCollector
vsMinimumCollectorManager
).To compare, here is the usual search profile
vs the one where concurrent segment search is enabled
Additionally, since TopDocs are merged, the query's shardIndex field is also populated, usual search profile has it set to
-1
, for example:vs the one where concurrent segment search is enabled
Implement early termination and timeout support
The forced early termination is simulated (without abrupt search termination) using Apache Lucene normal collection termination mechanism. The number of results is reduced to the desired one (if necessary) at the post search step. Essentially, to summarize, when early termination is requested and concurrent segment search is enabled, be the engine is going to do more work and (very likely) come up with more search hits than have been asked.
If query finished by timeout, no results are being returned, even if returning partial results is acceptable. The reason for that exception propagation mechanism does not leave a chance for reducers to execute.
Add benchmarks suite
Splitting into subtasks:
Future Enhancements:
Issues Resolved
Fixes #1286
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.