Implement opensearch index partition creation supplier and PitWorker without processing indices #2821

graytaylor0 · 2023-06-03T07:07:03Z

Description

Implements the OpenSearchIndexPartitionCreationSupplier, which lists all indices in the source cluster and filters them by the include and exclude regex patterns.
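A rough sketch of the filtering step, assuming the index names have already been listed from the source cluster. The real supplier works with `IndicesRecord` objects and builds `PartitionIdentifier`s; here plain strings stand in for both, and `IndexPartitionFilterSketch` is a hypothetical name:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical stand-in for the supplier's filtering logic (not the PR's actual class).
// An index becomes a partition only if it matches an include pattern (or no include
// patterns are configured) and matches none of the exclude patterns.
final class IndexPartitionFilterSketch {
    private final List<Pattern> includePatterns;
    private final List<Pattern> excludePatterns;

    IndexPartitionFilterSketch(final List<String> includeRegexes, final List<String> excludeRegexes) {
        // Compile the regexes once, rather than once per index being checked.
        this.includePatterns = includeRegexes.stream().map(Pattern::compile).collect(Collectors.toList());
        this.excludePatterns = excludeRegexes.stream().map(Pattern::compile).collect(Collectors.toList());
    }

    // Returns the index names that should become partitions, preserving input order.
    List<String> createPartitionKeys(final List<String> allIndexNames) {
        return allIndexNames.stream()
                .filter(this::shouldIndexBeProcessed)
                .collect(Collectors.toList());
    }

    private boolean shouldIndexBeProcessed(final String indexName) {
        final boolean included = includePatterns.isEmpty() || matchesAny(includePatterns, indexName);
        return included && !matchesAny(excludePatterns, indexName);
    }

    private static boolean matchesAny(final List<Pattern> patterns, final String indexName) {
        // Matcher.matches() requires the whole index name to match, not just a substring.
        return patterns.stream().anyMatch(pattern -> pattern.matcher(indexName).matches());
    }
}
```

For example, with include `logs-.*` and exclude `logs-debug.*`, the indices `logs-2023`, `logs-debug-1` and `metrics` reduce to the single partition `logs-2023`.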

Also starts the implementation of PitWorker, which uses the scheduling configuration with the source coordinator but does not yet process the indices. When getNextPartition returns empty, the PitWorker backs off and retries after a fixed 30 seconds. The PitWorker code added here would be essentially the same for ScrollWorker (its interactions with the source coordinator will be very similar).
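A minimal sketch of that acquire/back-off loop. `PartitionCoordinatorSketch` is a hypothetical, simplified interface, not Data Prepper's `SourceCoordinator` API, and the processor is a placeholder because this PR does not process indices yet:

```java
import java.time.Duration;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Hypothetical stand-in for the source coordinator: hands out partition keys and
// records completed ones.
interface PartitionCoordinatorSketch {
    Optional<String> getNextPartition();

    void completePartition(String partitionKey);
}

// Sketch of the worker loop: acquire a partition, hand it to a processor, mark it complete.
// When no partition is available, back off for a fixed interval and retry.
final class PitWorkerSketch implements Runnable {
    static final Duration DEFAULT_BACKOFF = Duration.ofSeconds(30); // fixed 30 s in the PR description

    private final PartitionCoordinatorSketch coordinator;
    private final Consumer<String> processor; // placeholder for real index processing
    private final Duration backoff;
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);

    PitWorkerSketch(final PartitionCoordinatorSketch coordinator, final Consumer<String> processor,
                    final Duration backoff) {
        this.coordinator = coordinator;
        this.processor = processor;
        this.backoff = backoff;
    }

    // On shutdown: finish the current partition, but do not acquire another one.
    void requestStop() {
        stopRequested.set(true);
    }

    @Override
    public void run() {
        while (!stopRequested.get() && !Thread.currentThread().isInterrupted()) {
            final Optional<String> partition = coordinator.getNextPartition();
            if (!partition.isPresent()) {
                try {
                    Thread.sleep(backoff.toMillis());
                } catch (final InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the flag so the loop exits
                }
                continue;
            }
            processor.accept(partition.get());
            coordinator.completePartition(partition.get());
        }
    }
}
```

The interval is a constructor parameter here only so the loop can be exercised with a short back-off; the PR itself uses a fixed 30 seconds.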

OpenSearchService now uses a scheduled executor service instead of spawning new threads directly, which allows processing to start at the configured start_time. The stop method signals the SearchWorker future not to acquire another partition and gives it 30 seconds to finish processing the current index before the executor service shuts it down completely. If 30 seconds is not enough to finish processing the current partition, that partition will be processed again later, producing duplicate data. The exception is when there is state that tracks where to resume; in that case, the SearchWorker could call saveState and give up the partition gracefully.
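A sketch of that lifecycle using only `java.util.concurrent`. `SearchServiceSketch` and its methods are hypothetical names, and the grace period is a parameter (30 seconds in the PR):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of an OpenSearchService-like start/stop lifecycle.
final class SearchServiceSketch {
    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);
    private final Duration shutdownGracePeriod;

    SearchServiceSketch(final Duration shutdownGracePeriod) {
        this.shutdownGracePeriod = shutdownGracePeriod;
    }

    // Run the worker at startTime; a start time in the past means "start immediately".
    void start(final Instant startTime, final Runnable searchWorker) {
        final long delayMillis = Math.max(0L, Duration.between(Instant.now(), startTime).toMillis());
        executor.schedule(searchWorker, delayMillis, TimeUnit.MILLISECONDS);
    }

    // The worker polls this before acquiring another partition.
    boolean isStopRequested() {
        return stopRequested.get();
    }

    // Returns true if the worker finished within the grace period, false if it was interrupted.
    boolean stop() throws InterruptedException {
        stopRequested.set(true);
        executor.shutdown(); // accept no new tasks; a running worker keeps going
        if (executor.awaitTermination(shutdownGracePeriod.toMillis(), TimeUnit.MILLISECONDS)) {
            return true;
        }
        executor.shutdownNow(); // interrupts the worker; its in-flight partition may be reprocessed
        return false;
    }
}
```

The `false` branch of `stop()` is the case described above, where the in-flight partition is given up without saved state and may be processed twice.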

Tested: the expected partitions are created and "processed" successfully.

Issues Resolved

Related to #1985

Check List

New functionality includes testing.
New functionality has been documented.
- New functionality has javadoc added
Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…without processing indices Signed-off-by: Taylor Gray <[email protected]>

cmanning09

It looks fundamentally correct to me. I am offering a few suggestions to improve the code. Take them or leave them.

cmanning09 · 2023-06-05T17:01:38Z

...ce/src/main/java/org/opensearch/dataprepper/plugins/source/opensearch/OpenSearchService.java

+        final long waitTimeBeforeStartMillis = startTime.toEpochMilli() - Instant.now().toEpochMilli() < 0 ? 0L :
+                startTime.toEpochMilli() - Instant.now().toEpochMilli();
+
+        LOG.info("The opensearch source will start processing data at {}. It is currently {}", startTime, Instant.now().toString());


Is the .toString() redundant?
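A sketch of one way to tighten this snippet, assuming `LOG` is an SLF4J logger. SLF4J calls `toString()` on a `{}` argument when it formats the message, so the explicit call should not be needed. The diff also reads `Instant.now()` up to three times, so the sign check and the returned value can see different clocks. Reading the clock once and clamping with `Math.max` avoids that. `StartDelaySketch` is a hypothetical name:

```java
import java.time.Duration;
import java.time.Instant;

final class StartDelaySketch {
    // Milliseconds to wait until startTime, or 0 if startTime has already passed.
    // The caller reads the clock once and passes it in as `now`.
    static long waitTimeBeforeStartMillis(final Instant startTime, final Instant now) {
        return Math.max(0L, Duration.between(now, startTime).toMillis());
    }
}
```

Passing `now` explicitly also makes the computation easy to unit test with a fixed instant.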

cmanning09 · 2023-06-05T17:04:16Z

...h/dataprepper/plugins/source/opensearch/worker/OpenSearchIndexPartitionCreationSupplier.java

+            return false;
+        }
+
+        final IndexParametersConfiguration indexParametersConfiguration = openSearchSourceConfiguration.getIndexParametersConfiguration();


Take it or leave it: this and the other configuration values could be stored as class fields instead of being read into new local variables for every index that is filtered. These values are always the same.

cmanning09 · 2023-06-05T17:06:10Z

...h/dataprepper/plugins/source/opensearch/worker/OpenSearchIndexPartitionCreationSupplier.java

+
+        final IndexParametersConfiguration indexParametersConfiguration = openSearchSourceConfiguration.getIndexParametersConfiguration();
+
+        if (Objects.isNull(openSearchSourceConfiguration.getIndexParametersConfiguration())) {


Can be simplified to: if (Objects.isNull(indexParametersConfiguration)) {

cmanning09 · 2023-06-05T17:16:43Z

...h/dataprepper/plugins/source/opensearch/worker/OpenSearchIndexPartitionCreationSupplier.java

+                .collect(Collectors.toList());
+    }
+
+    private boolean isIndexIncludedInOneOfTheIncludePatternsAndNotExcludedInAnExcludePattern(final IndicesRecord indicesRecord) {


Take or leave it: A simpler name and one that would extend well if we added other filtering options could be:

shouldIndexBeProcessed

isIndexSelected

The current method name is very long and a little too specific. I am open to other ideas as well.

I like shouldIndexBeProcessed

cmanning09 · 2023-06-05T18:23:05Z

...h/dataprepper/plugins/source/opensearch/worker/OpenSearchIndexPartitionCreationSupplier.java

+        for (final OpenSearchIndex index : includedIndices) {
+            final Matcher matcher = index.getIndexNamePattern().matcher(indicesRecord.index());
+
+            if (matcher.matches()) {
+                matchesIncludedPattern = true;
+                break;
+            }
+        }
+
+        boolean matchesExcludePattern = false;
+
+        for (final OpenSearchIndex index : excludedIndices) {
+            final Matcher matcher = index.getIndexNamePattern().matcher(indicesRecord.index());
+
+            if (matcher.matches()) {
+                matchesExcludePattern = true;
+                break;
+            }
+        }


You can eliminate the code duplication, the variable reassignment, and the breaks by pulling this into a separate function:

public boolean doesIndexMatchPattern(final List<OpenSearchIndex> indices, final IndicesRecord indicesRecord) {
    for (final OpenSearchIndex index : indices) {
        final Matcher matcher = index.getIndexNamePattern().matcher(indicesRecord.index());
        if (matcher.matches()) {
            return true;
        }
    }
    return false;
}

Then you can do something simple like:

final boolean matchesIncludedPattern = includedIndices.isEmpty() || doesIndexMatchPattern(includedIndices, indicesRecord);
final boolean matchesExcludePattern = doesIndexMatchPattern(excludedIndices, indicesRecord);
return matchesIncludedPattern && !matchesExcludePattern;

Signed-off-by: Taylor Gray <[email protected]>

asifsmohammed · 2023-06-06T22:31:55Z

...er/plugins/source/opensearch/worker/client/OpenSearchIndexPartitionCreationSupplierTest.java

+
+        final List<PartitionIdentifier> partitionIdentifierList = createObjectUnderTest().apply(Collections.emptyMap());
+
+        assertThat(partitionIdentifierList, notNullValue());


nit: I think we should verify the number of partitions here.

asifsmohammed · 2023-06-06T22:39:36Z

...rg/opensearch/dataprepper/plugins/source/opensearch/worker/client/ElasticsearchAccessor.java

@@ -56,4 +56,9 @@ public SearchScrollResponse searchWithScroll(SearchScrollRequest searchScrollReq
    public void deleteScroll(DeleteScrollRequest deleteScrollRequest) {
        //todo: implement
    }
+
+    @Override
+    public Object getClient() {


Why do we implement ClusterClientFactory here if we are returning null?

The Elasticsearch accessor is not implemented yet.

…without processing indices (opensearch-project#2821) Implement opensearch index partition creation supplier and PitWorker without processing indices Signed-off-by: Taylor Gray <[email protected]> Signed-off-by: Marcos_Gonzalez_Mayedo <[email protected]>

Implement opensearch index partition creation supplier and PitWorker …

111ab30

…without processing indices Signed-off-by: Taylor Gray <[email protected]>

graytaylor0 requested review from chenqi0805, engechas, dinujoh, kkondaka, cmanning09, asifsmohammed, dlvenable and oeyh as code owners June 3, 2023 07:07

cmanning09 previously approved these changes Jun 5, 2023

graytaylor0 dismissed cmanning09’s stale review via ca6b874 June 5, 2023 18:16

cmanning09 reviewed Jun 5, 2023

graytaylor0 force-pushed the PartitionSupplier branch from ca6b874 to 71f86d1 June 5, 2023 18:23

Address PR comments

05624cf

Signed-off-by: Taylor Gray <[email protected]>

graytaylor0 force-pushed the PartitionSupplier branch from 71f86d1 to 05624cf June 5, 2023 18:26

cmanning09 approved these changes Jun 5, 2023

asifsmohammed approved these changes Jun 6, 2023

graytaylor0 merged commit 10d3984 into opensearch-project:main Jun 6, 2023
