Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Opensearch Sink: refactor all index related operations into IndexManager classes for easier future extension #361

Merged
merged 2 commits into from
Oct 11, 2021

Conversation

jianghancn
Copy link
Contributor

Description

Problem: OpenSearch Sink’s source code needs to be refactored to make it easily extensible to support more index types. The existing code has some anti-pattern implementations. For example, in the IndexStateManagement class, if-else statements are used to switch between different index types.

Solution:

  • Refactoring for Better Extensibility
    Create an IndexType enum list which maintains the complete list of supported index types.
    public enum IndexType {
    TRACE_ANALYTICS_RAW,
    TRACE_ANALYTICS_SERVICE_MAP,
    CUSTOM; //will be used to handle generic event ingestion, inlcuding log.
    }

  • Create IndexManager interface, which is implemented by subclasses corresponding to index types: TraceAnalyticsRawIndexManager, TraceAnalyticsServiceMapIndexManager, DefaultIndexManager.

  • Delete IndexStateManagement class. The implementations in IndexStateManagement class will be moved to relevant IndexManager sub-classes. For example, implementations for Raw Trace Analytics in IndexStateManagement will be moved to TraceAnalyticsRawIndexManager.

  • Move OpenSearchSink:: checkAndCreateIndexTemplate() to corresponding IndexManager sub-classes because its implementation is related to index type.
    Add a new IndexManagerFactory which produces an IndexManager instance corresponding to the IndexType input.

Note: This refactoring doesn't change business logic. I.e. behavior remains the same as before.

Update Proposal

This was part of a list of changes proposed in the RFC doc

Check List

  • [ x] New functionality includes testing.
    • [ x] All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • [ x] Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…ger classes for easier future extension

Signed-off-by: Han Jiang <[email protected]>
Comment on lines 56 to 57
private IndexManagerFactory indexManagerFactory;
private IndexManager indexManager;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

restHighLevelClient.indices().putTemplate(putIndexTemplateRequest, RequestOptions.DEFAULT);
}

public abstract Optional<String> checkAndCreatePolicy() throws IOException;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for refactoring the Index creation.

I am concerned the proposed architecture does not scale well. The logic for supporting ISM for TraceAnalyticsRawIndex is not unique an generally defines how we should create an ISM index. If we are to add support for ISM as part of log ingestion or custom pipelines we may end up nearly duplicating the code defined in the TraceAnalyticsRawIndexManager with the exceptions of the constants used in that class.

I would like to propose a single index manager that leverages an enum type. The enums can define whether or not they support ISM.

What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat along the same lines, I suggested a composition approach.

We could have enums which provide the correct composition as well if we want to continue down the enum path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestions. Thanks, guys. I have moved ISM related methods to IsmPolicyManagementStrategy:

public interface IsmPolicyManagementStrategy {

    Optional<String> checkAndCreatePolicy() throws IOException;

    List<String> getIndexPatterns(final String indexAlias);

    boolean checkIfIndexExistsOnServer(final String indexAlias) throws IOException;

    CreateIndexRequest getCreateIndexRequest(final String indexAlias);
}


public class IsmPolicyManagement implements IsmPolicyManagementStrategy {.....}

public class NoIsmPolicyManagement implements IsmPolicyManagementStrategy {.......}

@@ -17,6 +17,6 @@
],
"ism_template": {
"index_patterns": ["otel-v1-apm-span-*"]
}
} //7.12 or 7.13 or later needs this field.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JSON doesn't support comments. This needs to be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.

import java.util.List;
import java.util.Optional;

public class TraceAnalyticsRawIndexManager extends IndexManager {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that rather than using inheritance here, we should use composition. That is, the IndexManager should take in some sort of "Index Strategy" (I think there has to be a better name here).

This class could then become the "IsmIndexStrategy" and the methods from the base class which are overridden here could be part of a "SingleIndexStrategy". Both strategies would have code that can be re-used.

Then the "IsmIndexStrategy" will need to take in customization parameters like the index alias, the policy names, and templates. I'll note some of the configuration parameters below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks David. Your suggestions are incorporated in the next commit.

@Override
public Optional<String> checkAndCreatePolicy() throws IOException {
// TODO: replace with new _opensearch API
final String endPoint = "/_opendistro/_ism/policies/" + IndexConstants.RAW_ISM_POLICY;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RAW_ISM_POLICY -> This can be a configuration parameter for IsmIndexStrategy (see my comment above).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks David. Your suggestions are incorporated in the next commit.

public Optional<String> checkAndCreatePolicy() throws IOException {
// TODO: replace with new _opensearch API
final String endPoint = "/_opendistro/_ism/policies/" + IndexConstants.RAW_ISM_POLICY;
Request request = createPolicyRequestFromFile(endPoint, IndexConstants.RAW_ISM_FILE_WITH_ISM_TEMPLATE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RAW_ISM_FILE_WITH_ISM_TEMPLATE -> This can be a configuration parameter for IsmIndexStrategy (see my comment above).

} catch (ResponseException e1) {
final String msg = e1.getMessage();
if (msg.contains("Invalid field: [ism_template]")) {
request = createPolicyRequestFromFile(endPoint, IndexConstants.RAW_ISM_FILE_NO_ISM_TEMPLATE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RAW_ISM_FILE_NO_ISM_TEMPLATE -> This can be a configuration parameter for IsmIndexStrategy (see my comment above).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks David. Your suggestions are incorporated in the next commit.

throw e2;
}
}
return Optional.of(IndexConstants.RAW_ISM_POLICY);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RAW_ISM_POLICY -> This can be a configuration parameter for IsmIndexStrategy (see my comment above).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks David. Your suggestions are incorporated in the next commit.


@Override
protected CreateIndexRequest getCreateIndexRequest(final String indexAlias) {
final String initialIndexName = indexAlias + "-000001";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"-000001" -> This could also be a configuration parameter for IsmIndexStrategy (see my comment above). But, I can see that this may also be re-used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In next commit, moved the string, "-000001", to DEFAULT_INDEX_SUFFIX. Currently, we only have this single index suffix. In the future, when we have more sophisticated patterns, we will pass the pattern around.

restHighLevelClient.indices().putTemplate(putIndexTemplateRequest, RequestOptions.DEFAULT);
}

public abstract Optional<String> checkAndCreatePolicy() throws IOException;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat along the same lines, I suggested a composition approach.

We could have enums which provide the correct composition as well if we want to continue down the enum path.

when(indicesClient.getIndexTemplate(any(GetIndexTemplatesRequest.class), any())).thenReturn(getIndexTemplatesResponse);

try {
defaultIndexManager.checkAndCreateIndexTemplate(false, null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use assertThrows(). If this doesn't throw then all of your verify statements below will not run. And thus you can't be sure it is testing what it should be testing. There shouldn't be a need to have a try-catch in tests anymore.

Please check elsewhere for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

when(restHighLevelClient.indices()).thenReturn(indicesClient);
}

@Test(expected = NullPointerException.class)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update this test to JUnit 5 and use assertThrows rather than expected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

import com.amazon.dataprepper.plugins.sink.opensearch.index.ismpolicy.NoIsmPolicyManagement;
import org.opensearch.client.RestHighLevelClient;

public class DefaultIndexManager extends IndexManager {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this level of abstraction? Is the lack of dependecy injection in this project impacting eliminating this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, if there is a DI framework like Guice, it may help eliminate children classes. There are other reasons I like have this inheritance:

  1. It allows us to clearly separately instantiate an IndexManager class of specific purpose.
  2. Because we have no.1, we can separate the unit test for different index types into different unit test classes.
  3. It would be clearer and time-saving for users of index manager to figure out which setup to use giving an index type. A programatic user of index manage (in my implementation, it is the index manager factory) also can have less to worry about and have a simpler implementation.
  4. It is easier for runtime debugging. It's easier to figure out which index type the index manager instance is for

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think keeping these implementations of IndexManager is still important. The OpenSearch plugin defines what each of these types is via an enum which is configured in the pipeline. This strong correspondence between the enum and a known setup for the index lends itself well to these classes (DefaultIndexManager, TraceAnalyticsManager).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's revist this when we have injection. I believe all of this could be achieved with injection via a map of ENUMs to different implementations of an IndeManger. I believe this is a simplier design and reduces our reliance on inheritence. This does not limit our ability to test the different configuration options nor should it decrease runtime debugging or increase time for users.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the root problem is that these inherited classes are being exposed. From what I can tell of the code, we should be able to make them private static inner classes of the IndexManagerFactory class. At that point, it matters little whether it has its own inherited classes to expose them, or builds them all manually. It is just an implementation detail of IndexManagerFactory. I think a quick turnaround PR with this change would be appropriate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Good suggestion, David. I am making the IndexManger
sub-classes private in the factory class. See: #414

I didn't make them static. I have not found being static is necessary.
Please let me know if you have a strong reason to make them static.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They should be static nested classes because they are not tied to the IndexManagerFactory class instance itself.

From the documentation:

A static nested class interacts with the instance members of its outer class (and other classes) just like any other top-level class. In effect, a static nested class is behaviorally a top-level class that has been nested in another top-level class for packaging convenience. Inner Class and Nested Static Class Example also demonstrates this.

https://docs.oracle.com/javase/tutorial/java/javaOO/nested.html

package com.amazon.dataprepper.plugins.sink.opensearch.index;

import com.amazon.dataprepper.plugins.sink.opensearch.OpenSearchSinkConfiguration;
import org.junit.After;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be using JUnit5 for all new tests. I won't block the PR for this, but please be sure to use JUnit 5/Jupiter going forward.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Currently, the Sink Plugin depends on JUnit4, not Junit5. We can make a pull request to specifically upgrade the JUnit version.

import com.amazon.dataprepper.plugins.sink.opensearch.index.ismpolicy.NoIsmPolicyManagement;
import org.opensearch.client.RestHighLevelClient;

public class DefaultIndexManager extends IndexManager {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's revist this when we have injection. I believe all of this could be achieved with injection via a map of ENUMs to different implementations of an IndeManger. I believe this is a simplier design and reduces our reliance on inheritence. This does not limit our ability to test the different configuration options nor should it decrease runtime debugging or increase time for users.

@jianghancn jianghancn merged commit 06d53ae into opensearch-project:main Oct 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants