-
Notifications
You must be signed in to change notification settings - Fork 207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Opensearch Sink: refactor all index related operations into IndexManager classes for easier future extension #361
Conversation
…ger classes for easier future extension Signed-off-by: Han Jiang <[email protected]>
private IndexManagerFactory indexManagerFactory; | ||
private IndexManager indexManager; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
restHighLevelClient.indices().putTemplate(putIndexTemplateRequest, RequestOptions.DEFAULT); | ||
} | ||
|
||
public abstract Optional<String> checkAndCreatePolicy() throws IOException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for refactoring the Index creation.
I am concerned the proposed architecture does not scale well. The logic for supporting ISM for TraceAnalyticsRawIndex is not unique an generally defines how we should create an ISM index. If we are to add support for ISM as part of log ingestion or custom pipelines we may end up nearly duplicating the code defined in the TraceAnalyticsRawIndexManager with the exceptions of the constants used in that class.
I would like to propose a single index manager that leverages an enum type. The enums can define whether or not they support ISM.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somewhat along the same lines, I suggested a composition approach.
We could have enums which provide the correct composition as well if we want to continue down the enum path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestions. Thanks, guys. I have moved ISM related methods to IsmPolicyManagementStrategy:
public interface IsmPolicyManagementStrategy {
Optional<String> checkAndCreatePolicy() throws IOException;
List<String> getIndexPatterns(final String indexAlias);
boolean checkIfIndexExistsOnServer(final String indexAlias) throws IOException;
CreateIndexRequest getCreateIndexRequest(final String indexAlias);
}
public class IsmPolicyManagement implements IsmPolicyManagementStrategy {.....}
public class NoIsmPolicyManagement implements IsmPolicyManagementStrategy {.......}
@@ -17,6 +17,6 @@ | |||
], | |||
"ism_template": { | |||
"index_patterns": ["otel-v1-apm-span-*"] | |||
} | |||
} //7.12 or 7.13 or later needs this field. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JSON doesn't support comments. This needs to be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch.
import java.util.List; | ||
import java.util.Optional; | ||
|
||
public class TraceAnalyticsRawIndexManager extends IndexManager { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that rather than using inheritance here, we should use composition. That is, the IndexManager
should take in some sort of "Index Strategy" (I think there has to be a better name here).
This class could then become the "IsmIndexStrategy" and the methods from the base class which are overridden here could be part of a "SingleIndexStrategy". Both strategies would have code that can be re-used.
Then the "IsmIndexStrategy" will need to take in customization parameters like the index alias, the policy names, and templates. I'll note some of the configuration parameters below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks David. Your suggestions are incorporated in the next commit.
@Override | ||
public Optional<String> checkAndCreatePolicy() throws IOException { | ||
// TODO: replace with new _opensearch API | ||
final String endPoint = "/_opendistro/_ism/policies/" + IndexConstants.RAW_ISM_POLICY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RAW_ISM_POLICY -> This can be a configuration parameter for IsmIndexStrategy
(see my comment above).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks David. Your suggestions are incorporated in the next commit.
public Optional<String> checkAndCreatePolicy() throws IOException { | ||
// TODO: replace with new _opensearch API | ||
final String endPoint = "/_opendistro/_ism/policies/" + IndexConstants.RAW_ISM_POLICY; | ||
Request request = createPolicyRequestFromFile(endPoint, IndexConstants.RAW_ISM_FILE_WITH_ISM_TEMPLATE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RAW_ISM_FILE_WITH_ISM_TEMPLATE -> This can be a configuration parameter for IsmIndexStrategy
(see my comment above).
} catch (ResponseException e1) { | ||
final String msg = e1.getMessage(); | ||
if (msg.contains("Invalid field: [ism_template]")) { | ||
request = createPolicyRequestFromFile(endPoint, IndexConstants.RAW_ISM_FILE_NO_ISM_TEMPLATE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RAW_ISM_FILE_NO_ISM_TEMPLATE -> This can be a configuration parameter for IsmIndexStrategy
(see my comment above).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks David. Your suggestions are incorporated in the next commit.
throw e2; | ||
} | ||
} | ||
return Optional.of(IndexConstants.RAW_ISM_POLICY); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RAW_ISM_POLICY -> This can be a configuration parameter for IsmIndexStrategy
(see my comment above).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks David. Your suggestions are incorporated in the next commit.
|
||
@Override | ||
protected CreateIndexRequest getCreateIndexRequest(final String indexAlias) { | ||
final String initialIndexName = indexAlias + "-000001"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"-000001" -> This could also be a configuration parameter for IsmIndexStrategy
(see my comment above). But, I can see that this may also be re-used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In next commit, moved the string, "-000001", to DEFAULT_INDEX_SUFFIX. Currently, we only have this single index suffix. In the future, when we have more sophisticated patterns, we will pass the pattern around.
restHighLevelClient.indices().putTemplate(putIndexTemplateRequest, RequestOptions.DEFAULT); | ||
} | ||
|
||
public abstract Optional<String> checkAndCreatePolicy() throws IOException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somewhat along the same lines, I suggested a composition approach.
We could have enums which provide the correct composition as well if we want to continue down the enum path.
when(indicesClient.getIndexTemplate(any(GetIndexTemplatesRequest.class), any())).thenReturn(getIndexTemplatesResponse); | ||
|
||
try { | ||
defaultIndexManager.checkAndCreateIndexTemplate(false, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should use assertThrows()
. If this doesn't throw then all of your verify statements below will not run. And thus you can't be sure it is testing what it should be testing. There shouldn't be a need to have a try-catch in tests anymore.
Please check elsewhere for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
when(restHighLevelClient.indices()).thenReturn(indicesClient); | ||
} | ||
|
||
@Test(expected = NullPointerException.class) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update this test to JUnit 5 and use assertThrows
rather than expected
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
…attern Signed-off-by: Han Jiang <[email protected]>
import com.amazon.dataprepper.plugins.sink.opensearch.index.ismpolicy.NoIsmPolicyManagement; | ||
import org.opensearch.client.RestHighLevelClient; | ||
|
||
public class DefaultIndexManager extends IndexManager { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need this level of abstraction? Is the lack of dependecy injection in this project impacting eliminating this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, if there is a DI framework like Guice, it may help eliminate children classes. There are other reasons I like have this inheritance:
- It allows us to clearly separately instantiate an IndexManager class of specific purpose.
- Because we have no.1, we can separate the unit test for different index types into different unit test classes.
- It would be clearer and time-saving for users of index manager to figure out which setup to use giving an index type. A programatic user of index manage (in my implementation, it is the index manager factory) also can have less to worry about and have a simpler implementation.
- It is easier for runtime debugging. It's easier to figure out which index type the index manager instance is for
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think keeping these implementations of IndexManager
is still important. The OpenSearch plugin defines what each of these types is via an enum which is configured in the pipeline. This strong correspondence between the enum and a known setup for the index lends itself well to these classes (DefaultIndexManager, TraceAnalyticsManager).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's revist this when we have injection. I believe all of this could be achieved with injection via a map of ENUMs to different implementations of an IndeManger. I believe this is a simplier design and reduces our reliance on inheritence. This does not limit our ability to test the different configuration options nor should it decrease runtime debugging or increase time for users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the root problem is that these inherited classes are being exposed. From what I can tell of the code, we should be able to make them private static inner classes of the IndexManagerFactory
class. At that point, it matters little whether it has its own inherited classes to expose them, or builds them all manually. It is just an implementation detail of IndexManagerFactory
. I think a quick turnaround PR with this change would be appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Good suggestion, David. I am making the IndexManger
sub-classes private in the factory class. See: #414
I didn't make them static. I have not found being static is necessary.
Please let me know if you have a strong reason to make them static.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They should be static nested classes because they are not tied to the IndexManagerFactory
class instance itself.
From the documentation:
A static nested class interacts with the instance members of its outer class (and other classes) just like any other top-level class. In effect, a static nested class is behaviorally a top-level class that has been nested in another top-level class for packaging convenience. Inner Class and Nested Static Class Example also demonstrates this.
https://docs.oracle.com/javase/tutorial/java/javaOO/nested.html
package com.amazon.dataprepper.plugins.sink.opensearch.index; | ||
|
||
import com.amazon.dataprepper.plugins.sink.opensearch.OpenSearchSinkConfiguration; | ||
import org.junit.After; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be using JUnit5 for all new tests. I won't block the PR for this, but please be sure to use JUnit 5/Jupiter going forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Currently, the Sink Plugin depends on JUnit4, not Junit5. We can make a pull request to specifically upgrade the JUnit version.
import com.amazon.dataprepper.plugins.sink.opensearch.index.ismpolicy.NoIsmPolicyManagement; | ||
import org.opensearch.client.RestHighLevelClient; | ||
|
||
public class DefaultIndexManager extends IndexManager { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's revist this when we have injection. I believe all of this could be achieved with injection via a map of ENUMs to different implementations of an IndeManger. I believe this is a simplier design and reduces our reliance on inheritence. This does not limit our ability to test the different configuration options nor should it decrease runtime debugging or increase time for users.
Description
Problem: OpenSearch Sink’s source code needs to be refactored to make it easily extensible to support more index types. The existing code has some anti-pattern implementations. For example, in the IndexStateManagement class, if-else statements are used to switch between different index types.
Solution:
Refactoring for Better Extensibility
Create an IndexType enum list which maintains the complete list of supported index types.
public enum IndexType {
TRACE_ANALYTICS_RAW,
TRACE_ANALYTICS_SERVICE_MAP,
CUSTOM; //will be used to handle generic event ingestion, inlcuding log.
}
Create IndexManager interface, which is implemented by subclasses corresponding to index types: TraceAnalyticsRawIndexManager, TraceAnalyticsServiceMapIndexManager, DefaultIndexManager.
Delete IndexStateManagement class. The implementations in IndexStateManagement class will be moved to relevant IndexManager sub-classes. For example, implementations for Raw Trace Analytics in IndexStateManagement will be moved to TraceAnalyticsRawIndexManager.
Move OpenSearchSink:: checkAndCreateIndexTemplate() to corresponding IndexManager sub-classes because its implementation is related to index type.
Add a new IndexManagerFactory which produces an IndexManager instance corresponding to the IndexType input.
Note: This refactoring doesn't change business logic. I.e. behavior remains the same as before.
Update Proposal
This was part of a list of changes proposed in the RFC doc
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.