Adds support for bulk create option #1561
Codecov Report
@@ Coverage Diff @@
## main #1561 +/- ##
============================================
+ Coverage 94.51% 94.55% +0.04%
- Complexity 1152 1236 +84
============================================
Files 158 162 +4
Lines 3297 3509 +212
Branches 268 290 +22
============================================
+ Hits 3116 3318 +202
- Misses 128 129 +1
- Partials 53 62 +9
This looks great! Thanks for making this contribution. I had a couple comments for some general code improvements, but nothing major.
if (bulkRequest.getOperationsCount() > 0) {
    flushBatch(bulkRequest);
} else {
    throw new RuntimeException("Invalid action: " + action);
We should never have to throw a RuntimeException for an invalid configuration. I think it would be best to create an enum for the possible BulkActions, and to have an IllegalArgumentException be thrown from the IndexConfiguration class when trying to convert the action String passed by the user to the enum type.
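The enum-based conversion described above could look roughly like the following. This is a minimal sketch, not the code from the PR: the constant names, the `optionValue` field, and the `fromOptionValue` method are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical sketch of an enum for the supported bulk actions.
enum BulkActions {
    CREATE("create"),
    INDEX("index");

    // The string a user would write in the pipeline configuration (assumed).
    private final String optionValue;

    BulkActions(final String optionValue) {
        this.optionValue = optionValue;
    }

    String getOptionValue() {
        return optionValue;
    }

    // Converts the user-supplied string to the enum, rejecting anything else
    // (including null) with an IllegalArgumentException.
    static BulkActions fromOptionValue(final String value) {
        for (final BulkActions candidate : values()) {
            if (candidate.optionValue.equals(value)) {
                return candidate;
            }
        }
        final String allowed = Arrays.stream(values())
                .map(BulkActions::getOptionValue)
                .collect(Collectors.joining(", "));
        throw new IllegalArgumentException(
                "action must be one of [" + allowed + "] but was: " + value);
    }
}
```

With something like this in place, the configuration class can fail fast on a bad value, and the sink code only ever sees a valid enum constant.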
@@ -260,6 +273,11 @@ public Builder withIsmPolicyFile(final String ismPolicyFile) {
    return this;
}

public Builder withAction(final String action) {
    this.action = action;
We can add validation on construction by adding the following line here (given that BulkActions is an enum):
checkArgument(EnumUtils.isValidEnum(BulkActions.class, action), "action must be one of the following: " + Arrays.toString(BulkActions.values()));
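As a rough illustration of validating at construction time, here is a plain-JDK sketch that does not depend on Guava or Commons Lang. Everything in it is an assumption made for illustration: the `ActionValidationSketch` wrapper, the upper-case constant names, the `isValidEnumName` helper (a stand-in for `EnumUtils.isValidEnum`), and the upper-casing of the input.

```java
import java.util.Arrays;
import java.util.Locale;

final class ActionValidationSketch {
    // Assumed constants; the real enum in the PR may differ.
    enum BulkActions { CREATE, INDEX }

    // Stand-in for EnumUtils.isValidEnum: true only when `name` is exactly
    // the name of one of the enum's constants.
    static <E extends Enum<E>> boolean isValidEnumName(final Class<E> type, final String name) {
        if (name == null) {
            return false;
        }
        try {
            Enum.valueOf(type, name);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    static final class Builder {
        private String action;

        Builder withAction(final String action) {
            // Normalize case first, because the name check is case-sensitive.
            final String normalized = action == null ? null : action.toUpperCase(Locale.ROOT);
            if (!isValidEnumName(BulkActions.class, normalized)) {
                throw new IllegalArgumentException(
                        "action must be one of the following: " + Arrays.toString(BulkActions.values()));
            }
            this.action = action;
            return this;
        }

        String getAction() {
            return action;
        }
    }
}
```

One detail worth keeping in mind: a name-based check such as `EnumUtils.isValidEnum` compares against the constant names exactly, so a lower-case configuration value like `index` only passes if the constants are lower-case too or the input is normalized first, as the sketch does.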
    flushBatch(bulkRequest);
}

} else if(IndexConfiguration.BULK_ACTION_INDEX.equals(action)) {
We can limit the amount of duplicate code by refactoring the if create, else if index to look something like this. It would also be nice to move creation of the BulkOperation to a Factory class with functions of provideBulkOperationForCreate and provideBulkOperationForIndex, but that isn't necessary for this PR.
for (final Record<Object> record : records) {
    final SerializedJson document = getDocument(record.getData());
    final Optional<String> docId = getDocumentIdFromDocument(document);
    final BulkOperation bulkOperation;
    if (action.equals("index")) {
        final IndexOperation.Builder<Object> indexOperationBuilder = new IndexOperation.Builder<>()
                .index(indexManager.getIndexAlias())
                .document(document);
        // Only set an explicit id when the document provides one.
        if (docId.isPresent()) {
            indexOperationBuilder.id(docId.get());
        }
        bulkOperation = new BulkOperation.Builder()
                .index(indexOperationBuilder.build())
                .build();
    } else if (action.equals("create")) {
        final CreateOperation.Builder<Object> createOperationBuilder = new CreateOperation.Builder<>()
                .index(indexManager.getIndexAlias())
                .document(document);
        if (docId.isPresent()) {
            createOperationBuilder.id(docId.get());
        }
        bulkOperation = new BulkOperation.Builder()
                .create(createOperationBuilder.build())
                .build();
    } else {
        // Should be unreachable once the action is validated in IndexConfiguration.
        throw new IllegalArgumentException("Unsupported action: " + action);
    }
    // Flush the current batch first if adding this operation would exceed the bulk size.
    final long estimatedBytesBeforeAdd = bulkRequest.estimateSizeInBytesWithDocument(bulkOperation);
    if (bulkSize >= 0 && estimatedBytesBeforeAdd >= bulkSize && bulkRequest.getOperationsCount() > 0) {
        flushBatch(bulkRequest);
        bulkRequest = bulkRequestSupplier.get();
    }
    bulkRequest.addOperation(bulkOperation);
}
// Flush the remaining requests
if (bulkRequest.getOperationsCount() > 0) {
    flushBatch(bulkRequest);
}
}
}
private Optional<String> getDocumentIdFromDocument(final SerializedJson document) {
    final Map documentAsMap;
    try {
        documentAsMap = objectMapper.readValue(document.getSerializedJson(), Map.class);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
    if (documentAsMap != null) {
        final String docId = (String) documentAsMap.get(documentIdField);
        if (docId != null) {
            return Optional.of(docId);
        }
    }
    return Optional.empty();
}
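The factory idea mentioned in the comment above could be sketched along these lines. To keep the example self-contained it uses a simplified stand-in `Operation` type instead of the OpenSearch client's `BulkOperation`, and documents are plain strings; only the method names `provideBulkOperationForCreate` and `provideBulkOperationForIndex` come from the review comment, everything else is hypothetical.

```java
import java.util.Optional;

final class BulkOperationFactorySketch {
    enum Kind { INDEX, CREATE }

    // Simplified stand-in for the client's BulkOperation; not the real class.
    static final class Operation {
        final Kind kind;
        final String index;
        final String document;
        final Optional<String> id;

        Operation(final Kind kind, final String index, final String document, final Optional<String> id) {
            this.kind = kind;
            this.index = index;
            this.document = document;
            this.id = id;
        }
    }

    private final String indexAlias;

    BulkOperationFactorySketch(final String indexAlias) {
        this.indexAlias = indexAlias;
    }

    Operation provideBulkOperationForIndex(final String document, final Optional<String> docId) {
        return new Operation(Kind.INDEX, indexAlias, document, docId);
    }

    Operation provideBulkOperationForCreate(final String document, final Optional<String> docId) {
        return new Operation(Kind.CREATE, indexAlias, document, docId);
    }

    // Single dispatch point, so the sink loop no longer needs an if/else per action.
    Operation provide(final String action, final String document, final Optional<String> docId) {
        if (action == null) {
            throw new IllegalArgumentException("action must not be null");
        }
        switch (action) {
            case "index":
                return provideBulkOperationForIndex(document, docId);
            case "create":
                return provideBulkOperationForCreate(document, docId);
            default:
                throw new IllegalArgumentException("Unsupported action: " + action);
        }
    }
}
```

In a sketch like this the loop body would shrink to a single `provide(...)` call followed by the size check and `addOperation`, which keeps the batching logic separate from how each operation is built.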
Awesome - thanks! I will get those addressed shortly.
@graytaylor0 Thanks a lot for those comments. Everything made sense. I pushed changes up.
Thanks @jzonthemtn for this contribution! I have one request regarding the naming in the pipeline configurations.
...pensearch/src/main/java/com/amazon/dataprepper/plugins/sink/opensearch/bulk/BulkActions.java
Looks like I messed up in there by merging from
Thanks for making those changes! The best way to fix the DCO will be to rebase your branch against main with
Signed-off-by: jzonthemtn <[email protected]>
Thanks @jzonthemtn for this contribution! And thank you for the quick responses to our feedback.
@jzonthemtn Any reason this was closed? If it was an accident we can reopen and merge.
No, I don't know why. I can only guess I was in the wrong issue. Sorry for my clumsiness.
…ct#1561) Signed-off-by: jzonthemtn <[email protected]>
Description
Adds support for bulk create option
Issues Resolved
#1457 Create-only actions in OpenSearch bulk requests
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.