Create-only actions in OpenSearch bulk requests #1457

Closed
dlvenable opened this issue Jun 1, 2022 · 5 comments

@dlvenable
Member

dlvenable commented Jun 1, 2022

Is your feature request related to a problem? Please describe.

Data Prepper only uses the index action of the Bulk API. In some cases, teams want to create a document only if it does not already exist. The OpenSearch _bulk API supports this via a create action.

Describe the solution you'd like

Add a new action property to the OpenSearch sink. The default value will be index (this is the current behavior). Pipeline authors can set this value to either create or index.

Data Prepper will send bulk requests with all events using the configured action.
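As a rough illustration of the proposal above, the sketch below builds an NDJSON _bulk body in which every event carries the same configured action. It is not Data Prepper's implementation: the function name build_bulk_body, the (id, document) pair shape, and the index name are assumptions made for this example only.

```python
import json

# Actions named in this issue; "index" is the proposed default.
SUPPORTED_ACTIONS = ("index", "create")


def build_bulk_body(events, index_name, action="index"):
    """Return an NDJSON _bulk body in which every event uses the same action.

    `events` is a list of (document_id, document) pairs. That shape is an
    assumption for this sketch, not Data Prepper's event model.
    """
    if action not in SUPPORTED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    lines = []
    for doc_id, doc in events:
        # Action/metadata line, followed by the document source line.
        lines.append(json.dumps({action: {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    # The Bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_bulk_body([("1", {"message": "hello"})], "logs", action="create"), end="")
```

With action="create", each metadata line is keyed by create instead of index, which is the only difference in the request body between the two modes.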

If OpenSearch responds that the document could not be created because it already exists, the document will be dropped without any retry or DLQ. This is a reasonable expectation for pipeline authors who use create, since their goal is to avoid updating existing documents.
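The drop behavior described above could look roughly like the following when processing a _bulk response. This is a minimal sketch, assuming the response has already been parsed from JSON; classify_bulk_items is a hypothetical helper, not part of Data Prepper. It relies on the Bulk API reporting a per-item status of 409 when a create is rejected because the document already exists.

```python
def classify_bulk_items(response, action="create"):
    """Split the items of a _bulk response into succeeded, dropped, and failed.

    `response` is the parsed JSON body of a _bulk response. With the create
    action, an item rejected with status 409 means the document already
    exists, so it is dropped rather than retried or sent to a DLQ.
    """
    succeeded, dropped, failed = [], [], []
    for item in response.get("items", []):
        # Each item is a single-key object keyed by the action name.
        result = item.get(action, {})
        status = result.get("status", 0)
        if 200 <= status < 300:
            succeeded.append(result)
        elif action == "create" and status == 409:
            dropped.append(result)
        else:
            # Anything else stays eligible for retry or DLQ handling.
            failed.append(result)
    return succeeded, dropped, failed
```

Only the 409 conflict on create is treated as a silent drop; other failures, such as a 429, remain in the failed list so the existing retry and DLQ paths can still apply.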

Describe alternatives you've considered (Optional)

Data Prepper could key off of values directly in the document to determine whether create or index should be used. This would be somewhat more complicated since pipeline authors would need to define conditions and fallbacks. This approach could be added later if needed, but the proposed solution should be simpler for pipeline authors.
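To make the alternative concrete, here is a small sketch of choosing the action per document. Everything in it is hypothetical: the field name bulk_action, the helper choose_action, and the fallback rule are invented for illustration and do not come from Data Prepper.

```python
def choose_action(document, action_field="bulk_action", fallback="index"):
    """Pick a bulk action from a value in the document, else use the fallback.

    `action_field` is a hypothetical field name used only in this sketch.
    """
    value = document.get(action_field)
    # Unknown or missing values fall back to the configured default.
    return value if value in ("index", "create") else fallback
```

Even in this reduced form, the author has to decide on a field, the accepted values, and a fallback, which is the extra complexity the single action property avoids.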

Additional context

In addition to index and create, the Bulk API also supports update and delete. The proposal above would allow us to add these in future iterations.

@dlvenable dlvenable added the good first issue Good for newcomers label Jun 8, 2022
@jzonthemtn
Contributor

@dlvenable Happy to take on this one if it's ready to be worked.

@dlvenable
Member Author

@jzonthemtn, yes, this one is ready to work on when you are available. I'll assign it to you. Thanks!

@jzonthemtn
Contributor

Hi @dlvenable, I'm working on integration tests for this. It looks like a running OpenSearch is required for the OpenSearchSinkIT tests. The OpenSearchIntegrationHelper gets the host names via the tests.opensearch.host system property. I just wanted to verify my understanding here is correct before getting too deep into it.

@dlvenable
Member Author

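Thanks @jzonthemtn! Yes, you would need to run OpenSearch. You can do this via Docker.

I normally run them using the GitHub Actions workflow as a template. When running locally, replace ${{ matrix.opensearch }} in the commands with the OpenSearch image tag you want to test against.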

Start Docker:

docker pull opensearchproject/opensearch:${{ matrix.opensearch }}
docker run -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" -d opensearchproject/opensearch:${{ matrix.opensearch }}

Run the tests:

./gradlew :data-prepper-plugins:opensearch:integrationTest --tests "com.amazon.dataprepper.plugins.sink.opensearch.OpenSearchIT" -Dtests.opensearch.host=localhost:9200 -Dtests.opensearch.user=admin -Dtests.opensearch.password=admin
./gradlew :data-prepper-plugins:opensearch:integrationTest -Dtests.opensearch.host=localhost:9200 -Dtests.opensearch.user=admin -Dtests.opensearch.password=admin -Dtests.opensearch.bundle=true

graytaylor0 pushed a commit that referenced this issue Jul 30, 2022
@dlvenable dlvenable added this to the v2.0 milestone Aug 8, 2022
@dlvenable
Member Author

This was resolved in #1561.

engechas pushed a commit to engechas/data-prepper that referenced this issue Sep 12, 2022