[BUG] OpenSearch Sink upsert action fails to create new document if it doesn't exist already #3934

Closed
shadabk96 opened this issue Jan 9, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@shadabk96

Describe the bug
My sink configuration is specified as:

  sink:
    - opensearch:
        hosts: [ "https://search-opport/..." ]
        index: "poc"
        document_id_field: "opportunity_id"
        routing_field: "opportunity_id"
        action: "upsert"
...

However, when the sink receives a document with a new opportunity_id value, it logs the following warning and the document is not created:

2024-01-06T03:45:00.550 [s3-log-pipeline-sink-worker-2-thread-2] WARN  org.opensearch.dataprepper.plugins.sink.opensearch.BulkRetryStrategy - operation = Update, error = [marketplaceId-ATVPDKIKX0DER#merchantId-47624524402#ruleId-ssep_new_keyword_suggestion]: document missing

To Reproduce
Steps to reproduce the behavior:

  1. Set up an OpenSearch Ingestion pipeline with an S3 source, the newline codec, and a parse_json processor.
  2. Specify action: "upsert" for the OpenSearch sink.
  3. Add a file to S3 with one JSON document per line.
  4. Check the Data Prepper logs in CloudWatch.
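
For step 3, a minimal sketch of how such a test file could be generated. The field values and the `to_ndjson` helper are hypothetical and only illustrate the one-document-per-line layout; they are not the actual data from my pipeline.

```python
import json


def to_ndjson(docs):
    """Serialize each document as one JSON object per line (newline-delimited JSON)."""
    return "".join(json.dumps(doc) + "\n" for doc in docs)


# Hypothetical documents; opportunity_id is the field used as the document id.
sample = to_ndjson([
    {"opportunity_id": "example-1", "status": "new"},
    {"opportunity_id": "example-2", "status": "new"},
])
print(sample, end="")
```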

Expected behavior
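The JSON documents in the source S3 file should be ingested into OpenSearch: documents with a new opportunity_id should be created, and documents with an existing opportunity_id should be updated.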

Screenshots
N/A

Environment (please complete the following information):

  • Amazon OpenSearch Ingestion Service

Additional context
Possible root cause:
I did a deep dive into the GitHub repository and found the commit where this functionality was added. It uses UpdateOperation to create the requests that are added to the bulk operation. This might not be the correct approach, because the OpenSearch _bulk API expects doc_as_upsert to be set to true when using upsert with an update action in a bulk request (Doc ref). This option is also supported in the opensearch-java library used by Data Prepper (ref).
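
To illustrate the point, here is a rough sketch of the _bulk payload I would expect for an upsert. This is not Data Prepper's code: the `build_bulk_upsert` helper and the sample values are made up for illustration, and it only builds the newline-delimited request body without sending it anywhere.

```python
import json


def build_bulk_upsert(index, doc_id, doc, routing=None):
    """Build the two NDJSON lines for one _bulk update action with doc_as_upsert.

    Hypothetical helper for illustration only.
    """
    meta = {"_index": index, "_id": doc_id}
    if routing is not None:
        meta["routing"] = routing
    action_line = json.dumps({"update": meta})
    # doc_as_upsert asks OpenSearch to index `doc` as a new document
    # when no document with this id exists, instead of failing with
    # "document missing".
    body_line = json.dumps({"doc": doc, "doc_as_upsert": True})
    return action_line + "\n" + body_line + "\n"


# Sample values are placeholders, not data from the failing pipeline.
payload = build_bulk_upsert(
    "poc",
    "example-1",
    {"opportunity_id": "example-1", "status": "new"},
    routing="example-1",
)
print(payload, end="")
```

Without the doc_as_upsert flag (or an explicit upsert document) in the second line, a plain update action is rejected when the id does not exist yet, which would match the warning above.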

@dlvenable
Member

Fixed by #4178

@github-project-automation github-project-automation bot moved this from Unplanned to Done in Data Prepper Tracking Board Mar 12, 2024