[FEATURE] Add support for the AWS OpenSearch Ingestion Pipeline #405

Open
adilnaimi opened this issue Jun 5, 2023 · 16 comments
Labels
enhancement New feature or request

Comments

@adilnaimi

Is your feature request related to a problem?

No

What solution would you like?

Native access to the newly released AWS feature, OpenSearch Ingestion Pipeline

What alternatives have you considered?

Our current workaround uses requests-aws4auth, but it adds an unnecessary layer of complexity to our codebase.

Do you have any additional context?

No

@adilnaimi adilnaimi added enhancement New feature or request untriaged Need triage labels Jun 5, 2023
@saimedhi saimedhi removed the untriaged Need triage label Jun 5, 2023
@saimedhi
Collaborator

saimedhi commented Jun 5, 2023

@wbeckler, please take a look at this feature request.

@wbeckler
Contributor

wbeckler commented Jun 6, 2023

@adilnaimi Could you explain in more detail how you would use the opensearch client to access the ingestion service?

@adilnaimi
Author

@wbeckler -- This may not be the right repository for this issue, and I'm happy to move it if you can suggest a better one. We currently use the opensearch-py client to connect to our AWS OpenSearch cluster. With the introduction of the AWS ingestion pipeline service (powered by Data Prepper), we want to take advantage of its features, such as the DLQ (Dead Letter Queue). To do that, we have placed a pipeline between our application and the OpenSearch cluster: app -> pipeline -> OpenSearch.

There is currently no native way to connect to the OpenSearch Ingestion Pipeline, and I'm unsure why: both opensearch-py and boto3 lack built-in support for the pipeline functionality. I'm looking for the right approach to accessing the OpenSearch Ingestion Pipeline and would appreciate any suggestions or recommendations.

@wbeckler
Contributor

wbeckler commented Jun 8, 2023

I'm not sure where that request would belong either. It sounds like you're looking for a data-prepper version of this: https://github.com/vklochan/python-logstash

Maybe you could fork it or start from scratch and make the first client for data-prepper? If so, leave a comment here so if anyone else wants to help out they'll see this and find you.

@Utkarsh-Aga

Utkarsh-Aga commented Sep 20, 2023

@wbeckler @adilnaimi - Just to confirm my understanding: we are looking for a method in the opensearch-py package that ingests data into the pipeline, instead of calling the HTTPS endpoint directly via curl or a similar tool, for example:

awscurl --service osis --region us-east-1 \
    -X POST \
    -H "Content-Type: application/json" \
    -d '[{"time":"2014-08-11T11:40:13+00:00","remote_addr":"122.226.223.69","status":"404","request":"GET http://www.k2proxy.com//hello.html HTTP/1.1","http_user_agent":"Mozilla/4.0 (compatible; WOW64; SLCC2;)"}]' \
    https://{pipeline-endpoint}.us-east-1.osis.amazonaws.com/log-pipeline/test_ingestion_path

This would be analogous to using client.cat.indices instead of curl https://{domain-endpoint}/_cat/indices.

I ask because, if my understanding is correct, I would be interested in being part of this.
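For comparison, here is a rough Python equivalent of the awscurl call, along the lines of the requests-aws4auth workaround mentioned in the original request. It is only a sketch: the function names `build_ingest_request` and `send_records` are made up for illustration, and `credentials` is assumed to be an object with `access_key`, `secret_key`, and `token` attributes, such as the one returned by `boto3.Session().get_credentials()`.

```python
import json


def build_ingest_request(pipeline_endpoint, ingestion_path, records):
    """Return (url, body, headers) for a POST to a pipeline's HTTP source."""
    url = f"https://{pipeline_endpoint}/{ingestion_path.lstrip('/')}"
    body = json.dumps(records)
    headers = {"Content-Type": "application/json"}
    return url, body, headers


def send_records(pipeline_endpoint, ingestion_path, records, region, credentials):
    """Sign the request with SigV4 for the "osis" service and send it.

    Third-party imports are local so build_ingest_request works without them.
    """
    import requests
    from requests_aws4auth import AWS4Auth

    auth = AWS4Auth(
        credentials.access_key,
        credentials.secret_key,
        region,
        "osis",  # same service name that awscurl is given via --service
        session_token=credentials.token,
    )
    url, body, headers = build_ingest_request(
        pipeline_endpoint, ingestion_path, records
    )
    response = requests.post(url, data=body, headers=headers, auth=auth, timeout=30)
    response.raise_for_status()
    return response
```

A call would look something like `send_records("{pipeline-endpoint}.us-east-1.osis.amazonaws.com", "log-pipeline/test_ingestion_path", records, "us-east-1", creds)`, with the placeholder endpoint replaced by a real one.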

@dblock
Member

dblock commented Sep 21, 2023

@Utkarsh-Aga do you think it should be some kind of option, new namespace, require a separate client instance, or be a separate client altogether because it's specific to an AWS service?

  • client = Client(ingestion_pipeline="https://{pipeline-endpoint}.us-east-1.osis.amazonaws.com/log-pipeline/test_ingestion_path")
  • client.ingestion_pipeline(...)...
  • something else?

@Utkarsh-Aga

@dblock I believe that when creating the authentication for the client, we would need to specify the service as osis; then we could expose options like client.ingestion_pipeline(...).

@dblock
Member

dblock commented Sep 21, 2023

So we would treat osis as a plugin? My question is whether we would be better off writing a separate library that depends on opensearch-py. Either way, since the service is not available in open source, it should be behind import aws.ingestion_pipeline.

Want to contribute @Utkarsh-Aga?

@Utkarsh-Aga

@dblock -
Yes, we can make it a plugin, because it would have only one function: sending data over HTTP/HTTPS to the Ingestion Pipeline.
I would also be happy to contribute to it.
However, I did not understand the statement that it should be behind import aws.ingestion_pipeline. Could you please elaborate on this?

@dblock
Member

dblock commented Sep 21, 2023

Yes, we can make it a plugin, because it would have only one function: sending data over HTTP/HTTPS to the Ingestion Pipeline. I would also be happy to contribute to it.

Awesome.

However, I did not understand the statement that it should be behind import aws.ingestion_pipeline. Could you please elaborate on this?

Since it's an AWS service, and not a generic feature of OpenSearch open source, you can't treat it like other plugins. We can't have client.ingestion_pipeline; it would need to be client.aws_ingestion_pipeline, but that's really ugly. So as a developer, I think I'd like to be able to do something like this:

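# Proposed usage only: these names are part of the proposal, not an existing API.
from opensearch.aws import IngestionPipeline

client = Client(
    plugins=[IngestionPipeline]
)
client.ingestion_pipeline  # ... pipeline calls would hang off this namespace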

Does it make sense?
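To make the idea concrete, here is a self-contained sketch of how plugin-style registration could expose client.ingestion_pipeline only when the plugin is passed in. Everything in it is hypothetical: `Client` is a stand-in and not the opensearch-py client, the `namespace` attribute and `ingest` method are invented for illustration, and the transport is a fake that performs no network I/O.

```python
class IngestionPipeline:
    """Hypothetical plugin that exposes an `ingestion_pipeline` namespace."""

    namespace = "ingestion_pipeline"

    def __init__(self, client):
        self._client = client

    def ingest(self, path, records):
        # Delegate to whatever transport the client was built with.
        return self._client.transport("POST", path, records)


class Client:
    """Stand-in client for illustration; not opensearch-py's client class."""

    def __init__(self, transport, plugins=()):
        self.transport = transport
        for plugin_cls in plugins:
            # Attach each plugin under the attribute name it declares.
            setattr(self, plugin_cls.namespace, plugin_cls(self))


def recording_transport(method, path, body):
    """Fake transport that echoes the call instead of sending it."""
    return {"method": method, "path": path, "count": len(body)}


client = Client(recording_transport, plugins=[IngestionPipeline])
result = client.ingestion_pipeline.ingest(
    "/log-pipeline/test_ingestion_path", [{"status": "404"}]
)
```

With this shape, a client built without the plugin has no ingestion_pipeline attribute at all, which keeps the AWS-specific surface opt-in.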

@Utkarsh-Aga

Got it. Thanks a lot for these details @dblock, that made things very clear.
I will start researching how to implement this.

Since this would be my first contribution, should I just follow the CONTRIBUTING guide?

@dblock
Member

dblock commented Sep 21, 2023

Yes! Let us know if you need help. A good place to ask general questions is the public Slack - https://opensearch.org/slack.html

@Utkarsh-Aga

Sure, Thanks.

@dlvenable
Member

@adilnaimi , @Utkarsh-Aga ,

As noted above, there is an existing http source you can use. However, it seems you are interested in APIs that look similar to the OpenSearch APIs. We have an existing issue to support the _bulk API: opensearch-project/data-prepper#248. Another feature we've discussed is an API that looks like the existing OpenSearch document APIs, for example POST {index}/_doc. Is this along the lines of what you are looking for?
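For readers unfamiliar with it, the OpenSearch _bulk request body is newline-delimited JSON: an action line followed by the document line, with a trailing newline at the end. The sketch below builds such a body using only the standard library. The helper name `to_bulk_body` is made up for illustration, and whether Data Prepper would accept this payload depends on the issue linked above.

```python
import json


def to_bulk_body(index, documents):
    """Build an OpenSearch-style _bulk body that indexes each document."""
    lines = []
    for doc in documents:
        # Action line: index this document into the given index.
        lines.append(json.dumps({"index": {"_index": index}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # Every line, including the last, ends with a newline.
    return "".join(line + "\n" for line in lines)
```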

@Utkarsh-Aga

Hello @dlvenable - Yes, support for _bulk or _doc would help, and we would not need to create a separate plugin in opensearch-py to send data to the http source.
I'll let @adilnaimi confirm whether that is what they were looking for.

@dblock dblock changed the title [FEATURE] Support AWS OpenSearch Ingestion Pipeline [FEATURE] Add support for the AWS OpenSearch Ingestion Pipeline Nov 10, 2023
@jzonthemtn

This feature would be helpful to the OpenSearch UBI efforts. As commented in Data Prepper, being able to use an OpenSearch-like API to index UBI events and queries through Data Prepper would give users flexibility in storing UBI data with low overhead.

7 participants