OpenSearch source plugin #1985

Closed
dlvenable opened this issue Oct 30, 2022 · 4 comments

@dlvenable
Member

Is your feature request related to a problem? Please describe.

Some users have looked for a way to migrate data from one OpenSearch cluster to another. One of the missing components here would be the ability to retrieve data from OpenSearch. Other users have expressed a need to transform data from OpenSearch clusters.

Describe the solution you'd like

Include an OpenSearch source plugin for Data Prepper. It will need some of the following configurations.

  • Connection configurations similar to the opensearch sink.
  • Source index or indices. This should also support dynamic indices based on date.
  • Possibly query options to filter down the data. If not specified, all data in the index would be used.
  • Schedule configurations for reading data.
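The issue does not say how date-based dynamic indices would be written. As a minimal sketch, assuming a hypothetical `%{yyyy.MM.dd}` placeholder style and a hypothetical `resolve_index` helper, an index pattern could be resolved against the current date like this:

```python
import re
from datetime import datetime, timezone

# Assumption: only these Java-style date tokens are supported in this sketch.
_TOKENS = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H"}


def resolve_index(pattern: str, now: datetime) -> str:
    """Expand every %{...} placeholder in `pattern` using the date in `now`."""

    def expand(match: re.Match) -> str:
        fmt = match.group(1)
        for token, code in _TOKENS.items():
            fmt = fmt.replace(token, code)
        return now.strftime(fmt)

    return re.sub(r"%\{([^}]+)\}", expand, pattern)


if __name__ == "__main__":
    print(resolve_index("logs-%{yyyy.MM.dd}", datetime.now(timezone.utc)))
```

Patterns without a placeholder, such as `logs-*`, pass through unchanged, so static and date-based index names could share one configuration field.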

Additional context

This should be similar to the logstash-input-opensearch-plugin provided in the OpenSearch project.

https://opensearch.org/blog/community/2022/05/introducing-logstash-input-opensearch-plugin-for-opensearch/
https://github.com/opensearch-project/logstash-input-opensearch

dlvenable added the "plugin - source" label (a plugin to receive data from a service or location) on Oct 30, 2022
@ashoktelukuntla
Contributor

ashoktelukuntla commented Feb 17, 2023

Create a source plugin that lets users bulk read from and bulk write to a given OpenSearch cluster on a schedule. The plugin should be extensible to accept additional user-defined sources. Users should be able to create and schedule a pipeline for data migration that performs the following steps:

  • Auto-discover indices by listing all of them, or take a given index.
  • Iterate over an index and read the complete data.
  • Enrich or transform the data (optional).
  • Write to a sink (any sink, for example OpenSearch) using Data Prepper.
  • Reconcile and report by comparing source and sink data.

Cron can be used to schedule the migration of data. Example: schedule: "* * * * *" will load data every minute.
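To illustrate why "* * * * *" fires every minute, here is a deliberately simplified sketch of five-field cron matching. The `cron_matches` helper is hypothetical. It handles only `*`, plain numbers, and comma lists, with no ranges or steps, and it requires every field to match, which is stricter than real cron when both day fields are restricted.

```python
from datetime import datetime


def _field_matches(field: str, value: int) -> bool:
    """Return True if one cron field ('*', '5', or '1,15') accepts `value`."""
    for part in field.split(","):
        if part == "*" or int(part) == value:
            return True
    return False


def cron_matches(expression: str, when: datetime) -> bool:
    """Check whether `when` falls on a minute selected by `expression`."""
    minute, hour, day, month, weekday = expression.split()
    return (
        _field_matches(minute, when.minute)
        and _field_matches(hour, when.hour)
        and _field_matches(day, when.day)
        and _field_matches(month, when.month)
        # cron counts Sunday as 0; isoweekday() counts Sunday as 7.
        and _field_matches(weekday, when.isoweekday() % 7)
    )


if __name__ == "__main__":
    print(cron_matches("* * * * *", datetime.now()))
```

A scheduler built on this would evaluate the expression once per minute and start a read whenever it matches.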

Additional context

The plugin should accept cluster configuration, including hostname:port and user credentials, plus an optional index and query (e.g. match_all).

I would envision the following sequence of steps:

  • List indices with cat indices: https://opensearch.org/docs/1.2/opensearch/rest-api/cat/cat-indices/
  • Iterate over each index.
  • Query the index with match_all, or with a scroll query for large indices.
  • Enrich or transform the data (optional).
  • Ingest the data into OpenSearch through a Data Prepper pipeline.
  • Report on data from the sink and the source.
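The listing and scroll steps can be sketched against the OpenSearch REST endpoints `_cat/indices` and `_search/scroll`. This is only an illustration under assumptions: `send` is a hypothetical transport callable that takes a method, path, and optional JSON body and returns the parsed response. The function names, page size, and keep-alive value are also invented here and are not part of any proposed design.

```python
from typing import Callable, Iterator, Optional

# Hypothetical transport: (method, path, body) -> parsed JSON response.
Transport = Callable[[str, str, Optional[dict]], object]


def list_indices(send: Transport) -> list:
    """Return index names from GET /_cat/indices?format=json."""
    rows = send("GET", "/_cat/indices?format=json", None)
    return [row["index"] for row in rows]


def scroll_index(
    send: Transport,
    index: str,
    query: Optional[dict] = None,
    page_size: int = 1000,
    keep_alive: str = "1m",
) -> Iterator[dict]:
    """Yield every hit in `index`, paging with the scroll API."""
    body = {"size": page_size, "query": query or {"match_all": {}}}
    page = send("POST", f"/{index}/_search?scroll={keep_alive}", body)
    scroll_id = page.get("_scroll_id")
    try:
        while page["hits"]["hits"]:
            yield from page["hits"]["hits"]
            page = send(
                "POST",
                "/_search/scroll",
                {"scroll": keep_alive, "scroll_id": scroll_id},
            )
            scroll_id = page.get("_scroll_id", scroll_id)
    finally:
        # Release the server-side scroll context once iteration ends.
        if scroll_id:
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})
```

Counting the hits yielded per index and comparing that number with the sink's document count would be one simple way to approach the reporting step.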

References:

This should be similar to the logstash-input-opensearch-plugin provided in the OpenSearch project.

https://opensearch.org/blog/community/2022/05/introducing-logstash-input-opensearch-plugin-for-opensearch/
https://github.com/opensearch-project/logstash-input-opensearch
https://opensearch.org/docs/2.4/opensearch/point-in-time/
https://opensearch.org/docs/2.4/api-reference/scroll/

A sample YAML configuration could be:

source:
  opensearch:
    version: x
    query: "..."
    indexes:
      - logs-*
    schedule: PT20S
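The `schedule: PT20S` value in the sample reads as an ISO-8601 duration (20 seconds). As a rough sketch, assuming only hours, minutes, and seconds are needed, a hypothetical `parse_schedule` helper could turn such a value into an interval:

```python
import re
from datetime import timedelta

# Assumption: only the time part of ISO-8601 durations (PTnHnMnS) is handled.
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def parse_schedule(value: str) -> timedelta:
    """Convert a duration such as 'PT20S' or 'PT1H30M' into a timedelta."""
    match = _DURATION.match(value)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported schedule duration: {value!r}")
    hours, minutes, seconds = (int(group or 0) for group in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


if __name__ == "__main__":
    print(parse_schedule("PT20S").total_seconds())
```

A duration describes a fixed interval between reads, whereas the cron form suggested earlier in the thread describes wall-clock times, so a real design would need to pick one or accept both.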

rajeshLovesToCode added 12 commits to rajeshLovesToCode/data-prepper that referenced this issue between May 8 and May 17, 2023
@dlvenable dlvenable moved this from Unplanned to To do in Data Prepper Tracking Board May 17, 2023
@dlvenable dlvenable added this to the v2.4 milestone May 17, 2023
rajeshLovesToCode added 3 commits to rajeshLovesToCode/data-prepper that referenced this issue May 18, 2023
mallikagogoi7 added 2 commits to mallikagogoi7/data-prepper that referenced this issue May 19, 2023
@cmanning09 cmanning09 moved this from To do to In progress in Data Prepper Tracking Board May 24, 2023
@cmanning09
Contributor

OpenSearchSource

High-level proposal for the class design of this plugin

mallikagogoi7 added 2 commits to mallikagogoi7/data-prepper that referenced this issue May 25 and May 29, 2023
@cmanning09
Contributor

Latest PR: #2806 from @graytaylor0

@dlvenable dlvenable modified the milestones: v2.4, v2.5 Aug 10, 2023
@dlvenable dlvenable assigned graytaylor0 and unassigned cmanning09 Sep 11, 2023
@github-project-automation github-project-automation bot moved this from In progress to Done in Data Prepper Tracking Board Oct 6, 2023