diff --git a/docs/integrations/README.md b/docs/integrations/README.md
new file mode 100644
index 0000000000..d83736d9bb
--- /dev/null
+++ b/docs/integrations/README.md
@@ -0,0 +1,8 @@
+# OpenSearch Integrations
+
+This is the developer documentation for OpenSearch Integrations.
+
+Some major documents to look at:
+- [Setup](setup.md) explains the major steps of the integration setup process behind the scenes,
+  which gives context for how integration content is assembled. To get more into developing
+  integrations directly, there's the related [Config](config.md) document.
diff --git a/docs/integrations/config.md b/docs/integrations/config.md
new file mode 100644
index 0000000000..4f41b379be
--- /dev/null
+++ b/docs/integrations/config.md
@@ -0,0 +1,85 @@
+# Integration Configuration
+
+**Date:** March 22, 2024
+
+The bulk of an integration's functionality is defined in its config. Let's look a bit at the config
+for the current [Nginx integration](https://github.com/opensearch-project/dashboards-observability/blob/4e1e0e585/server/adaptors/integrations/__data__/repository/nginx/nginx-1.0.0.json),
+with some fields pruned for legibility, to get a better understanding of what information it
+contains.
+
+```json5
+{
+  "name": "nginx",
+  "version": "1.0.0",
+  "workflows": [
+    {
+      "name": "queries"
+    },
+    {
+      "name": "dashboards"
+    }
+  ],
+  "components": [
+    {
+      "name": "communication",
+      "version": "1.0.0"
+    },
+    {
+      "name": "http",
+      "version": "1.0.0"
+    },
+    {
+      "name": "logs",
+      "version": "1.0.0"
+    }
+  ],
+  "assets": [
+    {
+      "name": "nginx",
+      "version": "1.0.0",
+      "extension": "ndjson",
+      "type": "savedObjectBundle",
+      "workflows": ["dashboards"]
+    },
+    {
+      "name": "create_table",
+      "version": "1.0.0",
+      "extension": "sql",
+      "type": "query"
+    },
+    {
+      "name": "create_mv",
+      "version": "1.0.0",
+      "extension": "sql",
+      "type": "query",
+      "workflows": ["dashboards"]
+    }
+  ],
+  "sampleData": {
+    "path": "sample.json"
+  }
+}
+```
+
+There are generally four key components to an integration's functionality; most of what's left is metadata or used for rendering.
+
+- `assets` are the items that are associated with the integration, including queries, dashboards,
+  and index patterns. Originally the assets were just one `ndjson` file of exported Saved Objects
+  (today a `savedObjectBundle`), but to support further options this became a list with
+  additional types. The assets are available under the [directory of the same name](https://github.com/opensearch-project/dashboards-observability/tree/4e1e0e585/server/adaptors/integrations/__data__/repository/nginx/assets).
+  The currently supported asset types are:
+  - `savedObjectBundle`: A saved object export. This typically includes an index pattern and a dashboard querying it, and it indicates that the integration expects data that conforms to this index pattern (see `components` below).
+  - `query`: A SQL query that is sent to OpenSearch Spark. You can read more about it at the
+    [opensearch-spark repository](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md).
+- `workflows` are conditional flags that toggle whether or not an asset should be installed.
+  They're selected by the user before installing the integration. By default, an asset is included under
+  every workflow. Currently, workflows are only enabled for integrations that support S3 data source
+  installations, and workflow assets are run in order of type (`query`s are always run before `savedObjectBundle`s).
+- `components` define the format of the data expected by saved queries and dashboards. Components
+  are typically shared between related integrations to allow
+  things like correlation by field. The current standard components defined here and in the
+  [OpenSearch Catalog](https://github.com/opensearch-project/opensearch-catalog) are heavily
+  inspired by [OpenTelemetry](https://opentelemetry.io/). The components can be used for validation
+  when connecting an integration to an index pattern. It's highly recommended to reuse existing
+  components where possible.
+- `sampleData` is loaded after the rest of the integration setup process when users select the "Try it" option.
diff --git a/docs/integrations/setup.md b/docs/integrations/setup.md
new file mode 100644
index 0000000000..aba7f6a51e
--- /dev/null
+++ b/docs/integrations/setup.md
@@ -0,0 +1,35 @@
+# Integrations Setup
+
+**Date:** March 22, 2024
+
+When an integration is being installed, there are several steps executed in the process of getting
+everything up and running. This document describes the major steps of installing an integration that
+happen behind the scenes, to make it clearer how to implement content. It's generally recommended to read this along with the [Config document](config.md).
+
+Currently, two types of integration assets are supported with a synchronous install. The full
+installation process installs these separately, in two major chunks.
+
+- The frontend side of the setup is in
+  [setup_integration.tsx](https://github.com/opensearch-project/dashboards-observability/blob/4e1e0e585/public/components/integrations/components/setup_integration.tsx#L450).
+  This is where the installation flow is selected based on the type of integration being installed,
+  integration `query`s are run if available, and eventually the build request is sent to the
+  backend.
+- On the backend, the request is routed to a
+  [builder](https://github.com/opensearch-project/dashboards-observability/blob/4e1e0e585/server/adaptors/integrations/integrations_builder.ts#L32)
+  that handles some further reference tidying (rewriting UUIDs to avoid collisions, modifying which
+  index is read, etc.) and makes the final integration instance object.
+
+This process is a little confusing and perhaps more convoluted than it needs to be. This is known to
+the author in hindsight.
+
+## Query Mapping
+
+If working on S3-based integrations, it's worth noting that queries have some values
+[substituted](https://github.com/opensearch-project/dashboards-observability/blob/4e1e0e585/public/components/integrations/components/setup_integration.tsx#L438) when installing. They are:
+
+- `{s3_bucket_location}` to locate data.
+- `{s3_checkpoint_location}` to store intermediate results, which is required by Spark.
+- `{object_name}` to give tables a unique name per integration, avoiding collisions.
+
+For some query examples, it can be worth looking at the assets for the
+[VPC integration](https://github.com/opensearch-project/dashboards-observability/blob/4e1e0e585/server/adaptors/integrations/__data__/repository/aws_vpc_flow/assets/README.md).
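
To make the Query Mapping substitution concrete, here is a minimal sketch of how the three placeholders might be filled in before a query is sent. This is not the code in `setup_integration.tsx`: the `QuerySubstitutions` interface, the `substituteQueryPlaceholders` helper, the example SQL, and the bucket paths are all hypothetical and only illustrate the idea.

```typescript
// Hypothetical shape for the three substituted values named in the docs.
interface QuerySubstitutions {
  s3_bucket_location: string;
  s3_checkpoint_location: string;
  object_name: string;
}

// Hypothetical helper: replaces each known `{key}` placeholder in a query
// template. Placeholders that are not in `values` are left untouched.
function substituteQueryPlaceholders(query: string, values: QuerySubstitutions): string {
  return query.replace(/\{(\w+)\}/g, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(values, key)
      ? values[key as keyof QuerySubstitutions]
      : match
  );
}

// Illustrative template only; real integration queries live in each
// integration's assets directory.
const template = "CREATE TABLE {object_name} USING json LOCATION '{s3_bucket_location}'";

const filled = substituteQueryPlaceholders(template, {
  s3_bucket_location: "s3://example-bucket/nginx/",
  s3_checkpoint_location: "s3://example-bucket/checkpoints/",
  object_name: "nginx_logs",
});

console.log(filled);
// CREATE TABLE nginx_logs USING json LOCATION 's3://example-bucket/nginx/'
```

Leaving unknown placeholders in place is a design choice for this sketch: it keeps a typo in a template visible in the final query rather than silently replacing it with an empty string.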