[Automatic Import] Adding base cel generation as experimental feature (#195309)

## Summary

This PR adds base-level support for CEL input configuration generation
for Automatic Import.

## How this works

For this phase of the CEL generation, we will produce three things:

1. A simple CEL program. This will contain logic for querying an
endpoint and mapping its response to events for processing based on an
OpenAPI spec file. It does **not** contain more complex functionality
for things like authentication.
2. An initial state. This will be based on the program and contain
defaults based on the OpenAPI spec file.
3. A list of state variables that need redaction from the logs. 

These three pieces will be available for user review, and then plumbed
directly into the manifest file as default values for their
corresponding settings where the user can modify as needed.

Note: The generated output is not yet expected to be fully functional
without tweaks or add-ons from the user for things like
authentication.
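
As a rough illustration of how the three pieces relate, here is a minimal sketch that folds a list of per-variable details into the initial state and the redaction list. The `StateVarDetail` shape mirrors the `celStateDetailsMockedResponse` fixture in this commit; the `buildCelInputDraft` helper and the `CelInputDraft` name are hypothetical and are not taken from the plugin code.

```typescript
// Shape of one generated state variable, mirroring the test fixture
// (name / default / redact). The interface name is an assumption.
interface StateVarDetail {
  name: string;
  default: unknown;
  redact: boolean;
}

// The three outputs described above: program, initial state, redact list.
interface CelInputDraft {
  program: string;
  stateSettings: Record<string, unknown>;
  redactVars: string[];
}

// Hypothetical helper: collects defaults into the initial state and
// gathers the names of variables flagged for redaction.
function buildCelInputDraft(program: string, details: StateVarDetail[]): CelInputDraft {
  const stateSettings: Record<string, unknown> = {};
  const redactVars: string[] = [];
  for (const detail of details) {
    stateSettings[detail.name] = detail.default;
    if (detail.redact) {
      redactVars.push(detail.name);
    }
  }
  return { program, stateSettings, redactVars };
}
```

With the fixture values (`config1: 50`, `config2: ''` marked for redaction, `config3: 'event'`), this yields the same `stateSettings` and `redactVars` as `celExpectedResults`.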

## (Temporary) UI Flow

If a user selects CEL during the datastream step, after completion of
the review, the user will then be able to upload and review the new CEL
steps.

The generated results are shown to the user and then plumbed as
defaults into the input settings, where the user can modify them during
configuration of the integration.

(Note: this flow will be changed with forthcoming UX designs)

## Feature flag

This feature will be behind an experimental feature flag for now, as the
design is still a work in progress. To enable this feature, add
`xpack.integration_assistant.enableExperimental: ['generateCel']` to
`kibana.yml`.
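
To make the flag mechanics concrete, here is a minimal sketch of how a list such as `['generateCel']` could be turned into a set of boolean flags. `allowedExperimentalValues` and `generateCel` come from this commit; the `parseExperimentalFlags` helper is hypothetical and only illustrates the idea, it is not the plugin's actual parser.

```typescript
// Defaults for every known experimental feature (generateCel is off by default).
const allowedExperimentalValues = {
  generateCel: false,
} as const;

type ExperimentalFeatures = Record<keyof typeof allowedExperimentalValues, boolean>;

// Hypothetical helper: enables the listed features and reports unknown names.
function parseExperimentalFlags(enabled: string[]): {
  features: ExperimentalFeatures;
  invalid: string[];
} {
  const features: ExperimentalFeatures = { ...allowedExperimentalValues };
  const invalid: string[] = [];
  for (const name of enabled) {
    if (Object.prototype.hasOwnProperty.call(allowedExperimentalValues, name)) {
      features[name as keyof ExperimentalFeatures] = true;
    } else {
      invalid.push(name);
    }
  }
  return { features, invalid };
}
```

Reporting unknown names separately, rather than silently dropping them, makes a typo in the config list easier to spot.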

## Maintainer's notes

- UI tests were intentionally omitted for now, as the UI implemented is
only temporary until we have a UX design.
- Some OpenAPI specs are too large to be uploaded at this time. I am
working on adding support for larger files and have added an item for
it to the [meta issue](#193074).

Relates: #193074
___ 

<details>
  <summary>Screenshots</summary>
  
After selecting CEL during datastream configuration and reviewing those
results, the user will be brought to a new screen to upload an OpenAPI
spec
<img width="650" alt="upload"
src="https://github.com/user-attachments/assets/efdace4a-cc26-4f33-8b67-35c08df5f640">

The user can upload the spec file (as long as it isn't over the file
upload limit)
<img width="650" alt="spec uploaded"
src="https://github.com/user-attachments/assets/9fd1b868-f8da-4d3c-b975-522bf66e05a5">

The user waits while the LLM runs
<img width="650" alt="Screenshot 2024-10-09 at 1 37 59 PM"
src="https://github.com/user-attachments/assets/3eca6b97-4525-4496-89b0-3002a97fa27d">

The user can view results 
<img width="650" alt="review"
src="https://github.com/user-attachments/assets/ee44fb16-fd3a-48c4-975f-706e6d381339">

The results are automatically pasted into the config, where the user may
further edit and configure the input
<img width="635" alt="Screenshot 2024-10-08 at 11 17 46 AM"
src="https://github.com/user-attachments/assets/45151e13-0fd9-4f9a-bbfe-68e6f9b0e671">

</details>

<details>
  <summary>Sample results </summary>

source:
[MISP](https://raw.githubusercontent.com/MISP/MISP/develop/app/webroot/doc/openapi.yaml)
  
program:
```
(
  request("POST", state.url + "/events/restSearch?" + {
    "page": [string(state.page)],
    "limit": [string(state.limit)],
    "sort": ["date"],
    "direction": ["asc"]
  }.format_query()).with({
    "Header": {
      "Content-Type": ["application/json"]
    }
  }).do_request().as(resp,
    resp.StatusCode == 200 ?
      bytes(resp.Body).decode_json().as(body, {
        "events": body.map(e, {
          "message": e.encode_json()
        }),
        "want_more": body.size() == state.limit,
        "page": state.page + 1,
        "limit": state.limit
      })
    :
      {
        "events": [{
          "error": {
            "code": string(resp.StatusCode),
            "id": string(resp.Status),
            "message": string(resp.Body)
          }
        }],
        "want_more": false
      }
  )
)
```

initial state:
```
page : 1
limit : 50
```

redact vars:
```
[ ]
```

</details>
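
For readers less familiar with CEL, the paging logic of the sample program can be paraphrased in TypeScript. This is only a sketch of the success branch: `step` mirrors how the program maps a response body to `events`, `want_more`, `page` and `limit`, while `collectAll` and the `fetchPage` callback are hypothetical stand-ins for the input's evaluation loop and the HTTP request.

```typescript
interface PageState {
  page: number;
  limit: number;
}

interface StepResult extends PageState {
  events: Array<{ message: string }>;
  want_more: boolean;
}

// Mirrors the success branch of the sample program: wrap each item as a
// message, ask for more only when a full page came back, advance the page.
function step(state: PageState, body: unknown[]): StepResult {
  return {
    events: body.map((e) => ({ message: JSON.stringify(e) })),
    want_more: body.length === state.limit,
    page: state.page + 1,
    limit: state.limit,
  };
}

// Hypothetical driver: re-evaluates the step while want_more is true.
function collectAll(
  fetchPage: (page: number, limit: number) => unknown[],
  initial: PageState
): { events: Array<{ message: string }>; requests: number } {
  let state: PageState = initial;
  const events: Array<{ message: string }> = [];
  let requests = 0;
  let wantMore = true;
  while (wantMore) {
    const result = step(state, fetchPage(state.page, state.limit));
    requests += 1;
    events.push(...result.events);
    wantMore = result.want_more;
    state = { page: result.page, limit: result.limit };
  }
  return { events, requests };
}
```

One consequence of the `body.size() == state.limit` check is that a data set whose size is an exact multiple of `limit` costs one extra request, which returns an empty page.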
kgeller authored Oct 11, 2024
1 parent 869ceec commit 7f24e38
Showing 65 changed files with 2,528 additions and 29 deletions.
115 changes: 115 additions & 0 deletions x-pack/plugins/integration_assistant/__jest__/fixtures/cel.ts
@@ -0,0 +1,115 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0; you may not use this file except in compliance with the Elastic License
* 2.0.
*/

export const celTestState = {
dataStreamName: 'testDataStream',
apiDefinition: 'apiDefinition',
lastExecutedChain: 'testchain',
finalized: false,
apiQuerySummary: 'testQuerySummary',
exampleCelPrograms: [],
currentProgram: 'testProgram',
stateVarNames: ['testVar'],
stateSettings: { test: 'testDetails' },
redactVars: ['testRedact'],
results: { test: 'testResults' },
};

export const celQuerySummaryMockedResponse = `To cover all events in a chronological manner for the device_tasks endpoint, you should use the /v1/device_tasks GET route with pagination parameters. Specifically, use the pageSize and pageToken query parameters. Start with a large pageSize and use the nextPageToken from each response to fetch subsequent pages until all events are retrieved.
Sample URL path:
/v1/device_tasks?pageSize=1000&pageToken={nextPageToken}
Replace {nextPageToken} with the actual token received from the previous response. Repeat this process, updating the pageToken each time, until you've retrieved all events.`;

export const celProgramMockedResponse = `Based on the provided context and requirements, here's the CEL program section for the device_tasks datastream:
\`\`\`
request("GET", state.url + "/v1/device_tasks" + "?" + {
"pageSize": [string(state.page_size)],
"pageToken": [state.page_token]
}.format_query()).with({
"Header": {
"Content-Type": ["application/json"]
}
}).do_request().as(resp,
resp.StatusCode == 200 ?
bytes(resp.Body).decode_json().as(body, {
"events": body.tasks.map(e, {"message": e.encode_json()}),
"page_token": body.nextPageToken,
"want_more": body.nextPageToken != null
}) : {
"events": {
"error": {
"code": string(resp.StatusCode),
"message": string(resp.Body)
}
},
"want_more": false
}
)
\`\`\``;

export const celProgramMock = `request("GET", state.url + "/v1/device_tasks" + "?" + {
"pageSize": [string(state.page_size)],
"pageToken": [state.page_token]
}.format_query()).with({
"Header": {
"Content-Type": ["application/json"]
}
}).do_request().as(resp,
resp.StatusCode == 200 ?
bytes(resp.Body).decode_json().as(body, {
"events": body.tasks.map(e, {"message": e.encode_json()}),
"page_token": body.nextPageToken,
"want_more": body.nextPageToken != null
}) : {
"events": {
"error": {
"code": string(resp.StatusCode),
"message": string(resp.Body)
}
},
"want_more": false
}
)`;

export const celStateVarsMockedResponse = ['config1', 'config2', 'config3'];

export const celStateDetailsMockedResponse = [
{
name: 'config1',
default: 50,
redact: false,
},
{
name: 'config2',
default: '',
redact: true,
},
{
name: 'config3',
default: 'event',
redact: false,
},
];

export const celStateSettings = {
config1: 50,
config2: '',
config3: 'event',
};

export const celRedact = ['config2'];

export const celExpectedResults = {
program: celProgramMock,
stateSettings: {
config1: 50,
config2: '',
config3: 'event',
},
redactVars: ['config2'],
};
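
The fixtures above pair a mocked LLM reply (`celProgramMockedResponse`, with the program wrapped in a code fence) with the bare program expected downstream (`celProgramMock`). The code that does the unwrapping is not part of this excerpt, so the following is only a sketch of one way to do it; `extractFencedBlock` is a hypothetical helper name.

```typescript
// Hypothetical helper: returns the contents of the first fenced block in an
// LLM reply, or the trimmed reply itself when no complete fence is found.
function extractFencedBlock(response: string): string {
  const fence = '`'.repeat(3);
  const start = response.indexOf(fence);
  if (start === -1) {
    return response.trim();
  }
  // Skip the rest of the opening fence line (it may carry a language tag).
  const bodyStart = response.indexOf('\n', start);
  const end = response.indexOf(fence, start + fence.length);
  if (bodyStart === -1 || end === -1 || end < bodyStart) {
    return response.trim();
  }
  return response.slice(bodyStart + 1, end).trim();
}
```

Falling back to the trimmed reply keeps the helper usable when the model answers with the program alone and no fence.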
@@ -52,3 +52,8 @@ export const mockedRequestWithPipeline = {
dataStreamName: 'audit',
currentPipeline: currentPipelineMock,
};

export const mockedRequestWithApiDefinition = {
apiDefinition: '{ "openapi": "3.0.0" }',
dataStreamName: 'audit',
};
@@ -0,0 +1,33 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0; you may not use this file except in compliance with the Elastic License
* 2.0.
*/

/*
* NOTICE: Do not edit this file manually.
* This file is automatically generated by the OpenAPI Generator, @kbn/openapi-generator.
*
* info:
* title: Automatic Import CEL Input API endpoint
* version: 1
*/

import { z } from '@kbn/zod';

import { DataStreamName, Connector, LangSmithOptions } from '../model/common_attributes.gen';
import { ApiDefinition } from '../model/cel_input_attributes.gen';
import { CelInputAPIResponse } from '../model/response_schemas.gen';

export type CelInputRequestBody = z.infer<typeof CelInputRequestBody>;
export const CelInputRequestBody = z.object({
dataStreamName: DataStreamName,
apiDefinition: ApiDefinition,
connectorId: Connector,
langSmithOptions: LangSmithOptions.optional(),
});
export type CelInputRequestBodyInput = z.input<typeof CelInputRequestBody>;

export type CelInputResponse = z.infer<typeof CelInputResponse>;
export const CelInputResponse = CelInputAPIResponse;
@@ -0,0 +1,39 @@
openapi: 3.0.3
info:
title: Automatic Import CEL Input API endpoint
version: "1"
paths:
/api/integration_assistant/cel:
post:
summary: Builds CEL input configuration
operationId: CelInput
x-codegen-enabled: true
description: Generate CEL input configuration
tags:
- CEL API
requestBody:
required: true
content:
application/json:
schema:
type: object
required:
- apiDefinition
- dataStreamName
- connectorId
properties:
dataStreamName:
$ref: "../model/common_attributes.schema.yaml#/components/schemas/DataStreamName"
apiDefinition:
$ref: "../model/cel_input_attributes.schema.yaml#/components/schemas/ApiDefinition"
connectorId:
$ref: "../model/common_attributes.schema.yaml#/components/schemas/Connector"
langSmithOptions:
$ref: "../model/common_attributes.schema.yaml#/components/schemas/LangSmithOptions"
responses:
200:
description: Indicates a successful call.
content:
application/json:
schema:
$ref: "../model/response_schemas.schema.yaml#/components/schemas/CelInputAPIResponse"
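
As a usage illustration for the route defined above, here is a sketch that assembles a request for `POST /api/integration_assistant/cel` with the three required fields. The `buildCelRequest` helper and the `PreparedRequest` shape are hypothetical; authentication and any additional headers Kibana expects are deliberately left out, and `langSmithOptions` is typed as `unknown` because its schema is not shown in this excerpt.

```typescript
const CEL_INPUT_PATH = '/api/integration_assistant/cel';

interface CelInputRequestBody {
  dataStreamName: string;
  apiDefinition: string; // string form of the OpenAPI spec
  connectorId: string;
  langSmithOptions?: unknown; // optional; schema not shown here
}

interface PreparedRequest {
  method: 'POST';
  path: string;
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: checks the required fields and serializes the body.
function buildCelRequest(body: CelInputRequestBody): PreparedRequest {
  const required = ['dataStreamName', 'apiDefinition', 'connectorId'] as const;
  for (const field of required) {
    if (typeof body[field] !== 'string') {
      throw new Error(`Missing required field: ${field}`);
    }
  }
  return {
    method: 'POST',
    path: CEL_INPUT_PATH,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  };
}
```

A successful call is expected to return a body of the form `{ results: { program, stateSettings, redactVars } }`, per the `CelInputAPIResponse` schema.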
@@ -0,0 +1,20 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0; you may not use this file except in compliance with the Elastic License
* 2.0.
*/

import { expectParseSuccess } from '@kbn/zod-helpers';
import { getCelRequestMock } from '../model/api_test.mock';
import { CelInputRequestBody } from './cel_input_route.gen';

describe('Cel request schema', () => {
test('full request validate', () => {
const payload: CelInputRequestBody = getCelRequestMock();

const result = CelInputRequestBody.safeParse(payload);
expectParseSuccess(result);
expect(result.data).toEqual(payload);
});
});
@@ -8,6 +8,7 @@
import type { AnalyzeLogsRequestBody } from '../analyze_logs/analyze_logs_route.gen';
import type { BuildIntegrationRequestBody } from '../build_integration/build_integration.gen';
import type { CategorizationRequestBody } from '../categorization/categorization_route.gen';
import type { CelInputRequestBody } from '../cel/cel_input_route.gen';
import type { EcsMappingRequestBody } from '../ecs/ecs_route.gen';
import type { RelatedRequestBody } from '../related/related_route.gen';
import type { DataStream, Integration, Pipeline } from './common_attributes.gen';
@@ -65,6 +66,12 @@ export const getCategorizationRequestMock = (): CategorizationRequestBody => ({
samplesFormat: { name: 'ndjson' },
});

export const getCelRequestMock = (): CelInputRequestBody => ({
dataStreamName: 'test-data-stream-name',
apiDefinition: 'test-api-definition',
connectorId: 'test-connector-id',
});

export const getBuildIntegrationRequestMock = (): BuildIntegrationRequestBody => ({
integration: getIntegrationMock(),
});
@@ -0,0 +1,33 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0; you may not use this file except in compliance with the Elastic License
* 2.0.
*/

/*
* NOTICE: Do not edit this file manually.
* This file is automatically generated by the OpenAPI Generator, @kbn/openapi-generator.
*
* info:
* title: Cel Input Attributes
* version: not applicable
*/

import { z } from '@kbn/zod';

/**
* String form of the Open API schema.
*/
export type ApiDefinition = z.infer<typeof ApiDefinition>;
export const ApiDefinition = z.string();

/**
* Optional CEL input details.
*/
export type CelInput = z.infer<typeof CelInput>;
export const CelInput = z.object({
program: z.string(),
stateSettings: z.object({}).catchall(z.unknown()),
redactVars: z.array(z.string()),
});
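
For callers that cannot pull in the generated zod schema, the `CelInput` shape can be approximated with a hand-written type guard. This is a dependency-free sketch that roughly mirrors the schema above (a string `program`, an object `stateSettings` with arbitrary keys, and a string array `redactVars`); `isCelInput` is a hypothetical name and is not part of the generated code.

```typescript
interface CelInput {
  program: string;
  stateSettings: Record<string, unknown>;
  redactVars: string[];
}

// Hypothetical guard approximating the generated schema's checks.
function isCelInput(value: unknown): value is CelInput {
  if (typeof value !== 'object' || value === null) {
    return false;
  }
  const candidate = value as Record<string, unknown>;
  const settings = candidate.stateSettings;
  const redact = candidate.redactVars;
  return (
    typeof candidate.program === 'string' &&
    typeof settings === 'object' &&
    settings !== null &&
    !Array.isArray(settings) &&
    Array.isArray(redact) &&
    redact.every((name) => typeof name === 'string')
  );
}
```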
@@ -0,0 +1,30 @@
openapi: 3.0.3
info:
title: Cel Input Attributes
version: "not applicable"
paths: {}
components:
x-codegen-enabled: true
schemas:
ApiDefinition:
type: string
description: String form of the Open API schema.

CelInput:
type: object
description: Optional CEL input details.
required:
- program
- stateSettings
- redactVars
properties:
program:
type: string
stateSettings:
type: object
additionalProperties: true
redactVars:
type: array
items:
type: string

@@ -17,6 +17,7 @@
import { z } from '@kbn/zod';

import { ESProcessorItem } from './processor_attributes.gen';
import { CelInput } from './cel_input_attributes.gen';

/**
* Package name for the integration to be built.
@@ -178,6 +179,10 @@ export const DataStream = z.object({
* The format of log samples in this dataStream.
*/
samplesFormat: SamplesFormat,
/**
* The optional CEL input configuration for the dataStream.
*/
celInput: CelInput.optional(),
});

/**
@@ -156,6 +156,9 @@ components:
samplesFormat:
$ref: "#/components/schemas/SamplesFormat"
description: The format of log samples in this dataStream.
celInput:
$ref: "./cel_input_attributes.schema.yaml#/components/schemas/CelInput"
description: The optional CEL input configuration for the dataStream.

Integration:
type: object
@@ -18,6 +18,7 @@ import { z } from '@kbn/zod';

import { Mapping, Pipeline, Docs, SamplesFormat } from './common_attributes.gen';
import { ESProcessorItem } from './processor_attributes.gen';
import { CelInput } from './cel_input_attributes.gen';

export type EcsMappingAPIResponse = z.infer<typeof EcsMappingAPIResponse>;
export const EcsMappingAPIResponse = z.object({
@@ -58,3 +59,8 @@ export const AnalyzeLogsAPIResponse = z.object({
parsedSamples: z.array(z.string()),
}),
});

export type CelInputAPIResponse = z.infer<typeof CelInputAPIResponse>;
export const CelInputAPIResponse = z.object({
results: CelInput,
});
@@ -88,3 +88,11 @@ components:
type: array
items:
type: string

CelInputAPIResponse:
type: object
required:
- results
properties:
results:
$ref: "./cel_input_attributes.schema.yaml#/components/schemas/CelInput"
1 change: 1 addition & 0 deletions x-pack/plugins/integration_assistant/common/constants.ts
@@ -20,6 +20,7 @@ export const ECS_GRAPH_PATH = `${INTEGRATION_ASSISTANT_BASE_PATH}/ecs`;
export const CATEGORIZATION_GRAPH_PATH = `${INTEGRATION_ASSISTANT_BASE_PATH}/categorization`;
export const ANALYZE_LOGS_PATH = `${INTEGRATION_ASSISTANT_BASE_PATH}/analyzelogs`;
export const RELATED_GRAPH_PATH = `${INTEGRATION_ASSISTANT_BASE_PATH}/related`;
export const CEL_INPUT_GRAPH_PATH = `${INTEGRATION_ASSISTANT_BASE_PATH}/cel`;
export const CHECK_PIPELINE_PATH = `${INTEGRATION_ASSISTANT_BASE_PATH}/pipeline`;
export const INTEGRATION_BUILDER_PATH = `${INTEGRATION_ASSISTANT_BASE_PATH}/build`;
export const FLEET_PACKAGES_PATH = `/api/fleet/epm/packages`;
@@ -8,8 +8,10 @@
export type ExperimentalFeatures = typeof allowedExperimentalValues;

const _allowedExperimentalValues = {
// Leaving this in here until we have a 'real' experimental feature
testFeature: false,
/**
* Enables whether the user is able to utilize the LLM to generate the CEL input configuration.
*/
generateCel: false,
};

/**