# Backport

This will backport the following commits from `main` to `8.x`:
- [[Auto Import] CSV format support (#194386)](#194386)

### Questions?
Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

---

[Auto Import] CSV format support (#194386)

## Release Notes

Automatic Import can now create integrations for logs in the CSV format. Owing to the maturity of log format support, we remove the verbiage about requiring the JSON/NDJSON format.

## Summary

**Added: the CSV feature**

The issue is #194342.

When the user adds a log sample whose format is recognized as CSV by the LLM, we now parse the samples and insert the [csv](https://www.elastic.co/guide/en/elasticsearch/reference/current/csv-processor.html) processor into the generated pipeline.

If the header is present, we use it for the field names and add a [drop](https://www.elastic.co/guide/en/elasticsearch/reference/current/drop-processor.html) processor that removes the header row from the document stream by comparing its values to the header values.
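For illustration only (the column names and the `ESProcessorItem` type below are assumptions, not the plugin's actual definitions), the two generated processors might look roughly like this:

```typescript
// Illustration only: column names and this type are assumptions, not the
// plugin's actual definitions from its server-side processors util.
type ESProcessorItem = Record<string, Record<string, unknown>>;

// `csv` processor: parse the raw line into the columns taken from the
// header (or, as described next, suggested by the LLM).
const csvProcessor: ESProcessorItem = {
  csv: {
    field: 'message',
    target_fields: ['timestamp', 'source_ip', 'action'],
    separator: ',',
    ignore_missing: true,
    description: 'Parse the CSV log line',
  },
};

// `drop` processor: discard the header row itself by comparing a parsed
// value against the corresponding header value.
const dropHeaderProcessor: ESProcessorItem = {
  drop: {
    if: "ctx.timestamp == 'timestamp'",
  },
};
```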
If the header is missing, we ask the LLM to generate a list of column names, providing some context such as the package and data stream titles.

Should the header or the LLM suggestion prove unsuitable for a specific column, we fall back to `column1`, `column2`, and so on. To avoid duplicate column names, we append suffixes like `_2` as necessary.
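A minimal sketch of that naming fallback, written as a hypothetical helper (the plugin's actual implementation may differ):

```typescript
// Hypothetical helper, not the plugin's actual code: turn proposed column
// names (from the header or the LLM) into safe, unique field names.
function toColumnNames(proposed: Array<string | undefined>): string[] {
  const used = new Set<string>();
  return proposed.map((name, index) => {
    // Fall back to a positional name when the proposal is unusable.
    const base = name?.trim().replace(/\W+/g, '_') || `column${index + 1}`;
    // Append _2, _3, ... until the name is unique.
    let candidate = base;
    for (let n = 2; used.has(candidate); n++) {
      candidate = `${base}_${n}`;
    }
    used.add(candidate);
    return candidate;
  });
}

// Example: ['ip', '', 'ip'] becomes ['ip', 'column2', 'ip_2'].
```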
If the format appears to be CSV but the `csv` processor fails, we bubble up an error using the recently introduced `ErrorThatHandlesItsOwnResponse` class. We also provide the first example of passing the additional attributes of an error (in this case, the original CSV error) back to the client. The error message is composed on the client side.
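As a sketch of the pattern only (the real `ErrorThatHandlesItsOwnResponse` class, its constructor, and the response factory shape in the plugin will differ in detail):

```typescript
// Sketch of the pattern only; names, status code, and the response factory
// shape are assumptions, not the plugin's actual API.
interface ResponseFactoryLike {
  customError(opts: {
    statusCode: number;
    body: { message: string; attributes?: Record<string, unknown> };
  }): unknown;
}

class CSVParseError extends Error {
  constructor(private readonly originalErrors: string[]) {
    super('The generated csv processor failed to parse the provided samples');
  }

  // The server attaches the original csv processor failure as attributes;
  // the client composes the user-facing message from them.
  handleErrorResponse(res: ResponseFactoryLike) {
    return res.customError({
      statusCode: 422,
      body: {
        message: this.message,
        attributes: { originalErrors: this.originalErrors },
      },
    });
  }
}
```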
**Removed: supported formats message**

The message that asks the user to upload the logs in `JSON/NDJSON format` is removed in this PR:

<img width="741" alt="image" src="https://github.com/user-attachments/assets/34d571c3-b12c-44a1-98e3-d7549160be12">

**Refactoring**

The refactoring makes the "→JSON" conversion process more uniform across different chains and centralizes processor definitions in `.../server/util/processors.ts`.

The log format chain now expects the LLM to follow `SamplesFormat` when providing this information, rather than an ad-hoc format.

When testing, the `fail` method is [not supported in `jest`](https://stackoverflow.com/a/54244479/23968144), so it is removed.

See the PR for examples and follow-up.

---------

Co-authored-by: Elastic Machine <[email protected]>
Co-authored-by: Ilya Nikokoshev <[email protected]>