[Fleet] Endpoint security integration tests can fail due to saved object errors #162639

Closed
cmacknz opened this issue Jul 25, 2023 · 7 comments · Fixed by #162724

@cmacknz
Member

cmacknz commented Jul 25, 2023

The endpoint security integration tests in https://github.com/elastic/elastic-agent/blob/main/testing/integration/endpoint_security_test.go can randomly fail with errors related to saved objects and transforms.

The first error I've observed is "Saved object [epm-packages/endpoint] not found":

    endpoint_security_test.go:83: 
        	Error Trace:	/home/ubuntu/agent/testing/integration/endpoint_security_test.go:248
        	            				/home/ubuntu/agent/testing/integration/endpoint_security_test.go:83
        	Error:      	Received unexpected error:
        	            	http error response with code 404: {StatusCode:404 Error:Not Found Message:Saved object [epm-packages/endpoint] not found}
        	Test:       	TestInstallAndCLIUninstallWithEndpointSecurity

A second error frequently follows this one: "[data_frame_transform_state_and_stats-endpoint.metadata_current-default-8.9.0]: version conflict, document already exists (current version [1])". I suspect this problem only occurs when our first install attempt fails, and that it may be a symptom of retrying a partially successful package installation.

    endpoint_security_test.go:83: 
        	Error Trace:	/home/ubuntu/agent/testing/integration/endpoint_security_test.go:248
        	            				/home/ubuntu/agent/testing/integration/endpoint_security_test.go:83
        	Error:      	Received unexpected error:
        	            	http error response with code 500: {StatusCode:500 Error:Internal Server Error Message:Error installing endpoint 8.9.0: runtime_exception
        	            		Caused by:
        	            			version_conflict_engine_exception: [data_frame_transform_state_and_stats-endpoint.metadata_current-default-8.9.0]: version conflict, document already exists (current version [1])
        	            		Root causes:
        	            			runtime_exception: Failed to persist transform statistics for transform [endpoint.metadata_current-default-8.9.0]}
        	Test:       	TestInstallAndCLIUninstallWithEndpointSecurity
@elasticmachine
Contributor

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@cmacknz
Member Author

cmacknz commented Jul 25, 2023

@juliaElastic we are using the package policies API in a test to install Elastic Defend, and it sometimes fails with the saved object errors above. This isn't a bug in the agent itself: either it is a bug on the Fleet side or we are using the APIs incorrectly. Any idea what could cause these problems, particularly the "Saved object [epm-packages/endpoint] not found" error?

@cmacknz cmacknz added the Team:Fleet Team label for Observability Data Collection Fleet team label Jul 25, 2023
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@juliaElastic
Contributor

juliaElastic commented Jul 26, 2023

@cmacknz I've seen some of these errors before, but I'm not sure about the cause. Can we document the manual steps to reproduce, so we can check the Kibana logs while debugging?

This saved object (SO) error was also reported in a Discuss thread where a rollback of the package was attempted:
https://discuss.elastic.co/t/kafka-integration-installation-error-limit-of-total-dimension-fields-16-has-been-exceeded/339052/8

Should we move this to kibana repo to investigate?

@cmacknz
Member Author

cmacknz commented Jul 27, 2023

This is reproduced quite regularly by our integration test that checks whether the agent can run endpoint security. We use the package policies API to install the Elastic Defend package, and it fails here:

https://github.com/elastic/elastic-agent/blob/39f3d7783caae3c11d550c63ab14e9c96dc7bf8a/testing/integration/endpoint_security_test.go#L60

This is where Defend is installed: https://github.com/elastic/elastic-agent/blob/39f3d7783caae3c11d550c63ab14e9c96dc7bf8a/testing/integration/endpoint_security_test.go#L233

Instructions for running the tests are in https://github.com/elastic/elastic-agent/blob/main/docs/test-framework-dev-guide.md.

Our tests will provision a new Elastic Stack for each run which might be part of the problem here, but there is nothing in that test that you couldn't easily script on the CLI against a cloud deployment you created yourself.

Something that continuously uninstalls and then installs the Elastic Defend package might reproduce this.
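A minimal sketch of such a loop is shown below. It is only an illustration under stated assumptions: it uses the Fleet EPM package install and delete endpoints (`POST` and `DELETE` on `/api/fleet/epm/packages/{name}/{version}`) rather than the package policies API the test itself calls, and `buildReinstallPlan`, `runPlan`, and the `HttpSend` callback are hypothetical names. The HTTP client is injected so the sketch does not depend on any particular library.

```typescript
// Hypothetical reproduction helper; all names here are illustrative.
interface FleetRequest {
  method: 'POST' | 'DELETE';
  path: string;
}

// Caller-supplied HTTP function (for example a thin wrapper around fetch).
type HttpSend = (
  method: string,
  url: string,
  headers: Record<string, string>
) => Promise<{ status: number; body: string }>;

// Builds an alternating install/uninstall sequence for one package version.
function buildReinstallPlan(name: string, version: string, cycles: number): FleetRequest[] {
  const path = `/api/fleet/epm/packages/${name}/${version}`;
  const plan: FleetRequest[] = [];
  for (let i = 0; i < cycles; i++) {
    plan.push({ method: 'POST', path }); // install the package
    plan.push({ method: 'DELETE', path }); // uninstall it again
  }
  return plan;
}

// Runs the plan in order and stops at the first non-2xx response,
// so the failing step and the response body are easy to see.
async function runPlan(
  kibanaUrl: string,
  apiKey: string,
  plan: FleetRequest[],
  send: HttpSend
): Promise<void> {
  for (const req of plan) {
    const res = await send(req.method, `${kibanaUrl}${req.path}`, {
      Authorization: `ApiKey ${apiKey}`,
      'kbn-xsrf': 'true', // Kibana expects this header on write requests
    });
    if (res.status < 200 || res.status >= 300) {
      throw new Error(`${req.method} ${req.path} failed with ${res.status}: ${res.body}`);
    }
  }
}
```

For example, `buildReinstallPlan('endpoint', '8.9.0', 10)` would describe ten install/uninstall cycles; the Kibana URL and API key passed to `runPlan` are whatever the deployment under test provides.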

@cmacknz cmacknz changed the title Endpoint security integration tests can fail due to saved object errors [Fleet] Endpoint security integration tests can fail due to saved object errors Jul 27, 2023
@cmacknz cmacknz transferred this issue from elastic/elastic-agent Jul 27, 2023
@cmacknz
Member Author

cmacknz commented Jul 27, 2023

> Should we move this to kibana repo to investigate?

Done, CC @jlind23. This bug causes the agent endpoint security integration test to fail fairly regularly.

@juliaElastic juliaElastic self-assigned this Jul 28, 2023
@juliaElastic
Contributor

juliaElastic commented Jul 28, 2023

I could reproduce the transform conflict error: it happens when I reinstall the endpoint package a few times from the UI.
The cause is that the code tries to create the same transform multiple times, because the same transform path appears more than once in transformPaths.
This looks like a bug, because the paths come from the package zip, which contains exactly one file per transform.
Something is wrong with how we cache the content of the archive: it seems to add another duplicate of each path on every reinstall.

[
  'endpoint-8.9.1/elasticsearch/transform/metadata_current/default.json',
  'endpoint-8.9.1/elasticsearch/transform/metadata_current/default.json',
  'endpoint-8.9.1/elasticsearch/transform/metadata_united/default.json',
  'endpoint-8.9.1/elasticsearch/transform/metadata_united/default.json',
]

I think the duplicates come from the code below, where it looks like we push the same path twice:
https://github.com/nchaulet/kibana/blob/4fe3cce7d7427ceb4a7886b24b6334ef47da238d/x-pack/plugins/fleet/server/services/epm/archive/storage.ts#L213-L218
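To illustrate why duplicated entries in transformPaths end in a conflict error, here is a minimal sketch. It is not the Kibana code: `installTransforms` and `uniquePaths` are hypothetical helpers, and the in-memory `Set` is only a stand-in for Elasticsearch rejecting a second create of the same transform document.

```typescript
// Hypothetical stand-in for transform installation: creating something
// that already exists fails, mimicking a version conflict.
function installTransforms(paths: string[]): string[] {
  const created = new Set<string>();
  for (const path of paths) {
    if (created.has(path)) {
      throw new Error(`version conflict, document already exists: ${path}`);
    }
    created.add(path);
  }
  return Array.from(created);
}

// Order-preserving de-duplication of archive paths.
function uniquePaths(paths: string[]): string[] {
  return Array.from(new Set(paths));
}

// The duplicated list observed in the comment above.
const transformPaths = [
  'endpoint-8.9.1/elasticsearch/transform/metadata_current/default.json',
  'endpoint-8.9.1/elasticsearch/transform/metadata_current/default.json',
  'endpoint-8.9.1/elasticsearch/transform/metadata_united/default.json',
  'endpoint-8.9.1/elasticsearch/transform/metadata_united/default.json',
];
```

In this sketch, `installTransforms(transformPaths)` throws on the second entry, while `installTransforms(uniquePaths(transformPaths))` succeeds. De-duplicating at the consumer is just one way to show the effect; the cleaner fix is to avoid pushing the path twice in the first place.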

I'm not sure if the SO not found error is related, we can see if it can be reproduced after the first issue is fixed.

juliaElastic added a commit that referenced this issue Jul 31, 2023
## Summary

Fix #162639

Found a bug where paths were pushed twice into the package archive. This
results in an error when trying to reinstall the endpoint package: the
install fails with a conflict error when creating transforms.

See more details here:
#162639 (comment)

It was introduced with this change, perhaps unintentionally:
https://github.com/elastic/kibana/pull/151655/files

I think this bug is also the reason for many
[issues](#161804) with package
reinstallation. I could reproduce another bug when trying to reinstall
the `system` integration, which is also fixed by removing the duplicate
paths.
```
[2023-07-28T17:41:27.116+02:00][ERROR][plugins.fleet] Error: Non-unique import objects detected: [dashboard:system-0d3f2380-fa78-11e6-ae9b-81e5311e8cab,dashboard:system-277876d0-fa2c-11e6-bbd3-29c986c96e5a,dashboard:system-5517a150-f9ce-11e6-8115-a7c18106d86a,dashboard:system-71f720f0-ff18-11e9-8405-516218e3d268,dashboard:system-79ffd6e0-faa0-11e6-947f-177f697178b8,dashboard:system-Logs-syslog-dashboard,dashboard:system-Metrics-system-overview,dashboard:system-Windows-Dashboard,dashboard:system-bae11b00-9bfc-11ea-87e4-49f31ec44891,dashboard:system-bb858830-f412-11e9-8405-516218e3d268,dashboard:system-d401ef40-a7d5-11e9-a422-d144027429da,search:system-06b6b060-7a80-11ea-bc9a-0baf2ca323a3,search:system-324686c0-fefb-11e9-8405-516218e3d268,search:system-62439dc0-f9c9-11e6-a747-6121780e0414,search:system-6f4071a0-7a78-11ea-bc9a-0baf2ca323a3,search:system-757510b0-a87f-11e9-a422-d144027429da,search:system-7e178c80-fee1-11e9-8405-516218e3d268,search:system-8030c1b0-fa77-11e6-ae9b-81e5311e8cab,search:system-9066d5b0-fef2-11e9-8405-516218e3d268,search:system-Syslog-system-logs,search:system-b6f321e0-fa25-11e6-bbd3-29c986c96e5a,search:system-ce71c9a0-a25e-11e9-a422-d144027429da,search:system-eb0039f0-fa7f-11e6-a1df-a78bd7504d38]
    at Function.nonUniqueImportObjects (errors.ts:35:12)
```

### Checklist

- [ ] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
juliaElastic added a commit to juliaElastic/kibana that referenced this issue Jul 31, 2023
ThomThomson pushed a commit to ThomThomson/kibana that referenced this issue Aug 1, 2023