Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Reporting] Delete the task from Task Manager when deleting a report #192417

Conversation

tsullivan
Copy link
Member

@tsullivan tsullivan commented Sep 9, 2024

Summary

Closes #191676

This PR ensures that when a user deletes a report, if there is a backing Task Manager task that references the report job, that task is deleted as well. With these changes, the report job will truly be "cancelled" when a report is deleted.

The risk of not cancelling the Task Manager task when deleting a report is evident in server logs and in task monitoring, when automated tests are run on the reporting features.

How to test

Primarily, the API Integration test for the Reporting Datastream exercises the changes in this PR.

Checklist

  • Ensure the API/Functional tests are using the delete report endpoint rather than directly deleting reports from the ES store.
    • To prevent this PR from becoming very large, this assurance is only needed for serverless tests, and for tests that do not wait for the report job to be completed before deleting the report document.
    • Those constrictions narrow the changes down to just the test on the Reporting Datastream
  • Handle scenario of report document deleted before task begins or completes
  • Unit or functional tests were updated or added to match the most common scenarios

@tsullivan tsullivan added Team:SharedUX Team label for AppEx-SharedUX (formerly Global Experience) Feature:Reporting:Framework Reporting issues pertaining to the overall framework labels Sep 10, 2024
@tsullivan tsullivan force-pushed the reporting/delete-task-associated-with-report-being-deleted branch from 4104de7 to f482dd4 Compare September 10, 2024 23:22
@tsullivan tsullivan marked this pull request as ready for review September 16, 2024 14:49
@tsullivan tsullivan requested a review from a team as a code owner September 16, 2024 14:49
@elasticmachine
Copy link
Contributor

Pinging @elastic/appex-sharedux (Team:SharedUX)

@tsullivan tsullivan marked this pull request as draft September 16, 2024 14:50
@tsullivan tsullivan added the release_note:skip Skip the PR/issue when compiling release notes label Sep 16, 2024
@tsullivan tsullivan force-pushed the reporting/delete-task-associated-with-report-being-deleted branch from 8c80b9b to a178df6 Compare September 17, 2024 22:08
@tsullivan tsullivan force-pushed the reporting/delete-task-associated-with-report-being-deleted branch 6 times, most recently from 3531553 to d17cca4 Compare October 2, 2024 00:32
@tsullivan tsullivan force-pushed the reporting/delete-task-associated-with-report-being-deleted branch from d17cca4 to fdb6e72 Compare October 2, 2024 00:36
});

after(async () => {
await reportingAPI.deleteAllReports(roleAuthc, internalReqHeader);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the line that causes "failures" in monitoring deployments that are used by automated tests.

  1. Stages a report job, just for the side-effect of initializing the datastream
  2. Deletes the report job when the test is finished, but a reference to the job is still encoded in the pending report job task
  3. Deletes the saved object which is used in the report.
  4. Invalidates the user credentials which are encoded in the pending report job task

Result: when it tries to execute the report job task, it will fail.

  • If the task begins after the report job is deleted, it would fail because the task runner's job is to query Reporting storage for the report job data
  • If the task begins before the report job is deleted and somehow accesses the report job data, it would fail because the saved object or user credentials were deleted

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@tsullivan tsullivan marked this pull request as ready for review October 2, 2024 15:24
@tsullivan tsullivan requested a review from a team as a code owner October 2, 2024 15:24
Copy link
Member

@dmlemeshko dmlemeshko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

x-pack/test_serverless/shared/services/svl_reporting.ts changes LGTM

@tsullivan
Copy link
Member Author

const result = await taskManager.fetch({
query: {
term: {
'task.taskType': { value: 'report:execute' },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, just wanted to double check that we can't query by id here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The report ID is part of the task payload, which is stored in the attributes.params data of the task document. The format of this field is a string of JSON, which needs to be parsed before it is sent to the consumer.

Searching for an ID substring in the JSON string field is possible, using a wildcard or regexp query. But I don't recommend we do this:

  1. We would probably want to have the loop anyway, to ensure accuracy.
  2. These are expensive query types that can be disabled in cluster settings or elasticsearch.yml.

@tsullivan tsullivan added backport:version Backport to applied version labels v8.16.0 labels Oct 2, 2024
@kibanamachine
Copy link
Contributor

Flaky Test Runner Stats

🎉 All tests passed! - kibana-flaky-test-suite-runner#7067

[✅] x-pack/test_serverless/functional/test_suites/observability/common_configs/config.group1.ts: 25/25 tests passed.
[✅] x-pack/test_serverless/functional/test_suites/search/common_configs/config.group1.ts: 25/25 tests passed.
[✅] x-pack/test_serverless/functional/test_suites/security/common_configs/config.group1.ts: 25/25 tests passed.

see run history

@tsullivan tsullivan merged commit dfee928 into elastic:main Oct 2, 2024
23 checks passed
@tsullivan tsullivan deleted the reporting/delete-task-associated-with-report-being-deleted branch October 2, 2024 19:04
@kibanamachine
Copy link
Contributor

Starting backport for target branches: 8.x

https://github.com/elastic/kibana/actions/runs/11149993525

kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Oct 2, 2024
…lastic#192417)

## Summary

Closes elastic#191676

This PR ensures that when a user deletes a report, if there is a backing
Task Manager task that references the report job, that task is deleted
as well. With these changes, the report job will truly be "cancelled"
when a report is deleted.

The risk of not cancelling the Task Manager task when deleting a report
is evident in server logs and in task monitoring, when automated tests
are run on the reporting features.

### How to test
Primarily, the API Integration test for the Reporting Datastream
exercises the changes in this PR.

### Checklist

- [x] Ensure the API/Functional tests are using the delete report
endpoint rather than directly deleting reports from the ES store.
* **To prevent this PR from becoming very large, this assurance is only
needed for serverless tests, and for tests that do not wait for the
report job to be completed before deleting the report document.**
* Those constrictions narrow the changes down to just the test on the
Reporting Datastream
- [x] Handle scenario of report document deleted before task begins or
completes
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

---------

Co-authored-by: kibanamachine <[email protected]>
(cherry picked from commit dfee928)
@kibanamachine
Copy link
Contributor

💚 All backports created successfully

Status Branch Result
8.x

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

kibanamachine added a commit that referenced this pull request Oct 2, 2024
…eport (#192417) (#194758)

# Backport

This will backport the following commits from `main` to `8.x`:
- [[Reporting] Delete the task from Task Manager when deleting a report
(#192417)](#192417)

<!--- Backport version: 9.4.3 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

<!--BACKPORT [{"author":{"name":"Tim
Sullivan","email":"[email protected]"},"sourceCommit":{"committedDate":"2024-10-02T19:04:39Z","message":"[Reporting]
Delete the task from Task Manager when deleting a report (#192417)\n\n##
Summary\r\n\r\nCloses
https://github.com/elastic/kibana/issues/191676\r\n\r\nThis PR ensures
that when a user deletes a report, if there is a backing\r\nTask Manager
task that references the report job, that task is deleted\r\nas well.
With these changes, the report job will truly be \"cancelled\"\r\nwhen a
report is deleted.\r\n\r\nThe risk of not cancelling the Task Manager
task when deleting a report\r\nis evident in server logs and in task
monitoring, when automated tests\r\nare run on the reporting
features.\r\n\r\n### How to test\r\nPrimarily, the API Integration test
for the Reporting Datastream\r\nexercises the changes in this
PR.\r\n\r\n### Checklist\r\n\r\n- [x] Ensure the API/Functional tests
are using the delete report\r\nendpoint rather than directly deleting
reports from the ES store.\r\n* **To prevent this PR from becoming very
large, this assurance is only\r\nneeded for serverless tests, and for
tests that do not wait for the\r\nreport job to be completed before
deleting the report document.**\r\n* Those constrictions narrow the
changes down to just the test on the\r\nReporting Datastream\r\n- [x]
Handle scenario of report document deleted before task begins
or\r\ncompletes\r\n- [x] [Unit or
functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere
updated or added to match the most common
scenarios\r\n\r\n---------\r\n\r\nCo-authored-by: kibanamachine
<[email protected]>","sha":"dfee92802d386a9d1d64e62bd363695a2b860b31","branchLabelMapping":{"^v9.0.0$":"main","^v8.16.0$":"8.x","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:skip","v9.0.0","Team:SharedUX","v8.16.0","Feature:Reporting:Framework","backport:version"],"title":"[Reporting]
Delete the task from Task Manager when deleting a
report","number":192417,"url":"https://github.com/elastic/kibana/pull/192417","mergeCommit":{"message":"[Reporting]
Delete the task from Task Manager when deleting a report (#192417)\n\n##
Summary\r\n\r\nCloses
https://github.com/elastic/kibana/issues/191676\r\n\r\nThis PR ensures
that when a user deletes a report, if there is a backing\r\nTask Manager
task that references the report job, that task is deleted\r\nas well.
With these changes, the report job will truly be \"cancelled\"\r\nwhen a
report is deleted.\r\n\r\nThe risk of not cancelling the Task Manager
task when deleting a report\r\nis evident in server logs and in task
monitoring, when automated tests\r\nare run on the reporting
features.\r\n\r\n### How to test\r\nPrimarily, the API Integration test
for the Reporting Datastream\r\nexercises the changes in this
PR.\r\n\r\n### Checklist\r\n\r\n- [x] Ensure the API/Functional tests
are using the delete report\r\nendpoint rather than directly deleting
reports from the ES store.\r\n* **To prevent this PR from becoming very
large, this assurance is only\r\nneeded for serverless tests, and for
tests that do not wait for the\r\nreport job to be completed before
deleting the report document.**\r\n* Those constrictions narrow the
changes down to just the test on the\r\nReporting Datastream\r\n- [x]
Handle scenario of report document deleted before task begins
or\r\ncompletes\r\n- [x] [Unit or
functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere
updated or added to match the most common
scenarios\r\n\r\n---------\r\n\r\nCo-authored-by: kibanamachine
<[email protected]>","sha":"dfee92802d386a9d1d64e62bd363695a2b860b31"}},"sourceBranch":"main","suggestedTargetBranches":["8.x"],"targetPullRequestStates":[{"branch":"main","label":"v9.0.0","branchLabelMappingKey":"^v9.0.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/192417","number":192417,"mergeCommit":{"message":"[Reporting]
Delete the task from Task Manager when deleting a report (#192417)\n\n##
Summary\r\n\r\nCloses
https://github.com/elastic/kibana/issues/191676\r\n\r\nThis PR ensures
that when a user deletes a report, if there is a backing\r\nTask Manager
task that references the report job, that task is deleted\r\nas well.
With these changes, the report job will truly be \"cancelled\"\r\nwhen a
report is deleted.\r\n\r\nThe risk of not cancelling the Task Manager
task when deleting a report\r\nis evident in server logs and in task
monitoring, when automated tests\r\nare run on the reporting
features.\r\n\r\n### How to test\r\nPrimarily, the API Integration test
for the Reporting Datastream\r\nexercises the changes in this
PR.\r\n\r\n### Checklist\r\n\r\n- [x] Ensure the API/Functional tests
are using the delete report\r\nendpoint rather than directly deleting
reports from the ES store.\r\n* **To prevent this PR from becoming very
large, this assurance is only\r\nneeded for serverless tests, and for
tests that do not wait for the\r\nreport job to be completed before
deleting the report document.**\r\n* Those constrictions narrow the
changes down to just the test on the\r\nReporting Datastream\r\n- [x]
Handle scenario of report document deleted before task begins
or\r\ncompletes\r\n- [x] [Unit or
functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere
updated or added to match the most common
scenarios\r\n\r\n---------\r\n\r\nCo-authored-by: kibanamachine
<[email protected]>","sha":"dfee92802d386a9d1d64e62bd363695a2b860b31"}},{"branch":"8.x","label":"v8.16.0","branchLabelMappingKey":"^v8.16.0$","isSourceBranch":false,"state":"NOT_CREATED"}]}]
BACKPORT-->

Co-authored-by: Tim Sullivan <[email protected]>
tsullivan added a commit that referenced this pull request Oct 24, 2024
…#195841)

## Summary

Continuation of #192417. This PR
attempts to further improve task stability of the reporting task. The
original goals were:

1. Ensure the test data that is needed for the report gets loaded
2. Wait for report jobs to finish before the test completes. Errors in
task success metrics also occur if the task triggers after resources for
the report, such as a saved search, are removed before the task
triggers.

During development of this PR, more issues were discovered:

3. Requests to internal endpoints should use cookie credentials
4. The CSV export from ES|QL test was hitting a 404 error when it tried
to download the CSV. That error was included in the test. In other
words, that test was fundamentaly broken.

## Testing locally
1. Run the serverless functional tests:
1. **Reporting management app**: `node scripts/functional_tests.js
--config=x-pack/test_serverless/functional/test_suites/observability/common_configs/config.group1.ts
--grep=Reporting`
1. **CSV export in Discover**: `node scripts/functional_tests.js
--config=x-pack/test_serverless/functional/test_suites/observability/common_configs/config.group6.ts
--grep=CSV`
1. **Reporting API integration tests**: `node
scripts/functional_tests.js
--config=x-pack/test_serverless/api_integration/test_suites/search/common_configs/config.group1.ts
--grep=Reporting`
3. Ensure that there are no error logs from Task Manager regarding task
failure

---------

Co-authored-by: Dzmitry Lemechko <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:version Backport to applied version labels Feature:Reporting:Framework Reporting issues pertaining to the overall framework release_note:skip Skip the PR/issue when compiling release notes Team:SharedUX Team label for AppEx-SharedUX (formerly Global Experience) v8.16.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Reporting] Automated setup and teardown causes low report task success metrics
6 participants