Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

data dependent on security breaks tests #123720

Closed
Dosant opened this issue Jan 25, 2022 · 8 comments · Fixed by #128730
Closed

data dependent on security breaks tests #123720

Dosant opened this issue Jan 25, 2022 · 8 comments · Fixed by #128730
Labels
impact:critical This issue should be addressed immediately due to a critical level of impact on the product. loe:medium Medium Level of Effort Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc

Comments

@Dosant
Copy link
Contributor

Dosant commented Jan 25, 2022

TLDR

  • Adding security plugin as an optional dependency of data breaks tests on CI 🔴
  • @elastic/kibana-app-services needs this to get rid of data_enhanced to simplify code [data.search] remove data_enhanced #119321.
  • Minimal reproduction: remove deps on data plugin #121955
    • Change in data/kibana.json breaks CI
    • Plugin dependency topology doesn't change
    • It seems like tests fail because es is slower to respond
    • Adding a delay between es/kibana spin up and running tests significantly stabilizes tests
  • I (as @elastic/kibana-app-services owner of this) run out of troubleshooting ideas. With help from @azasypkin, @afharo, @lukeelmers wasn't able to get to the root cause in a reasonable time. Looking for more attention or complete ownership takeover by another team (operations? qa? core?)

Blocks #119321, #122075

@elastic/kibana-app-services wants to clean up their code by removing *_enhanced plugins. This work was unblocked since src/ plugins can optionally depend on x-pack/ plugins.
While working on removing data_enhanced in #122075 we discovered that adding the security plugin as an optional dependency of data plugin fails CI causing numerous tests to become.

You can find minimal reproduction in this pr: #121955

  1. First in this pr we fix some circular dependencies that would occur between data <-> security
  2. We run CI multiple times. All times CI is stable green 🟢
  3. Then we add security plugin as a dependency of data plugin. Just changing kibana.json and not even using the security's plugin contract inside data plugin. df7151a
  4. Tests starting failing 🔴

Screen Shot 2022-01-06 at 16 47 25

What is failing?:

There are a lot of tests that fail. Some pass during 2nd attempt, some not. More often than not it seems like tests are related to security, but this isn't necessary.

  • One of the ofter failing tests is status page should display the server status. Kibana reports Yellow status when Green is expected. After adding more logging I saw that taskManager is causing Yellow status.
    image
    logs
  • Ofter failing tests look like Kibana or a bundle is not loaded properly
    image
    logs
  • Some of security/reporting are often failing, for example: Reporting Functional Tests with Security enabled Security with reporting_user built-in role Dashboard: Download CSV file does not allow user that does not have reporting privileges. It fails on page load with timeout. logs. It is also often SAML / OIDC related tests that are likely to fail
  • Sometimes the failure is: error: Setup lifecycle of "infra" plugin wasn't completed in 10sec. Consider disabling the plugin and re-start. I guess this just highlights that a bundle wasn't loaded on time

What we've tried:

  1. The First guess was that change in kibana.json caused plugin dependency topology change which discovered some race conditions. This doesn't seem to be the case, because plugin topology didn't change. We can see that from a log message from core: [plugins-system.standard] Setting up [113] plugins:... that reports plugin in the identical order - before and after the change. internal thread with @lukeelmers
  2. Tried to make security plugin a dependency of a different OSS plugin (discover) - didn't reproduce the issue. So we know the issue is reproduced on data but not on discover
  3. Tried to reproduce locally. Both with dev and prod build. No reliable reproduction was found. But @afharo was able to reproduce it once locally running tests in a loop thred
  4. After adding debug logs for security we've found that it looks like elasticsearch is very slow to respond. (Like it is busy with something?). @azasypkin found that this test fails because it looks like creating a security token took more than 30 seconds, and when the test was able to continue the token has already expired.
const { access_token: accessToken } = await es.security.getToken({
        body: {
          grant_type: 'password',
          username: 'elastic',
          password: 'changeme',
        },
      });

// 30 seconds delay between calls
// accessToken is only valid for 15 seconds. 
// so test fails

      await supertest
        .get('/internal/security/me')
        .set('kbn-xsrf', 'true')
        .set('authorization', `Bearer ${accessToken}`)
        .expect(200);
  1. We've added a delay between spinning up es and kibana and running the tests 2da369f. This certainly made tests a lot more stable 🟢 🟡 🟢 . I guess this means that the introduced change causes issues just after the start-up.

I (as @elastic/kibana-app-services owner of this) run out of troubleshooting ideas. With help from @azasypkin, @afharo, @lukeelmers wasn't able to get to the root cause in a reasonable time. Looking for more attention or complete ownership takeover by another team (operations? qa? core?)

@Dosant Dosant changed the title security as a dependency of data breaks tests 🔴 security as a dependency of data breaks tests Jan 25, 2022
@botelastic botelastic bot added the needs-team Issues missing a team label label Jan 25, 2022
@Dosant Dosant changed the title security as a dependency of data breaks tests data dependent on security breaks tests Jan 25, 2022
@botelastic botelastic bot removed the needs-team Issues missing a team label label Jan 26, 2022
@exalate-issue-sync exalate-issue-sync bot added impact:critical This issue should be addressed immediately due to a critical level of impact on the product. loe:small Small Level of Effort labels Jan 31, 2022
@exalate-issue-sync exalate-issue-sync bot added loe:medium Medium Level of Effort and removed loe:small Small Level of Effort labels Feb 7, 2022
@afharo
Copy link
Member

afharo commented Feb 7, 2022

Running my own tests in #124682. I noticed 2 potential issues here:

  1. Installing artifacts on startup creates too much load on ES/Kibana, making early authentication requests to take longer than expected.
  2. There's a circular dependency in the UI.

Installing artifacts on startup

I have the suspicion that installing artifacts on startup is causing too much stress on ES/Kibana, and it makes them take longer to reply to the early authentication requests while the artifacts are still being installed.

I validated it locally by:

  1. Removing the Promise.all in favour of running each step sequentially. It reduced the test rejects expired access token via authorization Bearer header by almost 15s.
  2. Using the status API, forcing the plugins installing artifacts to report YELLOW until done installing them. The test suites wait until GREEN to start.

IMO, it still makes sense to run them concurrently so the setup ends ASAP. So I went for the status API approach in #124682.

It fixed the server-side FTRs, but the UI based tests still take longer than expected. And, worst of all, Jest Integration tests OOM when running Fleet's suites 😭

Circular dependency in the UI

Looking at the UI tests, I noticed that adding security as a dependency of data introduces a circular dependency because security depends on home -> customIntegrations -> presentationUtil -> savedObjects (the plugin) -> data.

The reason presentationUtil depends on savedObjects is to import a modal, so I tried moving that dependency to requiredBundles. However, I'm not entirely sure how this may affect, since the circular dependency is still in place.

Bearing in mind that the plugin savedObjects is deprecated and should have been removed in 8.0, I'd recommend migrating away from it to solve the circular dependency.

I created #124796 to remove the circular dependency by copying the deprecated modal from savedObjects into presentationUtil. However, there are more circular dependencies that we should solve:

➜  kibana git:(data-depends-on-security) ✗ node scripts/find_plugins_with_circular_plugin_deps
 warn Circular refs detected: [home] --> [customIntegrations] --> [presentationUtil] --> [embeddable] --> [savedObjects] --> [data] --> [security] --> [home]
 warn Circular refs detected: [management] --> [home] --> [customIntegrations] --> [presentationUtil] --> [embeddable] --> [savedObjects] --> [data] --> [security] --> [management]
 warn Circular refs detected: [advancedSettings] --> [management] --> [home] --> [customIntegrations] --> [presentationUtil] --> [embeddable] --> [savedObjects] --> [data] --> [security] --> [spaces] --> [advancedSettings]
 warn Circular refs detected: [home] --> [customIntegrations] --> [presentationUtil] --> [embeddable] --> [savedObjects] --> [data] --> [security] --> [spaces] --> [home]
 warn Circular refs detected: [management] --> [home] --> [customIntegrations] --> [presentationUtil] --> [embeddable] --> [savedObjects] --> [data] --> [security] --> [spaces] --> [management]
 warn Circular refs detected: [home] --> [customIntegrations] --> [presentationUtil] --> [embeddable] --> [savedObjects] --> [data] --> [security] --> [management] --> [home]
 warn Circular refs detected: [home] --> [customIntegrations] --> [presentationUtil] --> [embeddable] --> [savedObjects] --> [data] --> [security] --> [spaces] --> [advancedSettings] --> [home]
 warn Circular refs detected: [presentationUtil] --> [embeddable] --> [savedObjects] --> [data] --> [security] --> [home] --> [customIntegrations] --> [presentationUtil]
 warn Circular refs detected: [savedObjects] --> [data] --> [security] --> [home] --> [customIntegrations] --> [presentationUtil] --> [embeddable] --> [savedObjects]
 warn Circular refs detected: [embeddable] --> [savedObjects] --> [data] --> [security] --> [home] --> [customIntegrations] --> [presentationUtil] --> [embeddable]
 warn Circular refs detected: [data] --> [security] --> [home] --> [customIntegrations] --> [presentationUtil] --> [embeddable] --> [savedObjects] --> [data]
 warn Circular refs detected: [customIntegrations] --> [presentationUtil] --> [embeddable] --> [savedObjects] --> [data] --> [security] --> [home] --> [customIntegrations]
 warn Circular refs detected: [spaces] --> [advancedSettings] --> [management] --> [home] --> [customIntegrations] --> [presentationUtil] --> [embeddable] --> [savedObjects] --> [data] --> [security] --> [spaces]
 warn Circular refs detected: [management] --> [home] --> [customIntegrations] --> [presentationUtil] --> [embeddable] --> [savedObjects] --> [data] --> [security] --> [spaces] --> [advancedSettings] --> [management]
 warn Circular refs detected: [security] --> [home] --> [customIntegrations] --> [presentationUtil] --> [embeddable] --> [savedObjects] --> [data] --> [security]

NB: I created node scripts/find_plugins_with_circular_plugin_deps to track requiredPlugins + optionalPlugins + requiredBundles circular deps. I'll create a PR tomorrow for future reference.

@stacey-gammon
Copy link
Contributor

Is this actively being worked on?

@ppisljar - I think this is the issue to follow for unblocking removing of data_enhanced plugin.

@ppisljar
Copy link
Member

@stacey-gammon yes

@Dosant Dosant added Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc and removed Team:AppServicesSv labels Mar 14, 2022
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-core (Team:Core)

@Dosant
Copy link
Contributor Author

Dosant commented Mar 24, 2022

Circular dependency in the UI

In this commit, I've fixed the requiredBundles dependency between customIntegrations -> presentationUtil.

I've used @afharo's tool to check that there are no circular dependencies anymore.

Unfortunately, this didn't help . Looks like our problem is unrelated to requiredBundles circular dependencies

@afharo
Copy link
Member

afharo commented Mar 24, 2022

😢 ACK! Thanks for removing the dependency. I think I may be able to take another look next week

@Dosant
Copy link
Contributor Author

Dosant commented Mar 29, 2022

Looks like the issue is resolved 🎉
We have 3 subsequent no flaky runs #121955 (comment)

@afharo suggests that #128324 that improved server event-loop delay and fixed memory leak helped #121955 (comment)

It is still not exactly clear to me why before the fix main didn't have the same flakiness as the pr that reproduced the issue.

But this is already good enough and unblocks us with data depending on security and then removal of data_enhanced

I plan to close this issue with the pr adding data -> security dependency.

@Dosant
Copy link
Contributor Author

Dosant commented Mar 31, 2022

#128730 is merged and should unblock #119321
Hope it's stable :D

Thanks everyone for all the help 👏 👏 👏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
impact:critical This issue should be addressed immediately due to a critical level of impact on the product. loe:medium Medium Level of Effort Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants