Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix silently failing journeys #140074

Merged
merged 5 commits into from
Sep 7, 2022
Merged

fix silently failing journeys #140074

merged 5 commits into from
Sep 7, 2022

Conversation

suchcodemuchwow
Copy link
Contributor

@suchcodemuchwow suchcodemuchwow commented Sep 6, 2022

This PR aims to fix journeys silently failing because of UI changes:
ss

It is also found that playwright's page.waitForFunction() which we are using to detect if all animations are finished in dashboard pages does not work anymore for Kibana since Kibana's CSP policy does not allow us to run this function. There are some suggestions from playwright team which suggests to use bypass-csp option for the context. However enabling this option is also not really helping the case since Kibana explicitly redirects you to "Browser Update" page. There is a hacky solution which is currently working well for our purposes. But it might be required us to disable CSP if we want to use Playwright API for this type of operations.

@suchcodemuchwow suchcodemuchwow added Team:Operations Team label for Operations Team release_note:skip Skip the PR/issue when compiling release notes backport:skip This commit does not require backporting v8.5.0 labels Sep 6, 2022
@suchcodemuchwow suchcodemuchwow requested a review from a team September 6, 2022 11:02
@suchcodemuchwow suchcodemuchwow self-assigned this Sep 6, 2022
@lizozom lizozom linked an issue Sep 6, 2022 that may be closed by this pull request
Copy link
Member

@danielmitterdorfer danielmitterdorfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left one suggestion. Also, can you please provide some context for this in the PR's body?

export function serializeApmGlobalLabels(obj: any) {
return Object.entries(obj)
.filter(([, v]) => !!v)
.reduce((acc, [k, v]) => (acc ? `${acc},${k}=${v}` : `${k}=${v}`), '');
}

export async function waitForVisualizations(page: Page, visCount: number) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this raise an exception after a (configurable) timeout? This would fail the build instead of leaving it hanging?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly ! However I believe better solution for this to use Playwright waitForFunction function which has both timeout mechanism and different frequency settings (default one is animation frame). As I mentioned in PR description depending on solution we would decide (If we can disable CSP pros/cons of it) I'd either delete this completely or update with timeout/frequency logic

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favor of disabling CSP, but only if we can confirm with @elastic/kibana-security that by doing so we will just be opening up the ability to write better tests, and that we won't be impacting the performance characteristics of Kibana in a serious way.

Additionally, @elastic/kibana-security, do you know if we'll be able to disable CSP on cloud instances? I'm sure that we will want to execute these tests against cloud instances at some point.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the ping!

but only if we can confirm with https://github.com/orgs/elastic/teams/kibana-security that by doing so we will just be opening up the ability to write better tests, and that we won't be impacting the performance characteristics of Kibana in a serious way.

I don't think disabling\enabling CSP would impact Kibana performance in any way.

Additionally, https://github.com/orgs/elastic/teams/kibana-security, do you know if we'll be able to disable CSP on cloud instances? I'm sure that we will want to execute these tests against cloud instances at some point.

If you just need to set csp.strict (and potentially csp.warnLegacyBrowsers) to false to make Kibana play well with bypassCSP, then it should be possible to do for a Cloud deployment as well (I see these settings in the ESS allow-list).

But, do we know what part of the default CSP rules is interfering with Playwright exactly? Maybe we can just relax CSP a bit with additional directives in csp.script_src (or other CSP directives listed here)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, do we know what part of the default CSP rules is interfering with Playwright exactly? Maybe we can just relax CSP a bit with additional directives in csp.script_src (or other CSP directives listed here)?

This is certainly an option for a targeted subset of our functional tests, but I want to make sure that we don't relax the CSP for all functional tests. IMO, we need our test suites to verify that code changes don't violate our default CSP.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing --csp.strict=false is working fine for performance journeys and allows us to use waitForFunction. The only nit is that we get some alertbox during journeys
Screenshot 2022-09-07 at 14 56 24
which is not super important imho.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only nit is that we get some alertbox during journeys

This should do the trick:

csp.warnLegacyBrowsers: false

`csp.warnLegacyBrowsers`::
Shows a warning message after loading {kib} to any browser that does not
enforce even rudimentary CSP rules, though {kib} is still accessible. This
configuration is effectively ignored when <<csp-strict, `csp.strict`>> is enabled.
*Default: `true`*

Copy link
Contributor

@spalger spalger Sep 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is certainly an option for a targeted subset of our functional tests, but I want to make sure that we don't relax the CSP for all functional tests

@legrego we're currently only working with performance tests, which aren't designed to test features, just use the product in some way to get it to produce telemetry.


Sounds like we've found our solution @suchcodemuchwow :)

@suchcodemuchwow suchcodemuchwow marked this pull request as ready for review September 7, 2022 12:16
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-operations (Team:Operations)

Copy link
Contributor

@lizozom lizozom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Thanks for creating that waitForVisualization function!

@suchcodemuchwow
Copy link
Contributor Author

@elasticmachine merge upstream

@suchcodemuchwow suchcodemuchwow enabled auto-merge (squash) September 7, 2022 12:34
@suchcodemuchwow
Copy link
Contributor Author

@elasticmachine merge upstream

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @suchcodemuchwow

@suchcodemuchwow suchcodemuchwow merged commit fc51795 into elastic:main Sep 7, 2022
);
return visualizationElementsLoaded && visualizationAnimationsFinished;
});
await waitForVisualizations(page, 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test finishes right after the panel got rendered - as it's the last step in the journey the browser closes right after. Does this mean there is a chance the browser won't have time to send out the telemetry event? Maybe we should wait a few seconds afterwards to make sure everything is captured?

Copy link
Contributor Author

@suchcodemuchwow suchcodemuchwow Sep 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had a lifecycle hook which is responsible for browser shutdown and waits for APM to flush . Not really sure if it's enough for the telemetry too. Maybe @afharo can give some insights about that but afaik telemetry events from browser is being sent from server side so closing browser prematurely won't drop the events.

Copy link
Contributor

@flash1293 flash1293 Sep 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the events still need to reach the server - the dashboard loading happens in the browser so the browser needs to "send" the event to the server :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though I believe the latency is really low and most probably we won't have so many dropped events it is still possible as you mentioned, so I agree 👍🏽. However not really sure if we have mechanism to ensure all events are sent successfully and it's ready shutdown, if we don't have some check for that we will probably end up adding arbitrary delay at the end of journeys.

Anyone has an idea on this ? cc: @spalger , @afharo

Copy link
Contributor

@spalger spalger Sep 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect that when a step completes we can wait for both:

  • 3 seconds
  • any requests to the telemetry endpoint that started before the step ended, or during this 3 second window, to complete.

If no requests to the telemetry endpoint are started in the 3 second period then telemetry was sent before the step completed. I'd be happy to help you get this logic implemented quickly @suchcodemuchwow. @afharo can you tell us how we can identify which requests are "telementry shipping" requests? What hostname pattern should we look for?

Copy link
Contributor

@flash1293 flash1293 Sep 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They look like this:

Request URL: https://telemetry-staging.elastic.co/v3/send/kibana-browser
Request Method: POST
Status Code: 200 
Remote Address: [2600:1901:0:2fb7::]:443
Referrer Policy: no-referrer-when-downgrad
:authority: telemetry-staging.elastic.co
:method: POST
:path: /v3/send/kibana-browser
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br
accept-language: en-GB,en-US;q=0.9,en;q=0.8,de;q=0.7
content-length: 1103
content-type: application/x-ndjson
origin: http://localhost:5601
referer: http://localhost:5601/app/dashboards
sec-ch-ua: "Google Chrome";v="105", "Not)A;Brand";v="8", "Chromium";v="105"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "macOS"
sec-fetch-dest: empty
sec-fetch-mode: cors
sec-fetch-site: cross-site
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36
x-elastic-cluster-id: FUf8kO2WR9uDG1ehZcC88A
x-elastic-license-id: 77941f60-b2dc-46d2-b9a0-0767b5a73a3e
x-elastic-stack-version: 8.5.0

Copy link
Contributor

@flash1293 flash1293 Sep 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good call to wait for them - they tend to be relatively slow because they wait for the events to be sent out:
Screenshot 2022-09-08 at 11 28 26

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting release_note:skip Skip the PR/issue when compiling release notes Team:Operations Team label for Operations Team v8.5.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Kibana Performance - Overall shows no dashboard-loaded events
10 participants