-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix silently failing journeys #140074
fix silently failing journeys #140074
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left one suggestion. Also, can you please provide some context for this in the PR's body?
x-pack/test/performance/utils.ts
Outdated
export function serializeApmGlobalLabels(obj: any) { | ||
return Object.entries(obj) | ||
.filter(([, v]) => !!v) | ||
.reduce((acc, [k, v]) => (acc ? `${acc},${k}=${v}` : `${k}=${v}`), ''); | ||
} | ||
|
||
export async function waitForVisualizations(page: Page, visCount: number) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this raise an exception after a (configurable) timeout? This would fail the build instead of leaving it hanging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly ! However I believe better solution for this to use Playwright waitForFunction function which has both timeout mechanism and different frequency settings (default one is animation frame). As I mentioned in PR description depending on solution we would decide (If we can disable CSP pros/cons of it) I'd either delete this completely or update with timeout/frequency logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm in favor of disabling CSP, but only if we can confirm with @elastic/kibana-security that by doing so we will just be opening up the ability to write better tests, and that we won't be impacting the performance characteristics of Kibana in a serious way.
Additionally, @elastic/kibana-security, do you know if we'll be able to disable CSP on cloud instances? I'm sure that we will want to execute these tests against cloud instances at some point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the ping!
but only if we can confirm with https://github.com/orgs/elastic/teams/kibana-security that by doing so we will just be opening up the ability to write better tests, and that we won't be impacting the performance characteristics of Kibana in a serious way.
I don't think disabling\enabling CSP would impact Kibana performance in any way.
Additionally, https://github.com/orgs/elastic/teams/kibana-security, do you know if we'll be able to disable CSP on cloud instances? I'm sure that we will want to execute these tests against cloud instances at some point.
If you just need to set csp.strict
(and potentially csp.warnLegacyBrowsers
) to false
to make Kibana play well with bypassCSP
, then it should be possible to do for a Cloud deployment as well (I see these settings in the ESS allow-list).
But, do we know what part of the default CSP rules is interfering with Playwright exactly? Maybe we can just relax CSP a bit with additional directives in csp.script_src
(or other CSP directives listed here)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But, do we know what part of the default CSP rules is interfering with Playwright exactly? Maybe we can just relax CSP a bit with additional directives in csp.script_src (or other CSP directives listed here)?
This is certainly an option for a targeted subset of our functional tests, but I want to make sure that we don't relax the CSP for all functional tests. IMO, we need our test suites to verify that code changes don't violate our default CSP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing --csp.strict=false
is working fine for performance journeys and allows us to use waitForFunction. The only nit is that we get some alertbox during journeys
which is not super important imho.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only nit is that we get some alertbox during journeys
This should do the trick:
csp.warnLegacyBrowsers: false
kibana/docs/setup/settings.asciidoc
Lines 76 to 80 in 203d26e
`csp.warnLegacyBrowsers`:: | |
Shows a warning message after loading {kib} to any browser that does not | |
enforce even rudimentary CSP rules, though {kib} is still accessible. This | |
configuration is effectively ignored when <<csp-strict, `csp.strict`>> is enabled. | |
*Default: `true`* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is certainly an option for a targeted subset of our functional tests, but I want to make sure that we don't relax the CSP for all functional tests
@legrego we're currently only working with performance tests, which aren't designed to test features, just use the product in some way to get it to produce telemetry.
Sounds like we've found our solution @suchcodemuchwow :)
Pinging @elastic/kibana-operations (Team:Operations) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks for creating that waitForVisualization
function!
@elasticmachine merge upstream |
@elasticmachine merge upstream |
💚 Build Succeeded
Metrics [docs]
History
To update your PR or re-run it, just comment with: |
); | ||
return visualizationElementsLoaded && visualizationAnimationsFinished; | ||
}); | ||
await waitForVisualizations(page, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test finishes right after the panel got rendered - as it's the last step in the journey the browser closes right after. Does this mean there is a chance the browser won't have time to send out the telemetry event? Maybe we should wait a few seconds afterwards to make sure everything is captured?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had a lifecycle hook which is responsible for browser shutdown and waits for APM to flush . Not really sure if it's enough for the telemetry too. Maybe @afharo can give some insights about that but afaik telemetry events from browser is being sent from server side so closing browser prematurely won't drop the events.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the events still need to reach the server - the dashboard loading happens in the browser so the browser needs to "send" the event to the server :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though I believe the latency is really low and most probably we won't have so many dropped events it is still possible as you mentioned, so I agree 👍🏽. However not really sure if we have mechanism to ensure all events are sent successfully and it's ready shutdown, if we don't have some check for that we will probably end up adding arbitrary delay at the end of journeys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expect that when a step completes we can wait for both:
- 3 seconds
- any requests to the telemetry endpoint that started before the step ended, or during this 3 second window, to complete.
If no requests to the telemetry endpoint are started in the 3 second period then telemetry was sent before the step completed. I'd be happy to help you get this logic implemented quickly @suchcodemuchwow. @afharo can you tell us how we can identify which requests are "telementry shipping" requests? What hostname pattern should we look for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They look like this:
Request URL: https://telemetry-staging.elastic.co/v3/send/kibana-browser
Request Method: POST
Status Code: 200
Remote Address: [2600:1901:0:2fb7::]:443
Referrer Policy: no-referrer-when-downgrad
:authority: telemetry-staging.elastic.co
:method: POST
:path: /v3/send/kibana-browser
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br
accept-language: en-GB,en-US;q=0.9,en;q=0.8,de;q=0.7
content-length: 1103
content-type: application/x-ndjson
origin: http://localhost:5601
referer: http://localhost:5601/app/dashboards
sec-ch-ua: "Google Chrome";v="105", "Not)A;Brand";v="8", "Chromium";v="105"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "macOS"
sec-fetch-dest: empty
sec-fetch-mode: cors
sec-fetch-site: cross-site
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36
x-elastic-cluster-id: FUf8kO2WR9uDG1ehZcC88A
x-elastic-license-id: 77941f60-b2dc-46d2-b9a0-0767b5a73a3e
x-elastic-stack-version: 8.5.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR aims to fix journeys silently failing because of UI changes:
It is also found that playwright's page.waitForFunction() which we are using to detect if all animations are finished in dashboard pages does not work anymore for Kibana since Kibana's CSP policy does not allow us to run this function. There are some suggestions from playwright team which suggests to use bypass-csp option for the context. However enabling this option is also not really helping the case since Kibana explicitly redirects you to "Browser Update" page. There is a hacky solution which is currently working well for our purposes. But it might be required us to disable CSP if we want to use Playwright API for this type of operations.