Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate Firefox test timeouts with latest fast-foundation/fast-element #2456

Closed
m-akinc opened this issue Oct 28, 2024 · 3 comments · Fixed by #2450
Closed

Investigate Firefox test timeouts with latest fast-foundation/fast-element #2456

m-akinc opened this issue Oct 28, 2024 · 3 comments · Fixed by #2450
Assignees

Comments

@m-akinc
Copy link
Contributor

m-akinc commented Oct 28, 2024

🧹 Tech Debt

A few tests (varies per build) seem to be timing out on Firefox on the build agents. The following example comes from a PR build on #2450:

[test-concurrent:nimble-components] [test-firefox]   Table Interactive Column Sizing
[test-concurrent:nimble-components] [test-firefox]     hidden column drag left divider tests 
[test-concurrent:nimble-components] [test-firefox]       ✗ first column hidden, drag second left divider to left results in correct columns widths (5416ms)
[test-concurrent:nimble-components] [test-firefox] 	Error: Timeout - Async function did not complete within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in /home/runner/work/nimble/nimble/node_modules/karma-jasmine/node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 7612)
[test-concurrent:nimble-components] [test-firefox] 	<Jasmine>
[test-concurrent:nimble-components] [test-firefox] 	<Jasmine>
[test-concurrent:nimble-components] [test-firefox] 	<Jasmine>
[test-concurrent:nimble-components] [test-firefox] 	<Jasmine>
[test-concurrent:nimble-components] [test-firefox] 	<Jasmine>
[test-concurrent:nimble-components] [test-firefox] 
[test-concurrent:nimble-components] [test-firefox] 
[test-concurrent:nimble-components] [test-firefox]   TableColumnDurationText
[test-concurrent:nimble-components] [test-firefox]     ✗ sets title when cell text is ellipsized (6100ms)
[test-concurrent:nimble-components] [test-firefox] 	Error: Timeout - Async function did not complete within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in /home/runner/work/nimble/nimble/node_modules/karma-jasmine/node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 7612)
[test-concurrent:nimble-components] [test-firefox] 	<Jasmine>
[test-concurrent:nimble-components] [test-firefox] 	<Jasmine>
[test-concurrent:nimble-components] [test-firefox] 	<Jasmine>
[test-concurrent:nimble-components] [test-firefox] 	<Jasmine>
[test-concurrent:nimble-components] [test-firefox] 	<Jasmine>
[test-concurrent:nimble-components] [test-firefox] 	<Jasmine>
[test-concurrent:nimble-components] [test-firefox] 
[test-concurrent:nimble-components] [test-firefox] Firefox 125.0 (Linux x86_64) ERROR
[test-concurrent:nimble-components] [test-firefox]   An error was thrown in afterAll
[test-concurrent:nimble-components] [test-firefox]   Expected '' to equal '99 days, 14 hr, 50 min, 22 sec'.
[test-concurrent:nimble-components] [test-firefox]   <Jasmine>
[test-concurrent:nimble-components] [test-firefox]   @/tmp/_karma_webpack_895461/commons.js:109769:52
[test-concurrent:nimble-components] [test-firefox]   <Jasmine>
[test-concurrent:nimble-components] [test-firefox]   <Jasmine>
[test-concurrent:nimble-components] [test-firefox]   <Jasmine>
[test-concurrent:nimble-components] [test-firefox]   <Jasmine>
[test-concurrent:nimble-components] [test-firefox]   <Jasmine>
[test-concurrent:nimble-components] [test-firefox] 
[test-concurrent:nimble-components] [test-firefox] Firefox 125.0 (Linux x86_64): Executed 984 of 5019 (2 FAILED) (skipped 2) ERROR (2 mins 41.22 secs / 2 mins 25.297 secs)

I could not reproduce the failures locally.

We need to determine if it is safe to update Nimble's dependencies to the latest FAST versions. If it is, then we need to un-pin our dependency versions (pinned in #2455).

This issue blocks AZDO 2843309

@m-akinc m-akinc added tech debt triage New issue that needs to be reviewed labels Oct 28, 2024
@rajsite
Copy link
Member

rajsite commented Oct 28, 2024

@m-akinc make sure to inline relevant build logs in the description since the logs are temporary

@m-akinc m-akinc removed the triage New issue that needs to be reviewed label Oct 29, 2024
@m-akinc m-akinc moved this to Current Iteration in Nimble Design System Priorities Oct 29, 2024
@m-akinc m-akinc self-assigned this Nov 1, 2024
@m-akinc
Copy link
Contributor Author

m-akinc commented Nov 2, 2024

I have been able to reproduce this locally, and I have confirmed that the timeouts are due to increasingly long "cycle collections" (CCs) by Firefox. A cycle collection is an operation that attempts to detect and free objects that are being held onto because of reference cycles. It seems to run after a garbage collection. Below is an image from a profile I took. In it, the red bars indicate "jank", which are event processing delays. As time goes on, these delays get longer and longer, Note the breaks in the black/rainbow-ish bands--these correspond to periods where the test run becomes unresponsive. The orange bands at the top are GCs. The wider yellow bands that follow them and the orange "mountains" at the bottom indicate where the CCs are being run.
Image
In contrast, the profile from before the FAST update shows no significant time spent in CCs:
Image
(Ignore the long break with the red bar and blue hill--that was me manually taking a memory snapshot.) CCs are still happening, but they have much less to do, and so take much less time. Here are some details from CCs with and without the FAST update, taken around the same point in the test run.
With the FAST update:
Image
Before the FAST update (i.e. main):
Image
"Suspected Objects" is about 9x larger with the FAST update, and "GC Objects Visited" is over 500x larger!

There are quite a few similar-sounding, CC-related issues reported on Bugzilla, some dating several years back, but others just months or days old. It seems like the Firefox cycle collector is prone to causing hangs over time. My guess is that by fixing the FAST memory leaks, we have now created a lot more garbage for the browser to collect and are now exposed to potential issues with the cycle collector.

I have not yet figured what the next steps should be, or if there is some clever way to work around this (e.g. random thought: maybe we could hold references to the documents used by the tests to mimic the leaky world we lived in before the FAST update).

@m-akinc
Copy link
Contributor Author

m-akinc commented Nov 5, 2024

Previously, I was running 500 insteances of a table selection test (i.e. 500 separate it statements). I tried the same thing with a single test that ran the setup and test logic 500x in a loop. I got basically the same results:
image
However, I discovered that Firefox is also leaking memory while the test runs. Memory usage climbs from 60ish MB to 900+ MB. The same thing in Chrome goes from 80 MB to 84 MB.

In fact, even in a modified version of the Angular example app (modified to just constantly create a new multi-column table from a template, add it to the body, then remove it 100ms later), Firefox has the same leaky behavior as the tests. I also saw periodically unresponsiveness at some point, but I cannot reproduce it now. The same thing running in Chrome does not leak any memory.

The good news, though, is this is apparently unrelated to the FAST update. I tested the same modified Angular example app with the current versions of the FAST packages, and I see the same memory growth.

I also ran a SystemLinkShared pipeline build with the newer versions of the FAST packages, and the test time was in-line with historical times (i.e. 6.5-7min).

Based on all of this, it should be safe to unpin and bump our FAST dependency versions. We also need to bump up the timeout for Firefox tests from 5s to, say, 10s.

@rajsite rajsite mentioned this issue Nov 5, 2024
1 task
@m-akinc m-akinc closed this as completed Nov 5, 2024
rajsite added a commit that referenced this issue Nov 5, 2024
# Pull Request

## 🤨 Rationale

Update FAST versions to include the following fixes:
- microsoft/fast#7022
- microsoft/fast#6960

And handle the following NI issues:
- Fixes #1661
- https://ni.visualstudio.com/DevCentral/_workitems/edit/2843309

## 👩‍💻 Implementation

Updates the Fast Foundation version. A test performance regression was
found in all browsers but particularly in Firefox, however [after
investigation by @m-akinc](#2456) it
was found to be an acceptable change. Fixes
#2456

## 🧪 Testing

Extensive testing done in #2456

## ✅ Checklist

- [x] I have updated the project documentation to reflect my changes or
determined no changes are needed.

---------

Co-authored-by: Mert Akinc <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging a pull request may close this issue.

2 participants