CT: Error: Failed to launch the browser process! #150

zepumph · 2022-06-24T15:56:30Z

I think this has something to do with bayes and the number of puppeteer instances on it, but I'm not really sure. I'll have to take a look.

This looks to be exactly the same as puppeteer/puppeteer#6757 (except docker). I'll see if there is something I need to do better about closing it down.


arithmetic : build
Build failed with status code 1:
Running "report-media" task

Running "clean" task

Running "build" task
>> TypeScript compilation complete: 24665ms
Building runnable repository (arithmetic, brands: phet, phet-io)
Building brand: phet
>> Webpack build complete: 6747ms
>> Production minification complete: 35456ms (2241851 bytes)
>> Debug minification complete: 0ms (20575992 bytes)
Building brand: phet-io
>> Webpack build complete: 5473ms
>> Production minification complete: 30259ms (2275428 bytes)
>> Debug minification complete: 29481ms (2586396 bytes)
No migration rules found at ../chipper/dist/js/phet-io-sim-specific/repos/arithmetic/js/arithmetic-migration-rules.js, no rules to be bundled with phet-io.js.
No client guides found at ../phet-io-sim-specific/repos/arithmetic/client-guide/, no guides being built.
>> TypeScript compilation complete: 6355ms
Fatal error: Perennial task failed:
Error: Failed to launch the browser process!
[0624/084852.594901:FATAL:zygote_host_impl_linux.cc(174)] Check failed: process.IsValid(). Failed to launch zygote process
#0 0x558cadb82469 base::debug::CollectStackTrace()
#1 0x558cadae8f33 base::debug::StackTrace::StackTrace()
#2 0x558cadafbd50 logging::LogMessage::~LogMessage()
#3 0x558cadafc90e logging::LogMessage::~LogMessage()
#4 0x558cabbbcb08 content::ZygoteHostImpl::LaunchZygote()
#5 0x558cad696a30 content::(anonymous namespace)::LaunchZygoteHelper()
#6 0x558cab21b6cb content::ZygoteCommunication::Init()
#7 0x558cab21bcd4 content::CreateGenericZygote()
#8 0x558cad695e95 content::ContentMainRunnerImpl::Initialize()
#9 0x558cad693eb9 content::RunContentProcess()
#10 0x558cad69400e content::ContentMain()
#11 0x558cad6eefca headless::(anonymous namespace)::RunContentMain()
#12 0x558cad6eecd5 headless::HeadlessShellMain()
#13 0x558caa2e0668 ChromeMain
#14 0x7f55b3e79555 __libc_start_main
#15 0x558caa2e04aa _start

Received signal 6
#0 0x558cadb82469 base::debug::CollectStackTrace()
#1 0x558cadae8f33 base::debug::StackTrace::StackTrace()
#2 0x558cadb81f71 base::debug::(anonymous namespace)::StackDumpSignalHandler()
#3 0x7f55b85f6630 (/usr/lib64/libpthread-2.17.so+0xf62f)
#4 0x7f55b3e8d387 __GI_raise
#5 0x7f55b3e8ea78 __GI_abort
#6 0x558cadb811f5 base::debug::BreakDebuggerAsyncSafe()
#7 0x558cadafc221 logging::LogMessage::~LogMessage()
#8 0x558cadafc90e logging::LogMessage::~LogMessage()
#9 0x558cabbbcb08 content::ZygoteHostImpl::LaunchZygote()
#10 0x558cad696a30 content::(anonymous namespace)::LaunchZygoteHelper()
#11 0x558cab21b6cb content::ZygoteCommunication::Init()
#12 0x558cab21bcd4 content::CreateGenericZygote()
#13 0x558cad695e95 content::ContentMainRunnerImpl::Initialize()
#14 0x558cad693eb9 content::RunContentProcess()
#15 0x558cad69400e content::ContentMain()
#16 0x558cad6eefca headless::(anonymous namespace)::RunContentMain()
#17 0x558cad6eecd5 headless::HeadlessShellMain()
#18 0x558caa2e0668 ChromeMain
#19 0x7f55b3e79555 __libc_start_main
#20 0x558caa2e04aa _start
r8: 0000000000000000 r9: 0000000000000300 r10: 0000000000000008 r11: 0000000000000206
r12: 000039d800300140 r13: 000039d800300158 r14: 000039d800300150 r15: 000039d800221400
di: 0000000000004fa6 si: 0000000000004fa6 bp: 00007ffc4a797e90 bx: 00007ffc4a7986d0
dx: 0000000000000006 ax: 0000000000000000 cx: ffffffffffffffff sp: 00007ffc4a797d58
ip: 00007f55b3e8d387 efl: 0000000000000206 cgf: aaaa000000000033 erf: 0000000000000000
trp: 0000000000000000 msk: 0000000000000000 cr2: 0000000000000000
[end of stack trace]

TROUBLESHOOTING: https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md

at onClose (/data/share/phet/continuous-testing/ct-snapshots/1656077734615/chipper/node_modules/puppeteer/lib/cjs/puppeteer/node/BrowserRunner.js:241:20)
at ChildProcess.<anonymous> (/data/share/phet/continuous-testing/ct-snapshots/1656077734615/chipper/node_modules/puppeteer/lib/cjs/puppeteer/node/BrowserRunner.js:232:79)
at ChildProcess.emit (events.js:327:22)
at Process.ChildProcess._handle.onexit (internal/child_process.js:277:12)
Full Error details:
Error: Failed to launch the browser process!
[0624/084852.594901:FATAL:zygote_host_impl_linux.cc(174)] Check failed: process.IsValid(). Failed to launch zygote process
#0 0x558cadb82469 base::debug::CollectStackTrace()
#1 0x558cadae8f33 base::debug::StackTrace::StackTrace()
#2 0x558cadafbd50 logging::LogMessage::~LogMessage()
#3 0x558cadafc90e logging::LogMessage::~LogMessage()
#4 0x558cabbbcb08 content::ZygoteHostImpl::LaunchZygote()
#5 0x558cad696a30 content::(anonymous namespace)::LaunchZygoteHelper()
#6 0x558cab21b6cb content::ZygoteCommunication::Init()
#7 0x558cab21bcd4 content::CreateGenericZygote()
#8 0x558cad695e95 content::ContentMainRunnerImpl::Initialize()
#9 0x558cad693eb9 content::RunContentProcess()
#10 0x558cad69400e content::ContentMain()
#11 0x558cad6eefca headless::(anonymous namespace)::RunContentMain()
#12 0x558cad6eecd5 headless::HeadlessShellMain()
#13 0x558caa2e0668 ChromeMain
#14 0x7f55b3e79555 __libc_start_main
#15 0x558caa2e04aa _start

Received signal 6
#0 0x558cadb82469 base::debug::CollectStackTrace()
#1 0x558cadae8f33 base::debug::StackTrace::StackTrace()
#2 0x558cadb81f71 base::debug::(anonymous namespace)::StackDumpSignalHandler()
#3 0x7f55b85f6630 (/usr/lib64/libpthread-2.17.so+0xf62f)
#4 0x7f55b3e8d387 __GI_raise
#5 0x7f55b3e8ea78 __GI_abort
#6 0x558cadb811f5 base::debug::BreakDebuggerAsyncSafe()
#7 0x558cadafc221 logging::LogMessage::~LogMessage()
#8 0x558cadafc90e logging::LogMessage::~LogMessage()
#9 0x558cabbbcb08 content::ZygoteHostImpl::LaunchZygote()
#10 0x558cad696a30 content::(anonymous namespace)::LaunchZygoteHelper()
#11 0x558cab21b6cb content::ZygoteCommunication::Init()
#12 0x558cab21bcd4 content::CreateGenericZygote()
#13 0x558cad695e95 content::ContentMainRunnerImpl::Initialize()
#14 0x558cad693eb9 content::RunContentProcess()
#15 0x558cad69400e content::ContentMain()
#16 0x558cad6eefca headless::(anonymous namespace)::RunContentMain()
#17 0x558cad6eecd5 headless::HeadlessShellMain()
#18 0x558caa2e0668 ChromeMain
#19 0x7f55b3e79555 __libc_start_main
#20 0x558caa2e04aa _start
r8: 0000000000000000 r9: 0000000000000300 r10: 0000000000000008 r11: 0000000000000206
r12: 000039d800300140 r13: 000039d800300158 r14: 000039d800300150 r15: 000039d800221400
di: 0000000000004fa6 si: 0000000000004fa6 bp: 00007ffc4a797e90 bx: 00007ffc4a7986d0
dx: 0000000000000006 ax: 0000000000000000 cx: ffffffffffffffff sp: 00007ffc4a797d58
ip: 00007f55b3e8d387 efl: 0000000000000206 cgf: aaaa000000000033 erf: 0000000000000000
trp: 0000000000000000 msk: 0000000000000000 cr2: 0000000000000000
[end of stack trace]

TROUBLESHOOTING: https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md
�
Snapshot from 6/24/2022, 7:35:34 AM

The text was updated successfully, but these errors were encountered:

zepumph · 2022-06-24T15:56:59Z

Likely puppeteer/puppeteer#1981 is a byproduct of this error.

zepumph · 2022-06-24T18:51:35Z

I was also able to reproduce this just by going onto bayes and building arithmetic --brands=phet-io in the continuous-client repos. It worked when I stopped all pm2 processes. Something with continuous-client is not working well over a long period of time.

zepumph · 2022-06-27T15:59:23Z

I think @mattpen and I got to the bottom of this. The new puppeteer client seems to be orphaning chrome instances when it stops or crashes, and so this morning, when all pm2 processes were stopped, we found 575 chrome instances still running. I will work on better cleanup/error handling for the puppeteer client.

Also noting that once orphaned chrome instances were killed off, I was able to successfully build arithmetic in the same while while all pm2 processes were running.

KatieWoe · 2022-06-29T15:40:51Z

A number of repos are showing similar errors this morning (6/29/22)
Example:

arithmetic : build
Build failed with status code 1:
Running "report-media" task

Running "clean" task

Running "build" task
>> TypeScript compilation complete: 21831ms
Building runnable repository (arithmetic, brands: phet, phet-io)
Building brand: phet
>> Webpack build complete: 6363ms
>> Production minification complete: 33441ms (2241784 bytes)
>> Debug minification complete: 1ms (20575041 bytes)
Building brand: phet-io
>> Webpack build complete: 4835ms
>> Production minification complete: 33991ms (2275351 bytes)
>> Debug minification complete: 39640ms (2586496 bytes)
No migration rules found at ../chipper/dist/js/phet-io-sim-specific/repos/arithmetic/js/arithmetic-migration-rules.js, no rules to be bundled with phet-io.js.
No client guides found at ../phet-io-sim-specific/repos/arithmetic/client-guide/, no guides being built.
>> TypeScript compilation complete: 5083ms
Fatal error: Perennial task failed:
Error: Failed to launch the browser process!
[0629/065748.409687:FATAL:zygote_host_impl_linux.cc(174)] Check failed: process.IsValid(). Failed to launch zygote process
#0 0x5589efb53469 base::debug::CollectStackTrace()
#1 0x5589efab9f33 base::debug::StackTrace::StackTrace()
#2 0x5589efaccd50 logging::LogMessage::~LogMessage()
#3 0x5589efacd90e logging::LogMessage::~LogMessage()
#4 0x5589edb8db08 content::ZygoteHostImpl::LaunchZygote()
#5 0x5589ef667a30 content::(anonymous namespace)::LaunchZygoteHelper()
#6 0x5589ed1ec6cb content::ZygoteCommunication::Init()
#7 0x5589ed1eccd4 content::CreateGenericZygote()
#8 0x5589ef666e95 content::ContentMainRunnerImpl::Initialize()
#9 0x5589ef664eb9 content::RunContentProcess()
#10 0x5589ef66500e content::ContentMain()
#11 0x5589ef6bffca headless::(anonymous namespace)::RunContentMain()
#12 0x5589ef6bfcd5 headless::HeadlessShellMain()
#13 0x5589ec2b1668 ChromeMain
#14 0x7fbd63230555 __libc_start_main
#15 0x5589ec2b14aa _start

Received signal 6
#0 0x5589efb53469 base::debug::CollectStackTrace()
#1 0x5589efab9f33 base::debug::StackTrace::StackTrace()
#2 0x5589efb52f71 base::debug::(anonymous namespace)::StackDumpSignalHandler()
#3 0x7fbd679ad630 (/usr/lib64/libpthread-2.17.so+0xf62f)
#4 0x7fbd63244387 __GI_raise
#5 0x7fbd63245a78 __GI_abort
#6 0x5589efb521f5 base::debug::BreakDebuggerAsyncSafe()
#7 0x5589efacd221 logging::LogMessage::~LogMessage()
#8 0x5589efacd90e logging::LogMessage::~LogMessage()
#9 0x5589edb8db08 content::ZygoteHostImpl::LaunchZygote()
#10 0x5589ef667a30 content::(anonymous namespace)::LaunchZygoteHelper()
#11 0x5589ed1ec6cb content::ZygoteCommunication::Init()
#12 0x5589ed1eccd4 content::CreateGenericZygote()
#13 0x5589ef666e95 content::ContentMainRunnerImpl::Initialize()
#14 0x5589ef664eb9 content::RunContentProcess()
#15 0x5589ef66500e content::ContentMain()
#16 0x5589ef6bffca headless::(anonymous namespace)::RunContentMain()
#17 0x5589ef6bfcd5 headless::HeadlessShellMain()
#18 0x5589ec2b1668 ChromeMain
#19 0x7fbd63230555 __libc_start_main
#20 0x5589ec2b14aa _start
r8: 0000000000000000 r9: 0000000000000300 r10: 0000000000000008 r11: 0000000000000206
r12: 0000187600300140 r13: 0000187600300158 r14: 0000187600300150 r15: 0000187600221400
di: 000000000000975f si: 000000000000975f bp: 00007ffc57156180 bx: 00007ffc571569c0
dx: 0000000000000006 ax: 0000000000000000 cx: ffffffffffffffff sp: 00007ffc57156048
ip: 00007fbd63244387 efl: 0000000000000206 cgf: aaaa000000000033 erf: 0000000000000000
trp: 0000000000000000 msk: 0000000000000000 cr2: 0000000000000000
[end of stack trace]

TROUBLESHOOTING: https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md

at onClose (/data/share/phet/continuous-testing/ct-snapshots/1656506053578/chipper/node_modules/puppeteer/lib/cjs/puppeteer/node/BrowserRunner.js:241:20)
at ChildProcess.<anonymous> (/data/share/phet/continuous-testing/ct-snapshots/1656506053578/chipper/node_modules/puppeteer/lib/cjs/puppeteer/node/BrowserRunner.js:232:79)
at ChildProcess.emit (events.js:327:22)
at Process.ChildProcess._handle.onexit (internal/child_process.js:277:12)
Full Error details:
Error: Failed to launch the browser process!
[0629/065748.409687:FATAL:zygote_host_impl_linux.cc(174)] Check failed: process.IsValid(). Failed to launch zygote process
#0 0x5589efb53469 base::debug::CollectStackTrace()
#1 0x5589efab9f33 base::debug::StackTrace::StackTrace()
#2 0x5589efaccd50 logging::LogMessage::~LogMessage()
#3 0x5589efacd90e logging::LogMessage::~LogMessage()
#4 0x5589edb8db08 content::ZygoteHostImpl::LaunchZygote()
#5 0x5589ef667a30 content::(anonymous namespace)::LaunchZygoteHelper()
#6 0x5589ed1ec6cb content::ZygoteCommunication::Init()
#7 0x5589ed1eccd4 content::CreateGenericZygote()
#8 0x5589ef666e95 content::ContentMainRunnerImpl::Initialize()
#9 0x5589ef664eb9 content::RunContentProcess()
#10 0x5589ef66500e content::ContentMain()
#11 0x5589ef6bffca headless::(anonymous namespace)::RunContentMain()
#12 0x5589ef6bfcd5 headless::HeadlessShellMain()
#13 0x5589ec2b1668 ChromeMain
#14 0x7fbd63230555 __libc_start_main
#15 0x5589ec2b14aa _start

Received signal 6
#0 0x5589efb53469 base::debug::CollectStackTrace()
#1 0x5589efab9f33 base::debug::StackTrace::StackTrace()
#2 0x5589efb52f71 base::debug::(anonymous namespace)::StackDumpSignalHandler()
#3 0x7fbd679ad630 (/usr/lib64/libpthread-2.17.so+0xf62f)
#4 0x7fbd63244387 __GI_raise
#5 0x7fbd63245a78 __GI_abort
#6 0x5589efb521f5 base::debug::BreakDebuggerAsyncSafe()
#7 0x5589efacd221 logging::LogMessage::~LogMessage()
#8 0x5589efacd90e logging::LogMessage::~LogMessage()
#9 0x5589edb8db08 content::ZygoteHostImpl::LaunchZygote()
#10 0x5589ef667a30 content::(anonymous namespace)::LaunchZygoteHelper()
#11 0x5589ed1ec6cb content::ZygoteCommunication::Init()
#12 0x5589ed1eccd4 content::CreateGenericZygote()
#13 0x5589ef666e95 content::ContentMainRunnerImpl::Initialize()
#14 0x5589ef664eb9 content::RunContentProcess()
#15 0x5589ef66500e content::ContentMain()
#16 0x5589ef6bffca headless::(anonymous namespace)::RunContentMain()
#17 0x5589ef6bfcd5 headless::HeadlessShellMain()
#18 0x5589ec2b1668 ChromeMain
#19 0x7fbd63230555 __libc_start_main
#20 0x5589ec2b14aa _start
r8: 0000000000000000 r9: 0000000000000300 r10: 0000000000000008 r11: 0000000000000206
r12: 0000187600300140 r13: 0000187600300158 r14: 0000187600300150 r15: 0000187600221400
di: 000000000000975f si: 000000000000975f bp: 00007ffc57156180 bx: 00007ffc571569c0
dx: 0000000000000006 ax: 0000000000000000 cx: ffffffffffffffff sp: 00007ffc57156048
ip: 00007fbd63244387 efl: 0000000000000206 cgf: aaaa000000000033 erf: 0000000000000000
trp: 0000000000000000 msk: 0000000000000000 cr2: 0000000000000000
[end of stack trace]

TROUBLESHOOTING: https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md
�
Snapshot from 6/29/2022, 6:34:13 AM

zepumph · 2022-06-30T19:00:01Z

This morning there were 5200 instances of chrome running. I should look into this! I was hoping to have time this afternoon

zepumph · 2022-06-30T20:33:24Z

https://stackoverflow.com/questions/62220867/puppeteer-chromium-instances-remain-active-in-the-background-after-browser-disc has something I'd like to try, instead of using zygote (an internal process forker), lets have things forked via node. Worth a shot!

zepumph · 2022-06-30T22:31:08Z

This didn't seem to change anything, After 2 hours, there are double the number of chrome instances running. I'll have to keep looking.

samreid · 2022-07-06T02:04:23Z

I noticed that we are calling withServer again and again in a while (true) loop in QuickServer. Each time, this allocates a new node server:

    const server = http.createServer( ( req, res ) => {

We call server.close on it, but after reading https://stackoverflow.com/questions/14626636/how-do-i-shutdown-a-node-js-https-server-immediately it's unclear whether that's enough to clear that memory. @zepumph could there be a memory leak in these server instances?

zepumph · 2022-07-06T15:35:30Z

I just logged onto bayes and saw that there were 2000 instances of chrome running.

Then I ran pm2 stop continuous-client and checked and there were 500 chrome instances still running.

I then ran pm2 stop continuous-quick-server and checked and there were 0 chrome instances still running. Quite likely however we are running Puppeteer in continuous client is similarly problematic to the quick server. Thanks for the intel @samreid! It may have to do with withServer, but I think the most likely culprit is puppeteerLoad itself.

samreid · 2022-07-07T02:04:58Z

Can you reproduce this locally, or only on bayes? Do we think it is related to pm2 or linux? Do you think I could reproduce the problem on mac with or without pm2? I'm investigating playwright in a separate issue but would like a way to test how it impacts this problem (even if I have to test on bayes directly).

samreid · 2022-07-07T02:48:05Z

I logged in to bayes as phet-admin and saw pm2 list like so:

[phet-admin@bayes ~]$ pm2 list
┌─────┬────────────────────────────┬─────────────┬─────────┬─────────┬──────────┬────────┬──────┬───────────┬──────────┬──────────┬──────────┬──────────┐
│ id  │ name                       │ namespace   │ version │ mode    │ pid      │ uptime │ ↺    │ status    │ cpu      │ mem      │ user     │ watching │
├─────┼────────────────────────────┼─────────────┼─────────┼─────────┼──────────┼────────┼──────┼───────────┼──────────┼──────────┼──────────┼──────────┤
│ 4   │ continuous-client          │ default     │ N/A     │ fork    │ 20052    │ 11h    │ 1    │ online    │ 100%     │ 387.6mb  │ phe… │ disabled │
│ 3   │ continuous-quick-server    │ default     │ N/A     │ fork    │ 20049    │ 11h    │ 0    │ online    │ 0%       │ 102.7mb  │ phe… │ disabled │
│ 1   │ continuous-server          │ default     │ N/A     │ fork    │ 20043    │ 11h    │ 10   │ online    │ 0%       │ 346.3mb  │ phe… │ disabled │
│ 0   │ phettest-server            │ default     │ 0.0.0   │ fork    │ 20035    │ 11h    │ 2    │ online    │ 0%       │ 43.8mb   │ phe… │ disabled │
│ 2   │ yotta-server               │ default     │ 0.0.0   │ fork    │ 20068    │ 11h    │ 2    │ online    │ 0%       │ 37.1mb   │ phe… │ disabled │
└─────┴────────────────────────────┴─────────────┴─────────┴─────────┴──────────┴────────┴──────┴───────────┴──────────┴──────────┴──────────┴──────────┘

that pgrep -c chrome reported 1579. Also, ps aux --no-heading showed many as defunct. Also there were many listings from perennial:

...
phet-ad+ 40211  0.5  0.0      0     0 ?        Zs   18:21   0:48 [chrome] <defunct>
phet-ad+ 40408  0.3  0.0      0     0 ?        Zs   15:19   1:00 [chrome] <defunct>
phet-ad+ 40414  0.0  0.0 17469332 57344 ?      Ssl  16:48   0:03 /data/share/phet/continuous-quick-server/perennial/node_mo
phet-ad+ 40421  0.0  0.0 17299480 35760 ?      S    16:48   0:00 /data/share/phet/continuous-quick-server/perennial/node_mo
phet-ad+ 40423  0.0  0.0 17299464 35764 ?      S    16:48   0:00 /data/share/phet/continuous-quick-server/perennial/node_mo
phet-ad+ 40425  0.0  0.0 17299496 10972 ?      S    16:48   0:00 /data/share/phet/continuous-quick-server/perennial/node_mo
phet-ad+ 40442  0.0  0.0 17492024 40044 ?      Sl   16:48   0:04 /data/share/phet/continuous-quick-server/perennial/node_mo
phet-ad+ 40445  0.0  0.0 25919264 35332 ?      Sl   16:48   0:00 /data/share/phet/continuous-quick-server/perennial/node_mo
phet-ad+ 40555  0.0  0.0 17332372 37372 ?      Sl   16:48   0:00 /data/share/phet/continuous-quick-server/perennial/node_mo
phet-ad+ 40727  0.0  0.0 17465132 57832 ?      Ssl  16:48   0:04 /data/share/phet/continuous-quick-server/perennial/node_mo
phet-ad+ 40730  0.0  0.0 17299480 35760 ?      S    16:48   0:00 /data/share/phet/continuous-quick-server/perennial/node_mo
phet-ad+ 40732  0.0  0.0 17299464 35764 ?      S    16:48   0:00 /data/share/phet/continuous-quick-server/perennial/node_mo
phet-ad+ 40734  0.0  0.0 17299500 10976 ?      S    16:48   0:00 /data/share/phet/continuous-quick-server/perennial/node_mo
phet-ad+ 40748  0.0  0.0 17491828 31116 ?      Sl   16:48   0:00 /data/share/phet/continuous-quick-server/perennial/node_mo
phet-ad+ 40755  0.0  0.0 25927528 35576 ?      Sl   16:48   0:00 /data/share/phet/continuous-quick-server/perennial/node_mo
phet-ad+ 40909  0.6  0.0      0     0 ?        Zs   18:21   0:58 [chrome] <defunct>
phet-ad+ 40927  0.3  0.0      0     0 ?        Zs   18:21   0:30 [chrome] <defunct>
phet-ad+ 40942  0.6  0.0      0     0 ?        Zs   18:21   0:53 [chrome] <defunct>

I also saw:

[phet-admin@bayes ~]$ pgrep -c node
44

After stopping continuous-server, I saw:

[phet-admin@bayes ~]$ pm2 stop continuous-server
[PM2] Applying action stopProcessId on app [continuous-server](ids: [ 1 ])
[phet-admin@bayes ~]$ pgrep -c chrome
1582
[phet-admin@bayes ~]$ pgrep -c node
35

These numbers don't make sense, since the number of chromes went from 1579 to 1582.

Then I stopped quick-server:

[phet-admin@bayes ~]$ pm2 stop continuous-quick-server
[PM2] Applying action stopProcessId on app [continuous-quick-server](ids: [ 3 ])
...
[phet-admin@bayes ~]$ pgrep -c chrome
784
[phet-admin@bayes ~]$ pgrep -c node
33

So that reduced chromes from 1582 to 784 and nodes from 35 to 33. Still disconcerting that we still have 784 chromes running.

After restarting those pm2 services, and running pgrep -c chrome intermittently, I saw the number of chrome instances climb from 796 on startup to 811 after a few minutes.

samreid · 2022-07-07T03:01:50Z

I also saw puppeteer/puppeteer#1825 which described defunct zombie processes like we see here.

I also saw https://askubuntu.com/questions/201303/what-is-a-defunct-process-and-why-doesnt-it-get-killed which describes defunct processes.

I used that process to find the PPID of the bad parent process, and I ran kill -9 20052 and saw that pgrep -c chrome dropped to 94. One other process accounted for the other defunct chromes. I killed that parent process too.

I checked that pm2 is still operational, but after 5 minutes, there are already 50 defunct chrome processes.

samreid · 2022-07-07T03:35:01Z

During my review of quick server, I noticed two potential problems:

The browser was being created over and over again twice per iteration in the while loop. It seems like we could benefit by creating the browser once and asking it to generate new pages.
It looks like there is a bug in puppeteerLaunch because it is killing the browser if the browser was passed in, not if puppeteerLaunch owns the browser. I would like to try correcting these and pushing to test on bayes.

By the way, bayes has climbed to 204 chrome processes (as reported by pgrep -c chrome) while I worked on this. UPDATE: 220

samreid · 2022-07-07T04:01:14Z

I stopped continuous-server and continuous-quick-server, pulled aqua and perennial in each, and restarted. The number of pgrep -c chrome chrome instances held steady at 208 for a while, but just jumped to 241. But only a few (if any) look like defunct zombies.

UPDATE: Holding steady at 241 for quite a while.

samreid · 2022-07-07T04:11:39Z

The perennial commit seems like a bugfix, but warrants a code review since it is used in many places, and one or more usage sites could have been relying on the bug somehow? I marked blocks-publication and ready for review for that part.

Also, we are still around 300 chrome instances, and not many look like zombies. So there is probably still a problem of some sort.

samreid · 2022-07-07T14:12:31Z

This morning (10 hours later than preceding comment), I observed:

[phet-admin@bayes ~]$ pgrep -c chrome
927

With around 600 marked as defunct according to ps -ef | grep defunct.

zepumph · 2022-07-07T16:59:02Z

Great fix! I was using this command to see how many chrome instances there were ps aux | grep chrome | wc -l.

We have to apply this to the continuous-client process too. I just did this. I bet that will solve the rest of the defunct guys.

zepumph · 2022-07-07T17:05:53Z

After pulling/restarting continuous-client, I see no defunct chrome instances. There were 307 upon restart, and are now 313. I believe this is solved, but we can check in tomorrow. Thanks so much @samreid for finding that bug!

zepumph · 2022-07-13T15:27:03Z

We are still hovering around 330 chrome instances. Closing

zepumph · 2022-11-01T21:07:45Z

In general things were working well, but when there were errors outside the expected places for the webpage, puppeteerLoad still wasn't cleaning up. Thus, above I added even more cleanup.

When looking at the error logs, I see the page.goto has caused a fair number of errors over the past months. I wonder if that contributed to the errors. I'll check into to see if that helps things soon.

zepumph · 2022-11-02T17:48:34Z

This happened again tonight, and my best guess is that it was because 1de7fdf was launching too many chrome instances. I restarted things and pushed a fix over in #152 and ps aux | grep chrome | wc -l is showing steadily with 88 chrome instances.

zepumph · 2022-11-07T20:06:23Z

After the weekend we are at 112 chrome instances. Perhaps there is still a slow problem. After stopping all CT processes, there are still 25 zombies. I will keep tabs on these.

zepumph · 2022-11-28T19:55:24Z

Now there are 184 chrome instances. That is still growing. I need to continue to work here.

zepumph · 2022-11-29T22:03:45Z

@mattpen and I spoke about this yesterday. We feel like it is important to improve the logging in our puppeteer processes, and then from there figure out where the error case is coming from. I'll take the lead!

zepumph · 2022-12-07T17:28:21Z

I think that phetsims/perennial#291 could be helpful here, or at least something to note.

zepumph · 2023-04-27T22:38:56Z

We are running into this over in https://github.com/phetsims/special-ops/issues/234

zepumph · 2023-04-28T17:49:45Z

Most likely our sparky case will be solved by #172.

zepumph · 2023-05-01T17:04:15Z

Ok. After some discussion with @mattpen, The above commit should handle the orphaned chrome instances that are much more rare from the build process. I'm ready to close this issue, but I'll reopen if we need to track down any more orphaned chrome instances.

zepumph self-assigned this Jun 24, 2022

zepumph mentioned this issue Jun 24, 2022

CT has a number of grey cells and tests seem slow/not being done #148

Closed

zepumph mentioned this issue Jun 27, 2022

MaxListenersExceededWarning with puppeteer Client. #149

Closed

zepumph added the priority:2-high label Jun 27, 2022

zepumph added a commit that referenced this issue Jun 30, 2022

add --no-zygote option, #150

a0295db

zepumph added a commit that referenced this issue Jun 30, 2022

bug fix, #150

bb87988

samreid mentioned this issue Jul 7, 2022

Improve stability and coverage of CT (and for other puppeteer reliance) phetsims/chipper#1273

Closed

samreid added a commit that referenced this issue Jul 7, 2022

Create and reuse one puppeteer browser, see #150

ec48b30

samreid added a commit to phetsims/perennial that referenced this issue Jul 7, 2022

If we created a temporary browser, close it, see phetsims/aqua#150

2bfd6fa

samreid added status:blocks-sim-publication status:ready-for-review status:blocks-publication and removed status:blocks-sim-publication labels Jul 7, 2022

zepumph removed the status:blocks-publication label Jul 7, 2022

zepumph closed this as completed Jul 13, 2022

zepumph reopened this Nov 1, 2022

zepumph added a commit to phetsims/perennial that referenced this issue Nov 1, 2022

catch failures and clean up browser, phetsims/aqua#150

fec2275

zepumph added a commit that referenced this issue Nov 1, 2022

improve logging, #150

6db7eb9

zepumph added a commit to phetsims/perennial that referenced this issue Nov 1, 2022

minor code tweaks, phetsims/aqua#150

0ac7ccf

zepumph added a commit to phetsims/perennial that referenced this issue Nov 1, 2022

formatting, phetsims/aqua#150

24decc6

zepumph added a commit that referenced this issue Nov 29, 2022

update message logging, #150

386fbf8

zepumph removed the status:ready-for-review label Nov 29, 2022

zepumph removed their assignment Mar 3, 2023

zepumph self-assigned this Apr 27, 2023

zepumph closed this as completed May 1, 2023

zepumph added a commit that referenced this issue May 1, 2023

add no-zygote to API generation puppeteer instance, #150

f5fb16b

zepumph added a commit to phetsims/chipper that referenced this issue May 1, 2023

add no-zygote to API generation puppeteer instance, phetsims/aqua#150

023cf31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CT: Error: Failed to launch the browser process! #150

CT: Error: Failed to launch the browser process! #150

zepumph commented Jun 24, 2022 •

edited

Loading

zepumph commented Jun 24, 2022 •

edited

Loading

zepumph commented Jun 24, 2022

zepumph commented Jun 27, 2022 •

edited

Loading

KatieWoe commented Jun 29, 2022 •

edited by zepumph

Loading

zepumph commented Jun 30, 2022 •

edited

Loading

zepumph commented Jun 30, 2022

zepumph commented Jun 30, 2022

samreid commented Jul 6, 2022

zepumph commented Jul 6, 2022

samreid commented Jul 7, 2022

samreid commented Jul 7, 2022 •

edited

Loading

samreid commented Jul 7, 2022 •

edited

Loading

samreid commented Jul 7, 2022 •

edited

Loading

samreid commented Jul 7, 2022 •

edited

Loading

samreid commented Jul 7, 2022 •

edited

Loading

samreid commented Jul 7, 2022

zepumph commented Jul 7, 2022

zepumph commented Jul 7, 2022

zepumph commented Jul 13, 2022

zepumph commented Nov 1, 2022

zepumph commented Nov 2, 2022

zepumph commented Nov 7, 2022

zepumph commented Nov 28, 2022

zepumph commented Nov 29, 2022

zepumph commented Dec 7, 2022

zepumph commented Apr 27, 2023

zepumph commented Apr 28, 2023

zepumph commented May 1, 2023

CT: Error: Failed to launch the browser process! #150

CT: Error: Failed to launch the browser process! #150

Comments

zepumph commented Jun 24, 2022 • edited Loading

zepumph commented Jun 24, 2022 • edited Loading

zepumph commented Jun 24, 2022

zepumph commented Jun 27, 2022 • edited Loading

KatieWoe commented Jun 29, 2022 • edited by zepumph Loading

zepumph commented Jun 30, 2022 • edited Loading

zepumph commented Jun 30, 2022

zepumph commented Jun 30, 2022

samreid commented Jul 6, 2022

zepumph commented Jul 6, 2022

samreid commented Jul 7, 2022

samreid commented Jul 7, 2022 • edited Loading

samreid commented Jul 7, 2022 • edited Loading

samreid commented Jul 7, 2022 • edited Loading

samreid commented Jul 7, 2022 • edited Loading

samreid commented Jul 7, 2022 • edited Loading

samreid commented Jul 7, 2022

zepumph commented Jul 7, 2022

zepumph commented Jul 7, 2022

zepumph commented Jul 13, 2022

zepumph commented Nov 1, 2022

zepumph commented Nov 2, 2022

zepumph commented Nov 7, 2022

zepumph commented Nov 28, 2022

zepumph commented Nov 29, 2022

zepumph commented Dec 7, 2022

zepumph commented Apr 27, 2023

zepumph commented Apr 28, 2023

zepumph commented May 1, 2023

zepumph commented Jun 24, 2022 •

edited

Loading

zepumph commented Jun 24, 2022 •

edited

Loading

zepumph commented Jun 27, 2022 •

edited

Loading

KatieWoe commented Jun 29, 2022 •

edited by zepumph

Loading

zepumph commented Jun 30, 2022 •

edited

Loading

samreid commented Jul 7, 2022 •

edited

Loading

samreid commented Jul 7, 2022 •

edited

Loading

samreid commented Jul 7, 2022 •

edited

Loading

samreid commented Jul 7, 2022 •

edited

Loading

samreid commented Jul 7, 2022 •

edited

Loading