CT: Error: Failed to launch the browser process! #150
Likely puppeteer/puppeteer#1981 is a byproduct of this error.
I was also able to reproduce this just by going onto bayes and building
I think @mattpen and I got to the bottom of this. The new puppeteer client seems to be orphaning chrome instances when it stops or crashes, and so this morning, when all pm2 processes were stopped, we found 575 chrome instances still running. I will work on better cleanup/error handling for the puppeteer client. Also noting that once the orphaned chrome instances were killed off, I was able to successfully build arithmetic in the same way while all pm2 processes were running.
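The cleanup/error-handling fix mentioned above isn't shown in this thread, but the general shape of it is to guarantee `browser.close()` runs on every exit path. A minimal sketch of that pattern, using a hypothetical `withBrowser()` wrapper and a mock in place of `puppeteer.launch()` so it is self-contained (the names here are illustrative, not the actual client code):

```javascript
// Mock stand-in for puppeteer.launch(); the real client would launch chrome here.
const launch = async () => {
  const state = { closed: false };
  return {
    close: async () => { state.closed = true; },
    state
  };
};

// Hypothetical wrapper: run a task with a browser, always closing it afterward.
async function withBrowser( task ) {
  const browser = await launch();
  try {
    return await task( browser );
  }
  finally {
    // Runs on success, thrown errors, and rejected promises alike,
    // so a crash in the task can't orphan the chrome process.
    await browser.close();
  }
}

// Usage: even though the task throws, close() still runs before the error propagates.
( async () => {
  let saved;
  try {
    await withBrowser( async browser => {
      saved = browser;
      throw new Error( 'simulated page crash' );
    } );
  }
  catch( e ) {
    console.log( `caught: ${e.message}, closed: ${saved.state.closed}` );
  }
} )();
```

Without a `finally` (or equivalent `process.on( 'exit' )` sweep), any error between launch and close leaves a live chrome behind, which matches the 575-instance pile-up described above.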
A number of repos are showing similar errors this morning (6/29/22).
This morning there were 5200 instances of chrome running. I should look into this! I was hoping to have time this afternoon.
https://stackoverflow.com/questions/62220867/puppeteer-chromium-instances-remain-active-in-the-background-after-browser-disc has something I'd like to try: instead of using the zygote (Chromium's internal process forker), let's have things forked via node. Worth a shot!
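The StackOverflow suggestion amounts to passing Chromium flags at launch time. The exact flags used on bayes aren't captured in this thread, so the following is only a sketch of what that change might look like, based on the linked answer:

```javascript
// Sketch (assumption based on the linked StackOverflow answer, not the exact
// flags used on bayes): disable Chromium's zygote process forker so child
// processes are forked directly, which some reports say avoids orphans.
const launchOptions = {
  args: [
    '--no-zygote',  // skip the zygote process forker
    '--no-sandbox'  // --no-zygote generally requires disabling the sandbox on Linux
  ]
};

// With puppeteer installed, this would be used as:
//   const browser = await puppeteer.launch( launchOptions );
console.log( launchOptions.args.join( ' ' ) );
```

Note the trade-off: `--no-sandbox` removes Chromium's process sandbox, which is usually only acceptable on a trusted CI host.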
This didn't seem to change anything. After 2 hours, there are double the number of chrome instances running. I'll have to keep looking.
I noticed that we are calling `const server = http.createServer( ( req, res ) => {`. We call
I just logged onto bayes and saw that there were 2000 instances of chrome running. Then I ran
Can you reproduce this locally, or only on bayes? Do we think it is related to pm2 or linux? Do you think I could reproduce the problem on mac with or without pm2? I'm investigating playwright in a separate issue but would like a way to test how it impacts this problem (even if I have to test on bayes directly).
I logged in to bayes as phet-admin and saw `pm2 list` like so:
I also saw:
After stopping continuous-server, I saw:
These numbers don't make sense, since the number of chromes went from 1579 to 1582. Then I stopped quick-server:
So that reduced chromes from 1582 to 784 and nodes from 35 to 33. It is still disconcerting that 784 chromes remain running. After restarting those pm2 services, and running
I also saw puppeteer/puppeteer#1825. I also saw https://askubuntu.com/questions/201303/what-is-a-defunct-process-and-why-doesnt-it-get-killed, which describes what a defunct process is and why it doesn't get killed. I used that procedure to find the PPID of the bad parent process, and I ran it. I checked that pm2 is still operational, but after 5 minutes, there are already 50 defunct chrome processes.
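The exact commands used above weren't captured in this thread, but the askubuntu procedure boils down to finding zombie (`Z`-state) processes and their parent PIDs. A generic sketch of that, assuming a Linux host like bayes:

```shell
# Hedged sketch (not the exact commands used on bayes): count defunct (zombie)
# processes and list the parent PID responsible for each one. Zombies can't be
# killed directly; restarting or killing the parent lets init reap them.

# Zombie processes show state 'Z' in ps output.
zombie_count=$( ps -eo stat= | grep -c '^Z' || true )
echo "defunct processes: ${zombie_count}"

# For each zombie, print its PID and parent PID.
ps -eo pid=,ppid=,stat=,comm= | awk '$3 ~ /^Z/ { print "zombie", $1, "parent", $2 }'
```

A `kill -9` on a zombie does nothing, because the process is already dead; only the parent calling `wait()` (or the parent itself exiting) clears the entry.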
During my review of quick server, I noticed two potential problems:
By the way, bayes has climbed to 204 chrome processes (as reported by
I stopped continuous-server and continuous-quick-server, pulled aqua and perennial in each, and restarted. The number of
UPDATE: Holding steady at 241 for quite a while.
The perennial commit seems like a bugfix, but it warrants a code review since it is used in many places, and one or more usage sites could have been relying on the bug somehow. I marked blocks-publication and ready-for-review for that part. Also, we are still around 300 chrome instances, and not many look like zombies, so there is probably still a problem of some sort.
This morning (10 hours later than preceding comment), I observed:
With around 600 marked as
Great fix! I was using this command to see how many chrome instances there were. We have to apply this fix to the continuous-client process too. I just did this. I bet that will solve the rest of the defunct guys.
After pulling/restarting continuous-client, I see no defunct chrome instances. There were 307 upon restart, and there are now 313. I believe this is solved, but we can check in tomorrow. Thanks so much @samreid for finding that bug!
We are still hovering around 330 chrome instances. Closing.
In general things were working well, but when there were errors outside the expected places for the webpage, `puppeteerLoad` still wasn't cleaning up. Thus, above I added even more cleanup. When looking at the error logs, I see the
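The actual `puppeteerLoad` code isn't shown here, but the failure mode described ("errors outside the expected places") suggests racing the page load against the page's error channels, with cleanup in a `finally`. A self-contained sketch of that idea, using a mock page object rather than a real puppeteer page (all names here are illustrative):

```javascript
// Minimal mock of a puppeteer-like page: supports on()/close() and lets the
// test fire events. A real implementation would use the actual page object.
function createMockPage() {
  const handlers = {};
  return {
    closed: false,
    on( event, cb ) { handlers[ event ] = cb; },
    emit( event, arg ) { handlers[ event ] && handlers[ event ]( arg ); },
    async close() { this.closed = true; }
  };
}

// Sketch (not the actual puppeteerLoad): race the load against error channels
// and a timeout, and close the page in finally so every exit path cleans up.
async function loadWithCleanup( page, loadPromise, timeoutMs ) {
  let timeoutId;
  try {
    return await Promise.race( [
      loadPromise,

      // Errors from "unexpected places" (uncaught page errors, crashes)
      // reject the race instead of being silently dropped.
      new Promise( ( _, reject ) => page.on( 'pageerror', reject ) ),

      new Promise( ( _, reject ) => {
        timeoutId = setTimeout( () => reject( new Error( 'timeout' ) ), timeoutMs );
      } )
    ] );
  }
  finally {
    clearTimeout( timeoutId );
    await page.close(); // cleanup happens on success, error, and timeout alike
  }
}

// Usage: an error fired outside the normal load path still triggers cleanup.
( async () => {
  const page = createMockPage();
  const neverLoads = new Promise( () => {} );
  const run = loadWithCleanup( page, neverLoads, 10000 );
  page.emit( 'pageerror', new Error( 'error outside the expected places' ) );
  try { await run; } catch( e ) { console.log( `caught: ${e.message}` ); }
  console.log( `page closed: ${page.closed}` );
} )();
```

The key design point is that no error channel is allowed to bypass the `finally`, since any bypassed path is a potential orphaned chrome.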
After the weekend we are at 112 chrome instances. Perhaps there is still a slow problem. After stopping all CT processes, there are still 25 zombies. I will keep tabs on these. |
Now there are 184 chrome instances. That is still growing. I need to continue to work here. |
@mattpen and I spoke about this yesterday. We feel like it is important to improve the logging in our puppeteer processes, and then from there figure out where the error case is coming from. I'll take the lead! |
I think that phetsims/perennial#291 could be helpful here, or at least something to note. |
We are running into this over in https://github.com/phetsims/special-ops/issues/234
Most likely our sparky case will be solved by #172. |
Ok, after some discussion with @mattpen: the above commit should handle the orphaned chrome instances from the build process, which are much rarer. I'm ready to close this issue, but I'll reopen if we need to track down any more orphaned chrome instances.
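The commit itself isn't reproduced in this thread, but one common way to handle orphans from a build step is to track every spawned chrome PID and sweep survivors when the node process exits. A hedged sketch of that pattern (illustrative names, not the actual aqua code):

```javascript
// Track PIDs of browsers we spawn so they can be reaped if we exit abnormally.
const activePids = new Set();

function register( pid ) { activePids.add( pid ); }
function unregister( pid ) { activePids.delete( pid ); }

// Kill anything still registered. killFn is injectable for testing; the
// default sends SIGKILL via process.kill.
function killSurvivors( killFn = pid => process.kill( pid, 'SIGKILL' ) ) {
  for ( const pid of activePids ) {
    try { killFn( pid ); }
    catch( e ) { /* process already gone; nothing to do */ }
    activePids.delete( pid );
  }
}

// Sweep on normal exit, and route fatal signals through exit so the sweep runs.
process.on( 'exit', () => killSurvivors() );
process.on( 'SIGINT', () => process.exit( 1 ) );
process.on( 'SIGTERM', () => process.exit( 1 ) );
```

In normal operation `unregister` is called right after `browser.close()`, so the exit-time sweep only touches browsers that an error path left behind.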
I think this has something to do with bayes and the number of puppeteer instances on it, but I'm not really sure. I'll have to take a look.
This looks to be exactly the same as puppeteer/puppeteer#6757 (except docker). I'll see if there is something I need to do better about closing it down.