-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Return detached threads to the pool #8286
Conversation
Once detached threads are finished their execution they emit the 'exit' command. Instead of a noop they should rejoin the pool. I don't know if this change has any other consequences.
Well tests are failing because joinable threads are calling exit, which cleans them up. Then the join fails. As far as I can tell there is no standard way to tell if a thread is joinable from within that thread, so I think that rules out posting different messages (unless we set a flag ourselves). The alternative would be to return success on joining a thread that doesn't exist (because it's been disposed of). This doesn't seem to go against the free pthreads spec I could find, but is different behaviour from Linux pthread_join, which returns ESRCH in this situation. |
So seems like my approach does the trick. |
@juj (or anyone else) Are you able to review this? |
It looks like we mark a thread as detached if we join it or if it calls detach, is that right? |
Plus some minor refactoring.
It was a while since I did this initially, but if I remember right joined threads are marked as detached after joining, but if you try and join a detached thread it will return an error. I've since refactored things as per @sbc100's suggestion, and I think it's now a bit clearer. |
For reference this is how I tested this:
Compile Clicking the canvas will spawn a detached thread. Pre-PR once the thread was finished it would still exist, and a new thread would get created each time. Post-PR threads will get re-used if they are available, and if no threads are free a new one will be spawned. You can observe this by watching the number of threads listed in the browsers dev tools. |
Thanks, yeah, I think it's clearer now. The code lgtm. Please add a test, though (maybe that code right there, with small modifications)? Can be alongside the other |
Btw, the test failures look unrelated - merging in latest incoming should fix them, might be an old problem. |
…om the PThread.pthreads object
I've got a test working (still need to figure out how to integrate it though), but I did discover another issue. If you create a bunch of threads take a look at Edit: Test is added. |
May need to adjust timings
@VirtualTim sorry for the delay here. Code looks good to me. The test, however looks like it might be racy - the dependence on timing is risky, we may end up with random failures because of it. I think it would be better to rewrite it in a way that does not depend on time measurements (instead, can communicate with mutexes etc. to tell the threads what to do at each time etc.). |
Yeah you're right, it does have the potential to be racy. However I did test this locally with 0.1s intervals instead of 0.5s, and it worked fine. I figured that 5x the wait would make failure due to a race condition very unlikely. Plus it runs with 20 threads, so the chance that even one wouldn't finish seems very unlikely. I guess my motivation for doing it this was way trying to emulate a real-world scenario as closely possible. So just to check if I'm understanding you correctly, are you suggesting putting a mutex around the lambda inside |
Well, the mutex comment was just a way to try to make it deterministic - there might be better ways. Like perhaps the threads can communicate using a shared atomic. But the key thing is to not depend on timing for their decisions. Sorry about this, but we've had many tests that depend on timing and that seem stable, but eventually become random errors because of a change on the browser side. So it's really important to avoid as much nondeterminism as possible. |
setTimeout(() => { _spawn_a_thread(); }, i*500); | ||
} | ||
|
||
setTimeout(() => { _count_threads(i_max); }, i_max*500); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a clever way to test, though can we make this test much shorter? A 10 second long test is quite long in comparison to most tests in the suite, so these kind of things add up the overall suite runtime. Perhaps this can be tested e.g. by spawning one thread detached, then exiting it, and then when that is confirmed to have exited (e.g. via an atomic var), wait for 500 msecs for good measure, and then run a second thread, which should be run in the context of the same worker that ran the first thread. That way the test should finish in less than a second, and if the test needs to be duplicated in multiple modes in the future (PROXY_TO_PTHREAD & asm.js vs wasm & different opts modes are common different cases that often are needed to test), that will not multiply the overall test suite run by that much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Redid this.
Yeah you were right, a 10s test is far too long. It should now take about 1/2s. I didn't use an atomic, because I figured there was going to be some time between a thread going out of scope and being returned to the pool.
It's not completely race free, but I'm performing 5 100ms checks on the thread status, so if even one thread hasn't returned by then something has probably gone wrong.
Thanks for the PR! Indeed workers hosting detached threads were not properly returning to the worker pool. The fix looks good, only minor comments. |
1. Don't pass around threadId 2. Rename returnThreadToPool to returnWorkerToPool
If you build with It looks like the failures here are related to |
Oh whoops, missed the Edit: looks like the issues was introduced by adding this line https://github.com/emscripten-core/emscripten/pull/8286/files#diff-db41bea94577c2dd9b0eef0308b06cf9R243. This was added because the thread was never removed from PThread.pthreads. However this change revealed an issue with joining threads. Hope that explanation makes sense? |
Missed this change that was supposed to be part of the last changelist.
OK, so looks like that last change broke a bunch of tests on Firefox. I think it might be better to revert the change on this line: https://github.com/emscripten-core/emscripten/pull/8286/files#diff-db41bea94577c2dd9b0eef0308b06cf9R243 |
This wasn't introduced by this PR, and can be fixed by a subsequent one. Basically PThread.pthreads seems to not remove references to threads after that are exited. Once this PR is merged I'll open another one to address this issue.
OK, investigating the failing test reviled a weird issue. The The other note is that the |
…thread has exited worker.
Ugh, I tried to run some tests with CircleCI, but turns out that doing so hijacked the tests on this page. is someone (@kripken?) re-run the tests on your CI system for me? |
Oddly I can't rerun tests here - it says I don't have write permissions. It's using the permissions for your fork, I guess, and not the upstream repo? As a workaround, can merge incoming into here. |
OK, going through the test failures:
So it's a bit odd that some tests (4) work for me, but hang on the build machine. I'll double check my commits, perhaps I missed something. So what should we do with these tests? The easy fix would be to add |
I don't understand why this change needs those tests to have My understanding is that its possible to do thread creation and joining on the main thread if you have |
|
||
//Check if a worker is free every threads_to_spawn*100 ms, or until max_thread_check is exceeded | ||
const SpawnMoreThreads = setInterval(() => { | ||
if (PThread.unusedWorkerPool.length > 0) { //Spawn a thread if a worker is available |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be better to write this using just the pthreads API, and not look at PThread.unusedWorkerPool
and other internal details, since this might change in future refactorings. but if it's much easier to write it this way then it's fine for now I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'm not sure this can be done from the pthreads API. I mean, you can't really use the API to test the API, right?
I realise that this will likely change once we get native WASM threads, but I think by then we'll need to redo a lot of the pthreads tests anyway.
Is the discussion of tests here about general improvements, or necessary changes for this PR? The tests all look green now here, so I think this PR can land? |
So I wrote up some big explanation about everything, but then realised that it's not actually an issue, since on Anyway, for a quick overview (using
I guess I'd like to change this, but I don't really think there is a good way to to it using web workers. I think we're better off waiting for native WASM threads. |
@kripken Yeah I'm happy for this to land. Turns out that this was more complicated than I first thought (which was probably why it wasn't implemented in the first place), but I think everything's as good as it's going to get. |
Great, thanks, merging. |
Once detached threads are finished their execution they emit the 'exit' command. Instead of a noop they should rejoin the pool. Resolves emscripten-core#8201.
This reverts commit eb37140.
This reverts commit eb37140.
Once detached threads are finished their execution they emit the 'exit' command. Instead of a noop they should rejoin the pool. Resolves emscripten-core#8201.
Once detached threads are finished their execution they emit the 'exit' command. Instead of a noop they should rejoin the pool.
I don't know if this change has any other consequences.
Resolves #8201.