
Return detached threads to the pool #8286

Merged
kripken merged 19 commits into emscripten-core:incoming from VirtualTim:patch-2 on May 20, 2019

Conversation

VirtualTim
Collaborator

Once detached threads have finished their execution they emit the 'exit' command. Instead of a no-op they should rejoin the pool.
I don't know if this change has any other consequences.

Resolves #8201.
@VirtualTim
Collaborator Author

VirtualTim commented Mar 12, 2019

Well, tests are failing because joinable threads are calling exit, which cleans them up. Then the join fails.

As far as I can tell there is no standard way to tell if a thread is joinable from within that thread, so I think that rules out posting different messages (unless we set a flag ourselves).
Edit: it turns out we do set a flag: Atomics.load(HEAPU32, (thread + {{{ C_STRUCTS.pthread.detached }}} ) >> 2), and that's used to check whether a thread is joinable. I've gone ahead and adjusted the PR to use this. Still not sure if this is the correct/best approach, but we'll see if the tests pass.

The alternative would be to return success when joining a thread that doesn't exist (because it's been disposed of). This doesn't seem to go against the freely available pthreads spec I could find, but it is different behaviour from Linux pthread_join, which returns ESRCH in this situation.
I don't know how this would affect std::thread.
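
For illustration, here is roughly what that Linux behaviour looks like (a hypothetical snippet, not part of this PR; POSIX technically leaves a second join undefined, but Linux reports ESRCH as described above):

#include <pthread.h>
#include <errno.h>
#include <stdio.h>

static void* work(void*) { return nullptr; }

int main() {
	pthread_t t;
	pthread_create(&t, nullptr, work, nullptr);
	pthread_join(t, nullptr);           // first join succeeds, thread is reaped
	int rc = pthread_join(t, nullptr);  // the thread no longer exists
	if (rc == ESRCH)
		printf("second join returned ESRCH (no such thread)\n");
}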

@VirtualTim
Collaborator Author

So it seems like my approach does the trick.
@juj does this seem reasonable to you? You seem like the expert on this sort of stuff.
@benjymous if you manually apply my patch to library_pthread.js, does this fix your issue?

@VirtualTim
Collaborator Author

@juj (or anyone else) Are you able to review this?

@kripken
Member

kripken commented Apr 16, 2019

It looks like we mark a thread as detached if we join it or if it calls detach, is that right?

src/library_pthread.js (outdated review thread, resolved)
@VirtualTim
Collaborator Author

It looks like we mark a thread as detached if we join it or if it calls detach, is that right?

It's been a while since I did this initially, but if I remember right, joined threads are marked as detached after joining, whereas if you try to join a detached thread it will return an error.

I've since refactored things as per @sbc100's suggestion, and I think it's now a bit clearer.

@VirtualTim
Collaborator Author

For reference, this is how I tested this:

#include <thread>
#include <iostream>
#include <math.h>
#include <stdlib.h>   // rand()
#include <emscripten/html5.h>

void some_function(const EmscriptenMouseEvent* e) {
	std::cout << "Thread ID: " << std::this_thread::get_id() << std::endl;
	double d = 0;				//initialise the accumulator
	for (int i=0; i<100000000; i++)		//simulate work
		d += (i%2 ? sqrt((int)(100*rand())) : (-1)*sqrt((int)(100*rand())));
	std::cout << "    Finished work:" << d << std::endl;
}

EM_BOOL click_callback(int eventType, const EmscriptenMouseEvent* e, void* userdata) {
	std::thread(some_function, e).detach();	//spawn a detached thread per click
	return EM_TRUE;
}

int main(int argc, char** argv) {
	const char* canvasID = "#canvas";
	emscripten_set_mousedown_callback(canvasID, 0, EM_TRUE, click_callback);
}

Compile with: emcc test.cpp -std=c++11 -s USE_PTHREADS=1 -s NO_EXIT_RUNTIME=1 -s TOTAL_MEMORY=838860800 -o test.html

Clicking the canvas will spawn a detached thread. Pre-PR, once the thread was finished it would still exist, and a new thread would get created each time. Post-PR, threads will get re-used if they are available, and if no threads are free a new one will be spawned. You can observe this by watching the number of threads listed in the browser's dev tools.

@kripken
Member

kripken commented Apr 18, 2019

Thanks, yeah, I think it's clearer now. The code lgtm. Please add a test, though (maybe that code right there, with small modifications)? It can go alongside the other browser.test_pthread_* tests - see those for examples (in tests/test_browser.py). Instead of watching the browser devtools for the count, the test can iterate, say, 1,000 times - almost certainly browsers will not allow that many actual threads, so we must be reusing them.
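
A rough sketch of the kind of test kripken suggests (names are hypothetical, and it assumes -s PROXY_TO_PTHREAD=1 so that main() is not running on the browser main thread and can safely wait):

#include <thread>
#include <atomic>

std::atomic<int> finished{0};

void work() { finished++; }

int main() {
	// Browsers will not allow anywhere near 1000 real workers, so if all
	// of these detached threads get to run, workers must be being reused.
	for (int i = 0; i < 1000; i++) {
		std::thread(work).detach();
		while (finished.load() <= i)
			std::this_thread::yield();  // wait for thread i before spawning the next
	}
	return 0;
}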

@kripken
Member

kripken commented Apr 18, 2019

Btw, the test failures look unrelated - merging in the latest incoming should fix them; it might be an old problem.

@VirtualTim
Collaborator Author

VirtualTim commented Apr 24, 2019

I've got a test working (still need to figure out how to integrate it, though), but I did discover another issue. If you create a bunch of threads and take a look at PThread.pthreads, you'll see this object keeps growing, and contains a reference to every thread created. It doesn't look like those references are ever deleted, so I've added that in.

Edit: Test is added.

@kripken
Member

kripken commented May 1, 2019

@VirtualTim sorry for the delay here. Code looks good to me. The test, however, looks like it might be racy - the dependence on timing is risky; we may end up with random failures because of it. I think it would be better to rewrite it in a way that does not depend on time measurements (instead, it can communicate with mutexes etc. to tell the threads what to do at each step).

@VirtualTim
Collaborator Author

Yeah, you're right, it does have the potential to be racy. However, I did test this locally with 0.1s intervals instead of 0.5s, and it worked fine. I figured that 5x the wait would make failure due to a race condition very unlikely. Plus it runs with 20 threads, so the chance that even one wouldn't finish seems very small. I guess my motivation for doing it this way was trying to emulate a real-world scenario as closely as possible.

So just to check if I'm understanding you correctly, are you suggesting putting a mutex around the lambda inside spawn_a_thread, to basically force the threads to run sequentially?

@kripken
Member

kripken commented May 2, 2019

Well, the mutex comment was just a way to try to make it deterministic - there might be better ways. Like perhaps the threads can communicate using a shared atomic. But the key thing is to not depend on timing for their decisions.

Sorry about this, but we've had many tests that depend on timing and that seem stable, but eventually become random errors because of a change on the browser side. So it's really important to avoid as much nondeterminism as possible.
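
For example, the timing-free coordination could look something like this (an illustrative sketch, not the final test; assumes the test runs with -s PROXY_TO_PTHREAD=1 so that joining is safe):

#include <thread>
#include <atomic>

std::atomic<bool> go{false};
std::atomic<int> done{0};

void worker() {
	while (!go.load())
		std::this_thread::yield();  // wait for an explicit signal, not a timer
	done.fetch_add(1);              // report completion deterministically
}

int main() {
	std::thread t(worker);
	go.store(true);  // tell the thread to proceed; no setTimeout guessing
	t.join();        // when this returns, done is exactly 1
	return done.load() == 1 ? 0 : 1;
}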

setTimeout(() => { _spawn_a_thread(); }, i*500);
}

setTimeout(() => { _count_threads(i_max); }, i_max*500);
Collaborator


This is a clever way to test, though can we make this test much shorter? A 10-second test is quite long in comparison to most tests in the suite, and these kinds of things add up in the overall suite runtime. Perhaps this can be tested e.g. by spawning one thread detached, then exiting it, and then when that is confirmed to have exited (e.g. via an atomic var), waiting 500 msecs for good measure, and then running a second thread, which should run in the context of the same worker that ran the first thread. That way the test should finish in less than a second, and if the test needs to be duplicated in multiple modes in the future (PROXY_TO_PTHREAD & asm.js vs wasm & different opts modes are common cases that often need testing), it will not multiply the overall test suite runtime by much.
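
In code, the shape juj describes might look something like this (a hypothetical sketch; it assumes -s PROXY_TO_PTHREAD=1 so the main thread may block, and the final "same worker" check still needs the JS internals discussed below):

#include <thread>
#include <atomic>
#include <emscripten/threading.h>

std::atomic<bool> first_done{false};

int main() {
	std::thread([]{ first_done = true; }).detach();
	while (!first_done.load())
		std::this_thread::yield();  // the first thread's body has finished
	emscripten_thread_sleep(500);   // give its worker time to return to the pool
	std::thread([]{ /* should run on the recycled worker */ }).detach();
	return 0;
}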

Collaborator Author

VirtualTim commented May 7, 2019


Redid this.
Yeah, you were right, a 10s test is far too long. It should now take about 0.5s. I didn't use an atomic, because I figured there would be some time between a thread going out of scope and being returned to the pool.
It's not completely race free, but I'm performing five 100 ms checks on the thread status, so if even one thread hasn't returned by then, something has probably gone wrong.

src/library_pthread.js (4 outdated review threads, resolved)
@juj
Collaborator

juj commented May 3, 2019

Thanks for the PR! Indeed workers hosting detached threads were not properly returning to the worker pool. The fix looks good, only minor comments.

1. Don't pass around threadId
2. Rename returnThreadToPool to returnWorkerToPool
@juj
Collaborator

juj commented May 8, 2019

pthread_join() on the main browser thread is not going to work. Polling for completion with pthread_tryjoin_np() (a GNU nonportable extension) could be implemented; looking at it, it does actually exist in musl, though I'm not sure if it works - it has not been stressed before.

If you build with -s PROXY_TO_PTHREAD=1, then pthread_join() works in the application main thread, which is then proxied to run in a web worker.

It looks like the failures here are related to browser.test_pthread_global_data_initialization. Those tests are run with -s PROXY_TO_PTHREAD=1, so they should be safe to do pthread_join(). If this PR regresses that test, that looks like something important, though I can't spot what the cause of the failure would be off the top of my head. Can you give that a closer look? If it looks like something sketchy, I can try to find time to look in detail, but I'm unable to do so atm.
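
As a sketch, the polling approach juj mentions would look roughly like this (illustrative only, since whether pthread_tryjoin_np actually works in Emscripten's musl is exactly what is in question here):

#define _GNU_SOURCE
#include <pthread.h>
#include <errno.h>

// Poll instead of blocking the browser main thread in pthread_join():
// returns 1 once the thread has been reaped, 0 while it is still running,
// and -1 on error (e.g. ESRCH or EINVAL).
int try_reap(pthread_t t) {
	int rc = pthread_tryjoin_np(t, NULL);
	if (rc == EBUSY)
		return 0;  // not finished yet; poll again on a later tick
	return rc == 0 ? 1 : -1;
}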

@VirtualTim
Collaborator Author

VirtualTim commented May 9, 2019

Oh whoops, missed the -s PROXY_TO_PTHREAD=1 flag. No wonder I was having issues getting the test to run. I'll look into it today.

Edit: looks like the issue was introduced by adding this line: https://github.com/emscripten-core/emscripten/pull/8286/files#diff-db41bea94577c2dd9b0eef0308b06cf9R243. This was added because the thread was never removed from PThread.pthreads. However, this change revealed an issue with joining threads.
When the thread is finished it calls threadExit, cleaning itself up. When a thread is joined it calls _pthread_join, which calls cleanupThread, which cleans it up again.
So the same thread is disposed of twice.

Hope that explanation makes sense?
I'll attach what I think the fix is here.

VirtualTim added 2 commits May 9, 2019 16:03
Missed this change that was supposed to be part of the last changelist.
@VirtualTim
Collaborator Author

OK, so it looks like that last change broke a bunch of tests on Firefox. I think it might be better to revert the change on this line: https://github.com/emscripten-core/emscripten/pull/8286/files#diff-db41bea94577c2dd9b0eef0308b06cf9R243
and open another PR to fix the leak it was trying to address.

This wasn't introduced by this PR, and can be fixed by a subsequent one.
Basically, PThread.pthreads seems to not remove references to threads after they have exited. Once this PR is merged I'll open another one to address this issue.
@VirtualTim
Collaborator Author

OK, investigating the failing test revealed a weird issue. The threadExit function executes on the worker and sets the exit status. However, for the thread to actually exit it has to post the 'exit' command to the main thread. So there can be unlucky circumstances where the thread marks itself as exited but hasn't actually exited. I think I know how to fix that.

The other note is that the test_pthread_mutex.cpp test joins on the main thread, so it will be pretty racy. I'm surprised I haven't seen it fail more. Unless I'm missing something, I think it should be built with -s PROXY_TO_PTHREAD=1. But I think that change should be made in a separate PR.

@VirtualTim
Collaborator Author

Ugh, I tried to run some tests with CircleCI, but it turns out that doing so hijacked the tests on this page. Can someone (@kripken?) re-run the tests on your CI system for me?
Thanks.

@kripken
Member

kripken commented May 15, 2019

Oddly I can't rerun tests here - it says I don't have write permissions. It's using the permissions for your fork, I guess, and not the upstream repo?

As a workaround, you can merge incoming into here.

@VirtualTim
Collaborator Author

OK, going through the test failures:

Test | Result
test_glgears_proxy_jstarget | Firefox-specific test. I didn't test this one.
test_pthread_64bit_atomics | Joins on the main thread. Should be compiled with -s PROXY_TO_PTHREAD=1.
test_pthread_64bit_cxx11_atomics | Tried compiling with all the options used for the test in both Firefox and Chrome; I couldn't get this to fail.
test_pthread_atomics | Joins on the main thread. Should be compiled with -s PROXY_TO_PTHREAD=1. I didn't test with modularize, but I assume it's the same.
test_pthread_barrier | Joins on the main thread.
test_pthread_call_async_on_main_thread | Tried compiling with all the options used for the test in both Firefox and Chrome; I couldn't get this to fail.
test_pthread_call_sync_on_main_thread | Bad tests. I've opened #8621 to fix them.
test_pthread_cancel | Worked for me on both Firefox and Chrome.
test_pthread_cleanup | Joins on the main thread. Works with -s PROXY_TO_PTHREAD=1.
test_pthread_clock_drift | Worked for me on both Firefox and Chrome.
test_pthread_condition_variable | Joins on the main thread.

So it's a bit odd that some tests (4) work for me but hang on the build machine. I'll double-check my commits; perhaps I missed something.
Also, a number of tests call pthread_join on the main thread, which was always a bad idea, since it was very flaky. The changes I made remove that flakiness by setting the exited flag when the thread is "more exited", but it basically means that joining a thread on the main browser thread will always hang, instead of often hanging. Even though this could break some behaviour, it's more consistent, and that behaviour would have been very fragile anyway.
I could go into some more technical details, but a proper explanation would take a few paragraphs.

So what should we do with these tests? The easy fix would be to add -s PROXY_TO_PTHREAD=1 to the 5 that don't have it, and then remove that setting when we get native WASM threads.
Thoughts?

@sbc100
Collaborator

sbc100 commented May 16, 2019

I don't understand why this change needs those tests to have PROXY_TO_PTHREAD when before they didn't.

My understanding is that it's possible to do thread creation and joining on the main thread if you have PROXY_TO_PTHREAD enabled, or if you pre-create your threads using PTHREAD_POOL_SIZE. Does this change prevent the latter from working now? (I'm not saying that would necessarily block this change; I just want to get clear on what is changing.)


//Check if a worker is free every threads_to_spawn*100 ms, or until max_thread_check is exceeded
const SpawnMoreThreads = setInterval(() => {
if (PThread.unusedWorkerPool.length > 0) { //Spawn a thread if a worker is available
Member


It would be better to write this using just the pthreads API, and not look at PThread.unusedWorkerPool and other internal details, since these might change in future refactorings. But if it's much easier to write it this way then it's fine for now, I think.

Collaborator Author


Yeah I'm not sure this can be done from the pthreads API. I mean, you can't really use the API to test the API, right?

I realise that this will likely change once we get native WASM threads, but I think by then we'll need to redo a lot of the pthreads tests anyway.

@kripken
Member

kripken commented May 17, 2019

Is the discussion of tests here about general improvements, or necessary changes for this PR? The tests all look green now here, so I think this PR can land?

@VirtualTim
Collaborator Author

My understanding is that its possible to do thread creation and joining on the main thread if you have PROXY_TO_PTHREAD enabled or if you pre-create your threads using PTHREAD_POOL_SIZE.

PROXY_TO_PTHREAD should always work, since the join actually happens on a worker, not the main thread. My guess is that PTHREAD_POOL_SIZE helps hide the raciness, since it's much quicker to pull a thread off the pool than to create a new one.

So I wrote up a big explanation about everything, but then realised that it's not actually an issue, since on joining, the 'exit' command does nothing.

Anyway, for a quick overview (using PROXY_TO_PTHREAD=0, join on main thread):

  1. The thread has joined, but 'exit' hasn't been executed.
    1.1 Currently 'exit' is a no-op on joining threads, so this is "OK". Conceptually I don't really like it, but I don't think I can fix it without breaking more stuff.
  2. If the thread needs to execute something on the main thread, it can't until the join has finished.
    2.1 This is why joining on the main thread is super risky.
    2.2 This "works" if the main thread receives the message before the join command, but it's really racy.

I guess I'd like to change this, but I don't really think there is a good way to do it using web workers. I think we're better off waiting for native WASM threads.

@VirtualTim
Collaborator Author

@kripken Yeah I'm happy for this to land. Turns out that this was more complicated than I first thought (which was probably why it wasn't implemented in the first place), but I think everything's as good as it's going to get.
There's still some weirdness using workers to emulate threads, but I don't think this is really something that can be fixed, and one day we should get native WASM threads anyway.

@kripken
Member

kripken commented May 20, 2019

Great, thanks, merging.

@kripken kripken merged commit bb2428b into emscripten-core:incoming May 20, 2019
@VirtualTim VirtualTim deleted the patch-2 branch May 21, 2019 01:22
VirtualTim added a commit to VirtualTim/emscripten that referenced this pull request May 21, 2019
Once detached threads are finished their execution they emit the 'exit' command. Instead of a noop they should rejoin the pool.

Resolves emscripten-core#8201.
VirtualTim added a commit to VirtualTim/emscripten that referenced this pull request May 23, 2019
VirtualTim added a commit to VirtualTim/emscripten that referenced this pull request May 23, 2019
belraquib pushed a commit to belraquib/emscripten that referenced this pull request Dec 23, 2020
Once detached threads are finished their execution they emit the 'exit' command. Instead of a noop they should rejoin the pool.

Resolves emscripten-core#8201.
Successfully merging this pull request may close these issues.

Detached threads don't get disposed of
4 participants