
Return detached threads to the pool #8286

Merged
kripken merged 19 commits into emscripten-core:incoming from VirtualTim:patch-2 on May 20, 2019

Conversation

VirtualTim
Collaborator

Once detached threads have finished their execution they emit the 'exit' command. Instead of a no-op they should rejoin the pool.
I don't know if this change has any other consequences.

Resolves #8201.
@VirtualTim
Collaborator Author

VirtualTim commented Mar 12, 2019

Well, tests are failing because joinable threads are calling exit, which cleans them up. Then the join fails.

As far as I can tell there is no standard way to tell if a thread is joinable from within that thread, so I think that rules out posting different messages (unless we set a flag ourselves).
Edit: it turns out we do set a flag: Atomics.load(HEAPU32, (thread + {{{ C_STRUCTS.pthread.detached }}} ) >> 2), and that's used to check whether a thread is joinable. I've gone ahead and adjusted the PR to use this. Still not sure if this is the correct/best approach, but we'll see if the tests pass.

The alternative would be to return success when joining a thread that doesn't exist (because it's been disposed of). This doesn't seem to go against the freely available pthreads spec I could find, but it is different behaviour from Linux pthread_join, which returns ESRCH in this situation.
I don't know how this would affect std::thread.
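
For illustration, here is roughly what that Linux behaviour looks like (a hypothetical snippet, not part of this PR; POSIX technically leaves a second join undefined, but Linux reports ESRCH as described above):

#include <pthread.h>
#include <errno.h>
#include <stdio.h>

static void* work(void*) { return nullptr; }

int main() {
	pthread_t t;
	pthread_create(&t, nullptr, work, nullptr);
	pthread_join(t, nullptr);           // first join succeeds, thread is reaped
	int rc = pthread_join(t, nullptr);  // the thread no longer exists
	if (rc == ESRCH)
		printf("second join returned ESRCH (no such thread)\n");
}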

@VirtualTim
Collaborator Author

So it seems like my approach does the trick.
@juj does this seem reasonable to you? You seem like the expert on this sort of stuff.
@benjymous if you manually apply my patch to library_pthread.js, does this fix your issue?

@VirtualTim
Collaborator Author

@juj (or anyone else) Are you able to review this?

@kripken
Member

kripken commented Apr 16, 2019

It looks like we mark a thread as detached if we join it or if it calls detach, is that right?

src/library_pthread.js (outdated review thread, resolved)
@VirtualTim
Collaborator Author

It looks like we mark a thread as detached if we join it or if it calls detach, is that right?

It's been a while since I did this initially, but if I remember right, joined threads are marked as detached after joining, whereas if you try to join a detached thread it will return an error.

I've since refactored things as per @sbc100's suggestion, and I think it's now a bit clearer.

@VirtualTim
Collaborator Author

For reference, this is how I tested this:

#include <thread>
#include <iostream>
#include <math.h>
#include <stdlib.h>   // rand()
#include <emscripten/html5.h>

void some_function(const EmscriptenMouseEvent* e) {
	std::cout << "Thread ID: " << std::this_thread::get_id() << std::endl;
	double d = 0;				//initialise the accumulator
	for (int i=0; i<100000000; i++)		//simulate work
		d += (i%2 ? sqrt((int)(100*rand())) : (-1)*sqrt((int)(100*rand())));
	std::cout << "    Finished work:" << d << std::endl;
}

EM_BOOL click_callback(int eventType, const EmscriptenMouseEvent* e, void* userdata) {
	std::thread(some_function, e).detach();	//spawn a detached thread per click
	return EM_TRUE;
}

int main(int argc, char** argv) {
	const char* canvasID = "#canvas";
	emscripten_set_mousedown_callback(canvasID, 0, EM_TRUE, click_callback);
}

Compile with: emcc test.cpp -std=c++11 -s USE_PTHREADS=1 -s NO_EXIT_RUNTIME=1 -s TOTAL_MEMORY=838860800 -o test.html

Clicking the canvas will spawn a detached thread. Pre-PR, once the thread was finished it would still exist, and a new thread would get created each time. Post-PR, threads will get re-used if they are available, and if no threads are free a new one will be spawned. You can observe this by watching the number of threads listed in the browser's dev tools.

@kripken
Member

kripken commented Apr 18, 2019

Thanks, yeah, I think it's clearer now. The code lgtm. Please add a test, though (maybe that code right there, with small modifications)? It can go alongside the other browser.test_pthread_* tests - see those for examples (in tests/test_browser.py). Instead of watching the browser devtools for the count, the test can iterate, say, 1,000 times - almost certainly browsers will not allow that many actual threads, so we must be reusing them.
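
A rough sketch of the kind of test kripken suggests (names are hypothetical, and it assumes -s PROXY_TO_PTHREAD=1 so that main() is not running on the browser main thread and can safely wait):

#include <thread>
#include <atomic>

std::atomic<int> finished{0};

void work() { finished++; }

int main() {
	// Browsers will not allow anywhere near 1000 real workers, so if all
	// of these detached threads get to run, workers must be being reused.
	for (int i = 0; i < 1000; i++) {
		std::thread(work).detach();
		while (finished.load() <= i)
			std::this_thread::yield();  // wait for thread i before spawning the next
	}
	return 0;
}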

@kripken
Member

kripken commented Apr 18, 2019

Btw, the test failures look unrelated - merging in the latest incoming should fix them; it might be an old problem.

@VirtualTim
Collaborator Author

VirtualTim commented Apr 24, 2019

I've got a test working (still need to figure out how to integrate it, though), but I did discover another issue. If you create a bunch of threads and take a look at PThread.pthreads, you'll see this object keeps growing, and contains a reference to every thread created. It doesn't look like those references are ever deleted, so I've added that in.

Edit: Test is added.

@kripken
Member

kripken commented May 1, 2019

@VirtualTim sorry for the delay here. Code looks good to me. The test, however, looks like it might be racy - the dependence on timing is risky; we may end up with random failures because of it. I think it would be better to rewrite it in a way that does not depend on time measurements (instead, it can communicate with mutexes etc. to tell the threads what to do at each step).

@VirtualTim
Collaborator Author

Yeah, you're right, it does have the potential to be racy. However, I did test this locally with 0.1s intervals instead of 0.5s, and it worked fine. I figured that 5x the wait would make failure due to a race condition very unlikely. Plus it runs with 20 threads, so the chance that even one wouldn't finish seems very small. I guess my motivation for doing it this way was trying to emulate a real-world scenario as closely as possible.

So just to check if I'm understanding you correctly, are you suggesting putting a mutex around the lambda inside spawn_a_thread, to basically force the threads to run sequentially?

@kripken
Member

kripken commented May 2, 2019

Well, the mutex comment was just a way to try to make it deterministic - there might be better ways. Like perhaps the threads can communicate using a shared atomic. But the key thing is to not depend on timing for their decisions.

Sorry about this, but we've had many tests that depend on timing and that seem stable, but eventually become random errors because of a change on the browser side. So it's really important to avoid as much nondeterminism as possible.
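
For example, the timing-free coordination could look something like this (an illustrative sketch, not the final test; assumes the test runs with -s PROXY_TO_PTHREAD=1 so that joining is safe):

#include <thread>
#include <atomic>

std::atomic<bool> go{false};
std::atomic<int> done{0};

void worker() {
	while (!go.load())
		std::this_thread::yield();  // wait for an explicit signal, not a timer
	done.fetch_add(1);              // report completion deterministically
}

int main() {
	std::thread t(worker);
	go.store(true);  // tell the thread to proceed; no setTimeout guessing
	t.join();        // when this returns, done is exactly 1
	return done.load() == 1 ? 0 : 1;
}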

setTimeout(() => { _spawn_a_thread(); }, i*500);
}

setTimeout(() => { _count_threads(i_max); }, i_max*500);
Collaborator


This is a clever way to test, though can we make this test much shorter? A 10-second test is quite long in comparison to most tests in the suite, and these kinds of things add up in the overall suite runtime. Perhaps this can be tested e.g. by spawning one thread detached, then exiting it, and then when that is confirmed to have exited (e.g. via an atomic var), waiting 500 msecs for good measure, and then running a second thread, which should run in the context of the same worker that ran the first thread. That way the test should finish in less than a second, and if the test needs to be duplicated in multiple modes in the future (PROXY_TO_PTHREAD & asm.js vs wasm & different opts modes are common cases that often need testing), it will not multiply the overall test suite runtime by much.
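
In code, the shape juj describes might look something like this (a hypothetical sketch; it assumes -s PROXY_TO_PTHREAD=1 so the main thread may block, and the final "same worker" check still needs the JS internals discussed below):

#include <thread>
#include <atomic>
#include <emscripten/threading.h>

std::atomic<bool> first_done{false};

int main() {
	std::thread([]{ first_done = true; }).detach();
	while (!first_done.load())
		std::this_thread::yield();  // the first thread's body has finished
	emscripten_thread_sleep(500);   // give its worker time to return to the pool
	std::thread([]{ /* should run on the recycled worker */ }).detach();
	return 0;
}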

Collaborator Author

VirtualTim commented May 7, 2019


Redid this.
Yeah, you were right, a 10s test is far too long. It should now take about 0.5s. I didn't use an atomic, because I figured there would be some time between a thread going out of scope and being returned to the pool.
It's not completely race free, but I'm performing five 100 ms checks on the thread status, so if even one thread hasn't returned by then, something has probably gone wrong.

src/library_pthread.js (4 outdated review threads, resolved)
@juj
Collaborator

juj commented May 3, 2019

Thanks for the PR! Indeed workers hosting detached threads were not properly returning to the worker pool. The fix looks good, only minor comments.

1. Don't pass around threadId
2. Rename returnThreadToPool to returnWorkerToPool
@juj
Collaborator

juj commented May 8, 2019

pthread_join() on the main browser thread is not going to work. Polling for completion with pthread_tryjoin_np() (a GNU nonportable extension) could be implemented; looking at it, it does actually exist in musl, though I'm not sure if it works - it has not been stressed before.

If you build with -s PROXY_TO_PTHREAD=1, then pthread_join() works in the application main thread, which is then proxied to run in a web worker.

It looks like the failures here are related to browser.test_pthread_global_data_initialization. Those tests are run with -s PROXY_TO_PTHREAD=1, so they should be safe to do pthread_join(). If this PR regresses that test, that looks like something important, though I can't spot what the cause of the failure would be off the top of my head. Can you give that a closer look? If it looks like something sketchy, I can try to find time to look in detail, but I'm unable to do so atm.
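
As a sketch, the polling approach juj mentions would look roughly like this (illustrative only, since whether pthread_tryjoin_np actually works in Emscripten's musl is exactly what is in question here):

#define _GNU_SOURCE
#include <pthread.h>
#include <errno.h>

// Poll instead of blocking the browser main thread in pthread_join():
// returns 1 once the thread has been reaped, 0 while it is still running,
// and -1 on error (e.g. ESRCH or EINVAL).
int try_reap(pthread_t t) {
	int rc = pthread_tryjoin_np(t, NULL);
	if (rc == EBUSY)
		return 0;  // not finished yet; poll again on a later tick
	return rc == 0 ? 1 : -1;
}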

@VirtualTim
Collaborator Author

VirtualTim commented May 9, 2019

Oh whoops, missed the -s PROXY_TO_PTHREAD=1 flag. No wonder I was having issues getting the test to run. I'll look into it today.

Edit: looks like the issue was introduced by adding this line: https://github.com/emscripten-core/emscripten/pull/8286/files#diff-db41bea94577c2dd9b0eef0308b06cf9R243. This was added because the thread was never removed from PThread.pthreads. However, this change revealed an issue with joining threads.
When the thread is finished it calls threadExit, cleaning itself up. When a thread is joined it calls _pthread_join, which calls cleanupThread, which cleans it up again.
So the same thread is disposed of twice.

Hope that explanation makes sense?
I'll attach what I think the fix is here.

VirtualTim added 2 commits May 9, 2019 16:03
Missed this change that was supposed to be part of the last changelist.
@VirtualTim
Collaborator Author

OK, so it looks like that last change broke a bunch of tests on Firefox. I think it might be better to revert the change on this line: https://github.com/emscripten-core/emscripten/pull/8286/files#diff-db41bea94577c2dd9b0eef0308b06cf9R243
and open another PR to fix the leak it was trying to address.

This wasn't introduced by this PR, and can be fixed by a subsequent one.
Basically, PThread.pthreads seems to not remove references to threads after they have exited. Once this PR is merged I'll open another one to address this issue.
@VirtualTim
Collaborator Author

OK, investigating the failing test revealed a weird issue. The threadExit function executes on the worker and sets the exit status. However, for the thread to actually exit it has to post the 'exit' command to the main thread. So there can be unlucky circumstances where the thread marks itself as exited but hasn't actually exited. I think I know how to fix that.

The other note is that the test_pthread_mutex.cpp test joins on the main thread, so it will be pretty racy. I'm surprised I haven't seen it fail more. Unless I'm missing something, I think it should be built with -s PROXY_TO_PTHREAD=1. But I think that change should be made in a separate PR.

@VirtualTim
Collaborator Author

Ugh, I tried to run some tests with CircleCI, but it turns out that doing so hijacked the tests on this page. Can someone (@kripken?) re-run the tests on your CI system for me?
Thanks.

@kripken
Member

kripken commented May 15, 2019

Oddly I can't rerun tests here - it says I don't have write permissions. It's using the permissions for your fork, I guess, and not the upstream repo?

As a workaround, you can merge incoming into here.

@VirtualTim
Collaborator Author

OK, going through the test failures:

Test | Result
test_glgears_proxy_jstarget | Firefox-specific test. I didn't test this one.
test_pthread_64bit_atomics | Joins on the main thread. Should be compiled with -s PROXY_TO_PTHREAD=1.
test_pthread_64bit_cxx11_atomics | Tried compiling with all the options used for the test in both Firefox and Chrome; I couldn't get this to fail.
test_pthread_atomics | Joins on the main thread. Should be compiled with -s PROXY_TO_PTHREAD=1. I didn't test with modularize, but I assume it's the same.
test_pthread_barrier | Joins on the main thread.
test_pthread_call_async_on_main_thread | Tried compiling with all the options used for the test in both Firefox and Chrome; I couldn't get this to fail.
test_pthread_call_sync_on_main_thread | Bad tests. I've opened #8621 to fix them.
test_pthread_cancel | Worked for me on both Firefox and Chrome.
test_pthread_cleanup | Joins on the main thread. Works with -s PROXY_TO_PTHREAD=1.
test_pthread_clock_drift | Worked for me on both Firefox and Chrome.
test_pthread_condition_variable | Joins on the main thread.

So it's a bit odd that some tests (4) work for me but hang on the build machine. I'll double-check my commits; perhaps I missed something.
Also, a number of tests call pthread_join on the main thread, which was always a bad idea, since it was very flaky. The changes I made remove that flakiness by setting the exited flag when the thread is "more exited", but it basically means that joining a thread on the main browser thread will always hang, instead of often hanging. Even though this could break some behaviour, it's more consistent, and that behaviour would have been very fragile anyway.
I could go into some more technical details, but a proper explanation would take a few paragraphs.

So what should we do with these tests? The easy fix would be to add -s PROXY_TO_PTHREAD=1 to the 5 that don't have it, and then remove that setting when we get native WASM threads.
Thoughts?

@sbc100
Collaborator

sbc100 commented May 16, 2019

I don't understand why this change needs those tests to have PROXY_TO_PTHREAD when before they didn't.

My understanding is that it's possible to do thread creation and joining on the main thread if you have PROXY_TO_PTHREAD enabled, or if you pre-create your threads using PTHREAD_POOL_SIZE. Does this change prevent the latter from working now? (I'm not saying that would necessarily block this change; I just want to get clear on what is changing.)


//Check if a worker is free every threads_to_spawn*100 ms, or until max_thread_check is exceeded
const SpawnMoreThreads = setInterval(() => {
if (PThread.unusedWorkerPool.length > 0) { //Spawn a thread if a worker is available
Member


It would be better to write this using just the pthreads API, and not look at PThread.unusedWorkerPool and other internal details, since these might change in future refactorings. But if it's much easier to write it this way then it's fine for now, I think.

Collaborator Author


Yeah I'm not sure this can be done from the pthreads API. I mean, you can't really use the API to test the API, right?

I realise that this will likely change once we get native WASM threads, but I think by then we'll need to redo a lot of the pthreads tests anyway.

@kripken
Member

kripken commented May 17, 2019

Is the discussion of tests here about general improvements, or necessary changes for this PR? The tests all look green now here, so I think this PR can land?

@VirtualTim
Collaborator Author

My understanding is that its possible to do thread creation and joining on the main thread if you have PROXY_TO_PTHREAD enabled or if you pre-create your threads using PTHREAD_POOL_SIZE.

PROXY_TO_PTHREAD should always work, since the join actually happens on a worker, not the main thread. My guess is that PTHREAD_POOL_SIZE helps hide the raciness, since it's much quicker to pull a thread off the pool than to create a new one.

So I wrote up a big explanation about everything, but then realised that it's not actually an issue, since on joining, the 'exit' command does nothing.

Anyway, for a quick overview (using PROXY_TO_PTHREAD=0, join on main thread):

  1. The thread has joined, but 'exit' hasn't been executed.
    1.1 Currently 'exit' is a no-op on joining threads, so this is "OK". Conceptually I don't really like it, but I don't think I can fix it without breaking more stuff.
  2. If the thread needs to execute something on the main thread, it can't until the join has finished.
    2.1 This is why joining on the main thread is super risky.
    2.2 This "works" if the main thread receives the message before the join command, but it's really racy.

I guess I'd like to change this, but I don't really think there is a good way to do it using web workers. I think we're better off waiting for native WASM threads.

@VirtualTim
Collaborator Author

@kripken Yeah I'm happy for this to land. Turns out that this was more complicated than I first thought (which was probably why it wasn't implemented in the first place), but I think everything's as good as it's going to get.
There's still some weirdness using workers to emulate threads, but I don't think this is really something that can be fixed, and one day we should get native WASM threads anyway.

@kripken
Member

kripken commented May 20, 2019

Great, thanks, merging.

@kripken kripken merged commit bb2428b into emscripten-core:incoming May 20, 2019
@VirtualTim VirtualTim deleted the patch-2 branch May 21, 2019 01:22
VirtualTim added a commit to VirtualTim/emscripten that referenced this pull request May 21, 2019
Once detached threads are finished their execution they emit the 'exit' command. Instead of a noop they should rejoin the pool.

Resolves emscripten-core#8201.
VirtualTim added a commit to VirtualTim/emscripten that referenced this pull request May 23, 2019
VirtualTim added a commit to VirtualTim/emscripten that referenced this pull request May 23, 2019
belraquib pushed a commit to belraquib/emscripten that referenced this pull request Dec 23, 2020
Once detached threads are finished their execution they emit the 'exit' command. Instead of a noop they should rejoin the pool.

Resolves emscripten-core#8201.
Successfully merging this pull request may close these issues.

Detached threads don't get disposed of
4 participants