Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ability to terminate thread-pool upon shutdown #903

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

alippai
Copy link

@alippai alippai commented Nov 29, 2021

While .terminate() will leave the thread-pool in corrupt state, it helps debugging and testing.

/// Terminates the threads, renders the threadpool unusable.
/// This is useful for testing (miri) if the thread-pool is static.
/// It should be used upon shutdown only
pub fn terminate(&mut self) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a &mut ThreadPool is available, couldn't it just be mem::replaced with a newly built thread pool, dropping the old one and thereby terminating its threads without creating this sharp edge in the API?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to terminate https://github.com/pola-rs/polars/blob/master/polars/polars-core/src/lib.rs#L44-L53 this thread-pool when the test is complete. Using the POOL static seems to be a good practice, I wouldn't want to change that.

Context: pola-rs/polars#1927

Copy link
Collaborator

@adamreichold adamreichold Nov 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The static will only ever give you access to a shared reference &ThreadPool (POOL will implement Deref<Target=ThreadPool>, not DerefMut<Target=ThreadPool>), so it seems like you will not be able to call the above method in the first place?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From reading the above context, if you really need to terminate that thread pool, you probably need to put it behind a RwLock or something similar inside the lazy_static so that you can get access to an exclusive reference &mut ThreadPool in which case you can just mem::replace it with a newly created one. (This would also imply that your tests need to take that lock to install any jobs within that thread pool.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamreichold thanks, you are right. Fortunately I can remove mut and just use &self. Does it look better now?

Copy link
Author

@alippai alippai Nov 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I've compiled the changed rayon&polars and the test using miri succeeds (by adding POOL.terminate(); at the end of the test)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it technically works, this would need to be unsafe AFAIU, e.g. look at line 277 in the diff: Not being able to terminate the registry while the thread pool is still in use appears to be a safety invariant that could be broken by calling terminate and then trying to use the thread pool again.

Hence, I would suggest not adding this API (neither unsound as-is nor unsafe) and rely on putting the thread pool within an RwLock for the tests (where the additional cost of taking the lock is probably bearable). But this is just my drive-by-code-reviewer opinion.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where the additional cost of taking the lock is probably bearable

I see now that you want to terminate in the tests, but POOL itself is used in production code where the RwLock option is not feasible. This would imply having this as an unsafe method.

But then again, if you can use unsafe and you can ensure that your borrow is unique, you could probably accomplish the same using an UnsafeCell and mem::replace.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once again, you are right about the unsafe required and it's good to have it there (since it supposed to be run in tests in teardown only, not in prod code, so one should read the docs again). I've added the keyword.

The other option you've mentioned: I'm not a seasoned Rust developer, so I'd ask @ritchie46 or @jorgecarleitao what do they think about that.

Thanks a lot, I've already learned a lot today.

While `.terminate()` will leave the thread-pool in corrupt state, it helps debugging and testing.
@alippai
Copy link
Author

alippai commented Nov 30, 2021

I was wrong, somehow I missed that it still doesn't work:

The following memory was leaked: alloc613796 (Rust heap, size: 1024, align: 8) {
    0x000 │ ╾0xd2bf60[a619138]<untagged> (8 ptr bytes)╼ ╾0xd2c3c2[a619156]<1518440> (8 ptr bytes)╼ │ ╾──────╼╾──────╼
    0x010 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x020 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x030 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x040 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x050 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x060 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x070 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x080 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x090 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x0a0 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x0b0 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x0c0 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x0d0 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x0e0 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x0f0 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x100 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x110 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x120 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x130 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x140 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x150 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x160 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x170 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x180 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x190 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x1a0 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x1b0 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x1c0 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x1d0 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x1e0 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x1f0 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x200 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x210 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x220 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x230 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x240 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x250 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x260 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x270 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x280 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x290 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x2a0 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x2b0 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x2c0 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x2d0 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x2e0 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x2f0 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x300 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x310 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x320 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x330 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x340 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x350 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x360 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x370 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x380 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x390 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x3a0 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x3b0 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x3c0 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x3d0 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x3e0 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
    0x3f0 │ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ │ ░░░░░░░░░░░░░░░░
}
alloc613876 (Rust heap, size: 16, align: 8) {
    ╾0xd00360[a613796]<untagged> (8 ptr bytes)╼ 40 00 00 00 00 00 00 00 │ ╾──────╼@.......
}
alloc619138 (deallocated)
alloc619156 (fn: <rayon_core::job::StackJob<rayon_core::latch::SpinLatch, [closure@rayon::join_context::call_b<rayon::iter::collect::consumer::CollectResult<&i32>, [closure@rayon::iter::plumbing::bridge_producer_consumer::helper<rayon::slice::IterProducer<i32>, rayon::iter::collect::consumer::CollectConsumer<&i32>>::{closure#1}]>::{closure#0}], rayon::iter::collect::consumer::CollectResult<&i32>> as rayon_core::job::Job>::execute)

I've checked that in the ThreadInfo it's all stopped: true and miri now doesn't report running threads anymore, but I assume this means that miri can't handle the static with these latches

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants