Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make std.parallelism.AbstractTask inaccessible #8834

Merged
merged 1 commit into from
Oct 29, 2023

Conversation

schveiguy
Copy link
Member

@schveiguy schveiguy commented Oct 29, 2023

No valid code should be using these members, they are thread-unsafe.

Note, this appears to have been changed to public to workaround some compiler issue, but the PR doesn't seem to be real: 9d151ac

@schveiguy schveiguy requested a review from andralex as a code owner October 29, 2023 15:07
@dlang-bot
Copy link
Contributor

Thanks for your pull request, @schveiguy!

Bugzilla references

Auto-close Bugzilla Severity Description
24207 major std.parallelism: AbstractTask private data is inadvertently available

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + phobos#8834"

@schveiguy
Copy link
Member Author

I didn't test locally, let's see if this breaks any code...

@schveiguy
Copy link
Member Author

Not sure if this needs a test. Can’t be a unit test since private is accessible in the same module.

@jmdavis
Copy link
Member

jmdavis commented Oct 29, 2023

Not sure if this needs a test. Can’t be a unit test since private is accessible in the same module.

Well, we don't typically test that private stuff is inaccessible (though I suppose that this bug would be an argument for doing so), but it shouldn't be hard to create a test for it somewhere. It just wouldn't be in a good place in the sense that we don't actually have anywhere to put such a test other than randomly picking a module. :|

@Imperatorn
Copy link
Contributor

Good. Now next step is to expose the state in a thread-safe manner.

@Imperatorn
Copy link
Contributor

When were those things that broke back in 2011 fixed? Can it be verified that this does not break anything?

This commit does exactly what @jmdavis said we should basically never do, change something from public to private because it could break code.

So, what's the policy?

@adamdruppe
Copy link
Contributor

I think it is worth noting that Fiber also has a race condition in its status field. I don't think there is a clear, obvious right answer to how to use it.

@jmdavis
Copy link
Member

jmdavis commented Oct 29, 2023

When were those things that broke back in 2011 fixed? Can it be verified that this does not break anything?

This commit does exactly what @jmdavis said we should basically never do, change something from public to private because it could break code.

So, what's the policy?

It's not something that we normally do. In particular, if we're dealing a symbol that's clearly part of the public API, then it could only be made private by first deprecating it, because code in the wild would definitely be using it.

This situation is a bit rare in that we're dealing with symbols that were clearly intended to be private - and even worse, they're symbols where messing with them could easily cause bugs. And given that they're not part of the documented API, and it's not at all obvious when looking at the source code that they'd be accessible, the expectation would be that no one would be using them, and thus we can probably get away with making the private like they should be - but it's a judgement call, and we do risk breaking code. Honestly, I have no clue how you even figured out that they were accessible.

So, ideally, we would just make the symbols private and close the hole, and in theory, the risk is minimal, but it is a judgement call, and we really don't like to be in this kind of situation. Fortunately, it's quite rare.

@schveiguy
Copy link
Member Author

I think it is worth noting that Fiber also has a race condition in its status field. I don't think there is a clear, obvious right answer to how to use it.

Fibers are thread local. There's no reason to access the status from another thread, so no race condition. If you have a shared Fiber, then the state would inaccessible as the state function is not mark shared.

When were those things that broke back in 2011 fixed? Can it be verified that this does not break anything?

The commit message suggests it was a compiler bug. Things were a bit loose on the rules in 2011.

How do we verify? The CI all passes.

This commit does exactly what @jmdavis said we should basically never do, change something from public to private because it could break code.

In this case, the cost is high to not fix the problem. The policy generally concerns things that are explicitly public being changed to private. In some cases, something that is accidentally public can be used even though it's not documented. But in this case, it was intended to be private, but made public as a workaround for a compiler problem, and clearly nobody uses this (all CI passes). It would have been good if the switch to public was accompanied by a comment saying the reason for the base not being private.

But here, we are talking about exposing the raw internals of Task, including the linked list of tasks which is only meant for implementation to use.

I'm glad you discovered this before it was used anywhere.

@adamdruppe
Copy link
Contributor

Fibers are thread local.

That's not true. They easily migrate between threads, and in fact, there's several benefits to doing it this way.

Perhaps it should be typed shared but that's part of what i mean by there being no obviously right answer it, since the use of shared with them is not obvious either.

@schveiguy
Copy link
Member Author

That's not true. They easily migrate between threads, and in fact, there's several benefits to doing it this way.

If you do, then you should probably not allow access from the original thread. Still thread local.

and if you don't make it shared, then obviously you are on your own for proper synchronization.

@jmdavis
Copy link
Member

jmdavis commented Oct 29, 2023

Fibers are thread local.

That's not true. They easily migrate between threads, and in fact, there's several benefits to doing it this way.

Perhaps it should be typed shared but that's part of what i mean by there being no obviously right answer it, since the use of shared with them is not obvious either.

Well, then they fall under the same heading as everything else that isn't explicitly designed to work across threads. Any code looking to use them across threads in a thread-safe manner will need to deal with the appropriate locking (along with casting away shared as necessary) or use atomic access (though the latter probably isn't possible, since it's an object).

Maybe its API should be tweaked to make it work better with shared, but as long as it isn't, you pretty much have to assume that accessing it across threads is going to require the appropriate locks and casts if you don't want race conditions. But there's really nothing special about fibers in that regard.

@adamdruppe
Copy link
Contributor

adamdruppe commented Oct 29, 2023

My solution was ultimately to just not use the member at all - in the first draft when I hit the problem, a fiber would try to register its own wake up condition immediately before yielding. This would hit trouble because if the wakeup conditions was already fulfilled, another worker would pick it up in between registration and yield, leading to an error.

Then I tried to spin on the fiber.state variable, knowing it would yield soon and i just had to wait a few cycles for it to finish. But fiber.state is updated non-atomically with the state swap. So even with the spin working correctly, there was still a race.

My final draft (which works fine) is to ignore all this state stuff and defer the wake up registration until after the yield. So the scheduler does like fiber.call(); if(fiber.onPostYield) fiber.onPostYield(); and that delegate actually sets the conditions for wakeup. Since the schedule is responsible for calling it, it knows when it is running vs on hold - it is running only directly inside fiber.call right here.

I guess maybe my point is that these things being private is a good thing, but the docs might want to explain how to address use cases where you'd be tempted to pull for them with alternative means. Though state.TERM still has some use since that one is a little safer i think.

@jmdavis jmdavis merged commit aa3416c into dlang:master Oct 29, 2023
11 checks passed
@schveiguy schveiguy deleted the fixparallelism branch October 29, 2023 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants