This repository has been archived by the owner on Dec 29, 2022. It is now read-only.

Managed concurrency #923

Merged: 1 commit merged into rust-lang:master from the managed-concurrency branch on Jul 18, 2018

Conversation


@matklad commented Jun 30, 2018

Hi!

In tests, we currently do synchronization via thread::sleep. It is rather problematic by itself, but it also shows that we don't have very good control of concurrency inside RLS :(

I think this is a big problem: RLS is going to be a highly concurrent application, so it's better to establish reliable concurrency practices from the start. I don't have too much experience with designing concurrent systems though, so I am not sure if the proposed approach to dealing with the problem is sane :D

In my experience, a lot of problems with managing concurrency stem from the "fire and forget" approach: if a function schedules some work to be executed outside the dynamic extent of the function call itself (for example, by spawning a thread, or by submitting the work to some queue), and doesn't return a handle to the yet-to-be-computed result, then such a function will be impossible to test without resorting to busy-waiting.

This PR proposes to use a future-like ConcurrentJob object to keep track of tasks. Any background operation returns this object, which can be used to wait for that operation to finish. Dropping a ConcurrentJob without waiting for the result generally causes a panic, to protect against accidentally forgetting about a background operation.

The end result is that LsService now has a wait_for_background_jobs method which can be used to make sure that all background ops are finished.
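To make the pattern concrete, here is a minimal sketch of the ConcurrentJob / JobToken pair. This is illustrative only; the real implementation differs in details such as job descriptions and abandonment handling for abnormal exits.

```rust
use std::sync::mpsc::{channel, Receiver, RecvTimeoutError, Sender};
use std::thread;
use std::time::Duration;

/// Uninhabited type: the channel can never carry a value, so the only
/// observable event is the Sender being dropped when the job finishes.
pub enum Never {}

pub struct ConcurrentJob {
    chan: Receiver<Never>,
    is_abandoned: bool,
}

/// Held by the background task; dropping it marks the job as finished.
pub struct JobToken {
    _chan: Sender<Never>,
}

impl ConcurrentJob {
    pub fn new() -> (ConcurrentJob, JobToken) {
        let (tx, rx) = channel();
        (ConcurrentJob { chan: rx, is_abandoned: false }, JobToken { _chan: tx })
    }

    pub fn is_completed(&self) -> bool {
        // Disconnected means the JobToken is gone, i.e. the job finished.
        match self.chan.recv_timeout(Duration::from_millis(0)) {
            Err(RecvTimeoutError::Disconnected) => true,
            _ => false,
        }
    }

    pub fn wait(self) {
        // Block until the JobToken is dropped by the background task.
        let _ = self.chan.recv();
    }
}

impl Drop for ConcurrentJob {
    fn drop(&mut self) {
        if self.is_abandoned || self.is_completed() || thread::panicking() {
            return;
        }
        panic!("dropped a ConcurrentJob before the background task finished");
    }
}
```

The drop-bomb is the key design choice: a forgotten background operation fails loudly in tests instead of silently racing them.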

@matklad changed the title from rManaged concurrency to Managed concurrency on Jul 1, 2018

alexheretic commented Jul 1, 2018

Thread sleep is inelegant, but is it a big problem in tests? You wait a bit for what you expect and fail if it never arrives, or arrives wrongly. The absolute worst case is that the tests can take a while to fail, and the current timeout seems a little high.

But I'm not sure it shows that we lack good control of concurrency in RLS.

Here I'd be tempted to define "good" as sufficient. Shouldn't we add this ability only when we have a production code need for it?

The other issue is that I think we'd generally want to move the feature tests out into integration-style tests, or whatever you want to call them: tests that start RLS as a process and talk to it like an LSP client. If that's still true then in-process hooks won't be of any use to the tests.


matklad commented Jul 1, 2018

Thread sleep is inelegant, but is it a big problem in tests?

It certainly is a problem: in #918 (comment), quite a few cycles were spent figuring out that the test fails on CI but not locally due to a timing issue, and not because of some Windows-specific behavior. In other projects I've worked on, unrestricted concurrency also posed challenges to writing robust and fast tests.

In general, I strongly believe that logical clocks and causality should be used instead of timeouts if possible, and that APIs should be designed in such a way that using logical clocks is possible (i.e., if something spawns a concurrent thread of execution, it should return a handle which can be used to wait for that thread to finish). I don't have concrete strong arguments here except that it,... feels right I guess? :)
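For illustration, here's the shape of the API I mean (plain std::thread, not RLS code; the function names are made up):

```rust
use std::thread;

// Fire-and-forget: the caller has no way to tell when the work is done,
// so a test can only thread::sleep and hope.
fn index_in_background_bad(path: String) {
    thread::spawn(move || {
        let _ = path; // ... do the indexing ...
    });
}

// Returning a handle establishes a happens-before edge the test can use.
fn index_in_background(path: String) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        let _ = path; // ... do the indexing ...
    })
}

#[test]
fn indexing_finishes() {
    let job = index_in_background("src/lib.rs".to_string());
    job.join().unwrap(); // wait on the logical clock, not wall-clock time
    // ... assert on the indexing results ...
}
```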

The other issue is that I think we'd generally want to move the feature tests out into integration-style tests, or whatever you want to call them: tests that start RLS as a process and talk to it like an LSP client. If that's still true then in-process hooks won't be of any use to the tests.

I agree that we should add tests that literally spawn the RLS process and communicate with it via standard streams: currently this layer seems untested? However, while I am in general a strong believer in integration tests, I personally don't think that moving all feature tests to integration testing would be a good idea, for two reasons:

  • Concurrency is easier to test from within the project. For batch programs like cargo or rustc, just running the binary and asserting on the output gives perfect feature test coverage, but for stateful servers you also need to check possible interactions of request ordering and timing, and that is usually easier to do internally.
  • I expect that there'll be a ton of tests in RLS, because the amount of features you can add to an IDE is enormous. As such, it'll be important to keep tests fast and to avoid spawning a process per test.

Long-term, I think we should split RLS into an "rls-library" layer, which is more or less oblivious of the details of the LSP, and an "rls-lsp" binary, which wraps the API of the library into the specific protocol. That way, we'll have a smallish number of integration tests for the binary, which spawn a process and check that the output indeed conforms to LSP, and most feature tests would be done at the library level.

@alexheretic

Well I do agree that eliminating the messaging timeout issue in the tests would be really nice. I'm just not sure keeping track of the jobs with this much extra prod-code is worth it.

For me the test code is there to make the prod code better; this feels like the other way around.

On the other hand perhaps it will serve the greater good by allowing better testing, and I do feel rls test code isn't really good enough at the moment.


matklad commented Jul 2, 2018

I'm just not sure keeping track of the jobs with this much extra prod-code is worth it.

Agree that this brings in a significant amount of "clever" code with tricks like uninhabited enums and drop-bombs, which is a big drawback.

A simpler alternative is possible: instead of keeping track of jobs, we can just drain all the work queues explicitly. That is, in wait_for_background_jobs we can call dispatcher.await_all_dispatched(), something similar for AnalysisQueue (the method would need to be introduced), and BuildQueue.block_on_build, roughly as sketched below.
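Roughly like this (the queue types here are stubs standing in for the real ones; await_all_dispatched on the analysis queue is the method that would need to be introduced):

```rust
// Stub types standing in for the real RLS queues; only the shape matters.
struct Dispatcher;
struct AnalysisQueue;
struct BuildQueue;

impl Dispatcher {
    fn await_all_dispatched(&self) { /* block until all dispatched requests ran */ }
}
impl AnalysisQueue {
    fn await_all_dispatched(&self) { /* hypothetical: drain pending analyses */ }
}
impl BuildQueue {
    fn block_on_build(&self) { /* wait for the in-flight build to finish */ }
}

struct LsService {
    dispatcher: Dispatcher,
    analysis_queue: AnalysisQueue,
    build_queue: BuildQueue,
}

impl LsService {
    // Drain every known source of concurrency, one source at a time.
    fn wait_for_background_jobs(&self) {
        self.dispatcher.await_all_dispatched();
        self.analysis_queue.await_all_dispatched();
        self.build_queue.block_on_build();
    }
}
```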

I think that explicit tracking of all concurrent tasks is better, long term, than tracking of potential sources of concurrency, although it's more code:

  • it is a generalization of the pattern we already use in Dispatcher. In this PR, in_flight_requests is removed and jobs are used instead.

  • JobToken could be turned into a cancellation token (rough idea sketched after this list)

  • Threading jobs through the API signals to the reader where concurrency happens.
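On the cancellation point, one speculative shape (the shared flag and is_canceled are made up, not part of this PR):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// One way a JobToken could double as a cancellation token: share a flag
// that the ConcurrentJob side sets when it wants the work stopped.
pub struct JobToken {
    cancelled: Arc<AtomicBool>,
}

impl JobToken {
    // The background task polls this at convenient checkpoints and
    // bails out early if the job was cancelled.
    pub fn is_canceled(&self) -> bool {
        self.cancelled.load(Ordering::SeqCst)
    }
}
```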


matklad commented Jul 4, 2018

@nrc curious what you think about the problem in general and this specific approach? Is it something reasonable, or should we just stick with thread::sleep for the time being?


nrc commented Jul 15, 2018

Sorry for the delay. My initial opinions (I haven't looked at the code yet):

  • I agree that the tests running long (and often non-deterministically) is a big problem.
  • I am somewhat paranoid about the concurrency in the RLS. Historically this has caused some subtle and hard-to-fix bugs and generally been difficult to work on. I would love to do better (safer) here, even if it means some more code. On the other hand, we are very close to releasing a 1.0 version and I am frightened of changing this area very much until after the release.
  • I've been particularly unhappy about the 'fire and forget' stuff; it's often felt like there should be a better way, so the approach as described sounds appealing.

Here I'd be tempted to define "good" as sufficient.

I do agree with this sentiment too - there is something to be said for 'if it ain't broke, don't fix it'.

@@ -73,7 +74,7 @@ impl BlockingNotificationAction for DidOpenTextDocument {
 }

 impl BlockingNotificationAction for DidChangeTextDocument {
-    fn handle<O: Output>(params: Self::Params, ctx: &mut InitActionContext, out: O) -> Result<(), ()> {
+    fn handle<O: Output>(params: Self::Params, jobs: &mut Jobs, ctx: &mut InitActionContext, out: O) -> Result<(), ()> {
Member

could jobs be part of the context here, rather than adding a variable?

Member Author

Moved jobs to the Context. Initially I avoided that because Context is Clone, and so an Arc<Mutex> is required. And having an explicit &mut Jobs argument and -> ConcurrentJob return type gives a better idea of when concurrency happens. However, that is indeed a lot of extra typing compared to just stuffing jobs into the ctx.
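Roughly, the shape is now (illustrative; add_job here is just a sketch of the accessor, and ConcurrentJob is a stand-in for the real handle):

```rust
use std::sync::{Arc, Mutex};

struct ConcurrentJob; // stand-in for the handle from the concurrency module

#[derive(Default)]
struct Jobs {
    // ... active ConcurrentJobs ...
}

impl Jobs {
    fn add(&mut self, _job: ConcurrentJob) { /* track the job */ }
}

// InitActionContext is Clone, so the shared Jobs table goes behind Arc<Mutex>.
#[derive(Clone)]
struct InitActionContext {
    jobs: Arc<Mutex<Jobs>>,
}

impl InitActionContext {
    fn add_job(&self, job: ConcurrentJob) {
        self.jobs.lock().unwrap().add(job);
    }
}
```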

src/test/lens.rs Outdated
expect_messages(
    results.clone(),
    &[
        ExpectedMessage::new(Some(0)).expect_contains(r#""codeLensProvider":{"resolveProvider":false}"#),
Member

Should probably check for this message

Member Author

Indeed! Restored other messages as well to try to debug #925


impl Drop for ConcurrentJob {
    fn drop(&mut self) {
        if self.is_abandoned || self.is_completed() || thread::panicking() {
Member

Why don't we panic on abandoned jobs? Seems like a bad thing if Jobs is dropped without completing the jobs

Member Author

Hm, indeed, for normal process termination it is reasonable to require that all the jobs are awaited!

Still not sure what's the best choice for abnormal termination (some thoughts in af2534f), but probably just abandoning jobs is fine, as things have gone south anyway.


nrc commented Jul 16, 2018

I think we should do this. I'm pretty excited to improve our flaky tests. Once we land it I'll publish a new version and that should show up any bugs pretty soon.

@matklad could you rebase and address the review comments, then I'll merge. Thanks!


matklad commented Jul 16, 2018

@nrc rebased before your comment :) Fixing the comments now!

@matklad force-pushed the managed-concurrency branch 3 times, most recently from ee1b647 to 4962bb7, on July 16, 2018

matklad commented Jul 16, 2018

I think I've addressed all comments.


matklad commented Jul 16, 2018

CI should be fixed by #944


Xanewok commented Jul 16, 2018

@matklad #944 was merged, but there's a lockfile conflict now


matklad commented Jul 16, 2018

Rebased!


matklad commented Jul 16, 2018

bors try

We've tried to set up bors here, let's see if it is working...


bors-voyager bot commented Jul 16, 2018

🔒 Permission denied

Existing reviewers: click here to make matklad a reviewer


Xanewok commented Jul 16, 2018

I get a Permission denied when trying to add you as a reviewer; let's see if I'll have more luck...

bors try

cc @nrc


bors-voyager bot commented Jul 16, 2018

🔒 Permission denied

Existing reviewers: click here to make Xanewok a reviewer

@carols10cents

Sorry about that. I updated my bors-ng instance's code and synced reviewer+members with people on this repo who have push permission. Try again?


nrc commented Jul 16, 2018

bors try (I don't need the @?)

bors-voyager bot added a commit that referenced this pull request Jul 16, 2018

bors-voyager bot commented Jul 16, 2018


matklad commented Jul 16, 2018

@nrc yep, this is a different bors, it doesn't need an @. And you need bors r+ to do the merge. I tried try just to check if bors is listening :-)


nrc commented Jul 17, 2018

bors r+

bors-voyager bot added a commit that referenced this pull request Jul 17, 2018
923: Managed concurrency r=nrc a=matklad

Co-authored-by: Aleksey Kladov <[email protected]>

bors-voyager bot commented Jul 17, 2018

Build failed


matklad commented Jul 17, 2018

Ok, one more round of clippy bumping!

#945

RLS is going to be a highly concurrent application, with three main
concurrency sources:

* user input/requests from the client editor
* changes from the file system
* internal parallelisation of CPU-heavy tasks

Because developing, debugging and testing concurrent applications is
hard, it makes sense to try to establish good concurrency patterns
from the start.

One concurrency anti-pattern is "fire & forget" spawning of background
tasks. This is problematic: without the ability to check if the task
is finished, you have to resort to busy-waiting in tests.

The `concurrency` module introduces a `ConcurrentJob` object, which is
a handle to a background job and can be used to check if the job is
running and to wait for it to finish.

The idea is that any function that schedules some job off the
current thread should return a `ConcurrentJob`. Jobs are stored
inside the `Jobs` table, which gives a nice overview of all currently
active tasks and can be used to wait for all active tasks to finish.

All `ConcurrentJob`s must be stored in a `Jobs` table; it's an error
to drop a job on the floor.
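A minimal sketch of the `Jobs` table described above, building on the earlier `ConcurrentJob` sketch (illustrative, not the exact implementation):

```rust
// Assumes the ConcurrentJob sketch from the PR description above,
// with its is_completed() and wait() methods.
pub struct Jobs {
    jobs: Vec<ConcurrentJob>,
}

impl Jobs {
    pub fn new() -> Jobs {
        Jobs { jobs: Vec::new() }
    }

    pub fn add(&mut self, job: ConcurrentJob) {
        // Garbage-collect finished jobs so the table stays small.
        self.jobs.retain(|j| !j.is_completed());
        self.jobs.push(job);
    }

    // What wait_for_background_jobs would ultimately call: block until
    // every tracked job has dropped its JobToken.
    pub fn wait_for_all(&mut self) {
        for job in self.jobs.drain(..) {
            job.wait();
        }
    }
}
```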

matklad commented Jul 17, 2018

Resolved conflict, squashing changes to a single commit in the process.

@nrc merged commit f7b4a9b into rust-lang:master on Jul 18, 2018

Xanewok commented Jul 18, 2018

Nice! Didn’t mean to add you extra work with that 2018 edition commit... 😅
