Make the Compiler and Language Service API Asynchronous #1857
👍 Does that mean you will ship and support a promise library? |
Honestly, after the hell I have gone through to write brackets-typescript and make the languageService work in a webworker, I can only 👍 👍 👍 this one. |
I think it'd still be bring-your-own-promise; whatever we use would probably not be an official endorsement of what to use. My own preference would be for us to use a polyfilled ES6 promise so that we can actually get a sense of how we support that scenario. |
Just FYI: people will still need to run in a webworker (or separate node process). Making |
Right - for an optimal experience, you really need to get some multithreading into the mix. In our case, we do some caching to avoid blocking IO operations, so this would help us in terms of reducing our memory footprint. |
👍 |
Definitely try https://atom.io/packages/atom-typescript again if you haven't before. Now I'm running a With a parent : https://github.com/TypeStrong/atom-typescript/blob/df110e7218815dc05be90102f46aad7bd2f04d29/lib/worker/messages.ts#L1-L50 There are still cleanups to do but the experience is awesome. /cc (and thanks) @csnover |
I've wrapped it up in an architecture that is extremely simple to understand / debug / extend. Also documented it (thanks to excellent support by @csnover) https://github.com/TypeStrong/atom-typescript/blob/master/CONTRIBUTING.md#architecture |
This doesn't actually buy us much because we're not I/O-bound, and we don't have any I/O that can be easily parallelized (we produce a list of files basically serially). The impact on the compiler base would be very large for not any real perf gain. |
Agreed. Anyone using the language service will still need to run it in the background even in this case as even after we have read all the FS files doing something like |
The LS is very IO bound. Furthermore, the LS needs to be sync to be a good VS/Roslyn citizen. We must make the LS async for scalability :-)
I don't think we must :) We can still use the ts server, which can handle most of the IO. We should close this issue and file another one when we do perf and scalability work, possibly with the P2P work. |
@mhegazy This is already an issue for the VS case today. Right now we end up having to load up all files on the VS side before we can even call into the script side to answer a question. We do this so we have the data ready when the LS asks the host "what has changed between these files?" And we have to do this, even though the LS might not ever ask that question (because the version hasn't changed). If we were async through-and-through, we wouldn't have this problem. We'd simply call into the script LS. It would synchronize info with the host, and then the LS would just ask for the changes to files with different versions in an async manner. This would be much better, and would provide a lot less stress on the managed side (for example, Roslyn could flush these files to their cache, instead of having to keep them all in memory.). |
To add to what Cyrus said, I think it would generally be helpful to see some data regarding this before we came to any sort of conclusion - both on the compiler side as well as the LS side. |
@CyrusNajmabadi I understand how the system works :). What I am saying is that this should be done as part of other work items. We need to do scalability and performance work items, but these are not scheduled. So all I am saying is: close this issue, and when we start looking into these work items (possibly as we are doing P2P), see what our options are and what we need to do. |
I know this is not JavaScript, but I made a tool to deal with a similar issue in JavaScript: |
Any news on this? |
@felixfbecker Nice! Do you know if actual LanguageService methods like getCompletionsAtPosition and getEditsForRefactor support this too? If so, it would be a cheap PR since, as I understand it, it only implies changing the signatures, not the implementation, right? Thanks |
Note: asynchrony here is very tricky. This is because the language service and compiler work over mutable data structures (for example, the syntax trees in source files). So you have to be extremely careful that your async calls don't mean you come back and the world is now in a different state than you expect. The only place async/await could potentially work would be in the initial sync-up between the LS and the host. And even then, we wouldn't want to allow that to be interrupted with our sync-up calls. Once the LS starts processing, though, it basically should not be preempted with asynchrony. |
@CyrusNajmabadi Thanks, but a question: when you said "asynchrony here", are you referring to all interfaces like CompilerHost and LanguageServiceHost (the ones that perform FS access), or to LanguageService only? In other words, is @felixfbecker's suspicion correct that FS access internally supports async functions out of the box, although this is not formalized in the signatures? |
I think the question is a little too broad for me to be able to take a bite of that. Basically, all I'm saying is: asynchrony presents concurrency/reentrancy concerns that you have to be super careful with. Specifically, at any point where you 'await', you may have arbitrary code run before the 'await' returns. This is highly problematic for the LS without being super careful. Specifically, consider how any LS operation works. Something calls into the LS, then the LS attempts to synchronize with the host, mutating shared state as it does so. If, during synchronization, an asynchronous call happens, then someone else may come in and mutate that shared state. This is a solvable problem, in general, by going with the work-queue model. In other words, any LS operation that comes in is put into a queue, and only one operation can come off that queue and operate (even asynchronously) at a time. It just needs to be done carefully and conscientiously. Oftentimes, projects can just move to async/await, virally extending out the async APIs as necessary. That isn't the case here. Because of shared mutable state, special care needs to be taken (somewhat equivalent to how one might need to lock in multi-threaded systems). |
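The work-queue model described here can be sketched in a few lines of TypeScript. This is purely an illustration, not TypeScript compiler code; `OperationQueue` and `demo` are hypothetical names. Each operation is chained onto the previous one, so even an operation that awaits internally cannot be interleaved with another.

```typescript
// Sketch of the work-queue model: every "LS operation" is enqueued, and only
// one operation at a time runs to completion, even if it awaits internally.
class OperationQueue {
  private tail: Promise<unknown> = Promise.resolve();

  // Chain each operation onto the previous one so operations never interleave.
  run<T>(op: () => Promise<T>): Promise<T> {
    const result = this.tail.then(() => op());
    // Keep the chain alive even if an operation rejects.
    this.tail = result.catch(() => undefined);
    return result;
  }
}

// Two operations that each could yield; they still run strictly FIFO.
async function demo(): Promise<string[]> {
  const queue = new OperationQueue();
  const log: string[] = [];
  const a = queue.run(async () => {
    await Promise.resolve(); // yield point: b is already queued, but cannot start yet
    log.push("a");
  });
  const b = queue.run(async () => { log.push("b"); });
  await Promise.all([a, b]);
  return log; // always ["a", "b"], despite a's internal await
}
```

The key property is that the yield point inside operation `a` does not let `b` observe or mutate state mid-operation, which is exactly the guarantee the comment above asks for.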
These types of dependencies are blocking due to TypeScript's synchronous implementation: microsoft/TypeScript#1857 However, if you register your urls beforehand (the react.html demo), I can asynchronously pull them down. Todo: I'll need to look into seeing if I can marry the two ideas. Greater customisation has been added. When calling `compile()` you can provide your own:

- callback for transpiled code
- algorithm for finding resource files when given an import string
- function for outputting the compiler's warnings (id est, you can quiet it by passing a no-op `() => {}`)

Todo: add import paths. So far only relative paths or fully-qualified urls work; I need some default import directories. The module shim has been improved to no longer assume that 'require' and 'exports' will be the first two dependencies when calling `define()`. You can safely call it yourself for non-ts code without knowing internally how it works, just as you would with another AMD library. Todo: use a real AMD library, or somehow otherwise fix the cyclical import issues. I've had a thought that I could use real es6 modules if I include a ServiceWorker to do the compilation (so I can trap requests to the ts files and return actual js). But this will be a stretch goal; it will certainly break my current `file://` compatibility. Included as a third demo, just to test these new changes, is the html file of the main demo for `klesun/ts-browser`. It has minimal changes and almost works, but it seems there are a tonne of cyclical imports. I either need to address my library's lacking ability in this regard, or selectively override the specific files, in order to see if the library can handle a larger project like his.
I hope this will get prioritized |
If you await all the way back up the call stack then there shouldn't be any introduction of possible race conditions. There should be no "If, during synchronization, an asynchronous call happens" because everything will be awaiting the readFile() just like in the synchronous code. |
How do you ensure that? |
Hey folks! What is the state of this feature? PS: Make the Compiler great again! |
1. Does a canonical transformation of sync to async which guarantees the same order of execution exist?

To clarify the meaning of "same order of execution": I mean the same order of execution of the code which modifies or reads external objects.
I am not convinced that is true. Let us suppose that, in the case of a single top-level function that calls multiple lower-level functions, after conversion to async, and properly

For the remainder of this discussion, I will assume that such a "safe" transformation exists in JavaScript, because it is a concurrent language. However, if that is wrong, then the remainder of the discussion is on shaky ground.

2. If such a canonical conversion exists, it must be automated to be useful

For the sake of argument, let us suppose that some canonical transformation to async that guarantees the same order of code execution exists. Then there is still the issue of achieving that canonical transformation. Doing it manually is time consuming and bug prone. If it were done by humans, it could take months for some selected sync version to be translated to async and debugged. For one thing, all the generic array iterator functions (…). Therefore it seems safe to assume that an asynchronous version will never be implemented unless automated transformation by software is available. Progress to date supports that assumption. So obviously such transformation to a canonical async form should be done by automation, if possible. Is it possible? It seems to me that automated transformation to async should be possible.

3. Scope of transformation
Right, it is not necessary to change every function to async.
The collection of all selected functions is targeted for transformation.

4. Making use of the transformed version

A transformation to async which guarantees the same order of execution is not going to be any faster than the original. As the async version is gradually and manually optimized, the original version will be undergoing gradual version improvements.
When a new synchronous version T(N+1) of TypeScript is prepared, an automated transformation from T(N+1) to AsyncT(N+1) is generated. For the most part, git merging can be used to merge AsyncT(N)-optimized and AsyncT(N+1) into

5. Support of the synchronous API

Furthermore, there are clients of the current API who depend upon the synchronous API, so those versions have to be maintained. You have to decide whether to maintain two separate versions of the API, one having some async calls and the other having no async calls, or one version of the API containing two differently named versions of some calls, e.g.,

6. Conclusion

In conclusion, a clean path to leveraging asynchronous IO exists, provided the following conditions are satisfied:
(I'd love to work on that project if I could find a life support system to make that possible.) |
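The "same order of execution" constraint discussed in this post can be made concrete with a small, self-contained illustration (the `fakeRead` function and its timings are hypothetical, not compiler code): awaiting inside a plain for..of loop preserves the order the sync code had, while `Promise.all` over the same calls completes in I/O-latency order, which is one reason the generic array iterator helpers are singled out as tricky to transform.

```typescript
// Simulated file read with a configurable latency; logs its name on completion.
async function fakeRead(name: string, delayMs: number, log: string[]): Promise<void> {
  await new Promise<void>(resolve => setTimeout(resolve, delayMs));
  log.push(name);
}

// Order-preserving transform of a sync loop: one "file" at a time.
async function sequential(log: string[]): Promise<void> {
  for (const [name, delay] of [["a", 20], ["b", 0]] as [string, number][]) {
    await fakeRead(name, delay, log); // "a" always completes before "b"
  }
}

// Concurrent version: potentially faster, but completion order follows latency.
async function concurrent(log: string[]): Promise<void> {
  await Promise.all([fakeRead("a", 20, log), fakeRead("b", 0, log)]);
}
```

A canonical transformation in the post's sense would have to emit the `sequential` shape; the `concurrent` shape is a later, deliberate optimization step.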
@craigphicks I don't really understand the argument that is being made. All I'm saying is this: the TS compiler/language-server is heavily mutable (by design). The current design presumes that any mutation is safe because it happens, and then all processing that needs to view that mutation runs after it, all the way to completion. In other words, computation that operates on that mutable state will not ever see that mutable state change while it executes. If we allow for asynchrony in operations, then this assumption may get thrown out the window. A computation operation may yield, allowing another operation to start. That operation may mutate the shared state, which the old operation will now see when it resumes execution. That will violate invariants of the system and can easily lead to corruption and/or crashes/data loss. -- In no way am I saying that TS cannot become async. What I'm saying is simply that it is not trivial and would need to be done with a very conscious eye to ensure that either shared-state mutations do not happen or, if they do, that they cannot cause problems for in-flight operations. |
If any LS api becomes async, that would allow another LS api to operate concurrently with it. An LS API request can and will mutate the shared state of the LS server. As such, if an LS api yields, and another LS api then runs, it can mutate shared state that will break the first LS api call when it resumes. |
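The hazard described here can be demonstrated in a few lines. The names (`SharedState`, `lsOperation`) are illustrative stand-ins, not real LS internals: the first operation captures state, yields at an await, and resumes to find that a second caller has mutated the shared state underneath it.

```typescript
// Hypothetical shared mutable state, standing in for the LS's syntax trees etc.
interface SharedState { version: number; }

// An "LS operation" that assumes the state is stable for its whole duration.
async function lsOperation(state: SharedState, fakeIo: Promise<void>): Promise<boolean> {
  const seen = state.version; // invariant assumed to hold until the op finishes
  await fakeIo;               // yield point: other callers may run here
  return seen === state.version; // false if someone mutated state meanwhile
}

async function demo(): Promise<boolean> {
  const state: SharedState = { version: 1 };
  const first = lsOperation(state, new Promise(r => setTimeout(r, 0)));
  state.version = 2;          // "another LS API call" mutates shared state
  return first;               // resolves false: the invariant was violated
}
```

This is exactly why a plain viral async/await conversion is unsafe without a serialization mechanism such as the work queue discussed earlier in the thread.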
I don't think the line between sync and async is so stark in this case. While it's true that, while the compiler is waiting on I/O, something else might alter program state, it turns out that this is already true with a synchronous compiler host - the host can already modify program state in the existing sync callbacks. That would maybe be unusual, but there's certainly no existing guarantee here in any place where the compiler calls out to user code, and those callouts are precisely the places that need to be made async. |
No, it can't. Because it is sync (and JS is single-threaded), the host can't make any changes: the event thread is blocked until whatever sync operation is in flight completes.
Yes there are. We don't modify shared mutable state once an LS computation operation starts. That's a necessary current invariant on how the system is built. We would either need to find some way to keep this invariant in an async world. Or we'd need to update all our features to support the idea that the shared state they are examining may mutate across an async yield point. |
This isn't about single or multi threading - the sync callback itself, say |
I think Cyrus is talking more about the risks of having to be very intentional in preventing state modification. An arbitrary host can create a language service, bork the state, and if an API consumer filed an issue saying "it hurts when I do this", much of the time we'd probably say "don't do that". On the other hand, if you allow the API to re-enter asynchronously, that becomes a much more deliberate API commitment - in fact, I think that's probably half the use-case of making the API asynchronous. Even if we tell people "don't do that", it's way too easy for API consumers to accidentally re-enter the API. So either we would have to harden the API against that, or we'd have to tell people "don't re-enter" which I would think is unfortunate. |
How? These calls just return values that the LS uses to update its own state. That state is then able to be used for the remainder of the computation.
It really isn't. Calls to user-provided callbacks don't generally make reentrancy happen. Calls to await do. |
I get what you are saying - there are both multi-part read and multi-part write operations which must take place atomically - at least, atomic writes cannot overlap atomic reads or other atomic writes, although atomic reads can overlap each other without issue. Moreover, it might be the case that those multi-part atomic write operations in the LS include calls to

How about using context (LS vs non-LS) to conditionally call async IO under

On the subject of atomic writes and reads, that's an issue that databases supporting atomic reads and writes handle. Perhaps we could avoid reinventing the wheel and use such a database as an intermediary for the language service? |
I see no way to do that without enormous complexity. TS is not Roslyn. It cannot trivially fork the state of the LS (esp. given the perf constraints of JS engines). So it has to take a direct-mutation approach just to maintain reasonable perf (and even with that, perf is a continual issue). TS's design is heavily predicated on shared mutation. So either we have to:
|
My thinking may be too simple, or maybe the overhead is too high. Inside the LS, a javascript-

Ideally, the database would detect that a read request overlaps a write in progress (e.g., same file) and would not return the old version, but would make the read wait until the write was finished. (No database duplication.) The overhead of including a database is certainly high - it imposes requirements on the end users too, I think - it makes ensuring compatibility of all parts harder. The only requirement it satisfies is ensuring transactional consistency, and the DB gets blown away when the language server stops. |
What database are you referring to? We have no databases here in our impl. As stated above, we have a shared-memory, mutable-state, representation of everything.
So this is either 1 or 3 (closer to 3 from my reading). It would be an entire redesign of how everything works from top to bottom. We'd need to introduce a DB indirection layer into the deepest parts of our system, which have been tuned heavily for perf. There is no indication whatsoever that this could possibly work or meet any sort of perf constraints we have. |
This is a huge requirement. :) It's effectively stating: move away from a shared-state/mutable system as our backing store. Yes, that would potentially address things. But it would need an effectively full rewrite of major parts of the stack. And there's no data at all indicating that this could be implemented in an efficient enough fashion. If we were to contemplate a (near) full rewrite, I'd rather just go the Roslyn approach and have a shared-state/immutable (rather than mutable) model. |
Unwinding my database detour: that would be neither practical nor easier. Here is a plan:

Protocol for ensuring atomic transaction consistency

At the heart of the matter is the protocol for ensuring atomic transaction consistency. The per-file interface could look like
or something that shifted the burden of waiting before execution to the gatekeeper such as
which would return a function that managed holding onto the arguments before calling the function, and also handle the later Something that would make the caller wait, but would still type check the arguments and would take care of calling
The caller code could call like this
or
The instantiation of the interface would use a couple of queues to keep track of requests, one queue each for reads and writes, with writes taking precedence. Also, it would keep track of the number of currently executing reads/writes. For the next request to start,
The gatekeeper logic is activated on every event - i.e., a new request, or a currently executing read or write completing. It's not even necessary to use nodejs messaging - it can be achieved more transparently with promises managed by the gatekeeper.

Careful development and testing
At this point, with the gatekeeper in place at the entry to each atomic operation, it should be safe to call the LS in a re-entrant manner, which is required for testing. (Mistakes in identifying must-be-atomic operations, or race conditions on global variables, would make it unsafe of course, which is one reason for testing.) However, to fully optimize, it is also necessary to convert the IO operations from sync to async, and group calls to run concurrently if/where it would help:
Note: The automatic transform of sync to async creates a huge number of code changes across a wide range of code. We don't have to worry about maintaining most of that; e.g., if the sync version master undergoes drastic change, then a new auto async version can be generated. We only have to worry about merging the code of functions in T, and the immediate callers of functions in T, into that new auto-generated version.
|
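The gatekeeper protocol sketched in the post above could look roughly like the following. This is a minimal async reader/writer gatekeeper, not the actual proposal's per-file interface: writes are exclusive, reads may overlap each other, and queued requests start in FIFO order (a simplification of the "writes take precedence" rule, chosen here to avoid reader starvation). All names (`Gatekeeper`, `read`, `write`) are illustrative.

```typescript
type Waiter = { write: boolean; start: () => void };

class Gatekeeper {
  private readers = 0;
  private writing = false;
  private queue: Waiter[] = [];

  // Run fn as a shared (read) operation.
  async read<T>(fn: () => Promise<T>): Promise<T> {
    await this.wait(false);
    try { return await fn(); }
    finally { this.readers--; this.pump(); }
  }

  // Run fn as an exclusive (write) operation.
  async write<T>(fn: () => Promise<T>): Promise<T> {
    await this.wait(true);
    try { return await fn(); }
    finally { this.writing = false; this.pump(); }
  }

  // A request starts immediately only if nothing is queued ahead of it
  // and the current readers/writer state is compatible; otherwise it queues.
  private wait(write: boolean): Promise<void> {
    if (this.queue.length === 0 && this.compatible(write)) {
      this.mark(write);
      return Promise.resolve();
    }
    return new Promise<void>(start => this.queue.push({ write, start }));
  }

  private compatible(write: boolean): boolean {
    return write ? !this.writing && this.readers === 0 : !this.writing;
  }

  // Reserve the state synchronously so later pump iterations see it.
  private mark(write: boolean): void {
    if (write) this.writing = true; else this.readers++;
  }

  // Called when an operation finishes: wake queued requests in order until
  // the head of the queue is incompatible with the current state.
  private pump(): void {
    while (this.queue.length > 0 && this.compatible(this.queue[0].write)) {
      const next = this.queue.shift()!;
      this.mark(next.write);
      next.start();
    }
  }
}
```

A caller would wrap each must-be-atomic LS operation, e.g. `gate.write(() => synchronizeProgram())` or `gate.read(() => getCompletions(...))`, so a stray await inside one operation cannot let an incompatible operation interleave with it.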
That feels like an effective rewrite of the TypeScript compiler/LS, along with the development of several new systems and tools needed to support the sync/async hybrid model. It's also unclear how any of this is enforced. It seems to be a cooperative reader/writer lock approach. But a single errant yield means mutations may happen when problematic. In a codebase the size and complexity of TS, I don't see how that is prevented. |
As described above, it is only enforced inside the library. I can see the viewpoint that atomicity must be enforced outside of the LS as well, and that would have to be via the LS interface. For example:
Obviously this is an atomic combined read-then-write action. It is guaranteed to be atomically executed when the LS is synchronous, and therefore the client program must be calling it synchronously, provided the client is not calling from multiple separate parallel threads (actual OS-level threads). If the client is calling from multiple separate parallel threads, then the atomicity is not guaranteed, even in the sync version of the LS. Therefore it is safe to assume that we are only dealing with clients running in a single thread (an actual OS-level thread). Now suppose the language server is changed to async. Re-entrant calls could break the atomicity of a single File Rename Action ensemble. Now we are at a fork in the road:
I'll provide argument for both sides. Argument for 1. Clients should handle that themselves.
Argument for 2. Clients should be provided with help (a lock interface) to avoid incorrect re-entrant calls.
How the LS-side assistance (a lock interface) would work

The gatekeeper logic would be extended, and an external interface provided. This extension would allow a client to await a read or write lock, and to release that lock when it is done. That would be implemented as another command type in protocol.d.ts.

A modification of the development plan
Exactly right, I agree. @CyrusNajmabadi 2018
I believe that my concrete suggestions are basically just filling in details of your overall 2018 master vision. The gatekeeper interface does utilize read and write queues just as you mention, and as discussed it is possible to provide a client-side interface to the gatekeeper to assist the client in grouping their must-be-atomic command groupings (variously read-only, or mutating). (Note: the queues contain only promises to enforce waiting, rather than a queue of tasks, but the overall effect is the same.) At the top level of the LS, where commands are received, we (initially) code for just the following 7 cases:
(* assuming we code for client-side locking) Of course there are more than 7 commands, but each of them would fall within one of those 7 cases. Referring back to steps 1-5 mentioned in my previous post, those 7 are the functions in T, and they are very high up in the program, at the top of the call tree. So high up that automated sync->async software is not absolutely required for that, because the scope of change is small. In the first pass of development, the code beneath or adjacent to the functions in T does not need to be made async. The program will still work and maintain logically safe behavior even if the code beneath or adjacent to the functions in T is not async. In this first pass, there is not yet a performance improvement. Later (careful and deliberate) development would improve performance as follows:
I very much appreciate your critical eye helping to push this proposal into better shape. Your judgement is indispensable. (Aside: one thing I forgot to mention in previous posts is the need to check for any variables that might be in a shared top-level scope between re-entrant LS calls, but that really should belong to only one call's scope.) |
I feel strongly that the current sync libraries and the async libraries should be kept separate. All version improvements other than the async conversion changes should be applied to the sync version, and the async conversion branch updated from the sync version. A client of the LS would use one or the other. I don't see any way to mix them for the LS. I don't see any reason to mix them for the compiler API. If async performance gains were sufficient, adoption would naturally increase, especially in new projects. At some point, support for the non-async version could be dropped. I would guess that async is not the only bottleneck. It sometimes seems the compiler is loading a lot of files housing symbols which are never actually referenced during |
Automated transformation (from sync to async) sounds like a good idea. Are there any tools/projects that can do, or are trying to do, this? |
It makes very little sense that IO-bound operations (e.g. readFile, writeFile) are synchronous when we run on a platform whose touted strength is in writing IO-bound applications through asynchronous non-blocking APIs.
Compile times could probably be shaved down by a bit, especially for large projects with many files. Additionally, tooling experience should be as responsive as possible and avoid blocking in the event that IO takes place, enabling several optimizations to free memory on the managed side.
With #1664, this should be significantly easier.
This means that at the very least, the following would change to have asynchronous counterparts:

- `createSourceFile` (because of `/// <reference ... />` comments)
- `createProgram`
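For illustration, the asynchronous counterparts proposed here might take the following shape on Node. `AsyncCompilerHost`, `readFileAsync`, and `fileExistsAsync` are hypothetical names, not part of the real synchronous `ts.CompilerHost` interface; this is only a sketch of what an async host could look like.

```typescript
import { readFile } from "fs/promises";

// Hypothetical async counterparts to the synchronous host methods.
interface AsyncCompilerHost {
  readFileAsync(fileName: string): Promise<string | undefined>;
  fileExistsAsync(fileName: string): Promise<boolean>;
}

const host: AsyncCompilerHost = {
  async readFileAsync(fileName) {
    try {
      return await readFile(fileName, "utf8");
    } catch {
      return undefined; // mirror the sync host convention: undefined on missing files
    }
  },
  async fileExistsAsync(fileName) {
    return (await this.readFileAsync(fileName)) !== undefined;
  },
};
```

An async `createSourceFile`/`createProgram` would then await these host calls (and could issue several reads concurrently for independent files) instead of blocking the event loop on each file.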