
RFC: IO simplification #219

Closed

Conversation

aturon
Member

aturon commented Aug 29, 2014

This RFC proposes a significant simplification to the I/O stack distributed with
Rust. It proposes to move green threading into an external Cargo package, and
instead weld std::io directly to the native threading model.

The std::io module will remain completely cross-platform.

Rendered

@mcpherrinm

While losing the ability for libraries to use either native or green IO is somewhat unfortunate, I think this is overall a positive change: There have been a few times where I have opted to just call the syscalls directly because adding support to libgreen was going to be too painful, so sidestepping the entire Rust IO stack was easier.

The embedding use cases are a lot nicer here, especially if we get better lowering operations so interacting with C code via file descriptor passing becomes cleaner.

Overall: Thumbs up and agreement.

I haven't thought a lot about the scheduling API changes yet, but will soon and then post another comment.


> - *Task-local storage*. The current implementation of task-local storage is
>   designed to work seamlessly across native and green threads, and its performs

performs -> performance

@Ericson2314
Contributor

I certainly get the justification for having two layers of IO -- high-level platform-agnostic vs. low-level platform-specific -- but why three?

I'd love it if some of the "abstract io" machinery, such as the Reader and Writer traits, were moved to a crate that doesn't depend on libsys transitively (except via alloc, as allocators will solve that). Reader and Writer could return Result<T, E> instead of IoResult<T>, where E is an associated type.
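
A hypothetical sketch of that idea (my own names and signatures, not the actual std::io API):

```rust
// Hypothetical sketch only: a Reader trait whose error type is an
// associated type, so the trait itself need not depend on any OS-level
// error representation (unlike IoResult<T>, which hard-codes one).
trait Reader {
    type Err;

    /// Read into `buf`, returning how many bytes were read.
    fn read(&mut self, buf: &mut [u8]) -> Result<usize, Self::Err>;
}

/// An in-memory reader whose reads can never fail.
struct SliceReader<'a>(&'a [u8]);

enum Never {} // uninhabited: this reader has no error cases

impl<'a> Reader for SliceReader<'a> {
    type Err = Never;

    fn read(&mut self, buf: &mut [u8]) -> Result<usize, Never> {
        let n = buf.len().min(self.0.len());
        buf[..n].copy_from_slice(&self.0[..n]);
        self.0 = &self.0[n..];
        Ok(n)
    }
}
```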

Moreover, with this the only libstd facade crates are libcore, liballoc, libnative, libunicode, and libsync (IIRC); do we even need libstd at all? Each of these has its own well-defined niche.

@aturon
Member Author

aturon commented Sep 2, 2014

@Ericson2314 The idea with the three layers is pretty straightforward. The lowest level is essentially direct C bindings. The highest level provides safe Rust abstractions that may do a fair amount of work over the underlying system calls. The middle level bridges between the two, allowing the high level to be implemented in a cross-platform way.

This design stays closest to how things are set up today, but in terms of API stabilization we will likely only mark the highest level as stable, to begin with. (That's what most Rust code is and will be built on, for now.)

It may turn out that the libnative layer doesn't justify itself, or that some other internal organization will be better. It will take some time to figure out the ideal architecture, so these lower-level APIs will likely not be stable for a while.

I agree that moving some of the key io abstractions out of libstd could be very helpful, and it's something I'd like to pursue as part of this work.

@aturon
Member Author

aturon commented Sep 3, 2014

There's been a fair amount of discussion on this RFC already, but I wanted to reiterate some key points regarding the long-term vision:

  1. Part of the RFC's goal is to open the door to exposing more system capabilities directly. That includes async and/or nonblocking I/O interfaces. There's a lot of design work to be done there, but by decoupling I/O from libgreen it becomes much easier to evolve the native I/O model.

    A lot of the discussion in the comments has focused on whether blocking or async I/O is ultimately the best way to go, but that's not something that needs to be decided as part of this RFC. Rather, the proposal here is to decouple our current I/O system from green threading, so that we have more freedom to explore these options.

  2. Similarly, while we want to provide as broad of a cross-platform interface as we can, it may be necessary to provide some platform-specific functionality as part of Rust's libraries (e.g. we may want to expose radically different interfaces to nonblocking I/O). Again, this will need careful design work beyond the current RFC, but it's one of the long-term goals this RFC is meant to support.

  3. Green threading support is entirely removed from std::io in this RFC. The only remaining "hook" for userspace threading is the proposed scheduling API, which is intended for use when building pure Rust concurrency abstractions -- channels, barriers, phasers, group locks, concurrent containers, and so on. These abstractions need a uniform way to block and wake threads, and the scheduler API provides a simple abstraction which in some cases can be more efficient than using condvars, but also has the benefit of seamlessly working with green threading models if desired. (A rough sketch of what such an API could look like appears at the end of this comment.)

    This RFC does not require us to settle on the merits of green threading; the point is just to decouple std::io from green threading, so that they can evolve independently. (Though, FWIW, our planned focus in the near term is beefing up the native threading and I/O model.)

  4. The proposed scheduler abstraction would be part of libsync, but would be exposed alongside native kernel abstractions like condvars.

So, to summarize, the intent of the RFC is to increase flexibility and to simplify our I/O story. It is meant to open doors (to nonblocking/async I/O, and platform-specific APIs), not close them.
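
Purely as an illustration of point 3 (names and signatures invented here, not fixed by the RFC), the block/wake hook could look something like this, with the native implementation mapping directly onto thread parking:

```rust
use std::thread::{self, Thread};

// Illustrative only: a uniform block/wake interface that pure Rust
// concurrency abstractions (channels, barriers, etc.) could target.
// A green-threading library would supply its own implementation
// instead of parking OS threads.
trait Scheduler {
    /// A token that can later wake whoever captured it.
    type Handle: Send;

    /// Capture a wake handle for the current thread/task.
    fn current(&self) -> Self::Handle;

    /// Deschedule the current thread/task until its handle is woken.
    fn block(&self);

    /// Reschedule the thread/task behind `handle`.
    fn wake(&self, handle: Self::Handle);
}

struct NativeScheduler;

impl Scheduler for NativeScheduler {
    type Handle = Thread;

    fn current(&self) -> Thread {
        thread::current()
    }

    fn block(&self) {
        thread::park(); // may wake spuriously; callers recheck their condition
    }

    fn wake(&self, handle: Thread) {
        handle.unpark();
    }
}
```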

@rrichardson

FWIW, I am working on (yet another) async IO library which features 3 layers:

  1. A poll trait that provides an interface over epoll, kqueue, and poll/select implementations (and IOCP, if someone wishes to implement it); see the sketch after this list.
  2. A futures library which relies on the poll trait
  3. A set of async/await macros which build a state machine to manage futures callbacks in place, in much the same way that C#'s async/await is implemented.
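
A poll trait along those lines might look like the following (a sketch with invented names; the actual trait may differ):

```rust
use std::time::Duration;

/// Readiness reported for one registered descriptor, loosely mirroring
/// EPOLLIN/EPOLLOUT. All names here are hypothetical.
#[derive(Clone, Copy, Debug)]
pub struct Event {
    pub fd: i32, // raw file descriptor
    pub readable: bool,
    pub writable: bool,
}

/// A portable facade that epoll, kqueue, and poll/select backends
/// could each implement.
pub trait Poller {
    type Error;

    /// Start watching `fd` for read and/or write readiness.
    fn register(&mut self, fd: i32, read: bool, write: bool) -> Result<(), Self::Error>;

    /// Stop watching `fd`.
    fn deregister(&mut self, fd: i32) -> Result<(), Self::Error>;

    /// Block until events arrive or `timeout` elapses; append them to
    /// `events` and return how many were added.
    fn poll(&mut self, events: &mut Vec<Event>, timeout: Option<Duration>)
        -> Result<usize, Self::Error>;
}
```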

There are obviously downsides to this approach, but they are known (from wide usage in both .NET and Clojure's async implementation, which is similar).

This can all exist as libraries in cargo.

Currently the only dependency on Rust IO is native::io::file::fd_t.

My hope is, from that point, to submit RFCs to canonize the libraries from 1 and 2, which will allow us to provide async and await keywords in the Rust syntax, and do the code generation in a preprocessing step. If it makes it into 1.0, great; if not, oh well.

I have completed most of step 1 (epoll and the traits). I hope to have step 2 completed (as a PoC) by the end of the week. Also note that this is not just a hobby project; I will be committing time to this at my day job as well. We've deemed async IO in Rust important to our success, so our effort will be to ensure that this code meets stability and documentation standards suitable for libraries.

@pcwalton
Contributor

pcwalton commented Sep 3, 2014

@rrichardson Exciting! I'm stoked to see where your work goes.

@geertj

geertj commented Sep 3, 2014

@rrichardson I'm curious to see what your poll trait looks like, and how you unify epoll/kqueue, which are readiness-based, with IOCP, which is completion-based. I think the only way it can be done is to make the API itself completion-based, as libuv has done.

@wycats
Contributor

wycats commented Sep 3, 2014

@geertj @carllerche and I are working on (yet another 😉) IO library (mio). Our plan is to unify the readiness model by providing a slab buffer on Windows. "Ready to read" would mean that there is data in the slab to read, and "ready to write" would mean that there is available space in the slab to write to.

In general, our goals for mio are zero-allocations, an efficient implementation of multiplexing across multiple "blocking" operations, and a portable readiness model.

We chose to unify around the readiness model because certain kinds of applications (high performance proxies) simply require it to get the optimal performance and zero allocations. It is also possible to build higher level abstractions on top of the readiness model, including callback-based APIs, futures, streams, and even APIs based on shallow or stackful coroutines.

The short version is that for optimal performance and minimal buffers in high-performance situations, you need the readiness model, and it doesn't preclude higher-level abstractions built on top. So why not 😄
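
A toy sketch of the slab idea (invented types, not mio's actual internals): an IOCP completion deposits bytes into a preallocated slot, "readable" just means the slot is non-empty, and the user-facing read is a plain copy:

```rust
// Toy model of readiness emulation over a completion API.
struct SlabSlot {
    buf: Box<[u8]>, // preallocated once; no per-operation allocation
    filled: usize,  // bytes deposited by a completed overlapped read
}

impl SlabSlot {
    /// "Ready to read" simply means completed bytes are sitting in the slab.
    fn is_readable(&self) -> bool {
        self.filled > 0
    }

    /// A readiness-style, non-blocking read: copy out of the slab.
    fn read(&mut self, out: &mut [u8]) -> usize {
        let n = out.len().min(self.filled);
        out[..n].copy_from_slice(&self.buf[..n]);
        self.buf.copy_within(n..self.filled, 0); // shift the remainder down
        self.filled -= n;
        n
    }
}
```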

@Ericson2314
Contributor

@aturon Thanks for the clarification. It sounds like libnative will essentially be the libplatform (at least as far as IO is concerned) asked for in #185 --- which is awesome. Glad to see you'd like to move those traits too. I eagerly await seeing how this turns out!

@geertj

geertj commented Sep 3, 2014

@wycats isn't a slab buffer incompatible with zero allocation? I assume that you'd use IOCP to write into the slab buffer on Windows? And so a subsequent read() would copy from the buffer?

Also I'd be curious to see how efficient flow control will be (I have no idea myself, just wondering if you frequently need to start/continue IO when the buffer gets full and what the impact of this is).

Finally, were you aware of https://github.com/piscisaureus/epoll_windows? It's been moved into libuv and has been sitting there for the last 2 years as a backend for uv_poll. I am not sure how suitable it is, as it's not always available (but usually it is). Anyway, it promises an efficient, readiness-based model on Windows too, by hooking into some super low-level APIs (the same APIs that I assume WSAPoll and select() are using as a backend).

@rrichardson

@geertj TBQH I am not even attempting to address completion vs readiness. This is definitely a readiness API. When I mentioned someone contributing IOCP I was being very optimistic.

In cases where the number of FDs to be polled is less than approximately 500, poll is actually more efficient than epoll. My plan is to provide a poll interface which could easily match WSAPoll. But really, I haven't cared about Windows since I moved from Redmond to Manhattan 8 years ago ;)

I actually think the completion model is superior to the readiness model, but who am I to judge? I just need to make this work on Linux.

Interesting, though, about epoll_windows: perhaps someone can distill that notion into a pure epoll abstraction directly over IOCP. **edit**: looking at the code, I now see that it already is a pure epoll abstraction directly over IOCP. This seems like the best possible route to get async IO running on Windows... if someone were so inclined.

@rrichardson

I was being a bit flippant about Windows support above, but I just want to point out that deciding not to support IOCP results in a much, much smaller and less complex codebase. About 10x less, I would guess. A poll/epoll/kqueue abstraction is dead simple; you're basically just standardizing function and enum/event names.

I don't consider Windows a target for massively wide-scaling IO, and therefore don't see a reason to go through the hassle. WSAPoll could handily support a few hundred sockets. If someone has a use case for something larger, I am all ears.

@wycats
Contributor

wycats commented Sep 3, 2014

@geertj

isn't a slab buffer incompatible with zero allocation?

Zero amortized allocation 😄 You allocate the slab once for the lifetime of your application and then you're done.

Also I'd be curious to see how efficient flow control will be (I have no idea myself, just wondering if you frequently need to start/continue IO when the buffer gets full and what the impact of this is).

It's definitely an open question. We'll see!

Anyway, it promises an efficient, readiness-based model on Windows too, by hooking into some super low-level APIs (the same APIs that I assume WSAPoll and select() are using as a backend).

I was not! I'll read through the code and see how it works.

@carllerche
Member

@geertj I am targeting zero allocations at runtime, and optimizing for POSIX over Windows (though Windows will be quite performant). MIO is going to be a low-level abstraction that allows higher-level abstractions to be built on top of it. On Windows, the buffer slab will be preallocated, and filling the slab will be handled gracefully. The user of MIO will not have to worry about these details, though.

@wycats
Contributor

wycats commented Sep 3, 2014

I actually think the completion model is superior to the readiness model, but who am I to judge? I just need to make this work on Linux.

The completion model has fundamental limitations that make it very difficult to implement highest-performance streaming (for example proxies). In particular, since writing is async, and you may have multiple pending write operations, you need at least one buffer per concurrent write operation. In practice, people often end up with one buffer per pending read and one buffer per pending write.

The readiness model is well-optimized for reusing stack buffers because the reads and writes are synchronous (and non-blocking) once readiness is established. In Rust parlance, this means that the readiness model doesn't require transferring buffer ownership while the completion model does.
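
In Rust terms, the ownership contrast might be sketched like this (hypothetical traits, purely illustrative):

```rust
use std::io;

// Readiness style: once the socket is known to be ready, the read is a
// synchronous, non-blocking call that merely *borrows* a caller-owned
// (possibly stack-allocated) buffer.
trait ReadinessRead {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize>;
}

// Completion style: the operation is still in flight when the call
// returns, so the buffer's *ownership* must move into the operation and
// only comes back when the completion fires: one live buffer per
// pending operation.
trait CompletionRead {
    fn read(
        &mut self,
        buf: Vec<u8>,
        on_done: Box<dyn FnOnce(io::Result<usize>, Vec<u8>) + Send>,
    );
}
```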

@pcwalton
Contributor

I don't believe Golang has yields in loop back edges, although it does have yields in function preambles (now—it didn't originally), and green threads work quite well there. Because of Golang's experience (and ours in e.g. Servo) I'm not convinced that they are necessary for green threads to work.

@nathanaschbacher

I'm having a bit of a difficult time following the contention over the lightweight-tasks model as a concurrency primitive.

It seems like the favorable view is that it presents a unified concurrency API to the developer regardless of whether they want LWTs or dedicated 1:1 Task:Thread behavior. This seems like a pretty useful unified abstraction to me. This is a position I'm sympathetic to as it's one of the things I really like about Erlang and Go.

The negative view of this model seems to be that it's trivially easy to block the green threads' (LWT) real underlying thread pool, potentially exhausting the pool in the absence of a pre-empting scheduler or some auto-inlined yields, and ultimately defeating the supposed purpose of the LWT model. This is also a position I'm sympathetic to, because it's trivially easy to induce this same behavior in Erlang when running code in a NIF, since the Erlang scheduler can't preempt code executing in a NIF context.

What I don't understand is why there's a conflict pushing the exclusion of one for the other. They're both essentially fine, precisely because Rust allows an escape hatch by letting end users decide which scheduler to run any particular task on. Erlang got a similar mechanism in R17 with the dirty-schedulers work. You can intentionally schedule your native code to run on a thread pool that's outside the general VM scheduler pool so that you don't cause long-running/high-throughput native execution or legacy blocking IO to impede the execution and preemption of regular Erlang LWPs. Before, the only way to do this was to maintain your own long-running thread pool on the C side of the NIF boundary and do a lot of monkeying around with resource maintenance as you go back and forth across that boundary, or just abandon using the NIF interface entirely and go with a C Port Driver instead (which is much more callback-y).

The fact that Rust already essentially offers this flexibility, by allowing you to choose which scheduler to run a given task on (libnative or libgreen), means that I as the end user can decide when to make a bunch of lightweight requests that I know are going to be low-latency (libgreen), or when to favor high-throughput or blocking IO calls on a dedicated, quarantined thread pool (libnative). All while consuming the exact same concurrency API... message passing between tasks.

There's a whole different discussion about how concerns of libnative and libgreen leak into IO library implementation. One that, having not written an IO library in Rust, I'm ill suited to discuss in-depth, but that seems like a sorta weird concern to warrant throwing out the lightweight-task baby with the implementation complexity bathwater. Especially since you're basically just sacrificing one sort of end user (people consuming a unified concurrency API) at the altar of another sort of user (IO library makers).

Am I missing something?

@thestinger

@nathanaschbacher:

I'm having a bit of a difficult time following the contention over the lightweight-tasks model as a concurrency primitive.

None of this has to do with lightweight threads. Green threads are just as heavy as native ones, since the memory slab used as the stack is by far the most dominant resource. In fact, the current implementation of green threads is heavier than native threads because of the overhead from libuv. It uses a completion-based API, which implies a lot of memory allocation behind the scenes. It also needs to spin up a thread pool for anything not covered by OS AIO APIs, like stat.

On Linux, it doesn't even know how to use the kernel's AIO implementation and needs the thread pool even for normal file IO, but that could be fixed. It does mean that it's pretty hard to take libuv seriously as a backend for a performance oriented language, among other issues like total lack of support for a modern multi-threaded event loop.

It seems like the favorable view is that it presents a unified concurrency API to the developer regardless of whether they want LWTs or dedicated 1:1 Task:Thread behavior. This seems like a pretty useful unified abstraction to me. This is a position I'm sympathetic to as it's one of the things I really like about Erlang and Go.

Another way of phrasing this is that Rust has a crippled concurrency and I/O API. It compares very unfavourably to languages without the limitations brought on by green threads. Every feature implemented for native threads also has to be implemented via libuv for green threads, and libuv is very far from being up to the task. It makes maintenance much harder, and both libuv and the Rust side of the code will be an endless stream of security vulnerabilities. It's a large library written in a memory-unsafe language.

What I don't understand is why there's a conflict pushing the exclusion of one for the other. They're both essentially fine, precisely because Rust allows an escape hatch by letting end users decide which scheduler to run any particular task on. Erlang got a similar mechanism in R17 with the dirty-schedulers work. You can intentionally schedule your native code to run on a thread pool that's outside the general VM scheduler pool so that you don't cause long-running/high-throughput native execution or legacy blocking IO to impede the execution and preemption of regular Erlang LWPs. Before, the only way to do this was to maintain your own long-running thread pool on the C side of the NIF boundary and do a lot of monkeying around with resource maintenance as you go back and forth across that boundary, or just abandon using the NIF interface entirely and go with a C Port Driver instead (which is much more callback-y).

Rust's green threads are slower than native threads in every vaguely real-world benchmark. There is simply no demonstrated use case for the feature. Unlike Erlang, Rust has no pre-emption for green threads, because it's infeasible to implement for native code, and avoiding pre-emption overhead is the only advantage green threads have over native ones. The compiler also doesn't take the route of inserting yield points, because it would have far too much overhead. Rust is a systems language, and it's not going to make that kind of performance sacrifice.

The fact that Rust already essentially offers this flexibility, by allowing you to choose which scheduler to run a given task on (libnative or libgreen), means that I as the end user can decide when to make a bunch of lightweight requests that I know are going to be low-latency (libgreen), or when to favor high-throughput or blocking IO calls on a dedicated, quarantined thread pool (libnative). All while consuming the exact same concurrency API... message passing between tasks.

Green tasks are anything but low-latency. The scheduler can't do pre-emption and doesn't make any attempt to implement fairness. If you need low latency, libgreen is the last thing you want to use. Resource usage also has nothing to do with it; they're not lighter.

There's a whole different discussion about how concerns of libnative and libgreen leak into IO library implementation. One that, having not written an IO library in Rust, I'm ill suited to discuss in-depth, but that seems like a sorta weird concern to warrant throwing out the lightweight-task baby with the implementation complexity bathwater. Especially since you're basically just sacrificing one sort of end user (people consuming a unified concurrency API) at the altar of another sort of user (IO library makers).

Every I/O and concurrency method goes through a virtual function table in order to support mixed green and native threads at runtime. This adds significant overhead at runtime and also results in massive code bloat. Instead of a small program created with -Z lto being 4-10kiB, it's nearly a megabyte. Rust also lacks support for fast thread-local storage since the OS implementation provided by the compiler / linker only works for native threads. The current task-local storage is more than one order of magnitude slower than proper thread local storage. The API is also crippled due to needing a separate implementation on top of libuv for any feature that's added.
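
A small illustration of the dispatch point (an invented trait, not the actual runtime interface): calls through a trait object are opaque indirect calls, while statically dispatched calls monomorphize and can be inlined:

```rust
trait Stream {
    fn write(&mut self, buf: &[u8]) -> Result<usize, ()>;
}

// Runtime-selected backend (green or native): every write is an indirect
// call through a vtable, opaque to the inliner.
fn write_dyn(stream: &mut dyn Stream, buf: &[u8]) -> Result<usize, ()> {
    stream.write(buf)
}

// Statically known backend: monomorphized per concrete type, so the call
// can be inlined and unused backends drop out of the binary entirely.
fn write_static<S: Stream>(stream: &mut S, buf: &[u8]) -> Result<usize, ()> {
    stream.write(buf)
}
```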

Am I missing something?

Yes. Many of the things I've stated in this comment are part of the RFC text already.

@thestinger

@pcwalton: Go's approach to the problem is pretending it doesn't exist. That's not suitable for programming large systems, where a non-deterministic issue heavily dependent on the workload would be very hard to debug and eliminate from the software. It's a bigger problem for libgreen because it's unable to move an ongoing I/O request to another scheduler. Blocking a scheduler in Rust implies blocking any ongoing I/O requests indefinitely, so it cannot be used for writing correct software without judiciously inserting yield checks. That could be fixed by abandoning libuv, but there seems to be no plan to do that.

Go has a tendency to sweep the edge cases under the rug since they only care about good enough. It's designed for Google's use cases where unpredictable latency and less than perfect reliability / safety are acceptable because they're able to throw lots and lots of hardware at every problem and distributed systems make individual failures more likely and less important. It has latency issues both due to the global garbage collector and a naive scheduler without fairness / pre-emption.

@wycats
Contributor

wycats commented Sep 4, 2014

@thestinger I find your lengthy comment to be well-reasoned, reasonably complete and accurate. Thanks 😄

@andrew-d

andrew-d commented Sep 4, 2014

FWIW, the silly little syntax extension from above now supports "yielding" inside loop bodies. Not sure how useful something like this would be, especially since it only really works if you control the source of everything that might loop or call a function, but it's at least a PoC that something potentially useful could be done in a third-party syntax extension.

@nathanaschbacher

@thestinger

Most of the issues seem like a scathing indictment of the implementation and the choice of tethering that implementation to libuv (which is admittedly an odd choice, FWIW).

Though I'm not sure I follow the logic on green threads being slower. Of course they're slower. They ought to be. They're just supposed to have a lower resource, creation, and destruction footprint. Erlang LWPs are slower than fully occupying a scheduler too, because you can get descheduled, get thrown to the back of a work queue, have the overhead of reduction counting, and the work involved in sleeping/waking schedulers. I mean, there's a pretty obvious tradeoff choice to be made, one which you decide between when choosing the io_mode for, say, Riak's Bitcask backend. Running in "erlang" mode screws with the VM schedulers significantly less, and running in "nif" mode can produce sometimes considerable throughput gains, with associated risks.

That they're slower isn't a problem per se, but that they're not actually lightweight does beg the question of... WTF? That makes the only benefit they could even plausibly provide lower context-switching overhead, though that seems unlikely in this case.

Having tasks, both lightweight and not-so-lightweight, be the unifying concurrency API still seems like a worthy goal. At this point is there enough time to create a runtime for Rust that can support both properly, given that libgreen/librustuv isn't it?

@thestinger

@nathanaschbacher:

How are you going to get lightweight tasks? Haskell doesn't have a contiguous call stack so it doesn't have this problem. Erlang is probably in a similar boat and as a managed language it can do pre-emption based on counting the approximate number of instructions executed. The performance is in a completely different ballpark than Rust so the same concerns don't apply.

Go is using relocatable stacks, which is not possible in Rust because it has unrestricted raw pointers and safe lightweight references. Segmented stacks turned out to be a massive performance problem and were dropped. Even the stack-space checks in function preludes alone have a fairly high overhead, and are being replaced with stack probes.

I think the only acceptable solution would be static stack space analysis. It would be very difficult to implement and it would place a lot of restrictions on coding style. This would require quite a bit of work on LLVM upstream.

Though I'm not sure I follow the logic on green threads being slower. Of course they're slower. They ought to be. They're just supposed to have a lower resource, creation, and destruction footprint. Erlang LWPs are slower than fully occupying a scheduler too, because you can get descheduled, get thrown to the back of a work queue, have the overhead of reduction counting, and the work involved in sleeping/waking schedulers.

I/O operations from a green task are significantly slower, because completion-based AIO is slow. It implies thread pool synchronization, memory allocation and far more system calls. It will never compete with direct usage of blocking or non-blocking IO in terms of raw performance, and it won't scale nearly as well as direct usage of non-blocking IO / AIO. There are other high-level abstractions able to preserve more of the performance / scalability. I'm not talking about anything to do with scheduling. I/O in the M:N threading model is significantly slower even if it ends up having 1 scheduler thread for every green thread.

@thestinger

Automatically inserted yields, segmented stacks, dynamic dispatch on blocking calls and slow TLS are never going to be sane sacrifices for a systems language to make. Green threads can still be implemented in a third party library, and without any of those sacrifices there's actually no benefit to it being integrated into the standard library. If static stack space analysis is ever implemented, it can be exposed via an intrinsic giving the worst-case stack usage of a function pointer.

@reem

reem commented Sep 4, 2014

@thestinger I'm curious, do you think there is a future for green threads in Rust ever? By green threads I mean a task-like primitive which is lighter-weight than a full OS thread. If so, what does that primitive look like?

@thestinger

I'm only interested in low-level / efficient support for non-blocking IO and AIO along with libraries and possibly language features building abstractions over those. I don't think green threads in Rust are ever going to be comparable to a language like Erlang and I don't see the point in offering a barely useful feature that's never going to be competitive with other languages.

If it was rewritten on top of lower level primitives rather than libuv and there was stack space analysis, then it could actually offer the lightweight feature but code written without them would still be significantly more efficient. Dealing with the strict limitations of stack space analysis would be quite painful too. Without automatically inserted yields and the dynamic dispatch system, it's just as easy to screw up with green threads as it is with a normal event loop. There's the same issue of libraries performing work without yields, either via loops / recursion, a blocking system call or page faults.

@nathanaschbacher

@thestinger

There seems to be a misconception that green threads make life easier because you don't need to worry about blocking an event loop. The fact is that without automatically inserted yields and the current dynamic dispatch system, it's not actually any easier than dealing with a normal event loop via high-level abstractions.

This would seem like a misconception only in use by people who've never used any lightweight process system to any depth. At some point they all have an edge case around long-blocking processes someplace that you have to design around (recall R17's dirty schedulers).

The point of such systems to me is the API.

@reem

You can implement similar-looking things in userland with execution/work queues containing closures, scheduling the queues across a set of consumer threads that invoke the closures, which is what I'm gathering @thestinger would prefer to see done, given the complexity of making a lower-level solution in Rust that isn't fraught with peril.

@thestinger

Rust's green threads are an I/O feature, they're not designed for CPU-bound work like work queues / task trees. The alternative to green threads is using non-blocking I/O directly or using a high-level abstraction like async/await or some sort of reactor pattern (Boost ASIO). CPU-bound work doesn't block on system calls so there is no need for anything more than work queues of closures and it's a first class citizen without any additional compiler support.
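
A minimal sketch of that work-queue alternative, assuming nothing beyond the standard library:

```rust
use std::sync::mpsc;
use std::thread;

// Work queue of closures: CPU-bound jobs need no scheduler hooks or
// compiler support, just threads and a channel.
fn main() {
    let (tx, rx) = mpsc::channel::<Box<dyn FnOnce() + Send>>();

    let worker = thread::spawn(move || {
        // The receiver iterates until every sender is dropped.
        for job in rx {
            job();
        }
    });

    tx.send(Box::new(|| println!("job 1"))).unwrap();
    tx.send(Box::new(|| println!("job 2"))).unwrap();

    drop(tx); // close the queue so the worker exits
    worker.join().unwrap();
}
```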

@arielb1
Contributor

arielb1 commented Sep 4, 2014

@thestinger

Having preëmption-by-default with green threads forces you to worry about atomicity and add locks, so this isn't clearly a win: if your workload is inherently sequential you may have to add a big lock around everything and still have infinite loops hanging your program. The typical solution to this problem is a watchdog.

@Ericson2314
Contributor

@nathanaschbacher
As someone partial to the idea of green threads, I think the basic problem is that they thrive in a language with a very different cost model than Rust. Green threads work well with pervasive tracing GC, segmented stacks, etc., and generally with choices that sacrifice low latency for throughput. Rust, with its costly frees, contiguous stacks, etc., generally goes for low latency over throughput.

So just as native IO is bad for Haskell, green threads probably aren't the way to go for Rust. If you think of the choices I've mentioned as a 2^3 discrete design space, Haskell and Rust inhabit two local maxima, and the other combinations are probably worse than both.

Eventually, I'd love for Rust to be flexible enough to try both ends of that design space -- e.g. we will be able to write a tracing GC in a library that plugs in to compiler-generated stack maps, and enable or disable segmented stacks. But Rust is not there yet.

This RFC is no doubt a temporary setback for green threads, but working with less machinery on top of the OS IO primitives will give us more flexibility long term. It will also make writing a Rust exokernel easier, which, aside from being a personal goal of mine, will allow experiments in the IO design space that are not possible on top of Unix or Windows.

@thestinger

Contiguous stacks are only a good thing for throughput, which is why Rust dropped segmented stacks. The design of jemalloc isn't really a sacrifice of throughput compared to a garbage collector either. In fact, it amortizes the cost of allocation/deallocation to O(1) on average by performing incremental garbage collection on thread caches. Full static knowledge of when data can be freed isn't a bad thing and it doesn't imply that the allocator actually has to do something beyond writing 2 words. Paying a cost for tracking ownership via reference counting is the exception in Rust, not the norm.

@nathanaschbacher

@Ericson2314

I guess that makes some sense to me, save that the whole point of LWPs and reduction-based preemption in Erlang is to facilitate its original design goal of making soft-realtime systems: systems which favor latency over throughput. So I think you may have your performance axes reversed.

@thestinger

If LWTs can't actually be lightweight, then I'm begrudgingly in favor of punting libgreen to the hinterlands. I don't care as much that they can be blocked and not pre-empted, since I know I can put tasks at risk of blocking onto their own native threads. The former seems like a defensible reason to accept the RFC, the latter not as much.

I'm loathe to have to implement my own user-land version of this functionality, but inevitably I will if I have to. :-/

@arthurprs

Green threads shouldn't be preemptive-ish; otherwise the overhead is far too great (despite my earlier thoughts). Golang is the major example, but there are others: D's vibe.d, Python's gevent, etc.

@Ericson2314
Contributor

@thestinger
Since tracing's time complexity is a function of the number of live objects, while free is called on every dead object, I'm a bit suspicious of most algorithmic-analysis arguments, as it's technically apples and oranges. That said, I know nothing about jemalloc, and I absolutely agree that Rust's static lifetimes mean all arguments for GC need to be revised. Is there any paper or something similar that describes jemalloc? I'd love to read it.

@thestinger

@Ericson2314: There's no paper on the current jemalloc design that I'm aware of. There's some good documentation of the internals in the man page.

There's no connection between the number of allocations and the time complexity of free, since it can find the metadata for every non-huge allocation in O(1) and then mark it as free in the run's bitmap. The common case, though, is pushing it onto the thread-local free list for that size class, and the thread cache is later flushed out to the arena.

The main costs in the average case where it's not allocating more memory from the OS or doing the lazy dirty page purging are just constant time ones, such as the branch for supporting valgrind, a branch for supporting a user-specified arena, etc. It can all be optimized better in the future.

@Ericson2314
Contributor

Thanks for the link, the man page was an interesting read.

This has little to do with the RFC, but for posterity: notice I said "free is called on every dead object". While free is called once per dead object, one trace can clean a huge number of dead objects. Assuming free is not only O(1) in the common case but also O(1) amortized, the time spent to free all dead objects at a given point is O(dead objects), while the time spent to do the same thing with (one) tracing GC call is O(live objects).

@aturon
Member Author

aturon commented Sep 5, 2014

An update on this RFC: after the discussion on this thread and further discussion with the core team, we've decided to close this RFC and open a new one that:

  1. Lays out the design space, vision, and priorities more clearly, and
  2. More aggressively removes all of the Runtime features.

I will be overhauling the text and posting a new RFC in the near future, and will post a link to it once available.

aturon closed this Sep 5, 2014
@bnoordhuis

On Linux, it doesn't even know how to use the kernel's AIO implementation and needs the thread pool even for normal file IO, but that could be fixed.

Sorry, I realize I'm a bit late to the party. The flippant response to your remark is "no, it can't." :-)

I've looked hard and long at the io_*() family of functions and I have regrettably come to the conclusion that they are broken to the extent that native AIO is only usable under very controlled conditions. It might be an option when the file system and kernel are known but not in general: the list of bugs that AIO has suffered over time is long and worrisome, from silently performing blocking I/O to memory leaks and data corruption.

"When the kernel is known" doesn't mean that version sniffing is sufficient. Vendor kernels are patchworks of forward-ported and back-ported changesets where the version number is essentially meaningless. You would have to perform feature detection but that is a hard problem. The presence of AIO support is easily established but the presence of reliable AIO is not nearly as simple.

Summary: Linux native AIO has been carefully considered but found lacking.

It does mean that it's pretty hard to take libuv seriously as a backend for a performance oriented language

A word of admonishment: you were making unwarranted assumptions. I'm not mad but it doesn't do you credit either.

among other issues like total lack of support for a modern multi-threaded event loop.

That was a deliberate design decision (with added alliterative appeal). If it's an issue for you, you should bring it up on the libuv mailing list - ditto for AIO - but please check the mailing list archives first, both subjects have been discussed before.

@thestinger

"When the kernel is known" doesn't mean that version sniffing is sufficient. Vendor kernels are patchworks of forward-ported and back-ported changesets where the version number is essentially meaningless. You would have to perform feature detection but that is a hard problem. The presence of AIO support is easily established but the presence of reliable AIO is not nearly as simple.

Version detection works fine as a conservative estimate.

Summary: Linux native AIO has been carefully considered but found lacking.

It has been good enough for lighttpd and nginx for years, although they still have the blocking IO backend because it's often (usually?) faster than non-blocking AIO operations. It's always going to be way faster and lighter than farming out the work to a thread pool though.

A word of admonishment: you were making unwarranted assumptions. I'm not mad but it doesn't do you credit either.

I don't think they're unwarranted assumptions. It also doesn't make use of the signalfd/timerfd family of functions. It's not the weakest link when it's being used by node.js, which has no threads and isn't a lightweight systems language, but it is the weakest link for Rust.

That was a deliberate design decision (with added alliterative appeal).

It's not a bad design decision, but it means it's quite broken when used in the context of M:N threading. Rust has to pin an ongoing I/O request to the scheduler it was started on, so work stealing doesn't work as expected and a request can time out if a specific scheduler is too busy / blocked.

@aturon
Member Author

aturon commented Sep 9, 2014

The new RFC has been posted.
