-
Notifications
You must be signed in to change notification settings - Fork 247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
greenlet.switch triggers address sanitizer #113
Comments
I notice you have v8 on your stack. As far as I know greenlet is fundamentally incompatible with v8 (or any C++ code that shares pointers to objects on the stack globally). The reason for this is that when greenlets switch the stack is overwritten back and forth and objects become temporarily invalid until you switch back to that greenlet, thus any access to shared objects becomes a hazard. In this instance there might be another issue in that greenlet truly does overwrite the stack and it might not look much different from a buffer overflow to a sanitizer. You probably need to patch address sanitizer to teach it how to deal with stack switching in greenlet. |
I know that greenlet is incompatible with V8. This is not just about V8 though, it also happens when V8 is not involved. The asan runtime includes functions that stack switchers can call to shut up these messages, you could probably use those. |
Regarding V8 compatibility, is it theoretically possible to reimplement greenlet on top of libcoro? node-fibers uses that, and it works. |
My big concerns are stack sizes, switch performance and platform compatibility. For me a big appeal of greenlet was that it doesn't use dedicated stacks, you only pay for the delta of used stack space which makes it possible to create millions of greenlets and they cost very little. Next, swapcontext is very slow and needs a lot of space for all that state, but dedicated stacks mean you need to worry about their sizes. If the size is too small (Coro on perl uses something like 64-128kb, which is a lot btw) you risk running out and crashing, if the size is too big you'd have to use mmap for efficiency and then you run other risks, like running out of the maximum number of mmaps per process. Personally I was thinking about rewriting greenlet in cython and using boost::context for quite a while, however such a library wouldn't be called greenlet (maybe greenlet-context or whatever, since the number of supported platforms would be much smaller than what greenlet currently supports and what people surprisingly actually use), and I'm not interested in it that much. If you have spare time you may try going that route, however I wouldn't accept such a solution into the mainline greenlet, as it would have entirely different set of drawbacks than what people currently expect. |
I'm not sure those apply, since greenlet doesn't actually switch stacks, it reuses system stack by overwriting it with saved deltas. Currently it doesn't even know where the bottom or what the size of the stack is, so it doesn't look like those callbacks apply to what greenlet does. |
libcoro has a variety of different implementations:
You can choose at compile time. So I don't think platform support is a problem. About performance:
Stack space is definitely still an issue. FWIW libcoro includes stack allocation code that uses mmap if it's supported and malloc otherwise. I'd like to make a libcoro-based version of greenlet, but the problem is getting libraries such as gevent to use it. My choices are to either make the fork appear in sys.modules under the name greenlet (which I really don't want to do), or to get something merged into greenlet. If I could pull off something that lets you choose if you want to use libcoro on a per-coroutine basis, could that be merged, or would it be too disgusting? |
And you're right, the asan functions don't really apply to greenlet. They would apply to libcoro though. |
It's not about disgust, if you make a fork I would probably gladly start using it myself, and if you convince gevent and eventlet to use it, then even better! :) It would work better, would have less opportunities for crashing and would probably be a lot faster too. But as a maintainer I feel my primary responsibility is not breaking code for people that already use greenlet for whatever scenario. And scenarios for using greenlet are different for different people too, just 4-5 years ago I had something like 100k greenlets sloshing around in a single process and I have serious doubts any mmap based solution would have worked for me back then. Now that Go is more mature I would have preferred to use Go, and wouldn't want to do anything like that in Python ever again, so I personally wouldn't be affected. However I'm sure there's someone out there who's doing something crazy like that right now and mmap based solution would stop them from succeeding. Making a patch that makes the new mode optional (behind something like I think if you make a new library and it's a lot faster (and trust me it will be, 6500ns on benchmarks/chain.py shouldn't be hard to beat), then people would recognize safety and a different set of tradeoffs are worth it. If you make using it as easy as |
Sounds like a fun weekend project. 😄 |
Make greenlets great again! |
@tbold, just curious, did anything happened on your side on this topic? Thanks beforehand for feedback, |
I made this: http://github.com/tbodt/greenstack. Try not to use it in production. |
- Move channels implementation to be done in C++ inside libgolang. The code and logic is based on previous Python-level channels implementation, but the new code is just C++ and does not depend on Python nor GIL at all, and so works without GIL if libgolang runtime works without GIL(*). (*) for example "thread" runtime works without GIL, while "gevent" runtime acquires GIL on every semaphore acquire. New channels implementation is located in δ(libgolang.cpp). - Provide low-level C channels API to the implementation. The low-level C API was inspired by Libtask[1] and Plan9/Libthread[2]. [1] Libtask: a Coroutine Library for C and Unix. https://swtch.com/libtask. [2] http://9p.io/magic/man2html/2/thread. - Provide high-level C++ channels API that provides type-safety and automatic channel lifetime management. Overview of C and C++ APIs are in δ(libgolang.h). - Expose C++ channels API at Pyx level as Cython/nogil API so that Cython programs could use channels with ease and without need to care about lifetime management and low-level details. Overview of Cython/nogil channels API is in δ(README.rst) and δ(_golang.pxd). - Turn Python channels to be tiny wrapper around chan<PyObject>. Implementation note: - gevent case needs special care because greenlet, which gevent uses, swaps coroutine stack from C stack to heap on coroutine park, and replaces that space on C stack with stack of activated coroutine copied back from heap. This way if an object on g's stack is accessed while g is parked it would be memory of another g's stack. The channels implementation explicitly cares about this issue so that stack -> * channel send, or * -> stack channel receive work correctly. It should be noted that greenlet approach, which it inherits from stackless, is not only a bit tricky, but also comes with overhead (stack <-> heap copy), and prevents a coroutine to migrate from 1 OS thread to another OS thread as that would change addresses of on-stack things for that coroutine. As the latter property prevents to use multiple CPUs even if the program / runtime are prepared to work without GIL, it would be more logical to change gevent/greenlet to use separate stack for each coroutine. That would remove stack <-> heap copy and the need for special care in channels implementation for stack - stack sends. Such approach should be possible to implement with e.g. swapcontext or similar mechanism, and a proof of concept of such work wrapped into greenlet-compatible API exists[3]. It would be good if at some point there would be a chance to explore such approach in Pygolang context. [3] python-greenlet/greenlet#113 (comment) and below Just this patch brings in the following speedup at Python level: (on [email protected]) thread runtime: name old time/op new time/op delta go 20.0µs ± 1% 15.6µs ± 1% -21.84% (p=0.000 n=10+10) chan 9.37µs ± 4% 2.89µs ± 6% -69.12% (p=0.000 n=10+10) select 20.2µs ± 4% 3.4µs ± 5% -83.20% (p=0.000 n=8+10) def 58.0ns ± 0% 60.0ns ± 0% +3.45% (p=0.000 n=8+10) func_def 43.8µs ± 1% 43.9µs ± 1% ~ (p=0.796 n=10+10) call 62.4ns ± 1% 63.5ns ± 1% +1.76% (p=0.001 n=10+10) func_call 1.06µs ± 1% 1.05µs ± 1% -0.63% (p=0.002 n=10+10) try_finally 136ns ± 0% 137ns ± 0% +0.74% (p=0.000 n=9+10) defer 2.28µs ± 1% 2.33µs ± 1% +2.34% (p=0.000 n=10+10) workgroup_empty 48.2µs ± 1% 34.1µs ± 2% -29.18% (p=0.000 n=9+10) workgroup_raise 58.9µs ± 1% 45.5µs ± 1% -22.74% (p=0.000 n=10+10) gevent runtime: name old time/op new time/op delta go 24.7µs ± 1% 15.9µs ± 1% -35.72% (p=0.000 n=9+9) chan 11.6µs ± 1% 7.3µs ± 1% -36.74% (p=0.000 n=10+10) select 22.5µs ± 1% 10.4µs ± 1% -53.73% (p=0.000 n=10+10) def 55.0ns ± 0% 55.0ns ± 0% ~ (all equal) func_def 43.6µs ± 1% 43.6µs ± 1% ~ (p=0.684 n=10+10) call 63.0ns ± 0% 64.0ns ± 0% +1.59% (p=0.000 n=10+10) func_call 1.06µs ± 1% 1.07µs ± 1% +0.45% (p=0.045 n=10+9) try_finally 135ns ± 0% 137ns ± 0% +1.48% (p=0.000 n=10+10) defer 2.31µs ± 1% 2.33µs ± 1% +0.89% (p=0.000 n=10+10) workgroup_empty 70.2µs ± 0% 55.8µs ± 0% -20.63% (p=0.000 n=10+10) workgroup_raise 90.3µs ± 0% 70.9µs ± 1% -21.51% (p=0.000 n=9+10) The whole Cython/nogil work - starting from 8fa3c15 (Start using Cython and providing Cython/nogil API) to this patch - brings in the following speedup at Python level: (on [email protected]) thread runtime: name old time/op new time/op delta go 92.9µs ± 1% 15.6µs ± 1% -83.16% (p=0.000 n=10+10) chan 13.9µs ± 1% 2.9µs ± 6% -79.14% (p=0.000 n=10+10) select 29.7µs ± 6% 3.4µs ± 5% -88.55% (p=0.000 n=10+10) def 57.0ns ± 0% 60.0ns ± 0% +5.26% (p=0.000 n=10+10) func_def 44.0µs ± 1% 43.9µs ± 1% ~ (p=0.055 n=10+10) call 63.5ns ± 1% 63.5ns ± 1% ~ (p=1.000 n=10+10) func_call 1.06µs ± 0% 1.05µs ± 1% -1.31% (p=0.000 n=10+10) try_finally 139ns ± 0% 137ns ± 0% -1.44% (p=0.000 n=10+10) defer 2.36µs ± 1% 2.33µs ± 1% -1.26% (p=0.000 n=10+10) workgroup_empty 98.4µs ± 1% 34.1µs ± 2% -65.32% (p=0.000 n=10+10) workgroup_raise 135µs ± 1% 46µs ± 1% -66.35% (p=0.000 n=10+10) gevent runtime: name old time/op new time/op delta go 68.8µs ± 1% 15.9µs ± 1% -76.91% (p=0.000 n=10+9) chan 14.8µs ± 1% 7.3µs ± 1% -50.67% (p=0.000 n=10+10) select 32.0µs ± 0% 10.4µs ± 1% -67.57% (p=0.000 n=10+10) def 58.0ns ± 0% 55.0ns ± 0% -5.17% (p=0.000 n=10+10) func_def 43.9µs ± 1% 43.6µs ± 1% -0.53% (p=0.035 n=10+10) call 63.5ns ± 1% 64.0ns ± 0% +0.79% (p=0.033 n=10+10) func_call 1.08µs ± 1% 1.07µs ± 1% -1.74% (p=0.000 n=10+9) try_finally 142ns ± 0% 137ns ± 0% -3.52% (p=0.000 n=10+10) defer 2.32µs ± 1% 2.33µs ± 1% +0.71% (p=0.005 n=10+10) workgroup_empty 90.3µs ± 0% 55.8µs ± 0% -38.26% (p=0.000 n=10+10) workgroup_raise 108µs ± 1% 71µs ± 1% -34.64% (p=0.000 n=10+10) This patch is the final patch in series to reach the goal of providing channels that could be used in Cython/nogil code. Cython/nogil channels work is dedicated to the memory of Вера Павловна Супрун[4]. [4] https://navytux.spb.ru/memory/%D0%A2%D1%91%D1%82%D1%8F%20%D0%92%D0%B5%D1%80%D0%B0.pdf#page=3
Here's the output:
Probably the most important line from that:
The text was updated successfully, but these errors were encountered: