greenlet.switch triggers address sanitizer #113

Closed
tbodt opened this issue Dec 1, 2016 · 13 comments

@tbodt

tbodt commented Dec 1, 2016

Here's the output:

==17897==ERROR: AddressSanitizer: stack-buffer-underflow on address 0x7fff5a996440 at pc 0x00010580e9d6 bp 0x7fff5a996340 sp 0x7fff5a995ae8
READ of size 17424 at 0x7fff5a996440 thread T0
    #0 0x10580e9d5  (libclang_rt.asan_osx_dynamic.dylib+0x479d5)
    #1 0x1101b1e05 in slp_save_state (greenlet.so+0x2e05)
    #2 0x1101b1d1c in slp_switch (greenlet.so+0x2d1c)
    #3 0x1101b1431 in g_switch (greenlet.so+0x2431)
    #4 0x1101b2005 in green_switch (greenlet.so+0x3005)
    #5 0x105445f03 in PyEval_EvalFrameEx ceval.c:4352
    #6 0x10543d757 in PyEval_EvalCodeEx ceval.c:3584
    #7 0x1052dcce3 in function_call funcobject.c:523
    #8 0x10526bf28 in PyObject_Call abstract.c:2547
    #9 0x10545aeb3 in PyEval_CallObjectWithKeywords ceval.c:4221
    <my code. snip>
    #23 0x105445f03 in PyEval_EvalFrameEx ceval.c:4352
    #24 0x10543d757 in PyEval_EvalCodeEx ceval.c:3584
    #25 0x1052dcce3 in function_call funcobject.c:523
    #26 0x10526bf28 in PyObject_Call abstract.c:2547
    #27 0x106db4649 in partial_call _functoolsmodule.c:197
    #28 0x10526bf28 in PyObject_Call abstract.c:2547
    #29 0x10545aeb3 in PyEval_CallObjectWithKeywords ceval.c:4221
    #30 0x1101b1b29 in g_initialstub (greenlet.so+0x2b29)
    #31 0x1101b13ac in g_switch (greenlet.so+0x23ac)
    #32 0x1101b2005 in green_switch (greenlet.so+0x3005)
    #33 0x105445f03 in PyEval_EvalFrameEx ceval.c:4352
    #34 0x10543d757 in PyEval_EvalCodeEx ceval.c:3584
    #35 0x1052dcce3 in function_call funcobject.c:523
    #36 0x10526bf28 in PyObject_Call abstract.c:2547
    #37 0x10545aeb3 in PyEval_CallObjectWithKeywords ceval.c:4221
    <more of my code. snip>
    #51 0x105445f03 in PyEval_EvalFrameEx ceval.c:4352
    #52 0x10543d757 in PyEval_EvalCodeEx ceval.c:3584
    #53 0x1052dcce3 in function_call funcobject.c:523
    #54 0x10526bf28 in PyObject_Call abstract.c:2547
    #55 0x106db4649 in partial_call _functoolsmodule.c:197
    #56 0x10526bf28 in PyObject_Call abstract.c:2547
    #57 0x10545aeb3 in PyEval_CallObjectWithKeywords ceval.c:4221
    #58 0x1101b1b29 in g_initialstub (greenlet.so+0x2b29)
    #59 0x1101b13ac in g_switch (greenlet.so+0x23ac)
    #60 0x1101b2005 in green_switch (greenlet.so+0x3005)
    #61 0x105445f03 in PyEval_EvalFrameEx ceval.c:4352
    #62 0x10543d757 in PyEval_EvalCodeEx ceval.c:3584
    #63 0x1052dcce3 in function_call funcobject.c:523
    #64 0x10526bf28 in PyObject_Call abstract.c:2547
    #65 0x10544870e in PyEval_EvalFrameEx ceval.c:4666
    #66 0x10543d757 in PyEval_EvalCodeEx ceval.c:3584
    #67 0x1052dcce3 in function_call funcobject.c:523
    #68 0x10526bf28 in PyObject_Call abstract.c:2547
    #69 0x10544870e in PyEval_EvalFrameEx ceval.c:4666
    #70 0x10545c18e in fast_function ceval.c:4437
    #71 0x10544596b in PyEval_EvalFrameEx ceval.c:4372
    #72 0x10543d757 in PyEval_EvalCodeEx ceval.c:3584
    #73 0x1052dcce3 in function_call funcobject.c:523
    #74 0x10526bf28 in PyObject_Call abstract.c:2547
    #75 0x105297b4c in instancemethod_call classobject.c:2602
    #76 0x10526bf28 in PyObject_Call abstract.c:2547
    #77 0x10545aeb3 in PyEval_CallObjectWithKeywords ceval.c:4221
    #78 0x10529183f in PyInstance_New classobject.c:581
    #79 0x10526bf28 in PyObject_Call abstract.c:2547
    #80 0x105445cac in PyEval_EvalFrameEx ceval.c:4569
    #81 0x10545c18e in fast_function ceval.c:4437
    #82 0x10544596b in PyEval_EvalFrameEx ceval.c:4372
    #83 0x10545c18e in fast_function ceval.c:4437
    #84 0x10544596b in PyEval_EvalFrameEx ceval.c:4372
    #85 0x10543d757 in PyEval_EvalCodeEx ceval.c:3584
    #86 0x10545bf6a in fast_function ceval.c:4447
    #87 0x10544596b in PyEval_EvalFrameEx ceval.c:4372
    #88 0x10545c18e in fast_function ceval.c:4437
    #89 0x10544596b in PyEval_EvalFrameEx ceval.c:4372
    #90 0x10543d757 in PyEval_EvalCodeEx ceval.c:3584
    #91 0x1052dcce3 in function_call funcobject.c:523
    #92 0x10526bf28 in PyObject_Call abstract.c:2547
    #93 0x105297b4c in instancemethod_call classobject.c:2602
    #94 0x10526bf28 in PyObject_Call abstract.c:2547
    #95 0x105381936 in slot_tp_call typeobject.c:5546
    #96 0x10526bf28 in PyObject_Call abstract.c:2547
    #97 0x105445cac in PyEval_EvalFrameEx ceval.c:4569
    #98 0x10545c18e in fast_function ceval.c:4437
    #99 0x10544596b in PyEval_EvalFrameEx ceval.c:4372
    #100 0x10543d757 in PyEval_EvalCodeEx ceval.c:3584
    #101 0x1052dcce3 in function_call funcobject.c:523
    #102 0x10526bf28 in PyObject_Call abstract.c:2547
    #103 0x10544870e in PyEval_EvalFrameEx ceval.c:4666
    #104 0x10545c18e in fast_function ceval.c:4437
    #105 0x10544596b in PyEval_EvalFrameEx ceval.c:4372
    #106 0x10543d757 in PyEval_EvalCodeEx ceval.c:3584
    #107 0x1052dcce3 in function_call funcobject.c:523
    #108 0x10526bf28 in PyObject_Call abstract.c:2547
    #109 0x105297b4c in instancemethod_call classobject.c:2602
    #110 0x10526bf28 in PyObject_Call abstract.c:2547
    #111 0x10545aeb3 in PyEval_CallObjectWithKeywords ceval.c:4221
    #112 0x10529183f in PyInstance_New classobject.c:581
    #113 0x10526bf28 in PyObject_Call abstract.c:2547
    #114 0x105445cac in PyEval_EvalFrameEx ceval.c:4569
    #115 0x10545c18e in fast_function ceval.c:4437
    #116 0x10544596b in PyEval_EvalFrameEx ceval.c:4372
    #117 0x10545c18e in fast_function ceval.c:4437
    #118 0x10544596b in PyEval_EvalFrameEx ceval.c:4372
    #119 0x10543d757 in PyEval_EvalCodeEx ceval.c:3584
    #120 0x10545bf6a in fast_function ceval.c:4447
    #121 0x10544596b in PyEval_EvalFrameEx ceval.c:4372
    #122 0x10545c18e in fast_function ceval.c:4437
    #123 0x10544596b in PyEval_EvalFrameEx ceval.c:4372
    #124 0x10543d757 in PyEval_EvalCodeEx ceval.c:3584
    #125 0x1052dcce3 in function_call funcobject.c:523
    #126 0x10526bf28 in PyObject_Call abstract.c:2547
    #127 0x105297b4c in instancemethod_call classobject.c:2602
    #128 0x10526bf28 in PyObject_Call abstract.c:2547
    #129 0x105381936 in slot_tp_call typeobject.c:5546
    #130 0x10526bf28 in PyObject_Call abstract.c:2547
    #131 0x10544870e in PyEval_EvalFrameEx ceval.c:4666
    #132 0x10543d757 in PyEval_EvalCodeEx ceval.c:3584
    #133 0x10545bf6a in fast_function ceval.c:4447
    #134 0x10544596b in PyEval_EvalFrameEx ceval.c:4372
    #135 0x10543d757 in PyEval_EvalCodeEx ceval.c:3584
    #136 0x1052dcce3 in function_call funcobject.c:523
    #137 0x10526bf28 in PyObject_Call abstract.c:2547
    #138 0x105297b4c in instancemethod_call classobject.c:2602
    #139 0x10526bf28 in PyObject_Call abstract.c:2547
    #140 0x10545aeb3 in PyEval_CallObjectWithKeywords ceval.c:4221
    #141 0x10529183f in PyInstance_New classobject.c:581
    #142 0x10526bf28 in PyObject_Call abstract.c:2547
    #143 0x105445cac in PyEval_EvalFrameEx ceval.c:4569
    #144 0x10543d757 in PyEval_EvalCodeEx ceval.c:3584
    #145 0x1052dcce3 in function_call funcobject.c:523
    #146 0x10526bf28 in PyObject_Call abstract.c:2547
    #147 0x10544870e in PyEval_EvalFrameEx ceval.c:4666
    #148 0x10543d757 in PyEval_EvalCodeEx ceval.c:3584
    #149 0x10545bf6a in fast_function ceval.c:4447
    #150 0x10544596b in PyEval_EvalFrameEx ceval.c:4372
    #151 0x10543d757 in PyEval_EvalCodeEx ceval.c:3584
    #152 0x10545bf6a in fast_function ceval.c:4447
    #153 0x10544596b in PyEval_EvalFrameEx ceval.c:4372
    #154 0x10543d757 in PyEval_EvalCodeEx ceval.c:3584
    #155 0x1052dcce3 in function_call funcobject.c:523
    #156 0x10526bf28 in PyObject_Call abstract.c:2547
    #157 0x10544870e in PyEval_EvalFrameEx ceval.c:4666
    #158 0x10545c18e in fast_function ceval.c:4437
    #159 0x10544596b in PyEval_EvalFrameEx ceval.c:4372
    #160 0x10543d757 in PyEval_EvalCodeEx ceval.c:3584
    #161 0x1052dcce3 in function_call funcobject.c:523
    #162 0x10526bf28 in PyObject_Call abstract.c:2547
    #163 0x105297b4c in instancemethod_call classobject.c:2602
    #164 0x10526bf28 in PyObject_Call abstract.c:2547
    #165 0x10545aeb3 in PyEval_CallObjectWithKeywords ceval.c:4221
    #166 0x10529183f in PyInstance_New classobject.c:581
    #167 0x10526bf28 in PyObject_Call abstract.c:2547
    #168 0x105445cac in PyEval_EvalFrameEx ceval.c:4569
    #169 0x10545c18e in fast_function ceval.c:4437
    #170 0x10544596b in PyEval_EvalFrameEx ceval.c:4372
    #171 0x10545c18e in fast_function ceval.c:4437
    #172 0x10544596b in PyEval_EvalFrameEx ceval.c:4372
    #173 0x10543d757 in PyEval_EvalCodeEx ceval.c:3584
    #174 0x10545bf6a in fast_function ceval.c:4447
    #175 0x10544596b in PyEval_EvalFrameEx ceval.c:4372
    #176 0x10545c18e in fast_function ceval.c:4437
    #177 0x10544596b in PyEval_EvalFrameEx ceval.c:4372
    #178 0x10543d757 in PyEval_EvalCodeEx ceval.c:3584
    #179 0x1052dcce3 in function_call funcobject.c:523
    #180 0x10526bf28 in PyObject_Call abstract.c:2547
    #181 0x105297b4c in instancemethod_call classobject.c:2602
    #182 0x10526bf28 in PyObject_Call abstract.c:2547
    #183 0x105381936 in slot_tp_call typeobject.c:5546
    #184 0x10526bf28 in PyObject_Call abstract.c:2547
    #185 0x105445cac in PyEval_EvalFrameEx ceval.c:4569
    #186 0x10543d757 in PyEval_EvalCodeEx ceval.c:3584
    #187 0x1052dcce3 in function_call funcobject.c:523
    #188 0x10526bf28 in PyObject_Call abstract.c:2547
    #189 0x10544870e in PyEval_EvalFrameEx ceval.c:4666
    #190 0x10545c18e in fast_function ceval.c:4437
    #191 0x10544596b in PyEval_EvalFrameEx ceval.c:4372
    #192 0x10543d757 in PyEval_EvalCodeEx ceval.c:3584
    #193 0x10545bf6a in fast_function ceval.c:4447
    #194 0x10544596b in PyEval_EvalFrameEx ceval.c:4372
    #195 0x10545c18e in fast_function ceval.c:4437
    #196 0x10544596b in PyEval_EvalFrameEx ceval.c:4372
    #197 0x10543d757 in PyEval_EvalCodeEx ceval.c:3584
    #198 0x1052dcce3 in function_call funcobject.c:523
    #199 0x10526bf28 in PyObject_Call abstract.c:2547
    #200 0x105297b4c in instancemethod_call classobject.c:2602
    #201 0x10526bf28 in PyObject_Call abstract.c:2547
    #202 0x105381936 in slot_tp_call typeobject.c:5546
    #203 0x10526bf28 in PyObject_Call abstract.c:2547
    #204 0x105445cac in PyEval_EvalFrameEx ceval.c:4569
    #205 0x10545c18e in fast_function ceval.c:4437
    #206 0x10544596b in PyEval_EvalFrameEx ceval.c:4372
    #207 0x10545c18e in fast_function ceval.c:4437
    #208 0x10544596b in PyEval_EvalFrameEx ceval.c:4372
    #209 0x10543d757 in PyEval_EvalCodeEx ceval.c:3584
    #210 0x1052dcce3 in function_call funcobject.c:523
    #211 0x10526bf28 in PyObject_Call abstract.c:2547
    #212 0x10544870e in PyEval_EvalFrameEx ceval.c:4666
    #213 0x10545c18e in fast_function ceval.c:4437
    #214 0x10544596b in PyEval_EvalFrameEx ceval.c:4372
    #215 0x10543d757 in PyEval_EvalCodeEx ceval.c:3584
    #216 0x10545bf6a in fast_function ceval.c:4447
    #217 0x10544596b in PyEval_EvalFrameEx ceval.c:4372
    #218 0x10545c18e in fast_function ceval.c:4437
    #219 0x10544596b in PyEval_EvalFrameEx ceval.c:4372
    #220 0x10543d757 in PyEval_EvalCodeEx ceval.c:3584
    #221 0x1052dcce3 in function_call funcobject.c:523
    #222 0x10526bf28 in PyObject_Call abstract.c:2547
    #223 0x105297b4c in instancemethod_call classobject.c:2602
    #224 0x10526bf28 in PyObject_Call abstract.c:2547
    #225 0x105381936 in slot_tp_call typeobject.c:5546
    #226 0x10526bf28 in PyObject_Call abstract.c:2547
    #227 0x105445cac in PyEval_EvalFrameEx ceval.c:4569
    #228 0x10543d757 in PyEval_EvalCodeEx ceval.c:3584
    #229 0x10545bf6a in fast_function ceval.c:4447
    #230 0x10544596b in PyEval_EvalFrameEx ceval.c:4372
    #231 0x10545c18e in fast_function ceval.c:4437
    #232 0x10544596b in PyEval_EvalFrameEx ceval.c:4372
    #233 0x10545c18e in fast_function ceval.c:4437
    #234 0x10544596b in PyEval_EvalFrameEx ceval.c:4372
    #235 0x10545c18e in fast_function ceval.c:4437
    #236 0x10544596b in PyEval_EvalFrameEx ceval.c:4372
    #237 0x10545c18e in fast_function ceval.c:4437
    #238 0x10544596b in PyEval_EvalFrameEx ceval.c:4372
    #239 0x10545c18e in fast_function ceval.c:4437
    #240 0x10544596b in PyEval_EvalFrameEx ceval.c:4372
    #241 0x10543d757 in PyEval_EvalCodeEx ceval.c:3584
    #242 0x10545bf6a in fast_function ceval.c:4447
    #243 0x10544596b in PyEval_EvalFrameEx ceval.c:4372
    #244 0x10543d757 in PyEval_EvalCodeEx ceval.c:3584
    #245 0x10543c271 in PyEval_EvalCode ceval.c:669
    #246 0x1054cb84c in PyRun_FileExFlags pythonrun.c:1376
    #247 0x1054caa58 in PyRun_SimpleFileExFlags pythonrun.c:948
    #248 0x105508d71 in Py_Main main.c:640
    #249 0x7fffc8cb6254 in start (libdyld.dylib+0x5254)

Address 0x7fff5a996440 is located in stack of thread T0 at offset 0 in frame
    #0 0x10543e1cf in PyEval_EvalFrameEx ceval.c:690

  This frame has 18 object(s):
    [32, 36) 'cf.i' <== Memory access at offset 0 partially underflows this variable
    [48, 56) 'str.i' <== Memory access at offset 0 partially underflows this variable
    [80, 84) 'cf113.i' <== Memory access at offset 0 partially underflows this variable
    [96, 104) 'type.i' <== Memory access at offset 0 partially underflows this variable
    [128, 136) 'value.i' <== Memory access at offset 0 partially underflows this variable
    [160, 168) 'traceback.i' <== Memory access at offset 0 partially underflows this variable
    [192, 200) 'ptype.i' <== Memory access at offset 0 partially underflows this variable
    [224, 232) 'pvalue.i' <== Memory access at offset 0 partially underflows this variable
    [256, 264) 'ptraceback.i' <== Memory access at offset 0 partially underflows this variable
    [288, 296) 'type.addr.i' <== Memory access at offset 0 partially underflows this variable
    [320, 328) 'value.addr.i' <== Memory access at offset 0 partially underflows this variable
    [352, 360) 'tb.addr.i' <== Memory access at offset 0 partially underflows this variable
    [384, 392) 'bounds.i' <== Memory access at offset 0 partially underflows this variable
    [416, 424) 'sp' <== Memory access at offset 0 partially underflows this variable
    [448, 456) 'sp5779' <== Memory access at offset 0 partially underflows this variable
    [480, 488) 'exc' <== Memory access at offset 0 partially underflows this variable
    [512, 520) 'val' <== Memory access at offset 0 partially underflows this variable
    [544, 552) 'tb' <== Memory access at offset 0 partially underflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-underflow (libclang_rt.asan_osx_dynamic.dylib+0x479d5)
Shadow bytes around the buggy address:
  0x1fffeb532c30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1fffeb532c40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1fffeb532c50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1fffeb532c60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1fffeb532c70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x1fffeb532c80: 00 00 00 00 00 00 00 00[f1]f1 f1 f1 04 f2 00 f2
  0x1fffeb532c90: f2 f2 04 f2 00 f2 f2 f2 00 f2 f2 f2 00 f2 f2 f2
  0x1fffeb532ca0: 00 f2 f2 f2 00 f2 f2 f2 00 f2 f2 f2 00 f2 f2 f2
  0x1fffeb532cb0: 00 f2 f2 f2 00 f2 f2 f2 00 f2 f2 f2 00 f2 f2 f2
  0x1fffeb532cc0: 00 f2 f2 f2 00 f2 f2 f2 00 f2 f2 f2 00 f3 f3 f3
  0x1fffeb532cd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==17897==ABORTING

Probably the most important line from that:

HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext

tbodt changed the title from "Greenlet does not work with address sanitizer" to "greenlet.switch triggers address sanitizer" on Dec 1, 2016

@snaury
Contributor

snaury commented Dec 1, 2016

I notice you have v8 on your stack. As far as I know greenlet is fundamentally incompatible with v8 (or any C++ code that shares pointers to objects on the stack globally). The reason for this is that when greenlets switch the stack is overwritten back and forth and objects become temporarily invalid until you switch back to that greenlet, thus any access to shared objects becomes a hazard.

In this instance there might be another issue in that greenlet truly does overwrite the stack and it might not look much different from a buffer overflow to a sanitizer. You probably need to patch address sanitizer to teach it how to deal with stack switching in greenlet.
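
To make the hazard described above concrete, here is a minimal toy model in Python. It is only an illustration of the "save a slice to the heap, overwrite the same stack slots" idea; `ToyGreenlet`, `switch`, and `c_stack` are hypothetical names and do not correspond to greenlet's real C implementation.

```python
# Toy model of stack slicing: one shared "C stack" that every coroutine reuses.
# Illustration only; these are not greenlet's real data structures.

STACK_SIZE = 16
c_stack = [None] * STACK_SIZE  # the single shared stack region


class ToyGreenlet:
    def __init__(self, name, frame_values):
        self.name = name
        self.saved = list(frame_values)  # heap copy of this coroutine's stack slice
        self.size = len(frame_values)


def switch(current, target):
    """Park `current` (stack -> heap) and activate `target` (heap -> stack)."""
    if current is not None:
        current.saved = c_stack[:current.size]  # save the used slice to the heap
    c_stack[:target.size] = target.saved        # overwrite the very same slots
    return target


a = ToyGreenlet("a", ["a-local-0", "a-local-1"])
b = ToyGreenlet("b", ["b-local-0", "b-local-1"])

current = switch(None, a)
shared_pointer = 1  # "address" of a's second local, published globally
seen_while_a_runs = c_stack[shared_pointer]

current = switch(current, b)  # a is parked, b now occupies the same slots
seen_while_b_runs = c_stack[shared_pointer]

current = switch(current, a)  # a is restored, the pointer is valid again
seen_after_return = c_stack[shared_pointer]

print(seen_while_a_runs, seen_while_b_runs, seen_after_return)
```

While `b` runs, the "pointer" that was taken inside `a` reads `b`'s data. That is the kind of access that is a hazard for code sharing stack addresses globally, and a bulk copy over those slots plausibly looks like an out-of-bounds access to a sanitizer.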

@tbodt
Author

tbodt commented Dec 1, 2016

I know that greenlet is incompatible with V8.

This is not just about V8 though, it also happens when V8 is not involved.

The asan runtime includes functions that stack switchers can call to shut up these messages, you could probably use those.

https://github.com/llvm-mirror/compiler-rt/blob/0b95585616bd28fc0b738289bcc5f7887d7c304e/include/sanitizer/common_interface_defs.h#L164-L184

@tbodt
Author

tbodt commented Dec 1, 2016

Regarding V8 compatibility, is it theoretically possible to reimplement greenlet on top of libcoro? node-fibers uses that, and it works.

@snaury
Contributor

snaury commented Dec 2, 2016

My big concerns are stack sizes, switch performance and platform compatibility. For me a big appeal of greenlet was that it doesn't use dedicated stacks, you only pay for the delta of used stack space which makes it possible to create millions of greenlets and they cost very little. Next, swapcontext is very slow and needs a lot of space for all that state, but dedicated stacks mean you need to worry about their sizes. If the size is too small (Coro on perl uses something like 64-128kb, which is a lot btw) you risk running out and crashing, if the size is too big you'd have to use mmap for efficiency and then you run other risks, like running out of the maximum number of mmaps per process.
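
A rough back-of-the-envelope sketch of the stack-size trade-off mentioned above. The 64 KiB figure is the low end of the range quoted in the comment; the 2 KiB average delta and the one million coroutine count are assumptions chosen purely for illustration, not measurements.

```python
KIB = 1024
GIB = 1024 ** 3


def dedicated_stack_bytes(n_coroutines, stack_size):
    """Address space reserved when every coroutine owns a fixed-size stack."""
    return n_coroutines * stack_size


def saved_delta_bytes(n_coroutines, avg_delta):
    """Heap used when only the in-use stack slice of each parked coroutine is copied out."""
    return n_coroutines * avg_delta


n = 1_000_000
fixed = dedicated_stack_bytes(n, 64 * KIB)  # 64 KiB: low end of the range quoted above
delta = saved_delta_bytes(n, 2 * KIB)       # 2 KiB: assumed average, purely illustrative

print(f"dedicated stacks: {fixed / GIB:.1f} GiB")  # 61.0 GiB
print(f"saved deltas:     {delta / GIB:.1f} GiB")  # 1.9 GiB
```

The dedicated-stack number is reserved address space; with mmap-backed stacks, physical memory is usually committed only for pages that are actually touched. Even so, one mapping per coroutine is where a per-process mapping limit can become the bottleneck.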

Personally I had been thinking for quite a while about rewriting greenlet in Cython on top of boost::context. However, such a library wouldn't be called greenlet (maybe greenlet-context or whatever), since the number of supported platforms would be much smaller than what greenlet currently supports and what people, surprisingly, actually use, and I'm not that interested in it. If you have spare time you may try going that route, but I wouldn't accept such a solution into mainline greenlet, as it would have an entirely different set of drawbacks than what people currently expect.

@snaury
Contributor

snaury commented Dec 2, 2016

> The asan runtime includes functions that stack switchers can call to shut up these messages, you could probably use those.

I'm not sure those apply, since greenlet doesn't actually switch stacks; it reuses the system stack by overwriting it with saved deltas. Currently it doesn't even know where the bottom of the stack is or what its size is, so it doesn't look like those callbacks fit what greenlet does.

@tbodt
Author

tbodt commented Dec 2, 2016

libcoro has a variety of different implementations:

  • Using assembler to switch esp, works on unix+windows and both 32 and 64 bit intel chips
  • Using a strange combination of sigaltstack and setjmp that works on any unix where those calls are available
  • Using ucontext on unix if supported
  • Using fibers on windows
  • If all else fails, using pthread

You can choose at compile time. So I don't think platform support is a problem.

About performance:

  • The assembler-based version uses the minimal amount of instructions to swap out esp and restore registers.
  • The sigaltstack/setjmp version is pretty slow to create coroutines, because it has to send a signal to itself to trigger sigaltstack creation, but longjmp probably isn't doing a lot more than the custom assembler is.
  • The rest are pretty slow, but the first two look to me like they would cover most platforms in use. Correct me if I'm wrong.

Stack space is definitely still an issue. FWIW libcoro includes stack allocation code that uses mmap if it's supported and malloc otherwise.

I'd like to make a libcoro-based version of greenlet, but the problem is getting libraries such as gevent to use it. My choices are to either make the fork appear in sys.modules under the name greenlet (which I really don't want to do), or to get something merged into greenlet. If I could pull off something that lets you choose if you want to use libcoro on a per-coroutine basis, could that be merged, or would it be too disgusting?

@tbodt
Author

tbodt commented Dec 2, 2016

And you're right, the asan functions don't really apply to greenlet. They would apply to libcoro though.

@snaury
Contributor

snaury commented Dec 2, 2016

It's not about disgust. If you make a fork I would probably gladly start using it myself, and if you convince gevent and eventlet to use it, then even better! :) It would work better, would have fewer opportunities for crashing and would probably be a lot faster too.

But as a maintainer I feel my primary responsibility is not breaking code for people who already use greenlet for whatever scenario. And scenarios for using greenlet are different for different people too: just 4-5 years ago I had something like 100k greenlets sloshing around in a single process, and I have serious doubts any mmap-based solution would have worked for me back then. Now that Go is more mature I would prefer to use Go, and wouldn't want to do anything like that in Python ever again, so I personally wouldn't be affected. However, I'm sure there's someone out there who's doing something crazy like that right now, and an mmap-based solution would stop them from succeeding.

Making a patch that makes the new mode optional (behind something like greenlet.enable_awesome_mode()) would probably work, but it would add a lot of complexity to already complex code, and for what? People won't even know they need to use it, and I have no way of checking all 18+ platforms (and many more variants) for breakage. A new library and a clean break is a lot better; at least that's how I would have done it if I were still interested enough. There's way too much complexity in greenlet right now that has to deal with unexpected gc and unexpected switching. With libcoro you'd have to deal with far fewer surprises like that, since the stack cannot get invalidated right under you. It would be a lot faster too (no malloc and copy on every switch), but I'm not the one you need to convince.

I think if you make a new library and it's a lot faster (and trust me it will be, 6500ns on benchmarks/chain.py shouldn't be hard to beat), then people would recognize safety and a different set of tradeoffs are worth it. If you make using it as easy as from something import greenlet, then porting and trying it out should be easy. You shouldn't want to integrate it into the current mess of greenlet.
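
The "as easy as `from something import greenlet`" idea could look roughly like the sketch below on the consumer side. This is an assumption-laden illustration: `load_greenlet` is a hypothetical helper, and it assumes that any alternative package exposes a `greenlet` class under the same name, which the thread does not confirm for any particular library.

```python
import importlib
import sys
import types


def load_greenlet(candidates=("greenstack", "greenlet")):
    """Return (module_name, greenlet_class) for the first importable candidate.

    The candidate names are only defaults for illustration; whether a given
    package really exposes a compatible `greenlet` attribute is an assumption.
    """
    for name in candidates:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # not installed, try the next implementation
        cls = getattr(module, "greenlet", None)
        if cls is not None:
            return name, cls
    raise ImportError(f"none of {candidates!r} provides a greenlet class")


# Demonstration with a stand-in module, so the sketch runs without either package.
fake = types.ModuleType("fake_greenlet_impl")  # hypothetical module name
fake.greenlet = type("greenlet", (), {})       # placeholder class
sys.modules["fake_greenlet_impl"] = fake

chosen_name, greenlet_cls = load_greenlet(("no_such_greenlet_impl", "fake_greenlet_impl"))
print(chosen_name)  # fake_greenlet_impl
```

A library such as gevent could use a lookup like this to try an alternative implementation first and fall back to the stock one, without anything having to register itself in `sys.modules` under the name `greenlet`.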

@tbodt
Author

tbodt commented Dec 2, 2016

Sounds like a fun weekend project. 😄

tbodt closed this as completed on Dec 2, 2016

@tbodt
Author

tbodt commented Dec 2, 2016

Make greenlets great again!

@navytux
Contributor

navytux commented Aug 2, 2019

> Sounds like a fun weekend project.
> Make greenlets great again!

@tbodt, just curious, did anything happen on your side on this topic?

Thanks in advance for any feedback,
Kirill

@tbodt
Author

tbodt commented Aug 3, 2019

I made this: http://github.com/tbodt/greenstack. Try not to use it in production.

navytux added a commit to navytux/pygolang that referenced this issue Aug 29, 2019
- Move channels implementation to be done in C++ inside libgolang. The
  code and logic is based on previous Python-level channels
  implementation, but the new code is just C++ and does not depend on
  Python nor GIL at all, and so works without GIL if libgolang
  runtime works without GIL(*).

  (*) for example "thread" runtime works without GIL, while "gevent" runtime
      acquires GIL on every semaphore acquire.

  New channels implementation is located in δ(libgolang.cpp).

- Provide low-level C channels API to the implementation. The low-level
  C API was inspired by Libtask[1] and Plan9/Libthread[2].

  [1] Libtask: a Coroutine Library for C and Unix. https://swtch.com/libtask.
  [2] http://9p.io/magic/man2html/2/thread.

- Provide high-level C++ channels API that provides type-safety and
  automatic channel lifetime management.

  An overview of the C and C++ APIs is in δ(libgolang.h).

- Expose C++ channels API at Pyx level as Cython/nogil API so that Cython
  programs could use channels with ease and without need to care about
  lifetime management and low-level details.

  Overview of Cython/nogil channels API is in δ(README.rst) and
  δ(_golang.pxd).

- Turn Python channels into a tiny wrapper around chan<PyObject>.

Implementation note:

- The gevent case needs special care because greenlet, which gevent uses,
  moves a coroutine's stack from the C stack to the heap when the coroutine
  parks, and fills that space on the C stack with the stack of the activated
  coroutine, copied back from the heap. As a result, if an object on g's
  stack is accessed while g is parked, the access hits memory that belongs
  to another g's stack.

  The channels implementation explicitly cares about this issue so that
  stack -> * channel send, or * -> stack channel receive work correctly.

  It should be noted that the greenlet approach, which it inherits from
  Stackless, is not only a bit tricky, but also comes with overhead
  (stack <-> heap copy), and prevents a coroutine from migrating from one OS
  thread to another, as that would change the addresses of on-stack
  things for that coroutine.

  As the latter property prevents using multiple CPUs even if the
  program / runtime are prepared to work without GIL, it would be more
  logical to change gevent/greenlet to use a separate stack for each
  coroutine. That would remove the stack <-> heap copy and the need for
  special care in the channels implementation for stack - stack sends.
  Such an approach should be possible to implement with e.g. swapcontext or
  a similar mechanism, and a proof of concept of such work wrapped into
  a greenlet-compatible API exists[3]. It would be good if at some point
  there were a chance to explore such an approach in the Pygolang context.

  [3] python-greenlet/greenlet#113 (comment) and below

Just this patch brings in the following speedup at Python level:

 (on [email protected])

thread runtime:

    name             old time/op  new time/op  delta
    go               20.0µs ± 1%  15.6µs ± 1%  -21.84%  (p=0.000 n=10+10)
    chan             9.37µs ± 4%  2.89µs ± 6%  -69.12%  (p=0.000 n=10+10)
    select           20.2µs ± 4%   3.4µs ± 5%  -83.20%  (p=0.000 n=8+10)
    def              58.0ns ± 0%  60.0ns ± 0%   +3.45%  (p=0.000 n=8+10)
    func_def         43.8µs ± 1%  43.9µs ± 1%     ~     (p=0.796 n=10+10)
    call             62.4ns ± 1%  63.5ns ± 1%   +1.76%  (p=0.001 n=10+10)
    func_call        1.06µs ± 1%  1.05µs ± 1%   -0.63%  (p=0.002 n=10+10)
    try_finally       136ns ± 0%   137ns ± 0%   +0.74%  (p=0.000 n=9+10)
    defer            2.28µs ± 1%  2.33µs ± 1%   +2.34%  (p=0.000 n=10+10)
    workgroup_empty  48.2µs ± 1%  34.1µs ± 2%  -29.18%  (p=0.000 n=9+10)
    workgroup_raise  58.9µs ± 1%  45.5µs ± 1%  -22.74%  (p=0.000 n=10+10)

gevent runtime:

    name             old time/op  new time/op  delta
    go               24.7µs ± 1%  15.9µs ± 1%  -35.72%  (p=0.000 n=9+9)
    chan             11.6µs ± 1%   7.3µs ± 1%  -36.74%  (p=0.000 n=10+10)
    select           22.5µs ± 1%  10.4µs ± 1%  -53.73%  (p=0.000 n=10+10)
    def              55.0ns ± 0%  55.0ns ± 0%     ~     (all equal)
    func_def         43.6µs ± 1%  43.6µs ± 1%     ~     (p=0.684 n=10+10)
    call             63.0ns ± 0%  64.0ns ± 0%   +1.59%  (p=0.000 n=10+10)
    func_call        1.06µs ± 1%  1.07µs ± 1%   +0.45%  (p=0.045 n=10+9)
    try_finally       135ns ± 0%   137ns ± 0%   +1.48%  (p=0.000 n=10+10)
    defer            2.31µs ± 1%  2.33µs ± 1%   +0.89%  (p=0.000 n=10+10)
    workgroup_empty  70.2µs ± 0%  55.8µs ± 0%  -20.63%  (p=0.000 n=10+10)
    workgroup_raise  90.3µs ± 0%  70.9µs ± 1%  -21.51%  (p=0.000 n=9+10)

The whole Cython/nogil work - starting from 8fa3c15 (Start using Cython
and providing Cython/nogil API) to this patch - brings in the following
speedup at Python level:

 (on [email protected])

thread runtime:

    name             old time/op  new time/op  delta
    go               92.9µs ± 1%  15.6µs ± 1%  -83.16%  (p=0.000 n=10+10)
    chan             13.9µs ± 1%   2.9µs ± 6%  -79.14%  (p=0.000 n=10+10)
    select           29.7µs ± 6%   3.4µs ± 5%  -88.55%  (p=0.000 n=10+10)
    def              57.0ns ± 0%  60.0ns ± 0%   +5.26%  (p=0.000 n=10+10)
    func_def         44.0µs ± 1%  43.9µs ± 1%     ~     (p=0.055 n=10+10)
    call             63.5ns ± 1%  63.5ns ± 1%     ~     (p=1.000 n=10+10)
    func_call        1.06µs ± 0%  1.05µs ± 1%   -1.31%  (p=0.000 n=10+10)
    try_finally       139ns ± 0%   137ns ± 0%   -1.44%  (p=0.000 n=10+10)
    defer            2.36µs ± 1%  2.33µs ± 1%   -1.26%  (p=0.000 n=10+10)
    workgroup_empty  98.4µs ± 1%  34.1µs ± 2%  -65.32%  (p=0.000 n=10+10)
    workgroup_raise   135µs ± 1%    46µs ± 1%  -66.35%  (p=0.000 n=10+10)

gevent runtime:

    name             old time/op  new time/op  delta
    go               68.8µs ± 1%  15.9µs ± 1%  -76.91%  (p=0.000 n=10+9)
    chan             14.8µs ± 1%   7.3µs ± 1%  -50.67%  (p=0.000 n=10+10)
    select           32.0µs ± 0%  10.4µs ± 1%  -67.57%  (p=0.000 n=10+10)
    def              58.0ns ± 0%  55.0ns ± 0%   -5.17%  (p=0.000 n=10+10)
    func_def         43.9µs ± 1%  43.6µs ± 1%   -0.53%  (p=0.035 n=10+10)
    call             63.5ns ± 1%  64.0ns ± 0%   +0.79%  (p=0.033 n=10+10)
    func_call        1.08µs ± 1%  1.07µs ± 1%   -1.74%  (p=0.000 n=10+9)
    try_finally       142ns ± 0%   137ns ± 0%   -3.52%  (p=0.000 n=10+10)
    defer            2.32µs ± 1%  2.33µs ± 1%   +0.71%  (p=0.005 n=10+10)
    workgroup_empty  90.3µs ± 0%  55.8µs ± 0%  -38.26%  (p=0.000 n=10+10)
    workgroup_raise   108µs ± 1%    71µs ± 1%  -34.64%  (p=0.000 n=10+10)

This patch is the final patch in series to reach the goal of providing
channels that could be used in Cython/nogil code.

Cython/nogil channels work is dedicated to the memory of Вера Павловна Супрун[4].

[4] https://navytux.spb.ru/memory/%D0%A2%D1%91%D1%82%D1%8F%20%D0%92%D0%B5%D1%80%D0%B0.pdf#page=3
@navytux
Contributor

navytux commented Aug 29, 2019

@tbodt, thanks a lot for the feedback. It would be interesting to try switching to something like greenstack in the Pygolang context eventually.
