Fix cython's gc_track and gc_untrack #13896
Comments
Patch to more reliably produce crash |
comment:1
Attachment: double-free-crash.patch.gz
With attached patch applied to 5.6.beta2 (and probably also other versions close to it),
will crash relatively reliably on several machines (including |
comment:3
I'd like to see this ticket as a blocker, anyone against this idea? |
comment:4
Replying to @jpflori:
Since this is the ultimate "can generate segfaults anywhere", it's a prime candidate for blocker status. However, we're fully at the mercy of cython developers as to when this gets fixed. Also, if we release with this bug unfixed, we might as well leave #715 in too, since this one has a much wider possible impact :-). |
comment:5
Ok, I've put it as blocker. For those who want to play while waiting for upstream, I've posted a p0 Cython spkg which does "something" with PyObject_GC_[Un]Track. |
comment:6
Apologies. I saw I linked to the wrong file. Include/object.h also has some interesting information, but it looks like it is a bit out-of-date on some bits. In particular, if you look at the actual use of the TRASHCAN macros:
with the explanation a little lower:
It's probably better to leave out the trashcan for now. It seems like rather tricky code and I'm not sure it's part of the official Python C-API (it might be something internal, just like they use some macros themselves they find unsafe for use in extension modules) |
comment:7
I saw and read about these additional steps that accompany the macro, but I was not sure they were also needed here. Anyway, I agree it is better to leave that out for now, and upstream will decide what is best. So I've updated the spkg to not include the trashcan parts. |
Attachment: cython-0.17.3.p0.diff.gz |
comment:8
Replying to @jpflori:
In fact, I think the precautions taken are not enough for general cython classes. With the little
dance they are making sure there is room for one extra trashcan nesting provided that that call doesn't use the same trick. However, a cython class could have a whole inheritance hierarchy going here (that would all use this trick too!), so I'm pretty sure that the exact scenario they describe could still happen. You'd need to know the depth of the inheritance line (for deallocs, multiple inheritance can't happen, right?) and ensure there's enough room for all those. |
comment:9
Coming up with a nice clean test was...interesting. |
comment:10
Just one potentially naive question: Anyway, it just made me think of what will happen if your extension class is GC tracked, but the base class is not? In this case you're lost because if you track your object before calling the base dealloc, then you will not untrack it there. Is that even possible? And anyway if a class is not gc tracked, or is not a container I guess it cannot be weakrefed... |
comment:11
The final call to the (generic) tp_free calls PyObject_GC_UnTrack iff the GC flags are set in the type flags. If the base class is not GC tracked then its dealloc method won't touch these bits. |
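For anyone who wants to poke at the "GC flags in the type flags" part from the Python side, here is a minimal sketch. It only inspects flags and tracking state; it is not the CPython tp_free code. It assumes CPython, where the Py_TPFLAGS_HAVE_GC bit is 1 << 14; the helper name has_gc_flag is made up for this illustration.

```python
import gc

# Assumption: CPython defines Py_TPFLAGS_HAVE_GC as bit 14 of tp_flags.
PY_TPFLAGS_HAVE_GC = 1 << 14


def has_gc_flag(tp):
    """Hypothetical helper: True if the type's flags include the GC bit."""
    return bool(tp.__flags__ & PY_TPFLAGS_HAVE_GC)


# Container types take part in cyclic GC; atomic types such as int do not.
list_has_flag = has_gc_flag(list)
int_has_flag = has_gc_flag(int)

# Instances follow suit: a list is tracked by the collector, an int is not.
list_is_tracked = gc.is_tracked([])
int_is_tracked = gc.is_tracked(42)
```

The point is only that the decision is made per type: a type without the flag never has its instances put on the collector's list, so its dealloc has nothing to untrack.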
comment:12
Thanks for pointing that out. |
comment:13
Spkg up at http://sage.math.washington.edu/home/robertwb/patches/cython-0.17.4pre.spkg ; if this looks good, I'll cut a release and make an actual spkg based on that. |
comment:14
trashcan issues now tracked on #13901 (yes, you can easily crash cython because it's not using the trashcan) |
comment:15
Replying to @robertwb:
This does look good to me. JP has already confirmed that this fixed the issue (as does your elegant test in the cython suite). Your pre.spkg has some different files in it, but I guess that's why you don't consider it an actual spkg. |
comment:16
Replying to @robertwb:
Sorry to insist a little bit, but while looking at the trashcan stuff I thought about it again, and in fact what I was worried about was rather the converse. If the base type does not have the GC_FLAG, and you've retracked it in the subclass, then the final tp_free will indeed not touch anything related to gc, but won't that leave an invalid object in the gc tracked object list? |
comment:17
Replying to @jpflori:
The base tp_free looks at the actual type's flags (which will have GC_FLAG set) to determine what gc (un)tracking to do. Any intermediate superclasses will either leave this alone or do the untrack/track dance. |
comment:19
Replying to @robertwb:
... so suppose we have a superclass that doesn't do the untrack/track dance (so this must be a non-container superclass of a container class; we're entering rather hypothetical territory here). We'll be entering its dealloc with tracking SET. I guess the actual memory free happens by our class, so I guess the list of GC-tracked objects will be properly amended eventually. Can we prove that no GC or trashcan-shelving of this intermediate object will happen in between? I guess it's unlikely because non-container types should be easy to deallocate ... unless some callous person writes an extension class that does hold references to other objects but is convinced that those will never lead to cycles and hence makes it non-GC-tracked. Some weakref callbacks and a GC could then find a partially torn down object tracked by the GC. Multithreaded stuff could make this even worse, but I guess we're protected by the GIL here. It should probably be mandated that any container type has to participate in GC. For a non-container type it's hard to see how a dealloc could ever be interrupted or interleaved by a GC. So this note is probably more a request for clarification (addition to documentation somewhere?) of why this is not a problem than a diagnosis of a bug. |
comment:20
I think it helps to look at the generated code. Suppose one has
In this case one has, roughly,
bodyX consists of decrefing Python members, traversing weakrefs, and (if present)
The track/untrack markers are added exactly when Python/weakref members are present, which is where a garbage collection might happen. (When executing What could be an issue is a non-gc-tracked container class that is subclassed by a gc-tracked class, but we don't have those in Cython. |
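To make the untrack/track dance easier to follow, here is a toy model in Python. It is not the generated C code, and the function name simulate_dealloc and the class labels are invented for illustration. The assumptions are: a live GC-enabled object starts out tracked; a dealloc level that has Python or weakref members untracks on entry and retracks before chaining to the next level; a level without such members leaves the tracking state alone; and the final free untracks whatever is still tracked.

```python
def simulate_dealloc(levels):
    """Walk a toy dealloc chain, most-derived class first.

    levels: list of (class_name, has_python_members) pairs.
    Returns a list of (event, tracked_at_that_moment) pairs.
    """
    tracked = True  # a live GC-enabled object is on the collector's list
    events = []
    for class_name, has_python_members in levels:
        if has_python_members:
            tracked = False  # models the untrack on entry
            # Decrefs and weakref callbacks would run here, possibly
            # triggering a collection; the object must be invisible to it.
            events.append(("body " + class_name, tracked))
            tracked = True  # models the retrack before chaining up
        else:
            # No Python or weakref members: nothing here can start a GC,
            # so the tracking state is left untouched.
            events.append(("body " + class_name, tracked))
    if tracked:
        tracked = False  # models the untrack done by the final free
    events.append(("free", tracked))
    return events


# Hypothetical hierarchy C -> B -> A, where B holds no Python members.
events = simulate_dealloc([("C", True), ("B", False), ("A", True)])
```

In this model the object is never tracked while a body that can trigger a collection is running, which is the property the markers are meant to guarantee.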
comment:21
That is exactly what I was thinking about, and IIRC what is looked for in the CPython subtype_dealloc when looking for the base type. If you say it cannot happen in Cython, I'm very happy with that! |
comment:22
Are you sure this is the case, e.g., for category_object and sage_object? |
Robert's cython test case (I spent quite some time twice to find it, so I'm storing it here for future reference) |
comment:23
Attachment: double_dealloc_T796.pyx.gz
And Robert just released Cython 0.17.4, see https://groups.google.com/d/topic/cython-users/s3ycj83Yctw/discussion |
comment:24
Spkg up at http://sage.math.washington.edu/home/robertwb/patches/cython-0.17.4.spkg |
comment:25
Typo in the version number:
should be
|
Author: Robert Bradshaw |
comment:26
Fixed |
Reviewer: Jeroen Demeyer |
Changed upstream from Reported upstream. Developers acknowledge bug. to Completely fixed; Fix reported upstream |
comment:28
D'oh. Thanks. |
Merged: sage-5.6.beta3 |
comment:30
I have not seen any more segmentation faults regarding #715, so this might have fixed it. |
comment:31
Yay! Congratulations to everybody and a special thanks to Simon for pushing the weak caches! |
In a long sage-devel thread we eventually found in this message that a GC during a weakref callback on a Cython class can lead to double deallocation of that class. In Python's Objects/typeobject.c, line 1024 and onwards, there are some comments that indicate that earlier versions of Python were bitten by this problem too. The solution is to insert the appropriate PyObject_GC_UnTrack and PyObject_GC_Track calls in cython's deallocation code. This is best fixed in cython itself.
Install only the new spkg at http://boxen.math.washington.edu/home/jdemeyer/spkg/cython-0.17.4.spkg
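The triggering pattern can be sketched in plain Python. This does not reproduce the crash: a pure Python class goes through CPython's own subtype_dealloc, which already guards against this. The class Node, the callback on_death and the function run_scenario are invented names, and the sketch assumes CPython's reference counting, so that del runs the teardown immediately.

```python
import gc
import weakref


class Node:
    """Plain Python container class, standing in for a Cython class."""

    finalized = 0  # counts how often an instance was finalized

    def __init__(self):
        self.payload = []

    def __del__(self):
        Node.finalized += 1


callback_runs = []


def on_death(ref):
    # Force a collection while the referent is being torn down.
    gc.collect()
    callback_runs.append(ref)


def run_scenario():
    """Return (referent_gone, number_of_callbacks, number_of_finalizations)."""
    Node.finalized = 0
    callback_runs.clear()
    obj = Node()
    ref = weakref.ref(obj, on_death)
    del obj  # refcount drops to zero; teardown and the callback run here
    return ref() is None, len(callback_runs), Node.finalized


result = run_scenario()
```

With a correct dealloc the object is finalized exactly once even though a collection ran in the middle of its teardown; the bug in this ticket is the case where an extension class was still tracked at that moment and could be deallocated a second time.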
Upstream: Completely fixed; Fix reported upstream
CC: @simon-king-jena @jpflori
Component: memleak
Author: Robert Bradshaw
Reviewer: Jeroen Demeyer
Merged: sage-5.6.beta3
Issue created by migration from https://trac.sagemath.org/ticket/13896