Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make_any_gap_element() should not be called inside sig_on() #27140

Closed
vbraun opened this issue Jan 27, 2019 · 26 comments
Closed

make_any_gap_element() should not be called inside sig_on() #27140

vbraun opened this issue Jan 27, 2019 · 26 comments

Comments

@vbraun
Copy link
Member

vbraun commented Jan 27, 2019

As explained in #27060, we shouldn't be allocating/deallocating Python objects inside sig_on():

**********************************************************************
File "src/sage/groups/matrix_gps/coxeter_group.py", line 421, in sage.groups.matrix_gps.coxeter_group.CoxeterMatrixGroup.is_finite
Failed example:
    [l for l in range(2, 9) if
     CoxeterGroup([[1,3,2,2,2], [3,1,3,3,2], [2,3,1,2,2],
                   [2,3,2,1,l], [2,2,2,l,1]]).is_finite()]
Exception raised:
    Traceback (most recent call last):
      File "/mnt/disk/home/buildslave-sage/slave/sage_git/build/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 671, in _run
        self.compile_and_execute(example, compiler, test.globs)
      File "/mnt/disk/home/buildslave-sage/slave/sage_git/build/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 1095, in compile_and_execute
        exec(compiled, globs)
      File "<doctest sage.groups.matrix_gps.coxeter_group.CoxeterMatrixGroup.is_finite[2]>", line 3, in <module>
        [Integer(2),Integer(3),Integer(2),Integer(1),l], [Integer(2),Integer(2),Integer(2),l,Integer(1)]]).is_finite()]
      File "sage/misc/lazy_import.pyx", line 354, in sage.misc.lazy_import.LazyImport.__call__ (build/cythonized/sage/misc/lazy_import.c:3684)
        return self.get_object()(*args, **kwds)
      File "/mnt/disk/home/buildslave-sage/slave/sage_git/build/local/lib/python2.7/site-packages/sage/combinat/root_system/coxeter_group.py", line 132, in CoxeterGroup
        return CoxeterMatrixGroup(data, base_ring, index_set)
      File "sage/misc/classcall_metaclass.pyx", line 330, in sage.misc.classcall_metaclass.ClasscallMetaclass.__call__ (build/cythonized/sage/misc/classcall_metaclass.c:1701)
        return cls.classcall(cls, *args, **kwds)
      File "/mnt/disk/home/buildslave-sage/slave/sage_git/build/local/lib/python2.7/site-packages/sage/groups/matrix_gps/coxeter_group.py", line 233, in __classcall_private__
        data, base_ring, data.index_set())
      File "sage/misc/cachefunc.pyx", line 1005, in sage.misc.cachefunc.CachedFunction.__call__ (build/cythonized/sage/misc/cachefunc.c:6068)
        w = self.f(*args, **kwds)
      File "/mnt/disk/home/buildslave-sage/slave/sage_git/build/local/lib/python2.7/site-packages/sage/structure/unique_representation.py", line 1027, in __classcall__
        instance = typecall(cls, *args, **options)
      File "sage/misc/classcall_metaclass.pyx", line 497, in sage.misc.classcall_metaclass.typecall (build/cythonized/sage/misc/classcall_metaclass.c:2151)
        return (<PyTypeObject*>type).tp_call(cls, args, kwds)
      File "/mnt/disk/home/buildslave-sage/slave/sage_git/build/local/lib/python2.7/site-packages/sage/groups/matrix_gps/coxeter_group.py", line 303, in __init__
        for i in range(n)]
      File "/mnt/disk/home/buildslave-sage/slave/sage_git/build/local/lib/python2.7/site-packages/sage/groups/matrix_gps/coxeter_group.py", line 284, in val
        return E(2 * x) + ~E(2 * x)
      File "/mnt/disk/home/buildslave-sage/slave/sage_git/build/local/lib/python2.7/site-packages/sage/rings/universal_cyclotomic_field.py", line 1258, in gen
        return self.element_class(self, libgap.E(n)**k)
      File "sage/libs/gap/element.pyx", line 1079, in sage.libs.gap.element.GapElement.__pow__ (build/cythonized/sage/libs/gap/element.c:11484)
        sig_on()
    SystemError: calling remove_from_pari_stack() inside sig_on()
**********************************************************************
1 item had failures:
   1 of   6 in sage.groups.matrix_gps.coxeter_group.CoxeterMatrixGroup.is_finite
    [136 tests, 1 failure, 9.44 s]
**********************************************************************

Depends on #26992
Depends on #27155

CC: @jdemeyer @embray @tscrim

Component: group theory

Keywords: random_fail

Author: Jeroen Demeyer

Branch/Commit: 1570071

Reviewer: Travis Scrimshaw

Issue created by migration from https://trac.sagemath.org/ticket/27140

@vbraun vbraun added this to the sage-8.7 milestone Jan 27, 2019
@vbraun
Copy link
Member Author

vbraun commented Jan 27, 2019

comment:1

This is with the cypari2-2.0.3 update, for the record

@jdemeyer
Copy link

comment:2

The problem is that this

make_any_gap_element(self.parent(), result)

should not be run under sig_on().

CC @embray because he might know why this is done this way and whether this make_any_gap_element call needs to be protected by either sig_on() or GAP_Enter().

@jdemeyer

This comment has been minimized.

@jdemeyer
Copy link

comment:3

Note that these GC-related failures are not "random" strictly speaking. They are deterministic, but require a very precise set of conditions to reproduce.

@jdemeyer jdemeyer changed the title Random SystemError: calling remove_from_pari_stack() inside sig_on() make_any_gap_element() should not be called inside sig_on() Jan 27, 2019
@jdemeyer
Copy link

comment:5

For reference, this is the code structure for the PARI interface:

  • convert input to PARI
  • sig_on()
  • call PARI function
  • sig_off()
  • wrap output in Python object

It should be this way for GAP too.

@jdemeyer
Copy link

Dependencies: #26992, #27155

@embray
Copy link
Contributor

embray commented Jan 28, 2019

comment:7

Replying to @jdemeyer:

The problem is that this

make_any_gap_element(self.parent(), result)

should not be run under sig_on().

CC @embray because he might know why this is done this way and whether this make_any_gap_element call needs to be protected by either sig_on() or GAP_Enter().

I need to take a closer look at the code but I'm sure you're right that it needs to be more precisely bracketed in its use of sig_on()/sig_off(). Some of that will be refactored soon too, especially when I bring in some of the upstream changes that expanded the official "libgap API" (I mentioned this also in #27094).

@jdemeyer
Copy link

comment:8

After reading the GAP code on gap-system/gap#3096, I'm pretty sure that there are race conditions too with the current code structure. For extra safety, we could wrap the GAP_Enter/GAP_Leave macros inside sig_block/sig_unblock.

@embray
Copy link
Contributor

embray commented Jan 28, 2019

comment:9

I'm sure there are, but I wouldn't want to go nuts unless there's something provable there. You don't want to make GAP functions uninterruptible.

@jdemeyer
Copy link

comment:10

Replying to @embray:

You don't want to make GAP functions uninterruptible.

No, I only want to protect GAP_Enter and GAP_Leave. So something like

sig_on()
sig_block()
GAP_Enter()
sig_unblock()
<actual GAP call>
sig_block()
GAP_Enter()
sig_unblock()
sig_off()

Of course, we could use macros to make this less painfull to write.

@embray
Copy link
Contributor

embray commented Jan 28, 2019

comment:11

Yes, I just came back to comment to say I realized that's what you meant. That makes sense to me.

I was thinking about what you wrote on the PR about sig_atomic_t and you're right: I thought it at least guaranteed that simple arithmetic operations involving a read and a write would be atomic, but it makes no such guarantee. It doesn't even work out that way in practice. Oh well.

I think I thought differently because cysignals, which obviously influenced a lot of this, uses it for sig_on_count so it seemed to make sense. But I think I understand now that in the case of sig_on_count it was specifically for setting sig_on_count = 0 in the case where an interrupt was received before sig_on()?

@jdemeyer
Copy link

jdemeyer commented Feb 4, 2019

@jdemeyer
Copy link

jdemeyer commented Feb 4, 2019

comment:13

This fixes the __pow__ call. For now, I decided to fix only that instead of all such GAP calls. This in order to get #26992 merged and to get some feedback on the general mechanism.


New commits:

3a99bafrearrange this code so that PyErr_Fetch is called before extract_libgap_errout()
e5454edensure the GIL is held when entering these callback functions
cbd1228trivial python3 test fix
369fadeMerge branch 'u/embray/ticket-26992' in 8.7.b1
494ab6dSimplify libGAP error handling
1570071Fix GapElement.__pow__

@jdemeyer
Copy link

jdemeyer commented Feb 4, 2019

Commit: 1570071

@jdemeyer
Copy link

jdemeyer commented Feb 4, 2019

comment:14

As you can see, I also took the opportunity to use the coercion model _pow_ and _pow_int instead of __pow__.

@jdemeyer
Copy link

jdemeyer commented Feb 4, 2019

Author: Jeroen Demeyer

@tscrim
Copy link
Collaborator

tscrim commented Feb 5, 2019

comment:16

From a somewhat quick look-over, everything seems fine to me. However, I do not claim to fully understand what goes on in util.pyx, so I would like a second opinion (e.g. Erik) before setting to positive.

@jdemeyer
Copy link

jdemeyer commented Feb 5, 2019

comment:17

Replying to @tscrim:

From a somewhat quick look-over, everything seems fine to me. However, I do not claim to fully understand what goes on in util.pyx, so I would like a second opinion (e.g. Erik) before setting to positive.

The commit under review is the last one, which doesn't change util.pyx

@tscrim
Copy link
Collaborator

tscrim commented Feb 5, 2019

comment:18

Replying to @jdemeyer:

Replying to @tscrim:

From a somewhat quick look-over, everything seems fine to me. However, I do not claim to fully understand what goes on in util.pyx, so I would like a second opinion (e.g. Erik) before setting to positive.

The commit under review is the last one, which doesn't change util.pyx

Ah, sorry, I missed the changes to it on #27155 (read too quickly). Actually, I just have one little question: is this meant to be a comment, definition, or just a note to yourself?

    """
    #define sig_GAP_Enter()  {int t = GAP_Enter(); if (!t) sig_error();}
    """

@jdemeyer
Copy link

jdemeyer commented Feb 5, 2019

comment:19

Replying to @tscrim:

is this meant to be a comment, definition, or just a note to yourself?

See https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html#including-verbatim-c-code and cython/cython#1915

@tscrim
Copy link
Collaborator

tscrim commented Feb 5, 2019

comment:20

Replying to @jdemeyer:

Replying to @tscrim:

is this meant to be a comment, definition, or just a note to yourself?

See https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html#including-verbatim-c-code and cython/cython#1915

That was what I was thinking, but I didn't know Cython could do that. Thank you.

LGTM.

@tscrim
Copy link
Collaborator

tscrim commented Feb 5, 2019

Reviewer: Travis Scrimshaw

@embray
Copy link
Contributor

embray commented Feb 5, 2019

comment:21

I seem to recall looking at that code previously and wondering why it didn't use the coercion system for __pow__. I trust you to know the right way to handle this.

@embray
Copy link
Contributor

embray commented Feb 5, 2019

comment:22

Replying to @tscrim:

That was what I was thinking, but I didn't know Cython could do that. Thank you.

That's because it's relatively new! And long overdue.

@jdemeyer
Copy link

jdemeyer commented Feb 5, 2019

comment:23

To the release manager: tickets #26992, #27155 and #27140 should be merged together to avoid test failures.

@vbraun
Copy link
Member Author

vbraun commented Feb 6, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants