
replace_ with nested CFunctions crashes #307

Closed
abehring opened this issue Feb 11, 2019 · 10 comments
Labels
bug Something isn't working

Comments

@abehring

I've encountered a problem with some code that has been working for over 10 years when I tried upgrading to a newer version of FORM. I've boiled down the issue to the following minimal example:

S N,j1;
CF cfun1,cfun2;
V p;

L test = cfun1(cfun2(-p,-3 - j1 + N)*cfun2(p,j1)) * cfun2(p,j1);

multiply replace_(N,3);

Print;
.end

Running the latest version from GitHub (087a772), compiled with gcc 7.3.0-27ubuntu1~18.04 on amd64, yields:

FORM 4.2.1 (Feb  6 2019, v4.2.1-2-g087a772) 64-bits  Run: Mon Feb 11 08:02:59 2019
    S N,j1;
    CF cfun1,cfun2;
    V p;

    L test = cfun1(cfun2(-p,-3 - j1 + N)*cfun2(p,j1)) * cfun2(p,j1);

    multiply replace_(N,3);

    Print;
    .end
!!!This $ variation has not been implemented yet!!!
!!!This $ variation has not been implemented yet!!!
Called from TestSub
Program terminating at rep3.frm Line 9 -->
  0.00 sec out of 0.00 sec

Using git bisect, I found that the program first fails at commit 29e608e. Starting with that commit, I only get the message "Program terminating at rep3.frm Line 9 -->" with no further output.

The exact message ("This $ variation ...") first appears with commit 2e409bc.

Interestingly, a small modification of the above program, i.e., replacing the expression by

L test = cfun1(j1,0, - 3 + N,cfun2(-p, - 3 - j1 + N)*cfun2(p,j1));

yields the following result (note the nonsensical rational prefactor):

18446744078004518933/2803905099190966943747*cfun1(j1,0,0,cfun2(-p, - j1)*cfun2(p,j1));

Other modifications lead to programs that seem to run forever (longer than I cared to wait).

Valgrind's memcheck reports several messages along the lines of "Conditional jump or move depends on uninitialised value(s)", but I haven't compiled FORM with debug symbols yet, so I cannot give exact line numbers.

tueda added the bug label on Feb 11, 2019
@tueda
Collaborator

tueda commented Feb 11, 2019

Clearly, something is still completely broken in replace_ with nested functions:

FORM 4.2.1 (Feb  6 2019, v4.2.1-2-g087a772) 64-bits  Run: Mon Feb 11 17:30:31 2019
    CF f;
    S x;
    L F = f(f(x+1));
    P "%t";
    P "%r";
    multiply replace_(x,1);
    P "%t";
    P "%r";
    .end
 + f(f(1 + x))
30  150  26  0  23  0  21  150  17  0  14  0  4  1  1  3  8  1  4  20  1  1  1  3
  1  1  3  1  1  3
 + 2*f(0 + f(2))
20  150  16  0  13  0  2  1  9  150  5  0  -16  2  1  1  3  2  1  3

@tueda
Collaborator

tueda commented Feb 12, 2019

@vermaseren Is the following result of Normalize() for f(f(2*x)) * replace_(x,98) correct?

{26, 150, 22, 1, 19, 1, 17, 150, 13, 1, 10, 1, 8, 16, 4, 98, 1, 2, 1, 3, 1, 1, 3, 1, 1, 3}

What is the data format of SNUMBER = 16, i.e., 16, 4, 98, 1?

@benruijl
Collaborator

benruijl commented Jul 27, 2020

The modifications to proces.c in commit 29e608e caused this issue. There, the goto redosize; was replaced by part of the code that the goto jumps to. However, the following lines were not carried over into the new code:

t1[2] = 1;
if ( *t1 == AR.PolyFun && AR.PolyFunType == 2 )
    t1[2] |= CLEANPRF;
AT.NestPoin--;
AN.TeInFun++;
AR.TePos = 0;
AN.ncmod = oldncmod;
return(retvalue);

If I simply add return(retvalue); to the end of the new code, I get the correct result. Is it indeed correct that we should return in this branch?

I think this bug should be fixed with high priority. Sadly, the control flow is rather non-trivial, and I don't yet fully understand what this code does or what the fix should be. @vermaseren can you have a look at this too?

It may be related to #316.

@vermaseren
Owner

vermaseren commented Jul 27, 2020 via email

@vermaseren
Owner

vermaseren commented Jul 28, 2020 via email

vermaseren added a commit that referenced this issue Jul 28, 2020
@tueda
Collaborator

tueda commented Aug 11, 2020

A test for this issue would be something like this (not merged yet). [Travis log].

Two questions, then:

  1. Do you still see some problems? Or can we close it?
  2. What was changed in b2def9d? I can see the comment, but do you have an example that failed before and was fixed by it? It would be nice to add such an example to the test as well. (Although that code already seems to be covered somehow.)

@vermaseren
Owner

vermaseren commented Aug 11, 2020 via email

@jodavies
Collaborator

Interestingly, Coveralls shows that the new code is exercised. Perhaps one could intentionally trigger a crash there and see which example fails?

@vermaseren
Copy link
Owner

vermaseren commented Aug 11, 2020 via email

@tueda
Collaborator

tueda commented Aug 11, 2020

The new code is covered by the test for #105. That seems reasonable, since that test contains replace_ with functions.

log

Check /home/tueda/work/form/sources/vorm
FORM 4.2.1 (Aug 11 2020, v4.2.1-26-g68c69db-dirty) 64-bits  Run: Tue Aug 11 20:32:25 2020
Loaded suite ./check
Started
..............................
===============================================================================
Issue105 (fixes.frm:918) FAILED
===============================================================================
FORM 4.2.1 (Aug 11 2020, v4.2.1-26-g68c69db-dirty) 64-bits  Run: Tue Aug 11 20:32:31 2020
    * Crash by replace_(x,0)
    S x;
    V p;
    CF f;

    L F = f(p.p+x);
    L G = f(p.p*x);

    multiply replace_(x,0);
    P;
    .end
Program terminating at 1.frm Line 10 -->
===============================================================================

F
====================================================================================================================================
     926:
     927:
     928:
  => 929: def test_Issue105; do_test {
     930: assert(succeeded?, 'Failed for succeeded?')
     931: assert result("F") =~ expr("f(p.p)")
     932: assert result("G") =~ expr("f(0)")
/tmp/form_check_20200811-9866-dh3tm4/fixes.rb:929:in `test_Issue105'
./check.rb:272:in `do_test'
./check.rb:272:in `times'
./check.rb:277:in `block in do_test'
Failure: test_Issue105(Test_Issue105):
  timeout (= 10 sec) in 1.frm of Issue105 (fixes.frm:918).
  <false> is not true.
====================================================================================================================================
....................................................................................................................................
.................................................
Finished in 33.0536936 seconds.
------------------------------------------------------------------------------------------------------------------------------------
212 tests, 794 assertions, 1 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
99.5283% passed
------------------------------------------------------------------------------------------------------------------------------------
6.41 tests/s, 24.02 assertions/s

OK, it seems difficult to construct a failing example. Anyway, I will just commit the test case and close this issue for now.

tueda added a commit that referenced this issue Aug 11, 2020
tueda closed this as completed on Aug 11, 2020

5 participants