Escaping from stage4 sortings for $-variables? #215
Do other parameters affect the performance for $-variables? For example,
This is tricky stuff when you use tform.
With largesize, tform reserves one largesize for the master and one for all the workers combined.
That means that for 32 workers, each worker gets largesize/32.
This is not the case with sublargesize: each worker gets that full amount.
It gets even worse: if you have functions inside functions, you need that stuff twice, and if at the
same time you have a dollar in the inner function, three times, etc.
Dollars, if I am not mistaken, need one buffer per worker and/or master. Each function level
needs one.
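As a rough back-of-the-envelope reading of these rules, assume W workers, one largesize buffer for the master plus one shared by all workers, one sublargesize buffer for each worker and (as an assumption) one for the master, and a multiplier k for the nesting depth (k = 1 without nesting, 2 for a function inside a function, 3 with a dollar in the inner function, following the description above). This is an interpretation of the description, not FORM's actual allocation code:

```latex
% W = number of workers, L = largesize, S = sublargesize, k = nesting depth (assumed multiplier)
\begin{aligned}
\text{largesize share per worker} &= L / W \\
\text{largesize in total} &\approx 2L \\
\text{sublargesize in total} &\approx k\,(W + 1)\,S
\end{aligned}
```

With W = 32 and largesize = sublargesize = 50000000 bytes (the values in Takahiro's setup), each worker's share of largesize is 50000000 / 32 = 1562500 bytes, whereas a single level of sublargesize buffers already comes to about 33 × 50000000 = 1.65 × 10^9 bytes.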
The functions do not need such extreme buffers presumably, because in the end they are
limited by MaxTermSize.
This seems to indicate that one would have to stop using the same buffers for dollars
and function arguments. That is a bit messy and also means that you potentially need more space.
The dollars are really not meant to be on disk. It is already nice that it gets through stage 3 at all.
Same for function arguments.
For now the safest seems to be to disallow stage4 for functions and dollars.
Do not forget that when you need stage4, the output is almost always very big
(your example was rather artificial, I must say), and dollars need to be in allocated memory.
As for the effect of your settings, you just have to try: run it and keep an eye on
the memory use in top. Above are the rules I know, but this all probably depends on more
parameters (like maxtermsize).
Jos
On 7 Jul. 2017, at 20:25, Takahiro Ueda wrote:
Because of the really dangerous memory bugs of #211 for $-variables, I would like to avoid stage4 sortings as much as possible. But I am not sure which parameters are really relevant for that. It is also preferable to save memory. Suppose that I don't have any big local expressions, but have $-variables which potentially become very long polynomials. Does the following setting make sense?
* Adopt the default values.
#:filepatches 256
#:largepatches 256
#:largesize 50000000
#:smallextension 20000000
#:smallsize 10000000
#:sortiosize 100000
#:termsinsmall 100000
* Try to avoid stage4 sortings for $-variables.
* Setting the same values as the non-sub parameters may save us from
* FORM's confusion of buffer sizes(?)
#:subfilepatches 256
#:sublargepatches 256
#:sublargesize 50000000
#:subsmallextension 20000000
#:subsmallsize 10000000
#:subsortiosize 100000
#:subtermsinsmall 100000
* Slightly smaller maxtermsize for each monomial in polynomials.
#:maxtermsize 10000
This consumes 407MB of virtual memory for an empty program. Or is there a better recommendation?
Can FORM at least print a warning if arguments or dollars hit stage4? (Or terminate completely?)
Thanks. In general, it would be very tricky. In my specific application, everything is (for now) polynomials and there are no functions. tform doesn't make any sense because all operations have to be done via preprocessor instructions. For such a specific situation, is it still risky to set the non-sub parameters smaller than the sub parameters (to reduce memory usage), like
?
Hi Stephen,
Maybe there is another solution, depending on what it is exactly that you do.
Let me assume that you have 10000 expressions and most of them are ‘inactive’ in a given module.
Start by putting them all in the hide. This is the default location for all of them.
Suppose that you want to modify expressions F1,…,F10, but not the others.
You start the module with unhide F1,…,F10; and then you do your thing.
After the .sort you put them back in the hide.
Why can this go wrong?
When it takes F1,…,F10 out of the hide, it leaves holes in the hide file, and when it
puts the expressions back, it places them at the end of the file. This means that the file gets
longer and longer. You can solve this by occasionally unhiding all of them and hiding them again.
That way you only get the overhead of copying everything once in a while.
Of course, this works only if the file does not grow super-rapidly.
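A minimal cost model for this trick, with symbols introduced here only for illustration and assuming the expressions keep roughly the same size: H_0 is the size of the live hidden data right after a full unhide/re-hide, s is the amount re-appended to the hide file per unhide/hide cycle, and m is the number of cycles between full compactions:

```latex
% H_0 = live hidden data, s = data re-appended per cycle, m = cycles between compactions
\begin{aligned}
\text{hide-file size after } j \text{ cycles} &\approx H_0 + j\,s, \qquad 0 \le j \le m \\
\text{copying per cycle, amortized} &\approx H_0 / m
\end{aligned}
```

A larger m makes the full copy rarer but lets the file grow to about H_0 + m s, which is why the trick only pays off when the growth per cycle, s, is modest.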
On the other hand, if you can keep all your expressions in dollar variables, their total size should
be far less than your disk space (even less than your memory size). Hence it sounds to me like
this may work.
The only thing I do not know by heart is whether
Unhide;
Skip;
.sort
Hide;
does what one would naively expect it to do. I think it does, but you might try it first.
If this does not work for you, let me know.
Cheers
Jos
On 13 Feb 2018, at 14:00, Stephen Jones wrote:
Sorry to repeat the discussion above, but I am wondering what the long-term plan is for Stage4 and $-variables.
Currently I have a similar use case to that described in this question: very many $-variables containing polynomials that can occasionally get quite large (no functions or nesting etc.). Clearly I could use local expressions instead, but this seems to be a suboptimal use of FORM (it seems most of the time is spent defining or reading the local expressions).
It seems there are a few options:
- Fix Stage4 for $-variables
- Allow Stage4 to be disabled for $-variables
- Warn or, better, terminate when Stage4 is entered for $-variables
All of these would be very helpful; they would allow me to speed up our code without worrying that it sometimes gives wrong answers. Thanks!
Hi,
Interesting example!
When I change the code a bit to
* snip snip
B+ A;
.sort
Hide;
B+ A;
.sort
#ifdef `METHOD1'
#do i = 1,`t'
L a`i' = expr[A(`i')];
.sort
Hide a`i';
#enddo
#endif
* snip snip
it is about as slow as your original. But if I replace Hide a`i'; by Drop a`i';, it gets about the
same speed as the dollar case (METHOD2).
This seems to indicate that having many expressions at the same time slows it down a lot,
which may have to do with inefficient lookup tables.
I'll be back...
Jos
On 13 Feb 2018, at 14:48, Stephen Jones wrote:
Dear @vermaseren,
Thank you for your response. I was not aware of the "holes in the hide file" issue, what you point out may help us.
However, the key reason that I wanted to use $-variables was to avoid the long time it takes to either read in or define local expressions. I think I need my polynomials in separate expressions or $-variables because I would like to pass them to the gcd_ function, and I do not want to be subject to the MaxTermSize constraint on function arguments; my polynomials can exceed such limits even when I adjust the form.set file accordingly.
I noticed that it seems to take quite some time to define local expressions, but $-variables are very fast. Here is a test file that shows roughly what I am doing (assume that the coefficients of A(1),...,A(`t') are polynomials that are usually quite small but occasionally contain many thousands of terms; after I have these local expressions or $-variables, I would pass them to gcd_ and use div_ on them before writing each of them to a file):
#-
Off Statistics;
CF A;
#define t "30000"
#define METHOD1 "1"
L expr =
#do i = 1,`t'
+ `i'*A(`i')
#enddo
;
B+ A;
.sort
#ifdef `METHOD1'
skip;
#do i = 1,`t'
L a`i' = expr[A(`i')];
#enddo
#endif
#ifdef `METHOD2'
skip;
#do i = 1,`t'
#$a`i' = expr[A(`i')];
#enddo
#endif
.end
Using METHOD1 I get:
TFORM 4.2.0 (Jan 24 2018, v4.2.0-35-g8b945c1) 64-bits 0 workers Run: Tue Feb 13 14:43:12 2018
#-
16.98 sec + 0.00 sec: 16.98 sec out of 17.12 sec
Using METHOD2:
TFORM 4.2.0 (Jan 24 2018, v4.2.0-35-g8b945c1) 64-bits 0 workers Run: Tue Feb 13 14:44:46 2018
#-
0.21 sec + 0.00 sec: 0.21 sec out of 0.21 sec
Perhaps this is very stupid?
OK, the story is the following:
Your test shows (in various guises) three points of inefficiency.
The first is the test as you put it.
There the problem was a relic from version 1 of Form, in which it tried something nearly
empty for each expression. That is something that runs away quadratically.
I repaired that, so your two methods now run at almost the same speed.
Next: the trick with .sort, Hide; and L a`i' = … creates 30000 modules, in each of which
it has to run through the list of variables to see that for almost all of them it has
nothing to do. Again, that is something of O(30000^2).
In the case of the Drop statement there are only two existing expressions at any time,
and hence that goes as O(2*30000).
Third: if you use tform with one thread, your example now runs fine, but with, for instance,
8 workers it gets very slow again. This is due to the many small expressions and the
resulting enormous overhead of the thread library. It gets a bit better if you use
the InParallel statement, but still, it is not good.
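To make the second point (Hide versus Drop) concrete, here is a simplified count, assuming that in every module FORM inspects each existing expression (hidden or active) once, that expr is the one extra expression throughout, and that N = 30000 expressions a1,…,aN are defined one per module:

```latex
% N = number of modules = number of new expressions a_j
% Hide variant: in module j, expr and a_1..a_{j-1} are hidden, a_j is active
% Drop variant: only expr and a_j exist in module j
\begin{aligned}
\text{Hide variant:} \quad &\sum_{j=1}^{N} (j + 1) = \frac{N(N+3)}{2} = O(N^2) \\
\text{Drop variant:} \quad &\sum_{j=1}^{N} 2 = 2N = O(N)
\end{aligned}
```

For N = 30000 that is 450045000 (about 4.5 × 10^8) inspections with Hide versus 60000 with Drop.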
The dollars all reside in memory and hence do not need ‘disk management’ all the time.
The drawback, however, is that they all have to fit in memory.
By the way: it is easy to disable stage4 for dollars and function arguments, but why is
this important?
Cheers
Jos
Hi Stephen,
The problem is historical.
Originally Form did not have the hide system. Hence in each module it had to decide
what to do with each expression (act on it, skip it or drop it). As acting and skipping each
take much more time than deciding what to do, there was no problem.
Then the hide system came, and now, if you have 30000 expressions in the hide, it
looks 30000 times and sees that it has to do nothing. In the example I gave, there were
30000 modules, and hence it sees 30000^2 times that it can go on. Even though that
loop does not take much time, each nanosecond per check translates into almost a whole second in real time.
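The arithmetic behind that estimate, taking the round count of 30000^2 checks from the text and an assumed cost of about 1 ns per check:

```latex
30000^2 = 9 \times 10^{8} \text{ checks}, \qquad
9 \times 10^{8} \times 1\,\text{ns} = 0.9\,\text{s}
```

Every additional nanosecond per check adds roughly another 0.9 s to the run.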
At the moment I am thinking about a not-too-complicated way to get around this.
Keeping a double administration could easily introduce errors. The ideal would of course be
that the Hide example I gave is about as fast as the Drop case.
In principle I could make an option that disables stage4 for dollars and function arguments.
I did spend quite some time getting that to run on a number of examples, but that
is of course never the same as having it run in real calculations. And in real calculations
it is very rare. (The same goes for stage5 in a regular expression, which I ran into only once in a real
calculation. It worked, though.)
The program does print a message when it enters stage4. Hence if you grep your output
file for it, you will know whether you got there.
Cheers
Jos
On 14 Feb 2018, at 14:05, Stephen Jones wrote:
Dear @vermaseren,
Thank you very much for the changes (a261560, 536e778, 7599180)! They hugely reduce the time for our code to run and have moved the bottleneck elsewhere. I suppose that now I will not rewrite our code to use $-variables; instead I will leave it using expressions.
In your response you mentioned that the Hide trick "creates 30000 modules in each of which it has to run through the list of variables to see that for almost all it has nothing to do, again something of O(30000^2)". Did you mean that in each module it has to run through the list of expressions?
Regarding stage4 for dollars, as I will continue to use expressions, I am currently not too worried about the memory errors discussed in #211. Originally I was concerned that if I rewrite my code to use $-variables, it may start to give wrong results just because $-variables accidentally enter stage4. Personally I would rather FORM crash or print a warning than give wrong results. My original question was just whether there is a way for us to detect that something may have gone wrong due to $-variables entering stage4.