Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split the large sort allocation into separate allocations #529

Merged
merged 4 commits into from
Nov 14, 2024

Conversation

jodavies
Copy link
Collaborator

Here I have split the large single allocation in AllocSort into individual allocations. Only Large+Small(+extension) must be contiguous. The aim is that overrunning any of these buffers will cause a crash and not silent corruption of the program state.

In principle this has a performance penalty, though whether it is measurable or not has to be determined by benchmarking.

The second commit adds warning messages, if any of the buffers are adjusted from their requested values. I have changed some of the default (64bit) allocations, such that they are consistent with FORM's own requirements and no warnings are printed. This is not completely resolved yet: the sublargesize is still shifted, and more things are shifted in 32bit mode.

@coveralls
Copy link

coveralls commented May 27, 2024

Coverage Status

coverage: 50.186% (+0.2%) from 49.986%
when pulling fd2410b on jodavies:allocations
into 61ecfdd on vermaseren:master.

@jodavies
Copy link
Collaborator Author

jodavies commented May 27, 2024

The second point is a bit tricky to resolve, since there is some circular interaction between largesize and IOsize.

I am not sure of the current code:

form/sources/setfile.c

Lines 884 to 903 in 83e3d41

IObuffersize = (IObuffersize+sizeof(WORD)-1)/sizeof(WORD);
/*
The next statement fixes a bug. In the rare case that we have a
problem here, we expand the size of the large buffer or the
small extension
*/
if ( (ULONG)( LargeSize+SmallEsize ) < MaxFpatches*((IObuffersize
+COMPINC)*sizeof(WORD)+2*AM.MaxTer) ) {
if ( LargeSize == 0 )
SmallEsize = MaxFpatches*((IObuffersize+COMPINC)*sizeof(WORD)+2*AM.MaxTer);
else
LargeSize = MaxFpatches*((IObuffersize+COMPINC)*sizeof(WORD)+2*AM.MaxTer)
- SmallEsize;
}
IOtry = ((LargeSize+SmallEsize)/MaxFpatches-2*AM.MaxTer)/sizeof(WORD)-COMPINC;
if ( (LONG)(IObuffersize*sizeof(WORD)) < IOtry )
IObuffersize = (IOtry+sizeof(WORD)-1)/sizeof(WORD);

IObuffersize is in units of WORD (L884). Then comes some code which checks (LargeSize+SmallEsize) against MaxFpatches*IObuffersize (with a constant offset). If it is too small, the large (or small) buffer are increased in size accordingly. This code runs, basically, if the user has set sortiosize significantly larger.

But then we make the same check in reverse, and adjust IObuffersize accordingly. This check seems broken though; IOtry in units of WORD is compared against IObuffersize in bytes. If I fix this comparison, the default setup parameters need more changes to satisfy the constraints. This "units mismatch" came in 8be528b.

@jodavies
Copy link
Collaborator Author

jodavies commented Jun 4, 2024

I have cleaned this up a bit, but now I have a few questions:

In RecalcSetups, the buffer size constraints are not consistent with those in AllocSort. AllocSort only allocates the internally-determined sizes, so it seems like this doesn't matter in the end. But I am not sure of the purpose of RecalcSetupts?

int RecalcSetups(VOID)

In AllocSetups the ziobuffer size is set to IOsize, but this value is modified inside AllocSort. This means that the ziobuffer is not the same size as POsize (it is smaller, but it is still at least MaxTermSize). Could this ever cause a problem?

form/sources/setfile.c

Lines 567 to 592 in 83e3d41

IOsize = sp->value;
if ( IOsize < AM.MaxTer ) { IOsize = AM.MaxTer; sp->value = IOsize; }
#ifndef WITHPTHREADS
#ifdef WITHZLIB
for ( j = 0; j < 2; j++ ) { AR.Fscr[j].ziosize = IOsize; }
#endif
#endif
AM.S0 = 0;
AM.S0 = AllocSort(LargeSize,SmallSize,SmallEsize,TermsInSmall
,MaxPatches,MaxFpatches,IOsize);
#ifdef WITHZLIB
AM.S0->file.ziosize = IOsize;
#ifndef WITHPTHREADS
AR.FoStage4[0].ziosize = IOsize;
AR.FoStage4[1].ziosize = IOsize;
AT.S0 = AM.S0;
#endif
#else
#ifndef WITHPTHREADS
AT.S0 = AM.S0;
#endif
#endif
#ifndef WITHPTHREADS
AR.FoStage4[0].POsize = ((IOsize+sizeof(WORD)-1)/sizeof(WORD))*sizeof(WORD);
AR.FoStage4[1].POsize = ((IOsize+sizeof(WORD)-1)/sizeof(WORD))*sizeof(WORD);
#endif

@jodavies
Copy link
Collaborator Author

jodavies commented Jun 4, 2024

Another thing I noticed: AllocSort allocates the SplitScratch array of TermsInSmall/2 pointers for use in SplitMerge. But actually this is never used; SplitMerge uses the SplitScratch arrays in the N struct. The array in the sort struct can be removed.

@jodavies jodavies marked this pull request as draft June 7, 2024 10:49
Buffer overruns in these allocations will be visible to valgrind.

Fix units mismatch re: IObuffersize, IOtry.

Print warnings in debug mode, if any buffer sizes are altered by AllocSort.

Update "default" buffer sizes, so that we have no warnings.
@jodavies jodavies marked this pull request as ready for review November 6, 2024 11:40
@jodavies
Copy link
Collaborator Author

jodavies commented Nov 6, 2024

Here the "changes" to the default buffer sizes imply a change to the manual also.

@jodavies jodavies added this to the v4.3.2 milestone Nov 6, 2024
@jodavies jodavies requested a review from tueda November 11, 2024 15:20
@tueda
Copy link
Collaborator

tueda commented Nov 12, 2024

Do you see any performance regression because of the memory nonlocality?

If this is unclear, maybe we could turn on SPLITALLOC only with DEBUGGING.

@jodavies
Copy link
Collaborator Author

I've not measured any performance difference, no. This is required in non-DEBUGGING mode by #537 .

@tueda
Copy link
Collaborator

tueda commented Nov 13, 2024

For reference, what happens with

SmallEsize = (SmallEsize+15) & (-16L);

is not guaranteed prior to C23, which standardizes signed integers are two's complement. That said, we can keep this code, as it's now 2024. Anyway, I don't think anyone is interested in running FORM on exotic systems like Unisys ClearPath Dorado Servers​.

@tueda
Copy link
Collaborator

tueda commented Nov 13, 2024

Many lines are duplicated for both the SPLITALLOC case and the non-SPLITALLOC case. Do you want to keep these duplicated lines separate? They could be reorganized as:

#ifndef SPLITALLOC
    LONG allocation = ...
    char *allocated = Malloc1(allocation,"sort buffers");
    #define Malloc1(size, msg) (allocated += (size), (void *)(allocated - (size)))
#endif
    sort = Malloc1(sortsize, "sorting struct");
    ...
#ifndef SPLITALLOC
    #undef Malloc1
#endif

@jodavies
Copy link
Collaborator Author

I left the old code there for future debugging purposes, but it could also just be removed entirely if it is neater. One can always use git to bisect a potential bug to this change (and assuming I did not make a mistake in the separated allocations, crashes due to this in principle imply bugs elsewhere in the code).

@tueda
Copy link
Collaborator

tueda commented Nov 14, 2024

One can always use git to bisect a potential bug to this change

Yes, you are right. We can simply remove the old code, which lowers the code maintenance overhead.

Copy link
Collaborator

@tueda tueda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it looks good to me.

@jodavies jodavies merged commit 15c9d7f into vermaseren:master Nov 14, 2024
70 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants