Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New PERL_STRLEN_NEW_MIN & PERL_ARRAY_NEW_MIN_KEY definitions #20529

Merged
merged 2 commits into from
Nov 21, 2022

Conversation

richardleach
Copy link
Contributor

This PR consists of two commits, both concerned with the
hardcoded default minimum length for PV buffers and AV arrays.

  1. Extracts the hardcoded numbers into two new definitions:
        PERL_ARRAY_NEW_MIN_KEY
        PERL_STRLEN_NEW_MIN
  1. Increases the value of PERL_STRLEN_NEW_MIN to be
    closer to the likely actual allocation returned by malloc().

The intention is to reduce the potential for unnecessary
calls to sv_grow, such as when building a string through
concatenation, when the PV buffer is actually big enough
for the operation and perl just does not know it.

The minimum value chosen for 64-bit is very likely too
small, but I am unfamiliar with memory-constrained
systems and wanted to play it safe. (Happy to raise it to
the very common 3*8 bytes if people are comfortable
with that.)

Actually measuring the minimum allocation size at
Configure time, for malloc() implementations that support
it, would be more accurate and I might try that in the next
development cycle.

The PERL_UNWARANTED_CHUMMINESS_WITH_MALLOC
stuff would also help with this problem, but I've previously been
told that it was disabled due to problems with COW that have
not been ironed out yet.

In a nutshell, for a long time the minimum PV length (hardcoded
in Perl_sv_grow) has been 10 bytes and the minimum AV array size
(hardcoded in av_extend_guts) has been 4 elements.

These numbers have been used elsewhere for consistency (e.g.
Perl_sv_grow_fresh) in the past couple of development cycles.
Having a standard definition, rather than hardcoding in multiple
places, is more maintainable. This commit therefore introduces
into perl.h:

    PERL_ARRAY_NEW_MIN_KEY
    PERL_STRLEN_NEW_MIN

(Note: Subsequent commit(s) will actually change the values.)
Major malloc implementations, including the popular dlmalloc
derivatives all return chunks of memory that are a multiple of
the platform's pointer size. Perl's traditional default string
allocation of 10 bytes will almost certainly result in a larger
allocation than requested. Consequently, the interpreter may try
to Renew() an allocation to increase the PV buffer size when it
does not actually need to do so.

This commit increases the default string size to the nearest
pointer multiple. (12 bytes for 32-bit pointers, 16 bytes for
64-bit pointers). This is almost certainly unnecessarily small
for 64-bit platforms, since most common malloc implementations
seem to return 3*pointer size (i.e. 24 bytes) as the smallest
allocation. However, 16 bytes was chosen to prevent an increase
in memory usage in memory-constrained platforms which might have
a smaller minimum memory allocation.
@demerphq
Copy link
Collaborator

demerphq commented Nov 20, 2022 via email

@demerphq
Copy link
Collaborator

demerphq commented Nov 20, 2022

This PR consists of two commits, both concerned with the
The PERL_UNWARANTED_CHUMMINESS_WITH_MALLOC
stuff would also help with this problem, but I've previously been
told that it was disabled due to problems with COW that have
not been ironed out yet.

I vaguely remember that. I believe it is related to sv_gets(). IIRC we end
up massively over-allocating when we use it. We need to be super careful
with adjusting some of these things, they can have surprising and
unintended action at a distance on some of our code.

If you look for a thread titled "sv_grow() and malloc" you will find a
discussion by Dave M and me about this. He provided the following summary
which I am quoting below verbatim, his summary of the work i did to fix
sv_gets() is better than any I did myself I think, and he did a lot of
additional research on the subject:

Sayeth @iabyn (Dave Mitchel): in https://marc.info/?l=perl5-porters&m=141147301828156&w=3

I've spent the last week or two (on and off) looking into how all the
malloc and COW stuff interacts.

TL;DR: COW caused problems with memory wastage when using readline, so
sv_setsv_flags() was modified to skip COW if SvLEN was quite bigger than
SvCUR; this then caused problems on platforms with malloc_size() (Darwin
and perl's malloc as far as I know), so calling malloc_size() was
disabled. I believe that the SvCUR verses SvLEN test is sub-optimal in
some cases, and can be tweaked, after which malloc_size() can be
re-enabled. Further, I believe a run-time startup probe of malloc() and
realloc() behaviour on platforms without malloc_size() can provide much of
the benefit of malloc_size().

Background:

There are two core functions I'm interested in here: sv_setsv_flags(),
which for strings, decides when to copy, and when to COW; and sv_grow(),
which mallocs or reallocs the string buffer when the SV is first used as a
string, and when it is subsequently needed to hold a bigger string.

sv_grow() applies various fiddle factors to the requested length; these
come in two classes; first those where we think we're smarter than the
caller (e.g. sometimes over-allocating by typically 25% to avoid lots of
reallocs if the string is being repeatedly appended to), and second where
we start being chummy with the malloc library. The first class I think
can be improved, but I'm going to leave that discussion for another time.

As regards chumminess, some platforms have a malloc_size() function which
returns, for a malloc()ed address, the real size allocated, which is likely
to be larger than the requested size due to alignment, implementation and
efficiency considerations. Where available, perl uses this function to
post-hoc adjust SvLEN. On platforms without malloc_size(), we instead
round the request up to sizeof(size_t), on the grounds that the malloc
library is likely to do the same. So overall, sv_grow(sv, len) looks
something like:

    len += adjust for various adjusts;

    #ifdef Perl_safesysmalloc_size
        s = malloc(len);
        SvPV_set(sv, s);
        SvLEN_set(sv, Perl_safesysmalloc_size(s));
    #else
        len = round_up(len);
        SvPV_set(sv, malloc(len));
        SvLEN_set(sv, len);
    #endif

The other function of interest, sv_setsv_flags() (which is the thing
called for perl-level scalar assignment), has a set of criteria to decide
when to copy the string, when to steal it, and when to share it (COW). For
example, if the string is relatively short and the destination string
already has a buffer big enough to hold it, then it's cheaper to just copy
the string rather than set up COW.

The issues:

A while back we had an issue with COW and readline() interacting:
code like this:

    push @a, $_ while (<>);

was causing each pushed SV to be large, even if the lines were short. This
was due to sv_gets() initially allocating a large buffer to read the first
line; rather than a copy being pushed, the original large SV is COWed and
pushed; when readline read the second line, it noticed that the buffer was
shared, abandoned it, and created a new (large) buffer. It's generally
agreed that the whole readline()/sv_gets() buffer allocation strategy
needs rethinking anyway, but in the meantime, and also to help avoid
similar situations elsewhere, Yves added to 5.20.0, with commit
e8c6a47, some extra criteria to sv_setsv_flags() on when to skip
COW: namely when the src's SvLEN is quite a bit bigger than its SvCUR,
(for some definition of "quite".) This solved the readline() issue.

This subsequently caused an issue on Darwin (and any platform that has
malloc_size()):

    [perl #121975] COW speedup lost after e8c6a474

In that ticket, something like:

    $x = "x" x 1_000_000;
    $y = $x for 1..1_000_000;

was slow (took minutes) up to and including 5.18.0; it then became fast
(< 1 sec) when COW was enabled; it became slow again when the SvCUR verses
SvLEN checks were added. This was because perl was doing malloc(1000002),
and Darwin's malloc() library was rounding this to a nice block, 1003520
(0xf4242 => 0xf5000). When we set SvLEN to malloc_size(), we ended up with
SvCUR = 100000, SvLEN = 1003520; the difference between the two was enough
to trigger the "skip COW" test. Yves worked round this by disabling calls
to malloc_size() by default, with ce861ea:

temporary fix for [perl #121975] COW speedup lost after e8c6a474

What I propose:

I think that the criteria for skipping COW on divergent SvCUR/SvLEN
need tweaking; I'll discuss this further below. Once that's fixed,
I think it will be safe to revert ce861ea, and platforms that have
malloc_size() will again benefit from this slight optimisation.

Further, based on the outputs of that little program I asked people
to run on various platforms, I think it is possible at perl startup time,
to do a probe consisting approximately of

    malloc(1); realloc($_) for 2..N

which will allow us to determine two constants A and B such that the size
in any call to malloc() or realloc() in sv_grow() can be rounded up to to
A+nB for some n with no extra memory wasted (for example on 64-bit linux,
A=24, B=16). This would replace the existing "round up to sizeof(size_t)"
on non-malloc_size() platforms (equivalent to A=8,B=8). We'd continue to
use malloc_size() on platforms that support it. It may also be the case
that there are other platform-specific malloc() introspection functions
that we could add Configure probes for and use where available. But I'll
want to think further about the A+nB and probing issues and discuss this
later.

Finally, there are issues with sv_grow()'s "I know better than the caller"
over-allocation strategies; which I want to fix, but again I'll discuss
this further later.

What now:

So the main thing I want to do right now is tweak the SvCUR verses SvLEN
testing, then revert the malloc_size() disabling. The three tables below
show how sv_setsv_flags() decides whether to steal, copy or COW based on
various values of SvCUR of the source string, plus various other criteria
(some of which I have not mentioned since they are irrelevant to this
discussion, such as the cow refcount being > SV_COW_REFCNT_MAX).

The main action shown in each table is that which prevailed before the
e8c6a47 additional "SvLEN >> SvCUR" restrictions; the comments in
[square brackets] describe the additional restrictions following that
commit, while the numbers in {} are examples of my proposed changes to the
criteria (see later more details).

The main pre-patch criterion was to copy rather than COW if the string was
short (< 1250 bytes) and would already fit in the destination buffer (and
the buffer already existed); the added post-patch criteria were to skip
COW (and thus favour copy) if SvLEN > 2*SvCUR or SvLEN > SvCUR + 80.

First, test whether the buffer is swipeable:

    just steal the buffer from the src SV if:
        src not COW, RC==1, and various other constraints,
        and the SV is either SVs_TEMP or SVs_PADTMP:

     src:CUR  SVs_TEMP    SVs_PADTMP
         ---  --------    ----------
           1     STEAL    DONT-STEAL
           2     STEAL    DONT-STEAL
           3     STEAL    DONT-STEAL
          ..
          78     STEAL    DONT-STEAL
          79     STEAL    DONT-STEAL
          80     STEAL    DONT-STEAL
          81     STEAL    DONT-STEAL
          82     STEAL    DONT-STEAL
          83     STEAL    DONT-STEAL
         ...
        1249     STEAL    DONT-STEAL
        1250     STEAL    STEAL [if src:LEN < 1330   {1562} ]
        1251     STEAL    STEAL [if src:LEN < 1331   {1563} ]
        1252     STEAL    STEAL [if src:LEN < 1333   {1565} ]
        ....
     1000000     STEAL    STEAL [if src:LEN < 100080 {1250000} ]
     1000001     STEAL    STEAL [if src:LEN < 100081 {1250001} ]

If not swipeable, then test for COWability. There are two sets of
criteria, based on whether the src SV is already marked as COW or not
(there is a greater cost in upgrading it to COW)

If the src is already COW:

    COW if:
       src:LEN == 0 (shared hash key)
    || (src:CUR >= 1250 && src:LEN-src:CUR < 80 && src:LEN < src:CUR*2)
    || dst:LEN < src:CUR+1

     src:CUR
         ---
           1     COW if wont fit in dst buf
           2     COW if wont fit in dst buf
           3     COW if wont fit in dst buf
          ..
          78     COW if wont fit in dst buf
          79     COW if wont fit in dst buf
          80     COW if wont fit in dst buf
          81     COW if wont fit in dst buf
          82     COW if wont fit in dst buf
          83     COW if wont fit in dst buf
         ...
        1249     COW if wont fit in dst buf
        1250     COW always [ if src:LEN < 1330    {1562}    ]
        1251     COW always [ if src:LEN < 1331    {1563}    ]
        1252     COW always [ if src:LEN < 1332    {1565}    ]
        ....
     1000000     COW always [ if src:LEN < 1000080 {1250000} ]
     1000001     COW always [ if src:LEN < 1000081 {1250001} ]

If the src is not already COW:

    COW if:
       src SV is capable of being converted to COW
          (inc 1 spare byte available for refcnt)
    && (src:CUR >= 0 && src:LEN-src:CUR < 80 && src:LEN < src:CUR*2)
    && (   (src:CUR >= 1250 && src:LEN-src:CUR < 80 && src:LEN < src:CUR*2)
        || dst:LEN < src:CUR+1
       )

     src:CUR
         ---
           1     COW if wont fit in dst buf [ and if src:LEN <   2  {24}]
           2     COW if wont fit in dst buf [ and if src:LEN <   4  {24}]
           3     COW if wont fit in dst buf [ and if src:LEN <   6  {24}]
          ..
          78     COW if wont fit in dst buf [ and if src:LEN < 156  {97}]
          79     COW if wont fit in dst buf [ and if src:LEN < 158  {98}]
          80     COW if wont fit in dst buf [ and if src:LEN < 160  {100}]
          81     COW if wont fit in dst buf [ and if src:LEN < 161  {101}]
          82     COW if wont fit in dst buf [ and if src:LEN < 162  {102}]
          83     COW if wont fit in dst buf [ and if src:LEN < 163  {103}]
         ...
        1249     COW if wont fit in dst buf [ and if src:LEN < 1329 {1561}]
        1250     COW always                 [     if src:LEN < 1330 {1562}]
        1251     COW always                 [     if src:LEN < 1331 {1563}]
        1252     COW always                 [     if src:LEN < 1332 {1565}]
        ....
     1000000     COW always                 [     if src:LEN < 1000080 {1250000} ]
     1000001     COW always                 [     if src:LEN < 1000081 {1250001} ]

As can be seen, for large strings the SvLEN < SvCUR + 80 test dominates,
while for small strings, the existing "< 1250" test and/or the new SvLEN < SvCUR * 2 test dominate.

I think this is basically the wrong way round; I'm not sure if this was
intentional or a coding error (the && should have been a || maybe??). I
think for large strings, the test should be SvLEN < SvCUR * N for some
small constant N like 1.25, (but with N big enough such that
malloc_size() rounding up won't cause issues), while for small strings,
the SvLEN < SvCUR + M test should dominate, but I think M should be
changed to something smaller than 80. Ideally, we should round up SvLEN
using the A+nB values found from startup probing, and use that as the
limit.

So together, I propose that the criteria should be something like:
    skip COW if:
           SvLEN >= SvCUR * 1.25
        && round_up(SvLEN) >= SvCUR

With Linux/glibc/64-bit where A=24 and B=16, this would yield thresholds
that are marked with {NNN} in the tables above.

So in conclusion, I am proposing for discussion right now,
a) re-enabling malloc_size() where available, and
probing at runtime startup to determine constants A and B where mallocs
will be of the form A+nB;
b) adjusting the SvCUR verses SvLEN algorithm as suggested above.
I'm not totally wedded to my exact suggested criteria 2 paragraphs
above; I'm open to suggestions.

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@richardleach
Copy link
Contributor Author

Thanks, @demerphq. I'd remembered Dave talking about runtime probing but couldn't unearth the correct discussion. That thread seemed to end on (FC's email?):

Dave Mitchell wrote:
> So in conclusion, I am proposing for discussion right now,
> a) re-enabling malloc_size() where available, and
>    probing at runtime startup to determine constants A and B where mallocs
>    will be of the form A+nB;
> b) adjusting the SvCUR verses SvLEN algorithm as suggested above.
>    I'm not totally wedded to my exact suggested criteria 2 paragraphs
>    above; I'm open to suggestions.

I agree with both.

Thoughts:

  • So (a) hasn't been done yet, but has (b)?
  • I'm unclear what the A and B constants would be. A is the smallest possible allocation? If B is some kind of scaling constant, it could break down if, as someone suggested in the thread, the malloc() has multiple different sizing strategies.
  • I suppose just for the point of this PR, we'd only need to determine that A, and it seems like this could be done independently of the scaling question.
  • Could we do the runtime test prior to runtime and bake the constants in to the interpreter, rather than having to e.g. read them out of an interpreter variable? e.g. Could miniperl do the probe and adjust a definition then used to build perl?

I also saw this, which seemed like an interesting approach: https://doc.qt.io/qt-5/containers.html#growth-strategies

Independent of all this, do you think this PR could be applied as-is, with us revisiting the rest as a follow up?

@demerphq demerphq merged commit 06dbfff into Perl:blead Nov 21, 2022
@demerphq
Copy link
Collaborator

@richardleach Yes, sorry, my mail was just explaining why this stuff is sensitive. It doesn't change the picture much except for replacing the hard coded constants with defines, with is an obvious improvement. I don't think the change in min string size makes much of a difference. So I approved and merged the patch.

I think further discussion on this should go into an issue where it can be tracked independently. FWIW, as an issue for another ticket: hashes are presized to 8 buckets, why should arrays be any different? Should hashes be reduced or arrays expanded? Food for though for a future ticket.

@iabyn
Copy link
Contributor

iabyn commented Nov 22, 2022 via email

@richardleach richardleach deleted the hydahy/min_sizes branch September 1, 2023 11:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants