sv_setsv_cow: only succeed if sv_setsv() would also COW #22120

Draft: wants to merge 4 commits into base: blead

@tonycoz tonycoz commented Apr 4, 2024

This improved performance of the test case I was using from:

real    1m7.690s
user    0m2.593s
sys     1m4.984s

to

real    0m0.576s
user    0m0.453s
sys     0m0.093s

on cygwin.

The problem fixed here is that sv_setsv_cow() would COW SVs even when they held a very short string in a very large buffer. That interacted with the Win32 virtual memory system in a way that is unclear to me, and it had a painful performance impact when the large SV was made COW.

It may also improve memory handling on non-Win32 systems. Once the SV was made COW, other SV copying functions wouldn't check whether it was suitable for COW, so further copies would retain a COW reference to the large buffer. In the example test case from #21877, if we pushed the SV to an array after a successful match, memory use would balloon, running even a system with a generous amount of memory out of memory.
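The "very short string in a very large buffer" condition can be sketched as a standalone predicate. This is only an illustration, not the code in this PR or in sv.c: the function name `worth_sharing` and both limits are hypothetical, and `cur` and `len` stand in for SvCUR() and SvLEN().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limits, chosen only for illustration. */
#define MAX_WASTE_BYTES  80  /* slack tolerated regardless of string size */
#define MAX_WASTE_FACTOR 2   /* buffer may be at most this many times cur */

/* Return true if a buffer of len bytes holding cur bytes of string is
 * compact enough that sharing it (COW) will not pin much unused memory.
 * cur plays the role of SvCUR(), len the role of SvLEN(). */
static bool worth_sharing(size_t cur, size_t len)
{
    assert(cur <= len);
    size_t waste = len - cur;
    if (waste <= MAX_WASTE_BYTES)
        return true;                       /* small absolute slack */
    return len / MAX_WASTE_FACTOR <= cur;  /* small relative slack */
}
```

With these assumed limits, a 10 byte string in a 1 MiB buffer is rejected for sharing, while a 600000 byte string in the same buffer is accepted.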

Fixes part of #21877

This will actually fail in some cases in the next commit.
Previously if you had a successful match against an SV with an
SvLEN() large relative to the SvCUR(), the regexp engine would
use sv_setsv_cow() to make a COW copy of the matched SV,
extending the life of the large allocation buffer.

A normal sv_setsv() wouldn't do such a COW copy, but the above
also marked the source SV as COW, so further copies of the SV
could extend the lifetime of the buffer even further, e.g.:

  while (<>) { # readline tends to make large SvLEN()
    /something/; # some sort of match
    push @save, $_; # with a successful match, the large $_ buffer
                    # survives until @save is released
  }

Fixes part of Perl#21877
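The lifetime problem in the commit message above can be modelled with a toy reference-counted buffer. This is a simplified sketch, not Perl's SV implementation: `shared_buf` and the `buf_*` helpers are hypothetical names, and `cur`/`len` only mimic SvCUR()/SvLEN().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a string buffer shared by several owners. */
typedef struct {
    char  *pv;    /* the allocation */
    size_t cur;   /* bytes in use, like SvCUR() */
    size_t len;   /* bytes allocated, like SvLEN() */
    int    refs;  /* number of owners sharing pv */
} shared_buf;

/* Allocate a len-byte buffer holding the string s. */
static shared_buf *buf_new(const char *s, size_t len)
{
    size_t cur = strlen(s);
    assert(cur < len);
    shared_buf *b = malloc(sizeof *b);
    if (!b) return NULL;
    b->pv = malloc(len);
    if (!b->pv) { free(b); return NULL; }
    memcpy(b->pv, s, cur + 1);
    b->cur = cur; b->len = len; b->refs = 1;
    return b;
}

/* COW-style "copy": no bytes move, the whole allocation gains an owner. */
static shared_buf *buf_share(shared_buf *b) { b->refs++; return b; }

/* Drop one owner; returns 1 only when the allocation was actually freed. */
static int buf_release(shared_buf *b)
{
    if (--b->refs > 0) return 0;
    free(b->pv);
    free(b);
    return 1;
}

/* Plain copy: allocates just cur + 1 bytes, independent of the original. */
static char *buf_copy_compact(const shared_buf *b)
{
    char *copy = malloc(b->cur + 1);
    if (copy) memcpy(copy, b->pv, b->cur + 1);
    return copy;
}
```

In this model, sharing a 3 byte string held in a 1 MiB buffer keeps the full 1 MiB alive until the last owner releases it, whereas the compact copy costs 4 bytes and lets the large buffer go as soon as its original owner is done.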
@tonycoz tonycoz requested a review from demerphq April 4, 2024 04:57
@tonycoz tonycoz marked this pull request as draft April 7, 2024 23:01

@iabyn iabyn left a comment

Looks plausible. It would be nice for the sv_setsv_cow() fn to have a few lines of code comments at the top explaining what it's for/does.

jkeenan commented Aug 26, 2024

@tonycoz this p.r. was filed back in April, but is still in draft status and has acquired merge conflicts. Assuming you want to move forward with it, could you resolve those conflicts and un-draft the ticket? Thanks.

tonycoz commented Aug 27, 2024

I'll rebase it. I'm looking into some behaviour I didn't expect with a variant of this change.

bulk88 commented Oct 17, 2024

@tonycoz

#22623

Your performance problem is related to the ticket above. HeapReAlloc() either refuses to shrink the memory allocation, as an MS perf optimization, or donates all memory "freed" by HeapReAlloc() from large allocations (any alloc too large for the Low Fragmentation Heap) to the Low Fragmentation Heap. HeapReAlloc() does a VirtualProtect(ptr, len, MEM_RESERVE);, then puts ptr into a linked list in the Low Fragmentation Heap. The VirtualProtect(ptr, len, MEM_RESERVE); call makes the process's memory counts for "Private Working Set" and "Commit Size" go down in Task Manager, but you will eventually run out of virtual address space because 100s of MBs or 10s of GBs of memory are marked as "reserved".

I agree with the logic: if the SvCUR vs SvLEN gap is too big at some threshold, do not COW. Or shrink the PV buffer in the parent SV * (maybe a very bad perf idea for PAD vars).

edit:

Very little can be found on this bug, and I'm using Win7, but look at this 2011 post. Nobody in that thread knows why the code exists; someone in the mid 1990s knew the secret bug/problem, but it was lost to time.

https://forum.powerbasic.com/forum/user-to-user-discussions/powerbasic-for-windows/48415-refresh-heap?p=569065#post569065

bulk88 commented Oct 17, 2024

https://stackoverflow.com/questions/9164837/realloc-does-not-correctly-free-memory-in-windows

Above shows the bug with Windows architecture. I found the correct explanation: the NT kernel has no function called VirtualReAlloc. Only 2 functions exist, VirtualAlloc and VirtualFree. Once a kernel memory object is born, it has 2 fields, start_address and length, and they are constant for the rest of that kernel object's lifespan. A user can use VirtualProtect to change single 4096 byte units between start_address and start_address+length. The changes are RWX flags and decommit (discard data, SEGV ON, all process memory counters instantly drop); turning on any of the 3 RWX flags does "SEGV OFF". VirtualUnlock is a command to "move from phy ram to paging file now and delete from phy ram"; the next RWX access will read back from the paging file.

The only way to shrink the reserved address space of a kernel memory handle is to VirtualAlloc a new kernel memory object, call memcpy(), then call VirtualFree on the old kernel memory object. That is a pathological perf degradation, so HeapReAlloc will not do it. Now, how do we fix that in perl when the msvcrt/libc realloc does not release virtual address space, and instead only deletes the memory contents from phy ram and the paging file and switches the address to SEGV ON?
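The allocate, copy, free sequence described above can be sketched portably. This version uses malloc()/free() in place of VirtualAlloc/VirtualFree so that it runs anywhere; `shrink_by_copy` is a hypothetical helper, not a Perl or Win32 API, and it assumes the buffer holds `cur` bytes of string inside a `len` byte allocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Move the first cur bytes of *bufp, plus a trailing NUL, into a fresh
 * allocation of cur + 1 bytes and free the old one.
 * Returns the new allocation size, or 0 if the allocation failed, in
 * which case *bufp is left untouched. */
static size_t shrink_by_copy(char **bufp, size_t cur, size_t len)
{
    assert(bufp && *bufp && cur < len);
    size_t want = cur + 1;
    if (want == len)
        return len;             /* already as tight as it can be */
    char *fresh = malloc(want);
    if (!fresh)
        return 0;
    memcpy(fresh, *bufp, cur);  /* the expensive step for big strings */
    fresh[cur] = '\0';
    free(*bufp);                /* only now is the large block released */
    *bufp = fresh;
    return want;
}
```

The cost is one full copy of the live data per shrink, which is the trade-off the comment above calls pathological when it is paid on every realloc.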

Side trivia: VirtualLock is obsolete/a no-op now. It used to mean "pin this 4096 byte unit in phy ram until further notice". MS either made the command always fail and return an error code, or it silently succeeds but does nothing. VirtualLock was a security problem from user mode. Its only purpose was to talk to hardware I/O devices (Windows 95), but that is a privilege of kernel mode drivers. If it makes sense, the kernel mode driver can take a 4096 byte unit of user mode phy ram and pin it in phy ram forever, or for however long is needed for hardware DMA/MMIO.
