Commit
remove pointless SVPV mortal alloc in S_strip_spaces()/prototype parser
Valid, parseable, and sane prototypes are tiny in char length and often fit on 1 hand. The original commit referred to mitigating sloppy XS code with random whitespace. The majority of XS code will have perfectly clean prototype strings. Do not do an SV alloc, PV alloc, MEXTENDPUSH, memcpy(), and a lot more free()s in scope _dec(), for clean strings. Even dirty but parsable prototypes will be tiny. Therefore use a tiny stack buffer for the dirty semi-hot path to remove overhead. Fuzzing, junk, and abuse can OOM die in newSV()/malloc() if needed, same as in the prior version of the code.

Use newSV(len) and POK_off; the SV head is private to us, and it is a waste to bookkeep SVPV details. SAVEFREEPV() was not used because the previous code used a mortal, not SAVEFREEPV(), so keep using a mortal. One day, after testing by the public, maybe SAVEFREEPV() is the smarter choice here, but KISS in this commit.

The size of the stack buffers should probably be 8 or 16 bytes to cover legit prototype strings. I made the buffers larger simply because I can, and there is no machine code size difference on x86/x64 between 16 and the numbers picked. The numbers are from 2 different binary analysis tools run on perl541.dll on x64 Windows, -O1, MSVC 2022. The numbers are the "width" or "size" or "overhead" in bytes of the C stack frames of the 3 callers of S_strip_spaces.

My rationale is that by keeping the width of the C stack frame under 0xFF bytes, the x86_op+stk_reg+imm8 encoding is emitted by CCs, instead of the x86_op+stk_reg+imm32 encoding, which is larger. So the math is 0xFF-current_frame_size-(5*ptrs). The -(5*ptrs) accounts for future P5P C auto vars or changes to the C code of the 3 callers, and whatever GCC vs Clang vs each CC build number uniqueness, so x86/x64 CCs only use stk_reg+imm8_offset instructions and never resort to writing out 32b offsets in machine code. As a guesstimate, "/2" the stack frame width for i386 CPUs.
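The stack-buffer approach described above can be sketched in plain C. This is an illustration only, not the code in this commit: strip_spaces_sketch() and PROTO_STACK_BUF are hypothetical names, malloc() stands in for the newSV() mortal fallback, and the buffer size is an assumption rather than one of the numbers picked here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer size for illustration; not the value the commit picked. */
#define PROTO_STACK_BUF 16

/*
 * Strip whitespace from src[0..*len). Three outcomes:
 *   clean input          -> returns src itself, nothing is copied
 *   dirty, fits in buf   -> returns stackbuf, no heap traffic
 *   dirty, too long      -> returns malloc()ed memory, also stored in *heap
 * Returns NULL only if malloc() fails. *len is updated to the stripped length.
 */
static const char *
strip_spaces_sketch(const char *src, size_t *len,
                    char *stackbuf, size_t stackbuf_size, char **heap)
{
    size_t n, i;
    char *dst, *d;

    assert(src != NULL && len != NULL && stackbuf != NULL && heap != NULL);
    n = *len;
    *heap = NULL;

    /* Hot path: most prototypes contain no whitespace at all. */
    for (i = 0; i < n; i++)
        if (isspace((unsigned char)src[i]))
            break;
    if (i == n)
        return src;

    /* Semi-hot path: dirty but short, so the caller's stack buffer is enough. */
    if (n < stackbuf_size) {
        dst = stackbuf;
    } else {
        /* Cold path: junk or abuse, pay for a heap allocation. */
        dst = malloc(n + 1);
        if (dst == NULL)
            return NULL;
        *heap = dst;
    }

    memcpy(dst, src, i);          /* prefix before the first space is clean */
    d = dst + i;
    for (; i < n; i++)
        if (!isspace((unsigned char)src[i]))
            *d++ = src[i];
    *d = '\0';
    *len = (size_t)(d - dst);
    return dst;
}
```

A caller would declare `char buf[PROTO_STACK_BUF]; char *heap;` in its own frame, pass both in, and call free(heap) afterwards. In the real code the fallback is a mortal SV, so no explicit free is needed there.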
Perl_cv_ckproto_len_flags() has 6 args, therefore it's ineligible for Win64's 4 register __fastcall ABI, and args 5 and 6 must be read off the C stack per the ABI. So even if small U8-U64 C auto vars are at the "top" of the C stack and reached with +imm8 operands, obviously the CC still has to write 2 "lone" read(+imm32) ops to read args 5 and 6. There are tricks to optimize out +imm32 to reach incoming args, but that's for a CC vendor talk.

Anyways, 0x10/16 or 0x20/32 is the realistic buffer size. The higher lengths here are simply because, in theory, #1 avoid malloc() always, #2 no perf, runtime, or machine code size diff between 0x20 and my numbers. 2 different tools were used. I picked the "larger" C stack size report numbers, to make the cleanedproto buffers even smaller, so this "CC only uses +imm8 op to r/w C stack" optimization lasts for years, not 1 build number of GCC/VC/LLVM.

statistics

Perl_ck_entersub_args_proto 0x88/0x48
yyl_subproto                0x28/0x20
Perl_cv_ckproto_len_flags   0x68/0x30

S_strip_spaces() was added in d16269d 6/24/2013 5:58:46 PM

Remove spaces from a (copy of) a proto when used. The logic that *CUT*
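
The sizing formula 0xFF-current_frame_size-(5*ptrs) can be worked through against the larger frame size reported for each caller in the statistics above. This is only the arithmetic, assuming 8 byte pointers on x64. proto_buf_budget(), IMM8_LIMIT, and RESERVED_PTRS are hypothetical names, and the results are what the formula yields, not necessarily the buffer sizes used in the patch.

```c
#include <assert.h>
#include <stddef.h>

enum {
    IMM8_LIMIT    = 0xFF, /* frame width ceiling named in the commit message */
    RESERVED_PTRS = 5     /* slack for future auto vars, per the message */
};

/* Bytes left for the cleaned-proto stack buffer in a frame of frame_size bytes. */
static size_t
proto_buf_budget(size_t frame_size, size_t ptr_size)
{
    size_t reserve = RESERVED_PTRS * ptr_size;

    assert(frame_size + reserve <= IMM8_LIMIT);
    return IMM8_LIMIT - frame_size - reserve;
}

/*
 * With ptr_size == 8 and the larger reported frame sizes:
 *   Perl_ck_entersub_args_proto  0x88 -> 255 - 136 - 40 =  79
 *   yyl_subproto                 0x28 -> 255 -  40 - 40 = 175
 *   Perl_cv_ckproto_len_flags    0x68 -> 255 - 104 - 40 = 111
 */
```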