-
Notifications
You must be signed in to change notification settings - Fork 559
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Null pointer dereference in S_SvREFCNT_dec #16627
Comments
From @geeknik./perl -e 'for$0(qw(0 0)){push@r,qr/@r(?{})/}' triggers a null pointer ==10676==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 AddressSanitizer can not provide additional info. |
This is still present in v39.10
|
I don't get a null pointer dereference, but I'm not using ASan either.
|
Slightly reduced:
This segfaults about 50% of the time for me. |
... which is what I also get on Ubuntu Linux 22.04 LTS. But on FreeBSD-13, I don't get any message or segfault.
|
n looks like a fill pointer for pRExC_state->code_blocks->cb. It is a local variable in S_concat_pat that is incremented in several places. S_concat_pat also calls itself, but the recursive call has its own n. That feels wrong. Use a pointer to make all recursive invocations of S_concat_pat share the same n, which is actually "allocated" at the top level (where S_concat_pat is invoked with pn = NULL). Cf. Perl#16627.
n looks like a fill pointer for pRExC_state->code_blocks->cb. It is a local variable in S_concat_pat that is incremented in several places. S_concat_pat also calls itself, but the recursive call has its own n. That feels wrong. Use a pointer to make all recursive invocations of S_concat_pat share the same n, which is actually "allocated" at the top level (where S_concat_pat is invoked with pn = NULL). Cf. Perl#16627.
I have a patch that seems to fix this issue: mauke@039cadf However, this patch was written based on vibes, not any deep understanding of the code. I'd appreciate any review of what's going on here. @iabyn ? |
I have a very similar patch in development (passing address of n as a pointer), but there are some subtleties you've missed (notably at line 734, *pn shouldn't be reset to zero). I wasn't entirely happy with my branch, so I was going to leave it until tomorrow to see if there was a more elegant fix. Probably best to leave this one to me - it was my mistake in the first place. |
Split the 'count' field of reg_code_blocks structures into separate 'count' and 'size' fields to make the code less fragile; and as an intended side-effect, fix GH #16627. Background: When a pattern includes embedded perl code, such as /(?{ CODE })/, then at compile-time the op trees associated with each of those code blocks are stored within the compiled regex, in a reg_code_blocks structure. This structure contains some basic info, plus a pointer to an array of reg_code_block structures, each of which contains a pointer to the optree for that code block, plus string offsets to where the (?{..}) or similar expression starts and ends within the pattern string. For a runtime pattern, perl tries to reuse any original compiled code blocks rather than recompiling them, to maintain correct closure behaviour. So for example, in the following: my $x = 1; { my $x = 2; $r = qr/(??{$x})/ } my $y = 3; my $s = '(??{$y})'; my $pat = qr/A (??{$x}) B $r C $s/x; at perl compile time, the two '$x' code blocks are compiled, and their optrees stored. At runtime, when the $pat pattern is compiled, the third code block, '$y', is compiled, and the two earlier optrees are retrieved. A new three-element 'struct reg_code_blocks' array is malloc()ed, and the pointers to the two old, and one new, optrees are stored in it. Overall, $pat has the same effect as qr/A1B2C3/. The assembly of this reg_code_blocks array is mostly performed by S_concat_pat() and S_compile_runtime_code(). It is done incrementally, since the total number of code blocks isn't known in advance. Prior to this commit, the array was often realloced() and grown one element at at a time, as each new run-time code block was discovered, with a corresponding pRExC_state->code_blocks->count++. This count field served twin purposes: it indicated both how many code blocks had been found and stored so far, and the malloc()ed size of the array. But some parts of the regex compiler allocate more than one slot at a time, and so the two meanings of the 'count' field temporarily diverge. This became noticeable when S_concat_pat() recursed to interpolate the contents of an array, such as qr/$a$b@c/, where interpolating $a, $b was done iteratively at the top level, then it recursed to process each element of @c. S_concat_pat() had a local var, 'int n', which counted how many code blocks had been found so far, and this value sometimes represented the difference between the two meanings of the 'count' field. However when it recursed, n started from zero again and things got out of whack, which led to GH #16627. The bug in that ticket can be reduced to: my @x = ( qr/(?{A})/ ); qr/(?{B})@x/; Here the B code block is stored in pRExC_state->code_blocks->cb[0], but then S_concat_pat recurses, n is reset to 0, and the A code block is also stored into slot 0. Then things would start to crash. The quick and dirty fix would be to share n between recursive calls to S_concat_pat(), by passing a pointer to it. Instead, this commit takes the approach of adding a 'size' field to pRExC_state->code_blocks, so that ->count now only indicates the current number of code blocks stored (replacing the local var n) while ->size indicates the current number of slots malloc()ed. This makes the code more conventional and simpler to understand, and allows the realloc() to pre-allocate rather than incrementing the array size by 1 each time. By removing the fragile double meaning of the 'count' field, it should make any future bugs easier to diagnose, at the cost of this initial commit being more complex.
I've now created PR #22201, which has a more general fix to this issue. |
Split the 'count' field of reg_code_blocks structures into separate 'count' and 'size' fields to make the code less fragile; and as an intended side-effect, fix GH #16627. Background: When a pattern includes embedded perl code, such as /(?{ CODE })/, then at compile-time the op trees associated with each of those code blocks are stored within the compiled regex, in a reg_code_blocks structure. This structure contains some basic info, plus a pointer to an array of reg_code_block structures, each of which contains a pointer to the optree for that code block, plus string offsets to where the (?{..}) or similar expression starts and ends within the pattern string. For a runtime pattern, perl tries to reuse any original compiled code blocks rather than recompiling them, to maintain correct closure behaviour. So for example, consider the following: my $x = 1; { my $x = 2; $r = qr/(??{$x})/ } my $y = 3; my $s = '(??{$y})'; my $pat = qr/A (??{$x}) B $r C $s/x; At perl compile time, the two '$x' code blocks are compiled, and their optrees stored. At runtime, when the $pat pattern is compiled, the third code block, '$y', is compiled, and the two earlier optrees are retrieved. A new three-element 'struct reg_code_block' array is malloc()ed, and the pointers to the two old, and one new, optrees are stored in it. So when $pat gets compiled, it becomes equivalent to: qr/A (??{$x}) B (??{$x}) C (??{$y})/x; except that the $x's have different values since they are from different closures. When the pattern is executed, the sub-patterns returned by the various (??{..})'s result in $pat having the same overall effect as qr/A1B2C3/. The assembly of this reg_code_block array is mostly performed by S_concat_pat() and S_compile_runtime_code(). It is done incrementally, since the total number of code blocks isn't known in advance. Prior to this commit, the array was often realloced() and grown one element at a time as each new run-time code block was discovered, with a corresponding pRExC_state->code_blocks->count++. This count field served twin purposes: it indicated both how many code blocks had been found and stored so far, and the malloc()ed size of the array. But some parts of the regex compiler allocate more than one slot at a time, and so the two meanings of the 'count' field temporarily diverge. This became noticeable when S_concat_pat() recursed to interpolate the contents of an array, such as qr/$a$b@c/, where interpolating $a, $b was done iteratively at the top level, then it recursed to process each element of @c. S_concat_pat() had a local var, 'int n', which counted how many code blocks had been found so far, and this value sometimes represented the difference between the two meanings of the 'count' field. However when it recursed, n started from zero again and things got out of whack, which led to GH #16627. The bug in that ticket can be reduced to: my @x = ( qr/(?{A})/ ); qr/(?{B})@x/; Here the B code block is stored in pRExC_state->code_blocks->cb[0], but then S_concat_pat() recurses, n is reset to 0, and the A code block is also stored into slot 0. Then things would start to crash. The quick and dirty fix would be to share n between recursive calls to S_concat_pat(), by passing a pointer to it. Instead, this commit takes the approach of adding a 'size' field to pRExC_state->code_blocks, so that ->count now only indicates the current number of code blocks stored (replacing the local var n) while ->size indicates the current number of slots malloc()ed. This makes the code more conventional and simpler to understand, and allows the realloc() to pre-allocate rather than incrementing the array size by 1 each time. By removing the fragile double meaning of the 'count' field, it should make any future bugs easier to diagnose, at the cost of this initial commit being more complex.
Split the 'count' field of reg_code_blocks structures into separate 'count' and 'size' fields to make the code less fragile; and as an intended side-effect, fix GH #16627. Background: When a pattern includes embedded perl code, such as /(?{ CODE })/, then at compile-time the op trees associated with each of those code blocks are stored within the compiled regex, in a reg_code_blocks structure. This structure contains some basic info, plus a pointer to an array of reg_code_block structures, each of which contains a pointer to the optree for that code block, plus string offsets to where the (?{..}) or similar expression starts and ends within the pattern string. For a runtime pattern, perl tries to reuse any original compiled code blocks rather than recompiling them, to maintain correct closure behaviour. So for example, consider the following: my $x = 1; { my $x = 2; $r = qr/(??{$x})/ } my $y = 3; my $s = '(??{$y})'; my $pat = qr/A (??{$x}) B $r C $s/x; At perl compile time, the two '$x' code blocks are compiled, and their optrees stored. At runtime, when the $pat pattern is compiled, the third code block, '$y', is compiled, and the two earlier optrees are retrieved. A new three-element 'struct reg_code_block' array is malloc()ed, and the pointers to the two old, and one new, optrees are stored in it. So when $pat gets compiled, it becomes equivalent to: qr/A (??{$x}) B (??{$x}) C (??{$y})/x; except that the $x's have different values since they are from different closures. When the pattern is executed, the sub-patterns returned by the various (??{..})'s result in $pat having the same overall effect as qr/A1B2C3/. The assembly of this reg_code_block array is mostly performed by S_concat_pat() and S_compile_runtime_code(). It is done incrementally, since the total number of code blocks isn't known in advance. Prior to this commit, the array was often realloced() and grown one element at a time as each new run-time code block was discovered, with a corresponding pRExC_state->code_blocks->count++. This count field served twin purposes: it indicated both how many code blocks had been found and stored so far, and the malloc()ed size of the array. But some parts of the regex compiler allocate more than one slot at a time, and so the two meanings of the 'count' field temporarily diverge. This became noticeable when S_concat_pat() recursed to interpolate the contents of an array, such as qr/$a$b@c/, where interpolating $a, $b was done iteratively at the top level, then it recursed to process each element of @c. S_concat_pat() had a local var, 'int n', which counted how many code blocks had been found so far, and this value sometimes represented the difference between the two meanings of the 'count' field. However when it recursed, n started from zero again and things got out of whack, which led to GH #16627. The bug in that ticket can be reduced to: my @x = ( qr/(?{A})/ ); qr/(?{B})@x/; Here the B code block is stored in pRExC_state->code_blocks->cb[0], but then S_concat_pat() recurses, n is reset to 0, and the A code block is also stored into slot 0. Then things would start to crash. The quick and dirty fix would be to share n between recursive calls to S_concat_pat(), by passing a pointer to it. Instead, this commit takes the approach of adding a 'size' field to pRExC_state->code_blocks, so that ->count now only indicates the current number of code blocks stored (replacing the local var n) while ->size indicates the current number of slots malloc()ed. This makes the code more conventional and simpler to understand, and allows the realloc() to pre-allocate rather than incrementing the array size by 1 each time. By removing the fragile double meaning of the 'count' field, it should make any future bugs easier to diagnose, at the cost of this initial commit being more complex.
Now fixed with v5.41.0-45-g40727c420c. |
Migrated from rt.perl.org#133369 (status was 'new')
Searchable as RT133369$
The text was updated successfully, but these errors were encountered: