-
-
Notifications
You must be signed in to change notification settings - Fork 30.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Memory leak in Modules/sre_lib.h #67877
Comments
In Modules/sre_lib.h on line 882 [1], a block of memory is allocated. If SRE(match) function later terminates abruptly, either because of a signal or because subsequent memory allocation fails, this block is never released. [1] https://hg.python.org/cpython/file/c89f7c34e356/Modules/sre_lib.h#l882 |
There is maybe a bug. Can you show an example of regex and a text where the memory leak occurs? You can use the tracemalloc module to check if there is a memory leak. Or use sys.getcounts() if you compiled Python in debug mode. sre_lib.h is very complex, it uses the C instruction "goto" with regex bytecodes. "DO_JUMP(JUMP_REPEAT, jump_repeat, ctx->pattern+ctx->pattern[0]);" calls "goto entrace" to execute following bytecodes, but later it comes back after DO_JUMP() with the "jump_repeat:" label: https://hg.python.org/cpython/file/c89f7c34e356/Modules/sre_lib.h#l1180 |
Memory leak only happens if match operation terminates abruptly, e.g. because of SIGINT. In this case, DO_JUMP doesn't come back. |
Tracemalloc code: import re
import signal
import tracemalloc
class AlarmError(Exception):
pass
def handle_alarm(signal, frame):
raise AlarmError
signal.signal(signal.SIGALRM, handle_alarm)
s1 = tracemalloc.take_snapshot()
for _ in range(20):
try:
signal.alarm(1)
re.match('(?:a|a|(?=b)){1000}', 'a'*999)
raise RuntimeError
except AlarmError:
pass
s2 = tracemalloc.take_snapshot()
res = s2.compare_to(s1, 'lineno')
for e in res[:10]:
print(e) For me, it shows almost 3 MiB allocated in re.py. |
May be this patch helps. |
Oh cool, you wrote a script to reproduce the issue! And Serhiy wrote a patch, great! Great job guys. sre_clean_repeat_data.patch looks good to me. @serhiy: Can you try the example to ensure that it fixes the issue? If yes, go ahead! |
This patch doesn't fix the issue. The problem is that the list starting with state->repeat doesn't necessarily contains all repeat contexts that are allocated. Indeed, here [1] and here [2] repeat contexts are temporarily removed from the list. If the match procedure terminates abruptly, they are not added back. [1] https://hg.python.org/cpython/file/c89f7c34e356/Modules/sre_lib.h#l963 |
Try to allocate SRE_REPEAT on state's stack, the performance has not changed significantly. It passes the other tests, except this one (test_stack_overflow): |
PR11926 (closed) tried to allocate SRE_REPEAT on state's stack. PR12160 uses a memory pool, this solution doesn't mess up the code. 🔸For infrequent alloc/free scenes, it adds a small overhead: s = 'a'
p = re.compile(r'(a)?')
p.match(s) # <- measure this statement before patch: 316 ns +- 19 ns 🔸For very frequent alloc/free scenes, it brings a speedup: s = 200_000_000 * 'a'
p = re.compile(r'.*?(?:bb)+')
p.match(s) # <- measure this statement before patch: 7.16 sec 🔸I tested in a real case that use 17 patterns to process 100MB data: before patch: 27.09 sec |
My PR methods are suboptimal, so I closed them. The number of REPEAT can be counted when compiling a pattern, and allocate a It seem at any time, a REPEAT will only have one in active, so a Can the number of REPEAT be placed in |
This looks promising. Please, go ahead! You are free to add any fields to any opcodes. It may break some third-party code which generates compiled patterns from a sequence of opcodes, it the stability of this interface was not promised. And they will be broken in any case due to reorganizing of internal code (bpo-47152). |
Thank you Ma Lin for all your work. The fix changes interfaces of some internal functions which can be used in third-party code, and the bug occurs only in special circumstances, so it is not practical to backport it. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: