-
Notifications
You must be signed in to change notification settings - Fork 15.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Segmentation Fault caused on PHP #9446
Comments
I wonder if this is fixed by: #9370 Do you have any way of trying that out? |
No unfortunately I don't. I assume those commits will be in the next point release, is there a timeline in mind, I can test once it's on packagist |
Hey @haberman , we've compiled the version from master and the segfault is still there. Our runtime: |
Sorry for the delay on this. So far it sounds like all of the reports are for PHP 8.1, and only occur under heavy load. Is that correct? I will see if any memory errors show up in a Valgrind run with PHP 8.1. |
It also looks like all reports so far are using php-fpm. Does this reproduce without php-fpm? Is that feasible to test? So far this does not trigger any Valgrind warnings or errors under any of our tests. I'm not sure how to reproduce this. |
This problem occurs only when opcache is enabled and php is under heavy load. |
If this only occurs on 8.1, and only with opcache enabled, it seems like something changed on the PHP side. It seems like perhaps the message is being destroyed between Message_create() and zim_Message___construct(). Or perhaps it's being constructed without being created first. |
As you have mentioned, PHP 8.1 introduced some opcache features, such as inheritance cache. In my case, only constructing a new proto object without any parameters under a lot of concurrent access produces a crash. I'm trying to make a simple example project, but it is difficult. |
Hi @haberman, I see [v3.20.0-rc3] has a huge list of included fixes. I've read through it and see no reference of this issue, so I assume 3.20 will not resolve this segmentation fault on 8.1? |
That is correct, there has been no movement on this issue. We have not received a repro for this issue, and the fact that the issue only occurs with opcache enabled suggests to me that something has changed on the PHP side -- either a bug or an intended semantic change that breaks protobuf somehow. Unless PHP 8.1 has introduced a known/documented change in the C API semantics, I'm inclined to think this is a bug on the PHP side. The only somewhat unusual thing we are doing (that I know of) is this. If the PHP 8.1 opcache breaks this behavior somehow, perhaps we need to stop doing this: protobuf/php/ext/google/protobuf/message.c Lines 63 to 76 in 10df21e
@remicollet may have ideas here. |
What is the benefit of keeping suck hack ? |
If we think that this SEGV might possibly be related to the In the past, putting |
My team cannot update to 8.1 until this issue has been fixed. It's now over 6 months since the release of 8.1. What progress has been made, should we expect an update soon? We will migrate off protobuf if there is no plans to resolve the issue. Deactivating OPCache is not feasible. Thank you. @haberman @remicollet |
To date we have still not received a repro. Without a repro, the best we can do is guess. The best way to help accelerate a fix is to provide us with a self-contained repro. We can try making the change I outlined in #9446 (comment), but that idea is based on guesswork. If it does not fix the problem, we are back to square one. I will work on implementing the idea in #9446 (comment) and hope it helps. Hopefully we can get it into the release that will be going out in the next week or two. |
After starting the implementation of this, I feel some more confidence that this theory is correct, and that the idea mentioned above will fix these SEGVs. On the downside, the simple fix will also increase memory usage, because we will be allocating storage for message fields at both the PHP and the C levels. This is a side effect of the fact that we use the same generated code for both pure-PHP and C-accelerated PHP, so the generated classes always have PHP-level properties for all message fields. Previously we were suppressing the PHP-level properties when using the C extension, but this hack appears to be incompatible with OPCache. So we will have to allocate memory for the PHP properties even when using the C extension, which will increase memory usage until we can change the generated code. |
There is a chance this may fix some SEGVs seen on PHP 8.1 when OPCache is enabled: protocolbuffers#9446 (comment) However this also will increase memory usage of PHP protos, and will likely introduce CPU overhead also.
I have implemented the change I described above in #9930 Can anybody who is experiencing this problem try this out to verify whether it fixes the SEGVs? Unfortunately it will probably have some CPU/memory overhead also, so it could cause regressions if deployed to prod. But if there is a test cluster that reliably reproduces the behavior, a test there would be very helpful. If this indeed addresses the root cause of the issue, we should rework the generated code so that we get the fix without the overhead. |
@haberman we can give it a try next week. We were experiencing the segfaults in a sustained way when using PHP 8.1. I'll come back with news. |
We've tried it today and the segmentation faults are still there. We are going to try to create a reproducible scenario in a docker-compose stack. |
We manage to reproduce the segmentation fault in a small docker environment. https://github.com/stayforlong/protobuf_sigsegv_poc Hope this helps. |
@radykal-com Thanks for giving it a try. Unfortunately it looks like it didn't fix the problem. @averges thank you for the repro! I was unfortunately not able to reproduce the SEGV. On my machine I get:
The full logs do not seem to mention a segv or any other error. |
@haberman I've tried the repro from @averges and even is not happening 100% of the time, it does reproduce the SEGV most of the times. I did not get the error you showed (AB container unhealthy error) once. From 8 executions, 5 executions did reproduce the error and 3 not. Maybe if the ab runs longer or with more concurrency will trigger the error more often. It seems like some kind of race condition, so a 100% reproducible case is gonna be imposible to create.
|
I've increased ab concurrency to improve the SIGSEGV hit ratio and I've also added a new command "flood".
The easiest way to reproduce the segmentation fault is using apache ab (or other benchmarking tool) directly on a server with php-fpm + opcache. |
I can't seem to get the Docker image to come up successfully, but I will use the html/PHP files from the repro and set up php-fpm manually on my desktop. |
I have managed to reproduce this locally! Thanks everyone for your help in reaching this point. This will make it much easier to get to a root cause. |
A few early results that rule out possible causes:
The crashes appear to be specific to the shared memory operation under concurrency:
|
I have a fix that appears to work for me: #9995 If anyone can confirm the fix, that would be very helpful to ensure that this fixes the bug for others. |
We've tried the fixed version for ~1h and it seems fixed. Thanks you so much @haberman When is the next release scheduled? |
Spectacular, thank you so much @haberman 🙏🙏 |
That's great to hear!
I've cherry-picked this into the 21.x branch: #9996 It will go out in the 3.21.0 release, which should go out in the next week or two. |
What version of protobuf and what language are you using?
Protobuf 3.19.3 installed using pecl
PHP8.1.2 using PHP-FPM
What operating system (Linux, Windows, ...) and version?
Ubuntu 18.04
What runtime / compiler are you using (e.g., python version or gcc version)
c version installed with pecl
What did you do?
What did you expect to see
No segfault as this breaks the request
What did you see instead?
Seg faults listed in php8.1-fpm.log
Make sure you include information that can help us debug (full error message, exception listing, stack trace, logs).
Anything else we should know about your project / environment
Stack: Ubuntu + Nginx + PHP-FPM
Backtraces:
The text was updated successfully, but these errors were encountered: