Segmentation Fault when using Continuous Query Notification #1009
Considering that ProcessNotification() is called on the main Node.js thread, that would seem impossible, barring a bug in Node.js itself, of course! But it is still likely a concurrency issue, if not there. :-) Thanks for providing such a complete test case and the steps for reproducing it. Hopefully we can indeed replicate the issue.
It would be great if you can figure this one out. CQN seems the perfect match for our scenario, but these stability issues make us hesitant to rely on it.
I just tried running your docker-compose setup. It all appeared to work, but running the final command fails with SP2-0310: unable to open file "update.sql". Running the commands by copy/paste yields ORA-00942: table or view does not exist. So something still seems a bit off. What script runs the setup for the database?
The table is created by the node container with the listening process. See line 48 at … Be sure to use the segfault branch of the repo; the master branch is not in a working state.
Ah. I first built it with master. I then rebuilt everything, but running docker-compose up yields the following problems:
Any further suggestions?
That's weird. I’ll try it myself tomorrow in a fully clean working directory and will get back to you.
I don't really understand what is going wrong, but I expect it has something to do with /usr/src/app being a volume that I mapped to your Docker host directory. It looks like Docker somehow doesn't have permissions on your host. Anyhow, I've updated the
OK. I updated my git repo and ran the commands specified. Everything came up properly this time without any issues. I then ran the command to perform the update... and that worked, too. I ran it over a dozen times without any problems! :-) So I hacked update.sql to run the update 1000 times in an anonymous PL/SQL block. That successfully crashed node. So now I'll take it apart and see if I can figure out why!
“Good” to hear you can reproduce the issue too. Let’s hope that’s a big step towards fixing it. Let me know if there is anything else I can do.
Looking at this issue, I discovered that all of the notifications are running on the same thread, so this is not a concurrency issue. Instead, it seems to be due to too many notifications getting backed up. The workaround (and probably a better implementation anyway!) is to use notification grouping, as in the following code:

```javascript
const cqnOptions = {
  callback: cqnCallback,
  sql: "select * from demo",
  groupingClass: oracledb.SUBSCR_GROUPING_CLASS_TIME,
  groupingValue: 1, // number of seconds
  groupingType: oracledb.SUBSCR_GROUPING_TYPE_SUMMARY
};
```

If you're getting that many notifications, this is going to be an improvement anyway! You can take a look at the documentation as well.
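With the options above, the grouping itself is done by Oracle, not by the application. Purely to illustrate what time-based summary grouping changes, here is a hypothetical application-level analogue: a burst of notifications inside one time window is folded into a single summary callback. `TimeWindowBatcher`, `ChangeEvent` and `Summary` are made-up names, not node-oracledb API, and this sketch's summary keeps only a count and the table names, not individual rowids.

```typescript
// Hypothetical in-app analogue of time-based "summary" grouping (not node-oracledb API).
// Timestamps are passed in explicitly (milliseconds) so the behaviour is deterministic.
interface ChangeEvent {
  table: string;
  rowid: string;
}

interface Summary {
  windowStart: number;
  count: number;    // how many notifications were folded into this summary
  tables: string[]; // distinct tables touched within the window
}

class TimeWindowBatcher {
  private windowStart: number | null = null;
  private pending: ChangeEvent[] = [];
  private readonly windowMs: number;
  private readonly onSummary: (s: Summary) => void;

  constructor(windowMs: number, onSummary: (s: Summary) => void) {
    this.windowMs = windowMs;
    this.onSummary = onSummary;
  }

  // Record an event; an event outside the current window first flushes the old window.
  push(event: ChangeEvent, nowMs: number): void {
    if (this.windowStart !== null && nowMs - this.windowStart >= this.windowMs) {
      this.flush();
    }
    if (this.windowStart === null) {
      this.windowStart = nowMs;
    }
    this.pending.push(event);
  }

  // Deliver one summary for everything collected in the current window.
  flush(): void {
    if (this.windowStart === null || this.pending.length === 0) {
      return;
    }
    const tables = Array.from(new Set(this.pending.map((e) => e.table)));
    this.onSummary({ windowStart: this.windowStart, count: this.pending.length, tables });
    this.pending = [];
    this.windowStart = null;
  }
}

// Ten notifications within half a second, then one more two seconds later.
const summaries: Summary[] = [];
const batcher = new TimeWindowBatcher(1000, (s) => summaries.push(s));
for (let i = 0; i < 10; i++) {
  batcher.push({ table: "DEMO", rowid: `row${i}` }, i * 50);
}
batcher.push({ table: "DEMO", rowid: "row10" }, 2000);
batcher.flush();
console.log(summaries.map((s) => s.count)); // [ 10, 1 ]
```

A real batcher would also flush on a timer (for example with `setTimeout`) so the last window is delivered without waiting for another event; here `flush()` is called explicitly to keep the example deterministic.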
We've tried grouping, but that has the major downside of losing access to the actual rowids, so we need to do a full table refresh. So we would like to use individual messages, but in my setup this already crashes node when I send just a couple of messages. It is not a very busy database, but it might happen that a few messages hit at the same time. In my Docker setup it already fails at the second message.
One other possibility to consider: use another method to determine which rows need to be processed (a last-updated column, a separate table containing the primary keys of affected rows, etc.) so that you can treat the CQN message as a signal. Regarding processing the messages as quickly as possible, that's probably a good idea, but I don't think it will eliminate the issue completely. Perhaps you can simply comment out the real processing temporarily to see if you can process more messages then? The queuing is taking place inside the Oracle Client libraries. We are hoping to talk to one of the owners of that section of the code to see if there is anything that can address this issue, and will get back to you once we have more information.
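A minimal sketch of the "treat the CQN message as a signal" idea, assuming a hypothetical last-updated column, modeled here by an in-memory array. The notification callback only marks the data as dirty; a separate sync pass fetches every row changed since the last watermark, so several notifications that arrive together cost one pass. `SignalSync`, `Row` and the fetch function are illustrative names; in a real listener the fetch would be a query along the lines of `... where last_updated > :since`.

```typescript
// Hypothetical row shape: a primary key plus a "last updated" value.
interface Row {
  id: number;
  lastUpdated: number;
}

// Stand-in for a query such as "... where last_updated > :since" (assumed, not from the thread).
type FetchChangedSince = (since: number) => Row[];

class SignalSync {
  private dirty = false;
  private watermark = 0;
  private readonly fetchChangedSince: FetchChangedSince;
  private readonly process: (rows: Row[]) => void;

  constructor(fetchChangedSince: FetchChangedSince, process: (rows: Row[]) => void) {
    this.fetchChangedSince = fetchChangedSince;
    this.process = process;
  }

  // Called from the notification callback: just sets a flag, no database work.
  signal(): void {
    this.dirty = true;
  }

  // Called by a worker loop or timer: one catch-up pass covers all pending signals.
  // The flag is cleared before fetching, so a signal arriving during the pass triggers another one.
  syncIfDirty(): number {
    if (!this.dirty) {
      return 0;
    }
    this.dirty = false;
    const rows = this.fetchChangedSince(this.watermark);
    for (const row of rows) {
      this.watermark = Math.max(this.watermark, row.lastUpdated);
    }
    this.process(rows);
    return rows.length;
  }
}

// Usage with an in-memory "table".
const table: Row[] = [];
const processed: number[] = [];
const sync = new SignalSync(
  (since) => table.filter((r) => r.lastUpdated > since),
  (rows) => rows.forEach((r) => processed.push(r.id)),
);

// Three updates commit and three notifications arrive before the worker runs.
for (let i = 1; i <= 3; i++) {
  table.push({ id: i, lastUpdated: i });
  sync.signal();
}
console.log(sync.syncIfDirty()); // 3 (one pass for three signals)
console.log(sync.syncIfDirty()); // 0 (nothing pending)

table.push({ id: 4, lastUpdated: 4 });
sync.signal();
console.log(sync.syncIfDirty()); // 1 (only the row newer than the watermark)
```

A strict `>` comparison on a timestamp can miss a row committed later with an equal value, so a real implementation may need a tie-breaker (for example the primary key) or the separate "affected keys" table mentioned above.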
@wvanderdeijl I just pushed a patch from @anthony-tuininga to master that seems to resolve the CQN crash. Can you help test & review it?
And it actually did turn out to be a concurrency issue. :-) I was able to process more than 10,000 notifications in under a second, so I think we're good to go.
I have the same problem with CQN. How do you process 10,000 in under a second? I can only get to a maximum of 20. @anthony-tuininga
Can you take a look at this comment? As for performance, that's simply a matter of a fast machine and what you are doing in your notification callback! All I did (effectively) was count the number of notifications, so that's going to process much more quickly than something that performs a query, for example!
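If you do need the rowids from individual messages, one way to keep the callback cheap is to have it only record the rowids and return, and let a separate step run the queries in batches. A sketch, assuming the rowids have already been pulled out of the message; `RowidBuffer` is a hypothetical helper, not part of node-oracledb. Repeated updates of the same row collapse to a single entry.

```typescript
// Hypothetical buffer: the notification callback only records rowids (cheap in-memory work),
// and a separate step drains them in one batch with duplicates removed.
class RowidBuffer {
  private readonly pending = new Map<string, Set<string>>(); // table name -> rowids

  // Called from the notification callback: no queries, just set insertions.
  record(table: string, rowids: string[]): void {
    let set = this.pending.get(table);
    if (set === undefined) {
      set = new Set<string>();
      this.pending.set(table, set);
    }
    for (const rowid of rowids) {
      set.add(rowid);
    }
  }

  // Called later by the worker: returns the distinct rowids per table and resets the buffer.
  drain(): Map<string, string[]> {
    const batch = new Map<string, string[]>();
    this.pending.forEach((set, table) => {
      batch.set(table, Array.from(set));
    });
    this.pending.clear();
    return batch;
  }
}

// Ten notifications, but only three distinct rows were touched (fake rowids).
const buffer = new RowidBuffer();
for (let i = 0; i < 10; i++) {
  buffer.record("DEMO", [`AAAR${i % 3}`]);
}
const batch = buffer.drain();
console.log(batch.get("DEMO")); // [ 'AAAR0', 'AAAR1', 'AAAR2' ]
```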
When I receive a CQN from the server, I only run `console.log(message);`:

```javascript
async function myCallback(message) {
  console.log(message);
}
```
Have you recompiled node-oracledb with the patch that was provided?
I will compile and test on Monday. Thanks a lot.
@alpertandogan how did your testing work out? |
Node-oracledb 3.1 was just released with a fix in this area. There may still be an issue on Windows, but we haven't reproduced it in development.
I am creating a CQN subscription for `select * from demo` with only `SUBSCR_QOS_ROWIDS` as qos. This works fine unless I run a lot of transactions very quickly, for example a SQL script doing 10 update statements on that table with a commit between updates. This sends 10 notifications to the node container, and that frequently (but not always) crashes the node process with a 'Segmentation Fault' or 'Aborted' message on the console.

A reproducible test case with two Docker containers in a docker-compose setup can be found at https://github.com/wvanderdeijl/oracle-leech/tree/segfault
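For reference, a sketch of pulling rowids out of an object-change notification (the "callback of type 6" in the console output below). The interfaces are declared locally and are assumed to mirror the CQN message shape described in the node-oracledb documentation (`message.type`, `message.tables[].name`, `message.tables[].rows[].rowid`); check the documentation for your node-oracledb version. The constant 6 stands in for `oracledb.SUBSCR_EVENT_TYPE_OBJ_CHANGE` so the sketch runs without the driver installed.

```typescript
// Local types modeled on node-oracledb's documented CQN message shape (declared here, not imported).
interface CqnRow {
  operation: number;
  rowid: string;
}

interface CqnTable {
  name: string;
  operation: number;
  rows?: CqnRow[]; // expected to be filled in when qos includes SUBSCR_QOS_ROWIDS
}

interface CqnMessage {
  type: number;
  tables?: CqnTable[];
}

// Stand-in for oracledb.SUBSCR_EVENT_TYPE_OBJ_CHANGE ("type 6").
const EVENT_OBJ_CHANGE = 6;

// Collect "TABLE:rowid" keys from an object-change notification; other event types yield nothing.
function extractRowids(message: CqnMessage): string[] {
  if (message.type !== EVENT_OBJ_CHANGE) {
    return [];
  }
  const result: string[] = [];
  for (const table of message.tables ?? []) {
    for (const row of table.rows ?? []) {
      result.push(`${table.name}:${row.rowid}`);
    }
  }
  return result;
}

// Fake message for illustration; operation 4 is intended as an update (oracledb.CQN_OPCODE_UPDATE).
const sample: CqnMessage = {
  type: 6,
  tables: [
    {
      name: "APP.DEMO",
      operation: 4,
      rows: [
        { operation: 4, rowid: "AAAS1" },
        { operation: 4, rowid: "AAAS2" },
      ],
    },
  ],
};
console.log(extractRowids(sample)); // [ 'APP.DEMO:AAAS1', 'APP.DEMO:AAAS2' ]
```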
The console output from the node container with the subscription:
The first line (callback of type 6) is from my JavaScript callback being invoked with the first message. The second message never arrives, as the node process has crashed.
Usually the Node.js container simply crashes with the message 'Aborted', but sometimes it crashes with 'Segmentation Fault'. When that happens, the segfault-handler npm module we use gets a chance to write a crash.log with a stack trace. One example of such a log file:
This stack makes me believe the error is in the `CreateMessage` method of `Subscription`, which was invoked by `ProcessNotification`.
The issue only seems to reproduce when we get a number of messages in a very short timeframe. I am no C++ developer, but looking at Subscription.cpp it seems like the message lives as a property on the subscription itself (`subscription->message`). Could it be that this is a concurrency issue, where some of the native code already starts processing the second message and puts it on the subscription while the processing of the first message has not completed yet?
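The hypothesis above is a classic overwrite race: if a single field holds "the current message", a second notification stored before the first one is consumed replaces it. The sketch below models that ordering deterministically in plain TypeScript; it is not the node-oracledb C++ code, involves no real threads, and `SingleSlot`, `FifoQueue` and `simulateBurst` are hypothetical names. In native code the same overlap could also leave the first reader looking at memory that has already been replaced or freed; the sketch only shows the lost-message half.

```typescript
// Deterministic model of the suspected race (no real threads involved):
// the "producer" stores notifications, the "consumer" reads them later.
interface Mailbox<T> {
  put(msg: T): void;
  takeAll(): T[];
}

// Mirrors a single "subscription->message" field: each put overwrites the previous one.
class SingleSlot<T> implements Mailbox<T> {
  private slot: T | undefined = undefined;

  put(msg: T): void {
    this.slot = msg;
  }

  takeAll(): T[] {
    const out = this.slot === undefined ? [] : [this.slot];
    this.slot = undefined;
    return out;
  }
}

// Each notification gets its own entry, so none are lost before the consumer runs.
class FifoQueue<T> implements Mailbox<T> {
  private items: T[] = [];

  put(msg: T): void {
    this.items.push(msg);
  }

  takeAll(): T[] {
    const out = this.items;
    this.items = [];
    return out;
  }
}

// Two notifications arrive before the consumer gets a chance to run.
function simulateBurst(box: Mailbox<string>): string[] {
  box.put("notification 1");
  box.put("notification 2");
  return box.takeAll();
}

console.log(simulateBurst(new SingleSlot<string>())); // [ 'notification 2' ]
console.log(simulateBurst(new FifoQueue<string>())); // [ 'notification 1', 'notification 2' ]
```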
Command to reproduce once the docker-compose services (an Oracle EE container and a node container) are up:
This SQL script is just 10 update statements on the table with a commit between each statement: https://github.com/wvanderdeijl/oracle-leech/blob/segfault/update.sql
Node.js TypeScript code for the listener that fails can be found at https://github.com/wvanderdeijl/oracle-leech/blob/segfault/index.ts