Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v4.01 extensive RSS memory usage #1173

Closed
hategen opened this issue Nov 5, 2019 · 21 comments
Closed

v4.01 extensive RSS memory usage #1173

hategen opened this issue Nov 5, 2019 · 21 comments
Labels

Comments

@hategen
Copy link

hategen commented Nov 5, 2019

RSS memory grows intensively for node-oracledb 4.01
Hi, I have faced extensive memory consumption after migration from version 3.1.2.

  1. This seems to be a memory allocation issue

  2. No errors. But constant and persistent RSS memory growth

  3. The runnable example can be found here https://github.com/hategen/node-oracledb-issue
    that includes a simple script and a bunch of docker files to set-up the whole environment.
    To simulate the issue follow the instructions in Readme.md

    The issue is reproducible on example environment reproduces under the link and 'prod' env running on RHEL
    please see memory consumption of node container after several runs of cannon npm script

image

JS heap size remain small and modest
image

Top also shows quite a bit of memory used
image

The same charts after some time and several more runs of cannon script
image

The memory consumption seem to depend strongly on oracledb.fetchArraySize
On prod, we solved the issue by downgrading oracledb package version to 3.1.2 and using jemalloc. That actually made the memory consumption stable and not grow infinitely.

With 4.0.1 both with and without jemalloc the situation remained the same.
Any ideas how to deal with this RSS memory?

  1. Run node and show the output of:

On docker

> process.platform
'linux'
> process.version
'v10.16.3'
> process.arch
'x64'
> require('oracledb').versionString
'4.0.1'
>

On RHEL

> process.platform
'linux'
> process.version
'v10.16.3'
> process.arch
'x64'
> require('oracledb').versionString
'4.0.1'
>
  1. What is your Oracle Database version?
    Docker environment runs under
    Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production
    PL/SQL Release 11.2.0.2.0 - Production
    "CORE 11.2.0.2.0 Production"
    TNS for Linux: Version 11.2.0.2.0 - Production
    NLSRTL Version 11.2.0.2.0 - Production

Oracle version from RHEL environment:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
PL/SQL Release 11.2.0.4.0 - Production
"CORE 11.2.0.4.0 Production"
TNS for Linux: Version 11.2.0.4.0 - Production
NLSRTL Version 11.2.0.4.0 - Production

@hategen hategen added the bug label Nov 5, 2019
@cjbj
Copy link
Member

cjbj commented Nov 5, 2019

@anthony-tuininga has a fix for one node-oracledb leak that will be landing soon. It is when getting Oracle Database named type information, such as with getDbObjectClass(). However there are various Node.js N-API leaks below node-oracledb that are open in various Node.js branches e.g. nodejs/node#26667 and nodejs/node#28988

@hategen
Copy link
Author

hategen commented Nov 6, 2019

@cjbj not sure if the one with getDbObjectClass() is really related, at least directly I see no usage of this function across the codebase. Now I am not sure if I can provide any additional valuable information on the issue I face. Anything else I can submit to make it easier to nail down?

@cjbj
Copy link
Member

cjbj commented Nov 7, 2019

@hategen, here is how you can help:

  • can you simplify the test case so it doesn't need dependencies like lodash - or express?! This would make it easier to examine, and eliminate them as causes.

  • Can you investigate your suspicion about oracledb.fetchArraySize by trying various values and/or number of rows?

  • try Node 12 or 13.

  • Did you run tests for longer than the 10 minutes shown? To confirm your graphs: the zero growth sections were when there was no load?

  • Was jemalloc necessary for 3.1? (Can you share how you configure this, perhaps in another issue, so other users can see how to configure it?)

  • You might want to get involved in the Node.js issues I referenced, to help progress them.

@hategen
Copy link
Author

hategen commented Nov 7, 2019

@cjbj Sure I will update the code and get back here
As for oracledb.fetchArraySize I had quite a large value (3000) set for some reason and after upgrade to 4.01 the memory consumption just exploded reducing the value to 200 actually did not fix the issue but the symptoms got much milder. So in the case of oracledb.fetchArraySize = 3000 one iteration of tests caused an instant jump of rss from 200Mb to 1GB and with 200 that was only 300Mb per iteration. Something like that. I will perform more detailed testing and post here the results.

With node 12 + node-oracle 4.01 the situation is the same. RSS grows rapidly.

As for jemalloc and 3.12 my feelings are mixed. I faced quite a high memory consumption on the production environment and that could have been ~ 1gb per node worker, but the overall memory footprint remained quite stable and there was no grow (at least noticeable after reaching ~1GB).

After installation of jemalloc RSS oscillates in 200~300 MB boundaries with some peaks under load.

The whole configuration of jemaloc is quite straightforward. it has to be installed and
LD_PRELOAD environment variable for the node must be set to the path to a corresponding so file.
In context docker-compose that will look like

  
environment:
          LD_PRELOAD: /usr/lib/x86_64-linux-gnu/libjemalloc.so.1

Flat parts of the graph represent no load.
This particular test was not that long. But there is an environment with long-running process that gets stressed from time to time. And there I was able to achieve consumption around 8gb during 9-10 hours of uptime

Thanks for help I will get back with more detailed data

@hategen
Copy link
Author

hategen commented Nov 8, 2019

@cjbj So, I have updated the project so there are the changes:

  • removed lodash and express dependencies
  • added rss size diff logging

rss size diff actually seems rather odd for my taste:
log fragment attached below first number is rss size the second one is delta with the previous state
When rss grows almost all the delta values are dividable by 270336

270757888 0
270757888 0
270757888 0
270757888 0
270757888 0
271028224 270336
271568896 540672
272920576 1351680
273461248 540672
273461248 0
273461248 0
273141760 -319488
273141760 0
273141760 0
273141760 0
273141760 0
273141760 0
274223104 1081344
274223104 0
274223104 0
274223104 0
274223104 0
274763776 540672
274763776 0
274763776 0
274763776 0
276385792 1622016
276385792 0
276385792 0
276385792 0
277196800 811008
277196800 0
277196800 0
277196800 0
277196800 0
277196800 0
277196800 0
277196800 0
277196800 0
278007808 811008
278818816 811008
279089152 270336

As for fetchArraySize seems that impacts memory consumption strongly.
below is the example of memory ramp for fetchArraySize =1000 and 10 concurrent conntctions storming with requests for 10 seconds. Total request count is around 6k.
Starting memory consumption ~30 mb and 1.3Gb after some idle time
image

Reducing fetchArraySize to 100 actually makes the whole thing better
testing parameters are the same
image

Flat areas on graphs represent no load

@anthony-tuininga
Copy link
Member

Yes, fetchArraySize impacts memory use considerably. That value specifies how many rows of data you wish to have fetched at a single time from the database. Memory will be allocated to support having that number of rows in memory at a single time. There is always a trade-off. Increasing that value increases the amount of memory while reducing the number of round-trips to the database. The default value of 100 is generally adequate, but you are free to tune that if you have sufficient memory and wish to increase performance. Note that there is a decreasing performance improvement as fetchArraySize is increased!

@anthony-tuininga
Copy link
Member

I was able to replicate your issue and have an idea of the source of the issue -- namely the querystream object is the issue. If you simply retrieve the rows directly or call resultSet.getRows() directly instead of using querystream the issue goes away. We will continue to look into this further next week.

@anthony-tuininga
Copy link
Member

anthony-tuininga commented Nov 12, 2019

Good news! I was able to track down the source of the issue. The issue is in this code:

diff --git a/oracle.js b/oracle.js
index 558bcfd..f7cd83a 100644
--- a/oracle.js
+++ b/oracle.js
@@ -40,7 +40,7 @@ async function query(params) {
         return response.push(data);
       });
 
-      stream.on("end", function() {
+      stream.on("close", function() {
         //  console.log(`TOTAL ROWS: ${response.length}`);
         return resolve(response);
       });

You were resolving your promise in the end event, which will subsequently close the connection. At that point, the underlying result set of the query stream can no longer be closed because the connection that created it has been closed, which in turn results in that result set not being cleaned up until garbage collection gets around to it -- and since the amount of memory known to Node.js is minimal, that will be a significant period of time.

During this investigation, we discovered that the error (NJS-018: invalid ResultSet) was not being reported and the documentation (and one example) doesn't talk about the difference between the end and close events, and encourages closing the connection in the end event. We will correct these issues for the next release. In the meantime, however, you have your solution!

FYI, the end event is fired when all rows have been fetched from the result set and the close event is fired when the result set has been closed. The connection should not be closed during the end event but only during the close event. The documentation will be updated to include this information.

@hategen
Copy link
Author

hategen commented Nov 13, 2019

@anthony-tuininga Wow, thank you for your help. I will get back to confirm the issue resolution.

@hategen
Copy link
Author

hategen commented Nov 13, 2019

@anthony-tuininga @cjbj
Hi again, confirming that the fix worked out like a breeze! I have performed several tests and RSS memory does not seem to be increasing.

Thank you for help!

@anthony-tuininga
Copy link
Member

@hategen, glad to hear it worked for you, too! We'll leave this open until the new release is made that contains the fixes I mentioned.

@cjbj
Copy link
Member

cjbj commented Nov 13, 2019

@hategen With the change in your code, how is memory use now in 4.0 vs 3.1.2? Is jemalloc still something you will/do use?

@hategen
Copy link
Author

hategen commented Nov 14, 2019

@cjbj

So here are some more test results,
Each peak represent stress load generated by 10 connections for 100 seconds ~70-80 k requests in total for each test run/peak
Areas marked in yellow represent idle time.
node version used 10.16.3

node-oracle v4.01, no jemalloc .
I see the slight climb up but for sure it is not that critical as it was before
image

The same configuration with jemalloc
image

Node- oracle v3.1.2 with jemalloc
Here it seems that the memory after certain point remains at the same watermark
image

Node-oracle v3.1.2 without jemalloc
The same as previous
image

After more testing, it seems there is still something wrong with RSS in v 4.01 that might be related to other open issues. The RSS growth for the longest test on version 4.01 is about 80Mb per ~420k queries.

Will check the same scenario with node v12.xx

@cjbj
Copy link
Member

cjbj commented Nov 15, 2019

Keep us posted on what you see with Node 12 (or 13).

It doesn't look like jemalloc helps.

@hategen
Copy link
Author

hategen commented Nov 15, 2019

And finally node v12.13 with jemalloc and node-oracledb 4.01 seem to be pretty stable
image

The same without jemalloc
image

@cjbj
Copy link
Member

cjbj commented Nov 15, 2019

Yay

@hategen
Copy link
Author

hategen commented Nov 19, 2019

@cjbj, @anthony-tuininga hi again, after several days of watching memory consumption on RHEL environment the memory consumption looks so much better now.
I think this issue is resolved. Should I close this one? Or it is better to wait for the documentation updates and related changes?

Thank you all again for the great help!

@cjbj
Copy link
Member

cjbj commented Nov 19, 2019

Thanks for the update. We can keep this open until the doc & example changes have landed.

Let us know what you're using node-oracledb for. It's always interesting to hear.

@hategen
Copy link
Author

hategen commented Nov 20, 2019

So we actually build a web-app for managing land mobile radio infrastructure and our data-warehouse is an instance of Oracle DB. So our node backend interacts with the Oracle database in a quite intensive manner.

@cjbj
Copy link
Member

cjbj commented Nov 20, 2019

@hategen that's really cool!

@cjbj
Copy link
Member

cjbj commented Nov 25, 2019

@hategen Node-oracledb 4.1 is out with the corrected doc & examples. Thank you for all the effort you put into resolving this "subtle" problem.

@cjbj cjbj closed this as completed Nov 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants