Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node process does not end and but CPU utilization at 100% #2090

Open
nachogiljaldo opened this issue Jun 27, 2023 · 25 comments
Open

Node process does not end and but CPU utilization at 100% #2090

nachogiljaldo opened this issue Jun 27, 2023 · 25 comments

Comments

@nachogiljaldo
Copy link

nachogiljaldo commented Jun 27, 2023

I have been investigating a problem that is causing trouble on our test suite and I managed to trace it to mysql2. This script allows reproducing it:

Reproduction instructions
// create database test;
// create table test(id int(10) unsigned not null auto_increment, primary key (`id`));
// insert into test(id) values(1);
const mysql = require('mysql2/promise');
async function main() {
    console.log("===========================>", "foo");
    const connection = await mysql.createConnection({
        host: 'localhost',
        user: 'root',
        database: 'test',
        password: 'rootpass',
        connectionLimit: 1234,
    });
    for (let ix = 0; ix < 157; ix++) {
        await basicHanging3(connection);
    }
    connection.destroy();
    console.log('=========================>', 'Done');
}


async function basicHanging3(connection) {
    let query = 'SELECT ';
    let limit = 1000;
    for (let i = 0; i < limit; i++) {
      query += `t.id AS t_id_${i},`;
    }
    query += `t.id AS t_id_${limit}`;
    query += ' FROM test as t';
    const [rows] = await connection.query(
      query,
    );
    return rows;
}

main();
{
  "name": "reproducer",
  "version": "1.0.0",
  "main": "index.js",
  "license": "MIT",
  "dependencies": {
    "mysql2": "^3.4.2",
    "yarn": "^1.22.19"
  }
}

It happens with multiple node versions (v16.14.0, v18.16.0 are some of those I tested). For node 20.3.1 it happens, but you need to raise the 157 iterations.

In our tests we run thousands of DB queries which seems to be reaching this threshold and causing the process to "hung" (it eventually exits, but it takes a few minutes). As you can imagine this is a huge pain on CI because the "exit" takes as much as the test suite itself.

2 other interesting things to note:

  • the CPU for the node process stays at 100% even though nothing is supposedly being done (we checked to see if it is gc using --trace-gc but no traces are output after "Done")
  • if no data is returned (you can delete the insert statement on index.js) it seems to be ok (at least I raised the number of iterations to 10k and it finished just fine)
  • if you have less columns then you require more iterations to achieve the effect

Netstat shows the DB connection is closed while the process is waiting:

$ sudo netstat -alpW | grep -i <my_pid> 

shows no output.

@nachogiljaldo
Copy link
Author

Sorry, I just noticed #2080 and I believe they could be related, my reproduction seems to work correctly with 2.3.0 (which is the one that @rogerho1224 was using and worked for them).

@sidorares
Copy link
Owner

thanks @nachogiljaldo I can reproduce the issue

@nachogiljaldo
Copy link
Author

No worries @sidorares if it helps, it might be intimately related to #2080 since I can confirm that using 2.3.0 fixed the issue for me.

@sidorares
Copy link
Owner

What is strange is that process hangs with 100% cpu just before exit. I suspect some combination of GC and large number of objects with large number of fields. Profiler does not pick that and shows a very small number of ticks in the main code and in the GC, the ticks during the "frozen" part are in "unaccounted" section

The difference with 2.3.0 is highlighted in #2055, I'll try to see if changing that part only helps.

@sidorares
Copy link
Owner

@nachogiljaldo what is your os / architecture? Not sure if that matters, I'm seeing the issue on m1 mac

@nachogiljaldo
Copy link
Author

I'm not sure if it's GC related, (not a node internals expert,but) I ran my reproduction with --trace-gc and the gc logs stop right with the "Done" message.

@nachogiljaldo
Copy link
Author

Mine is Ubuntu x86_64. But a colleague of mine has the same issue with similar effects with M1.

@sidorares
Copy link
Owner

sidorares commented Jun 27, 2023

typically GC related performance problems are visible in the profiler log, maybe we see a bug in GC
The script works fine even with large number of repeats and only shows problem just before the exit

@sidorares
Copy link
Owner

@nachogiljaldo I reduced script to a version with no external dependencies, could you try to run first file from https://gist.github.com/sidorares/128160e6b3dea1da3ad45cd672651d2d ? Still not sure what's going on, but switching to setting fields in the constructor hides the issue ( repro2.js )

cc @testn

@sidorares
Copy link
Owner

Actually it does finish, eventually. First script from gist finishes in about 5 minutes time
I think we are hitting some V8 edge case where performance is not great ( o(n^2) for the number of fields? )

@sidorares
Copy link
Owner

I think I'll try to file this to the core node

@sidorares
Copy link
Owner

nodejs/node#48581

@sidorares
Copy link
Owner

sidorares commented Jun 28, 2023

since there is reliable repro without dependency on mysql2 I'll try to "fix" the issue by converting parser to add fields in the constructor

Actually, I think the better solution would be to cap the number of fields in dynamically compiled parser, if the number is over the limit use "interpreting" parser instead ( e.g in each row iterate over every column and read the code based on the column type ). We used to have "interpreting" parser a while ago -

var row = Packets.TextRow.fromPacket(packet);
// TODO: here we'll have dynamically pre-compiled and cached row parser
if (true) // TODO: think of API to store raw copulns array (most probably connection options flags)
{
var r = {};
for (var i = 0; i < row.columns.length; ++i)
{
var name = this.fields[i].name;
r[name] = row.columns[i];
}

@sebdec
Copy link

sebdec commented Jun 28, 2023

Not sure if this related.
We reproduced the same kind of problem few time ago when we migrated from another mysql driver.
We noticed an important increase of the cpu usage on the nodejs server and more slow queries.
What was strange is when scaling horizontally or vertically the server, not the db, the slow queries decreased.

We thought that the problem came from our side since few issues were open on the repository and one of the purpose of this lib was a performance gain. But we finally reverted the change of the driver and the problem is gone.

Indeed maybe it's when we manage many objects with many relations.

We use RDS mysql with ECS on fargate, v16.20 node version, [email protected]
If necessary i can give other information.

@sidorares
Copy link
Owner

@sebdec do you know roughly the number of columns and rows you are reading in an average query?

@sidorares
Copy link
Owner

@sebdec @nachogiljaldo @rogerho1224 could you try with --noopt or --max-opt=2 flags passed to node

@nachogiljaldo
Copy link
Author

Just tested with --noopt and it finished successfully almost immediately.

@nachogiljaldo
Copy link
Author

Also @sidorares I tested the first file as you suggested (repro1) and it reproduces.

@stefkkkk
Copy link

stefkkkk commented Dec 14, 2023

we are currently using this module and our session inits into apps spent a lot of cpu, sometimes >1 core, but after auth is everything fine.

We already done cpu profiling and seeing that most of ticks it's /usr/local/bin/node and mysql2 library

How to deal with that broken but important module? Any hotfix? Any patches? Any workarounds?

@sidorares
Copy link
Owner

@stefkkkk some possible options: 1) --noopt or --max-opt=2 flags 2) check if you have responses with more rows than you actually need ( SELECT * ) and reduce number of fields to what is actually used

What is your node version? The root cause is in node optimiser so hopefully its fixed eventually on a node side

@stefkkkk
Copy link

stefkkkk commented Dec 15, 2023

2023-12-15_10-53
The fix for me was to upgrade node js version from 20.10 to 21.4, and inadequate cpu usage gone
Probably, it's due to updates of the V8 JavaScript engine to 11.8 in 21 version of node js
@sidorares thanks for help

@nachogiljaldo
Copy link
Author

nachogiljaldo commented Apr 24, 2024

Mmm... in my environment (node20, linux) this was fixed on 3.9.4 (i.e. I was able to upgrade from the previous version I was running 2.7.0).

I tried to upgrade to 3.9.7 today and the problem started to happen again, traced it to 3.9.5 so there's something in that PR that broke the issue that had been inadvertently fixed?

EDIT: tried the noopt and max-opts alternatives but that's not viable as it makes our build absurdly slow (and some tests even time out).

@sidorares
Copy link
Owner

@nachogiljaldo I'm pretty sure inheriting result from null fixes/hides the problem, unfortunately this comes with it's own issues ( #2585 )

I'm still keen to review the way resulting object is constructed - see #2055

@sidorares
Copy link
Owner

Also a possibility could be a configurable / pluggable result construction, e.i some users might prefer {} as base, some Object.create(null) and some new Map()

@nachogiljaldo
Copy link
Author

Also a possibility could be a configurable / pluggable result construction, e.i some users might prefer {} as base, some Object.create(null) and some new Map()

That seems potentially as a good idea, but not sure how feasible it would be in practice for cases when one doesn't connect directly to mysql but rather use an ORM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants