
Upgrade to Node 16.x #98650

Closed

tylersmalley opened this issue Apr 28, 2021 · 20 comments · Fixed by #110684

Labels: Team:Core (Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc), Team:Operations (Team label for Operations Team)

@tylersmalley (Contributor):

Node v16, the next LTS, was released on April 20th; Active LTS starts on October 18th. Given this timing, I feel it would be best to target 7.16.

The primary driver for this upgrade at this time is to support ARM on macOS.

> Node.js v16.0.0 will be the first release where we ship prebuilt binaries for Apple Silicon. While we'll be providing separate tarballs for the Intel (darwin-x64) and ARM (darwin-arm64) architectures the macOS installer (.pkg) will be shipped as a 'fat' (multi-architecture) binary.

Related to #83946

tylersmalley added the Team:Core and Team:Operations labels on Apr 28, 2021
@elasticmachine (Contributor): Pinging @elastic/kibana-operations (Team:Operations)

@elasticmachine (Contributor): Pinging @elastic/kibana-core (Team:Core)

@tylersmalley (Contributor, Author):

Also, cc @watson since you have done the recent upgrades.

Is everyone OK with targeting 7.16? 7.15 would probably ship before v16 enters Active LTS, and running a pre-LTS Node version is not recommended for production applications.

@rudolf (Contributor) commented Sep 22, 2021:

I think the comments on #77469 (comment) are still relevant. Although I'm in favour of moving towards crashing (and adopting v16), I believe this is a huge stability risk: Node 15+ crashes on unhandled promise rejections by default, so adopting v16 implies that behavior change. We first need to get unhandled promise rejections in production down to a reasonable number before we can make crashing the default.

@watson (Contributor) commented Sep 22, 2021:

I'd be surprised if we still have a lot of unhandled promise rejections in our code base. A while back I implemented a trap for this which crashes the process with a good error message on any unhandled promise rejection, in both development and CI:

if (process.noProcessWarnings !== true) {
  process.on('warning', function (warn) {
    // shouldIgnore() (defined elsewhere) filters out expected warnings
    if (shouldIgnore(warn)) return;
    if (process.traceProcessWarnings === true) {
      console.error('Node.js process-warning detected - Terminating process...');
    } else {
      console.error('Node.js process-warning detected:');
      console.error();
      console.error(warn.stack);
      console.error();
      console.error('Terminating process...');
    }
    process.exit(1);
  });

  // While the above warning listener would also be called on
  // unhandledRejection warnings, we can give a better error message if we
  // handle them separately:
  process.on('unhandledRejection', function (reason) {
    console.error('Unhandled Promise rejection detected:');
    console.error();
    console.error(reason);
    console.error();
    console.error('Terminating process...');
    process.exit(1);
  });
}

The idea with this was to find all these places so we could fix them before turning this behavior on in production as well.

If we still have a lot of these, it could mean that this code is no longer working as intended. Could it be that part of our CI isn't triggering this code path?

@rudolf (Contributor) commented Sep 22, 2021:

Looking over the previous 4 weeks, there's still a huge number of unhandled rejections:

[Screenshot 2021-09-22 at 14:11:32 - dashboard of unhandled promise rejections over the previous 4 weeks]

https://overview.elastic-cloud.com/app/dashboards#/view/cf546190-1b9a-11ec-b8ee-754395ac06ab?_g=(filters%3A!()%2CrefreshInterval%3A(pause%3A!t%2Cvalue%3A0)%2Ctime%3A(from%3Anow-4w%2Cto%3Anow))

Now, if these 30% of clusters each crashed only once in a 4-week period, that would be acceptable. I'm not exactly sure how to break down the data to estimate how often the average cluster would crash, but I think this is probably enough to postpone making crashing the default.

We don't currently have enough context to make these unhandled rejections actionable. E.g. a log message like:

Detected an unhandled Promise rejection. ResponseError: [parent] Data too large, data for [] would be [1032302040/984.4mb], which is larger than the limit of [1020054732/972.7mb], real usage: [1032302040/984.4mb], new bytes reserved: [0/0b], usages [request=16440/16kb, fielddata=4477154/4.2mb, in_flight_requests=0/0b, model_inference=0/0b, accounting=23035828/21.9mb]: circuit_breaking_exception

could come from any of the 2k+ places where we make calls to Elasticsearch.

@rudolf (Contributor) commented Sep 22, 2021:

> If we still have a lot of these, it could mean that this code is no longer working as intended. Could it be that part of our CI isn't triggering this code path?

Unfortunately my guess is that it's neither. Most of Kibana and our tests are written for the happy path; we rarely test network errors or the circuit_breaking_exception errors which Elasticsearch returns.

@mshustov (Contributor) commented Sep 22, 2021:

> Detected an unhandled Promise rejection. ResponseError: [parent] Data too large, data for [] would be [1032302040/984.4mb], which is larger than the limit of [1020054732/972.7mb], real usage: [1032302040/984.4mb], new bytes reserved: [0/0b], usages [request=16440/16kb, fielddata=4477154/4.2mb, in_flight_requests=0/0b, model_inference=0/0b, accounting=23035828/21.9mb]: circuit_breaking_exception

Note that this is logged by Core:

process.on('unhandledRejection', (reason) => {

It's interesting that the message doesn't contain an error stack. I guess the root problem is an ES client limitation that will be fixed in a future @elastic/elasticsearch release; #108387 is planned for next week.

From elastic/elasticsearch-js#1542:

> Drop callback-style API
> Maintaining both API styles is not a problem per se, but it makes error handling more convoluted due to async stack traces.
> Moving to a full-promise API will solve this issue.
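
To make the async-stack-trace point concrete, here's a minimal illustration (hypothetical function names, not the client's actual code): with a callback-style API the error's stack begins where the callback fires, while with promises and await, V8's async stack traces keep the calling frames.

// illustration.js - run with Node 12+ (zero-cost async stack traces)

// Callback style: the stack starts at the callback invocation,
// so the original call site is lost.
function callbackStyle(cb) {
  setImmediate(() => cb(new Error('boom')));
}

// Promise style: the stack keeps the awaiting frames, pointing
// back at promiseStyle() and its caller.
async function promiseStyle() {
  await new Promise((resolve) => setImmediate(resolve));
  throw new Error('boom');
}

async function main() {
  callbackStyle((err) => console.error('callback stack:\n' + err.stack));
  try {
    await promiseStyle();
  } catch (err) {
    console.error('await stack:\n' + err.stack);
  }
}

main();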

@rudolf (Contributor) commented Sep 22, 2021:

I think we would also have to log the stack trace explicitly (or change ResponseError's .toString()) https://github.com/elastic/elasticsearch-js/blob/acb77fa3d1badc15091bd330e2fcf87246851799/lib/errors.js#L124

@watson (Contributor) commented Sep 22, 2021:

The nice thing about the reason argument to the unhandledRejection callback is that it points to the place where the promise was rejected - not where the rejected error object was created. This makes tracking down the promise much easier.

@mshustov (Contributor):

> I think we would also have to log the stack trace explicitly (or change ResponseError's .toString())

toString() doesn't contain the original error message? 😱 @delvedor

@delvedor (Member):

> I think we would also have to log the stack trace explicitly (or change ResponseError's .toString())

I have no problem changing the toString method, what would you like to see there?

> toString() doesn't contain the original error message? 😱 @delvedor

@mshustov argh! that can be fixed! :)

@rudolf (Contributor) commented Sep 22, 2021:

> I have no problem changing the toString method, what would you like to see there?

I feel like not including the stack in toString() is more in line with the rest of Node.js. So I think we should just log the stack explicitly.

@mshustov (Contributor):

> I feel like not including the stack in toString() is more in line with the rest of Node.js. So I think we should just log the stack explicitly.

@rudolf you are absolutely right, the stack is logged by the console automatically. Our logging system logs it as well when it receives an Error, but here we wrap the object in a string:

this.log.warn(`Detected an unhandled Promise rejection.\n${reason}`);

we should do something like:

this.log.warn(`Detected an unhandled Promise rejection.\n${reason}\n${reason.stack}`);
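
As a side note, the handler could be defensive about non-Error reasons, since a promise can reject with any value and .stack may then be undefined. A sketch, assuming the same this.log context as the snippets above:

process.on('unhandledRejection', (reason) => {
  // Only Error instances carry a .stack; stringify anything else.
  const detail =
    reason instanceof Error ? `${reason.message}\n${reason.stack}` : String(reason);
  this.log.warn(`Detected an unhandled Promise rejection.\n${detail}`);
});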

@jbudz (Member) commented Sep 22, 2021:

How does this sound?

  1. Not crashing in production by default (see the demo after this list):
     echo --unhandled-rejections=warn >> config/node.options
  2. Fixing CI for tested code
  3. Letting development crash on untested code
  4. Documenting a workaround for developers unnecessarily impacted:
     export NODE_OPTIONS=--unhandled-rejections=warn
  5. If we can find a way, some form of static code analysis and another meta issue
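
For context on what the flag changes, a tiny demo (not Kibana code) that can be run under each mode:

// demo.js
//   node --unhandled-rejections=warn  demo.js  -> logs a warning, keeps running
//   node --unhandled-rejections=throw demo.js  -> crashes (the default from Node 15 on)
Promise.reject(new Error('boom'));

setTimeout(() => {
  console.log('still alive'); // only reached in warn mode
}, 100);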

Waiting to upgrade until Node 14 LTS ends (2023-04-30) would delay ARM macOS support for development and production, so I'm hoping to avoid that if we can. It would also put more pressure on unresolved cases.

@watson (Contributor) commented Sep 23, 2021:

> Waiting to upgrade until Node 14 LTS ends (2023-04-30) would delay ARM macOS support for development and production, so I'm hoping to avoid that if we can. It would also put more pressure on unresolved cases.

As we learned the hard way with 6.8, we can't upgrade the previous-major branch to a new major Node.js version without so much work that it isn't worth it. So we will have to upgrade 7.16 to Node.js 16 before v8.0.0 is released - preferably as soon as it's LTS (which is in about a month). From a security perspective it's not advisable to stay on Node.js 14 🚨

@rudolf (Contributor) commented Oct 1, 2021:

> The nice thing about the reason argument to the unhandledRejection callback is that it points to the place where the promise was rejected - not where the rejected error object was created. This makes tracking down the promise much easier.

Unfortunately this doesn't seem to be the case:

given a file test.js:

const err = new Error();

process.on('unhandledRejection', (reason, promise) => {
  console.log(reason);
  console.log(promise);
});

new Promise((res, rej) => rej(err));

node test.js (using node v16.10.0) produces the following log:

Error
    at Object.<anonymous> (/Users/rudolf/dev/kibana/test.js:1:13)
    at Module._compile (node:internal/modules/cjs/loader:1101:14)
    ...
Promise {
  <rejected> Error
      at Object.<anonymous> (/Users/rudolf/dev/kibana/test.js:1:13)
      at Module._compile (node:internal/modules/cjs/loader:1101:14)
      ...
}

So you can identify where the error originates from but not where the unhandled promise was constructed 😢

https://github.com/grantila/trace-unhandled aims to solve this problem by instrumenting the promise constructor, but doesn't recommend running it in production.
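
For reference, the rough idea behind that kind of instrumentation (a sketch of the approach, not trace-unhandled's actual code) is to capture a stack at promise construction time. Promises created by async functions and native APIs bypass the patched constructor, which hints at why it isn't recommended in production:

// Record where each promise was constructed so an unhandled rejection
// can report the construction site, not just the error's origin.
const NativePromise = global.Promise;
const constructionSite = new WeakMap();

global.Promise = class TracedPromise extends NativePromise {
  constructor(executor) {
    super(executor);
    // Capture a stack trace at construction time.
    constructionSite.set(this, new Error('constructed here').stack);
  }
};

process.on('unhandledRejection', (reason, promise) => {
  console.error('Unhandled rejection:', reason);
  // Promises from async functions or native APIs won't be in the map.
  console.error(constructionSite.get(promise) || '(construction site unknown)');
});

new Promise((resolve, reject) => reject(new Error('boom')));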

@watson (Contributor) commented Oct 6, 2021:

@rudolf You're right. I would have sworn it was like I said - I'm not sure how I got that idea. But I agree that the trace-unhandled package shouldn't be run in production. I think it would severely decrease performance, as V8 can't optimize these overloaded promises the way it can native ones - at least that's my gut feeling, based on how much optimization V8 currently does for promises.

@kobelb (Contributor) commented Oct 13, 2021:

Did we decide to do what @jbudz suggested in #98650 (comment) and no longer crash on unhandled promise rejections for 7.16, because we continue to see unhandled promise rejections in production?

If so, is there anything else that we need to worry about?

@rudolf (Contributor) commented Oct 18, 2021:

@kobelb yes. No further concerns from my side 🥳
