Segmentation Fault immediately on require inside Worker threads on Linux #452

Closed
mikkopiu opened this issue May 11, 2023 · 6 comments
Labels
bug This issue is a bug.

Comments

@mikkopiu

mikkopiu commented May 11, 2023

Describe the bug

When aws-crt is imported/loaded inside a Node.js Worker thread, the process crashes with Segmentation fault (core dumped)/SIGSEGV. More specifically, the crash happens when the native binary is loaded.

For me, this first appeared after upgrading a project to AWS SDK JS v3: a test case run via ava (using Worker threads) started segfaulting immediately when it imported a module that invoked new FirehoseClient({}) (which, in turn, imports/uses aws-crt).

Expected Behavior

Expected aws-crt to either throw the exception implemented in #290 or just work when using Worker threads (based on #451 but I might be misunderstanding).

Ideally, I'd be able to run tests using aws-crt concurrently with ava (using Worker threads).

Current Behavior

Immediate Segmentation fault (core dumped) upon require('aws-crt') (or equivalent).

As I'm not too familiar with debugging C(++), my debugging attempts probably contain a lot of red herrings but here are some of my attempts/findings so far:

  1. Using llnode (lldb plugin), the backtrace of the minimal repro looks odd: the same two frames repeat thousands of times before reaching thread teardown (__nptl_deallocate_tsd):

    $ llnode /usr/bin/node -c /tmp/core.123
    (llnode) v8 bt
    * thread #1: tid = 487, 0x00007f4b814c1450, name = 'node', stop reason = signal SIGSEGV
    * frame #0: 0x00007f4b814c1450
    frame #1: 0x00007f4b8d256df0 libc.so.6`__restore_rt
    frame #2: 0x00007f4b814c1450
    frame #3: 0x00007f4b8d256df0 libc.so.6`__restore_rt
    ... Repeated >5600 times
    frame #5691: 0x00007f4b8d256df0 libc.so.6`__restore_rt
    frame #5692: 0x00007f4b815b7510
    frame #5693: 0x00007f4b8d29e931 libc.so.6`__GI___nptl_deallocate_tsd + 161
    frame #5694: 0x00007f4b8d2a16d6 libc.so.6`start_thread + 422
    frame #5695: 0x00007f4b8d241450 libc.so.6`__clone3 + 48
  2. Running the binary directly with lldb crashes with SIGSEGV: address access protected:

    $ chmod +x dist/bin/linux-x64/aws-crt-nodejs.node
    $ lldb dist/bin/linux-x64/aws-crt-nodejs.node
    (lldb) run
    Process 4291 launched: '/aws-crt-nodejs/dist/bin/linux-x64/aws-crt-nodejs.node' (x86_64)
    Process 4291 stopped
    * thread #1, name = 'aws-crt-nodejs.', stop reason = signal SIGSEGV: address access protected (fault address: 0x7ffff7a8a000)
        frame #0: 0x00007ffff7a8a000 aws-crt-nodejs.node
    ->  0x7ffff7a8a000: jg     0x7ffff7a8a047
    (lldb) memory read 0x7ffff7a8a000
    0x7ffff7a8a000: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00  .ELF............
    0x7ffff7a8a010: 03 00 3e 00 01 00 00 00 00 00 00 00 00 00 00 00  ..>.............

Reproduction Steps

I've been trying to identify the meaningful variables, but the most reliable repro I've found so far is (based on #286 (comment)):

  1. Start an EC2 instance with AMI al2023-ami-2023.0.20230503.0-kernel-6.1-x86_64 (latest Amazon Linux 2023 HVM at the time of writing)

    • or equivalent Linux host, the exact flavour and kernel version don't seem to matter too much (or I might just be really unlucky)
  2. On the host, install Node.js: yum install nodejs (from the built-in repos, it's 18.12.1 at the time of writing)

  3. Enable core dumps: ulimit -c unlimited

  4. Create repro files and run:

    cd $(mktemp -d)
    echo '{"name": "repro","type": "module","dependencies": {"aws-crt": "1.15.16"}}' > package.json
    npm install
    echo 'import { Worker } from "worker_threads"; const worker = new Worker("./reproWorker.js");' > index.js
    echo 'import "aws-crt";' > reproWorker.js
    node index.js
    # -> Segmentation fault (core dumped)
    • In my attempts, this also reproduces with all the versions listed below, and when I built aws-crt from source and required aws-crt-nodejs/dist/index.js (or the linux-x64 binary directly in CommonJS)
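
To see how the process dies without losing the shell session, the repro above can be wrapped in a child process. This is only a minimal sketch in CommonJS: loadInWorker and the REPRO_MODULE environment variable are hypothetical names made up for illustration, not part of aws-crt or Node.js. It starts a Worker that does nothing but require the given module, and reports the child's exit status or terminating signal.

```javascript
// Sketch only: runs the Worker repro in a child process so that a native
// crash is reported back instead of killing the calling process.
// loadInWorker and REPRO_MODULE are hypothetical names, not aws-crt or Node.js APIs.
const { spawnSync } = require("child_process");

// Executed by the child process: start a Worker that only requires the module.
const CHILD_SCRIPT = `
const { Worker } = require("worker_threads");
const worker = new Worker("require(process.env.REPRO_MODULE);", { eval: true });
worker.on("error", (err) => { console.error(err); process.exitCode = 1; });
`;

function loadInWorker(moduleName) {
  const result = spawnSync(process.execPath, ["-e", CHILD_SCRIPT], {
    env: { ...process.env, REPRO_MODULE: moduleName },
    encoding: "utf8",
  });
  // signal is set (e.g. "SIGSEGV") when the child was killed by a signal;
  // otherwise status holds the exit code.
  return { status: result.status, signal: result.signal, stderr: result.stderr };
}

// Usage: run from the repro directory so "aws-crt" resolves from its node_modules.
// console.log(loadInWorker("aws-crt"));
```

On an affected setup I would expect signal to come back as "SIGSEGV" for aws-crt, while a module that loads cleanly gives status 0 and signal null. Passing the module name via an environment variable avoids any quoting issues in the inline script.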

Possible Solution

No response

Additional Information/Context

If I'm not mistaken that aws-crt is supposed to work under Worker threads, I guess this is actually an upstream Node.js issue. As mentioned, though, I'm not familiar enough with C(++) and Worker threads to confirm that.

Here are all the setups I've been able to reproduce this with:

Versions of aws-crt:

  • 1.15.9
  • 1.15.16
  • Local version built from source at commit aafdfee

Node.js:

  • 16.19.1
  • 18.16.0
  • 18.12.1

Operating systems:

  • First saw this in a Docker container based on amazonlinux:2, running on an Ubuntu-based host
    • Linux hostname 5.15.0-1033-aws #37~20.04.1-Ubuntu SMP Fri Mar 17 11:39:30 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
  • Reproduced in a Debian Bullseye container on an Alpine Linux-based host
    • Linux hostname 5.15.82-0-virt #1-Alpine SMP Mon, 12 Dec 2022 09:15:17 +0000 x86_64 x86_64 x86_64 GNU/Linux
  • Reproduced in an Amazon Linux 2023 container on a Fedora-based host
    • Linux hostname 6.2.13-300.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Apr 27 01:33:30 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
  • and a matrix of all of the above (images/hosts/kernels)
  • Reproduced in an Amazon Linux 2023 VM, to rule out the effects of Docker
    • Linux hostname 6.1.25-37.47.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Apr 24 23:20:16 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
  • Does NOT reproduce on macOS 13.3.1 (Ventura), on either Intel or M1 machines
    • and weirdly, the original ava setup works with Worker threads enabled if I just use the darwin-x64 binary on Linux (cp -a node_modules/aws-crt/dist/bin/darwin-x64/aws-crt-nodejs.node node_modules/aws-crt/dist/bin/linux-x64/aws-crt-nodejs.node)

Memory:

  • Tested on Docker containers with 4 & 8 GB memory limits
  • Tested on VMs with 16 and 32 GB of RAM

Other:

  • I'm not sure of the glibc etc. versions in every case (especially as I'm unfamiliar with C(++) tooling and what exactly would be relevant), but for the minimal repro case above, the glibc version is 2.34 (from Amazon Linux 2023 repos)

aws-crt-nodejs version used

1.15.16

nodejs version used

18.12.1

Operating System and version

Amazon Linux 2023, AMI: al2023-ami-2023.0.20230503.0-kernel-6.1-x86_64, uname -a: Linux hostname 6.1.25-37.47.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Apr 24 23:20:16 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

@mikkopiu mikkopiu added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels May 11, 2023
@bretambrose
Contributor

This is the thread local storage crash issue mentioned near the bottom of this: aws/aws-iot-device-sdk-js-v2#360

Current plan is to push through the linked s2n patches and switch from aws-lc to openssl's libcrypto which doesn't have a thread local storage destruction problem. I don't have an ETA atm.

@jmklix jmklix removed the needs-triage This issue or PR still needs to be triaged. label May 11, 2023
@klindklind

We have exactly the same problem currently which is blocking us from upgrading from aws-sdk v2 -> v3. Hopefully we get a fix soon 🙏

@ruvenzx

ruvenzx commented May 22, 2023

Same issue reproduced when running Node 17.7 on ARM64 with the aws-sdk/client-cognito-identity-provider package, which indeed calls aws-crt and causes a SIGSEGV.
(Specifically, I run this on Docker Alpine; error: EXITED(139).)

@xer0x

xer0x commented May 30, 2023

+1 🙏 this has been very vexing for our team! Thank you for investigating! This has broken our AWS-CDK build process.

@bretambrose
Contributor

https://github.com/awslabs/aws-crt-nodejs/releases/tag/v1.15.19 should fix this crash.

We will update the v2 IOT SDK for Javascript shortly. For other dependency updates, please contact the maintainer of the package directly.

@mikkopiu
Author

mikkopiu commented Jul 7, 2023

Can confirm that it fixed the crash in both my minimal repro & our test setup with aws-sdk-js-v3. For aws-sdk-js-v3 packages, at least on version 3.348.0, the nested dependency ranges allow an in-place upgrade to v1.15.19 🎉

Thank you 👍
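
For anyone checking whether their resolved aws-crt already includes the fix, a rough comparison against v1.15.19 could look like the sketch below. hasCrashFix is a hypothetical helper written for illustration, and it assumes plain "x.y.z" version strings without pre-release tags.

```javascript
// Sketch only: hasCrashFix is a hypothetical helper, not part of aws-crt.
// Assumes plain "x.y.z" versions (no pre-release or build suffixes).
const FIXED_VERSION = "1.15.19"; // release named in the maintainer's comment

function hasCrashFix(version) {
  const parse = (v) => v.split(".").map((part) => parseInt(part, 10));
  const installed = parse(version);
  const fixed = parse(FIXED_VERSION);
  // Compare major, then minor, then patch; the first difference decides.
  for (let i = 0; i < 3; i++) {
    if (installed[i] !== fixed[i]) return installed[i] > fixed[i];
  }
  return true; // identical to the fixed release
}

// Usage: feed it the version reported for aws-crt in your dependency tree.
// console.log(hasCrashFix("1.15.16"));
```

Running npm ls aws-crt in the project lists the versions that actually got installed, including nested ones, which is the value to pass in.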
