Node.js crash at startup on arm architecture #132

Closed
rgranger opened this issue Jan 6, 2021 · 20 comments

Comments

@rgranger

rgranger commented Jan 6, 2021

I have a GitLab CI pipeline that builds applications for me using npm install. The CI runs on an x86 architecture, but until now the builds worked on an ARM-based device.

Since these commits :

The build no longer works on an ARM-based device, and the application crashes with "Illegal Instructions (core dumped)".

So :

  • why was it working before, and why is it not working anymore ?
  • what should I do in order to "fetch" the correct binaries from a x86 arch CI ?

  • I have tried running npm install --target_arch=arm but it doesn't appear to work
  • I have tried running the CI on an arm architecture on docker, but didn't succeed

Any help on how to set up a GitLab CI to make it work again would be appreciated.

@rgranger
Author

rgranger commented Jan 6, 2021

I have tried running the CI on an arm architecture on docker, but didn't succeed

I succeeded : The CI is now using an arm architecture so the command npm ci should fetch the correct binaries from this module. But now, it fails on other modules that don't provide them explicitly for the arm arch :

npm WARN prepare removing existing node_modules/ before installation
Unexpected platform or architecture: linux/arm
It seems there is no binary available for your platform/architecture
Try to install PhantomJS globally
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! [email protected] install: `node install.js`
npm ERR! Exit status 1
npm ERR! 
npm ERR! Failed at the [email protected] install script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
npm ERR! A complete log of this run can be found in:
npm ERR!     /root/.npm/_logs/2021-01-06T16_01_49_516Z-debug.log

What I don't understand is that everything was working fine before these 2 commits :

Maybe they should trigger a major version instead of a patch?

@lpinca
Member

lpinca commented Jan 7, 2021

Those commits only add prebuilt binaries for ARM. If your CI runs on x86 they are not used so I don't see any breaking change.

why was it working before, and why is it not working anymore ?

I don't know, there were no prebuilt binaries for ARM so the binary was probably created at install time on ARM devices.

@lpinca
Member

lpinca commented Jan 7, 2021

What is the platform/device that gives you the "Illegal Instructions (core dumped)" error?

@rgranger
Author

rgranger commented Jan 7, 2021

The device is running an NXP i.MX Release Distro, and the CPU architecture is ARMv7.

The build generated by the Gitlab CI (running on a x86 cpu architecture) was running fine on the device with

But is crashing now with

@lpinca
Member

lpinca commented Jan 7, 2021

@lpinca
Member

lpinca commented Jan 7, 2021

Also what does the build generated by the Gitlab CI do? Does it generate a bundle, binary or something for x86 that is then run on ARM?

@rgranger
Author

rgranger commented Jan 7, 2021

Can you try to install [email protected] and [email protected] on that device and run https://github.com/websockets/bufferutil#example and https://github.com/websockets/utf-8-validate#example?

Unfortunately, no : npm is not installed on the device (the devices are mostly used without an internet connection).

Also what does the build generated by the Gitlab CI do? Does it generate a bundle, binary or something for x86 that is then run on ARM?

The Gitlab CI does the following :

  • npm ci, which installs all the dependencies according to the package-lock.json
  • npm run deploy

The deploy script generates a server.app.js file in a dist folder using Webpack as bundler.
Unfortunately, not everything is bundlable in a .js file, so we list them as "external" to Webpack. We currently have the following list : ['bufferutil', 'utf-8-validate', 'any-promise'].

We have to copy these dependencies into a node_modules folder next to the server.app.js file. Since their sub-dependencies are needed as well, we can't simply copy them from the top-level node_modules installed by npm ci, so we run a second npm i bufferutil utf-8-validate any-promise with the --prefix option to specify the folder in which to install them (next to dist/server.app.js).

So the version the CI installs for these "unbundlable" dependencies is not the one pinned in package-lock.json, but whatever satisfies the range in package.json (so it picks up newer minor and patch releases instead of an exact version). We should be installing the package-lock.json versions instead, but we haven't found a good way to do so yet.
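One possible way to pin the secondary install to the locked versions is to read them out of package-lock.json and pass explicit `name@version` specs to npm. The following is only a sketch under assumptions: `pinnedSpecs` is a hypothetical helper, the sample lock object and its version numbers are made up for illustration, and it only handles the `dependencies` layout written by npm 6. It also only helps for packages that actually appear in the lock file.

```javascript
// Sketch: build "name@version" specs from an npm 6 style package-lock.json
// so that a second `npm install --prefix` uses the locked versions.
// `pinnedSpecs` and `sampleLock` are hypothetical, for illustration only.
function pinnedSpecs(lock, names) {
  const deps = (lock && lock.dependencies) || {};
  return names.map((name) => {
    const entry = deps[name];
    if (!entry || !entry.version) {
      // Not in the lock file, so there is nothing to pin against.
      throw new Error(`${name} is not listed in package-lock.json`);
    }
    return `${name}@${entry.version}`;
  });
}

// Made-up lock content; a real script would load ./package-lock.json instead.
const sampleLock = {
  lockfileVersion: 1,
  dependencies: {
    'any-promise': { version: '1.3.0' },
    bufferutil: { version: '4.0.2' },
  },
};

const specs = pinnedSpecs(sampleLock, ['bufferutil', 'any-promise']);
console.log(`npm install --prefix dist ${specs.join(' ')}`);
```

Throwing on a missing entry is a deliberate choice here: silently falling back to the package.json range would reintroduce the drift described above.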

The dist/ folder is then simply copied onto the ARM device and executed there.

@rgranger
Author

rgranger commented Jan 7, 2021

To add further information :

The device is using Node v8.9.4
The Gitlab CI is using Node v8.9.4 as well with npm@6 (don't know exact version of npm) in order to be able to use the "package-lock.json" file (which is not used for the "unbundlable" dependencies unfortunately)

@rgranger
Author

rgranger commented Jan 7, 2021

I succeeded : The CI is now using an arm architecture so the command npm ci should fetch the correct binaries from this module. But now, it fails on other modules that don't provide them explicitly for the arm arch :

We have now removed the deprecated dependency "PhantomJS". The CI is using the Docker image https://hub.docker.com/r/arm32v7/node/. The installation and build steps went well, and the result should work on the ARM architecture.

But when executing the new resulting artifact, we still get the error "Illegal Instruction (core dumped)" on the ARM device...

But when executing the artifact on my local computer (x86), it works normally...

@rgranger
Author

rgranger commented Jan 7, 2021

Can you try to install [email protected] and [email protected] on that device and run https://github.com/websockets/bufferutil#example and https://github.com/websockets/utf-8-validate#example?

So we managed to get internet access and npm on the device.

When running npm i [email protected] directly on the device, we get an error : not found: make


@rgranger
Author

rgranger commented Jan 7, 2021

Another experiment:

When I remove the prebuilds/linux-arm folder, it works!

So far it was working because the folder was missing, so it must have used the "fallback.js" file instead.
But now with version 4.0.3, it is using the prebuilds/linux-arm/ binary, which is not working for an unknown reason.
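To illustrate why a missing binary leads to the fallback while a present but incompatible one crashes, here is a minimal sketch of the general "native addon with JavaScript fallback" pattern. It is not bufferutil's actual source: `loadWithFallback` and `unmaskFallback` are illustrative names, and the loader passed in simply simulates a missing prebuilt binary.

```javascript
// Sketch of the "native addon with JavaScript fallback" pattern.
// `loadWithFallback` and `unmaskFallback` are illustrative names only.
function loadWithFallback(loadNative, fallback) {
  try {
    return loadNative(); // e.g. () => require('./some-native-addon.node')
  } catch (err) {
    // Reached when the binary is missing or loading throws a JS exception.
    // An illegal CPU instruction kills the whole process with a signal,
    // so that case never reaches this catch block.
    return fallback;
  }
}

// Pure JavaScript WebSocket unmasking: XOR each byte with the 4-byte mask.
function unmaskFallback(buffer, mask) {
  for (let i = 0; i < buffer.length; i++) {
    buffer[i] ^= mask[i & 3];
  }
  return buffer;
}

// Simulate a platform for which no prebuilt binary exists.
const impl = loadWithFallback(() => {
  throw new Error('no prebuilt binary for this platform');
}, { unmask: unmaskFallback });

const mask = Buffer.from([1, 2, 3, 4]);
const masked = Buffer.from([0x68 ^ 1, 0x69 ^ 2]); // "hi" XORed with the mask
console.log(impl.unmask(masked, mask).toString()); // prints "hi"
```

Under this pattern, a binary that loads but then executes an instruction the CPU does not support cannot be recovered from in JavaScript, which would be consistent with the behavior described above.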

@lpinca
Member

lpinca commented Jan 7, 2021

Yes, my only guess was that it was previously using the fallback code, which makes sense since you installed it on x86 and there was no prebuilt binary for ARM.

To run the example linked above you can install it on any other device and move the whole directory inside the ARM device. I guess it will crash with the "Illegal Instructions" error.

I did not test the ARMv7 native addon, I've only tested the ARMv6 version on a Raspberry Pi.

@lpinca
Member

lpinca commented Jan 7, 2021

I don't have an ARMv7 device so I can't test it myself but I think you can work around the issue by deleting the node.napi.armv7.node binary inside the linux-arm folder or the whole linux-arm folder during the build in Gitlab CI before copying the stuff to the device.

This will make it use the fallback code like it was with version 4.0.2
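A build step along these lines could be scripted in Node. The following is a rough sketch, not a tested recipe: `removeRecursive` and `dropArmPrebuilds` are hypothetical helpers, the `dist` layout is taken from the description earlier in the thread, and it assumes each module keeps its ARM binaries under `prebuilds/linux-arm`. It sticks to `fs` calls that exist in old Node releases.

```javascript
const fs = require('fs');
const path = require('path');

// Recursively delete a file or directory using only long-standing fs APIs.
// Returns false when the target does not exist.
function removeRecursive(target) {
  if (!fs.existsSync(target)) return false;
  if (fs.lstatSync(target).isDirectory()) {
    for (const entry of fs.readdirSync(target)) {
      removeRecursive(path.join(target, entry));
    }
    fs.rmdirSync(target);
  } else {
    fs.unlinkSync(target);
  }
  return true;
}

// Remove <distDir>/node_modules/<name>/prebuilds/linux-arm for each module
// and return the names of the modules that actually had such a folder.
function dropArmPrebuilds(distDir, moduleNames) {
  return moduleNames.filter((name) =>
    removeRecursive(
      path.join(distDir, 'node_modules', name, 'prebuilds', 'linux-arm')
    )
  );
}

// Possible usage in the deploy script, before copying dist/ to the device:
//   dropArmPrebuilds('dist', ['bufferutil', 'utf-8-validate']);
```

Returning the list of affected modules makes it easy to log what was removed, so a later release that changes the folder layout does not go unnoticed.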

@rgranger
Author

rgranger commented Jan 8, 2021

To run the example linked above you can install it on any other device and move the whole directory inside the ARM device. I guess it will crash with the "Illegal Instructions" error.

Ok, after running the example on the ARMv7 device, it crashes with "Illegal instruction (core dumped)" as well.

I don't have an ARMv7 device so I can't test it myself but I think you can work around the issue by deleting the node.napi.armv7.node binary inside the linux-arm folder or the whole linux-arm folder during the build in Gitlab CI before copying the stuff to the device.

This will make it use the fallback code like it was with version 4.0.2

I think I would rather try to find a way to use the package-lock fixed 4.0.2 version directly in my "deploy" script. But that means I wouldn't be able to upgrade this dependency anymore (until the issue with ARMv7 is fixed).

I can also try to run npm rebuild in the CI to see what happens. We already tried to run npm rebuild directly on the device, but the script is failing with weird errors. Maybe we need to install Python and stuff for "node-gyp", but the device doesn't have enough storage for all of that.

@rgranger
Author

rgranger commented Jan 8, 2021

I can also try to run npm rebuild in the CI to see what happens. We already tried to run npm rebuild directly on the device, but the script is failing with weird errors. Maybe we need to install Python and stuff for "node-gyp", but the device doesn't have enough storage for all of that.

The npm rebuild in the CI running on "ARMv7" succeeds, but the generated artifact still crashes when run on the device...

I think I would rather try to find a way to use the package-lock fixed 4.0.2 version directly in my "deploy" script

What is weird is that "bufferutil" and "utf-8-validate" are not listed in the package-lock.json file... 🤔
Every dependency and subdependency should be in the package-lock.json with an exact version, no?

@rgranger
Author

rgranger commented Jan 8, 2021

The npm rebuild in the CI running on "ARMv7" succeeds, but the generated artifact still crashes when run on the device...

After debugging with a coworker, the binary generated by the CI gives this:

readelf -A node.napi.armv7.node
Attribute Section: aeabi
File Attributes
  Tag_CPU_name: "Cortex-A7"
  Tag_CPU_arch: v7
  Tag_CPU_arch_profile: Application
  Tag_ARM_ISA_use: Yes
  Tag_THUMB_ISA_use: Thumb-2
  Tag_FP_arch: VFPv2
  Tag_ABI_PCS_wchar_t: 4
  Tag_ABI_FP_denormal: Needed
  Tag_ABI_FP_exceptions: Needed
  Tag_ABI_FP_number_model: IEEE 754
  Tag_ABI_align_needed: 8-byte
  Tag_ABI_align_preserved: 8-byte, except leaf SP
  Tag_ABI_enum_size: int
  Tag_ABI_HardFP_use: Deprecated
  Tag_ABI_VFP_args: VFP registers
  Tag_CPU_unaligned_access: v6
  Tag_MPextension_use: Allowed
  Tag_DIV_use: Allowed in v7-A with integer division extension
  Tag_Virtualization_use: TrustZone and Virtualization Extensions

Whereas another binary on the device itself gives this:

~ readelf -A test_pcie
Attribute Section: aeabi
File Attributes
  Tag_CPU_name: "Cortex-A9"
  Tag_CPU_arch: v7
  Tag_CPU_arch_profile: Application
  Tag_ARM_ISA_use: Yes
  Tag_THUMB_ISA_use: Thumb-2
  Tag_FP_arch: VFPv3
  Tag_Advanced_SIMD_arch: NEONv1
  Tag_ABI_PCS_wchar_t: 4
  Tag_ABI_FP_rounding: Needed
  Tag_ABI_FP_denormal: Needed
  Tag_ABI_FP_exceptions: Needed
  Tag_ABI_FP_number_model: IEEE 754
  Tag_ABI_align_needed: 8-byte
  Tag_ABI_align_preserved: 8-byte, except leaf SP
  Tag_ABI_enum_size: int
  Tag_ABI_VFP_args: VFP registers
  Tag_CPU_unaligned_access: v6
  Tag_MPextension_use: Allowed
  Tag_Virtualization_use: TrustZone

It appears that, even though both are ARMv7, they are incompatible, because one uses VFPv3 and the other VFPv2.
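For comparing two such attribute dumps without reading them side by side, a small script can parse the `Tag_...: value` lines and list the differences. This is just a sketch: `parseAttributes` and `diffAttributes` are hypothetical helper names, and the inputs below are short excerpts of the two outputs above rather than the full dumps.

```javascript
// Parse the "  Tag_name: value" lines printed by `readelf -A` into an object.
function parseAttributes(text) {
  const attrs = {};
  for (const line of text.split('\n')) {
    const match = /^\s*(Tag_\w+):\s*(.*)$/.exec(line);
    if (match) attrs[match[1]] = match[2].trim();
  }
  return attrs;
}

// List the tags whose values differ or that exist on one side only.
function diffAttributes(a, b) {
  const keys = new Set(Object.keys(a).concat(Object.keys(b)));
  return Array.from(keys)
    .filter((key) => a[key] !== b[key])
    .sort()
    .map((key) => ({ tag: key, left: a[key], right: b[key] }));
}

// Excerpts from the two readelf outputs quoted above.
const ciBinary = parseAttributes(`
  Tag_CPU_name: "Cortex-A7"
  Tag_CPU_arch: v7
  Tag_FP_arch: VFPv2
  Tag_DIV_use: Allowed in v7-A with integer division extension
`);
const deviceBinary = parseAttributes(`
  Tag_CPU_name: "Cortex-A9"
  Tag_CPU_arch: v7
  Tag_FP_arch: VFPv3
  Tag_Advanced_SIMD_arch: NEONv1
`);

console.log(diffAttributes(ciBinary, deviceBinary));
```

On these excerpts the differing tags are the CPU name, the FP architecture, the SIMD tag and the integer division tag. Which of these differences actually triggers the crash is not established in this thread.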

So the solution for us would be to run npm rebuild on the device itself (which we haven't achieved yet).

But my coworker told me there are many different and incompatible architectures within ARMv7 (or something like that).
If the prebuilds/linux-arm binary doesn't work for everyone using "arm", wouldn't it be better to let the system use the fallback.js file instead (remove the prebuilds for linux-arm entirely)?

@lpinca
Member

lpinca commented Jan 8, 2021

FYI ws does not need bufferutil. It is an optional peer dependency. This means that even if bufferutil is not installed or copied in the device in your case ws will work fine.

Not installing bufferutil is the same as using the fallback code in it.

@lpinca
Member

lpinca commented Jan 8, 2021

I'm not sure if reverting 82bd20a is a good idea.

cc: @BHSPitMonkey

@rgranger
Author

rgranger commented Jan 8, 2021

What is weird is that "bufferutil" and "utf-8-validate" are not listed in the package-lock.json file... 🤔
Every dependency and subdependency should be in the package-lock.json with an exact version, no?

Actually, we had neither bufferutil nor utf-8-validate as dependencies, which is why they weren't in the package-lock.json. But we still had to declare them as "external" to Webpack in order to create the bundle (since Webpack can't find them in our node_modules). In our deploy script, we were then installing bufferutil and utf-8-validate separately and injecting them into a node_modules folder next to the final bundle.

A solution is to declare them as dependencies in our project, then Webpack can bundle them just fine in the final .js bundle. So we no longer have any binaries in a node_modules folder, which is great !

FYI ws does not need bufferutil. It is an optional peer dependency. This means that even if bufferutil is not installed or copied in the device in your case ws will work fine.

Hm, that means we don't need to declare them as dependencies: we can just add them to "external" in Webpack, and that should work. Thanks for the help!
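For reference, the "external" setup described here could look roughly like the Webpack configuration below. Treat it as a sketch: the entry path is an assumption, the output names follow the dist/server.app.js layout mentioned earlier, and it relies on the maintainer's statement above that ws works when these optional modules are not installed.

```javascript
const path = require('path');

// Hypothetical Webpack configuration fragment; './src/server.js' is an
// assumed entry point, not something stated in this thread.
const config = {
  target: 'node',
  entry: './src/server.js',
  output: {
    path: path.resolve('dist'),
    filename: 'server.app.js',
  },
  externals: {
    // Leave these as plain require() calls resolved at runtime instead of
    // bundling them. If they are absent on the device, ws is expected to
    // fall back to its pure JavaScript code path.
    bufferutil: 'commonjs bufferutil',
    'utf-8-validate': 'commonjs utf-8-validate',
  },
};

module.exports = config;
```

With this approach nothing needs to be copied into a node_modules folder next to the bundle, so no native binary ends up on the device at all.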

@lpinca
Member

lpinca commented Sep 26, 2021

I'm closing this due to inactivity.
