node_exporter 1.5.0 release SIGKILL immediately on macOS M1/arm64 #2539

Closed
petemounce opened this issue Nov 30, 2022 · 22 comments · Fixed by #3008

Comments

@petemounce

Host operating system: output of uname -a

$ uname -a
Darwin Peters-MBP-2.cust.communityfibre.co.uk 21.6.0 Darwin Kernel Version 21.6.0: Mon Aug 22 20:19:52 PDT 2022; root:xnu-8020.140.49~2/RELEASE_ARM64_T6000 arm64

node_exporter version: output of node_exporter --version

No output. SIGKILL. See below.

node_exporter command line flags

--help

node_exporter log output

No output. SIGKILL. See below.

Are you running node_exporter in Docker?

No.

What did you do that produced an error?

Ran the binary with --help

What did you expect to see?

The --help usage output.

What did you see instead?

The process was SIGKILL'd.

Full trace

pete at Peters-MBP-2 in ~/src/ef/_ (pete-node-exporter●)
$ uname -psm
Darwin arm64 arm

pete at Peters-MBP-2 in ~/src/ef/_ (pete-node-exporter●)
$ curl --location https://github.com/prometheus/node_exporter/releases/download/v1.5.0/node_exporter-1.5.0.darwin-arm64.tar.gz | tar xz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 4411k  100 4411k    0     0  9061k      0 --:--:-- --:--:-- --:--:-- 15.0M

pete at Peters-MBP-2 in ~/src/ef/_ (pete-node-exporter●)
$ ./node_exporter-1.5.0.darwin-arm64/node_exporter --help
[1]    33812 killed     ./node_exporter-1.5.0.darwin-arm64/node_exporter --help

Other notes

I ran the same sequence on an AWS amd64 Mac without error. It only happens on M1 Macs (both my laptop and an AWS-hosted one).

This happened for 1.4.0 as well but I saw the fresh release and figured I'd try it before reporting.
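
For anyone trying to confirm that the process really dies from a signal rather than exiting on its own, here is a minimal sketch in Python. It relies on the documented POSIX behaviour of `subprocess`, where death by signal is reported as a negative return code. The helper name `run_and_report` is hypothetical and not part of node_exporter.

```python
import signal
import subprocess


def run_and_report(argv):
    """Run argv and describe how the process ended (hypothetical helper)."""
    proc = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    # On POSIX, subprocess reports death-by-signal as a negative return code.
    if proc.returncode < 0:
        try:
            name = signal.Signals(-proc.returncode).name
        except ValueError:
            # Signal number without a symbolic name on this platform.
            name = f"signal {-proc.returncode}"
        return f"killed by {name}"
    return f"exited with status {proc.returncode}"
```

Calling `run_and_report(["./node_exporter-1.5.0.darwin-arm64/node_exporter", "--help"])` on an affected machine should then say which signal ended the process, which corresponds to the `killed` line in the trace.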

@SuperQ
Member

SuperQ commented Nov 30, 2022

We haven't updated the osx build tools in a while. I guess we need to look into that.

@discordianfish
Member

Are you sure this isn't System Integrity Protection? I'm not a mac expert, but this looks like what I saw when SIP hit me in the past.

@petemounce
Author

I'm not; how would I verify?

@rtuin

rtuin commented Dec 16, 2022

I can confirm this issue. I tried 1.5.0, 1.4.1 and 1.3.1, and all of them behave as described.
Perhaps this is specific to macOS Ventura (released in Oct '22).

@petemounce
Author

(I'm on Monterey.)

@discordianfish
Member

Yeah, likely a code signing issue. I'm not a mac expert and don't know whether it's only on M1 or depends on the macOS version, but unsigned binaries get SIGKILLed, e.g.: nodejs/node#40827 (comment)

@pboiseau

pboiseau commented Mar 3, 2023

I'm having the same issue. Do you know a way to solve it?

@discordianfish
Member

@pboiseau Did you try the link above?

@matthiasr
Contributor

Is this the same issue as #2217?

There is something node exporter specific going on here. Comparing with the statsd exporter (released both before and after node exporter 1.5.0):

❯ codesign -vvv node_exporter
node_exporter: valid on disk
node_exporter: satisfies its Designated Requirement
❯ codesign -vvv ~/Downloads/node_exporter-1.5.0.darwin-arm64/node_exporter
/Users/matthiasrampke/Downloads/node_exporter-1.5.0.darwin-arm64/node_exporter: invalid signature (code or signature have been modified)
In architecture: arm64
❯ codesign -vvv ~/Downloads/statsd_exporter-0.23.1.darwin-arm64/statsd_exporter
/Users/matthiasrampke/Downloads/statsd_exporter-0.23.1.darwin-arm64/statsd_exporter: valid on disk
/Users/matthiasrampke/Downloads/statsd_exporter-0.23.1.darwin-arm64/statsd_exporter: satisfies its Designated Requirement
❯ codesign -vvv ~/Downloads/statsd_exporter-0.22.8.darwin-arm64/statsd_exporter
/Users/matthiasrampke/Downloads/statsd_exporter-0.22.8.darwin-arm64/statsd_exporter: valid on disk
/Users/matthiasrampke/Downloads/statsd_exporter-0.22.8.darwin-arm64/statsd_exporter: satisfies its Designated Requirement
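
The comparison above could be scripted roughly as follows. This is only a sketch: the function names are hypothetical, and the classification keys off the phrases that appear in the output pasted above ("valid on disk", "invalid signature"). The "not signed at all" phrase is an assumption about what `codesign` prints for a completely unsigned binary, since that case does not appear in this thread.

```python
import subprocess


def classify_codesign_output(text):
    """Map the text printed by `codesign -vvv` to a coarse status."""
    # Check the failure phrases first so a failure is never reported as valid.
    if "invalid signature" in text:
        return "invalid"
    if "not signed at all" in text:  # assumed wording, not shown in the thread
        return "unsigned"
    if "valid on disk" in text:
        return "valid"
    return "unknown"


def verify(path):
    """Run `codesign -vvv` on path (macOS only) and classify the result."""
    proc = subprocess.run(
        ["codesign", "-vvv", path], capture_output=True, text=True
    )
    # Look at both streams rather than assuming which one codesign uses.
    return classify_codesign_output(proc.stderr + proc.stdout)
```

On a Mac, `verify("node_exporter")` would run the same check as the transcript and return one of `"valid"`, `"invalid"`, `"unsigned"` or `"unknown"`.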

@pboiseau

@discordianfish I've tried, but I get this error:

codesign -s - node_exporter
node_exporter: internal error in Code Signing subsystem

@discordianfish
Member

Unfortunately I have no idea how mac code signing works. If someone has a suggestion for what we could do better or differently, let me know.

@petemounce
Author

I'm only a little familiar; it's a PITA.

In both cases, guidance is to be extremely cautious with one's Apple Developer certs. If they leak, other people can sign their own binaries with said certs leading to all sorts of mischief.

@pboiseau

I found a solution.

Here is an Ansible script I wrote to sign the binary on Apple M1 so that the process is not SIGKILLed.

You need an Apple developer account and a Developer ID Application certificate.

# Decode the base64-encoded certificate passed in as a variable.
- name: Get developer ID application certificate
  ansible.builtin.shell: |
    echo "{{ apple_developer_certificate }}" > developerID_application.txt
    base64 -d -i developerID_application.txt -o developerID_application.cer
    rm developerID_application.txt

# rc 44 is accepted here as "certificate not found", so only other non-zero codes fail the task.
- name: Check if certificate already exists in keychain
  ansible.builtin.shell: |
    security find-certificate -c "Developer ID Application: <Name Of Your Certificate (XXXXXXXXX)>" -a -p /Library/Keychains/System.keychain
  register: is_certificate
  failed_when: is_certificate.rc != 0 and is_certificate.rc != 44

- name: Import developer ID application certificate
  ansible.builtin.shell: security import developerID_application.cer -k /Library/Keychains/System.keychain
  when: is_certificate.rc != 0

# Note: `-s -` requests an ad-hoc signature, which does not use a signing identity.
- name: Sign node_exporter binary
  become: true
  ansible.builtin.shell: "codesign -s - {{ node_exporter_binary_install_dir }}/node_exporter"

- name: Verify node_exporter signature
  ansible.builtin.shell: "codesign -vvvv {{ node_exporter_binary_install_dir }}/node_exporter"
  register: result
  failed_when: result.rc == 1
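
For anyone not using Ansible, the last two tasks (sign, then verify) could be sketched in Python like this. The function names are hypothetical. The `--force` flag is an addition that the script above does not use; per the `codesign` man page it replaces an existing signature, which may matter here because the released binary already carries an invalid one. Whether it avoids the "internal error" reported earlier in this thread is not something I have verified.

```python
import subprocess


def adhoc_sign_command(path, force=True):
    """Build the codesign invocation for an ad-hoc signature ("-" identity)."""
    cmd = ["codesign", "--sign", "-"]
    if force:
        # Assumption: replacing the existing signature is wanted here.
        cmd.append("--force")
    cmd.append(path)
    return cmd


def verify_command(path):
    """Build the same verification call the playbook uses."""
    return ["codesign", "-vvvv", path]


def sign_and_verify(path):
    """Sign ad hoc, then verify; raises CalledProcessError on failure (macOS only)."""
    subprocess.run(adhoc_sign_command(path), check=True)
    subprocess.run(verify_command(path), check=True)
```

The command builders are kept separate from `sign_and_verify` so the exact arguments can be inspected or logged before anything touches the binary.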

@discordianfish
Member

It would be nice if we could sign the releases as well. Dunno what that would involve, probably need to get some key signed by apple. If someone is familiar with the process, let me know! Even better if someone wants to submit a PR to add this to CI.

@flichtenheld

I just ran into this problem as well when switching to macOS 13. One existing work-around is to use the homebrew version (https://formulae.brew.sh/formula/node_exporter#default) which is correctly signed.

@gitperr
Contributor

gitperr commented Sep 22, 2023

It would be nice if we could sign the releases as well. Dunno what that would involve, probably need to get some key signed by apple. If someone is familiar with the process, let me know! Even better if someone wants to submit a PR to add this to CI.

There is a write-up here: Automatic Code-signing and Notarization for macOS apps using GitHub Actions

There are two routes that can be taken:

  1. Sign with an official key that is maintained by the node exporter maintainers.
    Needs:

    I'd recommend using Fastlane to automate this process. It saves a lot of headaches:

      • Generates the signing key, encrypts it and saves it in a git repo
      • Automates signing with easy-to-read Ruby config files called Fastfile

  2. Sign with the ad-hoc method.
    If we don't want to go through the hassle of route 1, we can try ad-hoc signing, which @pboiseau recommended earlier:
    codesign -s - {{ node_exporter_binary_install_dir }}/node_exporter
    It seems the Homebrew people use the same method.
    Needs:

The way I see it, route 2 is the easier one. We can try some builds that way and, if it works, use it. We can also add the test suggested above:
codesign -vvvv {{ node_exporter_binary_install_dir }}/node_exporter
and/or some more involved checks:

  • install node exporter on a macOS CI machine and check that launchctl print system/io.prometheus.node_exporter reports the intended state, which would be running
  • check that the last exit reason is not OS_REASON_CODESIGNING.

I'd be happy to contribute a PR for this. Please let me know what you think.
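
The launchctl check described above could look roughly like this in a CI step. It is a sketch under assumptions: the function names are hypothetical, and the parsing assumes `launchctl print` emits a `state = running` line and mentions `OS_REASON_CODESIGNING` somewhere in its output when the service was killed for signing reasons. The exact output format is not shown in this thread, so treat the string matching as a starting point.

```python
import subprocess

# Service target as named in the comment above.
SERVICE = "system/io.prometheus.node_exporter"


def service_looks_healthy(launchctl_output):
    """True if the output reports a running state and no code-signing exit reason."""
    running = any(
        line.strip() == "state = running"  # assumed line format
        for line in launchctl_output.splitlines()
    )
    killed_for_signing = "OS_REASON_CODESIGNING" in launchctl_output
    return running and not killed_for_signing


def check_service(target=SERVICE):
    """Run `launchctl print` (macOS only) and apply the check above."""
    proc = subprocess.run(
        ["launchctl", "print", target], capture_output=True, text=True
    )
    return proc.returncode == 0 and service_looks_healthy(proc.stdout)
```

Keeping the parsing in `service_looks_healthy` means it can be exercised with captured output on any machine, while `check_service` is the only part that needs a Mac.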

@discordianfish
Member

@gitperr sounds reasonable, so if you want to submit a PR that'd be great!

@gitperr
Contributor

gitperr commented Oct 14, 2023

@discordianfish Thanks for the response.

I set up my dev environment on my Mac (arm) today and tried to build node_exporter.

My findings so far:

  • It builds (and signs) properly when built on Mac hardware. Checked with codesign, the local build verifies, while the released 1.6.1 binary does not:
node_exporter % codesign -vvvv node_exporter
node_exporter: valid on disk
node_exporter: satisfies its Designated Requirement
node_exporter-1.6.1.darwin-arm64 % codesign -vvvv node_exporter
node_exporter: invalid signature (code or signature have been modified)
In architecture: arm64

So I'm now looking at adding a code signing step to the CircleCI build pipeline, but I'm not sure where exactly it would fit:
https://github.com/prometheus/node_exporter/blob/master/.circleci/config.yml

I put together something like this, but I'm not sure how to test it:
gitperr@55f0083

Thanks for any pointers.

@gitperr
Contributor

gitperr commented Oct 25, 2023

Alright, figured out some of the steps and got the pipelines running on commits (see my open MR #2833).

I got some ad-hoc code-signed builds out, but they were built on Intel Macs and did not work on arm.

Now I'm blocked by the resource class: https://app.circleci.com/pipelines/github/prometheus/node_exporter/3817/workflows/18a9d70c-2317-4d49-b20f-e94fd82cf02a/jobs/19923/steps
Resource class macos for macos.m1.medium.gen1, image xcode:14.3.1 is not available for your project, or is not a valid resource class. This message will often appear if the pricing plan for this project does not support macos use.

Seems like the node exporter CircleCI plan does not support m1 mac use. Is it possible to change the plan for that? Seems like CircleCI will soon stop supporting Intel macs anyway. Also, m1 macs are capable of compiling for amd64 architecture, so we won't lose anything.

@gitperr
Contributor

gitperr commented Dec 22, 2023

Do we plan on adding the M1 or M2 mac runner for this? Then I think it is very easy to finalize this fix.
Maybe @SuperQ might know about adding different nodes for CircleCI.

@SuperQ
Member

SuperQ commented Dec 25, 2023

I think we need to update the xcode stuff in our golang-builder Docker image. It's been a very long time and the update process is really annoying/tricky due to Apple's licensing.

@gitperr
Contributor

gitperr commented Dec 29, 2023

I created a PR for updating the xcode stuff in golang-builder docker image:
prometheus/golang-builder#239

gitperr added a commit to gitperr/node_exporter that referenced this issue Feb 7, 2024
This should hopefully fix the SIGKILL issue on OSX machines.
e.g. in: prometheus#2539

Signed-off-by: Alper Polat <[email protected]>
gitperr added a commit to gitperr/node_exporter that referenced this issue Feb 10, 2024
This should hopefully fix the SIGKILL issue on OSX machines.
e.g. in: prometheus#2539

Signed-off-by: Alper Polat <[email protected]>

Change the docker flags to correct ones

Signed-off-by: Alper Polat <[email protected]>

Fix errors in running the rcodesign from golang-builder

Signed-off-by: Alper Polat <[email protected]>

Use pwd instead

Readlink does not work to get the proper path, pwd might do it.
As promu seems to be copying the binaries based on working directory.

Signed-off-by: Alper Polat <[email protected]>

Try to run at the same job to see if it helps

So far I am unable to find the binary's location with
either pwd or readlink. I'm suspecting that the binary is
not on this specific host that is running the rcodesign.

Signed-off-by: Alper Polat <[email protected]>

Try to debug what files are in the current working directory

Signed-off-by: Alper Polat <[email protected]>

Print working directory as well

Signed-off-by: Alper Polat <[email protected]>

Add quote wrapping

Signed-off-by: Alper Polat <[email protected]>

Try to debug more

Signed-off-by: Alper Polat <[email protected]>

Nothing seems to be in .build directory here

Signed-off-by: Alper Polat <[email protected]>

Remove some of debug commands

Seems like the build does not get produced because of the
CircleCI node index that gets passed into `--parallelism-thread`.
Signed-off-by: Alper Polat <[email protected]>

Add a separate sign stage for code signing

Separate stage might be useful so that we have all of
the builds that end up in `.build` here, and sign the one(s)
that we want. First one being implemented here is darwin-arm64.

Signed-off-by: Alper Polat <[email protected]>

Run only if darwin-arm64 was built

Earlier I tried to add a separate stage for signing,
but seems like that was a bad idea because the pipeline
file has to exist in `master` for that so we can run
the tests properly. Checking with if might be one of the
simpler and better ideas...

Signed-off-by: Alper Polat <[email protected]>

Add forgotten quote

Fixing basic syntax error

Signed-off-by: Alper Polat <[email protected]>
gitperr added a commit to gitperr/node_exporter that referenced this issue Apr 30, 2024
Signed-off-by: Alper Polat <[email protected]>

Bump golang-builder version (prometheus#2908)

Signed-off-by: Alper Polat <[email protected]>

exec_bsd: Fix labels for vm.stats.sys.v_syscall sysctl (prometheus#2895)

Signed-off-by: David O'Rourke <[email protected]>

chore:remove constant from function (prometheus#2884)

Signed-off-by: tyltr <[email protected]>

build(deps): bump github.com/prometheus/common from 0.45.0 to 0.46.0 (prometheus#2910)

Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.45.0 to 0.46.0.
- [Release notes](https://github.com/prometheus/common/releases)
- [Commits](prometheus/common@v0.45.0...v0.46.0)

---
updated-dependencies:
- dependency-name: github.com/prometheus/common
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

build(deps): bump github.com/jsimonetti/rtnetlink from 1.4.0 to 1.4.1 (prometheus#2909)

Bumps [github.com/jsimonetti/rtnetlink](https://github.com/jsimonetti/rtnetlink) from 1.4.0 to 1.4.1.
- [Release notes](https://github.com/jsimonetti/rtnetlink/releases)
- [Commits](jsimonetti/rtnetlink@v1.4.0...v1.4.1)

---
updated-dependencies:
- dependency-name: github.com/jsimonetti/rtnetlink
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

fix hwmon nil ptr (prometheus#2873)

* fix hwmon nil ptr

syslink maybe lost in some cases.

---------

Signed-off-by: TaoGe <[email protected]>

Fix hwmon error capture (prometheus#2915)

Fix golangci-lint "ineffectual assignment" by correctly capturing any
errors within the hwmon gathering loop.

Signed-off-by: Ben Kochie <[email protected]>

Attempt to sign the node exporter darwin build

This should hopefully fix the SIGKILL issue on OSX machines.
e.g. in: prometheus#2539

Signed-off-by: Alper Polat <[email protected]>

Change the docker flags to correct ones

Signed-off-by: Alper Polat <[email protected]>

Fix errors in running the rcodesign from golang-builder

Signed-off-by: Alper Polat <[email protected]>

Use pwd instead

Readlink does not work to get the proper path, pwd might do it.
As promu seems to be copying the binaries based on working directory.

Signed-off-by: Alper Polat <[email protected]>

Try to run at the same job to see if it helps

So far I am unable to find the binary's location with
either pwd or readlink. I'm suspecting that the binary is
not on this specific host that is running the rcodesign.

Signed-off-by: Alper Polat <[email protected]>

Try to debug what files are in the current working directory

Signed-off-by: Alper Polat <[email protected]>

Print working directory as well

Signed-off-by: Alper Polat <[email protected]>

Add quote wrapping

Signed-off-by: Alper Polat <[email protected]>

Try to debug more

Signed-off-by: Alper Polat <[email protected]>

Nothing seems to be in .build directory here

Signed-off-by: Alper Polat <[email protected]>

Remove some of debug commands

Seems like the build does not get produced because of the
CircleCI node index that gets passed into `--parallelism-thread`.
Signed-off-by: Alper Polat <[email protected]>

Add a separate sign stage for code signing

Separate stage might be useful so that we have all of
the builds that end up in `.build` here, and sign the one(s)
that we want. First one being implemented here is darwin-arm64.

Signed-off-by: Alper Polat <[email protected]>

Run only if darwin-arm64 was built

Earlier I tried to add a separate stage for signing,
but seems like that was a bad idea because the pipeline
file has to exist in `master` for that so we can run
the tests properly. Checking with if might be one of the
simpler and better ideas...

Signed-off-by: Alper Polat <[email protected]>

Add forgotten quote

Fixing basic syntax error

Signed-off-by: Alper Polat <[email protected]>

Update common Prometheus files (prometheus#2917)

Signed-off-by: prombot <[email protected]>

Use promu to code sign

The functionality being replaced here is going to be
built into `promu` with prometheus/promu#284
So pipelines should use it instead.
Signed-off-by: Alper Polat <[email protected]>

Use Promu 0.17.0

Signed-off-by: Alper Polat <[email protected]>

Introduce one error first

We want to re-trigger the pipeline. But, the circleCI interface
does not allow re-runs. So, going to introduce a dummy error,
take it back and re-trigger the pipeline like that.
Signed-off-by: Alper Polat <[email protected]>

Set version to correct one

Signed-off-by: Alper Polat <[email protected]>