The metrics container (prometheus_wireguard_exporter) fails to run on arm64 #127

Closed
codestation opened this issue Dec 4, 2023 · 4 comments


@codestation
Contributor

Describe the bug

The metrics container currently fails on arm64 so the pod cannot start.

The container log shows the following:

thread 'main' panicked at library/alloc/src/raw_vec.rs:534:5:
capacity overflow
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

After setting the RUST_BACKTRACE=1 environment variable on the pod, the log shows:

thread 'main' panicked at library/alloc/src/raw_vec.rs:534:5:
capacity overflow
stack backtrace:
   0: rust_begin_unwind
             at ./rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panicking.rs:597:5
   1: core::panicking::panic_fmt
             at ./rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/panicking.rs:72:14
   2: alloc::raw_vec::capacity_overflow
             at ./rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/alloc/src/raw_vec.rs:534:5
   3: alloc::raw_vec::RawVec<T,A>::allocate_in
             at ./rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/alloc/src/raw_vec.rs:177:27
   4: alloc::raw_vec::RawVec<T,A>::with_capacity_in
             at ./rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/alloc/src/raw_vec.rs:130:9
   5: alloc::vec::Vec<T,A>::with_capacity_in
             at ./rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/alloc/src/vec/mod.rs:670:20
   6: alloc::vec::Vec<T>::with_capacity
             at ./rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/alloc/src/vec/mod.rs:479:9
   7: std::sys::unix::args::imp::clone
             at ./rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/sys/unix/args.rs:146:28
   8: std::sys::unix::args::imp::args
             at ./rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/sys/unix/args.rs:131:22
   9: std::sys::unix::args::args
             at ./rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/sys/unix/args.rs:19:5
  10: std::env::args_os
             at ./rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/env.rs:794:21
  11: <core::pin::Pin<P> as core::future::future::Future>::poll
  12: tokio::runtime::scheduler::current_thread::CurrentThread::block_on
  13: prometheus_wireguard_exporter::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
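
Frames 2 to 10 show std::env::args_os reserving a Vec for the program arguments and panicking inside Vec::with_capacity. That would be consistent with the argument count read at startup being implausibly large, although the thread does not confirm this. The following is a minimal sketch that reproduces only the panic itself, not the exporter's code; the helper name reserve_panics is made up for illustration.

```rust
use std::ffi::OsString;
use std::panic;

/// Hypothetical helper: returns true if reserving `n` argument slots panics.
/// Vec::with_capacity panics with "capacity overflow" when the requested
/// allocation size in bytes would exceed isize::MAX.
fn reserve_panics(n: usize) -> bool {
    panic::catch_unwind(move || {
        let args: Vec<OsString> = Vec::with_capacity(n);
        args.capacity()
    })
    .is_err()
}

fn main() {
    // A sane argument count reserves without trouble.
    println!("{}", reserve_panics(3)); // prints false
    // A garbage count triggers the same panic as in the log above.
    println!("{}", reserve_panics(usize::MAX)); // prints true
}
```

The default panic hook still prints the "capacity overflow" message to stderr for the second call, which matches the first line of the container log.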

I also tried compiling the binary in debug mode, and it failed at this line in the clap crate:

https://github.com/clap-rs/clap/blob/f8e9211e38d19f060980a6cb10f3fc8a2735c5b2/src/builder/command.rs#L548

To Reproduce
Steps to reproduce the behavior:

  1. kubectl apply -f https://github.com/jodevsa/wireguard-operator/releases/download/v2.0.17/release.yaml
  2. Create a server and a peer.
  3. The pod reports CrashLoopBackOff, with the metrics container stuck in a crash loop.

Expected behavior
The metrics container should start.

Additional context
I also ran the container with Docker and got the same result:

docker run --rm --entrypoint /usr/local/bin/prometheus_wireguard_exporter ghcr.io/jodevsa/wireguard-operator/agent:v2.0.17

If I copy the prometheus_wireguard_exporter binary from the container to the host (Ubuntu 22.04 LTS) and run it, it executes without errors, so the problem is with the Alpine image. I also switched the final image from alpine:3.18 to debian:bookworm, and it ran without errors.

Possible fixes:

  1. Use a final image based on glibc instead of Alpine (Debian, for example).
  2. Change the compilation target of the metrics binary so it is compatible with Alpine.
  3. Workaround: disable the metrics container (I could not find a way to do this without editing the deployment by hand).
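
For fix 2, it may help to check which C library environment a Rust binary was compiled for. This is a small sketch under the assumption that the crash comes from a glibc-targeted binary running on a musl-based image, which the Debian test suggests but which is not verified here. The function name built_for_libc is hypothetical; cfg!(target_env = ...) is the standard compile-time check.

```rust
/// Hypothetical helper: names the C library environment this binary was
/// compiled for, based on the compile-time `target_env` configuration.
fn built_for_libc() -> &'static str {
    if cfg!(target_env = "musl") {
        "musl"
    } else if cfg!(target_env = "gnu") {
        "gnu (glibc)"
    } else {
        "other"
    }
}

fn main() {
    // On an Alpine final image, a build for a musl target such as
    // aarch64-unknown-linux-musl would be expected to report "musl".
    println!("compiled for: {}", built_for_libc());
}
```

A binary that reports "gnu (glibc)" would be a candidate for either rebuilding against a musl target or moving to a glibc-based final image, as in fix 1.
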
@jodevsa
Owner

jodevsa commented Dec 7, 2023

Thanks @codestation. I'll try to look into this. I'm fine with removing those metrics; I'm not aware of anyone using them. The alternative I would prefer is to implement those metrics in the agent, i.e. rewrite them in Go. This shouldn't be that much work.

I'm doing a language course alongside my job, so I cannot promise this will land soon. I'm also open to PRs with whichever solution you choose from the ones you proposed.

Thanks,
Subhi

@codestation
Contributor Author

I will be using those metrics once I finish my Grafana setup, so please keep them; maybe add an option to disable them per WireGuard server. The reimplementation in Go sounds like a good idea, as I cannot read Rust code at all.

I pushed a simple PR with an image change. The image gets a little bigger, but the controller now starts correctly (I currently have 2 days of uptime on my ARM Ampere instance).

@jodevsa
Owner

jodevsa commented Dec 8, 2023

Thanks @codestation for fixing this. The PR is merged and the fix is available in the latest release. Please close the issue if the problem is now resolved.

@codestation
Contributor Author

Thank you. I just updated, and everything is running fine with the latest agent image.
