Memory corruption when running the Hurl CLI with --test mode #3297

Open
lambrospetrou opened this issue Oct 9, 2024 · 10 comments
Labels
bug Something isn't working

Comments

@lambrospetrou

lambrospetrou commented Oct 9, 2024

What is the current bug behavior?

There is heap memory corruption somewhere when finalizing the Hurl run. I get the following errors once every few hundred runs of a simple Hurl script:

GET https://unpkg.com/[email protected]/dist/vue.global.prod.js
HTTP 200

Errors occurring:

hurl --test --repeat 100 vuejs.hurl
...

Executed files:    100
Executed requests: 100 (91.6/s)
Succeeded files:   100 (100.0%)
Failed files:      0 (0.0%)
Duration:          1092 ms

double free or corruption (fasttop)
double free or corruption (fasttop)
double free or corruption (fasttop)
Segmentation fault (core dumped)

Once I got a different error instead, which is also memory related: malloc_consolidate(): unaligned fastbin chunk detected.

Steps to reproduce

Create the file Dockerfile-hurl-memory-issue with contents:

# Hurl needs GLibc 2.28+. We want curl version 8.4.0+ as well
# to properly support --max-filesize: https://github.com/Orange-OpenSource/hurl/issues/3245
# https://launchpad.net/ubuntu/+source/curl
FROM ubuntu:noble

# Hurl dependencies.
RUN apt-get update && apt-get install --no-install-recommends -y ca-certificates curl libcurl4 libxml2

# Install Hurl as per docs: https://hurl.dev/docs/installation.html#debian-ubuntu
RUN VERSION=5.0.1 && curl --location --remote-name https://github.com/Orange-OpenSource/hurl/releases/download/$VERSION/hurl_${VERSION}_amd64.deb && apt update && apt install ./hurl_${VERSION}_amd64.deb

Build the image and get inside the container:

docker build -t hurl-memory-test -f Dockerfile-hurl-memory-issue . && docker run --rm -it --entrypoint /bin/bash hurl-memory-test

Create the Hurl file:

echo -e "GET https://unpkg.com/[email protected]/dist/vue.global.prod.js\nHTTP 200" > vuejs.hurl

Run the Hurl file (you might need to repeat the following a few times until the memory error occurs):

hurl --test --repeat 100 vuejs.hurl

What is the expected correct behavior?

It should complete successfully without any errors.

Execution context

  • Hurl Version (hurl --version):
root@f2abb0a5ee0c:/# hurl --version
hurl 5.0.1 (x86_64-pc-linux-gnu) libcurl/8.5.0 OpenSSL/3.0.13 zlib/1.3 brotli/1.1.0 zstd/1.5.5 libidn2/2.3.7 libssh/0.10.6/openssl/zlib nghttp2/1.59.0
Features (libcurl):  alt-svc AsynchDNS brotli HSTS HTTP2 IDN IPv6 Largefile libz NTLM SPNEGO SSL TLS-SRP UnixSockets zstd
Features (built-in): brotli

The issue occurs on ubuntu:noble latest tag as of today: https://hub.docker.com/layers/library/ubuntu/noble-20240904.1/images/sha256-74f92a6b3589aa5cac6028719aaac83de4037bad4371ae79ba362834389035aa?context=explore

And also happens with debian:trixie-slim: https://hub.docker.com/layers/library/debian/trixie-slim/images/sha256-4cc21783b221c03e841bc1274e5babef8c52ccabfc20abadc37e83e37bed6990?context=explore

Possible fixes

Not sure...

@lambrospetrou lambrospetrou added the bug Something isn't working label Oct 9, 2024
@lambrospetrou
Author

To add more context, the issue is not related to --repeat. In Skybear.NET I run scripts without that option, hence only once, and every few days I get the above memory corruption errors. Today for example, it happened 3 times.

@fabricereix
Collaborator

Hi @lambrospetrou,
thanks for reporting the issue.
I haven't reproduced it yet on my local machine with your steps.
We are going to try to reproduce it in our pipeline, and we'll come back to you.

@lambrospetrou
Author

lambrospetrou commented Oct 11, 2024

> Hi @lambrospetrou, thanks for reporting the issue. I haven't reproduced it yet on my local machine with your steps. We are going to try to reproduce it in our pipeline, and we'll come back to you.

Yeah, sometimes it takes a few attempts to get it to crash. It always happens if I run the hurl ... command in a bash loop, or run it a few times in quick succession.
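For anyone who wants to automate the "run it in a loop until it crashes" approach, here is a minimal sketch in Rust (Unix only). It is not part of Hurl: the function name `first_crash` and the limit of 50 attempts are assumptions made for illustration. It reruns the command from the reproduction steps and stops at the first run that is killed by a signal; on Linux x86_64, glibc's abort is signal 6 (SIGABRT) and a segmentation fault is signal 11 (SIGSEGV).

```rust
use std::io;
use std::os::unix::process::ExitStatusExt;
use std::process::{Command, Stdio};

/// Hypothetical helper: runs `program args...` up to `max_attempts` times and
/// returns the first attempt (1-based) that was killed by a signal, together
/// with the signal number. Returns `Ok(None)` if no run was killed by a signal.
fn first_crash(program: &str, args: &[&str], max_attempts: u32) -> io::Result<Option<(u32, i32)>> {
    for attempt in 1..=max_attempts {
        let status = Command::new(program)
            .args(args)
            .stdout(Stdio::null()) // hide the per-run summary, keep stderr visible
            .status()?;
        // `signal()` is `Some` only when the process was terminated by a signal.
        if let Some(signal) = status.signal() {
            return Ok(Some((attempt, signal)));
        }
    }
    Ok(None)
}

fn main() {
    // Same command as in the reproduction steps; 50 attempts is an arbitrary limit.
    match first_crash("hurl", &["--test", "--repeat", "100", "vuejs.hurl"], 50) {
        Ok(Some((attempt, signal))) => println!("attempt {attempt}: killed by signal {signal}"),
        Ok(None) => println!("no crash in 50 attempts"),
        Err(e) => eprintln!("could not run hurl: {e}"),
    }
}
```

Checking the terminating signal rather than the exit code keeps ordinary test failures (non-zero exit) separate from actual crashes.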

@jcamiel
Collaborator

jcamiel commented Oct 13, 2024

Hi all,

I tried to reproduce the bug on my machine (MBP M1), but couldn't, either locally (with brew's Hurl) or in a Docker container (with the instructions provided by @lambrospetrou). I've tried hard 😅! In the Docker case, I didn't use the .deb file (it's only for the x86 arch); I installed the arm64 precompiled binary instead.

That said, I've run the sample through valgrind and got this result:

$ cargo valgrind run -p hurl -- --test /tmp/vuejs.hurl 
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.04s
     Running `/root/.cargo/bin/cargo-valgrind target/debug/hurl --test --jobs 1 /tmp/vuejs.hurl`
       Error leaked 384 B in 1 block
        Info stack trace (user code at the bottom)
             at calloc
             at calloc (rtld-malloc.h:44)
             at allocate_dtv (dl-tls.c:370)
             at _dl_allocate_tls (dl-tls.c:629)
             at allocate_stack (allocatestack.c:429)
             at pthread_create@@GLIBC_2.34 (pthread_create.c:655)
             at std::sys::pal::unix::thread::Thread::new (thread.rs:87)
             at std::thread::Builder::spawn_unchecked_ (mod.rs:577)
             at std::thread::Builder::spawn_unchecked (mod.rs:456)
             at spawn<hurl::parallel::worker::{impl#3}::new::{closure_env#0}, core::result::Result<(), std::sync::mpsc::SendError<hurl::parallel::message::WorkerMessage>>> (mod.rs:388)
             at std::thread::spawn (mod.rs:694)
             at hurl::parallel::worker::Worker::new (worker.rs:75)
             at hurl::parallel::runner::ParallelRunner::new::{{closure}} (runner.rs:127)
             at core::iter::adapters::map::map_fold::{{closure}} (map.rs:89)
             at core::iter::traits::iterator::Iterator::fold (iterator.rs:2588)
     Summary Leaked 384 B total (0 other errors)

So valgrind detects one leak, allocated when a worker thread is spawned. Not sure if this is related, but we should fix it.

@lambrospetrou can you try to reproduce the issue without using --test? This way, the code will use the sequential run path and we will see whether the issue is with the parallel implementation or not. If these errors started happening with 5.0.1, this could explain why...

@lambrospetrou
Author

lambrospetrou commented Oct 13, 2024

I cannot reproduce it without the --test option. I tried that back when I opened the issue, and just confirmed it again: it only happens when using the --test option.

I am attaching a video showcasing that it happens every few times on my side with the --test option 😄 Check the 6th run, 11th run, and the last two.

20241013T1753-WindowsTerminal_X9zeNbsHax.mp4

Also, with only 1 thread (hurl --test --repeat 100 --jobs 1 vuejs.hurl) it works fine; I cannot make it fail. So it does indeed seem to be something in the parallel mode.

@jcamiel
Collaborator

jcamiel commented Oct 13, 2024

Your Docker host is a Windows machine, am I correct? I will try to reproduce the bug from a Windows machine (with Docker etc.). You're reproducing it quite easily while I can't, so maybe that makes a difference? (It doesn't explain why it happens on skybear.net though.)

@lambrospetrou
Author

lambrospetrou commented Oct 13, 2024

> Your Docker host is a Windows machine, am I correct? I will try to reproduce the bug from a Windows machine (with Docker etc.). You're reproducing it quite easily while I can't, so maybe that makes a difference? (It doesn't explain why it happens on skybear.net though.)

Locally, I am using WSL2 (Ubuntu 24.04.1 LTS) in Windows 11, and using Docker Desktop on the Windows side.

For Skybear.NET, I use Fly.io and my container is based off ubuntu:noble.

@fabricereix
Collaborator

I have added an integ test that combines --test and --repeat
#3305
The bug is still not reproduced :-(

@jcamiel
Collaborator

jcamiel commented Oct 14, 2024

I haven't managed to reproduce it, but I think there's something I could make a little cleaner in the graceful shutdown of the worker threads. I don't know if it will fix this issue or not, but it will be cleaner and it's worth trying (plus it may fix the valgrind issue that tickles me). I'll do it as soon as I have time!
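To make "graceful shutdown of worker threads" concrete, here is a minimal, generic sketch using only the Rust standard library. It is not Hurl's actual code: `Job`, `Worker`, and `run_pool` are hypothetical names, and doubling a number stands in for running a file. The point it illustrates is that every worker is told to stop and is then joined before the function returns, so no thread is still alive when the process starts tearing down. Whether an unjoined thread is what causes the crash reported here has not been established in this thread.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical message type (not Hurl's `WorkerMessage`).
enum Job {
    Run(u32),
    Shutdown,
}

/// Hypothetical worker: keeps its join handle so the pool can wait for it.
struct Worker {
    tx: Sender<Job>,
    handle: Option<JoinHandle<()>>,
}

/// Spawns `count` workers, dispatches `jobs` round-robin, then shuts down
/// gracefully: signal every worker first, then join every thread.
/// Returns the sorted results (each job value doubled).
fn run_pool(count: usize, jobs: &[u32]) -> Vec<u32> {
    assert!(count > 0, "need at least one worker");
    let (result_tx, result_rx) = mpsc::channel::<u32>();

    let mut workers: Vec<Worker> = (0..count)
        .map(|_| {
            let (tx, rx) = mpsc::channel::<Job>();
            let result_tx = result_tx.clone();
            let handle = thread::spawn(move || {
                // Loop ends on an explicit Shutdown or when the sender is dropped.
                for job in rx {
                    match job {
                        Job::Run(n) => {
                            let _ = result_tx.send(n * 2);
                        }
                        Job::Shutdown => break,
                    }
                }
            });
            Worker { tx, handle: Some(handle) }
        })
        .collect();
    // Drop the original sender so `result_rx` closes once all workers exit.
    drop(result_tx);

    for (i, job) in jobs.iter().enumerate() {
        workers[i % count].tx.send(Job::Run(*job)).expect("worker is alive");
    }

    // Graceful shutdown: ask every worker to stop, then wait for each thread.
    for worker in &workers {
        let _ = worker.tx.send(Job::Shutdown);
    }
    for worker in &mut workers {
        if let Some(handle) = worker.handle.take() {
            handle.join().expect("worker panicked");
        }
    }

    let mut results: Vec<u32> = result_rx.iter().collect();
    results.sort_unstable();
    results
}

fn main() {
    println!("{:?}", run_pool(2, &[1, 2, 3]));
}
```

Sending `Shutdown` to all workers before joining any of them lets the threads wind down concurrently instead of one at a time.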

@lambrospetrou
Author

lambrospetrou commented Oct 15, 2024

For the record, I have retries on Skybear.NET now, so that when it happens again it just retries the script execution, so it's not urgent at the moment. It would be nice though to see what in the parallelism could cause race conditions on memory. I might take a stab at reading some Rust later in the week 😏
