LeakSanitiser break on some shutdowns in Daily build #1701

Open
achamayou opened this issue Oct 2, 2020 · 10 comments · Fixed by #5160

Comments

@achamayou
Member

For example: https://dev.azure.com/MSRC-CCF/CCF/_build/results?buildId=13608&view=results

Tracer caught signal 11: addr=0x0 pc=0x4f4620 sp=0x7f4a60171d30
==1605==LeakSanitizer has encountered a fatal error.
==1605==HINT: For debugging, try setting environment variable LSAN_OPTIONS=verbosity=1:log_threads=1
==1605==HINT: LeakSanitizer does not work under ptrace (strace, gdb, etc)
@achamayou
Member Author

Turning on the suggested options does not reveal much more:

2020-10-09T11:54:12.8473188Z 48: ==2952==AddressSanitizer: libc interceptors initialized
2020-10-09T11:54:12.8473533Z 48: || `[0x10007fff8000, 0x7fffffffffff]` || HighMem    ||
2020-10-09T11:54:12.8473863Z 48: || `[0x02008fff7000, 0x10007fff7fff]` || HighShadow ||
2020-10-09T11:54:12.8474199Z 48: || `[0x00008fff7000, 0x02008fff6fff]` || ShadowGap  ||
2020-10-09T11:54:12.8474523Z 48: || `[0x00007fff8000, 0x00008fff6fff]` || LowShadow  ||
2020-10-09T11:54:12.8474861Z 48: || `[0x000000000000, 0x00007fff7fff]` || LowMem     ||
2020-10-09T11:54:12.8475251Z 48: MemToShadow(shadow): 0x00008fff7000 0x000091ff6dff 0x004091ff6e00 0x02008fff6fff
2020-10-09T11:54:12.8475763Z 48: redzone=16
2020-10-09T11:54:12.8475954Z 48: max_redzone=2048
2020-10-09T11:54:12.8476175Z 48: quarantine_size_mb=256M
2020-10-09T11:54:12.8476418Z 48: thread_local_quarantine_size_kb=1024K
2020-10-09T11:54:12.8476660Z 48: malloc_context_size=30
2020-10-09T11:54:12.8476887Z 48: SHADOW_SCALE: 3
2020-10-09T11:54:12.8477185Z 48: SHADOW_GRANULARITY: 8
2020-10-09T11:54:12.8477559Z 48: SHADOW_OFFSET: 0x7fff8000
2020-10-09T11:54:12.8477816Z 48: ==2952==Installed the sigaction for signal 11
2020-10-09T11:54:12.8478107Z 48: ==2952==Installed the sigaction for signal 7
2020-10-09T11:54:12.8478548Z 48: ==2952==Installed the sigaction for signal 8
2020-10-09T11:54:12.8478922Z 48: ==2952==T0: stack [0x7fff4e0dd000,0x7fff4e8dd000) size 0x800000; local=0x7fff4e8db854
2020-10-09T11:54:12.8479251Z 48: ==2952==AddressSanitizer Init done
2020-10-09T11:54:12.8479689Z 48: ==2952==T1: stack [0x7f0edb58a000,0x7f0edbd88f40) size 0x7fef40; local=0x7f0edbd88e34
2020-10-09T11:54:12.8480007Z 48: ==2952==T1 TSDDtor
2020-10-09T11:54:12.8480205Z 48: ==2952==T1 exited
2020-10-09T11:54:12.8480429Z 48: ==3231==Processing thread 2952.
2020-10-09T11:54:12.8481031Z 48: ==3231==Stack at 0x7fff4e0dd000-0x7fff4e8dd000 (SP = 0x7fff4e8db5c8).
2020-10-09T11:54:12.8481574Z 48: ==3231==TLS at 0x7f0eec07f440-0x7f0eec080500.
2020-10-09T11:54:12.8482103Z 48: ==3231==DTLS 4 at 0x1980001020000008-0x1c80000f2000000a.
2020-10-09T11:54:12.8482475Z 48: Tracer caught signal 11: addr=0x0 pc=0x4f4690 sp=0x7f0edb448d30
2020-10-09T11:54:12.8482825Z 48: ==2952==LeakSanitizer has encountered a fatal error.
2020-10-09T11:54:12.8483280Z 48: ==2952==HINT: For debugging, try setting environment variable LSAN_OPTIONS=verbosity=1:log_threads=1
2020-10-09T11:54:12.8483946Z 48: ==2952==HINT: LeakSanitizer does not work under ptrace (strace, gdb, etc)
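
To compare crash reports across builds, the tracer line can be pulled apart mechanically. This is a minimal sketch, assuming the line format matches the reports quoted in this issue; `parse_tracer_line` is a hypothetical helper, not part of CCF or of the sanitizer runtime.

```python
import re
from typing import Dict, Optional

# Pattern for the LSan tracer crash line as it appears in the logs in this issue.
# search() is used so that CI timestamps and ctest prefixes ("48: ") are ignored.
TRACER_RE = re.compile(
    r"Tracer caught signal (?P<signal>\d+): "
    r"addr=(?P<addr>0x[0-9a-fA-F]+) "
    r"pc=(?P<pc>0x[0-9a-fA-F]+) "
    r"sp=(?P<sp>0x[0-9a-fA-F]+)"
)


def parse_tracer_line(line: str) -> Optional[Dict[str, int]]:
    """Return signal, addr, pc and sp as integers, or None if the line does not match."""
    match = TRACER_RE.search(line)
    if match is None:
        return None
    return {
        "signal": int(match.group("signal")),
        "addr": int(match.group("addr"), 16),
        "pc": int(match.group("pc"), 16),
        "sp": int(match.group("sp"), 16),
    }


report = parse_tracer_line(
    "2020-10-09T11:54:12.8482475Z 48: "
    "Tracer caught signal 11: addr=0x0 pc=0x4f4690 sp=0x7f0edb448d30"
)
```

Applied to the reports in this thread, it gives addr=0x0 every time, with pc values of 0x4f4620, 0x4f4690 and 0x4f46b0.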

@achamayou
Member Author

Some potentially helpful information in here: google/sanitizers#723

I don't believe we have any protected memory though.

@achamayou
Member Author

Also of interest: google/sanitizers#870

It looks like snmalloc may in fact protect some pages when built with USE_POSIX_COMMIT_CHECKS, but that's not the case for us. Let's try ASAN_OPTIONS=fast_unwind_on_malloc=0 and see how slow it is.
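
For reference, a minimal sketch of how these options could be handed to a test process from Python. It assumes the sanitizer runtime reads colon-separated key=value pairs from ASAN_OPTIONS and LSAN_OPTIONS, as in the HINT lines above; `sanitizer_env` and the binary name in the last comment are hypothetical, not CCF test infrastructure.

```python
import os
from typing import Dict, Mapping, Optional


def sanitizer_env(
    asan_options: Mapping[str, str],
    lsan_options: Mapping[str, str],
    base_env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of base_env with ASAN_OPTIONS and LSAN_OPTIONS extended."""
    env = dict(os.environ if base_env is None else base_env)
    for var, options in (
        ("ASAN_OPTIONS", asan_options),
        ("LSAN_OPTIONS", lsan_options),
    ):
        if not options:
            continue
        extra = ":".join(f"{key}={value}" for key, value in options.items())
        existing = env.get(var, "")
        # Append to whatever is already set rather than overwriting it.
        env[var] = f"{existing}:{extra}" if existing else extra
    return env


# base_env={} keeps this example deterministic; omit it to inherit os.environ.
env = sanitizer_env(
    asan_options={"fast_unwind_on_malloc": "0"},
    lsan_options={"verbosity": "1", "log_threads": "1"},
    base_env={},
)
# subprocess.run(["./some_test_binary"], env=env)  # hypothetical binary name
```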

@achamayou
Member Author

No luck with the slow unwind: https://dev.azure.com/MSRC-CCF/CCF/_build/results?buildId=13938&view=logs&j=383d248c-4494-5797-d98f-6cef5140601e&t=bbcdfa66-260b-5619-96df-9537b476c35e

The issue still happens, and the run is so slow that CI times out.

@achamayou
Member Author

achamayou commented Oct 14, 2020

https://www.chromium.org/developers/testing/leaksanitizer suggests we ought to be using a debug version of libc++:

Using debug versions of shared libraries
(NOTE: the libstdc++ part is no longer required for Chromium, where we now use a custom libc++ binary for ASan builds.)

Be aware that ASan's fast stack unwinder depends on frame pointers, which are often missing in release versions of shared libraries. If you want to use the fast unwinder (enabled by default), you should at least install a debug version of libstdc++. This worked for us on Ubuntu:

sudo apt-get install libstdc++6-4.6-dbg
LD_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu/debug" ASAN_OPTIONS="detect_leaks=1 strict_memcmp=0" out/Release/base_unittests

If you still see incomplete stack traces, you can disable the fast unwinder by adding fast_unwind_on_malloc=0 to ASAN_OPTIONS.

@achamayou
Member Author

Addressed in #1776

@achamayou achamayou reopened this Oct 24, 2020
@achamayou
Member Author

Much less frequent outside Docker, but it happened again in https://dev.azure.com/MSRC-CCF/CCF/_build/results?buildId=14777&view=logs&j=88dd69b5-c778-5f1c-f9ac-1398f4203929&t=f18b20af-6d89-55d5-5278-1002871723a4

45: Tracer caught signal 11: addr=0x0 pc=0x4f46b0 sp=0x7feb2c010d30
45: ==3229==LeakSanitizer has encountered a fatal error.
45: ==3229==HINT: For debugging, try setting environment variable LSAN_OPTIONS=verbosity=1:log_threads=1
45: ==3229==HINT: LeakSanitizer does not work under ptrace (strace, gdb, etc)

@eddyashton
Member

Another instance: https://dev.azure.com/MSRC-CCF/CCF/_build/results?buildId=14858&view=logs&j=88dd69b5-c778-5f1c-f9ac-1398f4203929&t=f18b20af-6d89-55d5-5278-1002871723a4&l=19155

This time in ws_logging_raft, which is the first time I've noticed it in a test that doesn't involve recovery.

@achamayou
Member Author

@eddyashton if this is resolved, we need to remove the check

if line.startswith("Tracer caught signal 11"):

but my reading of #5160 suggests to me that it's not.

@eddyashton
Member

@achamayou - Correct, this is not yet resolved. I tried removing those lines in #5160, but that resulted in test failures as some nodes are still emitting this on shutdown.
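
For context, a minimal sketch of the kind of shutdown-log filtering being discussed. It is not the actual CCF test code, of which only the `startswith` line is quoted above; `unexpected_lines` and the constants are hypothetical, and the known-noise list is taken from the LSan output quoted earlier in this issue.

```python
import re
from typing import Iterable, List

# Sanitizer lines are prefixed with "==<pid>==", which is stripped before matching.
PID_PREFIX = re.compile(r"^==\d+==")

# Messages LSan emits when its tracer crashes at shutdown, as quoted in this issue.
KNOWN_LSAN_SHUTDOWN_NOISE = (
    "Tracer caught signal 11",
    "LeakSanitizer has encountered a fatal error",
    "HINT: For debugging, try setting environment variable LSAN_OPTIONS",
    "HINT: LeakSanitizer does not work under ptrace",
)


def is_known_lsan_noise(line: str) -> bool:
    """True if the line is one of the known LSan tracer-crash messages."""
    text = PID_PREFIX.sub("", line.strip())
    return text.startswith(KNOWN_LSAN_SHUTDOWN_NOISE)


def unexpected_lines(lines: Iterable[str]) -> List[str]:
    """Return the non-empty lines that are not known LSan shutdown noise."""
    return [
        line for line in lines
        if line.strip() and not is_known_lsan_noise(line)
    ]


shutdown_output = [
    "Tracer caught signal 11: addr=0x0 pc=0x4f46b0 sp=0x7feb2c010d30",
    "==3229==LeakSanitizer has encountered a fatal error.",
    "==3229==HINT: LeakSanitizer does not work under ptrace (strace, gdb, etc)",
    "some other error line",  # illustrative only, not from a real log
]
remaining = unexpected_lines(shutdown_output)
```

Removing the allowance would amount to emptying the known-noise list, at which point any node that still emits these lines on shutdown fails the check.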

@achamayou achamayou removed the bug label Nov 12, 2024