Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

heap-use-after-free in ucs_config_parser_get_env_vars (SIGSEGV) #9426

Closed
farmerau opened this issue Oct 17, 2023 · 3 comments · Fixed by #9439
Closed

heap-use-after-free in ucs_config_parser_get_env_vars (SIGSEGV) #9426

farmerau opened this issue Oct 17, 2023 · 3 comments · Fixed by #9439
Assignees
Labels

Comments

@farmerau
Copy link

farmerau commented Oct 17, 2023

Describe the bug

Running ucx_perftest results in a SIGSEV for trying to read freed memory when running the destructor defined in init.cs on line 131. The call to ucs_profile_cleanup happens after the destructor in parser.c on line 2122 is invoked (at least on arm64 on BlueField-2 cards), which frees the memory for the hashtable.

Steps to Reproduce

To make this as easy to understand as possible, I recommend making the following code modifications for the sake of testing:

  • Adding some print statements inside of UCS_STATIC_CLEANUP in parser.c:
    • image
  • Adding a print statement inside of ucs_profile_write in profile.c to hammer home the point that it's being called after the destructor has been executed:
    • image
  • Adding some print statements indicating enter/exit of init.c's destructor:
    • image

I build the script using a modification of the DEBIAN release pipeline:

AZ_ARTIFACT_NAME='rpi-build.tar.gz'
POSTFIX='ucx-1.0.0-rpi'
# Build
./autogen.sh
./contrib/configure-devel --with-cuda --with-java=no --with-mad --enable-debug
echo "Making clean..."
make_clean distclean
echo "finish making clean.."
make dist
tarball=$(echo ucx*.tar.gz)
tar -xzvf ${tarball}              # extract the sources in a subdirectory
cd $(tar tf ${tarball} | head -1) # go to extracted tarball directory
echo 10 > debian/compat   # https://www.debian.org/doc/manuals/maint-guide/dother.en.htmdpl#compat
dpkg-buildpackage -us -uc -Pcuda,xpmem
cd ..                             # Move back to the working directory

# Rename DEB files          
VER="${POSTFIX#ucx-}"     # Remove 'ucx-' prefix from the POSTFIX string
find . -name "ucx*.deb" -exec bash -c 'mv "$1" "${1%%_*}-'"${VER}"'.deb"' _ {} \;
find . -name '*.deb'      # Show new names

# Remove superfluous dependency
dpkg-deb -R "ucx-cuda-${VER}.deb" tmp    # Extract
sed -i 's/libnvidia-compute-[0-9]* | libnvidia-ml1, //g' tmp/DEBIAN/control
dpkg-deb -b tmp "ucx-cuda-${VER}.deb"        # Rebuild
dpkg-deb -I "ucx-cuda-${VER}.deb"
dpkg-deb -I "ucx-${VER}.deb"

# Package
tar -cjf "${AZ_ARTIFACT_NAME}" *.deb *.ddeb # Package all DEBs AND Debug Symbols
tar -tjf "${AZ_ARTIFACT_NAME}"
  • This script runs in a docker container based on the ubuntu22.04-mofed5-cuda12 image defined here.

Actual Reproduction Steps

  • ucx_perftest -K mlx5_0:1 is run on the listener host
  • Something like ucx_perftest -K mlx5_0:1 -t tag_bw lid:40 -s 10 -n 1 is run on the other host
  • SIGSEV:
    • image

It's worth noting that compiling with debug symbols enabled (as seen in my build script) and packaging them appropriately does produce a slightly better callstack.

Further, I compiled with Address Sanitizer (added the -fsanitize=address -fno-omit-frame-pointer flags) and got more information:

==3748572==ERROR: AddressSanitizer: heap-use-after-free on address 0xfffff2c02cd0 at pc 0xfffff7133d18 bp 0xffffffffdd70 sp 0xffffffffdd80
READ of size 4 at 0xfffff2c02cd0 thread T0
#0 0xfffff7133d14 in ucs_config_parser_get_env_vars config/parser.c:2113
#1 0xfffff7157c08 in ucs_profile_write profile/profile.c:297
#2 0xfffff7158a60 in ucs_profile_dump profile/profile.c:700
#3 0xfffff7170cac in ucs_profile_cleanup profile/profile.c:767
#4 0xfffff711ef04 (/lib/libucs.so.0+0x2ef04)
#5 0xfffff7fc7394 in _dl_fini elf/dl-fini.c:142
#6 0xfffff6e2cde4 in __run_exit_handlers stdlib/exit.c:113
#7 0xfffff6e2cf08 in __GI_exit stdlib/exit.c:143
#8 0xfffff6e173fc in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:74
#9 0xfffff6e174c8 in __libc_start_main_impl ../csu/libc-start.c:392
#10 0xaaaaaaab3f6c (/usr/bin/ucx_perftest+0x13f6c)

0xfffff2c02cd0 is located 0 bytes inside of 4-byte region [0xfffff2c02cd0,0xfffff2c02cd4)
freed by thread T0 here:
#0 0xfffff75a9fe8 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:127
#1 0xfffff711ecf0 (/lib/libucs.so.0+0x2ecf0)
#2 0xfffff7fc7394 in _dl_fini elf/dl-fini.c:142
#3 0xfffff6e2cde4 in __run_exit_handlers stdlib/exit.c:113
#4 0xfffff6e2cf08 in __GI_exit stdlib/exit.c:143
#5 0xfffff6e173fc in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:74
#6 0xfffff6e174c8 in __libc_start_main_impl ../csu/libc-start.c:392
#7 0xaaaaaaab3f6c (/usr/bin/ucx_perftest+0x13f6c)

previously allocated by thread T0 here:
#0 0xfffff75aa2f4 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
#1 0xfffff71203c8 in kh_resize_ucs_config_env_vars config/parser.c:47
#2 0xfffff7123b2c in kh_put_ucs_config_env_vars config/parser.c:47
#3 0xfffff71345d4 in ucs_config_parser_mark_env_var_used config/parser.c:1288
#4 0xfffff7139b78 in ucs_config_parser_print_env_vars_once config/parser.c:2060
#5 0xfffff733593c in ucp_worker_create core/ucp_worker.c:2606
#6 0xaaaaaaac1d6c in ucp_perf_setup lib/libperf.c:1662
#7 0xaaaaaaab9d98 in ucx_perf_run lib/libperf.c:1790
#8 0xaaaaaaab9d98 in run_test_recurs src/tools/perf/perftest_run.c:281
#9 0xaaaaaaab09a8 in run_test src/tools/perf/perftest_run.c:340
#10 0xaaaaaaab09a8 in main src/tools/perf/perftest.c:896
#11 0xfffff6e173f8 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
#12 0xfffff6e174c8 in __libc_start_main_impl ../csu/libc-start.c:392
#13 0xaaaaaaab3f6c (/usr/bin/ucx_perftest+0x13f6c)

  • Doing this in conjunction with my print statements makes it pretty cut and dry:
    image

Setup and versions

  • OS version (e.g Linux distro) + CPU architecture (x86_64/aarch64/ppc64le/...)
    • Ubuntu 22.04.2 LTS (aarch64)

Personal Remarks

Because C++ doesn't offer guarantees on destructor execution ordering, needing to rely on an object to not be destructed in another destructor seems risky. Likely, the calling of ucs_config_parser_get_env_vars will need to be moved somewhere outside of destruction.

However, I'm honestly not familiar enough with what the intended outcome or value of this function is-- so I can't in good faith make a recommendation on where or how this is done.

With guidance, I'm happy to open a pull request to address.

@farmerau farmerau added the Bug label Oct 17, 2023
@gleon99 gleon99 self-assigned this Oct 18, 2023
@gleon99
Copy link
Contributor

gleon99 commented Oct 22, 2023

Hi @farmerau
thanks for the very detailed ticket!

Can you try to use this revision and let me know whether it fixes the issue?

@farmerau
Copy link
Author

Hi @farmerau thanks for the very detailed ticket!

Can you try to use this revision and let me know whether it fixes the issue?

Thanks for the quick update! I'm building now and will perform testing this morning and will follow up.

@farmerau farmerau changed the title heap-use-after-free in ucs_config_parser_get_env_vars (SIGSEV) heap-use-after-free in ucs_config_parser_get_env_vars (SIGSEGV) Oct 23, 2023
@farmerau
Copy link
Author

farmerau commented Oct 23, 2023

@gleon99 My testing confirms that the revision corrects the ordering and alleviates the seg fault.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants