config: check for specific cffi version #1

Merged
merged 1 commit into from
Jul 18, 2015

@trws trws commented Jul 18, 2015

No description provided.

garlick added a commit that referenced this pull request Jul 18, 2015
config: check for specific cffi version
@garlick garlick merged commit fb0a709 into garlick:build-fixes Jul 18, 2015
garlick pushed a commit that referenced this pull request Jan 12, 2018
Fix a flux-sched autoconf problem when it uses
PKG_CHECK_MODULES to gather library info on the jobspec
package.

Apparently pkg-config doesn't like the dependent
package name: "Yaml-cpp."

On quartz:
$ uname -a
Linux quartz1916 3.10.0-693.5.2.1chaos.ch6.x86_64 #1
SMP Wed Oct 25 16:20:31 PDT 2017 x86_64 x86_64 x86_64 GNU/Linux

$ pkg-config --cflags flux-jobspec
Package Yaml-cpp was not found in the pkg-config search path.
Perhaps you should add the directory containing `Yaml-cpp.pc'
to the PKG_CONFIG_PATH environment variable
Package 'Yaml-cpp', required by 'flux-jobspec', not found

A similar configure error occurs when
PKG_CHECK_MODULES([JOBSPEC],[flux-jobspec],[],[]) is used.
garlick added a commit that referenced this pull request Sep 15, 2020
Problem: unloading resource module with events posted to eventlog
in flight can result in a segfault.

Program terminated with signal SIGSEGV, Segmentation fault.

 #0  __strcmp_avx2 () at ../sysdeps/x86_64/multiarch/strcmp-avx2.S:102
 102     ../sysdeps/x86_64/multiarch/strcmp-avx2.S: No such file or directory.
 [Current thread is 1 (Thread 0x7fe74b7fe700 (LWP 3495430))]
 (gdb) bt
 #0  __strcmp_avx2 () at ../sysdeps/x86_64/multiarch/strcmp-avx2.S:102
 #1  0x00007fe764f40de0 in aux_item_find (key=<optimized out>,
     head=0x7fe73c006180) at aux.c:88
 #2  aux_get (head=<optimized out>, key=0x7fe764f5b000 "flux::log") at aux.c:119
 #3  0x00007fe764f1f0d4 in getctx (h=h@entry=0x7fe73c00c6d0) at flog.c:72
 #4  0x00007fe764f1f3a5 in flux_vlog (h=0x7fe73c00c6d0, level=7,
     fmt=0x7fe7606318fc "%s: %s event posted", ap=ap@entry=0x7fe74b7fd790)
     at flog.c:146
 #5  0x00007fe764f1f333 in flux_log (h=<optimized out>, lev=lev@entry=7,
    fmt=fmt@entry=0x7fe7606318fc "%s: %s event posted") at flog.c:195
 #6  0x00007fe76061166a in reslog_cb (reslog=<optimized out>,
     name=0x7fe73c016380 "online", arg=0x7fe73c013000) at acquire.c:319
 #7  0x00007fe760610deb in notify_callback (event=<optimized out>,
     reslog=0x7fe73c005b90) at reslog.c:47
 #8  post_handler (reslog=reslog@entry=0x7fe73c005b90, f=0x7fe73c00a510)
     at reslog.c:91
 #9  0x00007fe760611250 in reslog_destroy (reslog=0x7fe73c005b90)
     at reslog.c:182
 #10 0x00007fe76060e6b8 in resource_ctx_destroy (ctx=ctx@entry=0x7fe73c016640)
     at resource.c:129
 #11 0x00007fe76060ef18 in resource_ctx_destroy (ctx=0x7fe73c016640)
     at resource.c:331

It looks like the acquire subsystem got a callback for a rank coming online
after its context was freed.  Set the reslog callback to NULL before
destroying the acquire context.

Also, set the monitor callback to NULL before destroying the discover
context, as it appears this destructor has a similar safety issue.
garlick added a commit that referenced this pull request Aug 1, 2021
Problem: a new valgrind test failure was encountered on aarch64,
Ubuntu 20.04.2 LTS and also the official Jetson Ubuntu 18.04:

==1705645== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==1705645==    at 0x4BDFE38: epoll_ctl (syscall-template.S:78)
==1705645==    by 0x48B37EF: epoll_modify (ev_epoll.c:96)
==1705645==    by 0x48B4F57: fd_reify (ev.c:2166)
==1705645==    by 0x48B4F57: ev_run (ev.c:3677)
==1705645==    by 0x48B4F57: ev_run (ev.c:3623)
==1705645==    by 0x48824FF: flux_reactor_run (reactor.c:126)
==1705645==    by 0x1113BF: main (broker.c:449)
==1705645==  Address 0x1ffefff22c is on thread 1's stack
==1705645==  in frame #1, created by epoll_modify (ev_epoll.c:72)

Since this is apparently internal to libev, add a suppression.

Fixes flux-framework#3808
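
For illustration, a suppression for the report above might look roughly like the sketch below. This is not the entry that was actually committed: the suppression name, the helper function write_libev_supp, and the choice to match only the two innermost frames are assumptions. The helper only writes the file; loading it uses valgrind's --suppressions option.

```shell
#!/bin/sh
# write_libev_supp FILE  (hypothetical helper)
# Writes a Memcheck "Param" suppression matching the epoll_ctl(event)
# report quoted above. The suppression name and the two frames listed
# are illustrative assumptions, not the entry from the real commit.
write_libev_supp() {
    cat >"$1" <<'EOF'
{
   libev_epoll_modify_uninit
   Memcheck:Param
   epoll_ctl(event)
   fun:epoll_ctl
   fun:epoll_modify
}
EOF
}

# Usage (assumes valgrind is installed; ./prog is a placeholder):
#   write_libev_supp libev.supp
#   valgrind --suppressions=libev.supp ./prog
```

Matching on the function names rather than addresses or line numbers keeps the entry valid across libev builds, which matters here because the two reports quoted on this page show different line numbers for the same call chain.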
garlick added a commit that referenced this pull request Apr 26, 2022
Problem: we need a way to tell rc scripts to restore content
on startup, and dump content on shutdown, for offline KVS garbage
collection of a system instance or user checkpoint/restart.

Add some logic to rc1 and rc3:

rc1:  If the content.restore broker attribute is set to a file path,
then load the content backing store module with the 'truncate' option,
and restore content from the file before loading the KVS.

rc3:  If the content.dump broker attribute is set to a file path,
then dump content to the file after unloading the KVS.

Additionally, if content.restore=auto, then rc1 looks for a symlink
named RESTORE in the broker's current working directory or ${statedir}
if defined.  If the symlink exists, then restore content from the file
it points to and remove the symlink on success.

If content.dump=auto, then rc3 dumps content to an automatically generated
file name containing the date in the current working directory or
${statedir} if defined, and creates the RESTORE symlink pointing to it.
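
The path selection described above can be sketched in shell as follows. This is only a sketch of the decision logic, not the rc1/rc3 code: the function names resolve_restore and resolve_dump and the dump-<date>.tgz naming scheme are assumptions, no flux commands are run, and removing the RESTORE symlink after a successful restore is left to the caller.

```shell
#!/bin/sh
# Hypothetical helpers illustrating only the path logic for the
# content.restore and content.dump attribute values. DIR stands for
# ${statedir} if defined, otherwise the broker's working directory.

# resolve_restore VALUE DIR
# Prints the file to restore from, or nothing if no restore is needed.
resolve_restore() {
    value=$1
    dir=$2
    case "$value" in
        "") ;;                          # attribute unset: nothing to do
        auto)
            # Restore only if a RESTORE symlink exists in DIR.
            if [ -L "$dir/RESTORE" ]; then
                readlink "$dir/RESTORE"
            fi
            ;;
        *) printf '%s\n' "$value" ;;    # explicit file path
    esac
}

# resolve_dump VALUE DIR
# Prints the file to dump to. For "auto", generates a dated name in DIR
# and points the RESTORE symlink at it.
resolve_dump() {
    value=$1
    dir=$2
    case "$value" in
        "") ;;                          # attribute unset: nothing to do
        auto)
            # Assumed naming scheme; the real script may use another.
            path="$dir/dump-$(date +%Y%m%d_%H%M%S).tgz"
            ln -sf "$path" "$dir/RESTORE"
            printf '%s\n' "$path"
            ;;
        *) printf '%s\n' "$value" ;;    # explicit file path
    esac
}
```

With these helpers, a shutdown with content.dump=auto followed by a start with content.restore=auto resolves to the same file, which is the round trip the symlink is meant to provide.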

Use case #1 - system instance:

The systemd unit file sets content.restore=auto.  Normally, the system
instance just reuses the backing store as now.  But if content.dump=auto
is set while the instance is running, a dump is created at shutdown, and
the backing store is recreated from the dump when the instance starts again,
accomplishing offline garbage collection.

The flux-shutdown(1) command may set content.dump based on an option or
a "backing store needs GC" heuristic.  Tying the dump logic to
flux-shutdown(1) is helpful because then the shutdown can take longer than
the systemd TimeoutStopSec (90s) without getting killed.

Use case #2 - user checkpoint/restart:

A user may choose to checkpoint an instance by running:
  flux setattr content.dump=restart.tgz
and restart with
  flux start -o,-Scontent.restore=restart.tgz
Presumably a flux-shutdown(1) option would just work here as well.
garlick added a commit that referenced this pull request Aug 9, 2022
Problem: t4465-job-list-use-after-free.sh fails on aarch64.

==1006596== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==1006596==    at 0x4BC9AC8: epoll_ctl (syscall-template.S:120)
==1006596==    by 0x48B5E9F: epoll_modify (ev_epoll.c:96)
==1006596==    by 0x48B7A4F: fd_reify (ev.c:2457)
==1006596==    by 0x48B7A4F: ev_run (ev.c:4075)
==1006596==    by 0x48B7A4F: ev_run (ev.c:4021)
==1006596==    by 0x48833CF: flux_reactor_run (reactor.c:128)
==1006596==    by 0x114833: main (broker.c:507)
==1006596==  Address 0x1ffeffee64 is on thread 1's stack
==1006596==  in frame #1, created by epoll_modify (ev_epoll.c:84)
==1006596==

We already have a matching suppression, but valgrind is being
called here without the suppressions loaded.

Add suppressions to test.