flux-module list works around lack of synchronization with a usleep #57

Closed
garlick opened this issue Oct 15, 2014 · 0 comments
garlick commented Oct 15, 2014

modctl stores its state in the KVS, consisting of a reduced list of cmb.lsmod output, and an object for each module that modctl is managing.

$ flux kvs dir conf.modctl
conf.modctl.modules.xbarrier = {"args":{},"data":"0dNBhmGJO>000000000000ri+ ...
conf.modctl.seq = 2
conf.modctl.lsmod = {"seq":2,"mods":{"wrexec":{"name":"wrexec","size":64921 ...

The lsmod data in the KVS is only updated when a module is loaded or unloaded via modctl. However, it includes data such as module idle time that must be fetched in real time. One can ask modctl to refresh this data with flux_modctl_update(), but there is no way to know when the update has landed in the KVS, hence this atrocity:

if (flux_modctl_update (h) < 0)
    err_exit ("flux_modctl_update");
/* FIXME: flux_modctl_update doesn't wait for KVS to be updated,
 * so there is a race here.  The following usleep should be removed
 * once this is addressed.
 */
usleep (1000*100);
...kvs_get (
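
One way to picture a replacement for the fixed sleep is a bounded poll on a sequence number, sketched below. This is only an illustration and not the fix that was eventually adopted. It assumes that a refresh bumps a sequence number the caller can read back (such as conf.modctl.seq), which the issue does not confirm. The names wait_for_seq, seq_reader_f, mock_store and mock_read_seq are hypothetical and not part of any Flux API; the mock store stands in for the KVS.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback: returns the sequence number currently visible
 * in the store (a stand-in for reading conf.modctl.seq), or -1 on error. */
typedef int (*seq_reader_f)(void *arg);

/* Poll until the visible sequence number reaches 'want', sleeping
 * 'delay_ms' between attempts, for at most 'max_tries' reads.
 * Returns 0 on success, -1 on read error or when tries are exhausted. */
static int wait_for_seq (seq_reader_f read_seq, void *arg, int want,
                         int max_tries, long delay_ms)
{
    assert (read_seq != NULL && max_tries > 0);
    struct timespec ts = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };

    for (int i = 0; i < max_tries; i++) {
        int seq = read_seq (arg);
        if (seq < 0)
            return -1;
        if (seq >= want)
            return 0;
        if (i + 1 < max_tries)      /* no point sleeping after the last try */
            nanosleep (&ts, NULL);
    }
    return -1;
}

/* Mock store for illustration only: the new sequence number becomes
 * visible after 'lag' reads, imitating an update that lands late. */
struct mock_store {
    int seq;    /* sequence number before the update */
    int lag;    /* number of reads that still see the old value */
    int reads;  /* reads performed so far */
};

static int mock_read_seq (void *arg)
{
    struct mock_store *s = arg;
    s->reads++;
    return (s->reads > s->lag) ? s->seq + 1 : s->seq;
}
```

A caller would record the sequence number before requesting the update, then call wait_for_seq with that number plus one. Unlike the fixed usleep, the poll returns as soon as the new data is visible and reports failure explicitly when it never arrives, though it is still polling rather than true synchronization.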
garlick added a commit to garlick/flux-core that referenced this issue Dec 17, 2014
Redesigned modctl can operate on a hierarchy of dynamically loaded
modules (e.g. comms modules, modules that load modules, etc).
In addition the new modctl:

Fixes flux-framework#57 flux-module list works around lack of sync with usleep
Fixes flux-framework#56 flux-module cannot unload a module it did not load
Fixes flux-framework#94 Request flux module load rank option

modctl speaks RFC 5 to services that implement module extensions.
It implements an "mrpc" based protocol that executes module operations
in parallel on a user-specified nodeset.  The modctl "API" is no
longer exported via libflux-core and is kept private between modctl
and flux-module.

The use of mrpc greatly simplifies the design compared to the previous
modctl.  However, the new design has the following caveats:
- does not yet implement scalable .so loading through the KVS
- there is no "smart" data reduction of RFC 5 responses,
  except what libmrpc achieves by KVS object squashing
- some scenarios exist where modctl or the flux-module may hang in
  a kvs_fence(), e.g. node failure or flux-module abort at just the
  wrong place

Some of these scalability and resiliency issues can be solved more
generally through changes to mrpc.  This is likely preferable to building
a more complex, standalone modctl system with these characteristics.
garlick added a commit to garlick/flux-core that referenced this issue Jan 6, 2019
Add a configure check for PyYAML.  Version 3.11 was released
in 2014 and not much has changed since then, so it seems like
a good minimum version to start with.

From https://pyyaml.org/wiki/PyYAML:

3.13 (2018-07-05)
* Rebuild wheels using latest Cython for Python 3.7 support.

3.12 (2016-08-28)
* Wheel packages for Windows binaries.
* Adding an implicit resolver to a derived loader should not affect
  the base loader (fixes issue flux-framework#57).
* Uniform representation for OrderedDict across different versions
  of Python (fixes issue flux-framework#61).
* Fixed comparison to None warning (closes issue flux-framework#64).

3.11 (2014-03-26)
* Source and binary distributions are rebuilt against the latest
  versions of Cython and LibYAML.
grondo added a commit to grondo/flux-core that referenced this issue Dec 12, 2019