Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

broker: add statedir attribute, drop content.backing-path #4248

Merged
merged 7 commits into from
Mar 30, 2022

Conversation

garlick
Copy link
Member

@garlick garlick commented Mar 30, 2022

This adds a statedir broker attribute that is set by the systemd unit file to the path of a directory that is preserved across broker restarts. The content.backing-path broker attribute is no more. Its users now place files in statedir, if set, with fallback to rundir.

Instructions can be made more clear: set statedir if you want flux to be restartable, and leave the details to us. If you don't set it, any files go to rundir, which is always cleaned up.

Also TODO if this is merged:

  • Update this faq entry about redirecting sqlite to another file system
  • Follow up with a PR to job-archive to use statedir for its database location by default, and to define another way of activating it, since right now configuration of the dbpath is what causes it to run.

garlick added 2 commits March 30, 2022 08:57
Problem: systemd is managing the system instance state directory,
but doesn't communicate its path to the broker.

Set the statedir broker attribute in the systemd unit file.
Problem: the 'statedir' attribute, if set, is not checked
for usability, and the attribute is read-write.

If the statedir attribute is set, check that the specified directory
exists, and that it has usable permissions.  Set the immutable flag
on the attribute.

Add some tests to t0001-basic.t.
Copy link
Contributor

@grondo grondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. I question the requirement that statedir be strictly owned by the instance owner below, but this could easily be addressed later if it even becomes an issue, so that comment is more of a question. (It might be nice to update the doc though)

@@ -735,6 +738,11 @@ static int checkdir (const char *name, const char *path)
log_err ("cannot stat %s %s", name, path);
return -1;
}
if (sb.st_uid != getuid ()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should sb.st_uid == 0 also be allowed? (Actually, I don't know the reason for the directory ownership requirement, instead of just write access. Is it for security purposes (to ensure an existing content store could not have been modified by another user), or just to make it simpler to check for write access?)

If this requirement stays, then we might want to update the documentation of statedir to include something like: "If set, statedir must exist and be owned by the instance owner"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought was that the tests for u=rwx does not have the desired effect if "u" is not the instance owner. So we could just make the test more sophisticated, looking for write access to the directory by getuid()? That seemed a bit involved without a use case in mind. Does that make sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought was that the tests for u=rwx does not have the desired effect if "u" is not the instance owner.

Yes, that part was clear. My thought is that it wasn't clear you can't specify a statedir to which you have write access but are not the owner. A scenario that is unlikely but was the first thing I tried (statedir=/tmp). It also might be the case that sysadmins might prefer a directory ownership of root for a statedir location for the system instance.

However, I agree there probably isn't a real use case at this time, and if we need to fix it the code already exists in flux-security to check a directory for secure permissions exhaustively, so we can just bring that code in when necessary.

My main suggestion was just to add documentation of the requirement at this point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. OK, I'll make the documentation change.

@garlick
Copy link
Member Author

garlick commented Mar 30, 2022

Added the suggested documentation change:

      statedir [Updates: C]
              A  directory in which persistent state such as the content back‐
              ing store data is stored  by  the  Flux  broker,  to  facilitate
              restarts.   If  unset,  this  data  goes  to  rundir where it is
              cleaned up on instance shutdown.  If set,  this  directory  must
              exist and be owned by the instance owner.  Default: unset.

Also some ci tests were failing due to a typo I introduced when fixing a && chain. Fixed. Sorry for the very public whack-a-mole!

@@ -657,7 +647,7 @@ int mod_main (flux_t *h, int argc, char **argv)
flux_log_error (h, "content_sqlite_create failed");
return -1;
}
if (content_sqlite_opendb(ctx) < 0)
if (content_sqlite_opendb (ctx) < 0)
goto done;
if (content_register_backing_store (h, "content-sqlite") < 0)
goto done;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a big deal, but dunno if these whitespace / code style changes were supposed to go into this commit or not.

Comment on lines 48 to 49
A directory in which persistent state such as the content backing store data
is stored by the Flux broker, to facilitate restarts. If unset, this data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first sentence seems a little run on. Perhaps

A directory in which persistent state is stored by the Flux broker. For example, content backing store data will be stored here to facilitate restarts.

garlick added 5 commits March 30, 2022 15:28
Problem: content persistence semantics are confusing

Instead of having each content backing module implement persistence
semantics based on their interpretation of the content.backing-path
attribute (a directory in content-files, a file in content-sqlite),
simply make the content persistent if statedir is defined.

In the case of content-sqlite:  if statedir is defined, the backing
file is $statedir/content.sqlite, otherwise it is $rundir/content.sqlite,
and we know that statedir is persistent and rundir is not.

This means the basename of the content.sqlite file or the content-files
directory is hardwired - just the directory location changes.
If this is too inflexible, in the future we could make it configurable,
but for now it seems fine.

Update sharness tests.
Problem: statedir broker attribute has replaced content.backing-path,
so documentation is out of date.

Add a statedir entry to flux-broker-attributes(7), and update the
rundir entry to describe the new persistence semantics.

Drop content.backing-path entry.

Add statedir to the dictionary.
Problem: the broker checks that rundir, statedir, and the directory
containing the local uri are u+rwx, but it does not check that the
owner of the directory is the instance owner.

Check that st_uid == getuid ().

Update the local-uri override test in t0001-basic.t to use the
trash directory instead of /tmp to contain the test socket.
Problem: content.backing-path is obsolete, but is still set
by the systemd unit file.

Drop content.backing-path setting.
Problem: no space between function name and open parenthesis
violates RFC 7.

Add space.
@garlick
Copy link
Member Author

garlick commented Mar 30, 2022

Pushed those changes. I'm going to go ahead and set MWP. Hope that's OK.

@mergify mergify bot merged commit d483acd into flux-framework:master Mar 30, 2022
@garlick garlick deleted the statedir branch March 30, 2022 23:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants