-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
broker: add statedir attribute, drop content.backing-path #4248
Conversation
Problem: systemd is managing the system instance state directory, but doesn't communicate its path to the broker. Set the statedir broker attribute in the systemd unit file.
Problem: the 'statedir' attribute, if set, is not checked for usability, and the attribute is read-write. If the statedir attribute is set, check that the specified directory exists, and that it has usable permissions. Set the immutable flag on the attribute. Add some tests to t0001-basic.t.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. I question the requirement that statedir
be strictly owned by the instance owner below, but this could easily be addressed later if it even becomes an issue, so that comment is more of a question. (It might be nice to update the doc though)
@@ -735,6 +738,11 @@ static int checkdir (const char *name, const char *path) | |||
log_err ("cannot stat %s %s", name, path); | |||
return -1; | |||
} | |||
if (sb.st_uid != getuid ()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should sb.st_uid == 0
also be allowed? (Actually, I don't know the reason for the directory ownership requirement, instead of just write access. Is it for security purposes (to ensure an existing content store could not have been modified by another user), or just to make it simpler to check for write access?)
If this requirement stays, then we might want to update the documentation of statedir
to include something like: "If set, statedir
must exist and be owned by the instance owner"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thought was that the tests for u=rwx does not have the desired effect if "u" is not the instance owner. So we could just make the test more sophisticated, looking for write access to the directory by getuid()
? That seemed a bit involved without a use case in mind. Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thought was that the tests for u=rwx does not have the desired effect if "u" is not the instance owner.
Yes, that part was clear. My thought is that it wasn't clear you can't specify a statedir
to which you have write access but are not the owner. A scenario that is unlikely but was the first thing I tried (statedir=/tmp
). It also might be the case that sysadmins might prefer a directory ownership of root
for a statedir location for the system instance.
However, I agree there probably isn't a real use case at this time, and if we need to fix it the code already exists in flux-security
to check a directory for secure permissions exhaustively, so we can just bring that code in when necessary.
My main suggestion was just to add documentation of the requirement at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. OK, I'll make the documentation change.
Added the suggested documentation change:
Also some ci tests were failing due to a typo I introduced when fixing a && chain. Fixed. Sorry for the very public whack-a-mole! |
@@ -657,7 +647,7 @@ int mod_main (flux_t *h, int argc, char **argv) | |||
flux_log_error (h, "content_sqlite_create failed"); | |||
return -1; | |||
} | |||
if (content_sqlite_opendb(ctx) < 0) | |||
if (content_sqlite_opendb (ctx) < 0) | |||
goto done; | |||
if (content_register_backing_store (h, "content-sqlite") < 0) | |||
goto done; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a big deal, but dunno if these whitespace / code style changes were supposed to go into this commit or not.
doc/man7/flux-broker-attributes.rst
Outdated
A directory in which persistent state such as the content backing store data | ||
is stored by the Flux broker, to facilitate restarts. If unset, this data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
first sentence seems a little run on. Perhaps
A directory in which persistent state is stored by the Flux broker. For example, content backing store data will be stored here to facilitate restarts.
Problem: content persistence semantics are confusing Instead of having each content backing module implement persistence semantics based on their interpretation of the content.backing-path attribute (a directory in content-files, a file in content-sqlite), simply make the content persistent if statedir is defined. In the case of content-sqlite: if statedir is defined, the backing file is $statedir/content.sqlite, otherwise it is $rundir/content.sqlite, and we know that statedir is persistent and rundir is not. This means the basename of the content.sqlite file or the content-files directory is hardwired - just the directory location changes. If this is too inflexible, in the future we could make it configurable, but for now it seems fine. Update sharness tests.
Problem: statedir broker attribute has replaced content.backing-path, so documentation is out of date. Add a statedir entry to flux-broker-attributes(7), and update the rundir entry to describe the new persistence semantics. Drop content.backing-path entry. Add statedir to the dictionary.
Problem: the broker checks that rundir, statedir, and the directory containing the local uri are u+rwx, but it does not check that the owner of the directory is the instance owner. Check that st_uid == getuid (). Update the local-uri override test in t0001-basic.t to use the trash directory instead of /tmp to contain the test socket.
Problem: content.backing-path is obsolete, but is still set by the systemd unit file. Drop content.backing-path setting.
Problem: no space between function name and open parenthesis violates RFC 7. Add space.
Pushed those changes. I'm going to go ahead and set MWP. Hope that's OK. |
This adds a
statedir
broker attribute that is set by the systemd unit file to the path of a directory that is preserved across broker restarts. Thecontent.backing-path
broker attribute is no more. Its users now place files instatedir
, if set, with fallback torundir
.Instructions can be made more clear: set
statedir
if you want flux to be restartable, and leave the details to us. If you don't set it, any files go torundir
, which is always cleaned up.Also TODO if this is merged:
job-archive
to usestatedir
for its database location by default, and to define another way of activating it, since right now configuration of the dbpath is what causes it to run.