Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multipathd: Implement shutdownifnomultipath #78

Closed
wants to merge 3 commits into from

Conversation

sergiodj
Copy link

Currently, socket activation doesn't work properly with
multipath-tools (see #76 for a lengthy discussion about the issue).
Due to the way multipath-tools and udev interact, it is not possible
to guarantee that the daemon will be stopped even if there are no
multipath devices/maps present in the system.

In order to workaround that limitation, this commit proposes a new
daemon command called shutdownifnomultipath. It does what it says
on the tin: if no multipath maps have been found during the system
initialization, the daemon will shut itself down. On top of that, we
use a oneshot (actually "exec") systemd service that will be executed
only once, after everything's been set up, to send the command
mentioned above to the daemon.

Note that this approach is still not perfect: if new devices are added
to the system while it's running, multipathd will be started, even if
no multipath'ing is done. But that's better than the current state
where the daemon stays on unconditionally.

@sergiodj
Copy link
Author

@mwilck Hey, could you take a look at this, please? Is it too naïve, or does it meet what we've been discussing on #76? TIA.

@bmarzins
Copy link
Contributor

I would prefer to see this broken up into two commits. One to add the functionality to multipathd, and the other to add the new service.

Also, since you don't have DefaultDependencies=no in multipathd/multipathd-shutdown-no-multipath.service, it should already be starting after basic.target, so your current "After" dependencies are unnecessary. But like Martin mentioned before, you might want to wait even longer for this to run, perhaps "After=multi-user.target". This would give devices unnecessary in boot as much time as possible to appear and get multipathed before you check if any multipath devices exist.

@sergiodj
Copy link
Author

I would prefer to see this broken up into two commits. One to add the
functionality to multipathd, and the other to add the new service.

Sure thing. I'll work on splitting the changes soon.

Also, since you don't have DefaultDependencies=no in
multipathd/multipathd-shutdown-no-multipath.service, it should already
be starting after basic.target, so your current "After" dependencies
are unnecessary. But like Martin mentioned before, you might want to
wait even longer for this to run, perhaps
"After=multi-user.target". This would give devices unnecessary in boot
as much time as possible to appear and get multipathed before you
check if any multipath devices exist.

Interesting, indeed I thought about using After=multi-user.target in
order to make sure that this is the last command to run, but wasn't
sure if that'd have unintended consequences. I'll change the service
file to do that. I'll also remove the current After= since it'll
become unnecessary.

Thanks.

@mwilck
Copy link
Contributor

mwilck commented Dec 4, 2023

Some remarks:

IIRC, the first step for fixing this issue should be to disable multipathd socket activation; at least nobody has contradicted my statement saying so in #76.

Secondly, I dislike the "shutdownifnomultipath" verb. multipathd supports multi-word commands, so we should be able to implement something like shutdown if-idle or even shutdown if idle.

Wrt waiting after boot, we could even combine the auto-shutdown service with a systemd timer. Basically what we could do is implement a timer that waits for (say) 60s after activation, and runs "shutdown if idle" after that. Such a timer could even be combined with a setup where multipathd.socket was active, and might fix the issue of multipathd being restarted when new devices are added.

@sergiodj
Copy link
Author

sergiodj commented Dec 4, 2023

IIRC, the first step for fixing this issue should be to disable multipathd socket activation; at least nobody has contradicted my statement saying so in #76.

I don't have much time to reply right now, but IIRC leaving the socket activation enabled has the benefit of covering the case when the user adds a device to a live system. Maybe I'm misremembering, though.

Secondly, I dislike the "shutdownifnomultipath" verb. multipathd supports multi-word commands, so we should be able to implement something like shutdown if-idle or even shutdown if idle.

I'll take a look at implementing the multi-word command. shutdown if-idle sounds good.

Wrt waiting after boot, we could even combine the auto-shutdown service with a systemd timer. Basically what we could do is implement a timer that waits for (say) 60s after activation, and runs "shutdown if idle" after that. Such a timer could even be combined with a setup where multipathd.socket was active, and might fix the issue of multipathd being restarted when new devices are added.

A systemd timer works, too. I like the idea of having multipathd shutting itself off even after the addition of new devices to a live system.

@mwilck
Copy link
Contributor

mwilck commented Dec 5, 2023

I don't have much time to reply right now, but IIRC leaving the socket activation enabled has the benefit of covering the case when the user adds a device to a live system. Maybe I'm misremembering, though.

I still think this is a corner case, as I said before ("We are now discussing automation of 4. to avoid 5."). In general I think users you have multipath hardware should generally enable multipathd, and users who don't should not. Is that asking too much from users?

But of course, if you want a fully automated setup that works for everyone without investing thought in their setup, we may want to go down that route.

Secondly, I dislike the "shutdownifnomultipath" verb. multipathd supports multi-word commands, so we should be able to implement something like shutdown if-idle or even shutdown if idle.

I'll take a look at implementing the multi-word command. shutdown if-idle sounds good.

Instead of creating just VRB_SHUTDOWN_NO_MPATH you'd create KEY_IF_IDLE and Q1_IF_IDLE. In callbacks.c you would do

	set_handler_callback(VRB_SHUTDOWN | Q1_IF_IDLE, 
                HANDLER(cli_shutdown_if_no_multipath));

A systemd timer works, too. I like the idea of having multipathd shutting itself off even after the addition of new devices to a live system.

The timer would work for both boot and and device addition. By setting After=multi-user.target on the timer, we can make sure that it doesn't expire prematurely even if booting takes a long time.

We should also be aware of the find_multipaths smart case. Currently, check_path_valid() calls is_path_valid(..., check_multipathd=true) for uevents, which would trigger the socket, before calling print_cmd_valid(), which will check for timed-out devices in the "smart" case. I suppose we could check /run/multipath/find_multipaths/$DEVNODE early on to see if the device had been considered before and has timed out, before connecting to the socket. This requires changes to the is_path_valid() logic though. I don't expect you to do it, but it's a thing we should keep in mind.

All this said, if you have patches, please submit them to the dm-devel mailing list as described in README.md.

Sergio Durigan Junior added 3 commits December 8, 2023 14:51
Currently, socket activation doesn't work properly with
multipath-tools (see opensvc#76 for a lengthy discussion about the issue).
Due to the way multipath-tools and udev interact, it is not possible
to guarantee that the daemon will be stopped even if there are no
multipath devices/maps present in the system.

In order to workaround that limitation, this commit proposes a new
daemon command called "shutdown if-idle".  What it does is check if
any multipath maps have been found during system initialization.  If
none exist, then the daemon will shut itself down.

Closes: opensvc#76

Signed-off-by: Sergio Durigan Junior <[email protected]>
…down if-idle"

On top of the new "shutdown if-idle" command, we use a
oneshot (actually "exec") systemd service that will be executed 120
seconds after booting up the system or after multipathd.service is
active, to send the command mentioned above to the daemon.

Note that this approach is still not perfect: if new devices are added
to the system while it's running, multipathd will be started, even if
no multipath'ing is done.  But that's better than the current state
where the daemon stays on unconditionally.

Signed-off-by: Sergio Durigan Junior <[email protected]>
@sergiodj
Copy link
Author

sergiodj commented Jan 2, 2024

Sorry, this fell between the cracks...

I've now updated the branch following the suggestions from @mwilck. I haven't tested it fully yet, but would appreciate if you could take another look meanwhile and check if it follows what you had in mind.

As for the find_multipaths smart case, I confess I haven't thought about it yet. Please correct me if I'm wrong, but IIUC it's not really a blocker to this PR. If that's indeed the case, then I'd rather leave it for a subsequent iteration.

@zeha
Copy link
Contributor

zeha commented Jan 3, 2024

A command that gets sent some fixed time later sounds like a good place to introduce race conditions under load. Can this not be a config option that is always active on distros choosing such a setup?

@mwilck
Copy link
Contributor

mwilck commented Jan 4, 2024

@zeha:

A command that gets sent some fixed time later sounds like a good place to introduce race conditions under load. Can this not be a config option that is always active on distros choosing such a setup?

I am not sure I understand what you're asking for. What config option do you want, and how should its semantics be? I figure that both you and @sergiodj are working for the Debian project, so perhaps the two of you want to find an agreement on the desired behavior?

Wrt the race condition, this is a valid concern. Something like the following could happen:

  1. an idle timer is runnning for multipathd
  2. some uevent arrives that would cause a multipath map to be created
  3. "multipath -u" is called in udev rules and finds multipathd running
  4. timer expires and multipathd exits
  5. the uevent would be passed to multipathd after rule processing, but it has exited

We could avoid this scenario by doing the equivalent of an "udevadm settle" (i.e. wait for pending uevents to finish, and make sure multipathd has processed all uevents it is aware of, i.e. multipathd's internal uevent queue is empty) when multipathd receives an "shutdown if-idle" command.

I think if we did this, we'd be safe. I can't perceive other race conditions. If the timer expired before 3. (or 2.) above, "multipath -u" would be called while processing the udev rules, and start another multipathd instance.

Anyway, @sergiodj, I need to ask you to send this to [email protected] (with @bmarzins and myself on CC). I think this needs a broader audience.

Copy link
Contributor

@mwilck mwilck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted, this needs to be discussed on the dm-devel mailing list.

@@ -224,6 +224,7 @@ load_keys (void)
r += add_key(keys, "setmarginal", VRB_SETMARGINAL, 0);
r += add_key(keys, "unsetmarginal", VRB_UNSETMARGINAL, 0);
r += add_key(keys, "all", KEY_ALL, 0);
r += add_key(keys, "if-idle", KEY_IF_IDLE, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: as noted before, I would prefer the more natural shutdown if idle.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't understand that that was your preference; I thought you were OK with either. I'll adjust the code accordingly.

@@ -6,6 +6,8 @@ Wants=systemd-udevd-kernel.socket @MODPROBE_UNIT@
After=systemd-udevd-kernel.socket @MODPROBE_UNIT@
After=multipathd.socket systemd-remount-fs.service
Before=initrd-cleanup.service
Wants=multipathd-shutdown-if-idle.timer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't do this. The timer should be independently enabled by either the admin or the distro.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK. I don't remember why I put that there, but I'll remove it.

@@ -0,0 +1,10 @@
[Unit]
Description=Device-Mapper Multipath Conditional Shutdown Trigger
After=multi-user.target
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be certain, we should als add "After=multipathd.service" here IMO

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK.

condlog(3, "shutdown if no multipath map found (operator)");

if (VECTOR_SIZE (vecs->mpvec) == 0)
exit_daemon();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted by @zeha, this is race-prone. We need to make sure that there are no uevents pending, neither in udev itself nor in multipathd's own queues, before actually exiting.

@sergiodj
Copy link
Author

sergiodj commented Jan 4, 2024

@zeha:

A command that gets sent some fixed time later sounds like a good place to introduce race conditions under load. Can this not be a config option that is always active on distros choosing such a setup?

I am not sure I understand what you're asking for. What config option do you want, and how should its semantics be? I figure that both you and @sergiodj are working for the Debian project, so perhaps the two of you want to find an agreement on the desired behavior?

@zeha and I talked in private yesterday and I believe we've reached a consensus on how to incorporate these changes into Debian.

Wrt the race condition, this is a valid concern. Something like the following could happen:

1. an idle timer is runnning for multipathd

2. some uevent arrives that would cause a multipath map to be created

3. "multipath -u" is called in udev rules and finds multipathd running

4. timer expires and multipathd exits

5. the uevent would be passed to multipathd after rule processing, but it has exited

We could avoid this scenario by doing the equivalent of an "udevadm settle" (i.e. wait for pending uevents to finish, and make sure multipathd has processed all uevents it is aware of, i.e. multipathd's internal uevent queue is empty) when multipathd receives an "shutdown if-idle" command.

I think if we did this, we'd be safe. I can't perceive other race conditions. If the timer expired before 3. (or 2.) above, "multipath -u" would be called while processing the udev rules, and start another multipathd instance.

OK, I'll see about implementing this change. I'll have to discover how to access multipathd's internal uevent queue and check if it's empty.

Anyway, @sergiodj, I need to ask you to send this to [email protected] (with @bmarzins and myself on CC). I think this needs a broader audience.

Will do. Thanks.

@sergiodj
Copy link
Author

sergiodj commented Jan 4, 2024

Wrt the race condition, this is a valid concern. Something like the following could happen:

1. an idle timer is runnning for multipathd

2. some uevent arrives that would cause a multipath map to be created

3. "multipath -u" is called in udev rules and finds multipathd running

4. timer expires and multipathd exits

5. the uevent would be passed to multipathd after rule processing, but it has exited

We could avoid this scenario by doing the equivalent of an "udevadm settle" (i.e. wait for pending uevents to finish, and make sure multipathd has processed all uevents it is aware of, i.e. multipathd's internal uevent queue is empty) when multipathd receives an "shutdown if-idle" command.
I think if we did this, we'd be safe. I can't perceive other race conditions. If the timer expired before 3. (or 2.) above, "multipath -u" would be called while processing the udev rules, and start another multipathd instance.

OK, I'll see about implementing this change. I'll have to discover how to access multipathd's internal uevent queue and check if it's empty.

BTW, I'd appreciate some pointers here. My experience dealing with libudev is very limited, and I'm no expert on multipath-tools either.

I know programs aren't supposed to use udev_queue, and I don't know offhand if there's a proper way to query udev regarding the list of pending events that need to be processed.

@mwilck
Copy link
Contributor

mwilck commented Jan 5, 2024

BTW, I'd appreciate some pointers here. My experience dealing with libudev is very limited, and I'm no expert on multipath-tools either.

Right. On the multipathd side, you will find the relevant code in libmultipath/uevent.c. multipathd processes uevents in multiple steps. The uevent listener thread (uevent_listen()) monitors finished uevents and adds them to the uevq list. The uevent dispatcher thread (uevent_dispatch()) then passes them to uevent_trigger() to be actually processed. libmultipath provides a function is_uevent_busy() to check whether any uevents are pending. It should be sufficient to call this function to obtain multipathd's internal state.

I know programs aren't supposed to use udev_queue, and I don't know offhand if there's a proper way to query udev regarding the list of pending events that need to be processed.

I need to research this myself. There seems to be no official API for this. udevadm settle checks if /run/udev/queue exists, but that's not a documented API. I guess our "best" option would be to call udevadm settle -t 0, even though I don't like that much. Even if that command returns success, we can't be 100% certain that multipathd's uevent listener has already seen the events and added them to uevq. AFAICS, the only way to ensure this would be to make sure the uevent listener runs at a higher priority then the thread that services the socket, and call sched_yield().

A naïve implementation might look roughly like this:

int _is_udevq_idle() {
    int st;

    if (is_uevent_busy())
        return false;
    st = system("/usr/bin udevadm settle -t 0")
    if (!WIFEXITED(st) || WEXITSTATUS(st))
         return false;
    return true;
}

int is_udevq_idle() {
     if (!_is_udevq_idle())
         return false;
     // give uevent listener has a chance to  run
     // listener should have higher prio than this thread to be certain
     sched_yield();
     // then check again
     return _is_udevq_idle();
}

This isn't completely safe because new uevents can arrive at any time, but I suppose it's safe enough for practical purposes. The question is what we should do if this returns false (just fail, wait for some time, or call udevadm settle with some positive timeout).

This begs the question, again, whether the "automatic exit" idea makes sense at all. It's a lot of complexity added for a minor improvement of user experience (see my previous breakdown).

@zeha
Copy link
Contributor

zeha commented Jan 6, 2024

What config option do you want, and how should its semantics be?

My idea would basically be:

New config option, "multipathd_no_devices_found". (Upstream) defaults to "stay_running", keeping today's behaviour.
Interested distros or users could set "multipathd_no_devices_found exit", where if multipathd doesn't see any multipath-eligible devices it should exit again. Maybe also after the last multipath device disappeared.

But this is an idle idea, without having looked at the multipathd startup code.

@sergiodj
Copy link
Author

Hi,

Sorry for the delay in replying. I've been doing this work as a side task, but $DAYJOB is now a bit more hectic than expected. Mainly for this reason, but also because I agree with the assessment that this is a lot of work/complexity for a relatively minor gain, I do not intend to continue working on this feature.

Thank you all for the invaluable help and interesting discussions that made me understand more about the world of multipath'ed devices.

@zeha, I will send a Merge Request against Debian's multipath-tools package in order to revert part or all of the changes that I'd suggested a while ago regarding the socket activation feature. This whole "side project" made me realize that the changes don't work as expected, and as such should not be advertised in the d/NEWS file.

@sergiodj sergiodj closed this Jan 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants