Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
panic, x86: Allow CPUs to save registers even if looping in NMI context
Currently, kdump_nmi_shootdown_cpus(), a subroutine of crash_kexec(), sends an NMI IPI to CPUs which haven't called panic() to stop them, save their register information and do some cleanups for crash dumping. However, if such a CPU is infinitely looping in NMI context, we fail to save its register information into the crash dump. For example, this can happen when unknown NMIs are broadcast to all CPUs as follows: CPU 0 CPU 1 =========================== ========================== receive an unknown NMI unknown_nmi_error() panic() receive an unknown NMI spin_trylock(&panic_lock) unknown_nmi_error() crash_kexec() panic() spin_trylock(&panic_lock) panic_smp_self_stop() infinite loop kdump_nmi_shootdown_cpus() issue NMI IPI -----------> blocked until IRET infinite loop... Here, since CPU 1 is in NMI context, the second NMI from CPU 0 is blocked until CPU 1 executes IRET. However, CPU 1 never executes IRET, so the NMI is not handled and the callback function to save registers is never called. In practice, this can happen on some servers which broadcast NMIs to all CPUs when the NMI button is pushed. To save registers in this case, we need to: a) Return from NMI handler instead of looping infinitely or b) Call the callback function directly from the infinite loop Inherently, a) is risky because NMI is also used to prevent corrupted data from being propagated to devices. So, we chose b). This patch does the following: 1. Move the infinite looping of CPUs which haven't called panic() in NMI context (actually done by panic_smp_self_stop()) outside of panic() to enable us to refer pt_regs. Please note that panic_smp_self_stop() is still used for normal context. 2. Call a callback of kdump_nmi_shootdown_cpus() directly to save registers and do some cleanups after setting waiting_for_crash_ipi which is used for counting down the number of CPUs which handled the callback Signed-off-by: Hidehiro Kawai <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Aaron Tomlin <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Baoquan He <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: Dave Young <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Don Zickus <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Gobinda Charan Maji <[email protected]> Cc: HATAYAMA Daisuke <[email protected]> Cc: Hidehiro Kawai <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Javi Merino <[email protected]> Cc: Jiang Liu <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: [email protected] Cc: [email protected] Cc: lkml <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Michal Nazarewicz <[email protected]> Cc: Nicolas Iooss <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Prarit Bhargava <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Seth Jennings <[email protected]> Cc: Stefan Lippers-Hollmann <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ulrich Obergfell <[email protected]> Cc: Vitaly Kuznetsov <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Yasuaki Ishimatsu <[email protected]> Link: http://lkml.kernel.org/r/20151210014628.25437.75256.stgit@softrs [ Cleanup comments, fixup formatting. ] Signed-off-by: Borislav Petkov <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]>
- Loading branch information