Merge branch 'akpm/master'
sfrothwell committed Jun 5, 2020
2 parents dca4419 + 3d88da3 commit 53f3765
Showing 1,072 changed files with 4,597 additions and 4,992 deletions.
17 changes: 8 additions & 9 deletions Documentation/admin-guide/cgroup-v2.rst
@@ -1170,6 +1170,13 @@ PAGE_SIZE multiple when read back.
Under certain circumstances, the usage may go over the limit
temporarily.

+In default configuration regular 0-order allocations always
+succeed unless OOM killer chooses current task as a victim.
+
+Some kinds of allocations don't invoke the OOM killer.
+Caller could retry them differently, return into userspace
+as -ENOMEM or silently ignore in cases like disk readahead.
+
This is the ultimate protection mechanism. As long as the
high limit is used and monitored properly, this limit's
utility is limited to providing the final safety net.
@@ -1226,17 +1233,9 @@ PAGE_SIZE multiple when read back.
The number of time the cgroup's memory usage was
reached the limit and allocation was about to fail.

-Depending on context result could be invocation of OOM
-killer and retrying allocation or failing allocation.
-
-Failed allocation in its turn could be returned into
-userspace as -ENOMEM or silently ignored in cases like
-disk readahead. For now OOM in memory cgroup kills
-tasks iff shortage has happened inside page fault.
-
This event is not raised if the OOM killer is not
considered as an option, e.g. for failed high-order
-allocations.
+allocations or if caller asked to not retry attempts.

oom_kill
The number of processes belonging to this cgroup
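
The counters touched by this hunk are exposed in the cgroup's memory.events file, a flat-keyed file with one "key value" pair per line. The following is a minimal userspace sketch of extracting one counter from text that has already been read into a buffer. The helper name events_counter is hypothetical, and the key name "oom" for the entry edited above is an assumption, since the hunk only shows "oom_kill" by name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the counter stored under `key` in flat-keyed text such as
 * "oom 3\noom_kill 1\n", or -1 if the key is absent.
 * Hypothetical helper for illustration; not a kernel interface.
 */
long events_counter(const char *text, const char *key)
{
    assert(text != NULL && key != NULL);

    size_t klen = strlen(key);
    const char *line = text;

    while (*line != '\0') {
        /* A match needs the whole key followed by a single space. */
        if (strncmp(line, key, klen) == 0 && line[klen] == ' ')
            return strtol(line + klen + 1, NULL, 10);

        /* Advance to the start of the next line, if there is one. */
        const char *nl = strchr(line, '\n');
        if (nl == NULL)
            break;
        line = nl + 1;
    }
    return -1;
}
```

Requiring a space after the key keeps a lookup for "oom" from matching the "oom_kill" line. Comparing the two counters gives a rough idea of how many limit hits ended in a kill rather than in a failed or retried allocation.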
5 changes: 5 additions & 0 deletions Documentation/admin-guide/dynamic-debug-howto.rst
@@ -13,6 +13,11 @@ kernel code to obtain additional kernel information. Currently, if
``print_hex_dump_debug()``/``print_hex_dump_bytes()`` calls can be dynamically
enabled per-callsite.

+If you do not want to enable dynamic debug globally (i.e. in some embedded
+system), you may set ``CONFIG_DYNAMIC_DEBUG_CORE`` as basic support of dynamic
+debug and add ``ccflags := -DDYNAMIC_DEBUG_MODULE`` into the Makefile of any
+modules which you'd like to dynamically debug later.
+
If ``CONFIG_DYNAMIC_DEBUG`` is not set, ``print_hex_dump_debug()`` is just
shortcut for ``print_hex_dump(KERN_DEBUG)``.

8 changes: 8 additions & 0 deletions Documentation/admin-guide/kdump/kdump.rst
@@ -521,6 +521,14 @@ will cause a kdump to occur at the panic() call. In cases where a user wants
to specify this during runtime, /proc/sys/kernel/panic_on_warn can be set to 1
to achieve the same behaviour.

+Trigger Kdump on add_taint()
+============================
+
+The kernel parameter panic_on_taint facilitates a conditional call to panic()
+from within add_taint() whenever the value set in this bitmask matches with the
+bit flag being set by add_taint().
+This will cause a kdump to occur at the add_taint()->panic() call.
+
Contact
=======

34 changes: 28 additions & 6 deletions Documentation/admin-guide/kernel-parameters.txt
@@ -1445,7 +1445,7 @@
hardlockup_all_cpu_backtrace=
[KNL] Should the hard-lockup detector generate
backtraces on all cpus.
-Format: <integer>
+Format: 0 | 1

hashdist= [KNL,NUMA] Large hashes allocated during boot
are distributed across NUMA nodes. Defaults on
@@ -1513,9 +1513,9 @@

hung_task_panic=
[KNL] Should the hung task detector generate panics.
-Format: <integer>
+Format: 0 | 1

-A nonzero value instructs the kernel to panic when a
+A value of 1 instructs the kernel to panic when a
hung task is detected. The default value is controlled
by the CONFIG_BOOTPARAM_HUNG_TASK_PANIC build-time
option. The value selected by this boot parameter can
@@ -3447,6 +3447,19 @@
bit 4: print ftrace buffer
bit 5: print all printk messages in buffer

+panic_on_taint= Bitmask for conditionally calling panic() in add_taint()
+Format: <hex>[,nousertaint]
+Hexadecimal bitmask representing the set of TAINT flags
+that will cause the kernel to panic when add_taint() is
+called with any of the flags in this set.
+The optional switch "nousertaint" can be utilized to
+prevent userspace forced crashes by writing to sysctl
+/proc/sys/kernel/tainted any flagset matching with the
+bitmask set on panic_on_taint.
+See Documentation/admin-guide/tainted-kernels.rst for
+extra details on the taint flags that users can pick
+to compose the bitmask to assign to panic_on_taint.
+
panic_on_warn panic() instead of WARN(). Useful to cause kdump
on a WARN().
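
The panic_on_taint entry above describes a bitmask test: the kernel panics when add_taint() sets a flag whose bit is present in the mask. A minimal userspace model of that test might look like the sketch below. The function name taint_triggers_panic is hypothetical, and the code only illustrates the documented matching rule; it is not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Model of the documented check: `flag` is the bit number of the taint
 * being set, `panic_on_taint` is the bitmask from the command line.
 * Hypothetical userspace helper, not the kernel function.
 */
bool taint_triggers_panic(unsigned long panic_on_taint, unsigned int flag)
{
    /* Shifting by the full width of the type would be undefined. */
    assert(flag < sizeof(unsigned long) * 8);

    return (panic_on_taint & (1UL << flag)) != 0;
}
```

With panic_on_taint=0x200, for example, only a taint on bit 9 matches; every other flag leaves the system running.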

@@ -4654,9 +4667,9 @@

softlockup_panic=
[KNL] Should the soft-lockup detector generate panics.
-Format: <integer>
+Format: 0 | 1

-A nonzero value instructs the soft-lockup detector
+A value of 1 instructs the soft-lockup detector
to panic the machine when a soft-lockup occurs. It is
also controlled by the kernel.softlockup_panic sysctl
and CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC, which is the
@@ -4665,7 +4678,7 @@
softlockup_all_cpu_backtrace=
[KNL] Should the soft-lockup detector generate
backtraces on all cpus.
-Format: <integer>
+Format: 0 | 1

sonypi.*= [HW] Sony Programmable I/O Control Device driver
See Documentation/admin-guide/laptops/sonypi.rst
@@ -4958,6 +4971,15 @@

switches= [HW,M68k]

+sysctl.*= [KNL]
+Set a sysctl parameter, right before loading the init
+process, as if the value was written to the respective
+/proc/sys/... file. Both '.' and '/' are recognized as
+separators. Unrecognized parameters and invalid values
+are reported in the kernel log. Sysctls registered
+later by a loaded module cannot be set this way.
+Example: sysctl.vm.swappiness=40
+
sysfs.deprecated=0|1 [KNL]
Enable/disable old style sysfs layout for old udev
on older distributions. When this option is enabled
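
The sysctl.*= entry above maps a boot parameter name onto a /proc/sys path, accepting both '.' and '/' as separators. The sketch below shows one way to express that mapping in userspace C. The helper name sysctl_param_to_path is hypothetical, and treating every '.' as a separator is a simplifying assumption; the kernel's own parser may handle corner cases differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Translate a name such as "sysctl.vm.swappiness" into
 * "/proc/sys/vm/swappiness". Returns 0 on success, or -1 if the name lacks
 * the "sysctl" prefix and a separator, or if `out` is too small.
 * Hypothetical helper, not the kernel's parser.
 */
int sysctl_param_to_path(const char *param, char *out, size_t outlen)
{
    static const char prefix[] = "sysctl";
    static const char root[] = "/proc/sys/";
    size_t plen = sizeof(prefix) - 1;

    assert(param != NULL && out != NULL);

    if (strncmp(param, prefix, plen) != 0 ||
        (param[plen] != '.' && param[plen] != '/') ||
        param[plen + 1] == '\0')
        return -1;

    int n = snprintf(out, outlen, "%s%s", root, param + plen + 1);
    if (n < 0 || (size_t)n >= outlen)
        return -1;

    /* Both '.' and '/' separate components; normalise '.' to '/'. */
    for (char *p = out + sizeof(root) - 1; *p != '\0'; p++)
        if (*p == '.')
            *p = '/';
    return 0;
}
```

Given the documented example, "sysctl.vm.swappiness" becomes "/proc/sys/vm/swappiness", and the slash form "sysctl/vm/swappiness" yields the same path.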
10 changes: 5 additions & 5 deletions Documentation/admin-guide/mm/numa_memory_policy.rst
@@ -364,19 +364,19 @@ follows:

2) for querying the policy, we do not need to take an extra reference on the
target task's task policy nor vma policies because we always acquire the
-task's mm's mmap_sem for read during the query. The set_mempolicy() and
-mbind() APIs [see below] always acquire the mmap_sem for write when
+task's mm's mmap_lock for read during the query. The set_mempolicy() and
+mbind() APIs [see below] always acquire the mmap_lock for write when
installing or replacing task or vma policies. Thus, there is no possibility
of a task or thread freeing a policy while another task or thread is
querying it.

3) Page allocation usage of task or vma policy occurs in the fault path where
-we hold them mmap_sem for read. Again, because replacing the task or vma
-policy requires that the mmap_sem be held for write, the policy can't be
+we hold them mmap_lock for read. Again, because replacing the task or vma
+policy requires that the mmap_lock be held for write, the policy can't be
freed out from under us while we're using it for page allocation.

4) Shared policies require special consideration. One task can replace a
-shared memory policy while another task, with a distinct mmap_sem, is
+shared memory policy while another task, with a distinct mmap_lock, is
querying or allocating a page based on the policy. To resolve this
potential race, the shared policy infrastructure adds an extra reference
to the shared policy during lookup while holding a spin lock on the shared
2 changes: 1 addition & 1 deletion Documentation/admin-guide/mm/userfaultfd.rst
@@ -33,7 +33,7 @@ memory ranges) provides two primary functionalities:
The real advantage of userfaults if compared to regular virtual memory
management of mremap/mprotect is that the userfaults in all their
operations never involve heavyweight structures like vmas (in fact the
-``userfaultfd`` runtime load never takes the mmap_sem for writing).
+``userfaultfd`` runtime load never takes the mmap_lock for writing).

Vmas are not suitable for page- (or hugepage) granular fault tracking
when dealing with virtual address spaces that could span
37 changes: 37 additions & 0 deletions Documentation/admin-guide/sysctl/kernel.rst
@@ -335,6 +335,20 @@ Path for the hotplug policy agent.
Default value is "``/sbin/hotplug``".


+hung_task_all_cpu_backtrace:
+================
+
+If this option is set, the kernel will send an NMI to all CPUs to dump
+their backtraces when a hung task is detected. This file shows up if
+CONFIG_DETECT_HUNG_TASK and CONFIG_SMP are enabled.
+
+0: Won't show all CPUs backtraces when a hung task is detected.
+This is the default behavior.
+
+1: Will non-maskably interrupt all CPUs and dump their backtraces when
+a hung task is detected.
+
+
hung_task_panic
===============

@@ -632,6 +646,22 @@ rate for each task.
scanned for a given scan.


+oops_all_cpu_backtrace:
+================
+
+If this option is set, the kernel will send an NMI to all CPUs to dump
+their backtraces when an oops event occurs. It should be used as a last
+resort in case a panic cannot be triggered (to protect VMs running, for
+example) or kdump can't be collected. This file shows up if CONFIG_SMP
+is enabled.
+
+0: Won't show all CPUs backtraces when an oops is detected.
+This is the default behavior.
+
+1: Will non-maskably interrupt all CPUs and dump their backtraces when
+an oops event is detected.
+
+
osrelease, ostype & version
===========================

@@ -1239,6 +1269,13 @@ ORed together. The letters are seen in "Tainted" line of Oops reports.

See :doc:`/admin-guide/tainted-kernels` for more information.

+Note:
+writes to this sysctl interface will fail with ``EINVAL`` if the kernel is
+booted with the command line option ``panic_on_taint=<bitmask>,nousertaint``
+and any of the ORed together values being written to ``tainted`` match with
+the bitmask declared on panic_on_taint.
+See :doc:`/admin-guide/kernel-parameters` for more details on that particular
+kernel command line option and its optional ``nousertaint`` switch.
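
The rule in the note above can be modelled as a small predicate: a write to ``tainted`` is rejected only when ``nousertaint`` is active and the written value shares at least one bit with the panic_on_taint bitmask. The sketch below is a userspace illustration of that rule. The function name check_taint_write is hypothetical and this is not the kernel's sysctl handler.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Return -EINVAL when "nousertaint" was given and `value` overlaps the
 * panic_on_taint bitmask; otherwise return 0 to accept the write.
 * Hypothetical helper modelling the documented behaviour only.
 */
int check_taint_write(unsigned long value, unsigned long panic_on_taint,
                      bool nousertaint)
{
    if (nousertaint && (value & panic_on_taint) != 0)
        return -EINVAL;
    return 0;
}
```

A single overlapping bit is enough to reject the whole write, so a value of 0x201 fails against a mask of 0x200 even though bit 0 alone would have been accepted.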

threads-max
===========
53 changes: 39 additions & 14 deletions Documentation/core-api/pin_user_pages.rst
@@ -148,23 +148,48 @@ NOTE: Some pages, such as DAX pages, cannot be pinned with longterm pins. That's
because DAX pages do not have a separate page cache, and so "pinning" implies
locking down file system blocks, which is not (yet) supported in that way.

-CASE 3: Hardware with page faulting support
--------------------------------------------
-Here, a well-written driver doesn't normally need to pin pages at all. However,
-if the driver does choose to do so, it can register MMU notifiers for the range,
-and will be called back upon invalidation. Either way (avoiding page pinning, or
-using MMU notifiers to unpin upon request), there is proper synchronization with
-both filesystem and mm (page_mkclean(), munmap(), etc).
-
-Therefore, neither flag needs to be set.
-
-In this case, ideally, neither get_user_pages() nor pin_user_pages() should be
-called. Instead, the software should be written so that it does not pin pages.
-This allows mm and filesystems to operate more efficiently and reliably.
+CASE 3: MMU notifier registration, with or without page faulting hardware
+-------------------------------------------------------------------------
+Device drivers can pin pages via get_user_pages*(), and register for mmu
+notifier callbacks for the memory range. Then, upon receiving a notifier
+"invalidate range" callback , stop the device from using the range, and unpin
+the pages. There may be other possible schemes, such as for example explicitly
+synchronizing against pending IO, that accomplish approximately the same thing.
+
+Or, if the hardware supports replayable page faults, then the device driver can
+avoid pinning entirely (this is ideal), as follows: register for mmu notifier
+callbacks as above, but instead of stopping the device and unpinning in the
+callback, simply remove the range from the device's page tables.
+
+Either way, as long as the driver unpins the pages upon mmu notifier callback,
+then there is proper synchronization with both filesystem and mm
+(page_mkclean(), munmap(), etc). Therefore, neither flag needs to be set.

CASE 4: Pinning for struct page manipulation only
-------------------------------------------------
-Here, normal GUP calls are sufficient, so neither flag needs to be set.
+If only struct page data (as opposed to the actual memory contents that a page
+is tracking) is affected, then normal GUP calls are sufficient, and neither flag
+needs to be set.
+
+CASE 5: Pinning in order to write to the data within the page
+-------------------------------------------------------------
+Even though neither DMA nor Direct IO is involved, just a simple case of "pin,
+access page's data, unpin" can cause a problem. Case 5 may be considered a
+superset of Case 1, plus Case 2, plus anything that invokes that pattern. In
+other words, if the code is neither Case 1 nor Case 2, it may still require
+FOLL_PIN, for patterns like this:
+
+Correct (uses FOLL_PIN calls):
+pin_user_pages()
+access the data within the pages
+set_page_dirty_lock()
+unpin_user_pages()
+
+INCORRECT (uses FOLL_GET calls):
+get_user_pages()
+access the data within the pages
+set_page_dirty_lock()
+put_page()

page_maybe_dma_pinned(): the whole point of pinning
===================================================
2 changes: 1 addition & 1 deletion Documentation/filesystems/locking.rst
@@ -617,7 +617,7 @@ prototypes::
locking rules:

============= ======== ===========================
-ops mmap_sem PageLocked(page)
+ops mmap_lock PageLocked(page)
============= ======== ===========================
open: yes
close: yes
6 changes: 3 additions & 3 deletions Documentation/vm/hmm.rst
@@ -191,15 +191,15 @@ The usage pattern is::

again:
range.notifier_seq = mmu_interval_read_begin(&interval_sub);
-down_read(&mm->mmap_sem);
+mmap_read_lock(mm);
ret = hmm_range_fault(&range);
if (ret) {
-up_read(&mm->mmap_sem);
+mmap_read_unlock(mm);
if (ret == -EBUSY)
goto again;
return ret;
}
-up_read(&mm->mmap_sem);
+mmap_read_unlock(mm);

take_lock(driver->update);
if (mmu_interval_read_retry(&ni, range.notifier_seq) {
4 changes: 2 additions & 2 deletions Documentation/vm/transhuge.rst
@@ -98,9 +98,9 @@ split_huge_page() or split_huge_pmd() has a cost.

To make pagetable walks huge pmd aware, all you need to do is to call
pmd_trans_huge() on the pmd returned by pmd_offset. You must hold the
-mmap_sem in read (or write) mode to be sure a huge pmd cannot be
+mmap_lock in read (or write) mode to be sure a huge pmd cannot be
created from under you by khugepaged (khugepaged collapse_huge_page
-takes the mmap_sem in write mode in addition to the anon_vma lock). If
+takes the mmap_lock in write mode in addition to the anon_vma lock). If
pmd_trans_huge returns false, you just fallback in the old code
paths. If instead pmd_trans_huge returns true, you have to take the
page table lock (pmd_lock()) and re-run pmd_trans_huge. Taking the
1 change: 0 additions & 1 deletion arch/alpha/boot/bootp.c
@@ -16,7 +16,6 @@

#include <asm/console.h>
#include <asm/hwrpb.h>
-#include <asm/pgtable.h>
#include <asm/io.h>

#include <stdarg.h>
1 change: 0 additions & 1 deletion arch/alpha/boot/bootpz.c
@@ -18,7 +18,6 @@

#include <asm/console.h>
#include <asm/hwrpb.h>
-#include <asm/pgtable.h>
#include <asm/io.h>

#include <stdarg.h>
1 change: 0 additions & 1 deletion arch/alpha/boot/main.c
@@ -14,7 +14,6 @@

#include <asm/console.h>
#include <asm/hwrpb.h>
-#include <asm/pgtable.h>

#include <stdarg.h>
